[infinispan-dev] state transfer exceptions at REPL

Sanne Grinovero sanne at infinispan.org
Wed Feb 8 05:48:27 EST 2012


Thanks Dan!
Seems to work fine now. I still don't like the exceptions being logged
when a node is shutting down, but they are harmless.

Cheers,
Sanne

On 8 February 2012 10:17, Dan Berindei <dan.berindei at gmail.com> wrote:
> Sanne,
>
> I was able to run LiveRunningTest as well after I removed
> TestableJGroupsTransport from the Infinispan configuration, and I
> disabled queueing in the SHARED_LOOPBACK OOB thread pool:
>
>   <SHARED_LOOPBACK
>         thread_pool.enabled="true"
>         thread_pool.min_threads="2"
>         thread_pool.max_threads="30"
>         thread_pool.keep_alive_time="60000"
>         thread_pool.queue_enabled="false"
>         thread_pool.queue_max_size="100"
>         thread_pool.rejection_policy="Discard"
>
>         oob_thread_pool.enabled="true"
>         oob_thread_pool.min_threads="2"
>         oob_thread_pool.max_threads="30"
>         oob_thread_pool.keep_alive_time="60000"
>         oob_thread_pool.queue_enabled="false"
>         oob_thread_pool.queue_max_size="100"
>         oob_thread_pool.rejection_policy="Discard"
>         />
>
>
> I think the test fails with queueing enabled and a core thread pool
> size of 2 because the coordinator sends a PREPARE_VIEW command and
> several APPLY_STATE commands (at least one for each cache) at
> approximately the same time. If two APPLY_STATE commands get to the
> other node before the PREPARE_VIEW command, they will occupy both
> core threads while stuck waiting for state transfer to start, and the
> PREPARE_VIEW command sits in the queue behind them.
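
The starvation described in the paragraph above can be reproduced with a
plain java.util.concurrent.ThreadPoolExecutor. This is only a minimal
sketch, assuming the JGroups pools follow the same rules (a bounded queue
when queue_enabled="true", a direct hand-off when it is "false"). The
class and method names are hypothetical, and the latches merely stand in
for the APPLY_STATE and PREPARE_VIEW commands; none of this is Infinispan
or JGroups code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class OobPoolStarvationDemo {

    // Returns true if a third task (standing in for PREPARE_VIEW) gets to run
    // while two earlier tasks (standing in for APPLY_STATE) are still blocked.
    public static boolean thirdTaskRuns(boolean queueEnabled) {
        // Assumption: mirrors queue_enabled / queue_max_size="100" from the config.
        BlockingQueue<Runnable> queue = queueEnabled
                ? new LinkedBlockingQueue<Runnable>(100)
                : new SynchronousQueue<Runnable>();
        // min_threads=2, max_threads=30, keep_alive_time=60000, rejection_policy=Discard
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 30, 60_000, TimeUnit.MILLISECONDS, queue,
                new ThreadPoolExecutor.DiscardPolicy());

        CountDownLatch stateTransferStarted = new CountDownLatch(1);
        CountDownLatch prepareViewHandled = new CountDownLatch(1);

        // Blocks its thread until "state transfer" has been started.
        Runnable applyState = () -> {
            try {
                stateTransferStarted.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        try {
            pool.execute(applyState);   // takes core thread 1
            pool.execute(applyState);   // takes core thread 2
            // With a queue, this task is queued: the executor only adds threads
            // beyond the core size once the queue is full.
            pool.execute(() -> {
                stateTransferStarted.countDown();
                prepareViewHandled.countDown();
            });
            return prepareViewHandled.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();         // interrupt any blocked workers
        }
    }

    public static void main(String[] args) {
        System.out.println("queue enabled:  PREPARE_VIEW handled = " + thirdTaskRuns(true));
        System.out.println("queue disabled: PREPARE_VIEW handled = " + thirdTaskRuns(false));
    }
}
```

With the queue enabled the third task should never run within the
timeout; with the hand-off queue the executor starts a third thread and
the blocked tasks are released.
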
>
> FD also sends messages using OOB, so if the OOB thread pool stops
> processing messages, FD on other members will soon suspect the stuck
> member and kick it out of the cluster.
>
> For now I think increasing the number of available threads is the only
> solution. For 5.2 I'm thinking of moving both the sending of state and
> the handling of state to a separate thread, so that OOB threads won't
> have to block waiting for the state transfer to start.
>
> Cheers
> Dan
>
>
> On Wed, Feb 8, 2012 at 9:59 AM, Dan Berindei <dan.berindei at gmail.com> wrote:
>> Hi Sanne
>>
>> I got the sources and even TwoNodesTest hangs for me every time.
>>
>> I think the problem is that your TestableJGroupsTransport is trying to
>> modify the cluster name during startup - which is no longer supported.
>>
>> I have also created https://issues.jboss.org/browse/ISPN-1852 to fix
>> startup so that after an error like this another getCache() call
>> doesn't block forever. Ideally it should report the same error,
>> whether we attempt to start the component again or we save the
>> exception somewhere.
>>
>> Cheers
>> Dan
>>
>>
>> On Tue, Feb 7, 2012 at 6:15 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>> Dan,
>>> you can easily check out Hibernate Search, it's a Maven project and you
>>> should be able to set it up in your IDE quickly.
>>>
>>> git clone git://github.com/Sanne/hibernate-search.git
>>> cd hibernate-search
>>> git checkout componentsUpdates
>>>
>>> Then the failing test is in the module "hibernate-search-infinispan",
>>> which is just a couple of classes.
>>>
>>> Sanne
>>>
>>>
>>>
>>> On 7 February 2012 16:10, Dan Berindei <dan.berindei at gmail.com> wrote:
>>>> Rado, is there a specific test in the AS7 test suite that is failing?
>>>> Is it only in Jenkins or on your machine as well?
>>>>
>>>> I only know about https://issues.jboss.org/browse/ISPN-1806, but Paul
>>>> said that he doesn't see it any more in CI runs (he never managed to
>>>> reproduce it on his machine).
>>>>
>>>> Cheers
>>>> Dan
>>>>
>>>>
>>>> On Tue, Feb 7, 2012 at 3:13 PM, Radoslav Husar <rhusar at redhat.com> wrote:
>>>>> I am also seeing this/similar exception in AS7 during session
>>>>> replication even with 5.1.1.FINAL :-(
>>>>>
>>>>> On 02/07/2012 01:54 PM, Dan Berindei wrote:
>>>>>> Sanne, this sounds very similar to
>>>>>> https://issues.jboss.org/browse/ISPN-1814, but I thought I had fixed
>>>>>> that for 5.1.1.FINAL.
>>>>>>
>>>>>> I see CacheViewsManagerImpl is trying to install a view with 6 nodes.
>>>>>> Should there be 6 nodes in the cluster, or should there be fewer?
>>>>>> Do you have DEBUG logs for org.infinispan and org.jgroups?
>>>>>>
>>>>>> Cheers
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 7, 2012 at 12:58 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>>>>> Can anyone explain this error?
>>>>>>>
>>>>>>> I'm updating Hibernate Search, and I have a simple test which does
>>>>>>> the following in a loop:
>>>>>>>
>>>>>>> - write to the shared index
>>>>>>> - add a node / remove a node
>>>>>>> - wait for joins
>>>>>>> - verify the index state
>>>>>>>
>>>>>>> This is expected to work, as it already did with all previous
>>>>>>> Infinispan versions.
>>>>>>>
>>>>>>> Using Infinispan 5.1.1.FINAL and JGroups 3.0.5.Final.
>>>>>>>
>>>>>>> 2012-02-07 10:42:38,668 WARN  [CacheViewControlCommand]
>>>>>>> (OOB-4,sanne-20017) ISPN000071: Caught exception when handling command
>>>>>>> CacheViewControlCommand{cache=LuceneIndexesMetadata,
>>>>>>> type=PREPARE_VIEW, sender=sanne-3158, newViewId=8,
>>>>>>> newMembers=[sanne-3158, sanne-63971, sanne-20017, sanne-2794,
>>>>>>> sanne-25511, sanne-30075], oldViewId=7, oldMembers=[sanne-3158,
>>>>>>> sanne-63971, sanne-20017, sanne-2794, sanne-25511]}
>>>>>>> java.util.concurrent.ExecutionException:
>>>>>>> org.infinispan.remoting.transport.jgroups.SuspectException: One or
>>>>>>> more nodes have left the cluster while replicating command
>>>>>>> StateTransferControlCommand{cache=LuceneIndexesMetadata,
>>>>>>> type=APPLY_STATE, sender=sanne-20017, viewId=8, state=4}
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev


