Thanks Dan!
Seems to work fine now. I still don't like the exceptions being logged
when a node is shutting down, but they are harmless.
Cheers,
Sanne
On 8 February 2012 10:17, Dan Berindei <dan.berindei(a)gmail.com> wrote:
Sanne,
I was able to run LiveRunningTest as well after I removed
TestableJGroupsTransport from the Infinispan configuration, and I
disabled queueing in the SHARED_LOOPBACK OOB thread pool:
<SHARED_LOOPBACK
    thread_pool.enabled="true"
    thread_pool.min_threads="2"
    thread_pool.max_threads="30"
    thread_pool.keep_alive_time="60000"
    thread_pool.queue_enabled="false"
    thread_pool.queue_max_size="100"
    thread_pool.rejection_policy="Discard"
    oob_thread_pool.enabled="true"
    oob_thread_pool.min_threads="2"
    oob_thread_pool.max_threads="30"
    oob_thread_pool.keep_alive_time="60000"
    oob_thread_pool.queue_enabled="false"
    oob_thread_pool.queue_max_size="100"
    oob_thread_pool.rejection_policy="Discard"
/>
I think the test fails with queueing enabled and a core thread pool
size of 2 because the coordinator sends a PREPARE_VIEW command and
several APPLY_STATE commands (at least one for each cache) at
approximately the same time. If two APPLY_STATE commands get to the
other node before the PREPARE_VIEW command, they occupy both core
threads while waiting for state transfer to start, and the
PREPARE_VIEW command that would start it stays in the queue.
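
The effect can be reproduced outside Infinispan with a plain
java.util.concurrent.ThreadPoolExecutor. The sketch below is only an
illustration, under the assumption that the JGroups pools behave like a
ThreadPoolExecutor backed by a bounded queue when queue_enabled="true"
and by a SynchronousQueue when queue_enabled="false". The class and
method names are made up for the example, and the latch merely stands
in for "state transfer has started".

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Infinispan or JGroups.
public class PoolStarvationDemo {

    // Returns true if both simulated APPLY_STATE tasks finish within one second.
    public static boolean applyStateCompletes(BlockingQueue<Runnable> queue)
            throws InterruptedException {
        // Same sizing as the configuration above: min 2, max 30, keep-alive 60 s.
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(2, 30, 60, TimeUnit.SECONDS, queue);
        CountDownLatch stateTransferStarted = new CountDownLatch(1);
        CountDownLatch applied = new CountDownLatch(2);

        // Stand-in for APPLY_STATE: blocks until "PREPARE_VIEW" has been handled.
        Runnable applyState = () -> {
            try {
                stateTransferStarted.await();
                applied.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Stand-in for PREPARE_VIEW: releases the waiting tasks.
        Runnable prepareView = stateTransferStarted::countDown;

        try {
            pool.execute(applyState);  // takes core thread 1
            pool.execute(applyState);  // takes core thread 2
            pool.execute(prepareView); // queued, or run on a new third thread
            return applied.await(1, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow(); // interrupt tasks that are still blocked
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("bounded queue: "
                + applyStateCompletes(new LinkedBlockingQueue<>(100)));
        System.out.println("no queue:      "
                + applyStateCompletes(new SynchronousQueue<>()));
    }
}
```

With the bounded queue the third task should sit in the queue behind
the two blocked core threads, so the first call reports false. A
ThreadPoolExecutor only grows beyond its core size when the queue
refuses a task, which is why the SynchronousQueue variant should start
a third thread and report true.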
FD also sends messages using OOB, so if the OOB thread pool stops
processing messages, FD on the other members will soon suspect the
stuck member and kick it out of the cluster.
For now I think increasing the number of available threads is the only
solution. For 5.2 I'm thinking of moving both the sending of state and
the handling of state to a separate thread, so that OOB threads won't
have to block waiting for the state transfer to start.
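
A rough sketch of that idea, not the actual 5.2 design: the handler
that runs on the OOB thread only hands the work to a dedicated
executor and returns, so the waiting happens elsewhere. All names here
(StateTransferOffloadSketch, onApplyState, onPrepareView) are
hypothetical, and the latch again stands in for "state transfer has
started".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Infinispan code.
public class StateTransferOffloadSketch {
    // Dedicated pool: blocking here does not tie up OOB threads.
    private final ExecutorService stateTransferExecutor = Executors.newCachedThreadPool();
    private final CountDownLatch stateTransferStarted = new CountDownLatch(1);
    private final CountDownLatch stateApplied;

    public StateTransferOffloadSketch(int expectedStateCommands) {
        this.stateApplied = new CountDownLatch(expectedStateCommands);
    }

    // Called on an OOB thread: hand off the work and return immediately.
    public void onApplyState() {
        stateTransferExecutor.execute(() -> {
            try {
                stateTransferStarted.await(); // wait on the dedicated thread
                stateApplied.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Called on an OOB thread: marks state transfer as started.
    public void onPrepareView() {
        stateTransferStarted.countDown();
    }

    public boolean awaitStateApplied(long timeout, TimeUnit unit) throws InterruptedException {
        return stateApplied.await(timeout, unit);
    }

    public void shutdown() {
        stateTransferExecutor.shutdownNow();
    }

    // Simulates an OOB pool with 2 core threads and queueing enabled.
    public static boolean demo() throws InterruptedException {
        StateTransferOffloadSketch handler = new StateTransferOffloadSketch(2);
        ThreadPoolExecutor oobPool = new ThreadPoolExecutor(
                2, 30, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(100));
        try {
            oobPool.execute(handler::onApplyState);
            oobPool.execute(handler::onApplyState);
            oobPool.execute(handler::onPrepareView); // no longer starved
            return handler.awaitStateApplied(1, TimeUnit.SECONDS);
        } finally {
            oobPool.shutdownNow();
            handler.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("state applied: " + demo());
    }
}
```

Because the OOB tasks return right away, the PREPARE_VIEW stand-in
gets a thread even with queueing enabled and only two core threads.
The cost is an extra pool to size and shut down.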
Cheers
Dan
On Wed, Feb 8, 2012 at 9:59 AM, Dan Berindei <dan.berindei(a)gmail.com> wrote:
> Hi Sanne
>
> I got the sources and even TwoNodesTest hangs for me every time.
>
> I think the problem is that your TestableJGroupsTransport is trying to
> modify the cluster name during startup - which is no longer supported.
>
> I have also created https://issues.jboss.org/browse/ISPN-1852 to fix
> startup so that after an error like this another getCache() call
> doesn't block forever. Ideally it should report the same error,
> whether we attempt to start the component again or we save the
> exception somewhere.
>
> Cheers
> Dan
>
>
> On Tue, Feb 7, 2012 at 6:15 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>> Dan,
>> you can easily check out Hibernate Search; it's a Maven project and you
>> should be able to set it up in your IDE quickly.
>>
>> git clone git://github.com/Sanne/hibernate-search.git
>> git checkout componentsUpdates
>>
>> Then the failing test is in the module "hibernate-search-infinispan",
>> which is just a couple of classes.
>>
>> Sanne
>>
>>
>>
>> On 7 February 2012 16:10, Dan Berindei <dan.berindei(a)gmail.com> wrote:
>>> Rado, is there a specific test in the AS7 test suite that is failing?
>>> Is it only in Jenkins or on your machine as well?
>>>
>>> I only know about https://issues.jboss.org/browse/ISPN-1806, but Paul
>>> said that he doesn't see it any more in CI runs (he never managed to
>>> reproduce it on his machine).
>>>
>>> Cheers
>>> Dan
>>>
>>>
>>> On Tue, Feb 7, 2012 at 3:13 PM, Radoslav Husar <rhusar(a)redhat.com> wrote:
>>>> I am also seeing this or a similar exception in AS7 during session
>>>> replication even with 5.1.1.FINAL :-(
>>>>
>>>> On 02/07/2012 01:54 PM, Dan Berindei wrote:
>>>>> Sanne, this sounds very similar to
>>>>> https://issues.jboss.org/browse/ISPN-1814, but I thought I had fixed
>>>>> that for 5.1.1.FINAL.
>>>>>
>>>>> I see CacheViewsManagerImpl is trying to install a view with 6 nodes.
>>>>> Should there be 6 nodes in the cluster, or should there be fewer?
>>>>> Do you have DEBUG logs for org.infinispan and org.jgroups?
>>>>>
>>>>> Cheers
>>>>> Dan
>>>>>
>>>>>
>>>>> On Tue, Feb 7, 2012 at 12:58 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>>>>>> Can anyone explain this error?
>>>>>>
>>>>>> I'm updating Hibernate Search, and I have a simple test which does
>>>>>> the following in a loop:
>>>>>>
>>>>>> - write to the shared index
>>>>>> - add a node / remove a node
>>>>>> - wait for joins
>>>>>> - verify the index state
>>>>>>
>>>>>> This is expected to work, as it already did with all previous
>>>>>> Infinispan versions.
>>>>>>
>>>>>> Using Infinispan 5.1.1.FINAL and JGroups 3.0.5.Final.
>>>>>>
>>>>>> 2012-02-07 10:42:38,668 WARN [CacheViewControlCommand]
>>>>>> (OOB-4,sanne-20017) ISPN000071: Caught exception when handling command
>>>>>> CacheViewControlCommand{cache=LuceneIndexesMetadata,
>>>>>> type=PREPARE_VIEW, sender=sanne-3158, newViewId=8,
>>>>>> newMembers=[sanne-3158, sanne-63971, sanne-20017, sanne-2794,
>>>>>> sanne-25511, sanne-30075], oldViewId=7, oldMembers=[sanne-3158,
>>>>>> sanne-63971, sanne-20017, sanne-2794, sanne-25511]}
>>>>>> java.util.concurrent.ExecutionException:
>>>>>> org.infinispan.remoting.transport.jgroups.SuspectException: One or
>>>>>> more nodes have left the cluster while replicating command
>>>>>> StateTransferControlCommand{cache=LuceneIndexesMetadata,
>>>>>> type=APPLY_STATE, sender=sanne-20017, viewId=8, state=4}
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev(a)lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev