[jbosscache-dev] buddy replication test failures in 1.4.x

Manik Surtani manik at jboss.org
Mon Jul 2 08:39:29 EDT 2007


This is actually something I am looking at in HEAD right now.

Staggering starts is only a solution for the unit tests.  Still, it
should not be ruled out, since a) implementation effort is small and
risk is low, and b) 10 nodes starting up within fractions of a second
of each other is an unlikely scenario in the real world.  :-)
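
To make the staggering idea concrete, here is a minimal sketch in plain
Java.  It is not taken from the test suite: the class name
StaggeredStart, the startAll helper and the use of Runnable as a
stand-in for "create and start one cache" are all assumptions for
illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JBoss Cache or its test suite.
final class StaggeredStart {

    /**
     * Runs each start action in order, sleeping delayMillis between
     * consecutive starts so that no two instances initialise at once.
     */
    static void startAll(List<Runnable> starters, long delayMillis)
            throws InterruptedException {
        for (int i = 0; i < starters.size(); i++) {
            if (i > 0) {
                Thread.sleep(delayMillis); // gap before every start but the first
            }
            starters.get(i).run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> starters = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int id = i;
            // In a real test this would create and start cache number id.
            starters.add(() -> System.out.println("starting cache " + id));
        }
        startAll(starters, 50);
    }
}
```

The delay value is arbitrary here; the right figure would depend on how
long the init phase takes in the environment under test.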

Waiting before throwing a BuddyNotInitException is what we had
before: waiting on a latch which is released at the end of the
initialisation phase.  This worked, but occasionally deadlocked,
since the init phase itself had to assign buddies.  I did bring
this back in 2.x, where it works thanks to JGroups 2.5's concurrent
stack, but it is not a solution for 1.4.x.
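
For readers who have not seen the latch approach, a rough sketch of the
pattern follows.  It is not the BuddyManager code: InitGate and its
method names are hypothetical, java.util.concurrent is assumed to be
available, and IllegalStateException stands in for
BuddyNotInitException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of "wait on a latch, then give up".
final class InitGate {

    private final CountDownLatch initialised = new CountDownLatch(1);
    private final long timeoutMillis;

    InitGate(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called once, at the very end of the initialisation phase. */
    void markInitialised() {
        initialised.countDown();
    }

    /**
     * Called by handlers of incoming requests.  Blocks until init has
     * finished, or fails once the timeout has elapsed.
     */
    void awaitInitialised() throws InterruptedException {
        if (!initialised.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // Stand-in for BuddyNotInitException in this sketch.
            throw new IllegalStateException(
                "Not yet initialised after " + timeoutMillis + " ms");
        }
    }
}
```

A timeout only bounds the wait.  If the thread blocked in
awaitInitialised() is the same one that must deliver the messages the
init phase depends on, init cannot complete until that thread gives up,
which is the kind of stall described above.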

Bruno's idea of a broadcast of availability is a good/valid approach,  
although this needs more thought since the init phase involves some  
communication (broadcasting buddy pool details, assigning buddies)  
and could lead to improperly initialised groups if buddies try and  
init simultaneously (each may think they are alone since no one has  
received an availability broadcast).  Might work though, just needs  
some thought.  The other negative here is another broadcast message,  
which is expensive.
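
The simultaneous-init concern can be shown with a toy model.  This is
purely illustrative and does not reflect the actual buddy assignment
protocol: the Node class, its heardFrom set and the "alone if nobody
has been heard from" rule are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; not JBoss Cache code.
final class AvailabilityRace {

    static final class Node {
        final String name;
        final Set<String> heardFrom = new HashSet<>();

        Node(String name) { this.name = name; }

        /** A node assumes it is alone if it has received no broadcast. */
        boolean thinksAlone() { return heardFrom.isEmpty(); }

        /** Delivers this node's availability broadcast to the others. */
        void broadcastAvailable(Node... others) {
            for (Node other : others) {
                other.heardFrom.add(name);
            }
        }
    }

    /** Both nodes check for peers before either broadcast is delivered. */
    static boolean[] simultaneousInit() {
        Node a = new Node("A");
        Node b = new Node("B");
        boolean aAlone = a.thinksAlone();
        boolean bAlone = b.thinksAlone();
        a.broadcastAvailable(b);
        b.broadcastAvailable(a);
        return new boolean[] { aAlone, bAlone };
    }

    /** A finishes its check and broadcast before B starts. */
    static boolean[] staggeredInit() {
        Node a = new Node("A");
        Node b = new Node("B");
        boolean aAlone = a.thinksAlone();
        a.broadcastAvailable(b);
        boolean bAlone = b.thinksAlone();
        b.broadcastAvailable(a);
        return new boolean[] { aAlone, bAlone };
    }

    public static void main(String[] args) {
        boolean[] sim = simultaneousInit();
        boolean[] stag = staggeredInit();
        System.out.println("simultaneous: A alone=" + sim[0] + ", B alone=" + sim[1]);
        System.out.println("staggered:    A alone=" + stag[0] + ", B alone=" + stag[1]);
    }
}
```

In the simultaneous case both nodes conclude they are alone; in the
staggered case only the first one does.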

For now I'd look at real-world validity, e.g. adding a few
milliseconds' delay between cache creation, etc.

Cheers,
Manik


On 2 Jul 2007, at 12:43, Bruno Georges wrote:

> Hi
>
> Excuse the stupid question, but would it be possible to implement a
> callback mechanism, so that a notification is propagated to the
> coordinator when a buddy joins the group?
>
> my 2 cents.
>
> Galder Zamarreno wrote:
>> Hi,
>>
>> I have gone through some of the failures I'm getting when running  
>> the test suite for 1.4.x:
>>
>> * Some buddy replication tests show this test failure:
>> junit.framework.AssertionFailedError: buddy's list of groups it
>> participates in should contain data owner's group name at
>> org.jboss.cache.buddyreplication.BuddyReplicationTestsBase.assertIsBuddy(BuddyReplicationTestsBase.java:280)
>>
>> Earlier in the test, you can see this Exception:
>> org.jboss.cache.buddyreplication.BuddyNotInitException: Not yet initialised
>>     at org.jboss.cache.buddyreplication.BuddyManager.handleAssignToBuddyGroup(BuddyManager.java:450)
>>     at org.jboss.cache.TreeCache._remoteAssignToBuddyGroup(TreeCache.java:5372)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:585)
>>     at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:330)
>>
>> This failure seems to be related to a timing issue: a node
>> requests another to join the buddy group while that node is still
>> in the initialisation process, BM.init(). More precisely, the code
>> is broadcasting buddy pool membership, and at that point a request
>> to join the buddy group arrives.
>>
>> The channel is connected before the buddy manager is initialised,  
>> so there's always the possibility of receiving messages before  
>> buddy manager has finished initialising.
>>
>> Various solutions that come to my mind:
>> 1.- staggering cache starts
>> 2.- wait for a little bit before throwing BuddyNotInitException  
>> seeing that the broadcasting task could be lengthy (i.e. one of  
>> the nodes fails to respond as the call is synchronous), maybe wait  
>> for buddyCommunicationTimeout?
>>
>> IMO, 2 is preferred.
>>

--
Manik Surtani

Lead, JBoss Cache
JBoss, a division of Red Hat





