This is actually something I am looking at in HEAD right now.
Staggering starts is only a solution for the unit tests. Still, it
should not be ruled out, since a) the implementation effort is small
and the risk is low, and b) 10 nodes starting up within fractions of a
second of each other is an unlikely scenario in the real world. :-)
Waiting before throwing a BuddyNotInitException is what we had
before: waiting on a latch which is released at the end of the
initialisation phase. This worked, but occasionally deadlocked,
since the init phase itself had to assign buddies. I did bring
this back in 2.x, and this works thanks to JGroups 2.5's concurrent
stack, but this is not a solution for 1.4.x.
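To make the latch idea concrete, here is a minimal sketch of waiting up to a timeout before failing. It is not the actual BuddyManager code: InitGate, markInitialised() and awaitInitialised() are hypothetical names, and IllegalStateException stands in for BuddyNotInitException. It also does nothing about the deadlock mentioned above; if the thread that has to finish init is the one blocked on the latch, the wait simply runs into the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the init gate in a buddy manager; not the real API.
class InitGate {
    private final CountDownLatch initLatch = new CountDownLatch(1);
    private final long timeoutMillis;

    InitGate(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called once, at the very end of the initialisation phase.
    void markInitialised() {
        initLatch.countDown();
    }

    // Called when a remote request arrives. Waits up to timeoutMillis for
    // init to finish, then gives up (the real code would throw
    // BuddyNotInitException here).
    void awaitInitialised() {
        try {
            if (!initLatch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Not yet initialised");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for init");
        }
    }
}
```

The timeout could plausibly be tied to buddyCommunicationTimeout, as Galder suggests below, but that is a tuning decision rather than something this sketch settles.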
Bruno's idea of a broadcast of availability is a good/valid approach,
although this needs more thought since the init phase involves some
communication (broadcasting buddy pool details, assigning buddies)
and could lead to improperly initialised groups if buddies try and
init simultaneously (each may think they are alone since no one has
received an availability broadcast). Might work though, just needs
some thought. The other negative here is another broadcast message,
which is expensive.
For now I'd look at real-world validity - adding a few millisecs'
delay between cache creation, etc.
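For the tests, staggering could look roughly like the sketch below. StaggeredStarter and startAll() are hypothetical helpers, not anything in the test suite today; each Runnable is assumed to create and start one cache.

```java
import java.util.List;

// Hypothetical test helper: starts caches one at a time with a pause between them.
class StaggeredStarter {
    // Runs each start action in order, sleeping delayMillis between
    // consecutive starts (no sleep after the last one).
    static void startAll(List<Runnable> starters, long delayMillis) {
        for (int i = 0; i < starters.size(); i++) {
            starters.get(i).run();
            if (i < starters.size() - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return; // stop starting further caches
                }
            }
        }
    }
}
```

A fixed delay only makes the race less likely; it does not remove it, which is why this is a test-suite measure rather than a fix.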
Cheers,
Manik
On 2 Jul 2007, at 12:43, Bruno Georges wrote:
Hi
Excuse the stupid question, but would it be possible to implement a
callback mechanism, so that a notification is propagated to the
coordinator when a buddy joins the group?
my 2 cents.
Galder Zamarreno wrote:
> Hi,
>
> I have gone through some of the failures I'm getting when running
> the test suite for 1.4.x:
>
> * Some buddy replication tests show this test failure:
> junit.framework.AssertionFailedError: buddy's list of groups it
> participates in should contain data owner's group name at
> org.jboss.cache.buddyreplication.BuddyReplicationTestsBase.assertIsBuddy(BuddyReplicationTestsBase.java:280)
>
> Earlier in the test, you can see this Exception:
> org.jboss.cache.buddyreplication.BuddyNotInitException: Not yet initialised
> at org.jboss.cache.buddyreplication.BuddyManager.handleAssignToBuddyGroup(BuddyManager.java:450)
> at org.jboss.cache.TreeCache._remoteAssignToBuddyGroup(TreeCache.java:5372)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:330)
>
> This failure seems to be related to a timing issue: a node
> requests another to join the buddy group while that node is still in
> the initialisation process, BM.init(). More precisely, the code is
> broadcasting buddy pool membership, and at that point a request
> to join the buddy group occurs.
>
> The channel is connected before the buddy manager is initialised,
> so there's always the possibility of receiving messages before
> buddy manager has finished initialising.
>
> Various solutions that come to my mind:
> 1.- staggering cache starts
> 2.- wait for a little bit before throwing BuddyNotInitException.
> Seeing that the broadcasting task could be lengthy (e.g. one of
> the nodes fails to respond, as the call is synchronous), maybe wait
> for buddyCommunicationTimeout?
>
> IMO, 2 is preferred.
--
Manik Surtani
Lead, JBoss Cache
JBoss, a division of Red Hat