[infinispan-dev] JGroups 2.7 and 2.8 work with Infinispan now

Bela Ban bban at redhat.com
Tue Jun 2 08:48:51 EDT 2009



Manik Surtani wrote:
>
>
>> Could we create a test where:
>>
>> 1.  Node A and B are in the cluster.
>> 2.  Node A does put(x, y) the same time Node C joins
>> 3.  Test that Node C sees the put.
>> 4.  Repeat the test to simulate the race condition.
>
> I believe the problem is that C discards messages before its view is 
> installed.

We have to differentiate: regardless of the presence of FLUSH, a 
*unicast* message is completely independent from multicast messages and 
views.

In your scenario, however, this is likely a multicast (unless you have 
buddy replication enabled, or use DIST).

Without FLUSH, C may or may not receive the put(x,y). With FLUSH, two 
scenarios are possible:

   1. A,B,C receive the view V2, then the put()
   2. A,B receive the put(), then A,B,C receive the view V2. In this
      case, C does NOT receive the put() because it logically joined
      after the put()
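
To make the two orderings concrete, here is a minimal sketch in plain Java. It does not use the JGroups API: the class FlushOrderingSketch, the method delivers() and the "VIEW:..." / "PUT" event strings are made up for illustration. It only assumes what is stated above, namely that with FLUSH a member receives the put() only if it is in the view in which the put() was sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not JGroups code: replays an ordered log of events.
final class FlushOrderingSketch {

    /**
     * Events are either "VIEW:A,B,C" (a view installation) or "PUT" (the multicast).
     * Returns true if the member belongs to the view that is current when the put is sent.
     */
    static boolean delivers(String member, List<String> events) {
        List<String> currentView = new ArrayList<>();
        for (String event : events) {
            if (event.startsWith("VIEW:")) {
                // A new view replaces the previous membership.
                currentView = Arrays.asList(event.substring(5).split(","));
            } else if (event.equals("PUT")) {
                // The put is delivered only to members of the current view.
                return currentView.contains(member);
            }
        }
        return false; // the log contains no put
    }

    public static void main(String[] args) {
        // Scenario 1: view V2 = {A,B,C} is installed first, then the put.
        List<String> viewThenPut = Arrays.asList("VIEW:A,B", "VIEW:A,B,C", "PUT");
        // Scenario 2: the put is sent in {A,B}, then V2 is installed.
        List<String> putThenView = Arrays.asList("VIEW:A,B", "PUT", "VIEW:A,B,C");

        System.out.println("scenario 1, C sees put: " + delivers("C", viewThenPut));
        System.out.println("scenario 2, C sees put: " + delivers("C", putThenView));
    }
}
```

In scenario 2, C would have to get x through state transfer (or not at all), since it was not a member when the put() was sent.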


> A & B could receive this view and send messages to C, which C will not 
> see.

Yes, if FLUSH is not present.

>   Initially we thought FLUSH will work around this problem, and while 
> it reduces the window of occurrence, the problem still exists since 
> there is a race between A & B receiving unblock() and then 
> broadcasting queued messages, and C receiving these queued messages 
> and receiving unblock().

No, that should *not* occur because messages will always be received by 
C even though it is still blocked. The blocking only affects the 
*sending* of messages. Second, C will have view {A,B,C} installed, or 
else it would not receive the unblock(). Therefore, any messages from A 
or B would NOT get dropped. Third, even if that were the case (it isn't), 
any dropped message from A or B would get retransmitted when A or B 
sends the next message to C, or before the next view gets installed. 
Again, only with FLUSH, though!
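
The first point (blocking affects only sending) can be sketched as follows. This is a toy model, not the FLUSH implementation: the class BlockedSenderSketch and all of its methods are hypothetical, and it queues outgoing messages while blocked instead of blocking the caller, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a member during a flush; not JGroups code.
final class BlockedSenderSketch {
    private boolean blocked;
    private final List<String> pendingSends = new ArrayList<>();
    private final List<String> sent = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    void block() {
        blocked = true;
    }

    void unblock() {
        blocked = false;
        // Anything held back during the flush goes out now, in order.
        sent.addAll(pendingSends);
        pendingSends.clear();
    }

    void send(String msg) {
        // Only the send path checks the blocked flag.
        if (blocked) {
            pendingSends.add(msg);
        } else {
            sent.add(msg);
        }
    }

    void receive(String msg) {
        // The receive path never checks the blocked flag.
        delivered.add(msg);
    }

    List<String> sent() {
        return sent;
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        BlockedSenderSketch c = new BlockedSenderSketch();
        c.block();
        c.receive("put(x,y) from A"); // delivered although C is still blocked
        c.send("reply from C");       // held back until unblock()
        System.out.println("delivered while blocked: " + c.delivered());
        System.out.println("sent while blocked: " + c.sent());
        c.unblock();
        System.out.println("sent after unblock: " + c.sent());
    }
}
```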

> One solution which Mircea proposed is for FLUSH to queue messages 
> received during a blocked state, so that this race condition on C is 
> removed.  However it would be better to do this without FLUSH, for C 
> to queue messages received between joining and its view being installed.

I don't think this is a good solution because it essentially does what 
virtual synchrony (FLUSH) does.

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat
