Manik Surtani wrote:
> Could we create a test where:
>
> 1. Node A and B are in the cluster.
> 2. Node A does put(x, y) at the same time as Node C joins
> 3. Test that Node C sees the put.
> 4. Repeat the test to simulate the race condition.
> I believe the problem is that C discards messages before its view is
> installed.
We have to differentiate: regardless of the presence of FLUSH, a
*unicast* message is completely independent of multicast messages and
views.
In your scenario, however, this is likely a multicast (unless you have
buddy replication enabled or use DIST).
Without FLUSH, C may or may not receive the put(x,y). With FLUSH, two
scenarios are possible:
1. A, B, C receive the view V2, then the put()
2. A, B receive the put(), then A, B, C receive the view V2. In this
   case, C does NOT receive the put() because it logically joined
   after the put()
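To make the two FLUSH orderings concrete, here is a small self-contained model. It is not JGroups code: the class, the method names and the event strings ("V1" = {A,B}, "V2" = {A,B,C}, "put" = the replicated put(x, y)) are made up for illustration. The check encodes the virtual synchrony property as assumed here: all members of a view deliver the same messages while in that view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model, not JGroups code. Each list is the sequence of
// events (view installations and messages) one member delivers.
final class FlushOrderings {

    static Map<String, List<String>> scenario(int n) {
        if (n == 1) {
            // 1. A, B, C receive view V2, then the put()
            return Map.of(
                "A", List.of("V1", "V2", "put"),
                "B", List.of("V1", "V2", "put"),
                "C", List.of("V2", "put"));
        }
        // 2. A, B receive the put() in V1, then A, B, C receive V2;
        //    C logically joined after the put() and never sees it
        return Map.of(
            "A", List.of("V1", "put", "V2"),
            "B", List.of("V1", "put", "V2"),
            "C", List.of("V2"));
    }

    // Messages a member delivered while `view` was its current view.
    static List<String> deliveredIn(String view, List<String> events) {
        List<String> out = new ArrayList<>();
        boolean inView = false;
        for (String e : events) {
            if (e.startsWith("V")) {          // a view installation
                inView = e.equals(view);
            } else if (inView) {
                out.add(e);
            }
        }
        return out;
    }

    // Assumed property: members of a view agree on what was delivered in it.
    static boolean virtuallySynchronous(Map<String, List<String>> d) {
        List<String> v1 = deliveredIn("V1", d.get("A"));
        List<String> v2 = deliveredIn("V2", d.get("A"));
        return v1.equals(deliveredIn("V1", d.get("B")))
            && v2.equals(deliveredIn("V2", d.get("B")))
            && v2.equals(deliveredIn("V2", d.get("C")));
    }
}
```

Both orderings pass the check; they differ only in whether the put() counts as a V1 or a V2 message. A run where A and B deliver the put() in V2 but C does not would fail it.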
> A & B could receive this view and send messages to C, which C will
> not see.
Yes, if FLUSH is not present.
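The race without FLUSH, together with step 4 of the proposed test (repeat to hit the race), can be modelled roughly as below. This is a toy simulation, not JGroups: the Member class, the integer view ids and the drop rule (a multicast stamped with a view the receiver has not installed yet is discarded, and nothing retransmits it) are assumptions based on the description of C discarding messages before its view is installed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical simulation of the no-FLUSH race (not JGroups code).
final class NoFlushRace {

    static final class Member {
        int installedView = 0;                  // 0 = C has not joined yet
        final List<String> delivered = new ArrayList<>();

        void installView(int viewId) { installedView = viewId; }

        // Assumed drop rule: discard messages stamped with a newer view.
        void receive(String msg, int stampedView) {
            if (stampedView > installedView) {
                return;                         // dropped, never retransmitted here
            }
            delivered.add(msg);
        }
    }

    // One run: A has installed V2 (id 2) and multicasts put(x,y) stamped V2.
    // putFirst decides whether the put reaches C before C installs V2.
    static boolean cSeesPut(boolean putFirst) {
        Member c = new Member();
        if (putFirst) {
            c.receive("put(x,y)", 2);
            c.installView(2);
        } else {
            c.installView(2);
            c.receive("put(x,y)", 2);
        }
        return c.delivered.contains("put(x,y)");
    }

    // Repeat with random interleavings and count how often C misses the put.
    static int countMisses(int runs, long seed) {
        Random rnd = new Random(seed);
        int misses = 0;
        for (int i = 0; i < runs; i++) {
            if (!cSeesPut(rnd.nextBoolean())) {
                misses++;
            }
        }
        return misses;
    }
}
```

In a real test the interleaving would come from thread scheduling and network timing rather than a seeded Random, so the miss rate would be much lower and harder to reproduce.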
> Initially we thought FLUSH would work around this problem, and while
> it reduces the window of occurrence, the problem still exists since
> there is a race between A & B receiving unblock() and then
> broadcasting queued messages, and C receiving these queued messages
> and receiving unblock().
No, that should *not* occur. First, messages will always be received by
C even though it is still blocked; the blocking only affects the
*sending* of messages. Second, C will have view {A,B,C} installed, or
else it would not receive the unblock(). Therefore, any messages from A
or B would NOT get dropped. Third, even if that were the case (it isn't),
any dropped message from A or B would get retransmitted when A or B
sends the next message to C, or before the next view gets installed.
Again, only with FLUSH, though!
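The point that blocking gates only sending, never receiving, can be sketched as a simple monitor. FlushGate and its methods are hypothetical names, not the JGroups FLUSH implementation: send() parks the caller until unblock(), while receive() is never held back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: during a flush a member is "blocked", which stalls
// its own sends but never its receives.
final class FlushGate {
    private boolean blocked;
    private final List<String> sent = new ArrayList<>();
    private final List<String> received = new ArrayList<>();

    synchronized void block() { blocked = true; }

    synchronized void unblock() {
        blocked = false;
        notifyAll();                 // release senders parked in send()
    }

    // Sending waits until the flush completes; wait() releases the monitor,
    // so receive() can still run while a sender is parked here.
    synchronized void send(String msg) throws InterruptedException {
        while (blocked) {
            wait();
        }
        sent.add(msg);
    }

    // Receiving is never gated by the flush.
    synchronized void receive(String msg) {
        received.add(msg);
    }

    synchronized List<String> sent() { return new ArrayList<>(sent); }
    synchronized List<String> received() { return new ArrayList<>(received); }
}
```

So a blocked C still accepts the queued messages A and B broadcast after their unblock(); only C's own outgoing messages wait for its unblock().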
> One solution which Mircea proposed is for FLUSH to queue messages
> received during a blocked state, so that this race condition on C is
> removed. However it would be better to do this without FLUSH, for C
> to queue messages received between joining and its view being installed.
I don't think this is a good solution because it essentially does what
virtual synchrony (FLUSH) already does.
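For reference, the queueing idea discussed above might look roughly like this. It is only an illustration of the proposal, with hypothetical names and integer view ids, and it assumes the same stamping of messages with a view id: instead of dropping a message stamped with a view it has not installed yet, the member parks it and delivers it right after installing that view. It covers only the "don't drop early messages" part; the agreement on which messages belong to which view is what FLUSH provides and this sketch does not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of queueing messages between join and view installation.
final class QueueingMember {
    private static final class Pending {
        final String msg;
        final int view;
        Pending(String msg, int view) { this.msg = msg; this.view = view; }
    }

    private int installedView = 0;                   // 0 = not yet joined
    private final Deque<Pending> pending = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();

    void receive(String msg, int stampedView) {
        if (stampedView > installedView) {
            pending.addLast(new Pending(msg, stampedView));   // too early: park it
        } else {
            delivered.add(msg);
        }
    }

    void installView(int viewId) {
        installedView = viewId;
        // Deliver, in arrival order, parked messages whose view is now installed.
        int n = pending.size();
        for (int i = 0; i < n; i++) {
            Pending p = pending.pollFirst();
            if (p.view <= installedView) {
                delivered.add(p.msg);
            } else {
                pending.addLast(p);                  // still belongs to a later view
            }
        }
    }

    List<String> delivered() { return new ArrayList<>(delivered); }
}
```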
--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat