On 6/2/09 2:48 PM, Bela Ban wrote:
> Initially we thought FLUSH would work around this problem, and while
> it reduces the window of occurrence, the problem still exists since
> there is a race between A & B receiving unblock() and then
> broadcasting queued messages, and C receiving these queued messages
> and receiving unblock().
No, that should *not* occur because messages will always be received
by C even though it is still blocked. The blocking only affects the
*sending* of messages. Second, C will have view {A,B,C} installed, or
else it would not receive the unblock(). Therefore, any messages from
A or B would NOT get dropped. Third, even if that were the case (it
isn't), any dropped message from A or B would get retransmitted when A
or B sends its next message to C, or before the next view gets
installed. Again, this holds only with FLUSH, though!
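The key point above is that blocking gates only sending, never receiving. A minimal single-threaded Java sketch of that rule follows; the class `BlockableMember` and its methods are hypothetical and are not the JGroups API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the JGroups API: block() holds back only
// *outgoing* messages, while incoming messages are always delivered.
class BlockableMember {
    private boolean blocked = false;
    private final List<String> pendingSends = new ArrayList<>(); // queued while blocked
    private final List<String> wire = new ArrayList<>();         // actually sent, in order
    private final List<String> delivered = new ArrayList<>();    // received and passed up

    void block() { blocked = true; }

    // On unblock, queued messages are broadcast in the order they were sent.
    void unblock() {
        blocked = false;
        wire.addAll(pendingSends);
        pendingSends.clear();
    }

    void send(String msg) {
        if (blocked) {
            pendingSends.add(msg); // blocking only affects sending
        } else {
            wire.add(msg);
        }
    }

    // Receiving is never gated by block(): a blocked member still delivers.
    void receive(String msg) { delivered.add(msg); }

    List<String> wire() { return wire; }
    List<String> delivered() { return delivered; }
}
```

So if C is still blocked when A's queued message arrives, C delivers it anyway; only C's own sends wait for unblock().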
There is a possibility that a message could get dropped for a joining
member C. FLUSH has a latch that is turned on once a joining member
receives the first stop flush after it has received the first view. If a
message happens to go up the stack before that first stop flush, we drop it.
See the allowMessagesToPassUp field of FLUSH.
Without this switch, view, unblock and messages could be delivered out of
order; that is, messages could go up the stack before unblock.
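The latch can be sketched the same way. Only the field name `allowMessagesToPassUp` comes from FLUSH; the class `JoiningMemberGate` and its `onView`/`onStopFlush`/`onMessage` methods are hypothetical, single-threaded, and ignore retransmission.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the joining-member latch described above.
// Only the name allowMessagesToPassUp mirrors FLUSH; the rest is illustrative.
class JoiningMemberGate {
    private boolean firstViewReceived = false;
    private boolean allowMessagesToPassUp = false;           // the latch, initially closed
    private final List<String> upEvents = new ArrayList<>(); // what the application sees
    private int dropped = 0;

    void onView(String view) {
        firstViewReceived = true;
        upEvents.add("view " + view);
    }

    // The first stop flush after the first view opens the latch; unblock is passed up.
    void onStopFlush() {
        if (firstViewReceived) {
            allowMessagesToPassUp = true;
        }
        upEvents.add("unblock");
    }

    void onMessage(String msg) {
        if (!allowMessagesToPassUp) {
            dropped++; // arrived before the first stop flush: not passed up
            return;
        }
        upEvents.add("msg " + msg);
    }

    List<String> upEvents() { return upEvents; }
    int droppedCount() { return dropped; }
}
```

Feeding it a view, an early message, a stop flush and a late message yields view, unblock, late message: the early message is dropped instead of reaching the application ahead of unblock.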