Vladimir Blagojevic wrote:
On 6/2/09 2:48 PM, Bela Ban wrote:
>> Initially we thought FLUSH will work around this problem, and
>> while it reduces the window of occurrence, the problem still exists
>> since there is a race between A & B receiving unblock() and then
>> broadcasting queued messages, and C receiving these queued messages
>> and receiving unblock()
>
> No, that should *not* occur because messages will always be received
> by C even though it is still blocked. The blocking only affects the
> *sending* of messages. Second, C will have view {A,B,C} installed, or
> else it would not receive the unblock(). Therefore, any messages from
> A or B would NOT get dropped. Third, even if that was the case (it
> isn't), any dropped message from A or B would get retransmitted when
> A or B sends the next message to C, or before the next view gets
> installed. Again, only with FLUSH though !
There is a possibility that a message could get dropped for a joining
member C. FLUSH has a latch that is turned on once a joining member
receives its first stop flush after it receives the first view. If a
message happens to go up the stack before that first stop flush, we
drop it. See the allowMessagesToPassUp field of FLUSH.
If we did not have this switch, then we could get view, unblock and
message in the wrong order; that is, messages could go up the stack
before unblock.
Oh, I see ! Question: why are we dropping that message rather than just
blocking ? Concerns that the incoming thread pools get exhausted ?
Ah, the message gets delivered by NAKACK, so we don't retransmit, but if
FLUSH discards it, we're screwed and never deliver that message to the app.
Your last statement confuses me a bit: I don't think a VIEW would get
sent up the stack *before* an unblock because they're tagged by seqnos
aren't they ? So, in other words, do we really need to discard incoming
messages ?
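
To make the drop-versus-hold question concrete, here is a minimal sketch of a joiner-side latch. It is a toy model, not the actual FLUSH code: only the field name allowMessagesToPassUp comes from this thread, while LatchSketch, Policy, Latch, up(), onFirstStopFlush() and run() are hypothetical names made up for illustration. It assumes the layer below has already delivered the message and will not retransmit it, as described above for NAKACK.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of a joiner-side latch; NOT the JGroups FLUSH implementation. */
public class LatchSketch {

    /** What to do with a message that arrives while the latch is closed. */
    public enum Policy { DROP, QUEUE }

    public static final class Latch {
        private final Policy policy;
        // Field name taken from the thread; everything else is hypothetical.
        private boolean allowMessagesToPassUp = false;
        private final Queue<String> pending = new ArrayDeque<>();
        private final List<String> delivered = new ArrayList<>();

        public Latch(Policy policy) { this.policy = policy; }

        /** Called when a message travels up the stack. */
        public synchronized void up(String msg) {
            if (allowMessagesToPassUp) {
                delivered.add(msg);          // latch open: pass to the application
            } else if (policy == Policy.QUEUE) {
                pending.add(msg);            // hold it without blocking the thread
            }
            // Policy.DROP: the message is lost. The layer below has already
            // delivered it, so nothing will trigger a retransmission.
        }

        /** Called when the joiner sees its first stop flush. */
        public synchronized void onFirstStopFlush() {
            allowMessagesToPassUp = true;
            while (!pending.isEmpty()) {
                delivered.add(pending.remove()); // release held messages in order
            }
        }

        public synchronized List<String> delivered() {
            return new ArrayList<>(delivered);
        }
    }

    /** One message arrives before the first stop flush, one after. */
    public static List<String> run(Policy policy) {
        Latch latch = new Latch(policy);
        latch.up("m1");
        latch.onFirstStopFlush();
        latch.up("m2");
        return latch.delivered();
    }

    public static void main(String[] args) {
        System.out.println("DROP  -> " + run(Policy.DROP));
        System.out.println("QUEUE -> " + run(Policy.QUEUE));
    }
}
```

In this model the DROP policy never delivers m1, while QUEUE delivers m1 and then m2 once the latch opens. Queueing rather than blocking the incoming thread is one way to sidestep the thread pool exhaustion concern, at the cost of an unbounded buffer in this sketch.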
--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat