Vladimir Blagojevic resolved JGRP-1282.
---------------------------------------
Resolution: Done
On master
Race condition in FLUSH when master leaves cluster
--------------------------------------------------
Key: JGRP-1282
URL: https://issues.jboss.org/browse/JGRP-1282
Project: JGroups
Issue Type: Bug
Affects Versions: 2.6.16
Reporter: Dennis Reed
Assignee: Vladimir Blagojevic
Fix For: 2.6.19, 2.11.2, 2.12
There's a race condition in FLUSH when the master node is leaving the cluster.
It can cause the master to leave without first sending a new view (with a new master).
The FLUSH is started when GMS sends down an Event.SUSPEND.
FLUSH.down calls FLUSH.startFlush, which calls FLUSH.onSuspend.
onSuspend sends a START_FLUSH message down.
In the working case, the local node gets the START_FLUSH first.
FLUSH.up calls FLUSH.handleStartFlush, which calls FLUSH.onStartFlush.
onStartFlush sets the member variable "flushMembers".
Then the other nodes reply to the START_FLUSH with a FLUSH_COMPLETED.
FLUSH.up calls FLUSH.onFlushCompleted.
onFlushCompleted checks "flushMembers" against the list of replies.
If they match (and flushMembers is not null), the flush completes.
But in the non-working case, the FLUSH_COMPLETED from the other
nodes is processed before the local START_FLUSH.
In this case, flushMembers has not been set, and onFlushCompleted
does nothing, expecting more replies (which never come).
I believe this can only be triggered when the master is leaving,
because the leaving master does not include itself in the FLUSH. If it were
a flush member, there would be a FLUSH_COMPLETED reply from itself to
trigger setting flushMembers at some point.
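
The two orderings described above can be replayed in a small stand-alone model.
This is only a sketch, not the JGroups FLUSH source: the class FlushRaceSketch,
the simulate helper and the node names A, B and C are hypothetical, and the
"replies match flushMembers" check is simplified to a set comparison. Only the
names flushMembers, onStartFlush and onFlushCompleted come from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the bookkeeping on the leaving master.
// It is NOT the real FLUSH protocol code.
final class FlushRaceSketch {
    private List<String> flushMembers;                 // null until START_FLUSH is handled locally
    private final Set<String> completed = new HashSet<>();
    private boolean flushDone;

    // Models local handling of START_FLUSH: records who takes part in the flush.
    void onStartFlush(List<String> participants) {
        flushMembers = new ArrayList<>(participants);
    }

    // Models handling of a FLUSH_COMPLETED reply from 'sender'.
    void onFlushCompleted(String sender) {
        completed.add(sender);
        if (flushMembers != null && completed.containsAll(flushMembers)) {
            flushDone = true;
        }
        // Otherwise the method just returns and waits for more replies.
        // Nothing re-runs this check once flushMembers is finally set.
    }

    boolean isFlushDone() {
        return flushDone;
    }

    // Replays one ordering for a leaving master "A" that flushes only B and C.
    static boolean simulate(boolean startFlushFirst) {
        FlushRaceSketch master = new FlushRaceSketch();
        List<String> participants = Arrays.asList("B", "C"); // master excludes itself
        if (startFlushFirst) {
            master.onStartFlush(participants);
            master.onFlushCompleted("B");
            master.onFlushCompleted("C");
        } else {
            master.onFlushCompleted("B");
            master.onFlushCompleted("C");
            master.onStartFlush(participants);           // arrives too late
        }
        return master.isFlushDone();
    }

    public static void main(String[] args) {
        System.out.println("START_FLUSH first:     " + simulate(true));
        System.out.println("FLUSH_COMPLETED first: " + simulate(false));
    }
}
```

In this model the first ordering completes the flush and the second never does,
because no further FLUSH_COMPLETED arrives to repeat the check.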