[
http://jira.jboss.com/jira/browse/JGRP-608?page=comments#action_12383946 ]
Michael Newcomb commented on JGRP-608:
--------------------------------------
I will check it out.
Upon initial inspection:
470 if(!flushInProgress){
471 flushInProgress = true;
472 sendBlockUpToChannel();
473 onStartFlush(msg.getSrc(), fh);
You might want to use an AtomicBoolean for flushInProgress and use
flushInProgress.compareAndSet(false, true) as your test. As it is, it might lead to more
problems because of multiple blocks going up the channel.
I do see the second problem of not notifying the winning requestor that his flush is going
to be accepted has been fixed by calling onStartFlush().
484 if(flushRequester.compareTo(coordinator) < 0){
485 rejectFlush(fh.viewID, coordinator);
486 if(log.isDebugEnabled()){
487 log.debug("Rejecting flush at " + localAddress
488 + " to current flush coordinator "
489 + coordinator
490 + " and switching flush coordinator to "
491 + flushRequester);
492 }
493 onStartFlush(flushRequester, fh);
However, the situation still exists where the flushCoordinator may not be set by the
winner of the initial flushInProgress = true and another flush requestor could execute the
following and flushCoordinator wasn't set yet.
477 synchronized(sharedLock){
478 if(flushCoordinator != null)
479 coordinator = flushCoordinator;
480 else
481 coordinator = flushRequester;
482 }
So, in that sense you really can't even use a boolean, you need to execute some code
that sets the flushCoordinator (and other onStartFlush stuff)while checking to see if a
flush is in progress (all in a protected section of code).
FLUSH not safe in simultaneous flush situation
----------------------------------------------
Key: JGRP-608
URL:
http://jira.jboss.com/jira/browse/JGRP-608
Project: JGroups
Issue Type: Feature Request
Affects Versions: 2.6
Reporter: Michael Newcomb
Assigned To: Vladimir Blagojevic
Priority: Critical
Fix For: 2.6
I'm running into a few problems when multiple members request a FLUSH at the same
time.
I am still in the process of analyzing the situation, but here are a few problems:
private void handleStartFlush(Message msg, FlushHeader fh) {
byte oldPhase = flushPhase.transitionToFirstPhase();
if(oldPhase == FlushPhase.START_PHASE){
sendBlockUpToChannel();
onStartFlush(msg.getSrc(), fh);
}else if(oldPhase == FlushPhase.FIRST_PHASE){
Address flushRequester = msg.getSrc();
Address coordinator = null;
synchronized(sharedLock){
if(flushCoordinator != null)
coordinator = flushCoordinator;
else
coordinator = flushRequester;
}
After a successful transtion to the first phase, onStartFlush is called. Inside the
method, the flushCoordinator is set. However a simultaneous handleStartFlush from another
member would fail the transitionToFirstPhase() call, but it is possible that the
flushCoordinator might not be set yet and the second handleStartFlush would end up setting
the 'coordinator' to the flushRequestor. This could grant the second START_FLUSH a
FLUSH_OK without rejecting the flush that succeeded in the transitionToFirstPhase().
IMHO the transitionToFirstPhase() should assign the flushCoordinator at that time to
avoid the above situation, in fact, all of the protected parts of the onStartFlush() call
should be made inside the protected section of transitionToFirstPhase().
The above should take care of the
didn't-succeed-in-transitionToFirstPhase-but-became-flushCoordinator-anyways-because-flushCoordinator-wasn't-set-yet
problem.
Now, there is another problem in the way simultaneous START_FLUSHes are resolved. If the
new flushRequestor is < the current flushCoordinator the new requestor will win the
flush fight. This is a great strategy because all the members will get the same result
here.
If A, B, C are members and A < B < C and they all simultaneously request a flush,
then the desired effect is for everyone to decide that A will win the flush fight.
So the following code handles that situation...
private void handleStartFlush(Message msg, FlushHeader fh) {
...
}else if(oldPhase == FlushPhase.FIRST_PHASE){
...
if(flushRequester.compareTo(coordinator) < 0){
rejectFlush(fh.viewID, coordinator);
...
synchronized(sharedLock){
flushCoordinator = flushRequester;
}
}
If the new flushRequetor is < the old flushCoordinator than reject/abort the old
flush. However, the new flushCoordinator is never acknowledged by a onStartFlush (which
sends FLUSH_OK). If we moved all the protected parts of onStartFlush() to
transitionToFirstPhase() then the flush members would never be set properly, so that
situation needs to be resolved as well.
I think this might be the problem I'm seeing because I wait indefinitely for my flush
to succeed.
I will be testing a patch this afternoon and hope to report back.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira