[jboss-jira] [JBoss JIRA] Commented: (JGRP-608) FLUSH not safe in simultaneous flush situation

Michael Newcomb (JIRA) jira-events at lists.jboss.org
Tue Oct 23 12:30:01 EDT 2007


    [ http://jira.jboss.com/jira/browse/JGRP-608?page=comments#action_12383946 ] 
            
Michael Newcomb commented on JGRP-608:
--------------------------------------

I will check it out.

Upon initial inspection:

470         if(!flushInProgress){
471             flushInProgress = true;
472             sendBlockUpToChannel();
473             onStartFlush(msg.getSrc(), fh);

You might want to use an AtomicBoolean for flushInProgress and use flushInProgress.compareAndSet(false, true) as your test. As it is, it might lead to more problems because of multiple blocks going up the channel.

I do see the second problem of not notifying the winning requestor that his flush is going to be accepted has been fixed by calling onStartFlush().

484             if(flushRequester.compareTo(coordinator) < 0){
485                 rejectFlush(fh.viewID, coordinator);
486                 if(log.isDebugEnabled()){
487                     log.debug("Rejecting flush at " + localAddress
488                               + " to current flush coordinator "
489                               + coordinator
490                               + " and switching flush coordinator to "
491                               + flushRequester);
492                 }
493                 onStartFlush(flushRequester, fh);

However, the situation still exists where the flushCoordinator may not be set by the winner of the initial flushInProgress = true and another flush requestor could execute the following and flushCoordinator wasn't set yet.

477             synchronized(sharedLock){
478                 if(flushCoordinator != null)
479                     coordinator = flushCoordinator;
480                 else
481                     coordinator = flushRequester;
482             }

So, in that sense you really can't even use a boolean, you need to execute some code that sets the flushCoordinator (and other onStartFlush stuff)while checking to see if a flush is in progress (all in a protected section of code).


> FLUSH not safe in simultaneous flush situation
> ----------------------------------------------
>
>                 Key: JGRP-608
>                 URL: http://jira.jboss.com/jira/browse/JGRP-608
>             Project: JGroups
>          Issue Type: Feature Request
>    Affects Versions: 2.6
>            Reporter: Michael Newcomb
>         Assigned To: Vladimir Blagojevic
>            Priority: Critical
>             Fix For: 2.6
>
>
> I'm running into a few problems when multiple members request a FLUSH at the same time.
> I am still in the process of analyzing the situation, but here are a few problems:
> private void handleStartFlush(Message msg, FlushHeader fh) {
>   byte oldPhase = flushPhase.transitionToFirstPhase();
>   if(oldPhase == FlushPhase.START_PHASE){
>     sendBlockUpToChannel();
>     onStartFlush(msg.getSrc(), fh);
>   }else if(oldPhase == FlushPhase.FIRST_PHASE){
>     Address flushRequester = msg.getSrc();
>     Address coordinator = null;
>     synchronized(sharedLock){
>       if(flushCoordinator != null)
>         coordinator = flushCoordinator;
>       else
>         coordinator = flushRequester;
>     }
> After a successful transtion to the first phase, onStartFlush is called. Inside the method, the flushCoordinator is set. However a simultaneous handleStartFlush from another member would fail the transitionToFirstPhase() call, but it is possible that the flushCoordinator might not be set yet and the second handleStartFlush would end up setting the 'coordinator' to the flushRequestor. This could grant the second START_FLUSH a FLUSH_OK without rejecting the flush that succeeded in the transitionToFirstPhase().
> IMHO the transitionToFirstPhase() should assign the flushCoordinator at that time to avoid the above situation, in fact, all of the protected parts of the onStartFlush() call should be made inside the protected section of transitionToFirstPhase().
> The above should take care of the didn't-succeed-in-transitionToFirstPhase-but-became-flushCoordinator-anyways-because-flushCoordinator-wasn't-set-yet problem.
> Now, there is another problem in the way simultaneous START_FLUSHes are resolved. If the new flushRequestor is < the current flushCoordinator the new requestor will win the flush fight. This is a great strategy because all the members will get the same result here.
> If A, B, C are members and A < B < C and they all simultaneously request a flush, then the desired effect is for everyone to decide that A will win the flush fight.
> So the following code handles that situation...
> private void handleStartFlush(Message msg, FlushHeader fh) {
>   ...
>   }else if(oldPhase == FlushPhase.FIRST_PHASE){
>     ...
>     if(flushRequester.compareTo(coordinator) < 0){
>       rejectFlush(fh.viewID, coordinator);
>       ...
>       synchronized(sharedLock){
>         flushCoordinator = flushRequester;
>       }
>     }
> If the new flushRequetor is < the old flushCoordinator than reject/abort the old flush. However, the new flushCoordinator is never acknowledged by a onStartFlush (which sends FLUSH_OK). If we moved all the protected parts of onStartFlush() to transitionToFirstPhase() then the flush members would never be set properly, so that situation needs to be resolved as well.
> I think this might be the problem I'm seeing because I wait indefinitely for my flush to succeed.
> I will be testing a patch this afternoon and hope to report back.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        



More information about the jboss-jira mailing list