[jbosscache-dev] Re: JGroups and concurrent FLUSHes

Fri Feb 27 11:34:33 EST 2009

Sounds like an overly large change for a micro release on the highly 
stable 2.6 branch. JGroups' 2.7 branch seems more appropriate.  NBST in 
JBC is not stable tech, so I don't like the idea of destabilizing a 
highly stable branch to cater to it.

From a technical POV, what's discussed sounds fine. ;)

Bela Ban wrote:
> Copied Brian, who will probably not like this. I guess... :-)
> 
> Actually, we should move this discussion over to jbosscache-dev... 
> (copied). Please reply to jbosscache-dev from now on
> 
> Vladimir Blagojevic wrote:
>> As Bela and I discussed, this option simplifies FLUSH quite a bit but
>> puts the burden and the *freedom* of retry management on application
>> code. My only concern is compatibility if we are going to put this
>> into 2.6.9. This changes flush semantics quite a bit, and perhaps we
>> can talk about this as well.
>>
>> On 2/27/09 9:52 AM, Bela Ban wrote:
>>> Manik and I discussed this over the phone. Some items we came up with:
>>>
>>>    * Concurrent partial flushes won't happen because if a state
>>>      requester or provider is in the process of transferring state,
>>>      it'll reject the new state transfer
>>>    * Concurrent total and partial flushes *are* possible: a view change
>>>      and a partial state, executed concurrently
>>>          o We cannot disable flushing for view changes, because the
>>>            user probably wanted this by placing FLUSH into the stack.
>>>            OTOH, because JBC NBST requires FLUSH (because of the
>>>            partial flushing), we need to be able to disable flushing
>>>            for view changes. Hence the previous email.
>>>          o If a view change flush is in progress, currently a partial
>>>            flush would fail. Manik will change code to make the partial
>>>            flushing back off and retry, up to a number of times, if
>>>            this happens
>>>          o If a view change happens, but a partial flush is already in
>>>            progress, the view change will fail! We'll change code such
>>>            that the coordinator backs off and retries a number of
>>>            times, before giving up.
>>>          o Because total flushes and partial flushes are usually very
>>>            short, the backing off and retrying mechanism should work
>>>            most of the time
>>>          o Would establishing total order between total and partial
>>>            flushes help? Vladimir and Bela to investigate
>>>          o Vladimir: make sure that if A flushes A, B and C
>>>            successfully, but fails for D and E, A only aborts the flush
>>>            on A, B and C, but *not* on D or E! A member cannot abort
>>>            flushes started by someone else!
>>>
>>>
>>
> 
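
The back-off-and-retry behaviour described in the quoted thread (a partial
flush, or the coordinator, retrying a bounded number of times before giving
up) could look roughly like the sketch below. It is an illustration only:
the class FlushRetry, the method retryWithBackoff and the randomized,
linearly growing sleep are assumptions, not JGroups or JBoss Cache API. The
attempt callback stands in for whatever call starts the flush and reports
whether it succeeded.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of JGroups or JBoss Cache.
final class FlushRetry {

    private FlushRetry() {
    }

    /**
     * Runs attempt until it returns true or maxAttempts calls have been made.
     * Between failed attempts the caller sleeps for a random time of at most
     * baseSleepMs * attemptNumber milliseconds, so that two members that
     * collided are unlikely to retry at the same instant (the randomization
     * is an assumption of this sketch).
     *
     * @return true if an attempt succeeded, false if all attempts failed
     *         or the thread was interrupted while backing off
     */
    static boolean retryWithBackoff(BooleanSupplier attempt,
                                    int maxAttempts,
                                    long baseSleepMs) {
        if (maxAttempts < 1 || baseSleepMs < 1) {
            throw new IllegalArgumentException("maxAttempts and baseSleepMs must be >= 1");
        }
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true; // the flush went through
            }
            if (i < maxAttempts) {
                long upperBound = baseSleepMs * i;
                long sleepMs = ThreadLocalRandom.current().nextLong(1, upperBound + 1);
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false; // give up after maxAttempts failures
    }
}
```

A caller would pass the flush attempt as the callback, for example
retryWithBackoff(() -> tryPartialFlush(), 4, 50), where tryPartialFlush is
again a placeholder for the real call. Since both kinds of flush are expected
to be short, a small number of attempts should usually suffice.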

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry at redhat.com