[jbosscache-dev] Issues with FLUSH and JBC

Wed Sep 27 04:36:44 EDT 2006

What's the consensus as to how we should proceed?

   1. Solution A, where we don't block unicasts during a flush, or
   2. Solution B, where we block later (on FLUSH_COMPLETED) rather than
      on START_FLUSH

?

Vladimir is for #2.

How about adding the unblock() callback in a separate listener
interface? I'd rather add this sooner than later. We would, however,
also have to make sure that we actually do call this method.
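
To make the idea concrete, here is a rough sketch of a separate listener interface carrying unblock(). All names in it (FlushListener, UnblockListener, FlushNotifier, RecordingListener) are made up for illustration and are not the JGroups API; the try/finally is just one way to make sure unblock() really gets called.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the existing listener; only block() is modelled here. */
interface FlushListener {
    void block();
}

/** Hypothetical separate interface: listeners opt in to unblock(). */
interface UnblockListener {
    void unblock();
}

/** Hypothetical dispatcher standing in for whatever drives the flush. */
class FlushNotifier {
    private final FlushListener listener;

    FlushNotifier(FlushListener listener) {
        this.listener = listener;
    }

    /** Calls block(), runs the flush phase, then always delivers unblock(). */
    void runFlush(Runnable flushPhase) {
        listener.block();
        try {
            flushPhase.run();
        } finally {
            // finally: unblock() is delivered even if the flush phase throws.
            // Listeners that only implement FlushListener are left alone.
            if (listener instanceof UnblockListener) {
                ((UnblockListener) listener).unblock();
            }
        }
    }
}

/** Example listener that records the order of the callbacks it receives. */
class RecordingListener implements FlushListener, UnblockListener {
    final List<String> events = new ArrayList<>();

    public void block() {
        events.add("block");
    }

    public void unblock() {
        events.add("unblock");
    }
}

class UnblockCallbackDemo {
    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        new FlushNotifier(listener).runFlush(() -> listener.events.add("flush"));
        System.out.println(listener.events);
    }
}
```

Existing listeners would keep compiling unchanged, since only those that implement the extra interface receive the new callback.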

Vladimir: let's have a call on this today, so we can see how to proceed...

Brian Stansberry wrote:
> Bela Ban wrote:
>> Okay, my comments will be available in book form at Prentice
>> Hall this fall... :-)
>>
>
> LOL. I'll try to reform. At least my overly long messages are on e-mail
> and so don't kill trees. :)
>
>> Just kidding, here are some comments:
>>
>> * I don't want to change the entire implementation of FLUSH this
>> late; 2.4 is overdue for a final release. So option B doesn't look
>> that appealing to me
>> o OTOH: if we can resolve the issue, why not...
>> * A: what if we block only **multicast** messages, but not
>> **unicast** messages ? This would solve issue A, but maybe there
>> are use cases that it won't solve... We can assume that unicast
>> messages are always responses to multicasts, so they should be
>> allowed to complete. If this solution flies, then we have a
>> quickfix for our problem and can *really* cleanly fix it in the
>> next release...
>
> We'd need to be sure JBC didn't make any unicast calls (besides RPC
> responses) during the state transfer. Possible unicast calls I can
> think of are:
>
> 1) Request for partial state transfer (with the current RPC-based
> mechanism). E.g. 3 node cluster, node B redeploys a webapp and asks for
> partial state transfer while node C is doing an initial state transfer.
> This would be an odd case though; typically you disable initial state
> transfer if you're going to use the activate/inactivateRegion API.
>
> 2) Calls related to buddy group assignments. Need to think about this a
> bit. But if they are using BR they won't be using initial state
> transfer, so probably not an issue.
>
>> * B: okay, but if my proposed solution above works, we can do
>> this in 2.5...
>> * C: this is essentially implementing the flush protocol at the
>> application level, which is not a bad idea because the app always
>> has more information than JGroups. However, it is probably a bit
>> too redundant, and also requires quite a number of changes, which
>> is also late for JBC 1.4 (SP?)...
>
> Yeah, it is a lot for 1.4. IMHO definitely moves it beyond the realm of
> an SP2, into 1.4.1.
>
>> * I might have to add an additional callback blockCompleted() or
>> unblock() to JGroups, to notify members that the FLUSH phase has
>> completed and everybody can resume sending messages. I'm currently
>> investigating this... Downside: an API change, so possibly a new
>> ExtendedXXX interface which would get merged in JGroups 3.0
>>
>
> This would be needed with B if our current algorithm for JBC is going to
> work.
>
>>> A downside of this idea is it changes the semantics of flush and
>>> requires JGroups changes. We'd definitely like input from Bela on
>>> this. Also, since we initially rejected it, we haven't fully
>>> thought it through. (As I'm editing this to send out I see there is
>>> no way to tell JBC after it returns from block() to not let any
>>> "new" activity through -- big hole. I'm back to rejecting this
>>> approach.)
>> Here, we might have to introduce additional callbacks, e.g.
>> - block(): stop sending messages. FLUSH doesn't block yet
>> though, so if an app ignores the convention and keeps sending
>> messages it will succeed
>> - No callback when FLUSH actually does block sending of messages
>> - unblock(): called when the app can resume sending messages.
>> FLUSH does not block sending of messages anymore
>>
>
> Yep. Our current algorithm does the following during the block() call:
>
> 1) Create a latch or something that prevents new transactions acquiring
> locks or existing transactions proceeding into the 2PC (i.e. prevent
> prepare() call.)
> 2) Give transactions already in the 2PC time to complete. If they
> don't, eventually roll them back.
> 3) Release the latch.
> 4) Immediately return from block(). (Vladimir -- problem here; there's a
> race condition between threads released in #3 and the return from
> block(). We need to figure out how to deal with that.)
>
> We count on FLUSH preventing the threads released in #3 sending any
> prepare() calls until the state transfer is done. Solution B breaks
> this for the period until FLUSH_COMPLETED is sent.
>
> An unblock() callback would help here, as we'd release the latch then.
>
>> I don't think the semantic changes are that big, actually you
>> could argue there are *no* semantic changes as block() is an
>> indication that message sending will block, here we're just
>> saying it will block some time in the (near) future.
>
> +1.
>
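
For reference, the block() algorithm Brian lists above, with the latch released in unblock() rather than before block() returns, could look roughly like the sketch below. FlushGate and its methods are hypothetical names, not JBoss Cache code, and the rollback of transactions that don't finish in time is left to the caller.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate in front of the 2PC; illustration only. */
class FlushGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean closed;   // true between block() and unblock()
    private int inTwoPhase;   // transactions currently inside the 2PC

    /** Called by a transaction before prepare(); waits while the gate is closed. */
    void enterTwoPhase() {
        lock.lock();
        try {
            while (closed) {
                changed.awaitUninterruptibly();
            }
            inTwoPhase++;
        } finally {
            lock.unlock();
        }
    }

    /** Called by a transaction once it has committed or rolled back. */
    void exitTwoPhase() {
        lock.lock();
        try {
            inTwoPhase--;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Steps 1 and 2: close the gate, then give transactions already in the
     * 2PC time to complete. Returns how many are still in flight when the
     * timeout expires; the caller would roll those back.
     */
    int block(long timeout, TimeUnit unit) {
        lock.lock();
        try {
            closed = true;
            long nanos = unit.toNanos(timeout);
            try {
                while (inTwoPhase > 0 && nanos > 0) {
                    nanos = changed.awaitNanos(nanos);
                }
            } catch (InterruptedException e) {
                // Preserve the interrupt and report what is still in flight.
                Thread.currentThread().interrupt();
            }
            return inTwoPhase;
        } finally {
            lock.unlock();
        }
    }

    /** Step 3, moved here: reopen the gate once the flush is over. */
    void unblock() {
        lock.lock();
        try {
            closed = false;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    boolean isClosed() {
        lock.lock();
        try {
            return closed;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the gate stays closed after block() returns, no thread is released between step 3 and the return from block(), so the race noted in step 4 does not arise in this variant.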

-- 
Bela Ban
Lead JGroups / Manager JBoss Clustering Group
JBoss - a division of Red Hat