New subject: Issues with FLUSH and JBC

Wednesday, 27 September 2006

Is solution D out of the picture already? I mean, if we really can't find a good enough
solution to solve it, why not just accept it? State transfer, IMO, should not happen that
often anyway if it is just a new node joining. I think the important thing is to keep the
state valid/consistent.

If it is because of network instability, then we will see lots of tx problems anyway. Are
we throwing more fuel into the fire? 

-Ben

-----Original Message-----
From: jbosscache-dev-bounces(a)lists.jboss.org
[mailto:jbosscache-dev-bounces@lists.jboss.org] On Behalf Of Bela Ban
Sent: Wednesday, September 27, 2006 4:37 PM
To: Brian Stansberry
Cc: jbosscache-dev(a)lists.jboss.org
Subject: Re: [jbosscache-dev] Issues with FLUSH and JBC

What's the consensus as to how we should proceed ?

   1. Solution A, where we don't block unicasts during a flush, or
   2. Solution B, where we block later (on FLUSH_COMPLETED) rather than
      on START_FLUSH

?

Vladimir is for #2.

How about adding the unblock() callback in a separate listener interface ? I'd
rather add this sooner than later. We would, however, also have to make sure that we
actually do call this method.

Vladimir: let's have a call on this today, so we can see how to proceed...

Brian Stansberry wrote:
...
 Bela Ban wrote:
> Okay, my comments will be available in book form at Prentice Hall 
> this fall... :-)
>

 LOL. I'll try to reform. At least my overly long messages are on 
 e-mail and so don't kill trees. :)

> Just kidding, here are some comments:
>
> * I don't want to change the entire implementation of FLUSH this 
> late, 2.4 is overdue for a final release. So option B doesn't look 
> that appealing to me. OTOH: if we can resolve the issue, why not...
> * A: what if we block only **multicast** messages, but not
> **unicast** messages ? This would solve issue A, but maybe there are 
> use cases that it won't solve... We can assume that unicast messages 
> are always responses to multicasts, so they should be allowed to 
> complete. If this solution flies, then we have a quickfix for our 
> problem and can *really* cleanly fix it in the next release...

 We'd need to be sure JBC didn't make any unicast calls (besides RPC
 responses) during the state transfer. Possible unicast calls I can 
 think of are:

 1) Request for partial state transfer (with the current RPC-based 
 mechanism). E.g. 3 node cluster, node B redeploys a webapp and asks 
 for partial state transfer while node C is doing an initial state transfer.
 This would be an odd case though; typically you disable initial state 
 transfer if you're going to use the activate/inactivateRegion API.

 2) Calls related to buddy group assignments. Need to think about this 
 a bit. But if they are using BR they won't be using initial state 
 transfer, so probably not an issue.
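
 For illustration, the multicast-only blocking idea discussed above could be
 sketched roughly as follows. This is only a toy model under stated
 assumptions: the class UnicastFriendlyFlush and its methods are hypothetical
 and are not JGroups or JBC API, a null destination is assumed to mean
 "multicast", and held messages are queued rather than blocking the sending
 thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are JGroups or JBC API.
class UnicastFriendlyFlush {
    private boolean flushing = false;
    private final List<String> held = new ArrayList<>(); // multicasts held back during a flush
    private final List<String> sent = new ArrayList<>(); // messages that went out, in order

    synchronized void startFlush() {
        flushing = true;
    }

    // End of flush: release the held multicasts in the order they were submitted.
    synchronized void stopFlush() {
        flushing = false;
        sent.addAll(held);
        held.clear();
    }

    // Assumption: dest == null means multicast, anything else is a unicast.
    // Returns true if the message went out immediately, false if it was held.
    synchronized boolean send(String dest, String payload) {
        if (flushing && dest == null) {
            held.add(payload);
            return false;
        }
        sent.add(payload);
        return true;
    }

    synchronized List<String> sent() {
        return new ArrayList<>(sent);
    }
}
```

 In this model an RPC response (a unicast) still completes during the flush,
 while a new multicast such as a prepare() is deferred until the flush ends.
 It would not by itself address the unicast cases listed in 1) and 2).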

> * B: okay, but if my proposed solution above works, we can do this in 
> 2.5...
> * C: this is essentially implementing the flush protocol at the 
> application level, which is not a bad idea because the app always has 
> more information than JGroups. However, it is probably a bit too 
> redundant, and also requires quite a number of changes, which is also 
> later for JBC 1.4 (SP?)...

 Yeah, it is a lot for 1.4. IMHO definitely moves it beyond the realm 
 of an SP2, into 1.4.1.

> * I might have to add an additional callback blockCompleted() or
> unblock() to JGroups, to notify members that the FLUSH phase has 
> completed and everybody can resume sending messages. I'm currently 
> investigating this... Downside: an API change, so possibly a new 
> ExtendedXXX interface which would get merged in JGroups 3.0
>

 This would be needed with B if our current algorithm for JBC is going 
 to work.

>> A downside of this idea is it changes the semantics of flush and 
>> requires JGroups changes. We'd definitely like input from Bela on 
>> this. Also, since we initially rejected it, we haven't fully 
>> thought it through. (As I'm editing this to send out I see there is 
>> no way to tell JBC after it returns from block() to not let any 
>> "new" activity through -- big hole. I'm back to rejecting this
>> approach.)
> Here, we might have to introduce additional callbacks, e.g.
> - block(): stop sending messages. FLUSH doesn't block yet though, so 
> if an app ignores the convention and keeps sending messages it will 
> succeed
> - No callback when FLUSH actually does block sending of messages
> - unblock(): called when the app can resume sending messages.
> FLUSH does not block sending of messages anymore
>
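
 As a rough illustration of the callback convention described above, here is
 a minimal sketch. The interface name FlushListener and the class SendGuard
 are hypothetical and are not the actual JGroups API; the sketch only shows
 an application that honours block() and unblock() voluntarily, since FLUSH
 itself would not yet be enforcing anything when block() is called.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical callback interface; not the actual JGroups API.
interface FlushListener {
    void block();   // convention: stop sending messages
    void unblock(); // the app can resume sending messages
}

// Hypothetical application-side guard that follows the convention.
class SendGuard implements FlushListener {
    private final AtomicBoolean sendingAllowed = new AtomicBoolean(true);

    @Override
    public void block() {
        sendingAllowed.set(false);
    }

    @Override
    public void unblock() {
        sendingAllowed.set(true);
    }

    // The application checks this before sending; nothing forces it to.
    boolean maySend() {
        return sendingAllowed.get();
    }
}
```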

 Yep. Our current algorithm does the following during the block() call:

 1) Create a latch or something that prevents new transactions 
 acquiring locks or existing transactions proceeding into the 2PC (i.e. 
 prevent prepare() call.)
 2) Give transactions already in the 2PC time to complete. If they 
 don't, eventually roll them back.
 3) Release the latch.
 4) Immediately return from block(). (Vladimir -- problem here; there's 
 a race condition between threads released in #3 and the return from 
 block(). We need to figure out how to deal with that.)

 We count on FLUSH preventing the threads released in #3 sending any
 prepare() calls until the state transfer is done. Solution B breaks 
 this for the period until FLUSH_COMPLETED is sent.

 An unblock() callback would help here, as we'd release the latch then.
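
 A minimal sketch of that variant, where the latch is released in unblock()
 rather than at the end of block(). This is not the JBC implementation: the
 class FlushGate, its method names and the timeout parameters are all
 assumptions made for illustration, and rolling back transactions that fail
 to drain is left to the caller.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not JBC or JGroups code.
class FlushGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean closed = false;   // the "latch"
    private int inTwoPhaseCommit = 0; // transactions currently between prepare and completion

    // Caller must hold the lock. Waits until ready holds or the timeout expires.
    private boolean awaitUntil(BooleanSupplier ready, long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (!ready.getAsBoolean()) {
                if (nanos <= 0) return false;
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // A transaction calls this before prepare(); false means it timed out waiting.
    boolean enterPrepare(long timeoutMs) {
        lock.lock();
        try {
            if (!awaitUntil(() -> !closed, timeoutMs)) return false;
            inTwoPhaseCommit++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // A transaction calls this once its 2PC has committed or rolled back.
    void exitPrepare() {
        lock.lock();
        try {
            inTwoPhaseCommit--;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // block(): close the latch and give in-flight 2PCs time to complete.
    // Returns false if some did not finish; the caller would then roll them back.
    boolean block(long drainTimeoutMs) {
        lock.lock();
        try {
            closed = true;
            return awaitUntil(() -> inTwoPhaseCommit == 0, drainTimeoutMs);
        } finally {
            lock.unlock();
        }
    }

    // unblock(): release the latch so waiting transactions can proceed to prepare().
    void unblock() {
        lock.lock();
        try {
            closed = false;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

 Because the latch stays closed between block() returning and unblock()
 being called, no new prepare() can start in that window, so this sketch
 does not rely on FLUSH to hold those threads back.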

> I don't think the semantic changes are that big, actually you could 
> argue there are *no* semantic changes as block() is an indication 
> that message sending will block, here we're just saying it will block 
> some time in the (near) future.

 +1.

--
Bela Ban
Lead JGroups / Manager JBoss Clustering Group JBoss - a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
