[jboss-jira] [JBoss JIRA] Updated: (JGRP-336) Move the blocking in FLUSH from START_FLUSH to FLUSH_OK

Sun Oct 29 12:17:41 EST 2006

     [ http://jira.jboss.com/jira/browse/JGRP-336?page=all ]

Bela Ban updated JGRP-336:
--------------------------

    Fix Version/s: 2.4
                       (was: 2.5)

> Move the blocking in FLUSH from START_FLUSH to FLUSH_OK
> -------------------------------------------------------
>
>                 Key: JGRP-336
>                 URL: http://jira.jboss.com/jira/browse/JGRP-336
>             Project: JGroups
>          Issue Type: Feature Request
>    Affects Versions: 2.4
>            Reporter: Bela Ban
>         Assigned To: Vladimir Blagojevic
>             Fix For: 2.4
>
>
> With FLUSH, every member starts blocking outgoing calls in FLUSH.down() as soon as START_FLUSH has been received and block() returned. Then, a FLUSH_OK is multicast to the group. The issue with this is that sometimes a member might want to complete some work in block(), which might be sending a PREPARE or COMMIT multicast across the cluster and waiting for all replies (or a timeout). 
> As described in point #A in Brian's email (below), this won't work as the unicast response to the PREPARE or COMMIT call might block in FLUSH.down() if that member already received the START_FLUSH.
> Outlined in point #B below, if we moved the blocking from START_FLUSH (after block() returns) to after we have received *all* FLUSH_OK responses from all members, then PREPARE/COMMIT would be able to complete, because nobody blocks the unicast responses back to (e.g.) P until P's FLUSH_OK has been received, and that's only the case when P's block() returns.
> Contrary to Brian's solution, I think the FLUSH_COMPLETED phase can be unicast to just the initiator of the flush phase, *not* to the entire group.
> The point of this JIRA issue is to investigate whether moving blocking renders the flush protocol incorrect or not.
> [excerpt from Brian's email]
> Vladimir and I found a problem today with using FLUSH in a JBC cache.
> Following is a description of the issue and some proposed solutions.
> Comments are welcome.
> Please see docs/design/FLUSH.txt in JGroups for background info on how
> FLUSH works.
> A) We have a problem in that the FLUSH protocol makes the decision to
> shut off the ability to pass messages down the channel independently at
> each node.  The protocol doesn't include anything at the JGroups level
> to readily support coordination between nodes as to when to shut off
> down messages. But, JBC needs coordination since it needs to make RPC
> calls around the cluster (e.g. commit()) as part of how it handles
> FLUSH.
> Basically, when the FLUSH protocol on a node receives a message telling
> it to START_FLUSH, it calls block() on the JBC instance.  JBC does what
> it needs to do, then returns from block().  Following the return from
> block() the FLUSH protocol in that channel then begins blocking any
> further down() messages.
> The problem is as follows.  Take a 2-node REPL_SYNC cluster, A and B,
> where A is just starting up and thus initiates a FLUSH:
> 1) JBC on B has tx in progress, just starting the 2PC. Sends out the
> prepare().
> 2) A sends out a START_FLUSH message.
> 3) A gets START_FLUSH, calls block() on JBC.
> 4) JBC on A is new, doesn't have much going on, very quickly returns
> from block(). A will no longer pass *down* any messages below FLUSH.
> 5) A gets the prepare() (no problem, FLUSH doesn't block up messages,
> just down messages.)
> 6) A executes the prepare(), but can't send the response to B because
> FLUSH is blocking the channel.
> 7) B gets the START_FLUSH, calls block() on JBC. 
> 8) JBC B doesn't immediately return from block() as it is giving the
> prepare() some time to complete (avoid unnecessary tx rollback).  But
> prepare() won't complete because A's channel is blocking the RPC
> response!!  Eventually JBC B's block() impl will have to roll back the
> tx.
> Basically you have a race condition between calls to block() and
> prepare() calls, and can have different winners on different nodes.
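
The race in steps 1-6 can be replayed with a toy model. This is only a sketch: ToyFlushNode and RaceReplay are hypothetical names, not JGroups classes, and the real FLUSH.down() blocks the calling thread, whereas this model simply returns false so the outcome is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one member's FLUSH layer; hypothetical, not the JGroups API. */
class ToyFlushNode {
    final String name;
    boolean downBlocked = false;                 // true once FLUSH stops passing messages down
    final List<String> sent = new ArrayList<>(); // messages that reached the wire

    ToyFlushNode(String name) { this.name = name; }

    /** Current rule from the description: block down messages once block() has returned. */
    void onStartFlush(Runnable applicationBlock) {
        applicationBlock.run();                  // stands in for the application's block() callback
        downBlocked = true;
    }

    /** Returns true if the message went out, false if FLUSH held it back. */
    boolean down(String msg) {
        if (downBlocked) return false;
        sent.add(msg);
        return true;
    }
}

class RaceReplay {
    /** Replays steps 1-6; returns whether A's prepare() response reached the wire. */
    static boolean prepareResponseSent() {
        ToyFlushNode a = new ToyFlushNode("A");
        ToyFlushNode b = new ToyFlushNode("B");
        b.down("prepare");                       // 1) B starts the 2PC
        a.down("START_FLUSH");                   // 2) A initiates the flush
        a.onStartFlush(() -> { });               // 3-4) A's block() returns at once
        return a.down("prepare-response");       // 5-6) A handles prepare(), tries to reply
    }
}
```

In this model RaceReplay.prepareResponseSent() returns false: A's reply is held back, which is why B's prepare() in step 8 cannot complete.
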
> B) A solution we discussed, rejected and then came back to this evening
> (please read FLUSH.txt to understand the change we're discussing):
> Channel does not block down messages when block() returns.  Rather it
> just sends out a FLUSH_OK message (see FLUSH.txt).  It shouldn't
> initiate any new cluster activity (e.g. a prepare()) after sending
> FLUSH_OK, but it can respond to RPC calls.  When it gets a FLUSH_OK from
> all the other members, it then blocks down messages and multicasts a
> FLUSH_COMPLETED to the cluster.
> Differences from the current FLUSH impl:
> 1) Node doesn't begin blocking down messages before sending FLUSH_OK.
> 2) Node begins blocking down messages before sending FLUSH_COMPLETED.
> 3) Node multicasts FLUSH_COMPLETED, rather than unicasting to the node
> that initiated the FLUSH.
> 4) Nodes regard the FLUSH_COMPLETED as the last message from another
> node, rather than the FLUSH_OK.
> A downside of this idea is it changes the semantics of flush and
> requires JGroups changes.  We'd definitely like input from Bela on this.
> Also, since we initially rejected it, we haven't fully thought it
> through. (As I'm editing this to send out I see there is no way to tell
> JBC after it returns from block() to not let any "new" activity through
> -- big hole. I'm back to rejecting this approach.)
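
For comparison, the rule in B (and in the JIRA description above) could be modelled as a gate that closes only after a FLUSH_OK has arrived from every member. This is a minimal sketch under assumptions: DeferredFlushGate is a hypothetical name, members are plain strings, and the member list is taken to include the local node. It does not address the hole Brian notes, namely that nothing stops the application from starting new activity after block() returns.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the proposed rule; not the JGroups FLUSH implementation. */
class DeferredFlushGate {
    private final Set<String> members;                    // everyone expected to send FLUSH_OK
    private final Set<String> flushOkFrom = new HashSet<>();
    private boolean downBlocked = false;

    DeferredFlushGate(String... members) {
        this.members = new HashSet<>(Arrays.asList(members));
    }

    /**
     * Records a FLUSH_OK. Returns true exactly once, at the moment the gate
     * closes, which is when FLUSH_COMPLETED would be sent.
     */
    synchronized boolean onFlushOk(String sender) {
        if (!members.contains(sender)) return false;      // ignore non-members
        flushOkFrom.add(sender);
        if (!downBlocked && flushOkFrom.containsAll(members)) {
            downBlocked = true;
            return true;
        }
        return false;
    }

    /** Down messages (e.g. a prepare() response) still pass while this is false. */
    synchronized boolean isDownBlocked() { return downBlocked; }
}
```

With members A and B, the gate stays open after A's FLUSH_OK and closes only when B's arrives, so A can still answer B's prepare() while B is inside block().
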
> C) Alternative idea we discussed was to do application level
> coordination around the cluster, i.e. add something similar to the
> existing FLUSH_OK/FLUSH_COMPLETED, but at the JBC level.  Revising the
> previous scenario:
> 1) JBC on B has tx in progress, just starting the 2PC. Sends out the
> prepare().
> 2) A sends out a START_FLUSH message.
> 3) A gets START_FLUSH, calls block().
> 4) JBC on A is new, doesn't have much going on, so doesn't do cleanup
> work on its own node.
> 4.1) JBC on A sends out an RPC call with its address as an arg to a new
> "flushReady()" method added to TreeCache. (Other name for method is
> fine.)
> 4.2) JBC on A blocks waiting for flushReady() RPC calls from all the
> other members. Does not return from block().
> 5) A gets the prepare() (no problem, FLUSH doesn't block up messages,
> just down messages.)
> 6) A executes the prepare(), can send the response to B because FLUSH
> isn't blocking the channel.
> 7) B gets the START_FLUSH, calls block(). 
> 8) JBC B doesn't immediately return from block() as it detects it has a
> 2PC in progress and is giving the prepare() some time to complete (avoid
> unnecessary tx rollback).
> 9) JBC B receives flushReady() call from A, adds entry to a vector
> recording A is ready.
> 10) B receives prepare() response from A, sends commit().
> 11) B sends out an RPC call with its address to the "flushReady()"
> method.
> 12) A receives commit(), commits tx.
> 13) A receives flushReady() call from B.  Adds entry to a vector
> recording that B is ready.
> 14) A sees that all other nodes are ready, returns from block().
> 15) B sees that all other nodes are ready, returns from block().
> Downside to this is complexity and requirement to add another method for
> the "flushReady()" RPC.
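
The bookkeeping behind flushReady() in C could look roughly like the sketch below. FlushReadyTracker is a hypothetical helper, not part of TreeCache or JGroups. The timeout in awaitAllReady() is an added assumption so that block() cannot hang forever, and addresses are plain strings here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical application-level tracker for flushReady() calls. */
class FlushReadyTracker {
    private final Set<String> pending = new HashSet<>();  // members not yet reported ready

    FlushReadyTracker(String self, String... members) {
        for (String m : members) {
            if (!m.equals(self)) pending.add(m);          // we only wait for the *other* members
        }
    }

    /** Invoked when a flushReady(sender) RPC arrives from another member. */
    synchronized void flushReady(String sender) {
        if (pending.remove(sender) && pending.isEmpty()) notifyAll();
    }

    /**
     * Called from block(): waits until all other members have reported ready.
     * Returns false if the (assumed) timeout elapses or the thread is interrupted.
     */
    synchronized boolean awaitAllReady(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!pending.isEmpty()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) return false;
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

In the scenario above, A would create the tracker in step 4.1, call awaitAllReady() in step 4.2, and return from block() once B's flushReady() has been recorded.
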
> D) A 3rd alternative is to just accept the problem.  The problem is a
> race condition where A blocks down events but then receives a prepare().
> Its response to prepare() cannot be sent.  The effect is JBC B's impl of
> FLUSH will detect the prepare() isn't progressing and at some point roll
> back the tx.  This will result in a rollback() message being sent to A.
> A can receive it and roll back the tx.  IIRC a rollback() is always
> async, so A does not need to send a response. A and B end up in a valid
> state.
> Downside of this is the tx gets rolled back.  This could be a frequent
> occurrence in high load scenarios because a new node in the cluster
> could be expected to very quickly call blockOK(), possibly even before
> the START_FLUSH message goes out on the wire.
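
The behaviour in D, where B's block() gives the prepare() some time and then rolls back, might be sketched as follows. BlockWithGracePeriod, the grace period parameter, and the string results are illustrative assumptions, not JBC code; the future stands in for "all prepare() responses have arrived".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a block() implementation that waits briefly for an in-flight prepare(). */
class BlockWithGracePeriod {
    /**
     * Returns "COMMIT" if the prepare() was acknowledged within the grace
     * period, otherwise "ROLLBACK" (timeout, failure, or interruption).
     */
    static String decide(CompletableFuture<Boolean> prepareAcked, long graceMillis) {
        try {
            boolean ok = prepareAcked.get(graceMillis, TimeUnit.MILLISECONDS);
            return ok ? "COMMIT" : "ROLLBACK";
        } catch (TimeoutException | ExecutionException e) {
            return "ROLLBACK";                    // response never arrived, e.g. held back by FLUSH on A
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return "ROLLBACK";
        }
    }
}
```

A future that never completes models the race described above: A's response is stuck behind FLUSH, the grace period runs out, and B rolls the tx back.
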
> Brian Stansberry
> Lead, AS Clustering
> JBoss, a division of Red Hat
