Here is a transcript of a conversation I had with Brian regarding the algorithm details
of JBCACHE-315:
Replying privately, but I think we should take this to jbcache-dev or the forum. This is
complex.
Vladimir Blagojevic wrote:
Hey Brian,
I thought a bit more about the locking algorithm and I would like to
bounce it off you. If you recall we agreed on our phone call that we
have to go through the steps of:
a) set a flag or something that will block any ACTIVE transactions
from proceeding (i.e. entering prepare() phase of 2PC)
We then revised this by saying that in fact "any ACTIVE transactions"
should rather be "any ACTIVE locally initiated transactions". We also
agreed that we can do this by using a latch in TxInterceptor.
b) wait for completion of any transactions that are beyond ACTIVE.
We thought that this was a great idea but soon realized that
transactions can still deadlock. You said: "For example, a locally
initiated transaction is holding a lock on some node and you have a
remote prepare that comes in. The remote prepare won't be able to acquire the lock. At
some point we have to deal with that. Whoever sent that prepare call
isn't going to proceed - the sender will block on that synchronous call.
So on the remote node the prepare is not going to progress."
To be even more specific:
On the remote node (i.e. the one we're working on) the JG up-handler thread will be
blocked while the prepare() call waits to acquire a lock. That thread will block until
there is a lock timeout. This will occur whether we are using REPL_ASYNC or REPL_SYNC.
One effect of this is that no other JG messages will be received until there is a timeout.
Note also that *I think* that having another thread roll back the tx associated with the
prepare call will not cause the JG up-handler thread to unblock!!
If REPL_SYNC, on the node that originated the GTX, the client thread that committed the tx
will be blocking waiting for a response to the prepare() call.
I have another proposal. If we already have to introduce a latch, why
not introduce it in a "better" location? So the proposal is to introduce
our latch in InvocationContextInterceptor rather than in
TxInterceptor.
InvocationContextInterceptor is always the first interceptor in the chain.
By introducing a latch here we can inspect a call, determine its
origin and transactional status, and block transactions before they
grab any locks.
Can this be done in the TxInterceptor? I.e. isn't it always before any
LockInterceptor? I would think it would be. I expect Manik would put up a fuss about
doing tx-related stuff outside TxInterceptor; the whole reason it was added in 1.3.0 was
to encapsulate stuff that was previously spread around other interceptors.
If a transaction originates locally and has not been registered in the
TransactionTable (i.e. it has not yet performed any operation), block it on the
latch before it has a chance to acquire any locks.
+1. No reason to let a tx that hasn't acquired any locks go through and cause
trouble.
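To make the latch idea concrete, here is a rough sketch of the kind of gate such an
interceptor could consult before a locally originated, not-yet-registered transaction grabs
any locks. Every name below is hypothetical; only the JDK classes are real, and the real
interceptor API differs:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical gate an interceptor (InvocationContextInterceptor or
// TxInterceptor) could check; this is not existing JBoss Cache code.
public class TransactionGate
{
   // open by default; a fresh latch is swapped in when block() arrives
   private volatile CountDownLatch latch = new CountDownLatch(0);

   /** Flip the latch: subsequent local transactions must wait here. */
   public void close()
   {
      latch = new CountDownLatch(1);
   }

   /** Flip the latch back: release every thread parked in await(). */
   public void open()
   {
      latch.countDown();
   }

   /**
    * Called for a locally originated call whose transaction is not yet in
    * the TransactionTable, i.e. before it has acquired any locks.
    */
   public void await(long timeoutMillis) throws InterruptedException
   {
      if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS))
         throw new RuntimeException("Blocked for state transfer; timed out waiting for the latch");
   }
}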
Then we look at the table and roll back any local transactions that
have not yet gone to prepare, i.e. transactions that we have missed with
our latch. If any rolled-back transaction retries it will be caught by
our latch :) All other transactions we let go through. Start a timer
and give it enough time for beyond-prepare transactions to finish.
So in pseudocode, the algorithm executed on each node:
- receive block call
- flip a latch in InvocationContextInterceptor and block any subsequent local transactions
- roll back local transactions that are not yet in prepare phase and start timer T (allow some time for beyond-prepare transactions to finish)
- if a lock still exists at the integration node after T expires, roll back our local transaction
- flip the latch back and allow transactions to proceed
- return from block ok
- flush blocks all down threads (thus no prepare will go through, although local transactions will proceed on each node)
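A compressed sketch of that per-node block() handling, reusing the hypothetical gate above;
the transaction-table and lock helpers below are also stand-ins, not real JBoss Cache
methods:

// Hypothetical outline of the block() handling above; only the flow is from
// the pseudocode, everything else is a stand-in.
public abstract class BlockHandler
{
   private final TransactionGate gate;   // the latch sketched earlier
   private final long timerT;            // grace period for beyond-prepare txs

   protected BlockHandler(TransactionGate gate, long timerT)
   {
      this.gate = gate;
      this.timerT = timerT;
   }

   public void block() throws Exception
   {
      gate.close();   // flip the latch: block any subsequent local transactions

      // roll back local transactions that have not reached prepare yet;
      // if they retry, the latch above catches them
      for (Object gtx : localTransactionsNotYetInPrepare())
         rollback(gtx);

      Thread.sleep(timerT);   // let beyond-prepare transactions finish

      // if a lock still exists at the integration point after T expires,
      // roll back our local transaction holding it
      if (lockStillHeldAtIntegrationPoint())
         rollbackIntegrationPointLockOwner();

      gate.open();    // flip the latch back and allow transactions to proceed
      // returning from block() lets FLUSH block all down threads,
      // so no prepare goes through while local transactions proceed
   }

   // stand-ins for whatever the TransactionTable / lock code really offers
   protected abstract Iterable<Object> localTransactionsNotYetInPrepare();
   protected abstract void rollback(Object gtx);
   protected abstract boolean lockStillHeldAtIntegrationPoint();
   protected abstract void rollbackIntegrationPointLockOwner();
}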
Proceed with the algorithm on the state provider:
- receive getState
- grab a lock on the integration point using a LockUtil.breakLock variant, possibly rolling back some local transactions
- read the state and do the state transfer with the state receiver
- when the state transfer is done, prepare messages will hit the cluster and state will be consistent no matter what happens with all global transactions
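And the state-provider part, again only as a hypothetical skeleton; the lock-breaking and
read helpers stand in for the real LockUtil.breakLock variant and the state transfer code:

// Hypothetical skeleton of the state provider steps above.
public abstract class StateProvider
{
   public byte[] getState(Object integrationPoint) throws Exception
   {
      // grab the lock at the integration point, breaking it if needed;
      // this may roll back some local transactions that still hold it
      breakLockAt(integrationPoint);

      // read the state and ship it to the state receiver; once the transfer
      // is done, the queued prepare messages hit the whole cluster, so state
      // stays consistent no matter what happens with the global transactions
      return readState(integrationPoint);
   }

   protected abstract void breakLockAt(Object integrationPoint);   // LockUtil.breakLock variant
   protected abstract byte[] readState(Object integrationPoint);
}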
The concern I have with this is we give up one of the key goals -- not rolling back a tx
if it's not hurting anything. Here we assume that an ACTIVE locally originated tx is going
to cause a problem by blocking a remote prepare() call. So we roll back the tx. Actually
the odds of a remote prepare() call being blocked are pretty low.
How about this:
1) receive block call
2) flip a latch in TxInterceptor (I'm assuming it will work putting it here instead of
InvocationContextInterceptor). This latch is used at 2 or 3 different control points to
block any threads that:
a) are not associated with a GTX (i.e. non-transactional reads/writes from the local
server)
b) are associated with a GTX, but not yet in TransactionTable (your idea above)
   c) are associated with a locally originated GTX and are about to enter the
beforeCompletion phase (i.e. the original idea of preventing the tx from proceeding to
make a prepare() call).
3) Loop through the GTXs in the TransactionTable. Create a little object for each GTX and
throw it in a map (a rough sketch of this object follows the list below). The object is a
state machine that uses the JTA Transaction status, the elapsed time and whether the tx is
locally or remotely originated to govern its state transitions. Keep looping through the
TransactionTable, creating more of these objects if more GTXs appear, and for each GTX
update the object with the current Transaction status, then read the object state. The
object state tells you whether you need to roll back the tx, etc.
4) If the state machine is for a *remotely initiated* GTX that's in ACTIVE status,
after some elapsed time its state will tell you that it's likely held up by a lock conflict
with a locally originated tx. At that point we have a choice.
   a) roll back all locally originated tx's that are ACTIVE or PREPARED. Con:
indiscriminately breaks transactions. Con: if the tx has already entered beforeCompletion() we
don't know whether it's in the prepare() call or later. We can only roll it back during
beforeCompletion(); otherwise we introduce a heuristic.
b) roll back the remotely originated tx. Pro: doesn't indiscriminately break
transactions. Con: I *think* this rollback won't unblock the JG up-handler thread.
5) We'd need to work out all the state transitions; i.e. what conditions lead to tx
rollback.
6) flip latch back and allow transactions to proceed
7) return from block ok
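Here is a minimal sketch of the per-GTX state machine object from step 3. Only
javax.transaction.Status is real; the class, its method and its decisions are hypothetical:

import javax.transaction.Status;

// Hypothetical per-GTX tracker: fed the latest JTA status on each pass over
// the TransactionTable, it decides whether anything needs attention.
public class GtxTracker
{
   public enum Decision { KEEP_WAITING, RESOLVE_LOCK_CONFLICT, FORGET }

   private final boolean locallyOriginated;
   private final long createdAt = System.currentTimeMillis();

   public GtxTracker(boolean locallyOriginated)
   {
      this.locallyOriginated = locallyOriginated;
   }

   /** Update with the current Transaction.getStatus() value and read the decision. */
   public Decision update(int jtaStatus, long maxActiveMillis)
   {
      long elapsed = System.currentTimeMillis() - createdAt;

      // Step 4: a remotely initiated GTX still ACTIVE after the grace period
      // is likely blocked on a lock held by a locally originated tx.
      if (!locallyOriginated && jtaStatus == Status.STATUS_ACTIVE && elapsed > maxActiveMillis)
         return Decision.RESOLVE_LOCK_CONFLICT;

      // Finished transactions can simply be dropped from the map.
      if (jtaStatus == Status.STATUS_COMMITTED || jtaStatus == Status.STATUS_ROLLEDBACK)
         return Decision.FORGET;

      return Decision.KEEP_WAITING;
   }
}

RESOLVE_LOCK_CONFLICT is exactly where the 4a/4b choice above comes in; the tracker only
flags the situation, it doesn't pick a victim.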
So in summary, the goal of the first part of the algorithm is to allow
transactions beyond prepare to finish and prevent any local
transactions from hitting the cluster and becoming global. That leaves
us dealing only with local transactions at the state provider in the
second part of the algorithm. In the second part we deal just with the
state provider. We grab the lock at the integration point, possibly roll back
any local transactions there, do the state transfer, and let the prepares
hit the cluster, thus preserving state consistency and disturbing the fewest
global transactions.
It seems like if we had that blockDone JGroups callback then things
would be nicer. Algorithm executed on each node:
- receive block call
- flip a latch in InvocationContextInterceptor and block any subsequent local transactions
- roll back local transactions that are not yet in prepare phase and start timer T
- if a lock still exists at the integration node after T expires, roll back our local transaction
- return from block ok
- flush blocks all down threads (thus no prepare will go through, although local transactions will proceed on each node)
Proceed with the algorithm on the state receiver and provider:
- do state transfer
Proceed with the algorithm executed on each node:
- flip the latch back and allow transactions to proceed
Here's a question for you about FLUSH: When a service returns from block() or sends
blockOK, does the channel immediately block? Is there coordination across the cluster?
My concern:
Node A doesn't have much going on and quickly returns from block(), so its channel is
blocked.
Node B takes a little longer; it has some txs whose completion requires sending
messages to A. Those messages don't get through because A is blocked.