Vladimir,
FYI, the code I wrote earlier for breaking locks and rolling back transactions during
state transfer is in o.j.c.lock.LockUtil.forceAcquireLock(). It would have been
called from StateTransferManager.acquireLocksForStateTransfer() (line 401), but it was
disabled per the previous discussions in this thread.
This was written not for a FLUSH/block use case, but rather to be called as part of a
getState() call. See http://jira.jboss.com/jira/browse/JBCACHE-315. So, it can't be
used directly for FLUSH, but elements of it may be useful. It's kind of ugly code though;
it needed another round or two of polishing when we decided to shelve it. And lots of
testing. There is an o.j.c.statetransfer.ForcedStateTransferTest that's meant for
this; it should fail now because the code is disabled, and it is ignored by cruisecontrol.
Issues with LockUtil.forceAcquireLock():
1) It combines two things into one operation: a) breaking locks, and b) waiting for
transactions to complete, then attempting to roll back those that don't. Those most
likely should be separated.
2) It deals with transactions in all stages, including ACTIVE. We only want to deal with
ACTIVE transactions on the node that is preparing the state transfer. There's no
reason to touch an active transaction on some 3rd node, as the locks it's holding will
not affect anything.
A tricky thing there: during block(), how does a node know it's going to be preparing a
state transfer (and therefore needs to deal with ACTIVE transactions)? It's not as
simple as checking if the node is coordinator, because a partial state transfer call can
go to any node.
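To make the rule in point 2 concrete, here is a rough sketch of the selection I have in
mind. TxStage, TxInfo and TxSelector are made-up names for illustration, not existing
JBoss Cache classes, and providesState is simply passed in, since how a node finds that
out during block() is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the JTA transaction status.
enum TxStage { ACTIVE, PREPARING, PREPARED, COMMITTING, ROLLING_BACK }

// Hypothetical record of a local transaction that may hold locks.
final class TxInfo {
    final String id;
    final TxStage stage;

    TxInfo(String id, TxStage stage) {
        this.id = id;
        this.stage = stage;
    }
}

final class TxSelector {
    /**
     * Returns the local transactions this node should wait for or roll back.
     * Transactions beyond ACTIVE are always selected; ACTIVE ones are selected
     * only if this node is the one providing state (assumed to be known).
     */
    static List<TxInfo> toHandle(List<TxInfo> localTxs, boolean providesState) {
        List<TxInfo> result = new ArrayList<TxInfo>();
        for (TxInfo tx : localTxs) {
            if (tx.stage != TxStage.ACTIVE || providesState) {
                result.add(tx);
            }
        }
        return result;
    }
}
```

On a 3rd node (providesState == false) an ACTIVE transaction is left alone; on the
state provider it is included along with everything else.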
Perhaps an algorithm would be:
1) In block() on all nodes
a) set a flag or something that will block any ACTIVE transactions from proceeding
(i.e. entering prepare() phase of 2PC)
b) wait for any transactions that are beyond ACTIVE to complete, rolling back those that don't
2) In getState() on node that's providing state, execute something pretty much like
what I already wrote. That will deal with
a) any ACTIVE transactions on that node
b) any leftover locks held by sick nodes that didn't properly clean up after
themselves in block()
3) What's the call to "unblock"? How does the cache know it's safe to resume
normal operations (e.g. revert the flag set in 1a)?
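
Steps 1 and 3 could be sketched roughly as below. This is only an illustration under
assumptions: PrepareGate and all of its methods are hypothetical, it assumes some
unblock() callback exists (which is the open question in 3), and it does not cover the
lock breaking in step 2. Transactions call tryEnterPrepare() before leaving ACTIVE and
finished() once they commit or roll back.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate; not part of JBoss Cache.
final class PrepareGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean blocked = false; // the flag from step 1a
    private int inFlight = 0;        // transactions beyond ACTIVE

    /** Called before prepare(); returns false if still blocked after the timeout. */
    boolean tryEnterPrepare(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (blocked) {
                if (nanos <= 0) return false;
                nanos = changed.awaitNanos(nanos);
            }
            inFlight++;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Called when a transaction that entered prepare commits or rolls back. */
    void finished() {
        lock.lock();
        try {
            inFlight--;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Steps 1a and 1b: set the flag, then wait for in-flight transactions.
     * Returns how many are still in flight at the timeout; those would be
     * the candidates for rollback.
     */
    int block(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            blocked = true;
            while (inFlight > 0 && nanos > 0) {
                nanos = changed.awaitNanos(nanos);
            }
            return inFlight;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return inFlight;
        } finally {
            lock.unlock();
        }
    }

    /** Step 3: revert the flag and wake any waiting ACTIVE transactions. */
    void unblock() {
        lock.lock();
        try {
            blocked = false;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

The timeouts are there so that neither block() nor a waiting transaction hangs forever
if a sick node never cleans up.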