On 18 Mar 2011, at 12:13, Mircea Markus wrote:
Hi,
It's about the stage where the TM's recovery process finds an in-doubt transaction
and notifies the sysadmin about it: what hooks does ISPN provide to the sysadmin in
order to "fix" the tx?
E.g. step >= 3.3 :
http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-118...
Here is what I have in mind:
Expose (JMX) two operations:
//all the params together fully describe a xid.
replayTx(byte[] txBranch, byte[] txId, int formatId);
forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);
You expect a sysadmin to type a byte array into a JMX console? :-) You might get death
threats from sysadmins...
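To illustrate the byte-array concern, here is a minimal sketch of one possible workaround: accept the two byte arrays as hex strings and build the Xid from them. HexXid and its methods are hypothetical names, not part of Infinispan, and nothing in this thread says the operations would be exposed this way. Only javax.transaction.xa.Xid is a real JDK interface.

```java
import javax.transaction.xa.Xid;

/** Hypothetical helper, not Infinispan API: builds a Xid from console-friendly hex text. */
final class HexXid implements Xid {
    private final int formatId;
    private final byte[] globalTxId;
    private final byte[] branchQualifier;

    HexXid(int formatId, byte[] globalTxId, byte[] branchQualifier) {
        this.formatId = formatId;
        this.globalTxId = globalTxId.clone();
        this.branchQualifier = branchQualifier.clone();
    }

    /** Parses the same three values as replayTx(txBranch, txId, formatId), given as text. */
    static HexXid parse(int formatId, String txIdHex, String txBranchHex) {
        return new HexXid(formatId, fromHex(txIdHex), fromHex(txBranchHex));
    }

    /** Decodes a hex string such as "0a1b" into bytes; rejects malformed input. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Encodes bytes as lowercase hex, e.g. for showing an in-doubt Xid to the admin. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return globalTxId.clone(); }
    @Override public byte[] getBranchQualifier() { return branchQualifier.clone(); }
}
```

With something like this, the admin could paste the strings printed by the recovery log rather than typing raw bytes.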
Here is how these two ops would work:
A. replayTx
1. the node has the PrepareCommand associated with that XID locally
- re-issues a prepare: TransactionXAResource.prepare
- if successful re-issues a commit: TransactionXAResource.commit
- if a failure happens at any step, the user is informed and she/he can retry the JMX
call
- if success the recovery information is removed from the cluster (async)
2. the node doesn't have the PrepareCommand associated with that XID
- broadcast ReplayTxCommand (Xid)
- when a node receives ReplayTxCommand
- if it doesn't have a PrepareCommand associated with the Xid, it ignores the command
- if it has a PrepareCommand...
- is it the first in the view that has it [1]?
How does a node know the answer to this question? Is the list of nodes that holds the
prepare replay info stored on the PrepareCommand?
- yes. Executes A.1, then returns the result to the node that broadcast the
ReplayTxCommand. This is guaranteed to happen on at most[2] one node in the cluster
- no. Ignores it.
- if success the recovery information is removed from the cluster (async)
B. forceRollbackTx
- node broadcasts RollbackCommand
- each node that has the PrepareCommand forces a rollback
- each node that doesn't have the PrepareCommand ignores it
- if success the recovery information is removed from the cluster (async)
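The per-node part of the two operations (A.1 and the rollback in B) could look roughly like the sketch below, written against the standard javax.transaction.xa.XAResource interface. RecoveryOps and Outcome are hypothetical names, and the handling of a read-only prepare vote is an assumption based on general XA semantics, not something stated in this proposal. The broadcast and the async cleanup of recovery information are left out.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical helper, not Infinispan code: what one node might do for an in-doubt Xid. */
final class RecoveryOps {

    enum Outcome { COMMITTED, READ_ONLY, ROLLED_BACK, FAILED }

    /** A.1: re-issue the prepare, then commit if the prepare succeeded. */
    static Outcome replay(XAResource resource, Xid xid) {
        try {
            int vote = resource.prepare(xid);
            if (vote == XAResource.XA_RDONLY) {
                // Assumption: a read-only branch needs no commit call.
                return Outcome.READ_ONLY;
            }
            // onePhase = false because the prepare phase has just been run.
            resource.commit(xid, false);
            return Outcome.COMMITTED;
        } catch (XAException e) {
            // The caller reports the failure; the admin may retry the JMX call.
            return Outcome.FAILED;
        }
    }

    /** B: force a rollback of the prepared transaction on this node. */
    static Outcome forceRollback(XAResource resource, Xid xid) {
        try {
            resource.rollback(xid);
            return Outcome.ROLLED_BACK;
        } catch (XAException e) {
            return Outcome.FAILED;
        }
    }
}
```

Returning an outcome rather than throwing keeps the result easy to ship back to the node that issued the JMX call.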
Cheers,
Mircea
[1] this is determined by building the set of nodes on which tx spreads, based on
tx's state. Then determine the first in the view.
[2] it is possible for it not to happen on any node, as the PrepareCommand might have been
removed from all nodes in between (node failures, expiration from the recovery cache).
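The selection described in [1] and [2] can be sketched as follows. This assumes the set of nodes the tx spreads over has already been derived from the tx's state (how that is done is not specified here) and that the view is an ordered list of node names. ReplayOwner and both method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper, not Infinispan code: picks the single node that replays a tx. */
final class ReplayOwner {

    /**
     * Returns the first member of the view (in view order) that is among the tx's nodes.
     * Empty when no view member holds the tx any more, which is the case in footnote [2].
     */
    static Optional<String> firstInView(List<String> view, Set<String> txNodes) {
        return view.stream().filter(txNodes::contains).findFirst();
    }

    /** True only on the one node that should execute A.1 for a received ReplayTxCommand. */
    static boolean shouldReplay(String self, List<String> view, Set<String> txNodes) {
        return firstInView(view, txNodes).map(self::equals).orElse(false);
    }
}
```

Because every node evaluates the same view and the same node set, at most one of them answers "yes", without any extra coordination.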
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org