[infinispan-dev] SysAdmin operations for recovering transactions

Mircea Markus mircea.markus at jboss.com
Fri Mar 18 08:13:15 EDT 2011


Hi,

This is about the stage where the TM's recovery process finds an in-doubt transaction and notifies the sys admin about it: what hooks does ISPN provide so that the sys admin can "fix" the tx?
E.g. step >= 3.3 : http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png

Here is what I have in mind:

Expose two operations over JMX:

   // all the params together fully describe an Xid
   replayTx(byte[] txBranch, byte[] txId, int formatId);
   forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);
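
To make the "params fully describe an Xid" point concrete, here is a rough sketch of how the three arguments could be folded back into a javax.transaction.xa.Xid. The names AdminXid and RecoveryAdminMBean, the String return type, and the mapping of txId to the global transaction id and txBranch to the branch qualifier are all assumptions made for illustration, not existing ISPN code.

```java
import java.util.Arrays;
import javax.transaction.xa.Xid;

// Hypothetical helper (not an ISPN class): rebuilds an Xid from the JMX parameters.
final class AdminXid implements Xid {
    private final byte[] branchQualifier;
    private final byte[] globalTransactionId;
    private final int formatId;

    // Assumption: txBranch is the branch qualifier and txId the global transaction id.
    AdminXid(byte[] txBranch, byte[] txId, int formatId) {
        // Defensive copies so later changes to the caller's arrays don't alter the Xid.
        this.branchQualifier = txBranch.clone();
        this.globalTransactionId = txId.clone();
        this.formatId = formatId;
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return globalTransactionId.clone(); }
    @Override public byte[] getBranchQualifier() { return branchQualifier.clone(); }

    // Two Xids are equal only if all three components match.
    @Override public boolean equals(Object o) {
        if (!(o instanceof AdminXid)) return false;
        AdminXid other = (AdminXid) o;
        return formatId == other.formatId
            && Arrays.equals(globalTransactionId, other.globalTransactionId)
            && Arrays.equals(branchQualifier, other.branchQualifier);
    }

    @Override public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(globalTransactionId))
            + Arrays.hashCode(branchQualifier);
    }
}

// Hypothetical management interface; the String result is an assumed way of
// reporting success or failure back to the sys admin.
interface RecoveryAdminMBean {
    String replayTx(byte[] txBranch, byte[] txId, int formatId);
    String forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);
}
```

Value-based equals/hashCode matter here because the Xid rebuilt from the JMX call has to match the one stored with the recovery information.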

Here is how these two ops would work:
A. replayTx
    1. the node has the PrepareCommand associated with that Xid locally
        - re-issue a prepare: TransactionXAResource.prepare
        - if successful, re-issue a commit: TransactionXAResource.commit
        - if a failure happens at any step, the user is informed and she/he can redo the JMX call
        - on success, the recovery information is removed from the cluster (async)
    2. the node doesn't have the PrepareCommand associated with that Xid
        - broadcast ReplayTxCommand(Xid)
        - when a node receives ReplayTxCommand:
            - if it doesn't have a PrepareCommand associated with the Xid, it ignores the command
            - if it has a PrepareCommand:
                - is it the first node in the view that has it [1]?
                    - yes: it executes A.1, then returns the result to the node that broadcast the ReplayTxCommand. This is guaranteed to happen on at most [2] one node in the cluster
                    - no: it ignores the command
        - on success, the recovery information is removed from the cluster (async)
B. forceRollbackTx
    - the node broadcasts a RollbackCommand
    - each node that has the PrepareCommand forces a rollback
    - each node that doesn't have the PrepareCommand ignores it
    - on success, the recovery information is removed from the cluster (async)
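
The "first in the view" check from A.2 could look roughly like the sketch below. It assumes every node can derive the same set of nodes the tx spreads to (see [1]) and sees the same view order; node addresses are plain strings here, and ReplayElection with its methods is a hypothetical name rather than existing ISPN code.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper (not an ISPN class) for the decision taken on receiving ReplayTxCommand.
final class ReplayElection {

    // Returns the first member, in view order, that belongs to the set of nodes
    // the tx spreads to; empty if none of those nodes is in the current view.
    static Optional<String> firstInView(List<String> view, Set<String> txNodes) {
        return view.stream().filter(txNodes::contains).findFirst();
    }

    // A node replays only if it holds the PrepareCommand and is the elected node.
    // If the elected node has already lost its PrepareCommand, nobody replays,
    // which is the "at most one node" case described in [2].
    static boolean shouldReplay(String self, boolean hasPrepareCommand,
                                List<String> view, Set<String> txNodes) {
        if (!hasPrepareCommand) {
            return false; // nothing to replay locally, ignore the command
        }
        return firstInView(view, txNodes).map(self::equals).orElse(false);
    }
}
```

For example, with view [A, B, C] and the tx spread over {B, C}, only B answers the ReplayTxCommand; C holds the PrepareCommand too but ignores the command.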

Cheers,
Mircea

[1] This is determined by building the set of nodes to which the tx spreads, based on the tx's state, and then picking the first of those nodes in the view.
[2] It might not happen on any node, as the PrepareCommand might have been removed from all nodes in the meantime (node failures, expiration from the recovery cache).

   


  

