[infinispan-dev] SysAdmin operations for recovering transactions

Friday, 18 March 2011

Hi,

This is about the stage where the TM's recovery process finds an in-doubt transaction and
notifies the sys admin about it: what hooks does ISPN provide to the sys admin in order to
"fix" the tx?
E.g. step >= 3.3:
http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-118...

Here is what I have in mind:

Expose (JMX) two operations:

   // all the params together fully describe an Xid
   replayTx(byte[] txBranch, byte[] txId, int formatId); 
   forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);
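To make the "three params describe an Xid" point concrete, here is a minimal sketch of how the operation arguments could be turned into a javax.transaction.xa.Xid that is usable as a lookup key. The names RecoveryAdminOperations and SimpleXid are hypothetical, the String return type is an assumption (the signatures above do not state one), and the mapping of txId to the global transaction id and txBranch to the branch qualifier is an assumption as well.

```java
import java.util.Arrays;
import javax.transaction.xa.Xid;

// Hypothetical management interface mirroring the two proposed operations.
// The String return type (a message for the sys admin) is an assumption.
interface RecoveryAdminOperations {
    String replayTx(byte[] txBranch, byte[] txId, int formatId);
    String forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);
}

// Hypothetical value-based Xid: two instances built from the same three
// parameters are equal, so it can key a map of recovery information.
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] globalTxId;
    private final byte[] branchQualifier;

    SimpleXid(byte[] txBranch, byte[] txId, int formatId) {
        this.formatId = formatId;
        this.globalTxId = txId.clone();        // assumed: txId is the global transaction id
        this.branchQualifier = txBranch.clone(); // assumed: txBranch is the branch qualifier
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return globalTxId.clone(); }
    @Override public byte[] getBranchQualifier() { return branchQualifier.clone(); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleXid)) return false;
        SimpleXid other = (SimpleXid) o;
        return formatId == other.formatId
                && Arrays.equals(globalTxId, other.globalTxId)
                && Arrays.equals(branchQualifier, other.branchQualifier);
    }

    @Override public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(globalTxId))
                + Arrays.hashCode(branchQualifier);
    }
}
```

Byte arrays do not compare by content in Java, which is why the sketch wraps them in a class with explicit equals/hashCode rather than using the raw parameters as a key.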

Here is how these two ops would work:
A. replayTx
    1. the node has the PrepareCommand associated with that Xid locally
        - re-issues a prepare: TransactionXAResource.prepare
        - if successful, re-issues a commit: TransactionXAResource.commit
        - if a failure happens at any step, the user is informed and she/he can redo the
          JMX call
        - on success, the recovery information is removed from the cluster (async)
    2. the node doesn't have the PrepareCommand associated with that Xid
        - broadcasts a ReplayTxCommand(Xid)
        - when a node receives the ReplayTxCommand
            - if it doesn't have a PrepareCommand associated with the Xid, it ignores the
              command
            - if it has a PrepareCommand...
                - is it the first in the view that has it [1]?
                    - yes: it executes A.1, then returns the result to the node that
                      broadcast the ReplayTxCommand. This is guaranteed to happen on at
                      most [2] one node in the cluster
                    - no: it ignores the command
        - on success, the recovery information is removed from the cluster (async)
B. forceRollbackTx
    - the node broadcasts a RollbackCommand
    - each node that has the PrepareCommand forces a rollback
    - each node that doesn't have the PrepareCommand ignores the command
    - on success, the recovery information is removed from the cluster (async)
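As an illustration of step A.1, the local replay could be sketched roughly as follows. TwoPhaseResource is a hypothetical stand-in for the prepare/commit pair on TransactionXAResource, the String Xid and the returned messages are simplifications made up for this example, and the cleanup callback runs inline here although the proposal does it asynchronously.

```java
// Hypothetical stand-in for the prepare/commit calls of TransactionXAResource.
interface TwoPhaseResource {
    void prepare(String xid) throws Exception;
    void commit(String xid) throws Exception;
}

final class LocalReplay {
    // Replays a prepared transaction: prepare, then commit.
    // On failure the recovery information is left in place, so the sys admin
    // can simply repeat the call; it is only removed after a successful commit.
    static String replay(String xid, TwoPhaseResource resource, Runnable removeRecoveryInfo) {
        try {
            resource.prepare(xid);
        } catch (Exception e) {
            return "prepare failed: " + e.getMessage();
        }
        try {
            resource.commit(xid);
        } catch (Exception e) {
            return "commit failed: " + e.getMessage();
        }
        removeRecoveryInfo.run(); // inline here; async and cluster-wide in the proposal
        return "committed";
    }
}
```

Because nothing is cleaned up on a failed attempt, a second invocation with the same Xid starts again from the prepare step, which matches the "user can redo the JMX call" behaviour described above.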

Cheers,
Mircea

[1] This is determined by building the set of nodes over which the tx spreads, based on
the tx's state, and then picking the first of those nodes in the view.
[2] It might not happen on any node, as the PrepareCommand might have been removed
from all nodes in the meantime (node failures, expiration from the recovery cache).
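One possible reading of [1] and [2], as a small sketch: every node evaluates the same deterministic rule against the current view, so at most one node replays. ReplayElection, the String node names and the explicit holdsPrepare set are assumptions made for illustration; in particular, treating "first in the view" as "first view member among the nodes the tx spreads over" is an interpretation of the footnote, not something the proposal spells out.

```java
import java.util.List;
import java.util.Set;

final class ReplayElection {
    // Decision taken by node `self` when it receives the ReplayTxCommand.
    // view:         cluster members in view order
    // txNodes:      nodes the tx spreads over, derived from the tx's state
    // holdsPrepare: nodes that still have the PrepareCommand for the Xid
    static boolean shouldReplay(String self, List<String> view,
                                Set<String> txNodes, Set<String> holdsPrepare) {
        if (!holdsPrepare.contains(self)) {
            return false; // no PrepareCommand here: ignore the command
        }
        for (String member : view) {
            if (txNodes.contains(member)) {
                return member.equals(self); // only the first such member replays
            }
        }
        return false;
    }

    // Number of nodes in the view that would execute the replay (0 or 1).
    static long replayCount(List<String> view, Set<String> txNodes, Set<String> holdsPrepare) {
        return view.stream()
                .filter(n -> shouldReplay(n, view, txNodes, holdsPrepare))
                .count();
    }
}
```

With this rule the count is never above one, and it drops to zero when no node holds the PrepareCommand any more, which is the situation [2] describes.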
