improve XAResource recovery scan timing
---------------------------------------
Key: JBTM-842
URL: https://issues.jboss.org/browse/JBTM-842
Project: JBoss Transaction Manager
Issue Type: Enhancement
Security Level: Public (Everyone can see)
Components: Recovery
Affects Versions: 4.15.0
Reporter: Jonathan Halliday
Assignee: Jonathan Halliday
Fix For: 4.15.1
The current XAResource recovery algorithm includes a requirement that a suspected orphan
Xid appear in two recovery scans before it is eligible for rollback under presumed abort.
This closes the timing window where a tx branch has been prepared for normal termination
but not yet committed. It's based on the assumption that the interval between scans is
long enough for any normally executing tx to proceed from prepare to commit. Where the
scans occur as part of consecutive recovery passes, this is generally the case.
However...
There are two cases in which scans can happen in quick succession, thus nullifying the
safeguard and causing the recovery system to incorrectly presume abort on a branch. The
first is caused by top down recovery running in the same pass before bottom up recovery.
If a tx log contains an XAResourceRecord, its instantiation may cause a recovery scan. The
second and more common case is where the user has incorrectly registered two or more
recovery resources for the same RM. At first glance it seems possible to add a check to
prevent this misconfiguration, but in practice it's not possible to write a robust
comparison method that will work with all known RMs. Thus, as it is infeasible to prevent
multiple scans occurring in quick succession, it is instead necessary to change the
safeguard algorithm.
We should require a set interval to pass after the first sighting of an Xid before
considering it eligible for presumed abort, rather than requiring a given number of scan
passes. This will require modification of XARecoveryModule.xaRecovery and RecoveryXids.
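A minimal sketch of the interval-based safeguard follows. It is not the actual RecoveryXids
code: the class FirstSightingTracker, its method recordScan and the key type K are all
hypothetical names introduced for illustration. K stands in for an Xid wrapper with
value-based equals/hashCode, and one tracker per RM is assumed. The point is that a second
scan arriving moments after the first no longer makes an Xid eligible; only elapsed time
since the first sighting does.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: remembers when each in-doubt Xid was first seen and
 * reports it as a presumed-abort candidate only after a fixed interval.
 * K is an assumed Xid key type with value-based equals/hashCode.
 */
class FirstSightingTracker<K> {
    private final long safetyIntervalMillis;
    private final Map<K, Long> firstSeenMillis = new HashMap<>();

    FirstSightingTracker(long safetyIntervalMillis) {
        this.safetyIntervalMillis = safetyIntervalMillis;
    }

    /**
     * Record one recovery scan taken at nowMillis and return the keys that
     * have been visible for at least the safety interval.
     */
    synchronized Set<K> recordScan(Collection<K> scanned, long nowMillis) {
        // Forget Xids the RM no longer reports: those branches have completed.
        firstSeenMillis.keySet().retainAll(new HashSet<>(scanned));

        Set<K> eligible = new HashSet<>();
        for (K xid : scanned) {
            // Keep the original first-sighting time; repeat scans never reset it.
            Long firstSeen = firstSeenMillis.putIfAbsent(xid, nowMillis);
            if (firstSeen != null && nowMillis - firstSeen >= safetyIntervalMillis) {
                eligible.add(xid);
            }
        }
        return eligible;
    }
}
```

With this shape, duplicate recovery resources for the same RM or a top down scan followed
immediately by a bottom up scan would only repeat the lookup; neither shortens the wait.
The interval itself would need to be longer than the worst-case gap between prepare and
commit for a normally executing tx.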
An additional safeguard may be created by supplementing the existing transaction log based
XAResourceOrphanFilter with one based on TransactionImple.getTransaction, at least for
configurations where the recovery manager is running in-process.
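As a rough illustration of that additional safeguard, the sketch below shows a filter that
votes to leave an Xid alone while its transaction is still live in the same JVM. The types
OrphanVote and InFlightOrphanFilter are local stand-ins, not the Narayana classes, and the
isLiveInProcess predicate is an assumed hook; how it would actually be wired to
TransactionImple is left open here.

```java
import java.util.function.Predicate;

/** Stand-in for the votes an orphan filter can cast; not the Narayana type. */
enum OrphanVote { ABSTAIN, ROLLBACK, LEAVE_ALONE }

/**
 * Hypothetical filter: protects any Xid whose transaction is still live in
 * this JVM, and otherwise abstains so that other filters decide.
 * K is an assumed Xid key type.
 */
class InFlightOrphanFilter<K> {
    private final Predicate<K> isLiveInProcess;

    InFlightOrphanFilter(Predicate<K> isLiveInProcess) {
        this.isLiveInProcess = isLiveInProcess;
    }

    OrphanVote checkXid(K xid) {
        // A live transaction means the branch is mid-termination, not orphaned.
        return isLiveInProcess.test(xid) ? OrphanVote.LEAVE_ALONE : OrphanVote.ABSTAIN;
    }
}
```

Such a filter never votes for rollback itself. It can only veto, which is why it would
supplement the transaction log based filter rather than replace it.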