[JBoss JIRA] Created: (JBTM-792) Race condition between thread sending COMPLETE and thread handling NOT_COMPLETED causes error
by Andrew Dinn (JIRA)
Race condition between thread sending COMPLETE and thread handling NOT_COMPLETED causes error
---------------------------------------------------------------------------------------------
Key: JBTM-792
URL: https://jira.jboss.org/browse/JBTM-792
Project: JBoss Transaction Manager
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: XTS
Affects Versions: 4.12.0
Reporter: Andrew Dinn
Assignee: Andrew Dinn
Fix For: 4.13.0
BA CoordinatorCompletion participants suffer a timeout failure if they call BAParticipantManager.cannotComplete() underneath a call to BAWithCCParticipant.complete(). This problem arises because of a race condition on the coordinator side between the thread sending the COMPLETE request and the thread handling the NOT_COMPLETING response. An error on the coordinator side means that the participant does not get sent the expected NOT_COMPLETED acknowledgement.
Here is how it goes wrong:
The sending thread dispatches a COMPLETE message then waits on a change to the coordinator engine state.
The participant side receives COMPLETE and the participant calls cannotComplete, sending NOT_COMPLETING.
The handler thread is started in response to an incoming NOT_COMPLETING message. The first thing it does is set the coordinator engine state to NOT_COMPLETING.
Chequered flag raised!
The sending thread wakes up and finds that the state has transitioned to something other than COMPLETED. If the complete was done as part of a client close request, it tries to notify the coordinator to abort.
In the meantime the handler tries to notify the coordinator that the participant cannot complete.
Chequered flag down!
If the sender wins the race, the handler thread finds an ABORTED coordinator and blows up with a WrongState exception, which means it does not send NOT_COMPLETED (also, it does not clear out the transaction).
n.b. this is exactly the same problem as was dealt with earlier when fail requests were being sent during COMPLETE/CANCEL/COMPENSATE processing.
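The interleaving above can be replayed deterministically in a toy model. This is only a sketch: it uses no threads and none of the real XTS classes, and every name in it (CompleteRaceSketch, ToyCoordinator, the local WrongStateException, replay) is made up for illustration. It just shows that when the sender's abort reaches the coordinator first, the handler's notification fails and NOT_COMPLETED is never sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy, single-threaded replay of the race. All names are hypothetical. */
final class CompleteRaceSketch {
    enum EngineState { COMPLETING, NOT_COMPLETING, COMPLETED }
    enum CoordinatorStatus { ACTIVE, ABORTED }

    /** Local stand-in for the WrongState exception mentioned in the report. */
    static final class WrongStateException extends Exception {
        WrongStateException(String message) { super(message); }
    }

    /** Stand-in for the coordinator that both threads try to notify. */
    static final class ToyCoordinator {
        CoordinatorStatus status = CoordinatorStatus.ACTIVE;

        void abort() { status = CoordinatorStatus.ABORTED; }

        void participantCannotComplete() throws WrongStateException {
            if (status != CoordinatorStatus.ACTIVE) {
                throw new WrongStateException("coordinator is " + status);
            }
        }
    }

    /**
     * Replays the steps in a fixed order and returns the messages sent to
     * the participant. senderWins == true models the sending thread
     * reaching the coordinator before the handler thread does.
     */
    static List<String> replay(boolean senderWins) {
        ToyCoordinator coordinator = new ToyCoordinator();
        List<String> sentToParticipant = new ArrayList<>();

        // Sender: dispatch COMPLETE, then wait for the engine state to change.
        sentToParticipant.add("COMPLETE");
        EngineState state = EngineState.COMPLETING;

        // Handler: its first action is to change the engine state.
        state = EngineState.NOT_COMPLETING;

        // Sender wakes up and sees a state other than COMPLETED.
        boolean senderWantsAbort = (state != EngineState.COMPLETED);

        if (senderWins && senderWantsAbort) {
            coordinator.abort();
        }
        try {
            // Handler: tell the coordinator, then acknowledge the participant.
            coordinator.participantCannotComplete();
            sentToParticipant.add("NOT_COMPLETED");
        } catch (WrongStateException e) {
            // No acknowledgement is sent, so the participant times out.
        }
        if (!senderWins && senderWantsAbort) {
            coordinator.abort();
        }
        return sentToParticipant;
    }

    public static void main(String[] args) {
        System.out.println("sender wins:  " + replay(true));
        System.out.println("handler wins: " + replay(false));
    }
}
```

The model says nothing about how the real fix should look; it only makes the ordering dependency visible.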
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
14 years, 3 months
[JBoss JIRA] Created: (JBTM-784) incorrect ORB initialization
by Jonathan Halliday (JIRA)
incorrect ORB initialization
----------------------------
Key: JBTM-784
URL: https://jira.jboss.org/browse/JBTM-784
Project: JBoss Transaction Manager
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: JTS, Recovery
Affects Versions: 4.12.0
Reporter: Jonathan Halliday
Assignee: Mark Little
Fix For: 4.13.0
When running the TransactionManager and RecoveryManager in the same JVM, the ORB initialization incorrectly attempts to set up the ORB twice:
WARN: ARJUNA-22251 The ORBManager is already associated with an ORB/OA.
The correct model is one ORB and two POAs, whereas we currently have two ORBs. See ORBManager; RecoveryORBManager (both have ORB field); ORB/InternalORB (why the subclass?) method initORB->InitLoader->ORBSetup; JacOrbRCServiceInit
14 years, 3 months
[JBoss JIRA] Created: (JBTM-786) XTS participant API should allow participant to complete in 2 phases
by Andrew Dinn (JIRA)
XTS participant API should allow participant to complete in 2 phases
--------------------------------------------------------------------
Key: JBTM-786
URL: https://jira.jboss.org/browse/JBTM-786
Project: JBoss Transaction Manager
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: XTS
Affects Versions: 4.12.0
Reporter: Andrew Dinn
Assignee: Andrew Dinn
Currently, when an XTS participant completes it is expected 1) to persist all its changes and then allow the XTS participant management code 2) to log a recovery record and 3) to notify the coordinator that the participant completed. The exact path taken varies depending upon the participant type but the sequence is always the same.
So, a participant completion participant is expected to persist its changes and call the BAManager completed method. This initiates creation and logging of the recovery record and dispatch of a COMPLETED message to the coordinator.
A coordinator completion participant is not expected to persist its changes until its complete method is called. It is after this call returns that the XTS participant management code creates and logs the recovery record and dispatches a COMPLETED message.
This leaves a window open between steps 1 and 2 of that sequence where a crash may occur, leaving persistent changes committed with no information available describing how to compensate them. In order to close this window a two-phase protocol must be used when saving the changes:
1) participant prepares changes to be persisted
2) XTS participant manager logs recovery record
3) participant commits changes
4) XTS participant manager dispatches COMPLETED message
This ensures that changes are only actually committed to persistent storage when a recovery record is in place in the log, a precondition for the commit to be safe. It also ensures that the coordinator cannot be told the participant has completed unless the changes truly have been persisted.
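The four-stage ordering can be sketched as follows. This is an illustration under stated assumptions, not the existing XTS participant API: the interfaces TwoPhaseParticipant, RecoveryLog and CoordinatorChannel, and the idea of prepareChanges() returning a change set identifier, are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the proposed ordering. All types here are hypothetical. */
final class TwoPhaseCompletionSketch {
    /** Hypothetical participant-side contract, not the current XTS API. */
    interface TwoPhaseParticipant {
        String prepareChanges();               // stage 1: returns a change set id
        void commitChanges(String changeSetId); // stage 3
    }

    /** Hypothetical stand-in for the XTS recovery log. */
    interface RecoveryLog {
        void write(String participantId, String changeSetId);
    }

    /** Hypothetical stand-in for the channel to the coordinator. */
    interface CoordinatorChannel {
        void sendCompleted(String participantId);
    }

    /** Runs the four stages in the order the issue proposes. */
    static void complete(String participantId, TwoPhaseParticipant participant,
                         RecoveryLog log, CoordinatorChannel channel) {
        String changeSetId = participant.prepareChanges(); // 1) prepare
        log.write(participantId, changeSetId);             // 2) log recovery record
        participant.commitChanges(changeSetId);            // 3) commit
        channel.sendCompleted(participantId);              // 4) send COMPLETED
    }

    /** Records the order of calls for one sample run. */
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        TwoPhaseParticipant participant = new TwoPhaseParticipant() {
            @Override public String prepareChanges() {
                events.add("prepare");
                return "cs-1";
            }
            @Override public void commitChanges(String changeSetId) {
                events.add("commit " + changeSetId);
            }
        };
        complete("p-1", participant,
                 (pid, cs) -> events.add("log " + pid + " " + cs),
                 pid -> events.add("COMPLETED " + pid));
        return events;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Storing the change set identifier in the recovery record is what would later let recovery match a record to its change set.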
However, this is not the full story. Clearly, this only works if the (application-specific) participant recovery module takes steps during recovery to resolve crashes between stages 1 and 2 or stages 2 and 3. Note that a crash between stages 3 and 4 is already handled by the existing recovery code.
So, the extra steps required are as follows:
The participant recovery module must be able to detect uncommitted change sets at recovery time.
The recovery record for a participant must include information allowing the associated prepared but uncommitted change set to be identified.
At the first participant recovery pass, when presented with a recovery record for participant p with change set u:
- if u no longer exists then it has been committed, so simply recreate p, allowing COMPLETED to be resent
- if u still exists then a crash occurred between stages 2 and 3, so either commit the changes and recreate p, or roll back the changes and reject the recovery record (causing it to be garbage collected).
In the former case this is safe because the changes have been completed as expected. In the latter case this is safe because the coordinator will not have seen a COMPLETED message so any attempt to close the activity will fail (if this is a coordinator completion participant and the coordinator resends COMPLETE it will get an unknown participant fault). A cancel request will proceed without error because of presumed abort.
After the first participant recovery pass has completed, any change set u' which has not been rolled forward must be present because of a crash between stages 1 and 2. This situation can be handled by rolling back the changes. Once again this is safe because the coordinator will not have seen a COMPLETED message.
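The recovery rules above can be written down as a small decision procedure. The following is a sketch only: the class and method names are hypothetical, change sets and participants are reduced to string identifiers, and the rollForward flag stands in for whatever application-specific choice the recovery module makes between committing and rolling back.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of the recovery decisions. All names are hypothetical. */
final class ParticipantRecoverySketch {
    /**
     * @param uncommittedChangeSets ids of change sets that were prepared
     *        but are still uncommitted at recovery time
     * @param recoveryRecords participant id mapped to the change set id
     *        stored in its recovery record
     * @param rollForward true to commit change sets found with a record,
     *        false to roll them back and reject the record
     * @return the actions taken, in order
     */
    static List<String> recover(Set<String> uncommittedChangeSets,
                                Map<String, String> recoveryRecords,
                                boolean rollForward) {
        Set<String> remaining = new LinkedHashSet<>(uncommittedChangeSets);
        List<String> actions = new ArrayList<>();

        // First recovery pass: one decision per recovery record.
        for (Map.Entry<String, String> record : recoveryRecords.entrySet()) {
            String p = record.getKey();
            String u = record.getValue();
            if (!remaining.contains(u)) {
                // u was committed (crash between stages 3 and 4).
                actions.add("recreate " + p);
            } else if (rollForward) {
                // Crash between stages 2 and 3: finish the commit.
                remaining.remove(u);
                actions.add("commit " + u);
                actions.add("recreate " + p);
            } else {
                // Crash between stages 2 and 3: undo and drop the record.
                remaining.remove(u);
                actions.add("rollback " + u);
                actions.add("reject record " + p);
            }
        }

        // After the first pass: leftovers have no recovery record, so the
        // crash happened between stages 1 and 2. Roll them back.
        for (String u : remaining) {
            actions.add("rollback " + u);
        }
        return actions;
    }
}
```

For example, with uncommitted change sets u1 and u3 and records p1 -> u1 and p2 -> u2, the roll-forward variant commits u1, recreates p1 and p2, and rolls back u3.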
Once this is fixed the demo should be updated to ensure that it uses this API.
14 years, 3 months