[jbossts-issues] [JBoss JIRA] Commented: (JBTM-786) XTS participant API should allow participant to complete in 2 phases

Fri Sep 17 04:11:29 EDT 2010

    [ https://jira.jboss.org/browse/JBTM-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12551474#action_12551474 ] 

Andrew Dinn commented on JBTM-786:
----------------------------------

The WSAT and WSBA application recovery module API also needs changing to include an endScan call, made once a scan has completed. During the first scan the application recovery module should use the information in the recovered participants to identify prepared but uncommitted changes to local state and either retain the local prepared state (AT) or commit the local prepared state (BA). An endScan notification allows the application recovery module to roll back any remaining local prepared state. Without this notification it cannot release that state and so cannot free any associated locks.
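
As a rough illustration of the scan behaviour described above, here is a minimal, self-contained Java sketch. It is not the XTS recovery module API: the class RecoveryScanSketch, its methods participantRecovered and endScan, and the Protocol and State enums are all hypothetical names chosen for the example, and local state is modelled as a simple in-memory map.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these names are taken from the XTS API.
class RecoveryScanSketch {
    enum Protocol { AT, BA }
    enum State { PREPARED, COMMITTED, ROLLED_BACK }

    // Local application state found at restart, keyed by change set id.
    private final Map<String, State> localState = new LinkedHashMap<>();
    // Change sets that some recovered participant laid claim to during the scan.
    private final Set<String> claimed = new HashSet<>();

    void addPrepared(String changeSetId) {
        localState.put(changeSetId, State.PREPARED);
    }

    // Called once per recovered participant during the first scan.
    void participantRecovered(Protocol protocol, String changeSetId) {
        if (localState.get(changeSetId) != State.PREPARED) {
            return; // nothing prepared locally for this participant
        }
        claimed.add(changeSetId);
        if (protocol == Protocol.BA) {
            // BA: commit the local prepared state.
            localState.put(changeSetId, State.COMMITTED);
        }
        // AT: retain the local prepared state until the coordinator decides.
    }

    // Called when the scan has completed: anything still prepared and
    // unclaimed has no recovered participant, so roll it back.
    void endScan() {
        for (Map.Entry<String, State> entry : localState.entrySet()) {
            if (entry.getValue() == State.PREPARED && !claimed.contains(entry.getKey())) {
                entry.setValue(State.ROLLED_BACK);
            }
        }
    }

    State stateOf(String changeSetId) {
        return localState.get(changeSetId);
    }

    public static void main(String[] args) {
        RecoveryScanSketch module = new RecoveryScanSketch();
        module.addPrepared("at-1");
        module.addPrepared("ba-1");
        module.addPrepared("orphan");
        module.participantRecovered(Protocol.AT, "at-1");
        module.participantRecovered(Protocol.BA, "ba-1");
        module.endScan();
        for (String id : new String[] { "at-1", "ba-1", "orphan" }) {
            System.out.println(id + " -> " + module.stateOf(id));
        }
    }
}
```

In this sketch the rollback of the unclaimed change set in endScan stands in for releasing its state and any locks held on its behalf.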

> XTS participant API should allow participant to complete in 2 phases
> --------------------------------------------------------------------
>
>                 Key: JBTM-786
>                 URL: https://jira.jboss.org/browse/JBTM-786
>             Project: JBoss Transaction Manager
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>          Components: XTS
>    Affects Versions: 4.12.0
>            Reporter: Andrew Dinn
>            Assignee: Andrew Dinn
>
> Currently, when an XTS participant completes it is expected 1) to persist all its changes and then allow the XTS participant management code 2) to log a recovery record and 3) to notify the coordinator that the participant completed. The exact path taken varies depending upon the participant type but the sequence is always the same.
> So, a participant completion participant is expected to persist its changes and call the BAManager completed method. This initiates creation and logging of the recovery record and dispatch of a COMPLETED message to the coordinator.
> A coordinator completion participant is not expected to persist its changes until its complete method is called. It is after this call returns that the XTS participant management code creates and logs the recovery record and dispatches a COMPLETED message.
> This leaves a window open between steps 1 and 2 where a crash may occur, leaving persistent changes committed with no information available describing how to compensate them. In order to close this window a two-phase protocol must be used when saving the changes:
> 1) participant prepares changes to be persisted
> 2) XTS participant manager logs recovery record
> 3) participant commits changes
> 4) XTS participant manager dispatches COMPLETED message
> This ensures that changes are only actually committed to persistent storage when a recovery record is in place in the log, a precondition for the commit to be safe. It also ensures that the coordinator cannot be told the participant has completed unless the changes truly have been persisted.
> However, this is not the full story. Clearly, this only works if the (application-specific) participant recovery module takes steps during recovery to resolve crashes between stages 1 and 2 or stages 2 and 3. Note that a crash between stages 3 and 4 is already handled by the existing recovery code.
> So, the extra steps required are as follows:
> The participant recovery module must be able to detect uncommitted change sets at recovery time.
> The recovery record for a participant must include information allowing the associated uncommitted change set to be identified.
> At the first participant recovery pass, when presented with a recovery record for participant p with change set u:
>  - if u no longer exists then it has been committed, so simply recreate p, allowing COMPLETED to be resent
>  - if u still exists then a crash occurred between stages 2 and 3, so either commit the changes and recreate p, or roll back the changes and reject the recovery record (causing it to be garbage collected).
> In the former case this is safe because the changes have been completed as expected. In the latter case this is safe because the coordinator will not have seen a COMPLETED message so any attempt to close the activity will fail (if this is a coordinator completion participant and the coordinator resends COMPLETE it will get an unknown participant fault). A cancel request will proceed without error because of presumed abort.
> After the first participant recovery pass has completed, any change set u' which has not been rolled forward must be present because of a crash between stages 1 and 2. This situation can be handled by rolling back the changes. Once again this is safe because the coordinator will not have seen a COMPLETED message.
> Once this is fixed the demo should be updated to ensure that it uses this API.
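
The four-stage completion sequence in the issue description, together with its recovery rules, can be sketched as a small simulation. This is only a model under stated assumptions and not XTS code: the class TwoPhaseCompletionSketch and its complete and recover methods are hypothetical, durable storage and the recovery log are modelled as in-memory sets, and a crash is simulated by returning early after a given stage.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation only: none of these names are taken from the XTS API.
class TwoPhaseCompletionSketch {
    // State that is assumed to survive a crash.
    final Set<String> prepared = new HashSet<>();     // prepared, uncommitted change sets
    final Set<String> committed = new HashSet<>();    // committed change sets
    final Set<String> recoveryLog = new HashSet<>();  // logged recovery records
    // COMPLETED messages the coordinator has been sent.
    final Set<String> completedSent = new HashSet<>();

    // Runs stages 1 to 4; crashAfterStage 0 means no crash.
    void complete(String id, int crashAfterStage) {
        prepared.add(id);                 // 1) participant prepares changes
        if (crashAfterStage == 1) return;
        recoveryLog.add(id);              // 2) manager logs recovery record
        if (crashAfterStage == 2) return;
        prepared.remove(id);              // 3) participant commits changes
        committed.add(id);
        if (crashAfterStage == 3) return;
        completedSent.add(id);            // 4) manager dispatches COMPLETED
    }

    // rollForward selects between the two safe options for a crash
    // between stages 2 and 3.
    void recover(boolean rollForward) {
        // First pass: one decision per logged recovery record.
        for (String id : new HashSet<>(recoveryLog)) {
            if (!prepared.contains(id)) {
                // Change set already committed: recreate p, resend COMPLETED.
                completedSent.add(id);
            } else if (rollForward) {
                // Crash between stages 2 and 3: commit and recreate p.
                prepared.remove(id);
                committed.add(id);
                completedSent.add(id);
            } else {
                // Crash between stages 2 and 3: roll back, reject the record.
                prepared.remove(id);
                recoveryLog.remove(id);
            }
        }
        // After the first pass: whatever is still prepared has no recovery
        // record (crash between stages 1 and 2), so roll it back.
        prepared.clear();
    }

    public static void main(String[] args) {
        for (int stage = 1; stage <= 3; stage++) {
            TwoPhaseCompletionSketch sketch = new TwoPhaseCompletionSketch();
            sketch.complete("u", stage);
            sketch.recover(true);
            System.out.println("crash after stage " + stage
                    + ": committed=" + sketch.committed
                    + " completedSent=" + sketch.completedSent);
        }
    }
}
```

The property the sketch is meant to show is that, whichever stage the crash follows, recovery ends with COMPLETED sent only for change sets that are committed and have a recovery record, and with no prepared change set left behind.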
