[jbossts-issues] [JBoss JIRA] (JBTM-802) XTS BA coordinator drops completed as per spec but causes long delay for web service compensation
Andrew Dinn (Commented) (JIRA)
jira-events at lists.jboss.org
Fri Oct 21 12:28:45 EDT 2011
[ https://issues.jboss.org/browse/JBTM-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636540#comment-12636540 ]
Andrew Dinn commented on JBTM-802:
No, strictly it is not a bug as we are doing what is expected in the spec and it is functional. The problem raised here is a performance issue.
When the coordinator goes down before closing or cancelling a BA there may still be completed participants running. Those participants keep on sending completed notifications with an increasing timeout until at some point they decide to send a getStatus message instead. After it reboots the coordinator no longer knows about the BA, but the spec only allows it to reject the getStatus request, not the completed message, so the web service can spend a long time waiting before it detects that it is game over for the BA (with our default timeout it takes about 600 seconds before the web service gets fed up and sends a getStatus). If we break the spec and send invalidTX in response to completed then this will make the TX exit quicker.
Having thought about this a little more I am not sure this is necessarily the best way to achieve this. The web service is always allowed to send getStatus whenever it wants. So, we could consider changing the behaviour on the web service side to make it detect the failure condition earlier without breaking the spec, e.g. we could gradually increase the timeout while resending, alternately, first a completed message and then a getStatus message.
This is maybe a better solution but it needs some careful adjusting. Currently, a successful response to getStatus resets the resend timeout to the minimum. But the whole point of gradually increasing the timeout is to avoid flooding the coordinator with 'and me' messages when it is waiting for another participant to complete before closing. This needs thinking through.
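To make the alternating idea concrete, here is a minimal sketch of what such a resend schedule might look like. It is not XTS code: the class, enum and method names and the timeout values are all hypothetical, and it assumes one possible answer to the open question above, namely that the delay keeps growing after every message rather than being reset by a successful getStatus response.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names exist in XTS.
class AlternatingResendSketch {
    enum Message { COMPLETED, GET_STATUS }

    // One planned resend: which message to send and how long to wait first.
    static final class Step {
        final Message message;
        final long delayMillis;

        Step(Message message, long delayMillis) {
            this.message = message;
            this.delayMillis = delayMillis;
        }
    }

    // Builds a plan that alternates completed and getStatus while doubling
    // the delay after every message, capped at maxMillis. Assumption: the
    // delay is never reset, so a coordinator that is still waiting on other
    // participants is not flooded with messages.
    static List<Step> schedule(long initialMillis, long maxMillis, int steps) {
        List<Step> plan = new ArrayList<>();
        long delay = initialMillis;
        for (int i = 0; i < steps; i++) {
            Message message = (i % 2 == 0) ? Message.COMPLETED : Message.GET_STATUS;
            plan.add(new Step(message, delay));
            delay = Math.min(delay * 2, maxMillis);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Illustrative values, not the XTS defaults.
        for (Step step : schedule(1000, 8000, 6)) {
            System.out.println(step.message + " after " + step.delayMillis + " ms");
        }
    }
}
```

With a schedule like this the first getStatus goes out after the second timeout instead of at the end of the whole completed resend period, so a rebooted coordinator gets an early chance to reject it. Whether the delay should instead be reset on a successful getStatus response is exactly the part that still needs thinking through.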
Whatever we implement we ought to implement it in trunk as well as in the EAP5 branch. The fix was initially slated for the EAP5 branch because the problem arose with an EAP5 customer, but that was not meant to indicate that we don't want a fix in trunk.
> XTS BA coordinator drops completed as per spec but causes long delay for web service compensation
> Key: JBTM-802
> URL: https://issues.jboss.org/browse/JBTM-802
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: XTS
> Affects Versions: 4.13.0, 4.6.1.CP08
> Reporter: Andrew Dinn
> Assignee: Paul Robinson
> Fix For: 4.6.1.CP13, 4.15.x, 5.0.0.Final
> The WSBA spec requires a coordinator to drop completed requests from a web service belonging to an unknown transaction. The idea is that the web service eventually gets bored and sends a getstatus message to see if the transaction is still valid, at which point the coordinator dispatches an invalid state fault, initiating compensation at the web service end. The problem is that timing out the resending of completed and switching to sending getstatus is a tricky business. Done too early it can cause ping-ponging back and forth from completed to getstatus. Too late and it means it takes a long time before compensation is done.
> It is not clear from the spec that the coordinator cannot also legitimately send an invalid state fault when it sees completed from an unknown participant. If it can, then doing this would significantly speed up recovery. This needs investigating to check that it will not cause interop problems and, if it is permissible, should be implemented.