[
https://issues.jboss.org/browse/JBTM-802?page=com.atlassian.jira.plugin.s...
]
Andrew Dinn commented on JBTM-802:
----------------------------------
No, strictly it is not a bug as we are doing what is expected in the spec and it is
functional. The problem raised here is a performance issue.
When the coordinator goes down before closing or cancelling a BA there may still be
completed participants running. Those participants keep resending completed notifications
with an increasing timeout until at some point they decide to send a getStatus message
instead. After it reboots the coordinator no longer knows about the BA, but the spec only
allows it to reject the getStatus request, not the completed notifications. So the web
service can spend a long time waiting before it detects that it is game over for the BA
(with our default timeout it takes about 600 seconds before the web service gets fed up
and sends a getStatus()). If we break the spec and send invalidTX in response to
completed then this will make the TX exit quicker.
Having thought about this a little more I am not sure this is necessarily the best way to
achieve this. The web service is always allowed to send getStatus whenever it wants. So,
we could consider changing the behaviour on the web service side to make it detect the
failure condition earlier without breaking the spec, e.g. we could gradually increase the
timeout while alternating the resends, first a completed message and then a getStatus
message.
This is maybe a better solution but it needs some careful adjusting. Currently, a
successful response to getStatus resets the resend timeout to the minimum. But the whole
point of gradually increasing the timeout is to avoid flooding the coordinator with
'and me' messages when it is waiting for another participant to complete before
closing. This needs thinking through.
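
As a rough illustration of the alternating resend idea, here is a minimal sketch. It is not the XTS implementation: the class AlternatingResendSchedule, its Message and Step types and the delay values are all hypothetical names chosen for the example. It assumes the delay doubles after each completed/getStatus pair, up to a cap, and that a successful getStatus reply does not reset the delay.

```java
/** Hypothetical sketch only; none of these names exist in the XTS code base. */
final class AlternatingResendSchedule {
    enum Message { COMPLETED, GET_STATUS }

    /** One planned resend: which message to send and how long to wait before sending it. */
    static final class Step {
        final Message message;
        final long delayMillis;

        Step(Message message, long delayMillis) {
            this.message = message;
            this.delayMillis = delayMillis;
        }
    }

    private final long maxDelayMillis;
    private long currentDelayMillis;
    private boolean sendCompletedNext = true;

    AlternatingResendSchedule(long initialDelayMillis, long maxDelayMillis) {
        if (initialDelayMillis <= 0 || maxDelayMillis < initialDelayMillis) {
            throw new IllegalArgumentException("need 0 < initial <= max");
        }
        this.currentDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    /**
     * Returns the next resend step. Messages alternate completed, getStatus,
     * completed, ... and the delay doubles (up to the cap) once each
     * completed/getStatus pair has been handed out.
     */
    Step next() {
        Message message = sendCompletedNext ? Message.COMPLETED : Message.GET_STATUS;
        Step step = new Step(message, currentDelayMillis);
        if (!sendCompletedNext) {
            // A full pair has been issued, so back off before the next pair.
            currentDelayMillis = Math.min(currentDelayMillis * 2, maxDelayMillis);
        }
        sendCompletedNext = !sendCompletedNext;
        return step;
    }
}
```

With an initial delay of 1000 ms and a cap of 4000 ms this yields completed and getStatus at 1000 ms, then both at 2000 ms, then both at 4000 ms from there on. Because the delay is never reset on a successful getStatus reply, the sketch keeps backing off while the coordinator is still alive, which is one possible answer to the flooding concern rather than a settled design.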
Whatever we implement we ought to implement it in trunk as well as in the EAP5 branch. The
reason the fix was initially slated for the EAP5 branch was that the problem arose with an
EAP5 customer, but that was not meant to indicate that we don't want a fix in trunk.
XTS BA coordinator drops completed as per spec but causes long delay
for web service compensation
-------------------------------------------------------------------------------------------------
Key: JBTM-802
URL: https://issues.jboss.org/browse/JBTM-802
Project: JBoss Transaction Manager
Issue Type: Bug
Security Level: Public(Everyone can see)
Components: XTS
Affects Versions: 4.13.0, 4.6.1.CP08
Reporter: Andrew Dinn
Assignee: Paul Robinson
Fix For: 4.6.1.CP13, 4.15.x, 5.0.0.Final
The WSBA spec requires a coordinator to drop completed requests from a web service
belonging to an unknown transaction. The idea is that the web service eventually gets
bored and sends a getStatus message to see if the transaction is still valid, at which
point the coordinator dispatches an invalid state fault, initiating compensation at the
web service end. The problem is that timing out the resending of completed and switching
to sending getStatus is a tricky business. Done too early it can cause ping-ponging back
and forth from completed to getStatus. Too late and it means it takes a long time before
compensation is done.
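
The coordinator behaviour described above can be summarised in a small sketch. This is a simplification of the description in this issue, not the spec text or the XTS code: the class UnknownTransactionPolicy and its enums are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in the XTS code base. */
final class UnknownTransactionPolicy {
    enum Incoming { COMPLETED, GET_STATUS }
    enum Reply { PROCESS, DROP, INVALID_STATE_FAULT }

    // Transactions the coordinator currently knows about. After a reboot
    // without recovery state this set would be empty.
    private final Set<String> knownTransactions = new HashSet<>();

    void register(String txId) {
        knownTransactions.add(txId);
    }

    /**
     * Decides how to treat a participant message. For an unknown transaction,
     * completed is silently dropped and only getStatus gets an invalid state fault.
     */
    Reply handle(String txId, Incoming message) {
        if (knownTransactions.contains(txId)) {
            return Reply.PROCESS;
        }
        return message == Incoming.COMPLETED ? Reply.DROP : Reply.INVALID_STATE_FAULT;
    }
}
```

In this model a participant of a forgotten BA learns nothing from its completed resends, so the time to compensation is governed entirely by when it first sends getStatus.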
It is not clear from the spec that the coordinator cannot also legitimately send an
invalid state fault when it sees completed from an unknown participant. If it can, then
doing this would significantly speed up recovery. This needs investigating to check that
it will not cause interop problems and, if it is permissible, should be implemented.