Make WS-AT and WS-BA participants gradually increment the resend period when retrying
prepared/completed messages
-----------------------------------------------------------------------------------------------------------------
Key: JBTM-487
URL: https://jira.jboss.org/jira/browse/JBTM-487
Project: JBoss Transaction Manager
Issue Type: Feature Request
Security Level: Public (Everyone can see)
Components: WS-T Implementation
Affects Versions: 4.5
Reporter: Andrew Dinn
Assignee: Andrew Dinn
Fix For: 4.6
The current WS-AT/BA participant implementations resend prepared/completed messages at a
fixed frequency until they receive a response. The period is currently defined by a
settable property of class TransportTimer (which defaults to 5 seconds). It would be
better if the period between resends could be configured to increase gradually up to some
maximum period (obviously setting the maximum period equal to the initial period maintains
the status quo). The benefit of a higher period is that it avoids resends using up the
available network bandwidth when a response from the coordinator is slow. It is
particularly beneficial in the case where a web service employs BA ParticipantCompletion
participants since there may be a very long delay between the first completed message
being sent and a subsequent close or cancel operation being dispatched by the coordinator.
If the service is likely to support many long-running transactions then configuring a high
maximum resend period will limit the extent to which resent messages clog the network.
The downside of increasing the resend period is that a higher value means a higher latency
before participant (bottom-up) recovery is initiated following a coordinator crash.
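
As an illustration of the behaviour requested above, here is a minimal sketch of a resend
schedule that grows from an initial period up to a maximum. The class name ResendPeriod,
its methods, and the choice of doubling as the growth rule are assumptions made for this
example; the issue only asks that the period "increase gradually", and none of this is
taken from the XTS code base.

```java
// Hypothetical helper, not part of XTS: computes successive resend periods.
final class ResendPeriod {
    private final long initialMillis;
    private final long maxMillis;
    private long currentMillis;

    ResendPeriod(long initialMillis, long maxMillis) {
        if (initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException(
                "require 0 < initialMillis <= maxMillis");
        }
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.currentMillis = initialMillis;
    }

    /** Returns the period to wait before the next resend, then grows it. */
    long next() {
        long period = currentMillis;
        // Assumed growth rule: double the period, capped at the maximum.
        // Comparing against maxMillis / 2 avoids overflow when doubling.
        currentMillis = (currentMillis > maxMillis / 2) ? maxMillis
                                                        : currentMillis * 2;
        return period;
    }

    /** Restarts the schedule, e.g. once a response has been received. */
    void reset() {
        currentMillis = initialMillis;
    }
}
```

With an initial period of 5000 ms and a maximum of 40000 ms, successive calls to next()
return 5000, 10000, 20000, 40000, 40000, and so on. Constructing it with the maximum equal
to the initial period returns the same value every time, which is the status quo case
mentioned above.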
It would also be useful if the initial and maximum resend period could be configured via
bean properties associated with the XTS Service bean.
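
For illustration only, the two settings could be exposed as ordinary JavaBean properties
along the following lines. The class name ResendConfigBean and the property names are
hypothetical and do not reflect the actual XTS Service bean; the 5 second default mirrors
the current TransportTimer default mentioned above.

```java
// Hypothetical configuration bean, not the real XTS Service bean.
final class ResendConfigBean {
    // Both default to 5 seconds, so an unconfigured bean keeps a fixed period.
    private long initialResendPeriod = 5000L;
    private long maxResendPeriod = 5000L;

    public long getInitialResendPeriod() {
        return initialResendPeriod;
    }

    public void setInitialResendPeriod(long millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        this.initialResendPeriod = millis;
    }

    public long getMaxResendPeriod() {
        return maxResendPeriod;
    }

    public void setMaxResendPeriod(long millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        this.maxResendPeriod = millis;
    }

    /** The maximum actually applied: never lower than the initial period. */
    public long getEffectiveMaxResendPeriod() {
        return Math.max(initialResendPeriod, maxResendPeriod);
    }
}
```

Treating a maximum below the initial period as equal to the initial period is a design
choice made for this sketch so that setter order does not matter; rejecting such a
configuration outright would be equally reasonable.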
Note that it only makes sense to implement this feature for retries dispatched from the
participant side. Retries only occur on the coordinator side while the coordinator is
waiting for a specific response from the participant, and in these cases the wait will
always time out and cancel further retries (using the timeout interval defined by
TransportTimer -- default 30 seconds).
This change is a preliminary to a related change required to successfully recover BA
participants. In order to detect coordinator crashes which occur between complete and
close/cancel, these participants need to switch from sending Completed messages to sending
GetStatus messages until they get a response or an invalid transaction/participant SOAP
fault. The switchover algorithm needs to be defined so that it kicks in compatibly with
this incremental resend.