Paul Robinson created AS7-5700:
----------------------------------
Summary: WSBAParticipantCompletionTestCase XTS integration test fails intermittently
Key: AS7-5700
URL: https://issues.jboss.org/browse/AS7-5700
Project: Application Server 7
Issue Type: Bug
Components: XTS
Reporter: Paul Robinson
Assignee: Paul Robinson
Priority: Minor
Fix For: 7.2.0.Alpha1, 7.1.4.Final (EAP)
When a WSBA participant employs the ParticipantCompletion protocol, it is the
responsibility of the participant to notify the coordinator when it has completed its
work. This notification is asynchronous.
In the 'WSBAParticipantCompletionTestCase' test, the client invokes the
participant's web service, which notifies completion just before returning from the
invocation. The client then sends a message to the coordinator requesting to close
(complete) the activity.
As the "complete" message is asynchronous, we now have a race. If the
client's close message is processed by the coordinator before the participant's
"complete" message, then the coordinator cancels the BA because not all participants
have completed. This results in the client receiving a TransactionRolledBackException, and
the completed participant is (eventually) compensated. The outcome is atomic, but a BA
that would otherwise have succeeded is unsuccessful.
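
The two orderings can be illustrated with a toy model. This is only a sketch: ToyCoordinator and its methods are hypothetical names invented for illustration and are not part of the XTS API. It assumes the coordinator cancels on close whenever an enlisted participant has not yet reported completion, as described above.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the coordinator's decision (hypothetical, not the XTS API). */
final class ToyCoordinator {
    private final Set<String> enlisted = new HashSet<>();
    private final Set<String> completed = new HashSet<>();

    /** Registers a participant in the business activity. */
    synchronized void enlist(String participantId) {
        enlisted.add(participantId);
    }

    /** Models the arrival of a participant's asynchronous "complete" message. */
    synchronized void onCompleted(String participantId) {
        completed.add(participantId);
    }

    /**
     * Models the client's close request. The activity closes only if every
     * enlisted participant has already reported completion; otherwise it is
     * cancelled.
     */
    synchronized String close() {
        return completed.containsAll(enlisted) ? "CLOSED" : "CANCELLED";
    }
}
```

If onCompleted("p1") is processed before close(), the result is "CLOSED". If close() is processed first, the result is "CANCELLED", even though the participant did finish its work.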
In reality we only expect this to happen in the rather artificial scenario where
all parties (client, coordinator and participants) are on the same server. It also only
seems to happen on very slow machines. Therefore, it's fine to fix the test to prevent
the race from arising, rather than to somehow change the protocol (without breaking
the WS-BA standard) to prevent it.
I can see two options for fixing the test:
1) Byteman Rendezvous. Here we would introduce a dependency on Byteman and write a script
that delays the client's close message until all participants' 'complete'
messages have been acknowledged by the coordinator. This is probably an over-engineered
solution, as we would be introducing Byteman to these tests for this single case.
2) We add a Thread.sleep(10000) to the test, just before the client sends the
'close' message to the coordinator. This is what we did to the XTS tests in the
JBossTS project as a stop-gap until we decided how to do it "properly".
I suggest we go with 2) as it is the simplest solution. The extra time added to the test
run is just 20s (10s for each of the two affected tests). In the future, when we have more
tests, we should reduce this sleep period or consider using another solution (such as
Byteman), in order to keep the test duration acceptable.
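
A rough sketch of what option 2) amounts to is shown here. It is not the actual test code: SleepBeforeCloseSketch, notifyCompletedAsync and closeAfterPause are hypothetical names, and a single flag stands in for the coordinator's view of one participant. The pause is a parameter so the sketch runs quickly; the test itself would use 10000 ms.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrates pausing before close (hypothetical names, not the XTS API). */
final class SleepBeforeCloseSketch {
    /** Stand-in for the coordinator's record that the participant completed. */
    private final AtomicBoolean participantCompleted = new AtomicBoolean(false);

    /** Participant side: the "complete" message lands after the given latency. */
    CompletableFuture<Void> notifyCompletedAsync(long messageLatencyMillis) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(messageLatencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // message never delivered
            }
            participantCompleted.set(true);
        });
    }

    /** Client side: pause first, then ask the coordinator to close. */
    String closeAfterPause(long pauseMillis) {
        try {
            Thread.sleep(pauseMillis); // the ticket proposes 10000 ms here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted before close", e);
        }
        return participantCompleted.get() ? "CLOSED" : "CANCELLED";
    }
}
```

The pause only makes the bad ordering unlikely; it does not rule it out. If the "complete" message takes longer than the pause, the activity is still cancelled, which is why a rendezvous remains the more robust long-term option.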