[
https://issues.jboss.org/browse/JBTM-949?page=com.atlassian.jira.plugin.s...
]
Paul Robinson commented on JBTM-949:
------------------------------------
We have now included the XTS recovery tests in CI. This job automates the process of
running the tests and generating the trace logs for inspection. Since then we've had
many failures from these tests. Two bugs in the code were found and a few bugs in the
tests. All of these failures were due to timing issues that do not seem to occur on a fast
machine, which is why they were not spotted until now: the recovery tests have
always been run on powerful machines.
Whilst fixing these issues, I've had the chance to understand much better how these
tests work, and we've come to the conclusion that the automated part of the test
actually tests everything that really matters. The Byteman script is used to crash the
system when it reaches a particular state. If this state is not met, then the test fails.
The system is then recovered and the tests check that the right outcome is sent to all
participants and that the TX log is tidied up. Again, the test fails if this was not the
case. The additional benefit of checking the trace logs is that we can ensure the correct
path was taken throughout the test. However, this process is very time-consuming and, in
my experience, has never shown up any bugs, other than cosmetic issues with the Byteman
scripts (usually a mismatch between the logged messages and the expected log messages). These
logs are very useful when tracking down the cause of a failure, so we should still keep
them, but not verify them by eye during each release.
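To make the pass/fail conditions described above concrete, here is a minimal sketch in Python. It is not the actual XTS test code: the RecoveryRun record, its field names and the check_run function are all hypothetical, and simply model the three checks mentioned (crash state reached, correct outcome sent to every participant, TX log tidied up).

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryRun:
    """Hypothetical summary of one crash recovery test run (not a real XTS type)."""
    crash_point_reached: bool   # did the Byteman rule fire and crash the server?
    expected_outcome: str       # outcome every participant should receive
    participant_outcomes: dict = field(default_factory=dict)   # participant id -> outcome received
    remaining_log_entries: list = field(default_factory=list)  # TX log entries left after recovery


def check_run(run):
    """Return a list of failure reasons; an empty list means the run passed."""
    failures = []
    if not run.crash_point_reached:
        failures.append("crash state was never reached")
    for pid, outcome in sorted(run.participant_outcomes.items()):
        if outcome != run.expected_outcome:
            failures.append(
                f"participant {pid} received {outcome}, expected {run.expected_outcome}"
            )
    if run.remaining_log_entries:
        failures.append(f"{len(run.remaining_log_entries)} TX log entries not tidied up")
    return failures
```

None of these checks needs the trace logs, which is the point being made: the logs are only needed afterwards, to diagnose a run for which check_run reports a failure.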
Automate the verification of trace output from the XTS crash recovery
tests
---------------------------------------------------------------------------
Key: JBTM-949
URL:
https://issues.jboss.org/browse/JBTM-949
Project: JBoss Transaction Manager
Issue Type: Enhancement
Security Level: Public(Everyone can see)
Components: XTS
Reporter: Paul Robinson
Assignee: Paul Robinson
Fix For: 5.0.0.M2
Currently it is very difficult to verify the trace output from the XTS crash recovery
tests. With the current code it is infeasible to run multiple servers, as the trace output
would span many files, making it difficult to establish the order in which events
occurred.
I think the following changes will make the test verification automatic and the tests
scalable to many participants:
# Carry out assertions in Byteman as the test progresses. Assertions at runtime should be
more flexible as more info is available.
# Each participant is concerned only with the correctness of their own participation.
This is key to scalability to many participants.
# Anything that can't be solved by the above is dumped to one trace file per server
and is hopefully simple enough for scriptable post-verification.
I think my idea needs prototyping first to check that it is feasible in practice.
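As a possible starting point for such a prototype, the scriptable post-verification could look something like the Python sketch below. Everything in it is an assumption: the trace line format ("<participant-id> <event>", one event per line), the function names and the event names are invented for illustration and do not reflect the real XTS trace output. It checks each participant's events in isolation, so the interleaving of different participants in the file does not matter.

```python
from collections import defaultdict


def events_by_participant(trace_lines):
    """Group events by participant, preserving each participant's own order.

    Assumes the hypothetical format "<participant-id> <event>" on every
    non-blank line; blank lines are skipped.
    """
    grouped = defaultdict(list)
    for line in trace_lines:
        line = line.strip()
        if not line:
            continue
        participant, event = line.split(maxsplit=1)
        grouped[participant].append(event)
    return dict(grouped)


def verify_trace(trace_lines, expected_events):
    """Return the sorted ids of participants whose events differ from expected_events."""
    grouped = events_by_participant(trace_lines)
    return sorted(p for p, events in grouped.items() if events != expected_events)
```

For example, verify_trace(open(path), ["prepare", "commit"]) would be run once per server trace file, and an empty result would mean every participant in that file followed the expected path.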