[
https://issues.jboss.org/browse/JBTM-1522?page=com.atlassian.jira.plugin....
]
Amos Feng commented on JBTM-1522:
---------------------------------
yeah, I review the codes of xts subsystem.
{code}
// add an XTS service which depends on all the WS endpoints
final XTSManagerService xtsService = new XTSManagerService(coordinatorURL,
isDefaultContextPropagation);
// this service needs to depend on the transaction recovery service
// because it can only initialise XTS recovery once the transaction recovery
// service has initialised the orb layer
ServiceBuilder<?> xtsServiceBuilder =
target.addService(XTSServices.JBOSS_XTS_MAIN, xtsService);
// add dependencies
xtsServiceBuilder.addDependency(...);
// install
xtsServiceBuilder
.setInitialMode(Mode.ACTIVE)
.install();
{code}
XTSManagerService.start() will call XTSService which calls XTSInitialisation.startup() and
install the recovery modules.
these are running in the MSC service thread and the xtstest.war deploy in the
ServerService Thread Pool. see the following logs
{code}
17:44:23,546 INFO
[org.jboss.jbossts.xts.recovery.participant.at.XTSATRecoveryManagerImple] (MSC service
thread 1-3) enter registerRecoveryModule
17:44:23,547 INFO
[org.jboss.jbossts.xts.recovery.participant.at.XTSATRecoveryManagerImple] (MSC service
thread 1-3) adding ATRecovery module
org.jboss.jbossts.xts.recovery.participant.at.XTSATSubordinateRecoveryModule@3a31e008
17:44:23,548 INFO
[org.jboss.jbossts.xts.recovery.participant.at.XTSATRecoveryManagerImple] (MSC service
thread 1-3) added ATRecovery module
org.jboss.jbossts.xts.recovery.participant.at.XTSATSubordinateRecoveryModule@3a31e008
17:44:23,550 INFO
[org.jboss.jbossts.xts.recovery.participant.ba.XTSBARecoveryManagerImple] (MSC service
thread 1-3) enter registerRecoveryModule
17:44:23,551 INFO
[org.jboss.jbossts.xts.recovery.participant.ba.XTSBARecoveryManagerImple] (MSC service
thread 1-3) adding BARecovery module
org.jboss.jbossts.xts.recovery.participant.ba.XTSBASubordinateRecoveryModule@a6f371a
17:44:23,551 INFO
[org.jboss.jbossts.xts.recovery.participant.ba.XTSBARecoveryManagerImple] (MSC service
thread 1-3) added BARecovery module
org.jboss.jbossts.xts.recovery.participant.ba.XTSBASubordinateRecoveryModule@a6f371a
...
17:44:23,919 INFO [org.jboss.jbossts.xts.servicetests.bean.XTSServiceTestRunnerBean]
(ServerService Thread Pool -- 65) Starting XTSServiceTestRunner
17:44:23,919 INFO
[org.jboss.jbossts.xts.servicetests.service.recovery.TestATRecoveryModule] (ServerService
Thread Pool -- 65) registering TestATRecoveryModule
17:44:23,919 INFO
[org.jboss.jbossts.xts.recovery.participant.at.XTSATRecoveryManagerImple] (ServerService
Thread Pool -- 65) enter registerRecoveryModule
17:44:23,920 INFO
[org.jboss.jbossts.xts.recovery.participant.at.XTSATRecoveryManagerImple] (ServerService
Thread Pool -- 65) adding ATRecovery module
org.jboss.jbossts.xts.servicetests.service.recovery.TestATRecoveryModule@39dbb3f
17:44:23,920 INFO
[org.jboss.jbossts.xts.recovery.participant.at.XTSATRecoveryManagerImple] (ServerService
Thread Pool -- 65) added ATRecovery module
org.jboss.jbossts.xts.servicetests.service.recovery.TestATRecoveryModule@39dbb3f
17:44:23,920 INFO
[org.jboss.jbossts.xts.servicetests.service.recovery.TestATRecoveryModule] (ServerService
Thread Pool -- 65) registered TestATRecoveryModule
{code}
so the XTSServiceTestRunnerBean.start() might fire before XTSManagerService.start() as
they are running in the different thread.
xtstest.war has already the dependency of "org.jboss.xts" but I think it can not
guard the start() order.
"no XTS application recovery module found" during XTS
Recovery Tests
--------------------------------------------------------------------
Key: JBTM-1522
URL:
https://issues.jboss.org/browse/JBTM-1522
Project: JBoss Transaction Manager
Issue Type: Bug
Security Level: Public(Everyone can see)
Components: Testing, XTS
Reporter: Paul Robinson
Assignee: Amos Feng
Priority: Critical
Fix For: 5.0.0.M3
Attachments: 2013-04-01_jbtm-1522_outputs.zip,
com.arjuna.qa.junit.TestATHeuristicRecoveryAfterDelayedCommit-output.txt,
com.arjuna.qa.junit.TestATHeuristicRecoveryAfterDelayedCommit.txt,
com.arjuna.qa.junit.TestBACrashDuringCommit-2.txt,
com.arjuna.qa.junit.TestBACrashDuringCommit-output-2.txt,
com.arjuna.qa.junit.TestBACrashDuringCommit-output.txt,
com.arjuna.qa.junit.TestBACrashDuringCommit.txt,
com.arjuna.qa.junit.TestBASubordinateCrashDuringCommitAfterSubordinateExit-output.txt,
com.arjuna.qa.junit.TestBASubordinateCrashDuringCommitAfterSubordinateExit.txt
See:
http://172.17.131.2/view/Narayana+BlackTie/job/narayana/211/artifact/XTS/...
Notice the following log is displayed repeatedly until the test gives up waiting for
recovery:
{code}
WARN [com.arjuna.wsrecovery] (Periodic Recovery) ARJUNA046032: no XTS application
recovery module found to help reactivate recovered WS-AT participant
org.jboss.jbossts.xts.servicetests.DurableTestParticipant.0
{code}
This error comes from
org.jboss.jbossts.xts.recovery.participant.at.XTSATRecoveryManagerImple#recoverParticipants().
In particular:
{code}
if (!found) {
// we failed to find a helper to convert a participant record so log
a warning
// but leave it in the table for next time
RecoveryLogger.i18NLogger.warn_participant_at_XTSATRecoveryModule_4(participantRecoveryRecord.getId());
}
{code}
It looks like the code is unable to restore the participant from the log due to
restoreParticipant(XTSATRecoveryModule module) returning false. There is
ParticipantRecoveryRecord in the log as you can see it dumped to the console in the above
log. Maybe there is a problem with that log, or maybe we are missing another log entry?
This problem is intermittent, so it's unlikely that you will see this happen when you
attach a debugger. However, we could attach a debugger to see what happens in the normal
case and also to inspect the log to see if anything is missing in the failing case. But I
have a cunning plan...
h4.Cunning Plan
We need to get a copy of the failing log, before recovery is attempted. We should then be
able to use that log to reproduce the issue on our own machines. Steps to take:
# Update BaseCrashTest to copy the contents of the tx-object-store to a unique folder
location (So we can retrieve it later for a failed run). Make sure you create the folder
structure under target/surefire-reports so that CI archives it off. Do the copy between
controller.kill and controller.start. This way we get the log before the recovery manager
has had chance to tamper with it.
# Update the "narayana-JBTM-1522" job in CI to use your branch, containing the
change above.
# Configure the job to run @hourly until it fails with this problem.
# Take a copy of the tx-object-store from the failing test and then put it in place on
your AS8 build.
# Boot the AS and confirm that the issue is reproduced.
# You can now keep putting the tx-object-store back in place every time you need to
reproduce the issue.
# Attach a debugger to find out what the problem is.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
http://www.atlassian.com/software/jira