]
Amos Feng edited comment on JBTM-3015 at 4/25/18 3:51 AM:
----------------------------------------------------------
Thanks Tom for taking look at the issue. And I think there two issues in the test
1. The TestExecutor.writeToTheDatabase has not invoke the connection.close() and this
could cause the connection leak.
2. As Tom found the mistake with the properties passing and I think it should be fixed
with his PR.
was (Author: zhfeng):
Thanks Tom for taking look at the issue. And I think there two issues in the test
1. The BaseITCase.writeToTheDatabase has not invoke the connection.close() and this could
cause the connection leak.
2. As Tom found the mistake with the properties passing and I think it should be fixed
with his PR.
Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run
forever
----------------------------------------------------------------------------
Key: JBTM-3015
URL:
https://issues.jboss.org/browse/JBTM-3015
Project: JBoss Transaction Manager
Issue Type: Bug
Components: Tomcat
Affects Versions: 5.next
Environment: Oracle JDK 8, Linux, 8 CPU system
Reporter: Michal Karm Babacek
Assignee: Amos Feng
Priority: Critical
Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war,
thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
There is a simple [recovery test in the test
suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...]
that runs 3 times in a row. If you change that number to 4, there is a good change you
will get the TS stuck forever. When I attached debugger and used breakpoints on the
iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed,
so it is clearly a time related race condition.
*EDIT:* When I use 30 iterations and a conditional breakpoint on {{i ->
test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even
with debugger attached.
In this JIRA's example, to reproduce, simply:
{noformat}
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
{noformat}
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever,
this is what keeps showing in the database trace (and nothing else):
{noformat}
...
2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
...
{noformat}
I found this Atomikos related reference
[
online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
Tomcat log keeps showing again and again:
{noformat}
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread Status <== SCANNING
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread scanning
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery
first pass at Tue, 24 Apr 2018 14:41:02
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass
AtomicActionRecoveryModule first pass
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions
processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change IDLE->FIRST_PASS
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass
Local XARecoveryModule - first pass
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass
xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found
0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass
xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found
0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery
second pass at Tue, 24 Apr 2018 14:41:04
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass
AtomicActionRecoveryModule second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass
Local XARecoveryModule - second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass
Local XARecoveryModule.transactionInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass
xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have
0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass
xarecovery second pass of
org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have
0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass
Local XARecoveryModule.resourceInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change SECOND_PASS->IDLE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread Status <== INACTIVE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread backing off
{noformat}
h3. Logs and thread dumps
* [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of
recoveries and then the repeated stuck on the aforementioned message.
* [^test.war] -- test app used; the TS builds it for you though and stores it in
{{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
* [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes
in between snaps.
* [^tomcat_logs.zip] -- Tomcat logs
* [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as
an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0,
3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
* This is what good, i.e. 3 iterations long DB trace looks like: [good db
trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
* This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina
log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
Might interest also [~ochaloup]. WDYT guys? What should I try out?