[
https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin....
]
Michal Karm Babacek commented on JBTM-3015:
-------------------------------------------
[~tomjenkinson], I have it looping in debugger right now and it looks like [this call has
stopped
returning|https://github.com/apache/tomcat/blob/TOMCAT_9_0_7/java/org/apa...]
and it hangs. Pertinent threads with relevant sources:
{quote}
"http-nio-8080-exec-1@2786" daemon prio=5 tid=0x19 nid=NA waiting
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at
org.apache.tomcat.dbcp.pool2.impl.LinkedBlockingDeque.takeFirst([LinkedBl...
at
org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject([Generic...
at
org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject([Generic...
at
org.apache.tomcat.dbcp.dbcp2.managed.ManagedConnection.updateTransactionS...
at
org.apache.tomcat.dbcp.dbcp2.managed.ManagedConnection.<init>([Mana...
at
org.apache.tomcat.dbcp.dbcp2.managed.ManagedDataSource.getConnection([Man...
at
org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.doesEntryExist...
at
org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.didRecoveryHap...
at
org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.getRecoveryTes...
at
org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.verifyRecovery...
<SNIP>
"Transaction Reaper Worker 0@3278" daemon prio=5 tid=0x3b nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.Object.wait(Object.java:502)
at
com.arjuna.ats.arjuna.coordinator.TransactionReaper.waitForCancellations(...
at
com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run([Reaper...
"Periodic Recovery@3254" daemon prio=5 tid=0x36 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doBackoffWait([P...
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal([...
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run([PeriodicRec...
"Transaction Reaper@3277" daemon prio=5 tid=0x3a nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at
com.arjuna.ats.internal.arjuna.coordinator.ReaperThread.run([ReaperThread...
"Transaction Expired Entry Monitor@3256" daemon prio=5 tid=0x37 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at
com.arjuna.ats.internal.arjuna.recovery.ExpiredEntryMonitor.run([ExpiredE...
{quote}
The maximum number of connections is set:
* [here for the DBCP
pool|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/sr...]
* and [here for the database
itself|https://github.com/jbosstm/narayana/blob/master/tools/src/main/jav...]
(a sufficiently bigger number I would say...)
I am definitely fine with editing the test case, but I would like to understand whether
the connection not being returned is really the case - i.e. if it is really an expected
behaviour given the byteman script and the nature of the test.
Is it so...?
Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run
forever
----------------------------------------------------------------------------
Key: JBTM-3015
URL:
https://issues.jboss.org/browse/JBTM-3015
Project: JBoss Transaction Manager
Issue Type: Bug
Components: Tomcat
Affects Versions: 5.next
Environment: Oracle JDK 8, Linux, 8 CPU system
Reporter: Michal Karm Babacek
Assignee: Amos Feng
Priority: Critical
Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war,
thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
There is a simple [recovery test in the test
suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...]
that runs 3 times in a row. If you change that number to 4, there is a good change you
will get the TS stuck forever. When I attached debugger and used breakpoints on the
iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed,
so it is clearly a time related race condition.
*EDIT:* When I use 30 iterations and a conditional breakpoint on {{i ->
test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even
with debugger attached.
In this JIRA's example, to reproduce, simply:
{noformat}
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
{noformat}
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever,
this is what keeps showing in the database trace (and nothing else):
{noformat}
...
2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid
FROM pg_prepared_xacts where database = current_database()
...
{noformat}
I found this Atomikos related reference
[
online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
Tomcat log keeps showing again and again:
{noformat}
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread Status <== SCANNING
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread scanning
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery
first pass at Tue, 24 Apr 2018 14:41:02
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass
AtomicActionRecoveryModule first pass
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions
processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change IDLE->FIRST_PASS
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass
Local XARecoveryModule - first pass
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass
xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found
0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass
xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found
0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery
second pass at Tue, 24 Apr 2018 14:41:04
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass
AtomicActionRecoveryModule second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass
Local XARecoveryModule - second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass
Local XARecoveryModule.transactionInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass
xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have
0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass
xarecovery second pass of
org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have
0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass
Local XARecoveryModule.resourceInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState
XARecoveryModule state change SECOND_PASS->IDLE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread Status <== INACTIVE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery]
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background
thread backing off
{noformat}
h3. Logs and thread dumps
* [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of
recoveries and then the repeated stuck on the aforementioned message.
* [^test.war] -- test app used; the TS builds it for you though and stores it in
{{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
* [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes
in between snaps.
* [^tomcat_logs.zip] -- Tomcat logs
* [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as
an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0,
3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
* This is what good, i.e. 3 iterations long DB trace looks like: [good db
trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
* This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina
log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)