[jbossts-issues] [JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever

Tom Jenkinson (JIRA) issues at jboss.org
Tue Apr 24 11:50:00 EDT 2018


    [ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566302#comment-13566302 ] 

Tom Jenkinson commented on JBTM-3015:
-------------------------------------

Hmm, ok. Well it should definitely get returned to the pool and I saw in the debugger that it was not being returned for this case. I had a breakpoint on borrowObject and one on return and that was called fine for most of the calls but this one would just borrow object. I imagine afterCompletion is where the return would normally be and so the byteman rule just shortcuts that out of the execution.

It seems fishy it hangs trying to borrow an object if the pool size was increased - perhaps that setting is somehow being negated by a different setting?


> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
>                 Key: JBTM-3015
>                 URL: https://issues.jboss.org/browse/JBTM-3015
>             Project: JBoss Transaction Manager
>          Issue Type: Bug
>          Components: Tomcat
>    Affects Versions: 5.next
>         Environment: Oracle JDK 8, Linux, 8 CPU system
>            Reporter: Michal Karm Babacek
>            Assignee: Amos Feng
>            Priority: Critical
>         Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/src/test/java/org/jboss/narayana/tomcat/jta/integration/BaseITCase.java#L141] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> *EDIT:* When I use 30 iterations and a conditional breakpoint on {{i -> test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even with debugger attached.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> -        IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> +        IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection at 2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource at 4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection at 2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource at 4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal  
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat} 
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] --  a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like:  [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup].   WDYT guys? What should I try out?



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)


More information about the jbossts-issues mailing list