[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Tom Jenkinson (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Tom Jenkinson commented on JBTM-3015:
-------------------------------------
Hmm, ok. Well it should definitely get returned to the pool and I saw in the debugger that it was not being returned for this case. I had a breakpoint on borrowObject and one on return and that was called fine for most of the calls but this one would just borrow object. I imagine afterCompletion is where the return would normally be and so the byteman rule just shortcuts that out of the execution.
It seems fishy it hangs trying to borrow an object if the pool size was increased - perhaps that setting is somehow being negated by a different setting?
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> *EDIT:* When I use 30 iterations and a conditional breakpoint on {{i -> test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even with debugger attached.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Michal Karm Babacek commented on JBTM-3015:
-------------------------------------------
[~tomjenkinson], I changed the sizes, but it still gets stuck:
{noformat}
--- a/tomcat/tomcat-jta/src/test/java/org/jboss/narayana/tomcat/jta/integration/AbstractCase.java
+++ b/tomcat/tomcat-jta/src/test/java/org/jboss/narayana/tomcat/jta/integration/AbstractCase.java
@@ -168,9 +168,9 @@ public abstract class AbstractCase {
tsDbSource.setAttribute("factory", "org.jboss.narayana.tomcat.jta.TransactionalDataSourceFactory");
tsDbSource.setAttribute("initialSize", "10");
tsDbSource.setAttribute("maxWaitMillis", "10000");
- tsDbSource.setAttribute("maxTotal", "4");
- tsDbSource.setAttribute("maxIdle", "16");
- tsDbSource.setAttribute("minIdle", "8");
+ tsDbSource.setAttribute("maxTotal", "20");
+ tsDbSource.setAttribute("maxIdle", "20");
+ tsDbSource.setAttribute("minIdle", "10");
tsDbSource.setAttribute("maxAge", "30000");
tsDbSource.setAttribute("testOnBorrow", "true");
tsDbSource.setAttribute("validationQuery", "select 1");
--- a/tomcat/tomcat-jta/src/test/java/org/jboss/narayana/tomcat/jta/integration/BaseITCase.java
+++ b/tomcat/tomcat-jta/src/test/java/org/jboss/narayana/tomcat/jta/integration/BaseITCase.java
@@ -138,7 +138,7 @@ public class BaseITCase extends AbstractCase {
@Test
public void testRecovery() {
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
}
private void test(final String url) {
--- a/tools/src/main/java/io/narayana/db/PostgreContainerAllocator.java
+++ b/tools/src/main/java/io/narayana/db/PostgreContainerAllocator.java
@@ -163,7 +163,7 @@ public class PostgreContainerAllocator extends Allocator {
"-c", "log_line_prefix=%m transaction_id: %x ",
"-c", "log_statement=all",
"-c", "logging_collector=on",
- "-c", "max_connections=20",
+ "-c", "max_connections=40",
"-c", "max_locks_per_transaction=64",
"-c", "max_pred_locks_per_transaction=64",
"-c", "max_prepared_transactions=50"
{noformat}
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> *EDIT:* When I use 30 iterations and a conditional breakpoint on {{i -> test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even with debugger attached.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Michal Karm Babacek commented on JBTM-3015:
-------------------------------------------
[~tomjenkinson], I have it looping in debugger right now and it looks like [this call has stopped returning|https://github.com/apache/tomcat/blob/TOMCAT_9_0_7/java/org/apa...] and it hangs. Pertinent threads with relevant sources:
{quote}
"http-nio-8080-exec-1@2786" daemon prio=5 tid=0x19 nid=NA waiting
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.apache.tomcat.dbcp.pool2.impl.LinkedBlockingDeque.takeFirst([LinkedBl...
at org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject([Generic...
at org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject([Generic...
at org.apache.tomcat.dbcp.dbcp2.managed.ManagedConnection.updateTransactionS...
at org.apache.tomcat.dbcp.dbcp2.managed.ManagedConnection.<init>([ManagedConnection.java:59|https://github.com/apache/tomcat/blob/TOMCAT_9_...])
at org.apache.tomcat.dbcp.dbcp2.managed.ManagedDataSource.getConnection([Man...
at org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.doesEntryExist...
at org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.didRecoveryHap...
at org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.getRecoveryTes...
at org.jboss.narayana.tomcat.jta.integration.app.TestExecutor.verifyRecovery...
<SNIP>
"Transaction Reaper Worker 0@3278" daemon prio=5 tid=0x3b nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.Object.wait(Object.java:502)
at com.arjuna.ats.arjuna.coordinator.TransactionReaper.waitForCancellations(...
at com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run([Reaper...
"Periodic Recovery@3254" daemon prio=5 tid=0x36 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doBackoffWait([P...
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal([...
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run([PeriodicRec...
"Transaction Reaper@3277" daemon prio=5 tid=0x3a nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at com.arjuna.ats.internal.arjuna.coordinator.ReaperThread.run([ReaperThread...
"Transaction Expired Entry Monitor@3256" daemon prio=5 tid=0x37 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at com.arjuna.ats.internal.arjuna.recovery.ExpiredEntryMonitor.run([ExpiredE...
{quote}
The maximum number of connections is set:
* [here for the DBCP pool|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/sr...]
* and [here for the database itself|https://github.com/jbosstm/narayana/blob/master/tools/src/main/jav...] (a sufficiently bigger number I would say...)
I am definitely fine with editing the test case, but I would like to understand whether the connection not being returned is really the case - i.e. if it is really an expected behaviour given the byteman script and the nature of the test.
Is it so...?
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> *EDIT:* When I use 30 iterations and a conditional breakpoint on {{i -> test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even with debugger attached.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Tom Jenkinson (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Tom Jenkinson commented on JBTM-3015:
-------------------------------------
It's not the solution but I think the reason it is hanging is probably because the writeToTheDatabase step (https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/src/tes...) does not return it's connection to the pool, presumably because DBCP keeps it open until the transaction completes and the byteman rule likely prevents the return to the pool is being missed as an exception is thrown before it is reached:
https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/src/tes...
Assuming that is the case, I think the best solution would be to add a note that you can't just increase the number of runs of the recovery test loop. I am not sure where we configure the maximum number of connections for the pool but we should check if increasing that verifies what I am expecting?
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> *EDIT:* When I use 30 iterations and a conditional breakpoint on {{i -> test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even with debugger attached.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Michal Karm Babacek updated JBTM-3015:
--------------------------------------
Description:
There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
*EDIT:* When I use 30 iterations and a conditional breakpoint on {{i -> test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even with debugger attached.
In this JIRA's example, to reproduce, simply:
{noformat}
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
{noformat}
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
{noformat}
...
2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
...
{noformat}
I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
Tomcat log keeps showing again and again:
{noformat}
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
{noformat}
h3. Logs and thread dumps
* [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
* [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
* [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
* [^tomcat_logs.zip] -- Tomcat logs
* [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
* This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
* This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
Might interest also [~ochaloup]. WDYT guys? What should I try out?
was:
There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
In this JIRA's example, to reproduce, simply:
{noformat}
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
{noformat}
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
{noformat}
...
2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
...
{noformat}
I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
Tomcat log keeps showing again and again:
{noformat}
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
{noformat}
h3. Logs and thread dumps
* [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
* [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
* [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
* [^tomcat_logs.zip] -- Tomcat logs
* [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
* This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
* This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
Might interest also [~ochaloup]. WDYT guys? What should I try out?
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> *EDIT:* When I use 30 iterations and a conditional breakpoint on {{i -> test(EXECUTOR_URL + RECOVERY_TEST)}} set to {{i == 28}} the execution loops forever even with debugger attached.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Michal Karm Babacek updated JBTM-3015:
--------------------------------------
Workaround Description: (was: Step one iteration at a time in a debugger and the issue does not appear :D)
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Michal Karm Babacek updated JBTM-3015:
--------------------------------------
Description:
There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
In this JIRA's example, to reproduce, simply:
{noformat}
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
{noformat}
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
{noformat}
...
2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
...
{noformat}
I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
Tomcat log keeps showing again and again:
{noformat}
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
{noformat}
h3. Logs and thread dumps
* [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
* [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
* [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
* [^tomcat_logs.zip] -- Tomcat logs
* [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
* This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
* This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
Might interest also [~ochaloup]. WDYT guys? What should I try out?
was:
There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
In this JIRA's example, to reproduce, simply:
{noformat}
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
{noformat}
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
{noformat}
...
2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
...
{noformat}
I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
Tomcat log keeps showing again and again:
{noformat}
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
{noformat}
h3. Logs and thread dumps
* [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
* [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
* [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
* [^tomcat_logs.zip] -- Tomcat logs
* [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
Might interest also [~ochaloup]. WDYT guys? What should I try out?
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> * This is what good, i.e. 3 iterations long DB trace looks like: [good db trace|https://paste.fedoraproject.org/paste/drANRRl4DlckwLhSIwCOdA/raw].
> * This is what good, i.e. 3 iterations long Tomcat catalina log looks like: [catalina log|https://paste.fedoraproject.org/paste/CkaDCZCtoVhDtyTglEOFqg/raw]
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Michal Karm Babacek (JIRA)
[ https://issues.jboss.org/browse/JBTM-3015?page=com.atlassian.jira.plugin.... ]
Michal Karm Babacek updated JBTM-3015:
--------------------------------------
Attachment: good_threaddumps.zip
thread_dumps_in_sequence_over_time.zip
postgresql_trace_log.zip
tomcat_logs.zip
test.war
> Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
> ----------------------------------------------------------------------------
>
> Key: JBTM-3015
> URL: https://issues.jboss.org/browse/JBTM-3015
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Tomcat
> Affects Versions: 5.next
> Environment: Oracle JDK 8, Linux, 8 CPU system
> Reporter: Michal Karm Babacek
> Assignee: Amos Feng
> Priority: Critical
> Attachments: good_threaddumps.zip, postgresql_trace_log.zip, test.war, thread_dumps_in_sequence_over_time.zip, tomcat_logs.zip
>
>
> There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
> In this JIRA's example, to reproduce, simply:
> {noformat}
> - IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> + IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
> {noformat}
> When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
> {noformat}
> ...
> 2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> 2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
> ...
> {noformat}
> I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
> Tomcat log keeps showing again and again:
> {noformat}
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
> 24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
> 24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
> {noformat}
> h3. Logs and thread dumps
> * [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
> * [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
> * [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
> * [^tomcat_logs.zip] -- Tomcat logs
> * [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
> Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JBTM-3015) Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
by Michal Karm Babacek (JIRA)
Michal Karm Babacek created JBTM-3015:
-----------------------------------------
Summary: Tomcat JTA: Race condition in tx recovery makes PeriodicRecovery run forever
Key: JBTM-3015
URL: https://issues.jboss.org/browse/JBTM-3015
Project: JBoss Transaction Manager
Issue Type: Bug
Components: Tomcat
Affects Versions: 5.next
Environment: Oracle JDK 8, Linux, 8 CPU system
Reporter: Michal Karm Babacek
Assignee: Amos Feng
Priority: Critical
There is a simple [recovery test in the test suite|https://github.com/jbosstm/narayana/blob/master/tomcat/tomcat-jta/s...] that runs 3 times in a row. If you change that number to 4, there is a good change you will get the TS stuck forever. When I attached debugger and used breakpoints on the iteration, e.g. {{i==3}} or {{i==10}}, the error was not shown, the TS run and completed, so it is clearly a time related race condition.
In this JIRA's example, to reproduce, simply:
{noformat}
- IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
+ IntStream.range(0, 10).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));
{noformat}
When the Tomcat gets into this unfortunate state and TS hangs and Tomcat runs forever, this is what keeps showing in the database trace (and nothing else):
{noformat}
...
2018-04-24 12:37:50.406 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:54.410 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:37:58.415 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:02.419 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:06.421 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:10.423 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:14.426 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
2018-04-24 12:38:18.429 UTC transaction_id: 0 LOG: execute <unnamed>: SELECT gid FROM pg_prepared_xacts where database = current_database()
...
{noformat}
I found this Atomikos related reference [online|https://fogbugz.atomikos.com/default13dd.html?community.6.3576.0].
Tomcat log keeps showing again and again:
{noformat}
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== SCANNING
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread scanning
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery first pass at Tue, 24 Apr 2018 14:41:02
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkFirstPass AtomicActionRecoveryModule first pass
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactions processing /StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction transactions
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:02.537 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change IDLE->FIRST_PASS
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass Local XARecoveryModule - first pass
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:02.538 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass xarecovery of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass Found 0 xids in doubt
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change FIRST_PASS->BETWEEN_PASSES
24-Apr-2018 14:41:02.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal Periodic recovery second pass at Tue, 24 Apr 2018 14:41:04
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass AtomicActionRecoveryModule second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change BETWEEN_PASSES->SECOND_PASS
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule - second pass
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.transactionInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.postgresql.xa.PGXAConnection@2c8cb5bd
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass xarecovery second pass of org.jboss.narayana.tomcat.jta.integration.app.TestXAResource@4094781e
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass Have 0 Xids to recover on this pass.
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass Local XARecoveryModule.resourceInitiatedRecovery completed
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.setScanState XARecoveryModule state change SECOND_PASS->IDLE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread Status <== INACTIVE
24-Apr-2018 14:41:04.539 DEBUG [Periodic Recovery] com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run PeriodicRecovery: background thread backing off
{noformat}
h3. Logs and thread dumps
* [^postgresql_trace_log.zip] -- PostgreSQL trace, note the originally expected course of recoveries and then the repeated stuck on the aforementioned message.
* [^test.war] -- test app used; the TS builds it for you though and stores it in {{tomcat/tomcat-deployment/_DEFAULT__Basic-app_test.war}}
* [^thread_dumps_in_sequence_over_time.zip] -- thread dumps taken over time with minutes in between snaps.
* [^tomcat_logs.zip] -- Tomcat logs
* [^good_threaddumps.zip] -- a dozen of snapshots couple of seconds from each other as an example of a good TS run that terminates as expected, i.e. with {{IntStream.range(0, 3).forEach(i -> test(EXECUTOR_URL + RECOVERY_TEST));}}.
Might interest also [~ochaloup]. WDYT guys? What should I try out?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months