]
Ondra Chaloupka commented on JBTM-3047:
---------------------------------------
[~mmusgrov] what do you think about this?
Suspending recovery manager causes deadlock when acive
RecoveryMonitor scan request exists
------------------------------------------------------------------------------------------
Key: JBTM-3047
URL:
https://issues.jboss.org/browse/JBTM-3047
Project: JBoss Transaction Manager
Issue Type: Bug
Components: Tooling
Affects Versions: 5.9.0.Final
Reporter: Ondra Chaloupka
Assignee: Ondra Chaloupka
If {{PeriodicMonitor}} has active unfinished request to start recovery scan
{code}
INFO [com.arjuna.ats.arjuna] (Server.Connection:127.0.0.1:36326) ARJUNA012340:
RecoveryManager scan scheduled to begin.
{code}
and meanwhile [the RecoveryManager is
suspended|https://github.com/jbosstm/narayana/blob/5.9.0.Final/ArjunaCore...]
then the WorkerService is waiting in deadlock forever and JVM can't be cleanly
stopped.
The whole stopping EAP server stacktrace could be seen at
https://issues.jboss.org/browse/CLOUD-2767 the related Narayana related threads are[10].
It seems to me as the recovery was put to suspended[1] the processing thread waits for
the state change[2]. The state change to set to terminated could be done by periodic
recovery shutdown[3] but it's stuck on waiting for listners to be stopped[4]. Stopping
the listeners could be done by the thread waiting for the state change[5].
[1]
https://github.com/wildfly/wildfly/blob/master/transactions/src/main/java...
[2]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[3]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[4]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[5]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[10]
{code}
"Transaction Reaper Worker 0" #121 daemon prio=5 os_prio=0
tid=0x000000000197a000 nid=0x2b6 in Object.wait() [0x00007f3661d2b000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000b289c310> (a java.util.LinkedList)
at java.lang.Object.wait(Object.java:502)
at
com.arjuna.ats.arjuna.coordinator.TransactionReaper.waitForCancellations(TransactionReaper.java:328)
- locked <0x00000000b289c310> (a java.util.LinkedList)
at
com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:65)
Locked ownable synchronizers:
- None
"Transaction Reaper" #120 daemon prio=5 os_prio=0 tid=0x0000000001b4b800
nid=0x2b5 in Object.wait() [0x00007f3661e2c000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000b289c330> (a
com.arjuna.ats.arjuna.coordinator.TransactionReaper)
at com.arjuna.ats.internal.arjuna.coordinator.ReaperThread.run(ReaperThread.java:90)
- locked <0x00000000b289c330> (a
com.arjuna.ats.arjuna.coordinator.TransactionReaper)
Locked ownable synchronizers:
- None
"Server.Connection:10.130.0.162:46200" #335 daemon prio=5 os_prio=0
tid=0x0000000003853800 nid=0x449 in Object.wait() [0x00007f365ec2c000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95)
- locked <0x00000000b2916608> (a
com.arjuna.ats.internal.arjuna.recovery.WorkerService)
at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88)
Locked ownable synchronizers:
- None
"Server.Connection:10.130.0.162:46090" #225 daemon prio=5 os_prio=0
tid=0x0000000003a4c800 nid=0x3ca in Object.wait() [0x00007f365c2ef000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95)
- locked <0x00000000b2916608> (a
com.arjuna.ats.internal.arjuna.recovery.WorkerService)
at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88)
Locked ownable synchronizers:
- None
"MSC service thread 1-5" #19 prio=5 os_prio=0 tid=0x0000000001903000 nid=0x251
in Object.wait() [0x00007f366aebe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.arjuna.ats.internal.arjuna.recovery.Listener.stopListener(Listener.java:222)
- locked <0x00000000b29161d0> (a
com.arjuna.ats.internal.arjuna.recovery.Listener)
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.shutdown(PeriodicRecovery.java:169)
at
com.arjuna.ats.internal.arjuna.recovery.RecoveryManagerImple.stop(RecoveryManagerImple.java:179)
at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:210)
- locked <0x00000000b0badbb8> (a com.arjuna.ats.arjuna.recovery.RecoveryManager)
at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:194)
at
com.arjuna.ats.jbossatx.jta.RecoveryManagerService.stop(RecoveryManagerService.java:73)
at
org.jboss.as.txn.service.ArjunaRecoveryManagerService.stop(ArjunaRecoveryManagerService.java:167)
- locked <0x00000000b313ddb0> (a
org.jboss.as.txn.service.ArjunaRecoveryManagerService)
at
org.jboss.msc.service.ServiceControllerImpl$StopTask.stopService(ServiceControllerImpl.java:2150)
at
org.jboss.msc.service.ServiceControllerImpl$StopTask.run(ServiceControllerImpl.java:2101)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000000b351b500> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Periodic Recovery" #116 prio=5 os_prio=0 tid=0x0000000002357000 nid=0x2b4 in
Object.wait() [0x00007f3661f2d000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doSuspendedWait(PeriodicRecovery.java:709)
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:346)
- locked <0x00000000b289c898> (a java.lang.Object)
Locked ownable synchronizers:
- None
{code}
NOTE: there is interesting point there are two {{WorkerService}} instances in the stack
trace. It's causes two sequential requests hits the RecoveryManager while the periodic
recovery was in progress before the server was shutdown. The code invoking the two calls
and then immediately running the shutdown was
https://github.com/jboss-openshift/cct_module/blob/7f891cca882f86f983b298...