[
https://issues.jboss.org/browse/JBTM-3047?page=com.atlassian.jira.plugin....
]
Ondra Chaloupka updated JBTM-3047:
----------------------------------
Steps to Reproduce:
This could be simulated for WFLY when it should be stopped. WildFly {{:shutdown}} command
causes suspend being invoked and then JVM is about to be closed.
* Periodic recovery is active and running - for processing both phases it needs more than
20 seconds which is the default socket timeout for PeriodicMonitor call
* Client requests SCAN, the default timeout for the socket is 20 seconds[5]
{{java -cp
modules/system/layers/base/org/jboss/jts/main/narayana-jts-idlj-5.9.0.Final.jar
com.arjuna.ats.arjuna.tools.RecoveryMonitor -host localhost -port 4712}}
* ServiceWorker is started and waits
* 20 seconds elapsed, client gets "java.net.SocketTimeoutException: Read timed
out"
* (recovery needs to be still in progress while stopping the server)
* server is asked to stop {{./bin/jboss-cli.sh --command=':shutdown()' -c}} which
causes suspend is called on the recovery manager and we have the deadlock
was:
This could be simulated for WFLY when it should be stopped. WildFly {{:shutdown}} command
causes suspend being invoked and then JVM is about to be closed.
* Periodic recovery is active and running - for processing both phases it needs more than
20 seconds which is the default socket timeout for PeriodicMonitor call
* Client requests SCAN, the default timeout for the socket is 20 seconds[5]
{{java -cp
modules/system/layers/base/org/jboss/jts/main/narayana-jts-idlj-5.9.0.Final.jar
com.arjuna.ats.arjuna.tools.RecoveryMonitor -host localhost -port 4712}}
* ServiceWorker is started and waits
* 20 seconds elapsed, client gets "java.net.SocketTimeoutException: Read timed
out"
* server is asked to stop {{./bin/jboss-cli.sh --command=':shutdown()' -c}} which
causes suspend is called on the recovery manager and we have the deadlock
Suspending recovery manager causes deadlock when acive
RecoveryMonitor scan request exists
------------------------------------------------------------------------------------------
Key: JBTM-3047
URL:
https://issues.jboss.org/browse/JBTM-3047
Project: JBoss Transaction Manager
Issue Type: Bug
Components: Tooling
Affects Versions: 5.9.0.Final
Reporter: Ondra Chaloupka
Assignee: Ondra Chaloupka
If {{PeriodicMonitor}} has active unfinished request to start recovery scan
{code}
INFO [com.arjuna.ats.arjuna] (Server.Connection:127.0.0.1:36326) ARJUNA012340:
RecoveryManager scan scheduled to begin.
{code}
and meanwhile [the RecoveryManager is
suspended|https://github.com/jbosstm/narayana/blob/5.9.0.Final/ArjunaCore...]
then the WorkerService is waiting in deadlock forever and JVM can't be cleanly
stopped.
The whole stopping EAP server stacktrace could be seen at
https://issues.jboss.org/browse/CLOUD-2767 the related Narayana related threads are[10].
It seems to me as the recovery was put to suspended[1] the processing thread waits for
the state change[2]. The state change to set to terminated could be done by periodic
recovery shutdown[3] but it's stuck on waiting for listners to be stopped[4]. Stopping
the listeners could be done by the thread waiting for the state change[5].
[1]
https://github.com/wildfly/wildfly/blob/master/transactions/src/main/java...
[2]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[3]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[4]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[5]
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes...
[10]
{code}
"Transaction Reaper Worker 0" #121 daemon prio=5 os_prio=0
tid=0x000000000197a000 nid=0x2b6 in Object.wait() [0x00007f3661d2b000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000b289c310> (a java.util.LinkedList)
at java.lang.Object.wait(Object.java:502)
at
com.arjuna.ats.arjuna.coordinator.TransactionReaper.waitForCancellations(TransactionReaper.java:328)
- locked <0x00000000b289c310> (a java.util.LinkedList)
at
com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:65)
Locked ownable synchronizers:
- None
"Transaction Reaper" #120 daemon prio=5 os_prio=0 tid=0x0000000001b4b800
nid=0x2b5 in Object.wait() [0x00007f3661e2c000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000b289c330> (a
com.arjuna.ats.arjuna.coordinator.TransactionReaper)
at com.arjuna.ats.internal.arjuna.coordinator.ReaperThread.run(ReaperThread.java:90)
- locked <0x00000000b289c330> (a
com.arjuna.ats.arjuna.coordinator.TransactionReaper)
Locked ownable synchronizers:
- None
"Server.Connection:10.130.0.162:46200" #335 daemon prio=5 os_prio=0
tid=0x0000000003853800 nid=0x449 in Object.wait() [0x00007f365ec2c000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95)
- locked <0x00000000b2916608> (a
com.arjuna.ats.internal.arjuna.recovery.WorkerService)
at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88)
Locked ownable synchronizers:
- None
"Server.Connection:10.130.0.162:46090" #225 daemon prio=5 os_prio=0
tid=0x0000000003a4c800 nid=0x3ca in Object.wait() [0x00007f365c2ef000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95)
- locked <0x00000000b2916608> (a
com.arjuna.ats.internal.arjuna.recovery.WorkerService)
at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88)
Locked ownable synchronizers:
- None
"MSC service thread 1-5" #19 prio=5 os_prio=0 tid=0x0000000001903000 nid=0x251
in Object.wait() [0x00007f366aebe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.arjuna.ats.internal.arjuna.recovery.Listener.stopListener(Listener.java:222)
- locked <0x00000000b29161d0> (a
com.arjuna.ats.internal.arjuna.recovery.Listener)
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.shutdown(PeriodicRecovery.java:169)
at
com.arjuna.ats.internal.arjuna.recovery.RecoveryManagerImple.stop(RecoveryManagerImple.java:179)
at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:210)
- locked <0x00000000b0badbb8> (a com.arjuna.ats.arjuna.recovery.RecoveryManager)
at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:194)
at
com.arjuna.ats.jbossatx.jta.RecoveryManagerService.stop(RecoveryManagerService.java:73)
at
org.jboss.as.txn.service.ArjunaRecoveryManagerService.stop(ArjunaRecoveryManagerService.java:167)
- locked <0x00000000b313ddb0> (a
org.jboss.as.txn.service.ArjunaRecoveryManagerService)
at
org.jboss.msc.service.ServiceControllerImpl$StopTask.stopService(ServiceControllerImpl.java:2150)
at
org.jboss.msc.service.ServiceControllerImpl$StopTask.run(ServiceControllerImpl.java:2101)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000000b351b500> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Periodic Recovery" #116 prio=5 os_prio=0 tid=0x0000000002357000 nid=0x2b4 in
Object.wait() [0x00007f3661f2d000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doSuspendedWait(PeriodicRecovery.java:709)
at
com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:346)
- locked <0x00000000b289c898> (a java.lang.Object)
Locked ownable synchronizers:
- None
{code}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)