[jbossts-issues] [JBoss JIRA] (JBTM-3047) Suspending recovery manager causes deadlock when acive RecoveryMonitor scan request exists

Ondra Chaloupka (JIRA) issues at jboss.org
Fri Aug 3 06:04:01 EDT 2018


     [ https://issues.jboss.org/browse/JBTM-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ondra Chaloupka updated JBTM-3047:
----------------------------------
    Steps to Reproduce: 
This could be simulated for WFLY when it should be stopped. WildFly {{:shutdown}} command causes suspend being invoked and then JVM is about to be closed.

* Periodic recovery is active and running - for processing both phases it needs more than 20 seconds which is the default socket timeout for PeriodicMonitor call
* Client requests SCAN, the default timeout for the socket is 20 seconds[5]
  {{java -cp modules/system/layers/base/org/jboss/jts/main/narayana-jts-idlj-5.9.0.Final.jar com.arjuna.ats.arjuna.tools.RecoveryMonitor -host localhost -port 4712}}
* ServiceWorker is started and waits
* 20 seconds elapsed, client gets "java.net.SocketTimeoutException: Read timed out"
* (recovery needs to be still in progress while stopping the server)
* server is asked to stop {{./bin/jboss-cli.sh --command=':shutdown()' -c}} which causes suspend is called on the recovery manager and we have the deadlock

  was:
This could be simulated for WFLY when it should be stopped. WildFly {{:shutdown}} command causes suspend being invoked and then JVM is about to be closed.

* Periodic recovery is active and running - for processing both phases it needs more than 20 seconds which is the default socket timeout for PeriodicMonitor call
* Client requests SCAN, the default timeout for the socket is 20 seconds[5]
  {{java -cp modules/system/layers/base/org/jboss/jts/main/narayana-jts-idlj-5.9.0.Final.jar com.arjuna.ats.arjuna.tools.RecoveryMonitor -host localhost -port 4712}}
* ServiceWorker is started and waits
* 20 seconds elapsed, client gets "java.net.SocketTimeoutException: Read timed out"
* server is asked to stop {{./bin/jboss-cli.sh --command=':shutdown()' -c}} which causes suspend is called on the recovery manager and we have the deadlock



> Suspending recovery manager causes deadlock when acive RecoveryMonitor scan request exists
> ------------------------------------------------------------------------------------------
>
>                 Key: JBTM-3047
>                 URL: https://issues.jboss.org/browse/JBTM-3047
>             Project: JBoss Transaction Manager
>          Issue Type: Bug
>          Components: Tooling
>    Affects Versions: 5.9.0.Final
>            Reporter: Ondra Chaloupka
>            Assignee: Ondra Chaloupka
>
> If {{PeriodicMonitor}} has active unfinished request to start recovery scan
> {code}
> INFO  [com.arjuna.ats.arjuna] (Server.Connection:127.0.0.1:36326) ARJUNA012340: RecoveryManager scan scheduled to begin.
> {code}
> and meanwhile [the RecoveryManager is suspended|https://github.com/jbosstm/narayana/blob/5.9.0.Final/ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/recovery/RecoveryManager.java#L256] then the WorkerService is waiting in deadlock forever and JVM can't be cleanly stopped.
> The whole stopping EAP server stacktrace could be seen at https://issues.jboss.org/browse/CLOUD-2767 the related Narayana related threads are[10].
> It seems to me as the recovery was put to suspended[1] the processing thread waits for the state change[2]. The state change to set to terminated could be done by periodic recovery shutdown[3] but it's stuck on waiting for listners to be stopped[4]. Stopping the listeners could be done by the thread waiting for the state change[5].
> [1] https://github.com/wildfly/wildfly/blob/master/transactions/src/main/java/org/jboss/as/txn/suspend/RecoverySuspendController.java#L41
> [2] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L348
> [3] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L177
> [4] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L169
> [5] https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java#L387
> [10]
> {code}
> "Transaction Reaper Worker 0" #121 daemon prio=5 os_prio=0 tid=0x000000000197a000 nid=0x2b6 in Object.wait() [0x00007f3661d2b000]
>    java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  - waiting on <0x00000000b289c310> (a java.util.LinkedList)
>  at java.lang.Object.wait(Object.java:502)
>  at com.arjuna.ats.arjuna.coordinator.TransactionReaper.waitForCancellations(TransactionReaper.java:328)
>  - locked <0x00000000b289c310> (a java.util.LinkedList)
>  at com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:65)
>    Locked ownable synchronizers:
>  - None
> "Transaction Reaper" #120 daemon prio=5 os_prio=0 tid=0x0000000001b4b800 nid=0x2b5 in Object.wait() [0x00007f3661e2c000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  - waiting on <0x00000000b289c330> (a com.arjuna.ats.arjuna.coordinator.TransactionReaper)
>  at com.arjuna.ats.internal.arjuna.coordinator.ReaperThread.run(ReaperThread.java:90)
>  - locked <0x00000000b289c330> (a com.arjuna.ats.arjuna.coordinator.TransactionReaper)
>    Locked ownable synchronizers:
>  - None
> "Server.Connection:10.130.0.162:46200" #335 daemon prio=5 os_prio=0 tid=0x0000000003853800 nid=0x449 in Object.wait() [0x00007f365ec2c000]
>    java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95)
>  - locked <0x00000000b2916608> (a com.arjuna.ats.internal.arjuna.recovery.WorkerService)
>  at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88)
>    Locked ownable synchronizers:
>  - None
> "Server.Connection:10.130.0.162:46090" #225 daemon prio=5 os_prio=0 tid=0x0000000003a4c800 nid=0x3ca in Object.wait() [0x00007f365c2ef000]
>    java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at com.arjuna.ats.internal.arjuna.recovery.WorkerService.doWork(WorkerService.java:95)
>  - locked <0x00000000b2916608> (a com.arjuna.ats.internal.arjuna.recovery.WorkerService)
>  at com.arjuna.ats.internal.arjuna.recovery.Connection.run(Connection.java:88)
>    Locked ownable synchronizers:
>  - None
> "MSC service thread 1-5" #19 prio=5 os_prio=0 tid=0x0000000001903000 nid=0x251 in Object.wait() [0x00007f366aebe000]
>    java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at com.arjuna.ats.internal.arjuna.recovery.Listener.stopListener(Listener.java:222)
>  - locked <0x00000000b29161d0> (a com.arjuna.ats.internal.arjuna.recovery.Listener)
>  at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.shutdown(PeriodicRecovery.java:169)
>  at com.arjuna.ats.internal.arjuna.recovery.RecoveryManagerImple.stop(RecoveryManagerImple.java:179)
>  at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:210)
>  - locked <0x00000000b0badbb8> (a com.arjuna.ats.arjuna.recovery.RecoveryManager)
>  at com.arjuna.ats.arjuna.recovery.RecoveryManager.terminate(RecoveryManager.java:194)
>  at com.arjuna.ats.jbossatx.jta.RecoveryManagerService.stop(RecoveryManagerService.java:73)
>  at org.jboss.as.txn.service.ArjunaRecoveryManagerService.stop(ArjunaRecoveryManagerService.java:167)
>  - locked <0x00000000b313ddb0> (a org.jboss.as.txn.service.ArjunaRecoveryManagerService)
>  at org.jboss.msc.service.ServiceControllerImpl$StopTask.stopService(ServiceControllerImpl.java:2150)
>  at org.jboss.msc.service.ServiceControllerImpl$StopTask.run(ServiceControllerImpl.java:2101)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>  - <0x00000000b351b500> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "Periodic Recovery" #116 prio=5 os_prio=0 tid=0x0000000002357000 nid=0x2b4 in Object.wait() [0x00007f3661f2d000]
>    java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doSuspendedWait(PeriodicRecovery.java:709)
>  at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:346)
>  - locked <0x00000000b289c898> (a java.lang.Object)
>    Locked ownable synchronizers:
>  - None
> {code}



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)


More information about the jbossts-issues mailing list