[jboss-jira] [JBoss JIRA] (WFLY-10621) Server stop indefinitely blocked by DistributableSessionManager in lifecycleLock.writeLockInterruptibly()
Petr H (JIRA)
issues at jboss.org
Mon Jun 25 08:03:00 EDT 2018
[ https://issues.jboss.org/browse/WFLY-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596123#comment-13596123 ]
Petr H edited comment on WFLY-10621 at 6/25/18 8:02 AM:
--------------------------------------------------------
It isn't necessarily related to shared sessions, but they may be one of the triggers that lead to the use of the DistributableSessionManager, where the problematic code lies.
Our case is:
- non-HA profile
- single server in the group
- 4 web applications deployed
- all web modules marked as distributable (in web.xml) - this is likely another trigger that, similarly to shared sessions, causes the DistributableSessionManager to be used
- no shared sessions at all
But in general, StampedLock.writeLockInterruptibly() can block forever, which probably shouldn't be allowed during the server stop operation. It would certainly be better to find out what prevents the DistributableSessionManager from obtaining that lock, but since it isn't the kind of lock that is directly visible in thread dumps, I fear some debugging will be necessary.
I may still try the latest WildFly (13 at the moment), but that will take some time. The issue doesn't occur consistently, so it isn't straightforward to reproduce.
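A minimal sketch of why the lock holder is hard to spot, assuming a HotSpot JVM: StampedLock does not extend AbstractOwnableSynchronizer and records no owner thread, so the "Locked ownable synchronizers" data that jstack -l prints (exposed in Java as ThreadInfo.getLockedSynchronizers()) can attribute a held ReentrantLock to its thread, but not a held StampedLock stamp. The class and method names below are hypothetical and only illustrate the JDK behavior; they are not taken from WildFly.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.StampedLock;

// Hypothetical demo class: checks which held locks the JVM can attribute to a thread.
public class LockVisibilityDemo {

    // True if the current thread's "locked ownable synchronizers" (the data
    // jstack -l prints) contain a class whose name includes the fragment.
    static boolean reportedAsHeld(String classNameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = {Thread.currentThread().getId()};
        ThreadInfo info = mx.getThreadInfo(ids, false, true)[0];
        for (LockInfo lock : info.getLockedSynchronizers()) {
            if (lock.getClassName().contains(classNameFragment)) {
                return true;
            }
        }
        return false;
    }

    // Holds a StampedLock read stamp while asking the JVM what this thread owns.
    static boolean stampedLockHolderVisible() {
        StampedLock lock = new StampedLock();
        long stamp = lock.readLock();
        try {
            return reportedAsHeld("StampedLock");
        } finally {
            lock.unlockRead(stamp);
        }
    }

    // Same check with a ReentrantLock, which is built on AbstractOwnableSynchronizer.
    static boolean reentrantLockHolderVisible() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        try {
            return reportedAsHeld("ReentrantLock");
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("StampedLock read holder visible: " + stampedLockHolderVisible());
        System.out.println("ReentrantLock holder visible: " + reentrantLockHolderVisible());
    }
}
```

If this holds, a thread dump of the hung server can show the stopping thread parked on the StampedLock, but it cannot name whatever still holds the lock, so instrumentation or a heap dump would likely be needed to find it.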
> Server stop indefinitely blocked by DistributableSessionManager in lifecycleLock.writeLockInterruptibly()
> ---------------------------------------------------------------------------------------------------------
>
> Key: WFLY-10621
> URL: https://issues.jboss.org/browse/WFLY-10621
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 11.0.0.Final
> Environment: CentOS 7.4.1708
> Java 1.8.0_172-b11
> WildFly 11.0.0.Final
> Reporter: Petr H
> Assignee: Paul Ferraro
> Labels: clustering, sessions
> Attachments: threaddump-wfly11-stop-blocked.txt
>
>
> Note: This is a report for the issue first reported here: https://developer.jboss.org/thread/277883
> It can be seen in at least WildFly 11 and EAP 7.1. However, since the relevant code in the DistributableSessionManager class seems unchanged even in WildFly 13, it is likely to occur there as well. (We test WildFly 11 because it is the closest to EAP 7.1, which is being considered for production use later in this project.)
> The server stop operation may hang indefinitely because a thread such as the following one is blocked:
> {noformat}
> "ServerService Thread Pool -- 24" #49 prio=5 os_prio=0 tid=0x00000000035d1000 nid=0x4de1 waiting on condition [0x00007f4bddc46000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000c4e61998> (a java.util.concurrent.locks.StampedLock)
> at java.util.concurrent.locks.StampedLock.acquireWrite(StampedLock.java:1119)
> at java.util.concurrent.locks.StampedLock.writeLockInterruptibly(StampedLock.java:413)
> at org.wildfly.clustering.web.undertow.session.DistributableSessionManager.stop(DistributableSessionManager.java:90)
> - locked <0x00000000c4e60090> (a org.wildfly.clustering.web.undertow.session.DistributableSessionManager)
> at io.undertow.servlet.core.DeploymentManagerImpl$3.call(DeploymentManagerImpl.java:599)
> at io.undertow.servlet.core.DeploymentManagerImpl$3.call(DeploymentManagerImpl.java:589)
> at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:42)
> at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)
> at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105)
> at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction$$Lambda$1005/749195997.call(Unknown Source)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$1006/1165195653.call(Unknown Source)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$1006/1165195653.call(Unknown Source)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$1006/1165195653.call(Unknown Source)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$1006/1165195653.call(Unknown Source)
> at io.undertow.servlet.core.DeploymentManagerImpl.stop(DeploymentManagerImpl.java:603)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.stopContext(UndertowDeploymentService.java:139)
> at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$2.run(UndertowDeploymentService.java:120)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> at org.jboss.threads.JBossThread.run(JBossThread.java:320)
> {noformat}
> The related code in the DistributableSessionManager uses the [java.util.concurrent.locks.StampedLock.writeLockInterruptibly()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/StampedLock.html#writeLockInterruptibly--] method, which can theoretically block indefinitely and thus prevent the server stop from ever completing.
> The change was introduced in the following commit: [WFLY-7695 Block session manager stop until all sessions are closed|https://github.com/wildfly/wildfly/commit/94863deeb96c4536ec5e8266573e211289b02529#diff-a7519dd56759f6084e7c0c0bb006353f]
> This can occur even in non-clustered environments (non-HA profile) with web applications just being marked as distributable (web.xml).
> We don't know what prevents the lock from being obtained (we are unable to determine it from the thread dump), so unfortunately there is not much we can do about it from the application side at the moment.
> We only tried changing the Infinispan web local-cache configuration because of intermittent _ISPN000136_ errors, but that seems to have no effect. We tried various combinations of the *locking isolation* (REPEATABLE_READ, READ_COMMITTED), *transaction locking* (OPTIMISTIC, PESSIMISTIC) and *transaction mode* (NONE, BATCH) settings, and the issue seems to occur with all of them.
> Note: The issue doesn't always occur, only in some cases.
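The blocking behavior described in the issue can be modeled with a minimal, hypothetical sketch. SessionGate is an illustrative name, not WildFly code, and the model assumes (based only on the WFLY-7695 commit title and the writeLockInterruptibly() frame in the dump) that each open session holds a read stamp until it is closed. Under that assumption, stop() via writeLockInterruptibly() waits until every read stamp is released and only returns early if the stopping thread is interrupted, so a single session that is never closed keeps it waiting forever. StampedLock.tryWriteLock(long, TimeUnit) is shown as one possible bounded alternative.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

// Hypothetical model of "block session manager stop until all sessions are closed".
// It is NOT the DistributableSessionManager implementation.
public class SessionGateDemo {

    static final class SessionGate {
        private final StampedLock lifecycleLock = new StampedLock();

        // Assumption: every open session holds one read stamp until closed.
        long openSession() {
            return lifecycleLock.readLock();
        }

        void closeSession(long stamp) {
            lifecycleLock.unlockRead(stamp);
        }

        // Unbounded stop: returns only after all read stamps are released,
        // or throws InterruptedException if the stopping thread is interrupted.
        void stop() throws InterruptedException {
            long stamp = lifecycleLock.writeLockInterruptibly();
            lifecycleLock.unlockWrite(stamp);
        }

        // Bounded stop: returns false instead of waiting past the timeout.
        boolean stop(long timeout, TimeUnit unit) throws InterruptedException {
            long stamp = lifecycleLock.tryWriteLock(timeout, unit);
            if (stamp == 0L) {
                return false; // at least one session was never closed
            }
            lifecycleLock.unlockWrite(stamp);
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SessionGate gate = new SessionGate();

        // A properly closed session does not hold up stop.
        gate.closeSession(gate.openSession());
        System.out.println("bounded stop, all sessions closed: "
                + gate.stop(100, TimeUnit.MILLISECONDS));

        // A leaked session (opened, never closed) keeps its read stamp.
        gate.openSession();
        System.out.println("bounded stop, one leaked session: "
                + gate.stop(100, TimeUnit.MILLISECONDS));

        // The unbounded stop parks, like the thread in the dump, until interrupted.
        Thread stopper = new Thread(() -> {
            try {
                gate.stop();
                System.out.println("unbounded stop returned");
            } catch (InterruptedException e) {
                System.out.println("unbounded stop was still waiting; interrupted");
            }
        }, "stopper");
        stopper.start();
        stopper.join(200);
        System.out.println("unbounded stop still blocked after 200 ms: " + stopper.isAlive());
        stopper.interrupt();
        stopper.join();
    }
}
```

A timeout trades a guaranteed clean shutdown for a guaranteed bounded one; whether skipping the wait (for example, after logging the still-open sessions) is acceptable during server stop is a separate design question.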
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)