[jboss-jira] [JBoss JIRA] (WFCORE-756) Core management server shutting down a thread pool early, resulting in RejectedExecutionException

Thu Jun 11 07:38:04 EDT 2015

     [ https://issues.jboss.org/browse/WFCORE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ladislav Thon updated WFCORE-756:
---------------------------------
    Description: 
In one of our tests, we've seen this exception during server shutdown:

{code}
2015-06-05 10:34:44,387 ERROR [org.xnio.listener] (XNIO-1 I/O-2) XNIO001007: A channel event listener threw an exception: java.util.concurrent.RejectedExecutionException: Task org.jboss.remoting3.remote.RemoteReadListener$1$1@5e7925db rejected from org.xnio.XnioWorker$TaskPool@3d00aa37[Shutting down, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 52]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
	at org.xnio.XnioWorker.execute(XnioWorker.java:741)
	at org.jboss.remoting3.remote.RemoteReadListener$1.handleEvent(RemoteReadListener.java:54)
	...
{code}

Log file: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-acceptance-multinode-rhel/jdk=java18_default,label_exp=EAP-RHEL7%20&&%20eap-sustaining/13/artifact/manu-eap-1.1.4/out/checkLogsAfterStopStandaloneFullHaXmlDifferentBindAddress1/artifact/server-checkLogsAfterStopStandaloneFullHaXmlDifferentBindAddress1.log

This is a very basic test that only starts the server and then shuts it down using {{:shutdown}}.

After looking into this for a while, I believe that this is caused by the core management server ({{UndertowHttpManagementService}}) shutting down an XNIO worker (= thread pool) while the {{:shutdown}} management operation is still running (or, in fact, finishing, trying to close the network connection).
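The rejection itself is plain {{ThreadPoolExecutor}} behaviour and can be shown in isolation. The following is only a minimal sketch, not WildFly or XNIO code: the class and method names are made up for illustration, and a fixed-size pool from {{Executors}} stands in for the XNIO worker's task pool, on the assumption that it uses the default abort policy as the stack trace above suggests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; not part of WildFly, XNIO or JBoss Remoting.
class RejectedAfterShutdown {

    /** Returns true if a task handed to the pool after shutdown() is rejected. */
    static boolean isRejectedAfterShutdown() {
        // Stand-in for the worker's task pool: a ThreadPoolExecutor
        // with the default AbortPolicy rejection handler.
        ExecutorService pool = Executors.newFixedThreadPool(2);

        pool.execute(() -> { });   // accepted: the pool is still running
        pool.shutdown();           // pool enters SHUTDOWN, new tasks are refused

        try {
            pool.execute(() -> { });   // same call the close listener makes too late
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + isRejectedAfterShutdown());
    }
}
```

Any component that may still hand work to the pool after {{shutdown()}} has been called will hit this, which is why the ordering of the worker shutdown relative to the connection close matters.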

I have a Byteman-based reproducer that inserts artificial pauses at certain well-defined places. I'm not sure whether this is connected to the graceful shutdown system, but even if it is, I believe something like this shouldn't happen.

Steps to reproduce:

# {{./bin/standalone.sh -c standalone-full-ha.xml}} and wait until it starts completely
# {{jps -v | grep "\-D\[Standalone\]"}} to figure out the PID of the newly started server
# {{bminstall.sh -b -Dorg.jboss.byteman.transform.all $PID}}
# {{bmsubmit.sh reproducer.btm}}, where {{reproducer.btm}} is a Byteman script reproduced below
# {{./bin/jboss-cli.sh -c}}
# {{:read-resource}}, repeated a few times
# {{:shutdown(timeout=1)}} (or plain {{:shutdown}})

The Byteman script:

{code}
RULE XnioWorker.TaskPool/ThreadPoolExecutor shutdown
CLASS java.util.concurrent.ThreadPoolExecutor
METHOD shutdown()
AFTER INVOKE advanceRunState
IF TRUE
DO Thread.sleep(10000)
ENDRULE

RULE Remoting onClose handler
CLASS org.jboss.remoting3.remote.RemoteReadListener$1
METHOD handleEvent(java.nio.channels.Channel)
AT ENTRY
IF TRUE
DO Thread.sleep(5000)
ENDRULE
{code}
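
As I read the two rules, the first keeps the pool in its shutting-down state for 10 seconds and the second delays the close listener by 5 seconds, so the listener's {{execute}} call lands inside that window. The sketch below reproduces the same ordering in plain Java, using a latch instead of sleeps so that the outcome does not depend on timing. All names in it are hypothetical stand-ins and none of it is taken from the WildFly, XNIO or Remoting sources.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of the interleaving forced by the Byteman rules.
class ShutdownRaceSketch {

    /** Returns "rejected" or "accepted", depending on what the late submission saw. */
    static String simulate() {
        // workerPool plays the XNIO worker's task pool; ioThread plays the
        // I/O thread on which the close listener runs.
        ExecutorService workerPool = Executors.newFixedThreadPool(2);
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        CountDownLatch shutdownStarted = new CountDownLatch(1);

        // The "close listener": it is delayed until shutdown has begun
        // (the latch replaces the 5 second sleep) and only then hands
        // its follow-up task to the worker pool.
        Future<String> outcome = ioThread.submit(() -> {
            shutdownStarted.await();
            try {
                workerPool.execute(() -> { });
                return "accepted";
            } catch (RejectedExecutionException e) {
                return "rejected";
            }
        });

        // The "management service": shuts the worker pool down while the
        // listener has not yet submitted its task.
        workerPool.shutdown();
        shutdownStarted.countDown();

        try {
            return outcome.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ioThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("late submission was " + simulate());
    }
}
```

With this ordering the late submission is always rejected. In the real server the two events are not synchronized with each other, so the failure only shows up occasionally.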

  was:
In one of our tests, we've seen this exception during server shutdown:

{code}
2015-06-05 10:34:44,387 ERROR [org.xnio.listener] (XNIO-1 I/O-2) XNIO001007: A channel event listener threw an exception: java.util.concurrent.RejectedExecutionException: Task org.jboss.remoting3.remote.RemoteReadListener$1$1@5e7925db rejected from org.xnio.XnioWorker$TaskPool@3d00aa37[Shutting down, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 52]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
	at org.xnio.XnioWorker.execute(XnioWorker.java:741)
	at org.jboss.remoting3.remote.RemoteReadListener$1.handleEvent(RemoteReadListener.java:54)
	...
{code}

Log file: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-acceptance-multinode-rhel/jdk=java18_default,label_exp=EAP-RHEL7%20&&%20eap-sustaining/13/artifact/manu-eap-1.1.4/out/checkLogsAfterStopStandaloneFullHaXmlDifferentBindAddress1/artifact/server-checkLogsAfterStopStandaloneFullHaXmlDifferentBindAddress1.log

This is a very basic test that only starts the server and then shuts it down using {{:shutdown}}.

After looking into this for a while, I believe that this is caused by the core management server ({{UndertowHttpManagementService}}) shutting down an XNIO worker (= thread pool) while the {{:shutdown}} management operation is still running (or, in fact, finishing, trying to close the network connection).

I have a Byteman-based reproducer that inserts artificial pauses at certain well-defined places. I'm not sure whether this is connected to the graceful shutdown system, but even if it is, I believe something like this shouldn't happen.

Steps to reproduce:

# {{./bin/standalone.sh -c standalone-full-ha.xml}} and wait until it starts completely
# {{jps -v | grep "\-D\[Standalone\]"}} to figure out the PID of the newly started server
# {{bminstall.sh -b -Dorg.jboss.byteman.transform.all $PID}}
# {{bmsubmit.sh reproducer.btm}}, where {{reproducer.btm}} is a Byteman script reproduced below
# {{./bin/jboss-cli.sh -c}}
# {{:read-resource}}, repeated a few times
# {{:shutdown(timeout=1)}}

The Byteman script:

{code}
RULE XnioWorker.TaskPool/ThreadPoolExecutor shutdown
CLASS java.util.concurrent.ThreadPoolExecutor
METHOD shutdown()
AFTER INVOKE advanceRunState
IF TRUE
DO Thread.sleep(10000)
ENDRULE

RULE Remoting onClose handler
CLASS org.jboss.remoting3.remote.RemoteReadListener$1
METHOD handleEvent(java.nio.channels.Channel)
AT ENTRY
IF TRUE
DO Thread.sleep(5000)
ENDRULE
{code}

> Core management server shutting down a thread pool early, resulting in RejectedExecutionException
> -------------------------------------------------------------------------------------------------
>
>                 Key: WFCORE-756
>                 URL: https://issues.jboss.org/browse/WFCORE-756
>             Project: WildFly Core
>          Issue Type: Bug
>          Components: Domain Management
>            Reporter: Ladislav Thon
>            Assignee: Brian Stansberry

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)