[jboss-jira] [JBoss JIRA] (WFCORE-756) Core management server shutting down a thread pool early, resulting in RejectedExecutionException

Thu Jun 11 07:23:06 EDT 2015

Ladislav Thon created WFCORE-756:
------------------------------------

             Summary: Core management server shutting down a thread pool early, resulting in RejectedExecutionException
                 Key: WFCORE-756
                 URL: https://issues.jboss.org/browse/WFCORE-756
             Project: WildFly Core
          Issue Type: Bug
          Components: Domain Management
            Reporter: Ladislav Thon
            Assignee: Brian Stansberry

In one of our tests, we've seen this exception during server shutdown:

{code}
2015-06-05 10:34:44,387 ERROR [org.xnio.listener] (XNIO-1 I/O-2) XNIO001007: A channel event listener threw an exception: java.util.concurrent.RejectedExecutionException: Task org.jboss.remoting3.remote.RemoteReadListener$1$1@5e7925db rejected from org.xnio.XnioWorker$TaskPool@3d00aa37[Shutting down, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 52]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
	at org.xnio.XnioWorker.execute(XnioWorker.java:741)
	at org.jboss.remoting3.remote.RemoteReadListener$1.handleEvent(RemoteReadListener.java:54)
	...
{code}

Log file: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-acceptance-multinode-rhel/jdk=java18_default,label_exp=EAP-RHEL7%20&&%20eap-sustaining/13/artifact/manu-eap-1.1.4/out/checkLogsAfterStopStandaloneFullHaXmlDifferentBindAddress1/artifact/server-checkLogsAfterStopStandaloneFullHaXmlDifferentBindAddress1.log

This is a very basic test that only starts the server and then shuts it down using {{:shutdown}}.

After looking into this for a while, I believe that this is caused by the core management server ({{UndertowHttpManagementService}}) shutting down an XNIO worker (= thread pool) while the {{:shutdown}} management operation is still running (more precisely, while it is finishing and trying to close the network connection).
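
The failure mode can be illustrated outside the server with a minimal sketch. It uses a plain {{java.util.concurrent}} pool rather than XNIO, on the assumption (suggested by the stack trace above) that {{XnioWorker$TaskPool}} behaves like a standard {{ThreadPoolExecutor}} with the default abort policy. The class and method names are hypothetical and are not part of WildFly Core.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-alone illustration; not WildFly or XNIO code.
public class RejectedAfterShutdown {

    /** Returns true if a task submitted after shutdown() is rejected. */
    static boolean submitAfterShutdownIsRejected() {
        // Stand-in for the XNIO worker's task pool (assumption).
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> { });   // accepted while the pool is running

        // Stand-in for the management service stopping its worker.
        pool.shutdown();

        try {
            // Stand-in for a late listener dispatching work onto the pool.
            pool.execute(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            return true;           // default AbortPolicy rejects the task
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + submitAfterShutdownIsRejected());
    }
}
```

In this sketch the ordering is forced by calling {{shutdown()}} before the second {{execute()}}; in the server the same ordering would only occur when the timing happens to line up.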

I have a Byteman-based reproducer that inserts artificial pauses at certain well-defined places. I'm not sure whether this has some connection to the graceful shutdown system, but I believe that even if it does, something like this shouldn't happen.

Steps to reproduce:

# {{./bin/standalone.sh -c standalone-full-ha.xml}} and wait until it starts completely
# {{jps -v | grep "\-D\[Standalone\]"}} to figure out the PID of the newly started server
# {{bminstall.sh -b -Dorg.jboss.byteman.transform.all $PID}}
# {{bmsubmit.sh reproducer.btm}}, where {{reproducer.btm}} is a Byteman script reproduced below
# {{./bin/jboss-cli.sh -c}}
# {{:read-resource}}, repeated a few times
# {{:shutdown(timeout=1)}}

The Byteman script:

{code}
RULE XnioWorker.TaskPool/ThreadPoolExecutor shutdown
CLASS java.util.concurrent.ThreadPoolExecutor
METHOD shutdown()
AFTER INVOKE advanceRunState
IF TRUE
DO Thread.sleep(10000)
ENDRULE

RULE Remoting onClose handler
CLASS org.jboss.remoting3.remote.RemoteReadListener$1
METHOD handleEvent(java.nio.channels.Channel)
AT ENTRY
IF TRUE
DO Thread.sleep(5000)
ENDRULE
{code}
