[jboss-jira] [JBoss JIRA] (WFLY-5304) NullPointerException during server graceful shutdown after EJBClient invocation arrived

Richard Achmatowicz (JIRA) issues at jboss.org
Thu Mar 3 15:09:00 EST 2016


    [ https://issues.jboss.org/browse/WFLY-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171633#comment-13171633 ] 

Richard Achmatowicz edited comment on WFLY-5304 at 3/3/16 3:08 PM:
-------------------------------------------------------------------

After discussion with Paul, its should the the case that the three services involved:
- EndpointService (which receives EJB-related messages on the Remoting channel jboss.ejb) 
- EJBRemoteConnectorService (which processes these as invocations)
- DeploymentRepository (which allows locating deployments)


should have dependencies in place such that the DeploymentRepository and  EndpointService do not stop before the EJBRemoteConnectorService stops. This seems to be the case when I instrument the build with some trace logging and introduce delays in stopping:

{noformat}
13:30:54,767 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.remoting.EndpointService(lenovo): stop() called  for endpoint lenovo:MANAGEMENT - begin
13:30:54,768 INFO  [stdout] (MSC service thread 1-1) org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - begin 
13:30:54,768 INFO  [stdout] (MSC service thread 1-1) org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - sleeping 
13:30:54,769 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo:MANAGEMENT - end
13:30:57,768 INFO  [stdout] (MSC service thread 1-1) org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - end 
13:30:57,769 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.ejb3.deployment.DeploymentRepository{lenovo): stop() called - begin 
13:30:57,769 INFO  [stdout] (MSC service thread 1-6) org.jboss.as.remoting.EndpointService(lenovo): stop() called  for endpoint lenovo - begin
13:30:57,770 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.ejb3.deployment.DeploymentRepository(lenovo): stop() called - end 
13:30:57,770 INFO  [stdout] (MSC service thread 1-6) org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo - end
{noformat}

So, it seems that it should be the case that the DeploymentRepository should be available as long as The EJBRemoteConnectorService is available.
  
However, there seems to be no way of preventing this scenario:
- a message arrives at the Remoting endpont
- the message M gets handled by the VersionOneProtocolChannelReceiver and processing in that handler starts in a separate thread
- shutdown is called
- EJBRemoteConnectorService calls stop, which unregisters the Endpoint jboss.ejb and the message handler
- DeploymentRepository calls stop
- the processing of the message M continues
- the DeploymentRepository is no longer available and NPE results

This scenario is not unlikely - in our shutdown tests, shutdown starts when the server is under load. 

In other words, for invocations which have arrived and have started processing but have not yet reached the interceptor stage, in the presence of a concurrent shutdown request, there is no mechanism to prevent the three components above from shutting down before the initial stages of message processing have completed.

ControlPoints are used at the EJB interceptor level as they can (i) block access to specific EJB interfaces and (ii) count outstanding requests (those which have passed through the interceptor but have not yet returned). So using a ControlPoint for this issue here might not be appropriate. Also, during suspend, the DeploymentRepository will always be available any way.

Here, the problem is that once shutdown has started, the EJB service is no longer guaranteed to be available.
 
What we need is a way to check when the server is shutting down, so that when doing the initial EJB request processing, we can just return null in the lookup of finfdEJB().



was (Author: rachmato):
After discussion with Paul, its should the the case that the three services involved:
- EndpointService (which receives EJB-related messages on the Remoting channel jboss.ejb) 
- EJBRemoteConnectorService (which processes these as invocations)
- DeploymentRepository (which allows locating deployments)


should have dependencies in place such that the DeploymentRepository and  EndpointService do not stop before the EJBRemoteConnectorService stops. This seems to be the case when I instrument the build with some trace logging and introduce delays in stopping:

{noformat}
13:30:54,767 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.remoting.EndpointService(lenovo): stop() called  for endpoint lenovo:MANAGEMENT - begin
13:30:54,768 INFO  [stdout] (MSC service thread 1-1) org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - begin 
13:30:54,768 INFO  [stdout] (MSC service thread 1-1) org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - sleeping 
13:30:54,769 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo:MANAGEMENT - end
13:30:57,768 INFO  [stdout] (MSC service thread 1-1) org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - end 
13:30:57,769 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.ejb3.deployment.DeploymentRepository{lenovo): stop() called - begin 
13:30:57,769 INFO  [stdout] (MSC service thread 1-6) org.jboss.as.remoting.EndpointService(lenovo): stop() called  for endpoint lenovo - begin
13:30:57,770 INFO  [stdout] (MSC service thread 1-3) org.jboss.as.ejb3.deployment.DeploymentRepository(lenovo): stop() called - end 
13:30:57,770 INFO  [stdout] (MSC service thread 1-6) org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo - end
{noformat}

So, it seems that it should be the case that the DeploymentRepository should be available as long as The EJBRemoteConnectorService is available.
  
However, there seems to be no way of preventing this scenario:
- a message arrives at the Remoting endpont
- the message M gets handled by the VersionOneProtocolChannelReceiver and processing in that handler starts in a separate thread
- shutdown is called
- EJBRemoteConnectorService calls stop, which unregisters the Endpoint jboss.ejb and the message handler
- DeploymentRepository calls stop
- the processing of the message M continues
- the DeploymentRepository is no longer available and NPE results

This scenario is not unlikely - in our shutdown tests, shutdown starts when the server is under load. 

In other words, for invocations which have arrived and have started processing but have not yet reached the interceptor stage, in the presence of a concurrent shutdown request, there is no mechanism to prevent the three components above from shutting down before the initial stages of message processing have completed.

In other words, we could probably use a ControlPoint at this point in EJB request processing rather than later on during interceptor processing. A good spot might be in the findEJB() method which is the first place the DeploymentRepository is accessed and which could easily return a null indicating that the EJB service as a whole (and not just one specific EJB application, as addressed by the suspend interceptor) is not available.



> NullPointerException during server graceful shutdown after EJBClient invocation arrived
> ---------------------------------------------------------------------------------------
>
>                 Key: WFLY-5304
>                 URL: https://issues.jboss.org/browse/WFLY-5304
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering, EJB, Remoting
>    Affects Versions: 10.0.0.Beta1
>            Reporter: Michal Vinkler
>            Assignee: Richard Achmatowicz
>            Priority: Minor
>
> EAP 7.0.0.DR9
> Scenario: ejb-ejbremote-shutdown-repl-async
> Perf21 logged NPE just before it gracefully shut down:
> {code}
> [JBossINF] [0m[31m09:02:25,058 ERROR [org.jboss.as.ejb3.remote] (default task-24) WFLYEJB0148: Exception on channel Channel ID 423b1c54 (inbound) of Remoting connection 41f997cb to /10.16.90.52:54492 from message org.jboss.remoting3.remote.InboundMessage$3 at 45c60b2c: java.lang.NullPointerException
> [JBossINF] 	at org.jboss.as.ejb3.deployment.DeploymentRepository.getStartedModules(DeploymentRepository.java:152)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.EJBIdentifierBasedMessageHandler.findEJB(EJBIdentifierBasedMessageHandler.java:46)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.MethodInvocationMessageHandler.processMessage(MethodInvocationMessageHandler.java:124)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.VersionOneProtocolChannelReceiver.processMessage(VersionOneProtocolChannelReceiver.java:213)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versiontwo.VersionTwoProtocolChannelReceiver.processMessage(VersionTwoProtocolChannelReceiver.java:76)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.VersionOneProtocolChannelReceiver.handleMessage(VersionOneProtocolChannelReceiver.java:157)
> [JBossINF] 	at org.jboss.remoting3.remote.RemoteConnectionChannel$5.run(RemoteConnectionChannel.java:463)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [JBossINF] 	at java.lang.Thread.run(Thread.java:745)
> [JBossINF] 
> [JBossINF] [0m[0m09:02:25,104 INFO  [org.infinispan.eviction.impl.PassivationManagerImpl] (ServerService Thread Pool -- 81) ISPN000029: Passivating all entries to disk
> [JBossINF] [0m[0m09:02:25,108 INFO  [org.infinispan.eviction.impl.PassivationManagerImpl] (ServerService Thread Pool -- 81) ISPN000030: Passivated 3 entries in 3 milliseconds
> [JBossINF] [0m[0m09:02:25,131 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0008: Undertow HTTP listener default suspending
> [JBossINF] [0m[0m09:02:25,132 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 81) WFLYCLINF0003: Stopped repl cache from ejb container
> [JBossINF] [0m[0m09:02:25,148 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0008: Undertow AJP listener ajp suspending
> [JBossINF] [0m[0m09:02:25,151 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0007: Undertow HTTP listener default stopped, was bound to perf21/10.16.90.60:8080
> [JBossINF] [0m[0m09:02:25,156 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-5) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-ejb.jar) in 1024ms
> [JBossINF] [0m[0m09:02:25,156 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-web-passivating.war) in 1055ms
> [JBossINF] [0m[0m09:02:25,156 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-3) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-web-default.war) in 1041ms
> [JBossINF] [0m[0m09:02:25,167 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000080: Disconnecting JGroups channel ejb
> [JBossINF] [0m[0m09:02:25,169 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0007: Undertow AJP listener ajp stopped, was bound to perf21/10.16.90.60:8009
> [JBossINF] [0m[0m09:02:25,169 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-5) WFLYSRV0028: Stopped deployment clusterbench-ee7.ear (runtime-name: clusterbench-ee7.ear) in 1054ms
> [JBossINF] [0m[0m09:02:25,170 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000082: Stopping the RpcDispatcher for channel ejb
> [JBossINF] [0m[0m09:02:25,171 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0004: Undertow 1.3.0.Beta9 stopping
> [JBossINF] [0m[0m09:02:25,275 INFO  [org.jboss.as] (MSC service thread 1-2) WFLYSRV0050: EAP 7.0.0.Alpha1 (WildFly Core 2.0.0.Beta4) stopped in 1125ms
> {code}
> Graceful shutdown timeout was set to 300 seconds. 
> It does not seem to generate any kind of error on the client.
> The similar issue was already mentioned by Richard Achmatowicz here:
> https://issues.jboss.org/browse/WFLY-4697?focusedCommentId=13076866&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13076866
> Server log:
> http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-ejb-ejbremote-shutdown-repl-async/6/console-perf21/



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)


More information about the jboss-jira mailing list