[jboss-jira] [JBoss JIRA] (WFLY-5304) NullPointerException during server graceful shutdown after EJBClient invocation arrived

Fri Mar 4 11:31:00 EST 2016

    [ https://issues.jboss.org/browse/WFLY-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172129#comment-13172129 ] 

Richard Achmatowicz edited comment on WFLY-5304 at 3/4/16 11:30 AM:
--------------------------------------------------------------------

Upon reflection, I think I wasn't looking at this issue correctly, focussing too much on one specific issue: what to do when the server is shutting down and we have method invocations still arriving. I think there is a larger issue to deal with.

VersionOneProtocolChannelReceiver handles four classes of incoming EJB-related messages:
- method invocations
- session creation requests
- async invocation cancellation requests
- EJB client transaction-related requests (prepare, commit, rollback, forget, before completion)

The issue is this: how do we respond to incoming requests of all of these types:
- when the server is in the process of suspending (in the pre-suspend phase)
- when the server has suspended

The first step was to introduce the EjbSuspendInterceptor in the invocation interceptor chain. This causes a NoSuchEJBException to be returned for all invocations arriving after pre-suspend has started. Such a response tells the EJB client to look for another server on which to make the invocation. This happens on a per-invocation basis.
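
To make this first step concrete, here is a minimal sketch of the kind of gate such an interceptor provides. It is not the actual EjbSuspendInterceptor code: SuspendGateSketch, its method names, and NoSuchEjbSketchException (a local stand-in for javax.ejb.NoSuchEJBException) are all hypothetical, and the real interceptor sits in the invocation interceptor chain rather than wrapping a Supplier.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Local stand-in for javax.ejb.NoSuchEJBException so the sketch compiles on its own. */
class NoSuchEjbSketchException extends RuntimeException {
    NoSuchEjbSketchException(String message) {
        super(message);
    }
}

/** Hypothetical per-invocation gate; illustrative only. */
final class SuspendGateSketch {
    // true while the server accepts new invocations
    private final AtomicBoolean accepting = new AtomicBoolean(true);

    /** Called when pre-suspend starts: new invocations are rejected from now on. */
    void preSuspend() {
        accepting.set(false);
    }

    /** Called when the server resumes: invocations are accepted again. */
    void resume() {
        accepting.set(true);
    }

    /** Runs the rest of the chain, or rejects the invocation if the gate is closed. */
    <T> T invoke(String ejbName, Supplier<T> next) {
        if (!accepting.get()) {
            // The client is expected to treat this as "try another server".
            throw new NoSuchEjbSketchException(
                    "EJB " + ejbName + " is not available: server is suspending");
        }
        return next.get();
    }
}
```

Note that the check and the call to the rest of the chain are not atomic in this sketch, so an invocation can pass the check just before preSuspend() closes the gate.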

The second step was to introduce a ServerActivity interface for VersionOne. Its purpose was to allow the server to tell all connected clients when it was suspending (i.e. all EJB deployments on the server are no longer available, go look for another server) and when it was resuming (i.e. all EJB deployments on the server are available again). The idea was to inform clients so that they would stop directing messages to the suspended server.
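
As a rough illustration of this second step, the sketch below broadcasts an availability message to every connected client on suspend and on resume. All names here (ClientChannelSketch, SuspendNotifierSketch, and the two message strings) are hypothetical; the real ServerActivity callback and the VersionOne wire messages are not reproduced.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical view of one connected client channel. */
interface ClientChannelSketch {
    void send(String message);
}

/** Hypothetical notifier driven by suspend/resume callbacks. */
final class SuspendNotifierSketch {
    // Made-up message names, for illustration only.
    static final String UNAVAILABLE = "DEPLOYMENTS_UNAVAILABLE";
    static final String AVAILABLE = "DEPLOYMENTS_AVAILABLE";

    // Copy-on-write so channels can register while a broadcast is in progress.
    private final List<ClientChannelSketch> clients = new CopyOnWriteArrayList<>();

    void register(ClientChannelSketch client) {
        clients.add(client);
    }

    /** Server is suspending: tell clients to look for another server. */
    void preSuspend() {
        broadcast(UNAVAILABLE);
    }

    /** Server has resumed: tell clients the deployments are available again. */
    void resume() {
        broadcast(AVAILABLE);
    }

    private void broadcast(String message) {
        for (ClientChannelSketch client : clients) {
            client.send(message);
        }
    }
}
```

A notification like this is only advisory: a client may already have a request in flight when the message reaches it.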

The problems:
- even with the ServerActivity added, requests of all four classes can still be sent from the client to the server and start processing, even when the server is actually suspended, due to a race condition
- we don't seem to have a policy in place for what to send back to the client in the case of such requests (other than invocations) when the server is suspended

One possible solution is to expand the ServerActivity callback in VersionOne beyond notifying clients that the server is in pre-suspend, suspended, or resumed. For example, it could provide local awareness of whether the server is in pre-suspend, suspended, or resumed, and use that information to handle arriving requests. In particular, we need to decide what to do in each of these cases:

MethodInvocationMessageHandler (handles invocation processing):
* pre-suspend: currently, NoSuchEJBException is sent to the client
* suspended: currently, NoSuchEJBException is sent to the client via EjbSuspendInterceptor, but if the server follows the suspend with a shutdown, this can result in the NPE described in this issue
* resumed: process as normal

InvocationCancellationMessageHandler (handles cancelling asynchronous invocation task):
* pre-suspend: TBD 
* suspended: TBD
* resumed: TBD

SessionOpenRequestHandler (handles creating stateful session and returning id):
* pre-suspend: TBD 
* suspended: TBD
* resumed: TBD

TransactionRequestHandler (handles transaction requests prepare, commit, rollback, forget, before completion):
* pre-suspend: TBD 
* suspended: TBD
* resumed: TBD
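
The cases above could be captured in a small state-aware policy table, sketched below. This is only an illustration of the idea: all type names are hypothetical, only the MethodInvocationMessageHandler row is filled in (with the current behaviour listed above), and the TBD rows are deliberately left undecided rather than guessed.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

enum ServerStateSketch { PRE_SUSPEND, SUSPENDED, RESUMED }

enum MessageClassSketch { METHOD_INVOCATION, INVOCATION_CANCELLATION, SESSION_OPEN, TRANSACTION }

enum ActionSketch { PROCESS, REJECT_NO_SUCH_EJB }

/** Hypothetical policy lookup keyed on message class and locally tracked server state. */
final class SuspendPolicySketch {
    private final Map<MessageClassSketch, Map<ServerStateSketch, ActionSketch>> table =
            new EnumMap<>(MessageClassSketch.class);

    // Updated from the suspend/resume callbacks; volatile so handler threads see changes.
    private volatile ServerStateSketch state = ServerStateSketch.RESUMED;

    SuspendPolicySketch() {
        // Only the method invocation row is known from the list above.
        Map<ServerStateSketch, ActionSketch> invocation = new EnumMap<>(ServerStateSketch.class);
        invocation.put(ServerStateSketch.PRE_SUSPEND, ActionSketch.REJECT_NO_SUCH_EJB);
        invocation.put(ServerStateSketch.SUSPENDED, ActionSketch.REJECT_NO_SUCH_EJB);
        invocation.put(ServerStateSketch.RESUMED, ActionSketch.PROCESS);
        table.put(MessageClassSketch.METHOD_INVOCATION, invocation);
        // Cancellation, session open and transaction rows are TBD, so they are absent.
    }

    void onStateChange(ServerStateSketch newState) {
        state = newState;
    }

    /** Returns the decided action, or empty where the policy is still TBD. */
    Optional<ActionSketch> decide(MessageClassSketch messageClass) {
        Map<ServerStateSketch, ActionSketch> row = table.get(messageClass);
        return row == null ? Optional.empty() : Optional.ofNullable(row.get(state));
    }
}
```

Returning an empty Optional for the undecided rows forces each handler to deal with the missing policy explicitly, which is the gap this comment is pointing at.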

> NullPointerException during server graceful shutdown after EJBClient invocation arrived
> ---------------------------------------------------------------------------------------
>
>                 Key: WFLY-5304
>                 URL: https://issues.jboss.org/browse/WFLY-5304
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering, EJB, Remoting
>    Affects Versions: 10.0.0.Beta1
>            Reporter: Michal Vinkler
>            Assignee: Richard Achmatowicz
>            Priority: Minor
>
> EAP 7.0.0.DR9
> Scenario: ejb-ejbremote-shutdown-repl-async
> Perf21 logged NPE just before it gracefully shut down:
> {code}
> [JBossINF] 09:02:25,058 ERROR [org.jboss.as.ejb3.remote] (default task-24) WFLYEJB0148: Exception on channel Channel ID 423b1c54 (inbound) of Remoting connection 41f997cb to /10.16.90.52:54492 from message org.jboss.remoting3.remote.InboundMessage$3 at 45c60b2c: java.lang.NullPointerException
> [JBossINF] 	at org.jboss.as.ejb3.deployment.DeploymentRepository.getStartedModules(DeploymentRepository.java:152)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.EJBIdentifierBasedMessageHandler.findEJB(EJBIdentifierBasedMessageHandler.java:46)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.MethodInvocationMessageHandler.processMessage(MethodInvocationMessageHandler.java:124)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.VersionOneProtocolChannelReceiver.processMessage(VersionOneProtocolChannelReceiver.java:213)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versiontwo.VersionTwoProtocolChannelReceiver.processMessage(VersionTwoProtocolChannelReceiver.java:76)
> [JBossINF] 	at org.jboss.as.ejb3.remote.protocol.versionone.VersionOneProtocolChannelReceiver.handleMessage(VersionOneProtocolChannelReceiver.java:157)
> [JBossINF] 	at org.jboss.remoting3.remote.RemoteConnectionChannel$5.run(RemoteConnectionChannel.java:463)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [JBossINF] 	at java.lang.Thread.run(Thread.java:745)
> [JBossINF] 
> [JBossINF] 09:02:25,104 INFO  [org.infinispan.eviction.impl.PassivationManagerImpl] (ServerService Thread Pool -- 81) ISPN000029: Passivating all entries to disk
> [JBossINF] 09:02:25,108 INFO  [org.infinispan.eviction.impl.PassivationManagerImpl] (ServerService Thread Pool -- 81) ISPN000030: Passivated 3 entries in 3 milliseconds
> [JBossINF] 09:02:25,131 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0008: Undertow HTTP listener default suspending
> [JBossINF] 09:02:25,132 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 81) WFLYCLINF0003: Stopped repl cache from ejb container
> [JBossINF] 09:02:25,148 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0008: Undertow AJP listener ajp suspending
> [JBossINF] 09:02:25,151 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0007: Undertow HTTP listener default stopped, was bound to perf21/10.16.90.60:8080
> [JBossINF] 09:02:25,156 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-5) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-ejb.jar) in 1024ms
> [JBossINF] 09:02:25,156 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-web-passivating.war) in 1055ms
> [JBossINF] 09:02:25,156 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-3) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-web-default.war) in 1041ms
> [JBossINF] 09:02:25,167 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000080: Disconnecting JGroups channel ejb
> [JBossINF] 09:02:25,169 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0007: Undertow AJP listener ajp stopped, was bound to perf21/10.16.90.60:8009
> [JBossINF] 09:02:25,169 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-5) WFLYSRV0028: Stopped deployment clusterbench-ee7.ear (runtime-name: clusterbench-ee7.ear) in 1054ms
> [JBossINF] 09:02:25,170 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000082: Stopping the RpcDispatcher for channel ejb
> [JBossINF] 09:02:25,171 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0004: Undertow 1.3.0.Beta9 stopping
> [JBossINF] 09:02:25,275 INFO  [org.jboss.as] (MSC service thread 1-2) WFLYSRV0050: EAP 7.0.0.Alpha1 (WildFly Core 2.0.0.Beta4) stopped in 1125ms
> {code}
> Graceful shutdown timeout was set to 300 seconds. 
> It does not seem to generate any kind of error on the client.
> A similar issue was already mentioned by Richard Achmatowicz here:
> https://issues.jboss.org/browse/WFLY-4697?focusedCommentId=13076866&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13076866
> Server log:
> http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-ejb-ejbremote-shutdown-repl-async/6/console-perf21/
