[
https://issues.jboss.org/browse/WFLY-5304?page=com.atlassian.jira.plugin....
]
Richard Achmatowicz edited comment on WFLY-5304 at 3/3/16 3:08 PM:
-------------------------------------------------------------------
After discussion with Paul, its should the the case that the three services involved:
- EndpointService (which receives EJB-related messages on the Remoting channel jboss.ejb)
- EJBRemoteConnectorService (which processes these as invocations)
- DeploymentRepository (which allows locating deployments)
should have dependencies in place such that the DeploymentRepository and EndpointService
do not stop before the EJBRemoteConnectorService stops. This seems to be the case when I
instrument the build with some trace logging and introduce delays in stopping:
{noformat}
13:30:54,767 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint
lenovo:MANAGEMENT - begin
13:30:54,768 INFO [stdout] (MSC service thread 1-1)
org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - begin
13:30:54,768 INFO [stdout] (MSC service thread 1-1)
org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - sleeping
13:30:54,769 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint
lenovo:MANAGEMENT - end
13:30:57,768 INFO [stdout] (MSC service thread 1-1)
org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - end
13:30:57,769 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.ejb3.deployment.DeploymentRepository{lenovo): stop() called - begin
13:30:57,769 INFO [stdout] (MSC service thread 1-6)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo - begin
13:30:57,770 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.ejb3.deployment.DeploymentRepository(lenovo): stop() called - end
13:30:57,770 INFO [stdout] (MSC service thread 1-6)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo - end
{noformat}
So, it seems that it should be the case that the DeploymentRepository should be available
as long as The EJBRemoteConnectorService is available.
However, there seems to be no way of preventing this scenario:
- a message arrives at the Remoting endpont
- the message M gets handled by the VersionOneProtocolChannelReceiver and processing in
that handler starts in a separate thread
- shutdown is called
- EJBRemoteConnectorService calls stop, which unregisters the Endpoint jboss.ejb and the
message handler
- DeploymentRepository calls stop
- the processing of the message M continues
- the DeploymentRepository is no longer available and NPE results
This scenario is not unlikely - in our shutdown tests, shutdown starts when the server is
under load.
In other words, for invocations which have arrived and have started processing but have
not yet reached the interceptor stage, in the presence of a concurrent shutdown request,
there is no mechanism to prevent the three components above from shutting down before the
initial stages of message processing have completed.
ControlPoints are used at the EJB interceptor level as they can (i) block access to
specific EJB interfaces and (ii) count outstanding requests (those which have passed
through the interceptor but have not yet returned). So using a ControlPoint for this issue
here might not be appropriate. Also, during suspend, the DeploymentRepository will always
be available any way.
Here, the problem is that once shutdown has started, the EJB service is no longer
guaranteed to be available.
What we need is a way to check when the server is shutting down, so that when doing the
initial EJB request processing, we can just return null in the lookup of finfdEJB().
was (Author: rachmato):
After discussion with Paul, its should the the case that the three services involved:
- EndpointService (which receives EJB-related messages on the Remoting channel jboss.ejb)
- EJBRemoteConnectorService (which processes these as invocations)
- DeploymentRepository (which allows locating deployments)
should have dependencies in place such that the DeploymentRepository and EndpointService
do not stop before the EJBRemoteConnectorService stops. This seems to be the case when I
instrument the build with some trace logging and introduce delays in stopping:
{noformat}
13:30:54,767 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint
lenovo:MANAGEMENT - begin
13:30:54,768 INFO [stdout] (MSC service thread 1-1)
org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - begin
13:30:54,768 INFO [stdout] (MSC service thread 1-1)
org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - sleeping
13:30:54,769 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint
lenovo:MANAGEMENT - end
13:30:57,768 INFO [stdout] (MSC service thread 1-1)
org.jboss.as.ejb3.remote.EJBRemoteConnectorService(lenovo): stop() called - end
13:30:57,769 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.ejb3.deployment.DeploymentRepository{lenovo): stop() called - begin
13:30:57,769 INFO [stdout] (MSC service thread 1-6)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo - begin
13:30:57,770 INFO [stdout] (MSC service thread 1-3)
org.jboss.as.ejb3.deployment.DeploymentRepository(lenovo): stop() called - end
13:30:57,770 INFO [stdout] (MSC service thread 1-6)
org.jboss.as.remoting.EndpointService(lenovo): stop() called for endpoint lenovo - end
{noformat}
So, it seems that it should be the case that the DeploymentRepository should be available
as long as The EJBRemoteConnectorService is available.
However, there seems to be no way of preventing this scenario:
- a message arrives at the Remoting endpont
- the message M gets handled by the VersionOneProtocolChannelReceiver and processing in
that handler starts in a separate thread
- shutdown is called
- EJBRemoteConnectorService calls stop, which unregisters the Endpoint jboss.ejb and the
message handler
- DeploymentRepository calls stop
- the processing of the message M continues
- the DeploymentRepository is no longer available and NPE results
This scenario is not unlikely - in our shutdown tests, shutdown starts when the server is
under load.
In other words, for invocations which have arrived and have started processing but have
not yet reached the interceptor stage, in the presence of a concurrent shutdown request,
there is no mechanism to prevent the three components above from shutting down before the
initial stages of message processing have completed.
In other words, we could probably use a ControlPoint at this point in EJB request
processing rather than later on during interceptor processing. A good spot might be in the
findEJB() method which is the first place the DeploymentRepository is accessed and which
could easily return a null indicating that the EJB service as a whole (and not just one
specific EJB application, as addressed by the suspend interceptor) is not available.
NullPointerException during server graceful shutdown after EJBClient
invocation arrived
---------------------------------------------------------------------------------------
Key: WFLY-5304
URL:
https://issues.jboss.org/browse/WFLY-5304
Project: WildFly
Issue Type: Bug
Components: Clustering, EJB, Remoting
Affects Versions: 10.0.0.Beta1
Reporter: Michal Vinkler
Assignee: Richard Achmatowicz
Priority: Minor
EAP 7.0.0.DR9
Scenario: ejb-ejbremote-shutdown-repl-async
Perf21 logged NPE just before it gracefully shut down:
{code}
[JBossINF] [0m[31m09:02:25,058 ERROR [org.jboss.as.ejb3.remote] (default task-24)
WFLYEJB0148: Exception on channel Channel ID 423b1c54 (inbound) of Remoting connection
41f997cb to /10.16.90.52:54492 from message
org.jboss.remoting3.remote.InboundMessage$3@45c60b2c: java.lang.NullPointerException
[JBossINF] at
org.jboss.as.ejb3.deployment.DeploymentRepository.getStartedModules(DeploymentRepository.java:152)
[JBossINF] at
org.jboss.as.ejb3.remote.protocol.versionone.EJBIdentifierBasedMessageHandler.findEJB(EJBIdentifierBasedMessageHandler.java:46)
[JBossINF] at
org.jboss.as.ejb3.remote.protocol.versionone.MethodInvocationMessageHandler.processMessage(MethodInvocationMessageHandler.java:124)
[JBossINF] at
org.jboss.as.ejb3.remote.protocol.versionone.VersionOneProtocolChannelReceiver.processMessage(VersionOneProtocolChannelReceiver.java:213)
[JBossINF] at
org.jboss.as.ejb3.remote.protocol.versiontwo.VersionTwoProtocolChannelReceiver.processMessage(VersionTwoProtocolChannelReceiver.java:76)
[JBossINF] at
org.jboss.as.ejb3.remote.protocol.versionone.VersionOneProtocolChannelReceiver.handleMessage(VersionOneProtocolChannelReceiver.java:157)
[JBossINF] at
org.jboss.remoting3.remote.RemoteConnectionChannel$5.run(RemoteConnectionChannel.java:463)
[JBossINF] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[JBossINF] at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[JBossINF] at java.lang.Thread.run(Thread.java:745)
[JBossINF]
[JBossINF] [0m[0m09:02:25,104 INFO [org.infinispan.eviction.impl.PassivationManagerImpl]
(ServerService Thread Pool -- 81) ISPN000029: Passivating all entries to disk
[JBossINF] [0m[0m09:02:25,108 INFO [org.infinispan.eviction.impl.PassivationManagerImpl]
(ServerService Thread Pool -- 81) ISPN000030: Passivated 3 entries in 3 milliseconds
[JBossINF] [0m[0m09:02:25,131 INFO [org.wildfly.extension.undertow] (MSC service thread
1-7) WFLYUT0008: Undertow HTTP listener default suspending
[JBossINF] [0m[0m09:02:25,132 INFO [org.jboss.as.clustering.infinispan] (ServerService
Thread Pool -- 81) WFLYCLINF0003: Stopped repl cache from ejb container
[JBossINF] [0m[0m09:02:25,148 INFO [org.wildfly.extension.undertow] (MSC service thread
1-4) WFLYUT0008: Undertow AJP listener ajp suspending
[JBossINF] [0m[0m09:02:25,151 INFO [org.wildfly.extension.undertow] (MSC service thread
1-7) WFLYUT0007: Undertow HTTP listener default stopped, was bound to
perf21/10.16.90.60:8080
[JBossINF] [0m[0m09:02:25,156 INFO [org.jboss.as.server.deployment] (MSC service thread
1-5) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-ejb.jar) in
1024ms
[JBossINF] [0m[0m09:02:25,156 INFO [org.jboss.as.server.deployment] (MSC service thread
1-2) WFLYSRV0208: Stopped subdeployment (runtime-name:
clusterbench-ee7-web-passivating.war) in 1055ms
[JBossINF] [0m[0m09:02:25,156 INFO [org.jboss.as.server.deployment] (MSC service thread
1-3) WFLYSRV0208: Stopped subdeployment (runtime-name: clusterbench-ee7-web-default.war)
in 1041ms
[JBossINF] [0m[0m09:02:25,167 INFO
[org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1)
ISPN000080: Disconnecting JGroups channel ejb
[JBossINF] [0m[0m09:02:25,169 INFO [org.wildfly.extension.undertow] (MSC service thread
1-4) WFLYUT0007: Undertow AJP listener ajp stopped, was bound to perf21/10.16.90.60:8009
[JBossINF] [0m[0m09:02:25,169 INFO [org.jboss.as.server.deployment] (MSC service thread
1-5) WFLYSRV0028: Stopped deployment clusterbench-ee7.ear (runtime-name:
clusterbench-ee7.ear) in 1054ms
[JBossINF] [0m[0m09:02:25,170 INFO
[org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1)
ISPN000082: Stopping the RpcDispatcher for channel ejb
[JBossINF] [0m[0m09:02:25,171 INFO [org.wildfly.extension.undertow] (MSC service thread
1-3) WFLYUT0004: Undertow 1.3.0.Beta9 stopping
[JBossINF] [0m[0m09:02:25,275 INFO [org.jboss.as] (MSC service thread 1-2) WFLYSRV0050:
EAP 7.0.0.Alpha1 (WildFly Core 2.0.0.Beta4) stopped in 1125ms
{code}
Graceful shutdown timeout was set to 300 seconds.
It does not seem to generate any kind of error on the client.
The similar issue was already mentioned by Richard Achmatowicz here:
https://issues.jboss.org/browse/WFLY-4697?focusedCommentId=13076866&p...
Server log:
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-ejb-e...
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)