]
Richard Achmatowicz commented on WFLY-11292:
--------------------------------------------
On the surface, what seems to be happening here is that the EJBClient is trying to find a
target for an invocation (presumably an invocation on a clustered deployment) and for some
reason it finds no nodes in its node database which can satisfy the invocation. This
triggers a re-connection attempt where it will try to reconnect to all previously
registered nodes (i.e. host/port instances). The attempt to reconnect to perf18 is failing
with a Remoting exception "org.jboss.remoting3.NotOpenException: Cannot open new
channel because close was initiated". Quickly looking at the code suggests that the
state machine that manages calls to openConnection has detected that both open and close
are being called on the connection and so it throws the exception. This seems to be a low
level issue and may be a red herring.
There should be no reason why the EJBClient should not be able to find a target for a
clustered invocation when three of the four nodes are still up and running and have the
deployment deployed. We need to look at the topology updates received from the various
servers to see how the "node database" is being updated/modified on the client
as a first step.
We can see the topology updates by enabling debug logging for these classes:
org.jboss.ejb.client.remoting.ClusterTopologyMessageHandler
org.jboss.ejb.client.remoting.ClusterNodeRemovalHandler
org.jboss.ejb.client.remoting.ClusterRemovalMessageHandler
org.jboss.ejb.client.remoting.ModuleAvailabilityMessageHandler
[~tommaso-borgato] Please enable DEBUG logging for these classes and run the test again.
If the error is reproducible with a fewer number of clients (e.g. 5 instead of 2000) then
please run with fewer clients. Otherwise, the logs will be very big as each client will
advertise its topology updates.
Legacy EJB Client: High fail rate
---------------------------------
Key: WFLY-11292
URL:
https://issues.jboss.org/browse/WFLY-11292
Project: WildFly
Issue Type: Bug
Components: Clustering, EJB
Affects Versions: 15.0.0.Alpha1
Reporter: tommaso borgato
Assignee: Flavia Rainone
Priority: Blocker
This bug is being filed as Blocker because we are observing and elevated fail rate:
roughly a thousand each run for (about 0.3%).
h2. WildFly Built from master branch on 6 Nov 2018
With this WildFly version (client org.jboss:jboss-ejb-client-legacy:3.0.2.Final-redhat-1)
in a scenario with 4 clustered nodes where nodes are failed via jboss shut-down / restart:
after node 1 of 4 is shut-down, a a series of errors start on the client side the yield to
1097 errors on a total of 340218 samples;
find
[
here|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/...]
the complete logs;
the start of client errors is here:
{noformat}
2018/11/07 05:28:51:255 EST [INFO ][TestController] HOST
perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Failing node 0 (perf18)
2018/11/07 05:28:51:270 EST [INFO ][StatsRunner] HOST
perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Total: Sessions: 2000, active: 2000,
samples: 5118, throughput: 511.7 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms, mean:
1 ms, max: 54 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5118 (100%)
2018/11/07 05:28:51:270 EST [INFO ][StatsRunner] HOST
perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - unknown-node: Sessions: 2000, active:
2000, samples: 5118, throughput: 511.7 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms,
mean: 1 ms, max: 54 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5118
(100%)
05:28:53,257 INFO [org.jboss.ejb.client.remoting] (Remoting
"config-based-ejb-client-endpoint" task-2) EJBCLIENT000016: Channel Channel ID
c540daf7 (outbound) of Remoting connection 4ed5523c to perf18/10.16.90.54:8080 of endpoint
"config-based-ejb-client-endpoint" <664ac6d5> can no longer process
messages
05:28:53,277 ERROR [org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver]
(Remoting "config-based-ejb-client-endpoint" task-2) Failed to open channel for
context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext@3cd51445,
receiver=Remoting connection EJB receiver
[connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection(a)302066a5,channel=jboss.ejb,nodename=perf18]}
org.jboss.remoting3.NotOpenException: Cannot open new channel because close was
initiated
at
org.jboss.remoting3.remote.RemoteConnectionHandler.handleOutboundChannelOpen(RemoteConnectionHandler.java:198)
at
org.jboss.remoting3.remote.RemoteConnectionHandler.open(RemoteConnectionHandler.java:335)
at org.jboss.remoting3.ConnectionImpl.openChannel(ConnectionImpl.java:109)
at
org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection.openChannel(ConnectionPool.java:292)
at
org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.associate(RemotingConnectionEJBReceiver.java:180)
at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:399)
at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:349)
at
org.jboss.ejb.client.remoting.EJBClientContextConnectionReconnectHandler.reconnect(EJBClientContextConnectionReconnectHandler.java:67)
at
org.jboss.ejb.client.EJBClientContext$ReconnectAttempt.run(EJBClientContext.java:1474)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}
h2. WildFly Built from master branch on 7 Nov 2018
Same situation.
find
[
here|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/...]
the complete logs;
the start of client errors is here:
{noformat}
2018/11/07 09:23:22:916 EST [INFO ][TestController] HOST
perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Failing node 0 (perf18)
2018/11/07 09:23:22:930 EST [INFO ][StatsRunner] HOST
perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Total: Sessions: 2000, active: 2000,
samples: 5125, throughput: 512.4 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms, mean:
2 ms, max: 32 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5125 (100%)
2018/11/07 09:23:22:930 EST [INFO ][StatsRunner] HOST
perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - unknown-node: Sessions: 2000, active:
2000, samples: 5125, throughput: 512.4 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms,
mean: 2 ms, max: 32 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5125
(100%)
09:23:24,924 INFO [org.jboss.ejb.client.remoting] (Remoting
"config-based-ejb-client-endpoint" task-4) EJBCLIENT000016: Channel Channel ID
d940c707 (outbound) of Remoting connection 73cfdf01 to perf18/10.16.90.54:8080 of endpoint
"config-based-ejb-client-endpoint" <7501616b> can no longer process
messages
09:23:24,949 ERROR [org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver]
(Remoting "config-based-ejb-client-endpoint" task-6) Failed to open channel for
context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext@56248025,
receiver=Remoting connection EJB receiver
[connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection(a)5269f48d,channel=jboss.ejb,nodename=perf18]}
org.jboss.remoting3.NotOpenException: Cannot open new channel because close was
initiated
at
org.jboss.remoting3.remote.RemoteConnectionHandler.handleOutboundChannelOpen(RemoteConnectionHandler.java:198)
at
org.jboss.remoting3.remote.RemoteConnectionHandler.open(RemoteConnectionHandler.java:335)
at org.jboss.remoting3.ConnectionImpl.openChannel(ConnectionImpl.java:109)
at
org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection.openChannel(ConnectionPool.java:292)
at
org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.associate(RemotingConnectionEJBReceiver.java:180)
at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:399)
at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:349)
at
org.jboss.ejb.client.remoting.EJBClientContextConnectionReconnectHandler.reconnect(EJBClientContextConnectionReconnectHandler.java:67)
at
org.jboss.ejb.client.EJBClientContext$ReconnectAttempt.run(EJBClientContext.java:1474)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}