[jboss-jira] [JBoss JIRA] (WFLY-11292) Legacy EJB Client: High fail rate
Richard Achmatowicz (Jira)
issues at jboss.org
Wed Nov 7 11:59:00 EST 2018
[ https://issues.jboss.org/browse/WFLY-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658374#comment-13658374 ]
Richard Achmatowicz commented on WFLY-11292:
--------------------------------------------
On the surface, what seems to be happening here is that the EJBClient is trying to find a target for an invocation (presumably an invocation on a clustered deployment) and for some reason it finds no nodes in its node database which can satisfy the invocation. This triggers a re-connection attempt where it will try to reconnect to all previously registered nodes (i.e. host/port instances). The attempt to reconnect to perf18 is failing with a Remoting exception "org.jboss.remoting3.NotOpenException: Cannot open new channel because close was initiated". Quickly looking at the code suggests that the state machine that manages calls to openConnection has detected that both open and close are being called on the connection and so it throws the exception. This seems to be a low level issue and may be a red herring.
There should be no reason why the EJBClient should not be able to find a target for a clustered invocation when three of the four nodes are still up and running and have the deployment deployed. We need to look at the topology updates received from the various servers to see how the "node database" is being updated/modified on the client as a first step.
We can see the topology updates by enabling debug logging for these classes:
org.jboss.ejb.client.remoting.ClusterTopologyMessageHandler
org.jboss.ejb.client.remoting.ClusterNodeRemovalHandler
org.jboss.ejb.client.remoting.ClusterRemovalMessageHandler
org.jboss.ejb.client.remoting.ModuleAvailabilityMessageHandler
[~tommaso-borgato] Please enable DEBUG logging for these classes and run the test again.
If the error is reproducible with a fewer number of clients (e.g. 5 instead of 2000) then please run with fewer clients. Otherwise, the logs will be very big as each client will advertise its topology updates.
> Legacy EJB Client: High fail rate
> ---------------------------------
>
> Key: WFLY-11292
> URL: https://issues.jboss.org/browse/WFLY-11292
> Project: WildFly
> Issue Type: Bug
> Components: Clustering, EJB
> Affects Versions: 15.0.0.Alpha1
> Reporter: tommaso borgato
> Assignee: Flavia Rainone
> Priority: Blocker
>
> This bug is being filed as Blocker because we are observing and elevated fail rate: roughly a thousand each run for (about 0.3%).
> h2. WildFly Built from master branch on 6 Nov 2018
> With this WildFly version (client org.jboss:jboss-ejb-client-legacy:3.0.2.Final-redhat-1) in a scenario with 4 clustered nodes where nodes are failed via jboss shut-down / restart: after node 1 of 4 is shut-down, a a series of errors start on the client side the yield to 1097 errors on a total of 340218 samples;
> find [here|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-jobs-legacy-ejb-ejbremote/job/perflab_eap-7x-failover-legacy-ejb-ejbremote-shutdown-dist-sync/8/] the complete logs;
> the start of client errors is here:
> {noformat}
> 2018/11/07 05:28:51:255 EST [INFO ][TestController] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Failing node 0 (perf18)
> 2018/11/07 05:28:51:270 EST [INFO ][StatsRunner] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Total: Sessions: 2000, active: 2000, samples: 5118, throughput: 511.7 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms, mean: 1 ms, max: 54 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5118 (100%)
> 2018/11/07 05:28:51:270 EST [INFO ][StatsRunner] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - unknown-node: Sessions: 2000, active: 2000, samples: 5118, throughput: 511.7 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms, mean: 1 ms, max: 54 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5118 (100%)
> 05:28:53,257 INFO [org.jboss.ejb.client.remoting] (Remoting "config-based-ejb-client-endpoint" task-2) EJBCLIENT000016: Channel Channel ID c540daf7 (outbound) of Remoting connection 4ed5523c to perf18/10.16.90.54:8080 of endpoint "config-based-ejb-client-endpoint" <664ac6d5> can no longer process messages
> 05:28:53,277 ERROR [org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver] (Remoting "config-based-ejb-client-endpoint" task-2) Failed to open channel for context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext at 3cd51445, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection at 302066a5,channel=jboss.ejb,nodename=perf18]}
> org.jboss.remoting3.NotOpenException: Cannot open new channel because close was initiated
> at org.jboss.remoting3.remote.RemoteConnectionHandler.handleOutboundChannelOpen(RemoteConnectionHandler.java:198)
> at org.jboss.remoting3.remote.RemoteConnectionHandler.open(RemoteConnectionHandler.java:335)
> at org.jboss.remoting3.ConnectionImpl.openChannel(ConnectionImpl.java:109)
> at org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection.openChannel(ConnectionPool.java:292)
> at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.associate(RemotingConnectionEJBReceiver.java:180)
> at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:399)
> at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:349)
> at org.jboss.ejb.client.remoting.EJBClientContextConnectionReconnectHandler.reconnect(EJBClientContextConnectionReconnectHandler.java:67)
> at org.jboss.ejb.client.EJBClientContext$ReconnectAttempt.run(EJBClientContext.java:1474)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> h2. WildFly Built from master branch on 7 Nov 2018
> Same situation.
> find [here|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-jobs-legacy-ejb-ejbremote/job/perflab_eap-7x-failover-legacy-ejb-ejbremote-shutdown-dist-sync/10/] the complete logs;
> the start of client errors is here:
> {noformat}
> 2018/11/07 09:23:22:916 EST [INFO ][TestController] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Failing node 0 (perf18)
> 2018/11/07 09:23:22:930 EST [INFO ][StatsRunner] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Total: Sessions: 2000, active: 2000, samples: 5125, throughput: 512.4 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms, mean: 2 ms, max: 32 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5125 (100%)
> 2018/11/07 09:23:22:930 EST [INFO ][StatsRunner] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - unknown-node: Sessions: 2000, active: 2000, samples: 5125, throughput: 512.4 samples/s, bandwidth: 0.0 MB/s, response min: 1 ms, mean: 2 ms, max: 32 ms, sampling errors: 0, unhealthy samples: 0, valid samples: 5125 (100%)
> 09:23:24,924 INFO [org.jboss.ejb.client.remoting] (Remoting "config-based-ejb-client-endpoint" task-4) EJBCLIENT000016: Channel Channel ID d940c707 (outbound) of Remoting connection 73cfdf01 to perf18/10.16.90.54:8080 of endpoint "config-based-ejb-client-endpoint" <7501616b> can no longer process messages
> 09:23:24,949 ERROR [org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver] (Remoting "config-based-ejb-client-endpoint" task-6) Failed to open channel for context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext at 56248025, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection at 5269f48d,channel=jboss.ejb,nodename=perf18]}
> org.jboss.remoting3.NotOpenException: Cannot open new channel because close was initiated
> at org.jboss.remoting3.remote.RemoteConnectionHandler.handleOutboundChannelOpen(RemoteConnectionHandler.java:198)
> at org.jboss.remoting3.remote.RemoteConnectionHandler.open(RemoteConnectionHandler.java:335)
> at org.jboss.remoting3.ConnectionImpl.openChannel(ConnectionImpl.java:109)
> at org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection.openChannel(ConnectionPool.java:292)
> at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.associate(RemotingConnectionEJBReceiver.java:180)
> at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:399)
> at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:349)
> at org.jboss.ejb.client.remoting.EJBClientContextConnectionReconnectHandler.reconnect(EJBClientContextConnectionReconnectHandler.java:67)
> at org.jboss.ejb.client.EJBClientContext$ReconnectAttempt.run(EJBClientContext.java:1474)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
More information about the jboss-jira
mailing list