]
Cheng Fang commented on WFLY-13956:
-----------------------------------
In addition to the EJB client hanging at awaitResponse, I've also seen intermittent
hangs during discovery (at either firstMatchDiscovery or clusterDiscovery) when running
this test locally.
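To tell which of these hang points a stuck run has reached, it helps to scan a thread dump for the method names mentioned above. The following is a minimal sketch only: the class {{HangLocator}} and its helper are hypothetical, and it inspects the JVM it runs in, so for a hung test JVM the same filtering would be applied to the output of a tool such as jstack instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WildFly or jboss-ejb-client.
class HangLocator {

    // Returns one entry per stack frame whose method name matches one of methodNames.
    static List<String> threadsIn(Map<Thread, StackTraceElement[]> dump, String... methodNames) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                for (String name : methodNames) {
                    if (frame.getMethodName().equals(name)) {
                        hits.add(entry.getKey().getName() + " at " + frame);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Method names taken from the comment above; the dump covers this JVM only.
        List<String> hits = threadsIn(Thread.getAllStackTraces(),
                "awaitResponse", "firstMatchDiscovery", "clusterDiscovery");
        if (hits.isEmpty()) {
            System.out.println("No thread is currently in one of the suspected methods.");
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```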
TwoConnectorsEJBFailoverTestCase sometimes hangs when running with
jboss-ejb-client 4.0.35
------------------------------------------------------------------------------------------
Key: WFLY-13956
URL:
https://issues.redhat.com/browse/WFLY-13956
Project: WildFly
Issue Type: Bug
Components: EJB
Affects Versions: 21.0.0.Beta1
Reporter: Cheng Fang
Assignee: Cheng Fang
Priority: Major
Attachments: Screen Shot 2020-10-12 at 9.57.36 AM.png
After upgrading jboss-ejb-client to version 4.0.35.Final, we noticed that some WildFly CI
jobs intermittently hang when running {{TwoConnectorsEJBFailoverTestCase}}, causing more
than 100 tests (e.g., 466) to fail:
{code}
RemoteElytronSingleSignOnTestCase.testSessionTimeoutDestroysSSO org.jboss.as.test.clustering.cluster.sso.remote
RemoteElytronSingleSignOnTestCase.testFormAuthSingleSignOn org.jboss.as.test.clustering.cluster.sso.remote
RemoteElytronSingleSignOnTestCase.testNoAuthSingleSignOn org.jboss.as.test.clustering.cluster.sso.remote
RemoteSingleSignOnTestCase.testSessionTimeoutDestroysSSO org.jboss.as.test.clustering.cluster.sso.remote
RemoteSingleSignOnTestCase.testFormAuthSingleSignOn org.jboss.as.test.clustering.cluster.sso.remote
RemoteSingleSignOnTestCase.testNoAuthSingleSignOn org.jboss.as.test.clustering.cluster.sso.remote
CoarseHotRodPersistenceWebFailoverTestCase.testGracefulUndeployFailover org.jboss.as.test.clustering.cluster.web.remote
CoarseHotRodPersistenceWebFailoverTestCase.testNonPrimaryOwner org.jboss.as.test.clustering.cluster.web.remote
CoarseHotRodPersistenceWebFailoverTestCase.testGracefulSimpleFailover org.jboss.as.test.clustering.cluster.web.remote
CoarseHotRodSessionActivationTestCase.test org.jboss.as.test.clustering.cluster.web.remote
CoarseHotRodWebFailoverTestCase.testGracefulUndeployFailover org.jboss.as.test.clustering.cluster.web.remote
CoarseHotRodWebFailoverTestCase.testNonPrimaryOwner org.jboss.as.test.clustering.cluster.web.remote
CoarseHotRodWebFailoverTestCase.testGracefulSimpleFailover org.jboss.as.test.clustering.cluster.web.remote
CoarseTransactionalHotRodWebFailoverTestCase.testGracefulUndeployFailover org.jboss.as.test.clustering.cluster.web.remote
CoarseTransactionalHotRodWebFailoverTestCase.testNonPrimaryOwner org.jboss.as.test.clustering.cluster.web.remote
CoarseTransactionalHotRodWebFailoverTestCase.testGracefulSimpleFailover org.jboss.as.test.clustering.cluster.web.remote
FineHotRodPersistenceWebFailoverTestCase.testGracefulUndeployFailover org.jboss.as.test.clustering.cluster.web.remote
{code}
The failing tests report errors like the following:
{code}
org.jboss.arquillian.container.spi.client.container.LifecycleException: The port 9990 is
already in use. It means that either the server might be already running or there is
another process using port 9990.
Managed containers do not support connecting to running server instances due to the
possible harmful effect of connecting to the wrong server.
Please stop server (or another process) before running, change to another type of
container (e.g. remote) or use jboss.socket.binding.port-offset variable to change the
default port.
To disable this check and allow Arquillian to connect to a running server, set
allowConnectingToRunningServer to true in the container configuration
{code}
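The error above means something was still bound to management port 9990 when the next managed container tried to start. As a rough way to confirm that locally, the sketch below tries to bind a listening socket on the loopback address. This is an illustration only: {{PortCheck}} is a hypothetical class, binding to loopback is an assumption, and this is not necessarily how Arquillian performs its own check.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of Arquillian or WildFly.
class PortCheck {

    // Returns true if a listening socket cannot be bound to the loopback address on the port.
    static boolean isPortInUse(int port) {
        try (ServerSocket socket = new ServerSocket(port, 1, InetAddress.getLoopbackAddress())) {
            return false; // bind succeeded, so nothing else is listening here
        } catch (IOException e) {
            return true;  // bind failed, most likely because the port is taken
        }
    }

    public static void main(String[] args) {
        int managementPort = 9990; // port named in the error message above
        System.out.println("Port " + managementPort + " in use: " + isPortInUse(managementPort));
    }
}
```

If this reports the port as in use after a hung run, a server from the hung test is probably still running and has to be stopped before the remaining tests can start their containers.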
In a [successful CI
job|https://ci.wildfly.org/buildConfiguration/WFPR/226438?],
{{TwoConnectorsEJBFailoverTestCase}} runs 2 tests:
{{testEJBClientUsingLegacyRemotingProtocol}} and
{{testEJBClientUsingHttpUpgradeProtocol}}, each taking ~25 seconds to complete. See
attached screenshot.
But in a [failed CI
job|https://ci.wildfly.org/buildConfiguration/WF_PullRequest_LinuxJdk11/2...],
filtering by {{TwoConnectorsEJBFailoverTestCase}} returns no results, which suggests that
the test hangs at {{testEJBClientUsingLegacyRemotingProtocol}}, so the test runner never
collects a result from either test.
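One way to make such a hang surface as a reported failure rather than a missing result is to bound the blocking call with a timeout. The sketch below uses plain {{java.util.concurrent}}. It is not the fix for this issue, {{TimeoutGuard}} is a hypothetical class, and the sleeping task in {{main}} only stands in for an EJB invocation that never returns.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the WildFly test suite.
class TimeoutGuard {

    // Runs the call on a daemon thread and gives up after the given timeout.
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "guarded-call");
            thread.setDaemon(true); // a stuck call must not keep the JVM alive
            return thread;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call, best effort
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for an invocation that hangs.
            callWithTimeout(() -> { Thread.sleep(60_000); return "done"; }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("Call timed out and can be reported as a failure");
        }
    }
}
```

Interrupting the worker is best effort: code that ignores interruption keeps running in the background, which is why the worker is a daemon thread here.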