[jboss-jira] [JBoss JIRA] (WFCORE-3302) Intermittent protocol and controller module unit test failures since move to JBoss Remoting 5
Brian Stansberry (JIRA)
issues at jboss.org
Sun Sep 17 13:16:00 EDT 2017
Brian Stansberry created WFCORE-3302:
----------------------------------------
Summary: Intermittent protocol and controller module unit test failures since move to JBoss Remoting 5
Key: WFCORE-3302
URL: https://issues.jboss.org/browse/WFCORE-3302
Project: WildFly Core
Issue Type: Bug
Components: Domain Management, Test Suite
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Since the move to JBoss Remoting 5 we've seen intermittent failures in the protocol and controller module testsuites involving the tests that use their respective copies of the ChannelServer + RemoteChannelPairSetup test fixture. These tests all do a setup and teardown of the fixture for each test method (i.e. @Before and @After) with the failure being that a test fails creating a remoting server with a failure that indicates the server from a previous test hasn't completely shut down yet:
{code}
java.lang.RuntimeException: java.net.BindException: Address already in use: bind
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.xnio.nio.NioXnioWorker.createTcpConnectionServer(NioXnioWorker.java:181)
at org.xnio.XnioWorker.createStreamConnectionServer(XnioWorker.java:282)
at org.jboss.remoting3.remote.RemoteConnectionProvider$ProviderInterface.createServer(RemoteConnectionProvider.java:372)
at org.jboss.as.controller.support.ChannelServer.create(ChannelServer.java:92)
at org.jboss.as.controller.support.RemoteChannelPairSetup.setupRemoting(RemoteChannelPairSetup.java:88)
at org.jboss.as.controller.ModelControllerClientTestCase.setupTestClient(ModelControllerClientTestCase.java:94)
at org.jboss.as.controller.ModelControllerClientTestCase.testCloseInputStreamEntry(ModelControllerClientTestCase.java:346)
{code}
These failures have been mildly annoying on ci.wildfly.org, but now that the same code is being on other test machines, e.g. brontes used for EAP testing, they are completely intolerable, affecting a high percentage of CI runs for pull requests.
I believe the issue arises from changes to these fixtures that came in as part of the Remoting 5 upgrade such that a remoting Endpoint is not being created/shutdown for each test method. This causes a problem because the AcceptingChannel<StreamConnection> created by Endpoint.getConnectionProviderInterface(...).createServer(...) *does not* synchronously close down the underlying socket as part of a call to its close() method.
The socket is not closed synchronously because the ServerSocketChannel impl of close() does not close the socket if there are any registered keys. Debugging shows the socket is not closed until this stack happens:
{code}
"XNIO-1 Accept at 1562" daemon prio=5 tid=0xf nid=NA runnable
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.ServerSocketChannelImpl.kill(ServerSocketChannelImpl.java:307)
- locked <0xc0d> (a java.lang.Object)
at sun.nio.ch.KQueueSelectorImpl.implDereg(KQueueSelectorImpl.java:229)
at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:149)
- locked <0xc38> (a java.util.HashSet)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:107)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0xc2b> (a sun.nio.ch.KQueueSelectorImpl)
- locked <0xc39> (a java.util.Collections$UnmodifiableSet)
- locked <0xc3a> (a sun.nio.ch.Util$2)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:519)
{code}
That thread is not under the control of the test fixture, which means there's a race between it closing the socket and the test moving on the next setup where it tries to open the socket.
I think the only solution for this is to bring the endpoint lifecycle back under the control of the test fixture such that the fixture knows all is shutdown. I don't see anything else the test can block on to ensure the server socket is closed.
I think this would be a bug for any use of remoting where a server may quickly be shutdown and then recreated.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
More information about the jboss-jira
mailing list