[
https://issues.jboss.org/browse/WFCORE-3302?page=com.atlassian.jira.plugi...
]
Brian Stansberry commented on WFCORE-3302:
------------------------------------------
I've submitted master/3.x/3.0.x PRs to work around this (WFCORE-3302) by closing the
endpoint. Interestingly, simply doing this didn't solve the problem:
endpoint.close();
This, however, did:
endpoint.closeAsync();
endpoint.awaitClosed(); // with some try/catch boilerplate around it to handle InterruptedException
From a naive point of view, it seems like those two should be
semantically equivalent. I've reported this on REM3-303, which is the main JIRA
tracking this problem.
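The "try/catch boilerplate" mentioned above might look roughly like the following. This is only a sketch: AsyncCloseable is a hypothetical stand-in for the two Endpoint methods quoted in the comment, so that the example compiles without Remoting on the classpath, and SlowResource is a toy implementation that exists purely for illustration.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the two Endpoint methods quoted above. */
interface AsyncCloseable {
    void closeAsync();
    void awaitClosed() throws InterruptedException;
}

final class CloseAndWait {
    private CloseAndWait() {}

    /**
     * Starts an asynchronous close and blocks until it has completed.
     * Returns true if the close completed, false if the wait was interrupted.
     */
    static boolean closeAndWait(AsyncCloseable resource) {
        resource.closeAsync();
        try {
            resource.awaitClosed();
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        SlowResource resource = new SlowResource();
        System.out.println("close completed: " + closeAndWait(resource));
    }
}

/** Toy resource whose close finishes on another thread (illustration only). */
final class SlowResource implements AsyncCloseable {
    private final CountDownLatch closed = new CountDownLatch(1);

    @Override
    public void closeAsync() {
        Thread closer = new Thread(closed::countDown, "toy-closer");
        closer.setDaemon(true);
        closer.start();
    }

    @Override
    public void awaitClosed() throws InterruptedException {
        closed.await();
    }

    boolean isClosed() {
        return closed.getCount() == 0;
    }
}
```

In a test fixture, a helper like this would be called from the teardown so that the next setup only starts once the close has been observed to finish.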
Intermittent protocol and controller module unit test failures since
move to JBoss Remoting 5
---------------------------------------------------------------------------------------------
Key: WFCORE-3302
URL: https://issues.jboss.org/browse/WFCORE-3302
Project: WildFly Core
Issue Type: Bug
Components: Domain Management, Test Suite
Reporter: Brian Stansberry
Assignee: Brian Stansberry
This bug is about problems in WF Core management tests. I believe it exposes a flaw in
how remoting handles server sockets, but AFAIK there is no impact on WF Core remoting
server sockets.
Since the move to JBoss Remoting 5 we've seen intermittent failures in the protocol
and controller module testsuites involving the tests that use their respective copies of
the ChannelServer + RemoteChannelPairSetup test fixture. These tests all set up and tear
down the fixture for each test method (i.e. @Before and @After). The failure is that a
test cannot create a remoting server, with an exception indicating that the server from a
previous test hasn't completely shut down yet:
{code}
java.lang.RuntimeException: java.net.BindException: Address already in use: bind
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.xnio.nio.NioXnioWorker.createTcpConnectionServer(NioXnioWorker.java:181)
at org.xnio.XnioWorker.createStreamConnectionServer(XnioWorker.java:282)
at org.jboss.remoting3.remote.RemoteConnectionProvider$ProviderInterface.createServer(RemoteConnectionProvider.java:372)
at org.jboss.as.controller.support.ChannelServer.create(ChannelServer.java:92)
at org.jboss.as.controller.support.RemoteChannelPairSetup.setupRemoting(RemoteChannelPairSetup.java:88)
at org.jboss.as.controller.ModelControllerClientTestCase.setupTestClient(ModelControllerClientTestCase.java:94)
at org.jboss.as.controller.ModelControllerClientTestCase.testCloseInputStreamEntry(ModelControllerClientTestCase.java:346)
{code}
These failures have been mildly annoying on ci.wildfly.org, but now that the same code is
being run on other test machines, e.g. brontes used for EAP testing, they are completely
intolerable, affecting a high percentage of CI runs for pull requests.
I believe the issue arises from changes to these fixtures that came in as part of the
Remoting 5 upgrade such that a remoting Endpoint is not being created/shut down for each
test method. This causes a problem because the AcceptingChannel<StreamConnection>
created by Endpoint.getConnectionProviderInterface(...).createServer(...) *does not*
synchronously close down the underlying socket as part of a call to its close() method.
The socket is not closed synchronously because the ServerSocketChannel impl of close()
does not close the socket if there are any registered keys. Debugging shows the socket is
not closed until this stack happens:
{code}
"XNIO-1 Accept@1562" daemon prio=5 tid=0xf nid=NA runnable
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.ServerSocketChannelImpl.kill(ServerSocketChannelImpl.java:307)
- locked <0xc0d> (a java.lang.Object)
at sun.nio.ch.KQueueSelectorImpl.implDereg(KQueueSelectorImpl.java:229)
at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:149)
- locked <0xc38> (a java.util.HashSet)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:107)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0xc2b> (a sun.nio.ch.KQueueSelectorImpl)
- locked <0xc39> (a java.util.Collections$UnmodifiableSet)
- locked <0xc3a> (a sun.nio.ch.Util$2)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:519)
{code}
That thread is not under the control of the test fixture, which means there's a race
between it closing the socket and the test moving on to the next setup, where it tries to
open the socket.
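The deferred close can be reproduced in isolation with plain JDK NIO. The following is a minimal sketch, not the XNIO or Remoting code: the class and method names are made up for illustration. Whether the first probe succeeds depends on the OS and JDK version, so no output is shown for it. The point is that only the selector's next select operation is guaranteed to deregister the key and release the socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

final class DeferredCloseDemo {
    private DeferredCloseDemo() {}

    /** Returns true if a fresh server channel can bind to the given loopback port. */
    static boolean canBind(int port) {
        try (ServerSocketChannel probe = ServerSocketChannel.open()) {
            probe.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /**
     * Closes a server channel that is still registered with a selector and
     * probes the port twice: right after close(), and again after the selector
     * has processed its cancelled keys.
     * Returns {bindableAfterClose, bindableAfterSelect}.
     */
    static boolean[] run() {
        try (Selector selector = Selector.open()) {
            ServerSocketChannel server = ServerSocketChannel.open();
            server.configureBlocking(false);
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            server.register(selector, SelectionKey.OP_ACCEPT);

            // close() cancels the key, but the key stays registered until
            // the selector's next select operation.
            server.close();
            boolean afterClose = canBind(port);

            // selectNow() processes the cancelled key and deregisters the channel.
            selector.selectNow();
            boolean afterSelect = canBind(port);

            return new boolean[] {afterClose, afterSelect};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("bindable right after close(): " + result[0]);
        System.out.println("bindable after selectNow():   " + result[1]);
    }
}
```

In the test fixture the select loop runs on an XNIO worker thread rather than inline, which is why the test has nothing of its own to block on between close() and the next bind.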
I think the only solution for this is to bring the endpoint lifecycle back under the
control of the test fixture such that the fixture knows all is shut down. I don't see
anything else the test can block on to ensure the server socket is closed.
I think this would be a bug for any use of remoting where a server may quickly be
shut down and then recreated.