[
https://issues.jboss.org/browse/ISPN-6099?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-6099:
------------------------------------
OK, so it's definitely not a kernel bug: the {{SO_REUSEADDR}} option is set by default
on Java server sockets (although I haven't yet found exactly where it is set). With this
option, it's entirely legal for two threads to {{bind()}} their sockets to the same
address and port; it's only illegal for both of them to {{listen()}} on that port. This
is easy to see in the following strace log, which includes more system calls:
{noformat}
[pid 21908] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 92<TCP:[18611289]>
[pid 21908] setsockopt(92<TCP:[18611289]>, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 21908] bind(92<TCP:[18611289]>, {sa_family=AF_INET, sin_port=htons(7900), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 21909] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 93<TCP:[18604916]>
[pid 21909] setsockopt(93<TCP:[18604916]>, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 21908] <... bind resumed> ) = -1 EADDRINUSE (Address already in use)
[pid 21909] bind(93<TCP:[18604916]>, {sa_family=AF_INET, sin_port=htons(7900), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 21909] <... bind resumed> ) = -1 EADDRINUSE (Address already in use)
...
[pid 21908] bind(92<TCP:[18611289]>, {sa_family=AF_INET, sin_port=htons(7905), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 21909] bind(93<TCP:[18604916]>, {sa_family=AF_INET, sin_port=htons(7905), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 21908] <... bind resumed> ) = -1 EADDRINUSE (Address already in use)
[pid 21909] <... bind resumed> ) = -1 EADDRINUSE (Address already in use)
[pid 21909] bind(93<TCP:[18604916]>, {sa_family=AF_INET, sin_port=htons(7906), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 21908] bind(92<TCP:[18611289]>, {sa_family=AF_INET, sin_port=htons(7906), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 21909] <... bind resumed> ) = 0
[pid 21908] <... bind resumed> ) = 0
[pid 21909] listen(93<TCP:[18604916]>, 50 <unfinished ...>
[pid 21908] listen(92<TCP:[18611289]>, 50 <unfinished ...>
[pid 21909] <... listen resumed> ) = 0
[pid 21909] getsockname(93<TCP:[127.0.0.1:7906]>, <unfinished ...>
[pid 21908] <... listen resumed> ) = -1 EADDRINUSE (Address already in use)
[pid 21909] <... getsockname resumed> {sa_family=AF_INET, sin_port=htons(7906), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid 21909] getsockname(93<TCP:[127.0.0.1:7906]>, <unfinished ...>
[pid 21908] bind(92<TCP:[18611289]>, {sa_family=AF_INET, sin_port=htons(7907), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
[pid 21909] <... getsockname resumed> {sa_family=AF_INET, sin_port=htons(7906), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid 21908] <... bind resumed> ) = -1 EINVAL (Invalid argument)
{noformat}
So one {{listen()}} system call succeeds, and the other returns {{EADDRINUSE}}. However,
both sockets are still bound to port {{7906}} at that point, and a socket that is already
bound cannot be bound again, so attempts to bind socket {{92}} to other ports fail with
{{EINVAL}}.
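The claim that {{SO_REUSEADDR}} is on by default can be checked from Java without strace. The snippet below is only a minimal sketch: the class and method names are made up for illustration, and it simply reads the option from a freshly opened, unbound {{ServerSocketChannel}}. On Linux I would expect it to report {{true}}, matching the {{setsockopt()}} calls in the log above, but the initial value is platform-dependent, so treat the result as an observation about one JVM and OS rather than a guarantee.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper class, not part of JGroups or Infinispan.
class ReuseAddrDefault {

    // Opens an unbound server channel and reads the SO_REUSEADDR value
    // the JVM configured for it, then closes the channel again.
    static boolean defaultReuseAddress() throws IOException {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            return ch.getOption(StandardSocketOptions.SO_REUSEADDR);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_REUSEADDR on a fresh ServerSocketChannel: "
                + defaultReuseAddress());
    }
}
```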
JGroups could work around this by not reusing the socket after an error, at least if
{{isBound()}} returns {{true}}. It could also disable the {{SO_REUSEADDR}} socket option
explicitly, but that might lead to other random failures in our tests, if {{bind()}}
starts to fail because it finds a socket on the same port in the {{TIME_WAIT}} state.
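To make the first workaround concrete, here is a rough sketch of a port-range scan that opens a brand-new channel for every attempt and closes it on failure, so a socket left bound by a failed {{listen()}} is never reused. This is not the actual JGroups code: {{FreshSocketBind}} and {{bindInRange()}} are hypothetical names, and the backlog of 50 is only copied from the {{listen()}} calls in the strace log. It also keeps the last failure as the cause of the final {{BindException}}.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical illustration, not the JGroups implementation.
class FreshSocketBind {

    // Tries every port in [startPort, endPort], using a new channel per attempt.
    static ServerSocketChannel bindInRange(InetAddress addr, int startPort, int endPort)
            throws IOException {
        IOException last = null;
        for (int port = startPort; port <= endPort; port++) {
            ServerSocketChannel ch = ServerSocketChannel.open();
            try {
                // Java's bind() performs both the native bind() and listen().
                ch.bind(new InetSocketAddress(addr, port), 50);
                return ch;
            } catch (IOException e) {
                last = e;
                // Discard the channel: its native socket may already be bound.
                ch.close();
            }
        }
        BindException ex = new BindException(
                "No available port to bind to in range [" + startPort + " .. " + endPort + "]");
        if (last != null) {
            ex.initCause(last); // keep the underlying failure visible
        }
        throw ex;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel ch =
                     bindInRange(InetAddress.getLoopbackAddress(), 7900, 7999)) {
            System.out.println("Bound to " + ch.getLocalAddress());
        }
    }
}
```

The cost is one extra {{socket()}}/{{close()}} pair per failed port, which should be negligible next to the test failures it avoids.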
ConcurrentJoinTest random failures
----------------------------------
Key: ISPN-6099
URL: https://issues.jboss.org/browse/ISPN-6099
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 8.1.0.Final
Environment: java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 8.2.0.Beta1
Since the switch to {{TCP_NIO2}} in the test suite, I've been seeing random failures
in {{ConcurrentJoinTest}} and other tests that attempt to start multiple channels in
parallel (e.g. {{StateTransferFunctionalTest}} and its subclasses).
Normally JGroups only reports a {{java.net.BindException: No available port to bind to in
range [8000 .. 8099]}}, but after I modified {{org.jgroups.util.Util.createServerSocket()}}
to also report the cause exception, I got this:
{noformat}
java.net.BindException: No available port to bind to in range [8000 .. 8099]
at org.jgroups.util.Util.createServerSocketChannel(Util.java:3077) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.jgroups.blocks.cs.NioServer.<init>(NioServer.java:86) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.jgroups.protocols.TCP_NIO2.start(TCP_NIO2.java:97) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:966) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.jgroups.JChannel.startStack(JChannel.java:890) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.jgroups.JChannel._preConnect(JChannel.java:553) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.jgroups.JChannel.connect(JChannel.java:288) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.jgroups.JChannel.connect(JChannel.java:279) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:199) ~[classes/:?]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:190) ~[classes/:?]
at sun.reflect.GeneratedMethodAccessor129.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168) ~[infinispan-commons-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:870) ~[classes/:?]
at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:639) ~[classes/:?]
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:628) ~[classes/:?]
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:531) ~[classes/:?]
at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:229) ~[classes/:?]
... 11 more
Caused by: java.net.SocketException: Invalid argument
at sun.nio.ch.Net.bind0(Native Method) ~[?:1.8.0_60]
at sun.nio.ch.Net.bind(Net.java:433) ~[?:1.8.0_60]
at sun.nio.ch.Net.bind(Net.java:425) ~[?:1.8.0_60]
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) ~[?:1.8.0_60]
at java.nio.channels.ServerSocketChannel.bind(ServerSocketChannel.java:157) ~[?:1.8.0_60]
at org.jgroups.util.Util.createServerSocketChannel(Util.java:3072) ~[jgroups-3.6.7.Final.jar:3.6.7.Final]
{noformat}