Martin Stefanko updated JGRP-2350:
----------------------------------
Labels: downstream_dependency (was: )
TCP: connection close can block when send() blocks on full TCP send-window
-------------------------------------------------------------------------
Key: JGRP-2350
URL: https://issues.jboss.org/browse/JGRP-2350
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assignee: Bela Ban
Priority: Major
Labels: downstream_dependency
Fix For: 4.1.1, 4.0.20
When a peer is non-responsive (without closing its socket), a TcpConnection.send() can
block on a write. The thread state is still RUNNABLE, because the thread is blocked
inside the native socket write.
The problem is that the TcpConnection cannot be closed either, as TcpConnection.close()
tries to acquire the same lock that is already held by TcpConnection.send().
See the stack trace below for a sample scenario.
The use case is this one:
* Say we have nodes A (coord), B and C
* There's heavy (clustering) traffic to all 3 nodes, from the 2 clients
* B is isolated by executing 'ifdown bond0'
* At this point, the messages going to B will back up at (say) A because A doesn't get any TCP acks from B
* At some point, depending on the traffic and the size of the sent messages, A will acquire a lock on the send connection to B, to write data, but the write will block as the TCP send-window to B is full (note that the sender thread will still be in state RUNNABLE!)
* After 40s, A suspects B and emits a new view {A,C}
* This causes A's connection to B to be closed and subsequently removed. However, this _won't_ happen, as the connection close will need to acquire the connection lock, which is held by the TCP write
{noformat}
"main" #1 prio=5 os_prio=31 tid=0x00007fbbd3802000 nid=0x2303 runnable [0x0000700009793000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
- locked <0x000000079e790a50> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
- locked <0x000000079e790838> (a java.io.DataOutputStream)
at org.jgroups.blocks.cs.TcpConnection.doSend(TcpConnection.java:161)
at org.jgroups.blocks.cs.TcpConnection.send(TcpConnection.java:131)
at org.jgroups.blocks.cs.TcpClient.send(TcpClient.java:103)
at org.jgroups.tests.bla6.main(bla6.java:35)
"Thread-2" #15 prio=5 os_prio=31 tid=0x00007fbbd2150800 nid=0x6503 waiting on condition [0x000070000bcf6000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000079e7871a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.jgroups.blocks.cs.TcpConnection.close(TcpConnection.java:358)
at org.jgroups.util.Util.close(Util.java:422)
at org.jgroups.blocks.cs.TcpClient.stop(TcpClient.java:85)
at org.jgroups.blocks.cs.BaseServer.close(BaseServer.java:147)
at org.jgroups.util.Util.close(Util.java:422)
at org.jgroups.tests.bla6.lambda$main$0(bla6.java:27)
at org.jgroups.tests.bla6$$Lambda$1/1384010761.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
{noformat}
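The lock contention visible in the two stacks above can be reproduced without a network. The following is a minimal sketch, not JGroups code: FakeConnection, tryClose() and the latch that stands in for a full TCP send-window are all hypothetical names introduced for illustration. It shows that a close which needs the send lock cannot proceed while the send is blocked, and that a bounded tryLock() at least lets the closer return instead of parking forever. Whether the actual fix for this issue takes that approach is not stated here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BlockedCloseSketch {

    // Hypothetical stand-in for a connection; this is NOT org.jgroups.blocks.cs.TcpConnection.
    static class FakeConnection {
        private final ReentrantLock sendLock = new ReentrantLock();
        // Simulates a full TCP send-window: the "write" returns only once this latch opens.
        private final CountDownLatch windowOpens = new CountDownLatch(1);
        // Lets the test wait until the sender really holds the lock.
        private final CountDownLatch writeStarted = new CountDownLatch(1);

        void send() throws InterruptedException {
            sendLock.lock();
            try {
                writeStarted.countDown();
                windowOpens.await(); // stands in for the blocking socket write
            } finally {
                sendLock.unlock();
            }
        }

        // Bounded close: gives up if the send lock cannot be acquired within the timeout.
        // Returns true if the lock was acquired (and the close could have run).
        boolean tryClose(long timeout, TimeUnit unit) throws InterruptedException {
            if (!sendLock.tryLock(timeout, unit))
                return false; // the sender still holds the lock
            try {
                return true;  // a real close would release resources here
            } finally {
                sendLock.unlock();
            }
        }

        void awaitWriteStarted() throws InterruptedException { writeStarted.await(); }
        void releaseWrite() { windowOpens.countDown(); }
    }

    // Returns {close succeeded while send was blocked, close succeeded after send returned}.
    static boolean[] demo() throws InterruptedException {
        FakeConnection conn = new FakeConnection();
        Thread sender = new Thread(() -> {
            try {
                conn.send();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sender");
        sender.start();
        conn.awaitWriteStarted();

        boolean whileBlocked = conn.tryClose(200, TimeUnit.MILLISECONDS);

        conn.releaseWrite(); // the "send-window" opens and the write completes
        sender.join();
        boolean afterRelease = conn.tryClose(200, TimeUnit.MILLISECONDS);

        return new boolean[] { whileBlocked, afterRelease };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = demo();
        System.out.println("close while send blocked: " + result[0]);   // false
        System.out.println("close after send returned: " + result[1]);  // true
    }
}
```

With an unconditional lock() in place of tryLock(), the first close attempt would park indefinitely, which matches the WAITING (parking) state of "Thread-2" in the trace.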