[jboss-jira] [JBoss JIRA] (JGRP-2358) (7.2.z) TCP: connection close can block when send() blocks on full TCP send-window
Brad Maxwell (Jira)
issues at jboss.org
Tue Jul 9 11:58:00 EDT 2019
Brad Maxwell created JGRP-2358:
----------------------------------
Summary: (7.2.z) TCP: connection close can block when send() blocks on full TCP send-window
Key: JGRP-2358
URL: https://issues.jboss.org/browse/JGRP-2358
Project: JGroups
Issue Type: Bug
Reporter: Brad Maxwell
Assignee: Bela Ban
Fix For: 4.1.1, 4.0.20
When a peer is unresponsive but has not closed its socket, TcpConnection.send() can block in a write (the thread state is still RUNNABLE!).
The problem is that the TcpConnection cannot be closed either, because TcpConnection.close() tries to acquire the same lock that the blocked TcpConnection.send() already holds.
See the stack trace below for a sample scenario.
The use case is this one:
* Say we have nodes A (coord), B and C
* There's heavy (clustering) traffic to all 3 nodes, from the 2 clients
* B is isolated by executing 'ifdown bond0'
* At this point, the messages going to B will back up at (say) A because A doesn't get any TCP acks from B
* At some point, depending on the traffic and the size of the sent messages, A will acquire a lock on the send connection to B, to write data, but the write will block as the TCP send-window to B is full (note that the sender thread will still be in state RUNNABLE!)
* After 40s, A suspects B and emits a new view {A,C}
* This should cause A's connection to B to be closed and subsequently removed. However, this _won't_ happen, because closing the connection needs the connection lock, which is still held by the blocked TCP write
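The contention in the scenario above can be modelled without a network. The following is only a minimal sketch: `Conn`, `sendLock` and `demo()` are hypothetical names, not JGroups classes, and the stuck write is simulated with a latch (so the sender parks instead of sitting RUNNABLE in a native write). The close here uses a timed `tryLock` purely so that the demo terminates; the `close()` in the report uses a plain `lock()` and therefore waits indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SendCloseContention {
    // Hypothetical stand-in for a connection; NOT the JGroups TcpConnection.
    static class Conn {
        final ReentrantLock sendLock = new ReentrantLock();
        final CountDownLatch inWrite = new CountDownLatch(1);   // sender has entered the "write"
        final CountDownLatch peerAcked = new CountDownLatch(1); // simulated send-window drains

        // Holds the lock for as long as the simulated write is stuck.
        void send() throws InterruptedException {
            sendLock.lock();
            try {
                inWrite.countDown();
                peerAcked.await(); // stands in for a write blocked on a full send-window
            } finally {
                sendLock.unlock();
            }
        }

        // Needs the same lock as send(); returns false if it could not get it in time.
        boolean close(long timeoutMs) throws InterruptedException {
            if (!sendLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS))
                return false;
            try {
                return true; // the real resource cleanup would happen here
            } finally {
                sendLock.unlock();
            }
        }
    }

    // Returns {close succeeded while send was blocked, close succeeded afterwards}.
    static boolean[] demo() {
        Conn conn = new Conn();
        Thread sender = new Thread(() -> {
            try { conn.send(); } catch (InterruptedException ignored) { }
        });
        try {
            sender.start();
            conn.inWrite.await();                 // sender now owns sendLock
            boolean whileBlocked = conn.close(200);
            conn.peerAcked.countDown();           // let the simulated write finish
            sender.join();
            boolean afterwards = conn.close(200);
            return new boolean[] { whileBlocked, afterwards };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("close while send blocked:  " + r[0]);
        System.out.println("close after send returned: " + r[1]);
    }
}
```

In this model the first close attempt fails and the second succeeds, which mirrors the report: as long as the write never returns, the connection to B cannot be closed.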
{noformat}
"main" #1 prio=5 os_prio=31 tid=0x00007fbbd3802000 nid=0x2303 runnable [0x0000700009793000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        - locked <0x000000079e790a50> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        - locked <0x000000079e790838> (a java.io.DataOutputStream)
        at org.jgroups.blocks.cs.TcpConnection.doSend(TcpConnection.java:161)
        at org.jgroups.blocks.cs.TcpConnection.send(TcpConnection.java:131)
        at org.jgroups.blocks.cs.TcpClient.send(TcpClient.java:103)
        at org.jgroups.tests.bla6.main(bla6.java:35)

"Thread-2" #15 prio=5 os_prio=31 tid=0x00007fbbd2150800 nid=0x6503 waiting on condition [0x000070000bcf6000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000079e7871a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.jgroups.blocks.cs.TcpConnection.close(TcpConnection.java:358)
        at org.jgroups.util.Util.close(Util.java:422)
        at org.jgroups.blocks.cs.TcpClient.stop(TcpClient.java:85)
        at org.jgroups.blocks.cs.BaseServer.close(BaseServer.java:147)
        at org.jgroups.util.Util.close(Util.java:422)
        at org.jgroups.tests.bla6.lambda$main$0(bla6.java:27)
        at org.jgroups.tests.bla6$$Lambda$1/1384010761.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
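One conceivable way around the contention in the trace is to close the underlying stream without taking the send lock, so that the stuck writer fails and releases the lock itself. The sketch below illustrates only that idea and is not necessarily how JGroups addresses this issue. `StalledStream`, `Conn` and `demo()` are hypothetical; the stalled write is simulated by a stream whose `write()` blocks until `close()` is called. With a real `java.net.Socket`, the Javadoc for `Socket.close()` states that a thread blocked in I/O on that socket will throw a `SocketException`, which is the behaviour this model imitates.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

public class LockFreeClose {
    // Hypothetical stream: write() blocks until close(), then fails like a closed socket would.
    static class StalledStream extends OutputStream {
        final CountDownLatch writing = new CountDownLatch(1);
        final CountDownLatch closed = new CountDownLatch(1);

        @Override public void write(int b) throws IOException {
            writing.countDown();
            try {
                closed.await(); // stands in for a write blocked on a full send-window
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            throw new IOException("stream closed");
        }

        @Override public void close() { closed.countDown(); }
    }

    // Hypothetical connection; NOT the JGroups TcpConnection.
    static class Conn {
        final ReentrantLock sendLock = new ReentrantLock();
        final OutputStream out;

        Conn(OutputStream out) { this.out = out; }

        void send(byte[] data) throws IOException {
            sendLock.lock();
            try { out.write(data); } finally { sendLock.unlock(); }
        }

        // Deliberately does not acquire sendLock, so a stuck send() cannot block it.
        void close() throws IOException { out.close(); }
    }

    // Returns a description of what happened to the blocked send().
    static String demo() {
        StalledStream stream = new StalledStream();
        Conn conn = new Conn(stream);
        AtomicReference<String> result = new AtomicReference<>("send still blocked");
        Thread sender = new Thread(() -> {
            try {
                conn.send(new byte[] { 1 });
                result.set("send completed");
            } catch (IOException e) {
                result.set("send failed: " + e.getMessage());
            }
        });
        try {
            sender.start();
            stream.writing.await(); // sender holds sendLock and is stuck in write()
            conn.close();           // returns immediately, no lock needed
            sender.join(2000);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off in this design is that `send()` must be prepared for an exception caused by a concurrent close, and any state that `close()` touches besides the stream would need its own protection.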
--
This message was sent by Atlassian Jira
(v7.12.1#712002)