[jboss-jira] [JBoss JIRA] (JGRP-2350) TCP: connection close can block when send() blocks on full TCP send-window

Bela Ban (Jira) issues at jboss.org
Wed Jun 12 06:14:00 EDT 2019


     [ https://issues.jboss.org/browse/JGRP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-2350:
---------------------------
    Fix Version/s: 4.0.20


> TCP: connection close can block when send() blocks on full TCP send-window
> --------------------------------------------------------------------------
>
>                 Key: JGRP-2350
>                 URL: https://issues.jboss.org/browse/JGRP-2350
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>            Priority: Major
>             Fix For: 4.1.1, 4.0.20
>
>
> When a peer is non-responsive (without closing its socket), a TcpConnection.send() can block on a write (the thread state is still RUNNABLE!).
> The problem is that the TcpConnection cannot be closed either, as TcpConnection.close() tries to acquire the same lock that is already held by TcpConnection.send().
> See the stack trace below for a sample scenario.
> The use case is this one:
> * Say we have nodes A (coord), B and C
> * There's heavy (clustering) traffic to all 3 nodes, from the 2 clients
> * B is isolated by executing 'ifdown bond0'
> * At this point, the messages going to B will back up at (say) A because A doesn't get any TCP acks from B
> * At some point, depending on the traffic and the size of the sent messages, A will acquire the lock on the send connection to B in order to write data, but the write will block because the TCP send-window to B is full (note that the sender thread will still be in state RUNNABLE!)
> * After 40s, A suspects B and emits a new view {A,C}
> * This causes A's connection to B to be closed and subsequently removed. However, this _won't_ happen, as the connection close needs to acquire the connection lock, which is still held by the thread blocked in the TCP write
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x00007fbbd3802000 nid=0x2303 runnable [0x0000700009793000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.net.SocketOutputStream.socketWrite0(Native Method)
> 	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> 	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> 	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
> 	- locked <0x000000079e790a50> (a java.io.BufferedOutputStream)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:107)
> 	- locked <0x000000079e790838> (a java.io.DataOutputStream)
> 	at org.jgroups.blocks.cs.TcpConnection.doSend(TcpConnection.java:161)
> 	at org.jgroups.blocks.cs.TcpConnection.send(TcpConnection.java:131)
> 	at org.jgroups.blocks.cs.TcpClient.send(TcpClient.java:103)
> 	at org.jgroups.tests.bla6.main(bla6.java:35)
> "Thread-2" #15 prio=5 os_prio=31 tid=0x00007fbbd2150800 nid=0x6503 waiting on condition [0x000070000bcf6000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000079e7871a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> 	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> 	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> 	at org.jgroups.blocks.cs.TcpConnection.close(TcpConnection.java:358)
> 	at org.jgroups.util.Util.close(Util.java:422)
> 	at org.jgroups.blocks.cs.TcpClient.stop(TcpClient.java:85)
> 	at org.jgroups.blocks.cs.BaseServer.close(BaseServer.java:147)
> 	at org.jgroups.util.Util.close(Util.java:422)
> 	at org.jgroups.tests.bla6.lambda$main$0(bla6.java:27)
> 	at org.jgroups.tests.bla6$$Lambda$1/1384010761.run(Unknown Source)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
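
The lock contention shown in the stack trace can be reproduced with plain JDK sockets. The following is a minimal sketch, not the JGroups code and not necessarily the fix that went into 4.0.20 / 4.1.1. The class BlockedSendDemo, the method run() and the variable sendLock are hypothetical names introduced only for illustration; sendLock stands in for the lock shared by TcpConnection.send() and TcpConnection.close(). A sender thread takes the lock and writes to a peer that never reads, so the write eventually stalls. A second thread then fails to get the lock within a timeout, which is the stuck close() from the trace. Closing the socket without taking the lock makes the blocked write fail with an exception, which releases the lock.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names exist in JGroups.
class BlockedSendDemo {

    /** Returns {closerGotLock, senderUnblockedBySocketClose}. */
    static boolean[] run() throws Exception {
        ReentrantLock sendLock = new ReentrantLock(); // stand-in for the connection's send lock
        CountDownLatch senderDone = new CountDownLatch(1);
        AtomicLong written = new AtomicLong();

        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            Socket peer = server.accept(); // accepted but never read: the non-responsive peer

            Thread sender = new Thread(() -> {
                sendLock.lock(); // same pattern as send(): write while holding the lock
                try {
                    OutputStream out = client.getOutputStream();
                    byte[] chunk = new byte[8 * 1024];
                    while (true) {
                        out.write(chunk); // stalls once the socket buffers are full
                        written.addAndGet(chunk.length);
                    }
                } catch (IOException expected) {
                    // the socket was closed by another thread while we were writing
                } finally {
                    sendLock.unlock();
                    senderDone.countDown();
                }
            });
            sender.setDaemon(true);
            sender.start();

            // Wait until the byte count stops growing while the lock is held,
            // i.e. the sender is stuck in write(). Gives up after about 10 s.
            long last = -1;
            int stable = 0;
            for (int i = 0; i < 100 && stable < 3; i++) {
                Thread.sleep(100);
                long now = written.get();
                stable = (sendLock.isLocked() && now == last) ? stable + 1 : 0;
                last = now;
            }

            // A close() that needs the send lock cannot proceed: bounded attempt.
            boolean closerGotLock = sendLock.tryLock(500, TimeUnit.MILLISECONDS);
            if (closerGotLock)
                sendLock.unlock();

            // Closing the socket without the lock interrupts the blocked write.
            client.close();
            boolean senderUnblocked = senderDone.await(5, TimeUnit.SECONDS);

            peer.close();
            return new boolean[] {closerGotLock, senderUnblocked};
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = run();
        System.out.println("closer acquired lock while send was blocked: " + result[0]);
        System.out.println("sender unblocked after socket close: " + result[1]);
    }
}
```

The sketch relies on the documented behaviour of java.net.Socket.close(): any thread blocked in an I/O operation on that socket receives a SocketException. Whether closing the socket outside the lock is appropriate for TcpConnection depends on what else the lock protects, so this should be read as an illustration of the deadlock pattern only.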



--
This message was sent by Atlassian Jira
(v7.12.1#712002)

