[jboss-jira] [JBoss JIRA] (JGRP-2350) TCP: connection close can block when send() blocks on full TCP send-window
Bela Ban (Jira)
issues at jboss.org
Mon Jun 3 09:55:00 EDT 2019
[ https://issues.jboss.org/browse/JGRP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bela Ban resolved JGRP-2350.
----------------------------
Resolution: Done
Moved closing of TcpConnection out of the send_lock scope
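A minimal sketch of the idea behind the fix, assuming a simplified connection class. `FixedConnection`, `sendLock` and `closeUnblocksSender()` are hypothetical names, not the actual JGroups `TcpConnection` code. Senders still serialize on a lock, but `close()` closes the socket without acquiring that lock. Closing a `java.net.Socket` makes any thread blocked in I/O on that socket throw a `SocketException`, so the stuck writer unwinds and releases the lock.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplified connection, NOT the real org.jgroups.blocks.cs.TcpConnection
public class FixedConnection implements AutoCloseable {
    private final Socket sock;
    private final DataOutputStream out;
    private final ReentrantLock sendLock = new ReentrantLock(); // serializes concurrent senders only

    public FixedConnection(Socket sock) throws IOException {
        this.sock = sock;
        this.out = new DataOutputStream(sock.getOutputStream());
    }

    public void send(byte[] buf) throws IOException {
        sendLock.lock();
        try {
            out.writeInt(buf.length); // simple length-prefixed framing (assumption)
            out.write(buf);           // may block while the peer's TCP receive window is full
        } finally {
            sendLock.unlock();
        }
    }

    /** Closes the socket WITHOUT taking sendLock, so a writer blocked in write() is released. */
    @Override
    public void close() throws IOException {
        sock.close(); // a thread blocked in I/O on this socket gets a SocketException
    }

    /** Returns true if close() unblocks a sender stuck on a full send window within 5 s. */
    static boolean closeUnblocksSender() throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket srv = new ServerSocket(0, 50, lo);
             Socket client = new Socket(lo, srv.getLocalPort());
             Socket peer = srv.accept()) { // the peer never reads, like a non-responsive member
            FixedConnection conn = new FixedConnection(client);
            Thread sender = new Thread(() -> {
                byte[] chunk = new byte[64 * 1024];
                try {
                    while (true) conn.send(chunk); // eventually blocks inside write()
                } catch (IOException expected) {
                    // SocketException once close() has been called
                }
            });
            sender.setDaemon(true);
            sender.start();
            Thread.sleep(1000); // give the sender time to fill the socket buffers and block
            conn.close();       // does not wait for sendLock
            sender.join(5000);
            return !sender.isAlive();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sender unblocked by close(): " + closeUnblocksSender());
    }
}
```

The demo connects to a loopback peer that never reads, waits for the sender to block on the full send window, and then closes the connection from another thread; the sender terminates instead of holding the lock forever.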
> TCP: connection close can block when send() blocks on full TCP send-window
> -------------------------------------------------------------------------
>
> Key: JGRP-2350
> URL: https://issues.jboss.org/browse/JGRP-2350
> Project: JGroups
> Issue Type: Bug
> Reporter: Bela Ban
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.1
>
>
> When a peer is non-responsive (without closing its socket), TcpConnection.send() can block in a socket write while the sending thread's state is still RUNNABLE!
> The problem is that the TcpConnection cannot be closed either, as TcpConnection.close() tries to acquire the same lock that is already held by the blocked TcpConnection.send().
> See the stack trace below for a sample scenario.
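The lock contention can be reproduced without a cluster. In this hypothetical sketch (`BlockingConnection` and its methods are illustrative, not JGroups API), `send()` holds a `ReentrantLock` while writing to a loopback peer that never reads, and `close()` must acquire the same lock first. To keep the demo from hanging, the close path uses `tryLock()` with a timeout instead of the unbounded `lock()` visible in the stack trace; it times out because the blocked writer never releases the lock.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the buggy pattern: send() and close() share one lock
public class BlockingConnection {
    private final Socket sock;
    private final OutputStream out;
    private final ReentrantLock lock = new ReentrantLock();

    BlockingConnection(Socket sock) throws IOException {
        this.sock = sock;
        this.out = sock.getOutputStream();
    }

    void send(byte[] buf) throws IOException {
        lock.lock();
        try {
            out.write(buf); // blocks once the TCP send window is full, while holding the lock
        } finally {
            lock.unlock();
        }
    }

    // Buggy pattern: close() needs the send lock. tryLock() with a timeout replaces lock() so the demo ends.
    boolean closeWithTimeout(long millis) throws IOException, InterruptedException {
        if (!lock.tryLock(millis, TimeUnit.MILLISECONDS))
            return false; // the lock is still held by the blocked send()
        try {
            sock.close();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if close() could NOT acquire the lock while send() was blocked. */
    static boolean closeIsBlockedBySend() throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket srv = new ServerSocket(0, 50, lo);
             Socket client = new Socket(lo, srv.getLocalPort());
             Socket peer = srv.accept()) { // peer never reads, like the isolated node B
            BlockingConnection conn = new BlockingConnection(client);
            Thread sender = new Thread(() -> {
                byte[] chunk = new byte[64 * 1024];
                try {
                    while (true) conn.send(chunk);
                } catch (IOException ignored) {
                    // thrown once the socket is finally closed below
                }
            });
            sender.setDaemon(true);
            sender.start();
            Thread.sleep(1000); // let the send and receive buffers fill up
            boolean closed = conn.closeWithTimeout(2000);
            client.close();     // cleanup: closing the socket directly releases the sender
            sender.join(5000);
            return !closed;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("close blocked by send(): " + closeIsBlockedBySend());
    }
}
```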
> The scenario is as follows:
> * Say we have nodes A (coord), B and C
> * There's heavy (clustering) traffic to all 3 nodes, from the 2 clients
> * B is isolated by executing 'ifdown bond0'
> * At this point, the messages going to B will back up at (say) A because A doesn't get any TCP acks from B
> * At some point, depending on the traffic and the size of the sent messages, A will acquire a lock on the send connection to B, to write data, but the write will block as the TCP send-window to B is full (note that the sender thread will still be in state RUNNABLE!)
> * After 40s, A suspects B and emits a new view {A,C}
> * This causes A's connection to B to be closed and subsequently removed. However, this _won't_ happen, as closing the connection needs to acquire the connection lock, which is held by the blocked TCP write.
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x00007fbbd3802000 nid=0x2303 runnable [0x0000700009793000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
> - locked <0x000000079e790a50> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> - locked <0x000000079e790838> (a java.io.DataOutputStream)
> at org.jgroups.blocks.cs.TcpConnection.doSend(TcpConnection.java:161)
> at org.jgroups.blocks.cs.TcpConnection.send(TcpConnection.java:131)
> at org.jgroups.blocks.cs.TcpClient.send(TcpClient.java:103)
> at org.jgroups.tests.bla6.main(bla6.java:35)
> "Thread-2" #15 prio=5 os_prio=31 tid=0x00007fbbd2150800 nid=0x6503 waiting on condition [0x000070000bcf6000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x000000079e7871a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> at org.jgroups.blocks.cs.TcpConnection.close(TcpConnection.java:358)
> at org.jgroups.util.Util.close(Util.java:422)
> at org.jgroups.blocks.cs.TcpClient.stop(TcpClient.java:85)
> at org.jgroups.blocks.cs.BaseServer.close(BaseServer.java:147)
> at org.jgroups.util.Util.close(Util.java:422)
> at org.jgroups.tests.bla6.lambda$main$0(bla6.java:27)
> at org.jgroups.tests.bla6$$Lambda$1/1384010761.run(Unknown Source)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
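The dump above shows `close()` parked on a `ReentrantLock$NonfairSync`, but not which thread owns that lock (`jstack -l` would also list ownable synchronizers). A minimal sketch of finding the owner programmatically with `java.lang.management.ThreadMXBean`; the thread names "sender" and "closer" and the latch that stands in for the blocked socket write are assumptions for illustration. For a thread parked on an ownable synchronizer such as `ReentrantLock`, `ThreadInfo.getLockOwnerName()` reports the thread holding it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOwnerProbe {
    /** Returns the name of the thread owning the lock that `waiter` is parked on, or null. */
    static String lockOwnerOf(Thread waiter) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(false, false))
            if (info != null && info.getThreadName().equals(waiter.getName()))
                return info.getLockOwnerName();
        return null;
    }

    static String demo() throws InterruptedException {
        ReentrantLock sendLock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // Stands in for the sender stuck in socketWrite0 while holding the send lock
        Thread sender = new Thread(() -> {
            sendLock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                sendLock.unlock();
            }
        }, "sender");
        // Stands in for the thread calling close(), which parks in ReentrantLock.lock()
        Thread closer = new Thread(() -> {
            sendLock.lock();
            sendLock.unlock();
        }, "closer");
        sender.setDaemon(true);
        closer.setDaemon(true);
        sender.start();
        held.await();
        closer.start();
        while (closer.getState() != Thread.State.WAITING) Thread.sleep(10);
        String owner = lockOwnerOf(closer);
        release.countDown(); // let both threads finish
        sender.join();
        closer.join();
        return owner;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("closer is waiting on a lock owned by: " + demo());
    }
}
```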