Bela Ban resolved JGRP-787.
---------------------------
Resolution: Done
Moved the sending of the message out of the synchronized block. The downside: if send() throws an exception, we're hosed, because the seqno assigned to that message never gets used. The receiver then has a gap and will not deliver any message with a seqno higher than that of the failed message.
However, unless the destination crashed (in which case we won't send it any more messages anyway), an exception thrown while sending a message would be considered a bug in JGroups.
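A minimal sketch of the trade-off described in the resolution, assuming a simplified model that is not JGroups code: `Sender`, `Receiver` and `Transport` are hypothetical names, and a plain monitor stands in for `UNICAST$Entry`. The seqno is taken under the lock, but the send happens after the lock is released, so a failed send leaves a hole that an in-order receiver cannot get past.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of the fix (not JGroups code): the seqno is assigned while
// holding the lock, but the message is sent after the lock has been released.
public class SeqnoGapSketch {

    /** Hypothetical transport; in JGroups the message would go down the stack to TCP. */
    public interface Transport {
        void send(long seqno, String payload) throws Exception;
    }

    public static class Sender {
        private final Object entryLock = new Object(); // stands in for the UNICAST$Entry monitor
        private final Transport transport;
        private long nextSeqno = 1;

        public Sender(Transport transport) { this.transport = transport; }

        public void send(String payload) throws Exception {
            long seqno;
            synchronized (entryLock) {
                seqno = nextSeqno++;        // the seqno is consumed here...
            }
            transport.send(seqno, payload); // ...even if this call throws (no lock held)
        }
    }

    /** Delivers strictly in seqno order: a missing seqno blocks all later messages. */
    public static class Receiver {
        private final TreeMap<Long, String> pending = new TreeMap<>();
        private long nextToDeliver = 1;
        public final List<String> delivered = new ArrayList<>();

        public void receive(long seqno, String payload) {
            pending.put(seqno, payload);
            // Deliver the run of consecutive seqnos starting at nextToDeliver
            while (pending.containsKey(nextToDeliver)) {
                delivered.add(pending.remove(nextToDeliver));
                nextToDeliver++;
            }
        }

        public int undeliverable() { return pending.size(); }
    }

    public static void main(String[] args) {
        Receiver receiver = new Receiver();
        Sender sender = new Sender((seqno, payload) -> {
            if (payload.equals("m2"))
                throw new Exception("simulated send failure for seqno " + seqno);
            receiver.receive(seqno, payload);
        });
        for (String m : new String[] {"m1", "m2", "m3", "m4"}) {
            try {
                sender.send(m);
            } catch (Exception e) {
                System.out.println("send failed: " + e.getMessage());
            }
        }
        // prints: delivered=[m1], stuck=2
        System.out.println("delivered=" + receiver.delivered + ", stuck=" + receiver.undeliverable());
    }
}
```

With m2's send failing, only m1 is delivered; m3 and m4 wait behind seqno 2 forever, because nothing in this sketch ever retransmits it.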
UNICAST over TCP with xmit_off=true: sending message in synchronized block leads to deadlocks
---------------------------------------------------------------------------------------------
Key: JGRP-787
URL: http://jira.jboss.com/jira/browse/JGRP-787
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assigned To: Bela Ban
Fix For: 2.6.3, 2.7
Same issue as http://jira.jboss.com/jira/browse/JGRP-303: that's why we moved the send() outside the synchronized block.
The problem with xmit_off, though, is that we need to know that the message was passed to TCP/IP successfully, or else we CANNOT increment the sequence number!
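For comparison, a hypothetical sketch (again not JGroups code; all names are made up for illustration) of the ordering this requirement pushes towards: the seqno is incremented only after send() returned normally, which means the lock has to be held across the whole send.

```java
// Hypothetical sketch of the deadlock-prone ordering (not JGroups code): the send
// happens while the per-destination lock is held, so a seqno is only consumed
// if send() succeeded.
public class IncrementOnSuccessSketch {

    /** Hypothetical transport; in JGroups the message would go down the stack to TCP. */
    public interface Transport {
        void send(long seqno, String payload) throws Exception;
    }

    public static class Sender {
        private final Object entryLock = new Object(); // stands in for the UNICAST$Entry monitor
        private final Transport transport;
        private long nextSeqno = 1;

        public Sender(Transport transport) { this.transport = transport; }

        public void send(String payload) throws Exception {
            synchronized (entryLock) {
                // The lock is held for the entire call into the transport. If that call
                // ends up needing a lock owned by a thread that is itself waiting for
                // entryLock, the two threads deadlock.
                transport.send(nextSeqno, payload);
                nextSeqno++; // reached only if send() did not throw: no gap
            }
        }

        public long nextSeqno() {
            synchronized (entryLock) { return nextSeqno; }
        }
    }

    public static void main(String[] args) throws Exception {
        Sender sender = new Sender((seqno, payload) -> {
            if (payload.equals("bad")) throw new Exception("simulated failure");
            System.out.println("sent #" + seqno + ": " + payload);
        });
        sender.send("m1");
        try {
            sender.send("bad");
        } catch (Exception e) {
            System.out.println("send failed, seqno not consumed");
        }
        sender.send("m2"); // goes out as #2, so the receiver sees no gap
    }
}
```

The failed send leaves nextSeqno at 2, so the next message reuses it and the receiver never sees a gap; the price is that the Entry lock stays held while the message travels down the stack, which is exactly what the "main" thread in the dump below is doing.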
Stack trace:
Found one Java-level deadlock:
=============================
"Incoming-27,UnicastTest-Group,192.168.1.5:7500":
waiting for ownable synchronizer 0x00002aaac0921168, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
which is held by "Incoming-4,UnicastTest-Group,192.168.1.5:7500"
"Incoming-4,UnicastTest-Group,192.168.1.5:7500":
waiting to lock monitor 0x00002aaacc8e9cf0 (object 0x00002aaac09e3a88, a org.jgroups.protocols.UNICAST$Entry),
which is held by "main"
"main":
waiting for ownable synchronizer 0x00002aaac0921168, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
which is held by "Incoming-4,UnicastTest-Group,192.168.1.5:7500"
Java stack information for the threads listed above:
===================================================
"Incoming-27,UnicastTest-Group,192.168.1.5:7500":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aaac0921168> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:635)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:292)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:735)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
at org.jgroups.protocols.Discovery.up(Discovery.java:244)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1266)
at org.jgroups.protocols.TP.access$100(TP.java:49)
at org.jgroups.protocols.TP$1.run(TP.java:1169)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
"Incoming-4,UnicastTest-Group,192.168.1.5:7500":
at org.jgroups.protocols.UNICAST.down(UNICAST.java:357)
- waiting to lock <0x00002aaac09e3a88> (a org.jgroups.protocols.UNICAST$Entry)
at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:316)
at org.jgroups.protocols.VIEW_SYNC.down(VIEW_SYNC.java:204)
at org.jgroups.protocols.pbcast.GMS.down(GMS.java:859)
at org.jgroups.protocols.FC.sendCredit(FC.java:740)
at org.jgroups.protocols.FC.up(FC.java:416)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:788)
at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:192)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:645)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:292)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:735)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
at org.jgroups.protocols.Discovery.up(Discovery.java:244)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1266)
at org.jgroups.protocols.TP.access$100(TP.java:49)
at org.jgroups.protocols.TP$1.run(TP.java:1169)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
"main":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aaac0921168> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:635)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:292)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:735)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
at org.jgroups.protocols.Discovery.up(Discovery.java:244)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1266)
at org.jgroups.protocols.TP.access$100(TP.java:49)
at org.jgroups.protocols.TP$1.run(TP.java:1169)
at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1737)
at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:39)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.jgroups.protocols.TP.down(TP.java:1167)
at org.jgroups.protocols.Discovery.down(Discovery.java:349)
at org.jgroups.protocols.MERGE2.down(MERGE2.java:175)
at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:373)
at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:95)
at org.jgroups.protocols.BARRIER.down(BARRIER.java:107)
at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:660)
at org.jgroups.protocols.UNICAST.send(UNICAST.java:484)
at org.jgroups.protocols.UNICAST.down(UNICAST.java:373)
- locked <0x00002aaac09e3a88> (a org.jgroups.protocols.UNICAST$Entry)
at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:316)
at org.jgroups.protocols.VIEW_SYNC.down(VIEW_SYNC.java:204)
at org.jgroups.protocols.pbcast.GMS.down(GMS.java:859)
at org.jgroups.protocols.FC.handleDownMessage(FC.java:526)
at org.jgroups.protocols.FC.down(FC.java:365)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:175)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.down(STREAMING_STATE_TRANSFER.java:303)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:457)
at org.jgroups.JChannel.down(JChannel.java:1443)
at org.jgroups.JChannel.send(JChannel.java:620)
at org.jgroups.tests.UnicastTest.sendMessages(UnicastTest.java:241)
at org.jgroups.tests.UnicastTest.eventLoop(UnicastTest.java:198)
at org.jgroups.tests.UnicastTest.main(UnicastTest.java:355)
Found 1 deadlock.
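The cycle in the dump boils down to a lock-ordering inversion: "main" holds the UNICAST$Entry monitor and waits for the ReentrantLock, while "Incoming-4" holds the ReentrantLock and waits for the Entry monitor. ("main" ends up in handleDataReceived() because the thread pool rejected the task and CallerRunsPolicy ran it on the calling thread.) The sketch below is a hypothetical reduction, not JGroups code, that reproduces the same two-lock cycle and detects it with ThreadMXBean.findDeadlockedThreads(), which looks for cycles on both object monitors and ownable synchronizers, the same kinds of locks jstack reports above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reduction of the lock cycle in the dump above (not JGroups code).
public class LockOrderDeadlockSketch {

    /** Starts two daemon threads that deadlock and returns how many deadlocked threads the JVM reports. */
    public static int reproduce() throws InterruptedException {
        final Object entryMonitor = new Object();            // stands in for UNICAST$Entry
        final ReentrantLock recvLock = new ReentrantLock(); // stands in for the lock taken in handleDataReceived()
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like "main": holds the Entry monitor, then needs the ReentrantLock
        Thread sender = new Thread(() -> {
            synchronized (entryMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                recvLock.lock(); // blocks forever: "receiver" owns it
                recvLock.unlock();
            }
        }, "sender");

        // Like "Incoming-4": holds the ReentrantLock, then needs the Entry monitor
        Thread receiver = new Thread(() -> {
            recvLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (entryMonitor) { // blocks forever: "sender" owns it
                    System.out.println("unreachable");
                }
            } finally {
                recvLock.unlock();
            }
        }, "receiver");

        sender.setDaemon(true);   // daemon threads, so the JVM can still exit
        receiver.setDaemon(true);
        sender.start();
        receiver.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) { // poll for up to ~5 seconds
            long[] ids = mx.findDeadlockedThreads();     // null while no cycle exists
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null)
                        System.out.println(info.getThreadName() + " waiting for " + info.getLockName()
                                + ", held by " + info.getLockOwnerName());
                }
                return ids.length;
            }
            Thread.sleep(100);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

A single call should report both "sender" and "receiver", mirroring the "main"/"Incoming-4" pair. The usual ways out are a consistent lock order or, as in the resolution above, not holding the Entry lock while the message goes down the stack.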