[ http://jira.jboss.com/jira/browse/JGRP-465?page=comments#action_12359478 ]
Brian Stansberry commented on JGRP-465:
---------------------------------------
Bela, on our call we discussed using a ThreadLocal to flag the thread, and decided to
maintain a Set of up threads instead to ensure J2ME support.
Thinking about it more, in 2.4 there's at most one thread that carries
messages through FC.up(), correct? If so, we can just store a reference to that thread in FC, and in
handleDownMessage() the check is simply "if (up_thread == Thread.currentThread())
...".
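A minimal sketch of the single-reference check, assuming (as the comment asks) that only one thread ever carries messages through FC.up(). The class, field and method names are hypothetical, credits are modeled as one long counter, and plain wait/notifyAll stands in for the Oswego CondVar that FC uses. As a design choice, the sketch clears the reference when up() returns, so the same thread is only exempt while it is actually delivering a message.

```java
// Hypothetical sketch of the "single up thread" check; not the actual FC code.
public class UpThreadCheckSketch {
    private final Object lock = new Object();
    private long credits;
    // The thread currently carrying a message through up(); null otherwise.
    private volatile Thread upThread;

    public UpThreadCheckSketch(long maxCredits) {
        this.credits = maxCredits;
    }

    // Up path: remember the carrying thread while the message is delivered.
    public void up(Runnable deliver) {
        upThread = Thread.currentThread();
        try {
            deliver.run();
        } finally {
            upThread = null;
        }
    }

    // Down path: returns true if the message may be sent, false on timeout.
    public boolean handleDownMessage(long length, long timeoutMs) {
        synchronized (lock) {
            if (upThread == Thread.currentThread()) {
                credits -= length; // never block the up thread; may exceed max_credits
                return true;
            }
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (credits < length) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) return false;
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            credits -= length;
            return true;
        }
    }

    // Called when a credit replenishment arrives from the receiver.
    public void replenish(long amount) {
        synchronized (lock) {
            credits += amount;
            lock.notifyAll();
        }
    }

    public long credits() {
        synchronized (lock) {
            return credits;
        }
    }
}
```

If the assumption of a single up thread does not hold (for example with up threads enabled), the Set-based variant discussed on the call is the safer choice.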
Deadlock in FC if RPC response blocks
-------------------------------------
Key: JGRP-465
URL: http://jira.jboss.com/jira/browse/JGRP-465
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4.1 SP1
Reporter: Brian Stansberry
Assigned To: Brian Stansberry
Fix For: 2.4.1 SP2
In 2.4.1.SP1 (and probably earlier), FC can deadlock if all up/down threads are set to
false at and above FC and an incoming RPC loops back down the channel with its response.
The following stack trace shows a deadlock situation (note that the line numbers in FC are off
from the CVS code; this occurred with a patched FC version, but the patch is not relevant
to this error):
"IncomingPacketHandler (channel=Tomcat-Cluster)" daemon prio=1 tid=0xc8a68b60
nid=0x1ece in Object.wait() [0xc94d1000..0xc94d1f30]
at java.lang.Object.wait(Native Method)
at EDU.oswego.cs.dl.util.concurrent.CondVar.timedwait(CondVar.java:222)
- locked <0xd187e398> (a EDU.oswego.cs.dl.util.concurrent.CondVar)
at org.jgroups.protocols.FC.handleDownMessage(FC.java:394)
at org.jgroups.protocols.FC.down(FC.java:336)
at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
at org.jgroups.protocols.FC.receiveDownEvent(FC.java:330)
at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:167)
at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:294)
at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:385)
at org.jgroups.JChannel.down(JChannel.java:1231)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:790)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.passDown(MessageDispatcher.java:767)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:693)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:544)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:367)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:777)
at org.jgroups.JChannel.up(JChannel.java:1091)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:377)
at org.jgroups.stack.ProtocolStack.receiveUpEvent(ProtocolStack.java:393)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:158)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:197)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.FC.up(FC.java:377)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:768)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.protocols.pbcast.GMS.receiveUpEvent(GMS.java:788)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:260)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:476)
- locked <0xd1eec6d8> (a org.jgroups.protocols.UNICAST$Entry)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:206)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:569)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:170)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.FD.up(FD.java:300)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:162)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.Discovery.up(Discovery.java:225)
at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
at org.jgroups.protocols.TP.handleIncomingMessage(TP.java:908)
at org.jgroups.protocols.TP.handleIncomingPacket(TP.java:850)
at org.jgroups.protocols.TP.access$400(TP.java:45)
at org.jgroups.protocols.TP$IncomingPacketHandler.run(TP.java:1296)
at java.lang.Thread.run(Thread.java:595)
The thread carried an RPC up to RpcDispatcher and is now waiting in FC.handleDownMessage()
for credits to become available so it can send the RPC response. Those credits will never
arrive, because the blocked thread is the same thread that would have to deliver them.
This is less of an issue in 2.5, because 2.5 uses the concurrent stack and
credit replenishments are sent as OOB messages on a separate thread. (It would still be an
issue in 2.5 if the concurrent stack were disabled via configuration.)
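A simplified model of both situations, assuming credits can be reduced to a single counter and that the incoming thread processes messages strictly in order. All names are hypothetical, and the wait is bounded by a timeout only so the demo terminates; in the stack trace above the thread stays blocked.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the JGRP-465 deadlock and of the 2.5-style mitigation.
public class CreditDeadlockDemo {
    private final Object lock = new Object();
    private long credits = 0; // the sender has used up all of its credits

    // Waits until n credits are available; gives up after timeoutMs.
    private boolean acquire(long n, long timeoutMs) {
        synchronized (lock) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (credits < n) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) return false;
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            credits -= n;
            return true;
        }
    }

    private void replenish(long n) {
        synchronized (lock) {
            credits += n;
            lock.notifyAll();
        }
    }

    // 2.4 without up/down threads: one thread handles incoming messages in order,
    // so the replenishment is queued behind the request whose response needs it.
    public boolean singleThreaded() {
        Queue<String> incoming = new ArrayDeque<String>();
        incoming.add("RPC_REQUEST");
        incoming.add("CREDIT_REPLENISHMENT");
        while (!incoming.isEmpty()) {
            String msg = incoming.poll();
            if (msg.equals("RPC_REQUEST")) {
                // Sending the response needs credits; FC would stay blocked here.
                if (!acquire(100, 200)) return false;
            } else {
                replenish(100);
            }
        }
        return true;
    }

    // 2.5-style: the replenishment is handled by a separate (OOB-like) thread.
    public boolean withSeparateCreditThread() {
        Thread oob = new Thread(new Runnable() {
            public void run() { replenish(100); }
        }, "oob-credits");
        oob.start();
        boolean sent = acquire(100, 5000);
        try {
            oob.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent;
    }
}
```

In the single-threaded case the replenishment is never processed before the timeout, while the separate thread lets the waiting sender proceed.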
A hacky solution is to flag the up thread and, in handleDownMessage(), check for the
flag before blocking the thread. If the flag is set, don't block the thread; just
let it through, i.e. let it exceed max_credits.
In cases like recent JBC releases, where the vast majority of RPC responses are
lightweight "null" responses, this is pretty safe. We still need to add a config flag to
disable the workaround for applications where RPC responses frequently
return large amounts of data.
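A sketch of the config switch, read from java.util.Properties. The property name allow_up_thread_overdraft is a hypothetical placeholder, not an existing FC attribute; the workaround defaults to enabled because the text describes the flag as a way to disable it.

```java
import java.util.Properties;

// Hypothetical config switch for the up-thread workaround.
public class FcWorkaroundConfig {
    public static final String PROP = "allow_up_thread_overdraft"; // hypothetical name
    private final boolean allowUpThreadOverdraft;

    // Defaults to enabled (safe when most RPC responses are small "null" replies).
    public FcWorkaroundConfig(Properties props) {
        String value = props.getProperty(PROP, "true");
        allowUpThreadOverdraft = Boolean.valueOf(value.trim()).booleanValue();
    }

    public boolean isWorkaroundEnabled() {
        return allowUpThreadOverdraft;
    }

    // An up thread bypasses blocking only when the workaround is enabled.
    public boolean shouldBlock(long availableCredits, long length, boolean callerIsUpThread) {
        if (availableCredits >= length) return false;
        return !(allowUpThreadOverdraft && callerIsUpThread);
    }
}
```

Applications whose RPC responses carry large payloads would set the property to false, restoring normal blocking for every sender.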