[ https://issues.jboss.org/browse/JGRP-1675?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on JGRP-1675:
------------------------------------
I was able to reproduce the deadlock again with an updated test, here:
https://github.com/danberindei/JGroups/blob/JGRP-1675/tests/stress/org/jg...
It looks like a regular message can be processed on the INT thread pool:
{noformat}
"INT-2,cluster,D@1395" prio=5 tid=0x32 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at org.jgroups.protocols.FlowControl$Credit.decrementIfEnoughCredits(FlowControl.java:567)
at org.jgroups.protocols.UFC.handleDownMessage(UFC.java:121)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:330)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:340)
at org.jgroups.protocols.FRAG2.fragment(FRAG2.java:243)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:122)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1024)
at org.jgroups.JChannel.down(JChannel.java:760)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:684)
at org.jgroups.blocks.RequestCorrelator.sendResponse(RequestCorrelator.java:516)
at org.jgroups.blocks.RequestCorrelator.sendReply(RequestCorrelator.java:506)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:476)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:374)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:247)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:666)
at org.jgroups.JChannel.up(JChannel.java:730)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1019)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:182)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:434)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:434)
at org.jgroups.stack.Protocol.up(Protocol.java:409)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:796)
at org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:752)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:466)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:662)
at org.jgroups.stack.Protocol.up(Protocol.java:409)
at org.jgroups.protocols.TP.passBatchUp(TP.java:1422)
at org.jgroups.protocols.TP$BatchHandler.run(TP.java:1574)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{noformat}
While debugging, I was able to look at the message batch in UNICAST3.handleBatchReceived,
and see that it still has a REPLENISH message waiting to be delivered.
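To illustrate why that matters, here is a toy sketch (not JGroups code; the class and message names are made up, and the bounded wait exists only so the demo terminates): batch delivery is sequential on a single thread, so if an earlier message in the batch blocks waiting for credits, a REPLENISH sitting behind it in the same batch can never be delivered.
{noformat}
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Toy model of in-order batch delivery: the first message in the batch blocks
 * waiting for credits, and the REPLENISH that would unblock it sits behind it
 * in the same batch, so (in the unbounded case) it is never delivered.
 */
public class BatchOrderingDemo {
    static final CountDownLatch credits = new CountDownLatch(1);

    public static void main(String[] args) throws Exception {
        List<String> batch = List.of("REGULAR-REQUEST", "REPLENISH");
        for (String msg : batch) {                 // delivered sequentially on one thread
            if (msg.equals("REPLENISH")) {
                credits.countDown();               // would grant credits, but we only get here after the block
            } else {
                System.out.println("blocking for credits before sending the reply ...");
                boolean granted = credits.await(2, TimeUnit.SECONDS);  // bounded wait, demo only
                System.out.println(granted ? "got credits"
                                           : "still no credits -> deadlock in the unbounded case");
            }
        }
    }
}
{noformat}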
Threads stuck in FlowControl.decrementIfEnoughCredits
-----------------------------------------------------
Key: JGRP-1675
URL: https://issues.jboss.org/browse/JGRP-1675
Project: JGroups
Issue Type: Bug
Affects Versions: 3.4
Reporter: Radim Vansa
Assignee: Bela Ban
Fix For: 3.5
Attachments: jgroups-udp-radim.xml, RemoteGetStressTest.java, UPerf2.java
I have recently observed a repeated situation where many (or all) threads were stuck waiting for credits in the FlowControl protocol.
The credit request was not handled on the other node: it is a non-OOB message, and some messages before it had been lost (actually many of them; cause unknown), so the request had to wait for those messages to be retransmitted before it could be delivered.
However, those messages were not retransmitted, because the retransmission request was never processed: all OOB threads were stuck in the FlowControl protocol, each handling some other request and trying to send a response, and the responses could not be sent until FlowControl received more credits.
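A minimal, self-contained sketch of that cycle (toy code using java.util.concurrent, not the actual JGroups classes; the pool size, the Semaphore standing in for credits, and the 2-second cap on the wait are all artifacts of the demo):
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Toy model of the reported cycle: every thread of the (OOB) pool blocks
 * waiting for credits, and the task that would replenish the credits is
 * queued on the same, now exhausted, pool.
 */
public class CreditDeadlockDemo {
    static final Semaphore credits = new Semaphore(0);   // stands in for FlowControl credits

    public static void main(String[] args) throws Exception {
        ExecutorService oobPool = Executors.newFixedThreadPool(2);

        // Two "requests" whose replies cannot be sent without credits.
        for (int i = 0; i < 2; i++) {
            oobPool.submit(() -> {
                try {
                    // decrementIfEnoughCredits-like wait (bounded here only so the demo terminates)
                    boolean ok = credits.tryAcquire(2, TimeUnit.SECONDS);
                    System.out.println(ok ? "reply sent"
                                          : "blocked: no credits, and no free thread to deliver REPLENISH");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The REPLENISH message arrives, but it is queued behind the blocked tasks.
        oobPool.submit(() -> credits.release(2));

        oobPool.shutdown();
        oobPool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
{noformat}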
The probability of such a situation could be lowered by tagging the credit request as OOB, so it would be handled immediately. If the credit replenish message were then processed in the regular OOB pool, that pool could already be depleted by many requests, but setting up the internal thread pool would solve the problem.
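A rough sketch of that direction (my illustration only, not the actual JGroups internal thread pool; the Msg type and dispatch method are hypothetical): protocol-internal messages such as REPLENISH get a small pool of their own that regular/OOB traffic cannot exhaust.
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InternalPoolSketch {
    /** hypothetical message type, just enough for the illustration */
    static class Msg {
        final String payload;
        final boolean internal;   // e.g. a UFC/MFC REPLENISH
        Msg(String payload, boolean internal) { this.payload = payload; this.internal = internal; }
    }

    private final ExecutorService oobPool      = Executors.newFixedThreadPool(8); // may fill up with blocked senders
    private final ExecutorService internalPool = Executors.newFixedThreadPool(2); // reserved for internal messages

    void dispatch(Msg msg) {
        ExecutorService pool = msg.internal ? internalPool : oobPool;
        pool.submit(() -> deliver(msg));
    }

    void deliver(Msg msg) {
        System.out.println("delivering " + msg.payload);
    }
}
{noformat}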
Another consideration would be to release a thread from FlowControl (i.e. let it send the message even without credits) if it has been waiting there for too long.
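A sketch of how such a timed release could look (illustration only; the names and the plain long counter are made up, and the real FlowControl.Credit bookkeeping is more involved):
{noformat}
public class TimedCreditSketch {
    private long credits;                          // stands in for the per-destination credit counter

    /** Wait for credits, but give up after maxBlockMs and let the caller send anyway. */
    synchronized boolean decrementOrTimeout(long amount, long maxBlockMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxBlockMs;
        while (credits < amount) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                return false;                      // timed out: release the thread without credits
            wait(remaining);
        }
        credits -= amount;
        return true;
    }

    synchronized void replenish(long amount) {     // called when a REPLENISH message arrives
        credits += amount;
        notifyAll();
    }
}
{noformat}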
h3. Workaround
It appears that setting MFC and UFC max_credits to 10M, or removing these protocols altogether, works around this issue.
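For reference, the workaround expressed as a stack-configuration fragment (a sketch against a config like the attached jgroups-udp-radim.xml; only the max_credits values come from the report above, the rest of the stack stays as-is):
{noformat}
<!-- raise the credits so senders (almost) never block ... -->
<MFC max_credits="10000000"/>
<UFC max_credits="10000000"/>
<!-- ... or remove the <MFC/> and <UFC/> elements from the stack entirely -->
{noformat}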