[jboss-jira] [JBoss JIRA] (JGRP-1675) CreditRequest in FlowControl is not OOB

Radim Vansa (JIRA) jira-events at lists.jboss.org
Wed Sep 25 03:28:45 EDT 2013


    [ https://issues.jboss.org/browse/JGRP-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12807128#comment-12807128 ] 

Radim Vansa commented on JGRP-1675:
-----------------------------------

Ah, stupid me. I've finally realized why there may be so many requests that the OOB thread pool gets depleted. Infinispan sends GETs to both owners and waits only for the first response. Therefore, when a request gets stuck on one node for lack of credits, the other owner serves it, and the requesting node is free to issue another request, blocking more and more OOB threads on the node lacking credits. I should have realized this much earlier.
So, once we run out of credits, the OOB thread pool gets depleted and no more messages get through. That includes the credit replenishments: several OOB packets get lost, and even when the request is INTERNAL it is ordered relative to the OOB messages. Since no one can receive those (they're dropped when we try to pass them to OOB threads), it just waits in the UNICAST tables.
The question is how we ran out of credits in the first place, but I'd say that even if that happens, the situation should be handled.
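
To make the depletion concrete, here is a minimal, self-contained Java sketch. It does not use JGroups at all: the class name OobDepletionSketch, the pool sizes, and the CountDownLatch standing in for credits are assumptions made purely for illustration. A fixed-size pool without a queue plays the role of the OOB pool, every one of its threads blocks waiting for credits, and the task that would replenish the credits is then rejected by that same pool, while a separate "internal" pool can still run it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from JGroups.
class OobDepletionSketch {

    // Tries to hand the "credit replenishment" to a pool; false means it was dropped.
    static boolean deliver(ExecutorService pool, CountDownLatch credits) {
        try {
            pool.execute(credits::countDown);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    // Returns {deliveredViaOobPool, deliveredViaInternalPool, blockedThreadsReleased}.
    static boolean[] run(int oobThreads) {
        // Fixed size and no queue: once every thread is busy, new work is rejected.
        ThreadPoolExecutor oob = new ThreadPoolExecutor(oobThreads, oobThreads,
                0L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        ExecutorService internal = Executors.newSingleThreadExecutor();
        CountDownLatch credits = new CountDownLatch(1);

        // Each handler wants to send a response but has to wait for credits first.
        for (int i = 0; i < oobThreads; i++) {
            oob.execute(() -> {
                try {
                    credits.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        boolean viaOob = deliver(oob, credits);           // pool is exhausted
        boolean viaInternal = deliver(internal, credits); // separate pool still has a thread

        oob.shutdown();
        internal.shutdown();
        boolean released;
        try {
            released = oob.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            released = false;
        }
        if (!released) {
            oob.shutdownNow(); // do not leave blocked threads behind
        }
        return new boolean[] { viaOob, viaInternal, released };
    }

    public static void main(String[] args) {
        boolean[] r = run(4);
        System.out.println("via OOB pool: " + r[0]
                + ", via internal pool: " + r[1]
                + ", blocked threads released: " + r[2]);
    }
}
```

Running main should report that the replenishment was rejected by the exhausted pool and delivered through the separate one, after which the blocked threads finish. The real situation is worse than this toy, since the sketch ignores message ordering in UNICAST entirely.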
                
> CreditRequest in FlowControl is not OOB
> ---------------------------------------
>
>                 Key: JGRP-1675
>                 URL: https://issues.jboss.org/browse/JGRP-1675
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.4
>            Reporter: Radim Vansa
>            Assignee: Bela Ban
>             Fix For: 3.4
>
>
> I have repeatedly observed a situation where many (or all) threads were stuck waiting for credits in the FlowControl protocol.
> The credit request was not handled on the other node because it is a non-OOB message and some of the messages preceding it (actually many of them, cause unknown) had been lost; the request was therefore waiting for them to be re-sent.
> However, these were not re-sent properly because the retransmission request was not received: all OOB threads were stuck in the FlowControl protocol, as each had handled some other request and was trying to send a response, and the response could not be sent until FlowControl got the credits.
> The probability of such a situation could be lowered by tagging the credit request as OOB, so that it would be handled immediately. If the credit replenish message were then processed in the regular OOB pool, that pool could already be depleted by many requests, but setting up the internal thread pool would solve the problem.
> Another option would be to release a thread from FlowControl (let it send the message even without credits) if it has waited there for too long.
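
The last suggestion in the description (stop blocking once a sender has waited too long) could look roughly like the sketch below. This is not the JGroups FlowControl code: the class TimedCredits, its methods, and the maxBlockMillis parameter are hypothetical names chosen for illustration, and the sketch only covers the waiting logic, not the credit request/replenish messages themselves.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical credit counter with a bounded wait; not taken from JGroups.
class TimedCredits {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition replenished = lock.newCondition();
    private long available; // credits (bytes) currently available, guarded by lock

    TimedCredits(long initial) {
        this.available = initial;
    }

    // Returns true if the credits were taken, false if the caller gave up
    // after maxBlockMillis (or was interrupted) without getting them.
    boolean acquire(long bytes, long maxBlockMillis) {
        long remaining = TimeUnit.MILLISECONDS.toNanos(maxBlockMillis);
        lock.lock();
        try {
            while (available < bytes) {
                if (remaining <= 0) {
                    return false; // waited too long: release the thread
                }
                try {
                    // awaitNanos returns the time left, so spurious wakeups are handled
                    remaining = replenished.awaitNanos(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            available -= bytes;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Called when a credit replenishment arrives from the receiver.
    void replenish(long bytes) {
        lock.lock();
        try {
            available += bytes;
            replenished.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

A caller that gets false back would then decide what to do: send the message anyway, as suggested above, or fail the send. Either way the thread goes back to the pool instead of staying blocked indefinitely.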
