[jboss-jira] [JBoss JIRA] (JGRP-1675) CreditRequest in FlowControl is not OOB

Tue Aug 13 03:41:26 EDT 2013

    [ https://issues.jboss.org/browse/JGRP-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796446#comment-12796446 ] 

Radim Vansa commented on JGRP-1675:
-----------------------------------

The internal pool makes things a bit more complicated (the deadlock may eventually resolve itself), but some unpleasant behaviour still occurs.

There is some temporary 2-way connection failure, which leaves nodes lacking credits (they have sent messages, but the other party never received them and so never sent credit back) and depletes the OOB pool, because threads with responses to send are waiting for credits. Meanwhile a credit request may be lost along with the messages, and we won't create another one for 5 seconds by default, so everything is stuck for that period. The internal pool helps here in that, even if the OOB pool is depleted, the REPLENISH message eventually gets processed.
The message loss should get repaired (retransmit requests are going around and the messages are re-sent), but the re-sent messages are discarded because there is no thread to process them.

[~belaban]: In fact I think you're right that the credit request shouldn't be OOB. I think a better solution for this situation would be to raise an event upon XMIT_REQ carrying the number of messages lost (possibly only when the count exceeds some threshold), which would automatically send some credit to the originator, as if the messages had arrived at FlowControl properly. The only problem is deciding how much credit to send, but FlowControl could estimate that either as a constant (smallest message size?) or by computing the average message size.
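
To make the estimation idea concrete, here is a rough sketch of what I have in mind. It is not existing JGroups code: the class LostCreditEstimator, its methods and the parameters minMessageSize and lossThreshold are all hypothetical names made up for illustration. It tracks the average size of received messages and, given the number of messages reported lost, suggests how many bytes of credit to send back.

```java
/**
 * Hypothetical helper (not part of JGroups): estimates how much credit to
 * return to a sender when a retransmit request reveals lost messages.
 */
final class LostCreditEstimator {
    private final long minMessageSize; // assumed constant fallback, in bytes
    private final int lossThreshold;   // assumed minimum loss count that triggers credit
    private long totalBytes;           // bytes seen in successfully received messages
    private long messageCount;         // number of successfully received messages

    LostCreditEstimator(long minMessageSize, int lossThreshold) {
        this.minMessageSize = minMessageSize;
        this.lossThreshold = lossThreshold;
    }

    /** Records the size of a message that did arrive, to refine the average. */
    synchronized void recordReceived(int lengthInBytes) {
        totalBytes += lengthInBytes;
        messageCount++;
    }

    /** Average observed size, never below the constant fallback. */
    synchronized long averageSize() {
        if (messageCount == 0) {
            return minMessageSize;
        }
        return Math.max(minMessageSize, totalBytes / messageCount);
    }

    /** Credit (in bytes) to send for the given number of lost messages; 0 below the threshold. */
    synchronized long creditFor(int lostMessages) {
        if (lostMessages < lossThreshold) {
            return 0;
        }
        return lostMessages * averageSize();
    }
}
```

Whoever handles the retransmit request would call creditFor(n) with the number of missing messages and, if the result is positive, send that much credit to the originator. Whether the average or the constant is the safer estimate is open: the average can over-credit when the lost messages were small.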

> CreditRequest in FlowControl is not OOB
> ---------------------------------------
>
>                 Key: JGRP-1675
>                 URL: https://issues.jboss.org/browse/JGRP-1675
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.4
>            Reporter: Radim Vansa
>            Assignee: Bela Ban
>             Fix For: 3.4
>
>
> I have recently observed a recurring situation where many (or all) threads were stuck waiting for credits in the FlowControl protocol.
> The credit request was not handled on the other node because it is a non-OOB message and some messages before the request (actually many of them, cause unknown) had been lost; the request was therefore waiting for them to be re-sent.
> However, these were not re-sent properly because the retransmission request was not received: all OOB threads were stuck in the FlowControl protocol, as they had handled some other request and tried to send a response, but the response could not be sent until FlowControl got the credits.
> The probability of such a situation could be lowered by tagging the credit request as OOB, so that it would be handled immediately. If the credit replenish message were then processed in the regular OOB pool, that pool could already be depleted by many requests, but setting up the internal thread pool would solve the problem.
> Another option would be to release a thread from FlowControl (let it send the message even without credits) if it waits there for too long.
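
The last option in the description (releasing a thread that has waited too long) could look roughly like the sketch below. This is only an illustration under assumptions, not how FlowControl is implemented: TimedCreditGate, acquire and replenish are hypothetical names. A sender waits for credits for a bounded time and is told whether it actually obtained them.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical credit gate (not JGroups code) with a bounded wait. */
final class TimedCreditGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition creditsAvailable = lock.newCondition();
    private long credits; // bytes the sender may still send

    TimedCreditGate(long initialCredits) {
        this.credits = initialCredits;
    }

    /**
     * Tries to take the given number of bytes of credit, waiting at most maxWaitMillis.
     * Returns true if the credits were deducted, false if the wait timed out
     * or the thread was interrupted; the caller decides whether to send anyway.
     */
    boolean acquire(long bytes, long maxWaitMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        lock.lock();
        try {
            while (credits < bytes) {
                if (remainingNanos <= 0) {
                    return false; // waited too long: release the thread
                }
                remainingNanos = creditsAvailable.awaitNanos(remainingNanos);
            }
            credits -= bytes;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Adds credits (e.g. when a replenish message arrives) and wakes up waiters. */
    void replenish(long bytes) {
        lock.lock();
        try {
            credits += bytes;
            creditsAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

A thread that gets false back would send its message without credits instead of staying blocked, which frees it to process retransmit requests. The cost is that the receiver can be overrun, so the maximum wait would need to be chosen with care.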
