[jboss-jira] [JBoss JIRA] (JGRP-2486) FD Monitor get stuck on TrasferQueueBundler

lukas brandl (Jira) issues at jboss.org
Fri Jun 26 09:04:37 EDT 2020


    [ https://issues.redhat.com/browse/JGRP-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178029#comment-14178029 ] 

lukas brandl commented on JGRP-2486:
------------------------------------

Thank you for the quick reply and the recommendations.
To clarify: the dead node is never suspected by the surviving node in this case.
The thread getting stuck on the bundler (or the TCP connection) isn’t a problem in itself, but it appears to be the reason why the node is never suspected.
We are aware that there are newer alternatives to FD, but we can’t easily change the protocol stack if doing so breaks compatibility with previous versions, because the cluster could then no longer be upgraded in a rolling fashion.
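
To illustrate the mechanism described above, here is a minimal sketch in plain Java. It is not JGroups code: the class name, the queue size, the retry count and the message strings are all hypothetical, and a pre-filled ArrayBlockingQueue that nobody drains merely stands in for a bundler queue behind a stalled connection. It assumes the monitor sends its heartbeat with a blocking put() before it counts a missed reply, which is what the attached stack trace suggests.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockedMonitorSketch {
    // Hypothetical number of missed heartbeats before the peer is suspected.
    static final int MAX_TRIES = 3;

    /** Returns true if the monitor thread suspected the peer within waitMs. */
    static boolean runMonitor(boolean blockingSend, long waitMs) {
        // Stand-in for a bounded bundler queue that is already full
        // and is never drained (the peer is gone, the connection is stalled).
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        queue.offer("app-msg-1");
        queue.offer("app-msg-2");

        CountDownLatch suspected = new CountDownLatch(1);
        Thread monitor = new Thread(() -> {
            try {
                for (int missed = 0; missed < MAX_TRIES; missed++) {
                    if (blockingSend) {
                        queue.put("are-you-alive");   // blocks forever on a full queue
                    } else {
                        queue.offer("are-you-alive"); // drops the heartbeat instead
                    }
                    Thread.sleep(10);                 // hypothetical heartbeat interval
                }
                suspected.countDown();                // all tries missed: suspect the peer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        monitor.setDaemon(true);
        monitor.start();

        try {
            return suspected.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            monitor.interrupt(); // release the monitor if it is still blocked
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking send suspects peer: " + runMonitor(true, 500));
        System.out.println("non-blocking send suspects peer: " + runMonitor(false, 500));
    }
}
```

With the blocking send the monitor never gets past the first heartbeat, so the missed-heartbeat counter never reaches its limit and the peer is never suspected. With the non-blocking send the counter runs to completion. This only mirrors the symptom we observe; whether FD and the bundler interact in exactly this way is for the maintainers to confirm.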

> FD Monitor get stuck on TrasferQueueBundler
> -------------------------------------------
>
>                 Key: JGRP-2486
>                 URL: https://issues.redhat.com/browse/JGRP-2486
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.22
>            Reporter: lukas brandl
>            Assignee: Bela Ban
>            Priority: Major
>         Attachments: Main.java, stack-trace.txt
>
>
> Apparently there is an issue in the FD protocol. When a cluster node is disconnected and the disconnect isn't handled by FD_SOCK, FD stops sending heartbeats after a while. This only happens when the queue of the TransferQueueBundler fills up before the node is suspected.
> The stack trace shows that the FD$Monitor is blocked by the bundler. This is probably the reason why the heartbeat timeouts are not noticed.





