[jboss-jira] [JBoss JIRA] (JGRP-2486) FD Monitor gets stuck on TransferQueueBundler

Friday, 26 June 2020

    [ https://issues.redhat.com/browse/JGRP-2486?page=com.atlassian.jira.plugin... ]

lukas brandl commented on JGRP-2486:
------------------------------------

Thank you for the quick reply and the recommendations.
To clarify: in this case the surviving node never suspects the dead node.
The thread getting stuck on the bundler (or on the TCP connection) isn't a problem in itself,
but it appears to be the reason why the node is never suspected.
We are aware that there are newer alternatives to FD, but we can't easily change the
protocol stack if the change breaks compatibility with previous versions, because the
cluster could then no longer be upgraded in a rolling fashion.

...
 FD Monitor gets stuck on TransferQueueBundler
 ---------------------------------------------

                 Key: JGRP-2486
                 URL: https://issues.redhat.com/browse/JGRP-2486
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 4.0.22
            Reporter: lukas brandl
            Assignee: Bela Ban
            Priority: Major
         Attachments: Main.java, stack-trace.txt

 Apparently there is an issue in the FD protocol. When a cluster node is disconnected and
the disconnect isn't handled by FD_SOCK, FD stops sending heartbeats after a while.
This only happens when the queue of the TransferQueueBundler fills up before the node is
suspected.
 The stack trace shows that the FD$Monitor thread is blocked by the bundler. This is probably
the reason why the heartbeat timeouts are never noticed.
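
A minimal Java sketch of the suspected mechanism, assuming (as the stack trace suggests) that the thread sending heartbeats blocks when the bundler's bounded queue is full and that the same thread is responsible for checking the heartbeat timeout. This is not the JGroups FD or TransferQueueBundler code: the class MonitorSketch, the method detectsSuspect(), the queue capacity, the timeout and the heartbeat interval are all hypothetical. Nothing drains the queue, which stands in for a bundler stuck writing to a dead TCP peer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not JGroups code.
public class MonitorSketch {

    /**
     * Runs a monitor loop that, on each iteration, (1) enqueues a heartbeat for a
     * bundler whose queue is never drained and (2) checks whether the peer's last
     * ack is older than the timeout.
     *
     * @param blockingSend true to use put() (blocks when full), false to use offer()
     * @return true if the monitor suspected the peer before the caller gave up
     */
    public static boolean detectsSuspect(boolean blockingSend) {
        BlockingQueue<String> bundlerQueue = new ArrayBlockingQueue<>(2); // hypothetical capacity
        AtomicBoolean suspected = new AtomicBoolean(false);
        long lastAckNanos = System.nanoTime(); // the peer is dead, so no ack ever updates this
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(300); // hypothetical timeout

        Thread monitor = new Thread(() -> {
            try {
                while (!suspected.get()) {
                    if (blockingSend) {
                        bundlerQueue.put("heartbeat");   // blocks until interrupted once the queue is full
                    } else {
                        bundlerQueue.offer("heartbeat"); // returns false instead of blocking when full
                    }
                    // The timeout check is only reached if the send above returned.
                    if (System.nanoTime() - lastAckNanos > timeoutNanos) {
                        suspected.set(true);
                    }
                    Thread.sleep(20); // hypothetical heartbeat interval
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // the caller gave up on this monitor
            }
        });
        monitor.setDaemon(true);
        monitor.start();

        try {
            monitor.join(1500); // give the monitor up to 1.5 s to suspect the peer
            monitor.interrupt(); // unblocks put() if the monitor is still stuck
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return suspected.get();
    }

    public static void main(String[] args) {
        System.out.println("blocking put(): suspected = " + detectsSuspect(true));
        System.out.println("offer():        suspected = " + detectsSuspect(false));
    }
}
```

With put(), the loop stops at the third heartbeat and never reaches the timeout check again, so the peer is never suspected. With offer(), the excess heartbeats are dropped and the check keeps running. Running the timeout check on a thread that never sends is another way to keep suspicion independent of the send path; either choice is an illustration here, not a description of how JGroups resolved the issue.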

--
This message was sent by Atlassian Jira
(v7.13.8#713008)

