[
https://issues.redhat.com/browse/JGRP-2486?page=com.atlassian.jira.plugin...
]
lukas brandl edited comment on JGRP-2486 at 7/1/20 7:35 AM:
------------------------------------------------------------
I've created a pull request (https://github.com/belaban/JGroups/pull/500) to
split the FD Monitor task into a timeout checker and a heartbeat sender, in the same way
as the FD_ALL and FD_ALL2 protocols. This way the timeout check can't be blocked by the
TCP transport, and the dead node is still suspected.
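
The idea behind the split can be sketched in plain Java. This is a simplified model under stated assumptions, not the code from the pull request: the class name SplitMonitorSketch, the queue standing in for the bundler, and all timings are hypothetical. The heartbeat sender blocks on a full queue, while the timeout checker runs as a separate task and still suspects the silent peer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a failure detector with two independent tasks.
public class SplitMonitorSketch {

    /** Returns true if the peer was suspected even though the sender task was blocked. */
    public static boolean run() throws InterruptedException {
        // Assumption: a bounded queue that is full and never drained stands in
        // for a bundler whose transport cannot deliver messages.
        BlockingQueue<String> bundlerQueue = new ArrayBlockingQueue<>(1);
        bundlerQueue.put("stuck message");

        long timeoutMs = 200;
        long lastHeard = System.currentTimeMillis(); // the peer is never heard from again
        AtomicBoolean senderEntered = new AtomicBoolean(false);
        CountDownLatch suspected = new CountDownLatch(1);

        // Two threads, so a blocked sender cannot starve the checker.
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
        try {
            // Task 1: heartbeat sender. put() blocks because the queue is full.
            timer.scheduleWithFixedDelay(() -> {
                senderEntered.set(true);
                try {
                    bundlerQueue.put("are-you-alive");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, 0, 50, TimeUnit.MILLISECONDS);

            // Task 2: timeout checker. It only reads a timestamp and never
            // touches the queue, so it cannot block on the transport.
            timer.scheduleWithFixedDelay(() -> {
                if (System.currentTimeMillis() - lastHeard > timeoutMs) {
                    suspected.countDown();
                }
            }, 50, 50, TimeUnit.MILLISECONDS);

            boolean wasSuspected = suspected.await(5, TimeUnit.SECONDS);
            return wasSuspected && senderEntered.get();
        } finally {
            timer.shutdownNow(); // interrupts the blocked sender
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("suspected despite blocked sender: " + run());
    }
}
```

The sketch relies on the two tasks running on different threads. With a single timer thread, the blocked sender would still delay the checker.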
FD Monitor gets stuck on TransferQueueBundler
---------------------------------------------
Key: JGRP-2486
URL: https://issues.redhat.com/browse/JGRP-2486
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.22
Reporter: lukas brandl
Assignee: Bela Ban
Priority: Major
Attachments: Main.java, stack-trace.txt
There appears to be an issue in the FD protocol. When a cluster node is disconnected and
the disconnect isn't detected by FD_SOCK, FD stops sending heartbeats after a while.
This only happens when the queue of the TransferQueueBundler fills up before the node is
suspected.
The stack trace shows that the FD$Monitor is blocked by the bundler. This is probably the
reason why the heartbeat timeouts are not noticed.
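
The suspected failure mode can be reproduced in a small model. This is a sketch under assumptions, not the JGroups code and not the attached Main.java: the class name BlockedMonitorSketch, the queue standing in for the bundler, and the timings are hypothetical. It assumes a single task that first sends a heartbeat and then checks the timeout, so the check is never reached once the send blocks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a monitor task that sends and checks in one run.
public class BlockedMonitorSketch {

    /** Returns true if the peer was suspected within the observation window. */
    public static boolean suspectedWhileBlocked() throws InterruptedException {
        // Assumption: a full, never-drained bounded queue stands in for the bundler.
        BlockingQueue<String> bundlerQueue = new ArrayBlockingQueue<>(1);
        bundlerQueue.put("stuck message");

        long timeoutMs = 100;
        long lastHeard = System.currentTimeMillis(); // the peer is never heard from again
        CountDownLatch suspected = new CountDownLatch(1);

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            timer.scheduleWithFixedDelay(() -> {
                try {
                    // The send blocks on the full queue ...
                    bundlerQueue.put("are-you-alive");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // ... so this timeout check is not reached while the queue stays full.
                if (System.currentTimeMillis() - lastHeard > timeoutMs) {
                    suspected.countDown();
                }
            }, 0, 50, TimeUnit.MILLISECONDS);

            // Wait well past the timeout; the latch is never counted down.
            return suspected.await(500, TimeUnit.MILLISECONDS);
        } finally {
            timer.shutdownNow(); // interrupts the blocked task
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("suspected: " + suspectedWhileBlocked());
    }
}
```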