[jboss-jira] [JBoss JIRA] (JGRP-2451) FD_ALL3
Bela Ban (Jira)
issues at jboss.org
Fri Feb 14 03:00:00 EST 2020
Bela Ban created JGRP-2451:
------------------------------
Summary: FD_ALL3
Key: JGRP-2451
URL: https://issues.redhat.com/browse/JGRP-2451
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 5.0, 4.2.0
New heartbeat protocol, based on {{FD_ALL2}}.
* Messages should count as heartbeats ({{msg_counts_as_heartbeat}} should be enabled by default and, as such, is deprecated).
* When a multicast message is sent before {{interval}} has elapsed, we suppress sending a heartbeat.
* A simple bitmap represents members: a bit set to {{1}} means we received a heartbeat from that member since the last check (performed every {{timeout}} ms); a bit set to {{0}} leads to that member being suspected.
* The bitmap is reset on each check.
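The receiver-side bitmap described above could be sketched roughly as follows. This is only an illustration, not the actual {{FD_ALL3}} code: the class name {{HeartbeatBitmap}}, its methods, and the mapping of each member to a fixed integer index are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch of a heartbeat bitmap; names are illustrative only. */
class HeartbeatBitmap {
    private final AtomicLongArray words; // one bit per member index
    private final int size;              // number of members tracked

    HeartbeatBitmap(int size) {
        this.size = size;
        this.words = new AtomicLongArray((size + 63) / 64);
    }

    /** Called for every heartbeat or regular message received from the member at 'index'. */
    void markAlive(int index) {
        int word = index >>> 6;
        long mask = 1L << (index & 63);
        // Cheap read first: skip the atomic update if the bit is already set
        if ((words.get(word) & mask) == 0)
            words.getAndUpdate(word, w -> w | mask);
    }

    /** Called every 'timeout' ms: returns the indices whose bit is 0 and resets all bits. */
    List<Integer> checkAndReset() {
        List<Integer> suspects = new ArrayList<>();
        for (int w = 0; w < words.length(); w++) {
            long snapshot = words.getAndSet(w, 0L); // read and reset in one step
            for (int b = 0; b < 64; b++) {
                int index = (w << 6) + b;
                if (index >= size)
                    break;
                if ((snapshot & (1L << b)) == 0)
                    suspects.add(index);
            }
        }
        return suspects;
    }
}
```

Setting a bit involves no clock access, only a read and, at most once per member per check period, an atomic update. A real protocol would also have to skip the local member and handle view changes, which this sketch leaves out.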
It is crucial that setting a bit in the bitmap is quick (not like in {{FD_ALL}}, where we fetch the current time from the time service), especially since this is done on every message.
The advantage of {{FD_ALL3}} over previous failure detection protocols is twofold: we only send heartbeats when there is no (multicast) traffic, and we don't suspect a member P whose heartbeats are missing as long as we are still receiving traffic from P.
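A minimal sketch of the heartbeat suppression on the sender side, assuming a periodic task that fires every {{interval}} ms. The class {{HeartbeatSuppressor}} and its callbacks are hypothetical names for this example and are not taken from the JGroups code base.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: send a heartbeat only if no multicast was sent in the last interval. */
class HeartbeatSuppressor {
    private final AtomicBoolean mcastSent = new AtomicBoolean(false);
    private final Runnable sendHeartbeat; // assumed callback that multicasts a heartbeat

    HeartbeatSuppressor(Runnable sendHeartbeat) {
        this.sendHeartbeat = sendHeartbeat;
    }

    /** Call on every outgoing multicast message. */
    void onMulticastSent() {
        // Read before write so the common case does not touch the shared cache line
        if (!mcastSent.get())
            mcastSent.set(true);
    }

    /** Call every 'interval' ms; returns true if a heartbeat was actually sent. */
    boolean onIntervalElapsed() {
        if (mcastSent.getAndSet(false))
            return false; // regular traffic already served as the heartbeat
        sendHeartbeat.run();
        return true;
    }
}
```

With steady multicast traffic, {{onIntervalElapsed()}} never sends anything; heartbeats only appear once the member goes quiet for a full interval.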
We need to think about whether to also consider unicast messages on the sender side: we could maintain a bitmap of messages sent to members. On a unicast message to P, P's bit would be set in the bitmap; on a multicast message, all bits would be set. Then we could selectively send heartbeats only to members whose bits are 0.
However, this is only feasible with transports that send a message N-1 times (e.g. TCP); for UDP, no such 'anycast' is available.
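The sender-side idea could look roughly like the sketch below. It is an assumption-laden illustration, not a proposed implementation: {{SentBitmap}} and its methods are made-up names, members are identified by an integer index, and plain synchronization is used for brevity rather than speed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: track which members received traffic from us in the current interval. */
class SentBitmap {
    private final BitSet sent;
    private final int size; // number of members tracked

    SentBitmap(int size) {
        this.size = size;
        this.sent = new BitSet(size);
    }

    /** A unicast message to the member at 'index' sets only that member's bit. */
    synchronized void onUnicastSent(int index) {
        sent.set(index);
    }

    /** A multicast message sets all bits. */
    synchronized void onMulticastSent() {
        sent.set(0, size);
    }

    /** Returns the members whose bit is 0 (they still need a heartbeat) and resets the bitmap. */
    synchronized List<Integer> heartbeatTargetsAndReset() {
        List<Integer> targets = new ArrayList<>();
        for (int i = sent.nextClearBit(0); i < size; i = sent.nextClearBit(i + 1))
            targets.add(i);
        sent.clear();
        return targets;
    }
}
```

The returned list only helps a transport that can address members individually, which matches the TCP case mentioned above.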