[jboss-jira] [JBoss JIRA] (JGRP-2451) FD_ALL3

Bela Ban (Jira) issues at jboss.org
Fri Feb 14 03:00:00 EST 2020


Bela Ban created JGRP-2451:
------------------------------

             Summary: FD_ALL3
                 Key: JGRP-2451
                 URL: https://issues.redhat.com/browse/JGRP-2451
             Project: JGroups
          Issue Type: Feature Request
            Reporter: Bela Ban
            Assignee: Bela Ban
             Fix For: 5.0, 4.2.0


New heartbeat protocol, based on {{FD_ALL2}}. 

* Messages should count as heartbeats (the behavior of {{msg_counts_as_heartbeat}} should be the default, and the attribute can therefore be deprecated).
* When a multicast message is sent before {{interval}} has elapsed, we suppress sending a heartbeat.
* A simple bitmap represents members: bits set to {{1}} mean we received a heartbeat since the last check (at {{timeout}} ms), bits set to {{0}} lead to suspicions
* The bitmap is reset on each check
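
The bitmap described above could be sketched as follows. This is only an illustration, assuming each member is mapped to a stable integer index (e.g. its position in the current view); {{HeartbeatBitmap}}, {{markAlive}} and {{checkAndReset}} are hypothetical names, not part of JGroups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch of a receive-side heartbeat bitmap; not the JGroups implementation. */
final class HeartbeatBitmap {
    private final AtomicLongArray words; // one bit per member index (assumed mapping)
    private final int size;

    HeartbeatBitmap(int numMembers) {
        this.size = numMembers;
        this.words = new AtomicLongArray((numMembers + 63) / 64);
    }

    /** Called for every message or heartbeat received from the member at memberIndex. */
    void markAlive(int memberIndex) {
        int word = memberIndex >>> 6;
        long mask = 1L << (memberIndex & 63);
        // Fast path: a plain read, and no atomic update if the bit is already set
        if ((words.get(word) & mask) == 0)
            words.getAndAccumulate(word, mask, (cur, m) -> cur | m);
    }

    /** Run once per check period: returns the indices whose bit is 0 and resets all bits. */
    List<Integer> checkAndReset() {
        List<Integer> suspects = new ArrayList<>();
        for (int w = 0; w < words.length(); w++) {
            long snapshot = words.getAndSet(w, 0L); // read and reset in one step
            for (int b = 0; b < 64; b++) {
                int idx = (w << 6) + b;
                if (idx >= size)
                    break;
                if ((snapshot & (1L << b)) == 0)
                    suspects.add(idx);
            }
        }
        return suspects;
    }
}
```

Note that {{markAlive}} never reads a clock: the only time-based work is the periodic call to {{checkAndReset}}.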

It is crucial that setting a bit in the bitmap is quick (unlike in {{FD_ALL}}, where we fetch the current time from the time service), especially since this is done on every message received.

The advantage of {{FD_ALL3}} over previous failure detection protocols is twofold: we only send heartbeats when there is no (multicast) traffic, and we don't suspect a member P whose heartbeats are missing as long as we are still receiving traffic from P.
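
The sender-side suppression might look roughly like the sketch below. It assumes a timer task that fires every {{interval}} ms and a hook on the multicast send path; {{HeartbeatSuppressor}} and its method names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of heartbeat suppression; not the JGroups implementation. */
final class HeartbeatSuppressor {
    // True if at least one multicast was sent during the current interval
    private final AtomicBoolean mcastSent = new AtomicBoolean(false);

    /** Called on the send path for every multicast message. */
    void multicastSent() {
        if (!mcastSent.get()) // skip the write if the flag is already set
            mcastSent.set(true);
    }

    /** Called by the timer every interval ms; returns true if a heartbeat has to be sent. */
    boolean heartbeatNeeded() {
        // getAndSet(false) reads the flag and clears it for the next interval
        return !mcastSent.getAndSet(false);
    }
}
```

With this, a node that multicasts at least once per {{interval}} never sends an explicit heartbeat.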

We need to think about whether to consider unicast messages, too, on the sender side. We could populate a bitmap with messages sent to members: on a unicast message to P, P's bit would be set in the bitmap; on a multicast message, all bits would be set. Then we could selectively send heartbeats only to members whose bits are 0.
However, this is only feasible with transports that send a multicast message as N-1 unicasts (e.g. TCP); for UDP we don't have such an 'anycast' available.
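
To make the idea concrete, here is a rough sketch of such a sender-side bitmap. It again assumes members are identified by integer indices and that the caller skips its own index; {{SentBitmap}} and its methods are hypothetical, and a plain synchronized {{java.util.BitSet}} is used for brevity rather than speed.

```java
import java.util.BitSet;

/** Hypothetical sketch of a sender-side bitmap for selective heartbeats. */
final class SentBitmap {
    private final BitSet sent;
    private final int numMembers;

    SentBitmap(int numMembers) {
        this.numMembers = numMembers;
        this.sent = new BitSet(numMembers);
    }

    /** A unicast message was sent to the member at memberIndex. */
    synchronized void unicastSent(int memberIndex) {
        if (memberIndex < 0 || memberIndex >= numMembers)
            throw new IndexOutOfBoundsException("member index " + memberIndex);
        sent.set(memberIndex);
    }

    /** A multicast message was sent: every member has seen traffic from us. */
    synchronized void multicastSent() {
        sent.set(0, numMembers);
    }

    /** Returns the members that got no traffic this interval, then resets all bits. */
    synchronized int[] heartbeatTargetsAndReset() {
        int[] targets = new int[numMembers - sent.cardinality()];
        int n = 0;
        for (int i = sent.nextClearBit(0); i < numMembers; i = sent.nextClearBit(i + 1))
            targets[n++] = i;
        sent.clear();
        return targets;
    }
}
```

The returned indices are the members that would need an individual heartbeat, which is why this only pays off when the transport can address members one by one.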



--
This message was sent by Atlassian Jira
(v7.13.8#713008)

