]
Bela Ban updated JGRP-2451:
---------------------------
Summary: FD_ALL2: improvements (was: FD_ALL3)
FD_ALL2: improvements
---------------------
Key: JGRP-2451
URL: https://issues.redhat.com/browse/JGRP-2451
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Priority: Major
Fix For: 5.0, 4.2.0
New heartbeat protocol, based on {{FD_ALL2}}.
* Messages should count as heartbeats (the behavior of {{msg_counts_as_heartbeat}} should
be the default, and the attribute should as such be deprecated).
* When a multicast message is sent before {{interval}} has elapsed, we suppress sending a
heartbeat.
* A simple bitmap represents members: bits set to {{1}} mean we received a heartbeat
since the last check (at {{timeout}} ms), bits set to {{0}} lead to suspicions
* The bitmap is reset on each check
It is crucial that setting a bit in the bitmap is quick (not like in {{FD_ALL}}, where we
fetch the current time from the time service), especially since this is done on every
message.
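The bitmap described above could be sketched roughly as follows. This is only an illustration, not the JGroups code: the class {{HeartbeatBitmap}}, its methods, and the mapping of members to integer indices are assumptions made for the example. The point it shows is that marking a member as alive is a single atomic OR, with no call to a time service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch; members are assumed to be mapped to indices 0..size-1. */
class HeartbeatBitmap {
    private final int size;               // number of members tracked
    private final AtomicLongArray words;  // 64 member bits per word

    HeartbeatBitmap(int size) {
        this.size = size;
        this.words = new AtomicLongArray((size + 63) / 64);
    }

    /** Called on every message or heartbeat from member idx: one atomic OR, no clock read. */
    void markAlive(int idx) {
        words.getAndAccumulate(idx >>> 6, 1L << (idx & 63), (cur, mask) -> cur | mask);
    }

    /** Called on each check: returns the members whose bit is 0, then resets all bits. */
    List<Integer> checkAndReset() {
        List<Integer> suspects = new ArrayList<>();
        for (int w = 0; w < words.length(); w++) {
            long bits = words.getAndSet(w, 0L);  // read and clear the word atomically
            for (int b = 0; b < 64; b++) {
                int idx = w * 64 + b;
                if (idx >= size)
                    break;
                if ((bits & (1L << b)) == 0)
                    suspects.add(idx);           // no traffic seen since the last check
            }
        }
        return suspects;
    }
}
```

In this sketch a timer would call {{checkAndReset()}} every {{timeout}} ms and suspect the returned members; a real implementation would also have to handle view changes and skip the local member.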
The advantage of {{FD_ALL3}} over previous failure detection protocols is that we only
send heartbeats when there is no (multicast) traffic, and that we don't suspect a member P
whose heartbeats are missing as long as we are still receiving traffic from P.
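Heartbeat suppression on the sender side might look like the sketch below. The class {{HeartbeatSuppressor}} and its method names are hypothetical; the sketch assumes a timer that fires every {{interval}} ms and a hook that is invoked whenever a multicast message is sent.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of suppressing heartbeats while multicast traffic is flowing. */
class HeartbeatSuppressor {
    // true if at least one multicast was sent since the last timer tick
    private final AtomicBoolean mcastSent = new AtomicBoolean(false);

    /** Assumed hook: called whenever a multicast message is sent. */
    void onMulticastSent() {
        mcastSent.set(true);
    }

    /** Called by the timer every interval ms; reads and clears the flag in one step. */
    boolean shouldSendHeartbeat() {
        return !mcastSent.getAndSet(false);
    }
}
```

With this shape, a multicast sent during an interval takes the place of the heartbeat for that interval, and the flag is cleared so that the next idle interval produces a heartbeat again.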
We need to think about whether to consider unicast messages, too, on the sender side: we
could populate a bitmap with messages sent to members. On a unicast message to P, P's
bit would be set in the bitmap; on a multicast message, all bits would be set. Then, we
could selectively send heartbeats only to members with bits set to 0.
However, this is only feasible when a message is sent N-1 times, once per member (e.g.
TCP); for UDP we don't have such an 'anycast' available.
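A sender-side bitmap of this kind could be sketched as follows. Everything here is an assumption made for illustration: the class {{SentBitmap}}, the integer member indices, and the idea that the caller then sends one heartbeat per returned target (which, as noted, only works with N-1 style sending).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch; members are assumed to be mapped to indices 0..size-1. */
class SentBitmap {
    private final int size;     // number of members
    private final int self;     // index of the local member, never a heartbeat target
    private final BitSet sent;  // bit i set: we sent something to member i this interval

    SentBitmap(int size, int self) {
        this.size = size;
        this.self = self;
        this.sent = new BitSet(size);
    }

    /** A unicast to dest sets only dest's bit. */
    synchronized void onUnicastSent(int dest) {
        sent.set(dest);
    }

    /** A multicast reaches everybody, so all bits are set. */
    synchronized void onMulticastSent() {
        sent.set(0, size);
    }

    /** Called every interval: members with bit 0 still need a heartbeat; then reset. */
    synchronized List<Integer> heartbeatTargets() {
        List<Integer> targets = new ArrayList<>();
        for (int i = sent.nextClearBit(0); i < size; i = sent.nextClearBit(i + 1)) {
            if (i != self)
                targets.add(i);
        }
        sent.clear();
        return targets;
    }
}
```

The sketch uses a synchronized {{BitSet}} for brevity; whether that is fast enough on the send path, compared with a lock-free bitmap, is left open here.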