[
https://issues.redhat.com/browse/JGRP-2451?page=com.atlassian.jira.plugin...
]
Bela Ban updated JGRP-2451:
---------------------------
Description:
Improvements to {{FD_ALL2}}.
* Messages should count as heartbeats ({{msg_counts_as_heartbeat}} should be the
*default*, and the attribute should therefore be deprecated).
* When a multicast message is sent before {{interval}} has elapsed, we suppress sending a
heartbeat.
* There's a map associating members with booleans. True means a heartbeat was received
since the last check, false means it wasn't. On a check, the booleans are all set to
false.
It is crucial that setting the boolean in the map is quick (not like in {{FD_ALL}}, where we
fetch the current time from the time service), especially since this is done on every
message.
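The member-to-boolean map described above could be sketched roughly as follows. This is
an illustration only, not the {{FD_ALL2}} source: the class and method names are
hypothetical, members are plain strings, and starting a new member as "seen" is an
assumption. The per-message path only flips a flag and never reads the clock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: one flag per member, set on receipt, cleared on each check.
class HeartbeatFlags {
    private final Map<String, AtomicBoolean> flags = new ConcurrentHashMap<>();

    // Assumption: a newly added member starts as "seen" so it is not suspected
    // on the very first check.
    void addMember(String member) {
        flags.putIfAbsent(member, new AtomicBoolean(true));
    }

    // Hot path, called for every message or heartbeat received from 'sender':
    // a map lookup and a flag write, no call to a time service.
    void markReceived(String sender) {
        AtomicBoolean flag = flags.get(sender);
        if (flag != null)
            flag.set(true);
    }

    // Periodic check: members whose flag is still false are returned as
    // suspects; every flag is reset to false for the next round.
    List<String> checkAndReset() {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, AtomicBoolean> e : flags.entrySet()) {
            boolean seen = e.getValue().getAndSet(false);
            if (!seen)
                suspects.add(e.getKey());
        }
        return suspects;
    }
}
```

Using {{getAndSet(false)}} reads and resets a flag in one atomic step, so a message that
arrives during the check is not lost between the read and the reset.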
The advantage is that we only send heartbeats when there is no (multicast) traffic, and
we don't suspect a member P when heartbeats have been missing despite receiving
traffic from P.
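One way the heartbeat suppression could work, as a minimal sketch: a flag records that a
multicast went out, and the heartbeat task, assumed here to run once per {{interval}},
skips its heartbeat when the flag is set. The names are hypothetical and this is not
taken from the JGroups code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of suppressing heartbeats while multicast traffic flows.
class HeartbeatSuppressor {
    private final AtomicBoolean mcastSent = new AtomicBoolean(false);

    // Called on the send path for every multicast message.
    void onMulticastSent() {
        mcastSent.set(true);
    }

    // Called by the heartbeat task once per interval (assumption). Returns true
    // if no multicast was sent since the previous call, and clears the flag.
    boolean shouldSendHeartbeat() {
        return !mcastSent.getAndSet(false);
    }
}
```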
We need to think about whether to consider unicast messages, too, on the sender side: we
could populate a bitmap with messages sent to members: on a unicast message to P, P's
bit would be set in the bitmap. On a multicast message, all bits would be set. Then, we
could selectively send heartbeats only to members with bits set to 0.
However, this is only feasible with sending a message N-1 times (e.g. TCP); for UDP we
don't have such an 'anycast' available.
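The sender-side idea above might look like this, purely as a sketch under stated
assumptions: members are strings indexed by their position in a fixed list, the bitmap
is cleared after each round, and all names are hypothetical. It only computes the set of
heartbeat targets; actually sending to a subset still requires a transport that can do so.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: track which members already received traffic from us.
class SentBitmap {
    private final List<String> members;
    private final BitSet sent;

    SentBitmap(List<String> members) {
        this.members = new ArrayList<>(members);
        this.sent = new BitSet(members.size());
    }

    // A unicast to 'dest' sets only that member's bit.
    synchronized void onUnicastSent(String dest) {
        int index = members.indexOf(dest); // linear lookup, fine for a sketch
        if (index >= 0)
            sent.set(index);
    }

    // A multicast reaches everyone, so all bits are set.
    synchronized void onMulticastSent() {
        sent.set(0, members.size());
    }

    // Members whose bit is 0 still need a heartbeat. Resetting the bitmap
    // afterwards is an assumption; the text does not say when it is cleared.
    synchronized List<String> heartbeatTargetsAndReset() {
        List<String> targets = new ArrayList<>();
        for (int i = sent.nextClearBit(0); i < members.size(); i = sent.nextClearBit(i + 1))
            targets.add(members.get(i));
        sent.clear();
        return targets;
    }
}
```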
was:
New heartbeat protocol, based on {{FD_ALL2}}.
* Messages should count as heartbeats ({{msg_counts_as_heartbeat}} should be default, and
as such, deprecated).
* When a multicast message is sent before {{interval}} elapsed, we suppress sending a
heartbeat
* A simple bitmap represents members: bits set to {{1}} mean we received a heartbeat since
the last check (at {{timeout}} ms), bits set to {{0}} lead to suspicions
* The bitmap is reset on each check
It is crucial that setting a bit in the bitmap is quick (not like in {{FD_ALL}}, where we
fetch the current time from the time service), especially since this is done on every
message.
The advantage of {{FD_ALL3}} over previous failure detection protocols is that we only
send heartbeats when there is no (multicast) traffic, and we don't suspect a member P
when heartbeats have been missing despite receiving traffic from P.
We need to think about whether to consider unicast messages, too, on the sender side: we
could populate a bitmap with messages sent to members: on a unicast message to P, P's
bit would be set in the bitmap. On a multicast message, all bits would be set. Then, we
could selectively send heartbeats only to members with bits set to 0.
However, this is only feasible with sending a message N-1 times (e.g. TCP); for UDP we
don't have such an 'anycast' available.
FD_ALL2: improvements
---------------------
Key: JGRP-2451
URL: https://issues.redhat.com/browse/JGRP-2451
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Priority: Major
Fix For: 5.0, 4.2.0