[
https://jira.jboss.org/browse/JGRP-1241?page=com.atlassian.jira.plugin.sy...
]
Bela Ban commented on JGRP-1241:
--------------------------------
OK, here's a better and simpler solution:
- Everybody send a HEARTBEAT multicast perodically
- The multicast is needed, because - if we only sent a unicast to the coordinator and
possibly the next-in-line - then we can't handle the case of
{A,B,C,D,E}, where A, B and C crash at the same time !
- Every member maintains a suspect list
- This list is adjusted on view changes
- Reception of a SUSPECT(P) message adds P to the list
- When we suspect P because we haven't received a HEARTBEAT (or traffic if enabled):
- The set of eligible members is computed as: members - suspected members
- If we are the coordinator (first in the list), or the coordinator is suspected and
we're next-in-line:
- Pass a SUSPECT(P) event up the stack, this runs the VERIDY_SUSPECT protocol and
eventually passes the SUSPECT(P) up to
GMS, which will exclude P from the view
The cost of running the suspicion protocol is (excluding the periodic heartbeat
multicasts):
- num_msgs ARE_YOU_DEAD unicasts to P
- A potential response (I_AM_NOT_DEAD) from P to the coordinator
This is way better than the previous implementation !
FD_ALL: reduce number of messages sent on a suspicion
-----------------------------------------------------
Key: JGRP-1241
URL:
https://jira.jboss.org/browse/JGRP-1241
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.11
[(Edited) email reply to David Forget]
FD_ALL provides a mechanism to detect rapidly when several nodes leave the groups at the
same time. Once a failure is detected by a member of the cluster, a SUSPECT message is
multicast to all nodes. Then, another logic takes place in all other nodes call
VERIFY_SUSPECT. This is to give one last chance to the suspected node before it is removed
from the view. The default configuration (udp.xml) makes it so that each node will then
send 1 UDP messages to the suspected node which, if still alive, would reply with 1 UDP
messages. Since we are running many nodes on the same host and each of these nodes are
heavily multithreaded and running in a JVM with garbage collection, it happens that a
false failure is detected. The consequences of this scenario given the current
configuration result in an excessive number of messages sent on UDP. For example, in a 300
nodes cluster a single false failure case results in roughly 2 verify suspect messages
(ARE_YOU_ALIVE+I_AM_NOT_DEAD) x 298 nodes that process the suspect message x 299 nodes
that detect the failure = ~180 000 UDP messages {assuming that all other nodes detect the
false failure in a very short period of time (This is the worst case)}. This network flood
can lead to more false detection and more UDP messages up to a network saturation and
eventually a complete cluster / network degradation already know as "UDP
Storm"...
So 300 nodes multicast the SUSPECT message. That's 300 (multicast) messages TOTAL.
Then every one of those (300*num_msgs) unicasts an ARE_YOU_DEAD message to the suspected
member S. Since num_msgs=1, that's another 300 unicast messages, so our total is now
600.
The suspected member S now gets 300 unicasts and - as a response - send 300 unicast
I_AM_NOT_DEAD responses.
So now the total is 900 messages (300 multicasts + (300 unicasts * num_msgs) + 300
unicasts. I can't follow your computation above with 180'000 messages !?
SOLUTION #1:
However, one thing we could is the following: similar to STABLE and sending of stability
messages, instead of multicasting out a SUSPECT message immediately, we could schedule a
task which goes off in a random period in range [1 .. max_mcast_wait_time]. Reception of a
SUSPECT message for the same suspected member cancels this timer.
The effect of this staggered way of multicasting SUSPECT messages would be that ideally
only 1 member multicasts a SUSPECT message. That's already a huge improvement and
results in <1 SUSPECT multicast> + <2 * 300 unicast messages (in
VERIFY_SUSPECT)>...
SOLUTION #2
In a second step what we could also do is to change VERIFY_SUSPECT, and have only the
first N nodes in a cluster run the ARE_YOU_DEAD algorithm. This is similar to what I
recently implemented in [1] to make discovery scale to large clusters. WDYT ?
[1]
https://jira.jboss.org/browse/JGRP-1181
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira