New subject: [JBoss JIRA] Commented: (JGRP-1241) FD_ALL: make more suited for large clusters

Tuesday, 28 September 2010

FD_ALL: make more suited for large clusters
-------------------------------------------

                 Key: JGRP-1241
                 URL: https://jira.jboss.org/browse/JGRP-1241
             Project: JGroups
          Issue Type: Feature Request
            Reporter: Bela Ban
            Assignee: Bela Ban
             Fix For: 2.11

[(Edited) email reply to David Forget]

FD_ALL provides a mechanism to detect rapidly when several nodes leave the groups at the
same time. Once a failure is detected by a member of the cluster, a SUSPECT message is
multicast to all nodes. Then, another logic takes place in all other nodes call
VERIFY_SUSPECT. This is to give one last chance to the suspected node before it is removed
from the view. The default configuration (udp.xml) makes it so that each node will then
send 1 UDP messages to the suspected node which, if still alive, would reply with 1 UDP
messages. Since we are running many nodes on the same host and each of these nodes are
heavily multithreaded and running in a JVM with garbage collection, it happens that a
false failure is detected. The consequences of this scenario given the current
configuration result in an excessive number of messages sent on UDP. For example, in a 300
nodes cluster a single false failure case results in roughly 2 verify suspect messages
(ARE_YOU_ALIVE+I_AM_NOT_DEAD) x 298 nodes that process the suspect message x 299 nodes
that detect the failure = ~180 000 UDP messages {assuming that all other nodes detect the
false failure in a very short period of time (This is the worst case)}. This network flood
can lead to more false detection and more UDP messages up to a network saturation and
eventually a complete cluster / network degradation already know as "UDP
Storm"...

So 300 nodes multicast the SUSPECT message. That's 300 (multicast) messages TOTAL.

Then every one of those (300*num_msgs) unicasts an ARE_YOU_DEAD message to the suspected
member S. Since num_msgs=1, that's another 300 unicast messages, so our total is now
600.

The suspected member S now gets 300 unicasts and - as a response - send 300 unicast
I_AM_NOT_DEAD responses.

So now the total is 900 messages (300 multicasts + (300 unicasts * num_msgs) + 300
unicasts. I can't follow your computation above with 180'000 messages !?

SOLUTION #1:
However, one thing we could is the following: similar to STABLE and sending of stability
messages, instead of multicasting out a SUSPECT message immediately, we could schedule a
task which goes off in a random period in range [1 .. max_mcast_wait_time]. Reception of a
SUSPECT message for the same suspected member cancels this timer.

The effect of this staggered way of multicasting SUSPECT messages would be that ideally
only 1 member multicasts a SUSPECT message. That's already a huge improvement and
results in <1 SUSPECT multicast> + <2 * 300 unicast messages (in
VERIFY_SUSPECT)>...

SOLUTION #2
In a second step what we could also do is to change VERIFY_SUSPECT, and have only the
first N nodes in a cluster run the ARE_YOU_DEAD algorithm. This is similar to what I
recently implemented in [1] to make discovery scale to large clusters. WDYT ?

[1] https://jira.jboss.org/browse/JGRP-1181

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

[JBoss JIRA] Created: (JGRP-1241) FD_ALL: make more suited for large clusters