[jboss-jira] [JBoss JIRA] Commented: (JGRP-1241) FD_ALL: make more suited for large clusters

Bela Ban (JIRA) jira-events at lists.jboss.org
Wed Sep 29 12:46:40 EDT 2010


    [ https://jira.jboss.org/browse/JGRP-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12554066#action_12554066 ] 

Bela Ban commented on JGRP-1241:
--------------------------------

We could also make a few selected members multicast the SUSPECT messages, e.g. something similar to max_rank (in Discovery) where only the oldest N members send out SUSPECT multicasts. If N=1, then only the current coordinator would send out SUSPECT multicasts

> FD_ALL: make more suited for large clusters
> -------------------------------------------
>
>                 Key: JGRP-1241
>                 URL: https://jira.jboss.org/browse/JGRP-1241
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 2.10.1, 2.11
>
>
> [(Edited) email reply to David Forget]
> FD_ALL provides a mechanism to detect rapidly when several nodes leave the groups at the same time. Once a failure is detected by a member of the cluster, a SUSPECT message is multicast to all nodes. Then, another logic takes place in all other nodes call VERIFY_SUSPECT. This is to give one last chance to the suspected node before it is removed from the view. The default configuration (udp.xml) makes it so that each node will then send 1 UDP messages to the suspected node which, if still alive, would reply with 1 UDP messages. Since we are running many nodes on the same host and each of these nodes are heavily multithreaded and running in a JVM with garbage collection, it happens that a false failure is detected. The consequences of this scenario given the current configuration result in an excessive number of messages sent on UDP. For example, in a 300 nodes cluster a single false failure case results in roughly 2 verify suspect messages (ARE_YOU_ALIVE+I_AM_NOT_DEAD) x 298 nodes that process the suspect message x 299 nodes that detect the failure = ~180 000 UDP messages {assuming that all other nodes detect the false failure in a very short period of time (This is the worst case)}. This network flood can lead to more false detection and more UDP messages up to a network saturation and eventually a complete cluster / network degradation already know as "UDP Storm"...
> So 300 nodes multicast the SUSPECT message. That's 300 (multicast) messages TOTAL.
> Then every one of those (300*num_msgs) unicasts an ARE_YOU_DEAD message to the suspected member S. Since num_msgs=1, that's another 300 unicast messages, so our total is now 600.
> The suspected member S now gets 300 unicasts and - as a response - send 300 unicast I_AM_NOT_DEAD responses.
> So now the total is 900 messages (300 multicasts + (300 unicasts * num_msgs) + 300 unicasts. I can't follow your computation above with 180'000 messages !?
> SOLUTION #1:
> However, one thing we could is the following: similar to STABLE and sending of stability messages, instead of multicasting out a SUSPECT message immediately, we could schedule a task which goes off in a random period in range [1 .. max_mcast_wait_time]. Reception of a SUSPECT message for the same suspected member cancels this timer.
> The effect of this staggered way of multicasting SUSPECT messages would be that ideally only 1 member multicasts a SUSPECT message. That's already a huge improvement and results in <1 SUSPECT multicast> + <2 * 300 unicast messages (in VERIFY_SUSPECT)>...
> SOLUTION #2
> In a second step what we could also do is to change VERIFY_SUSPECT, and have only the first N nodes in a cluster run the ARE_YOU_DEAD algorithm. This is similar to what I recently implemented in [1] to make discovery scale to large clusters. WDYT ?
> [1] https://jira.jboss.org/browse/JGRP-1181

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       



More information about the jboss-jira mailing list