[jboss-jira] [JBoss JIRA] Created: (JGRP-395) Parallel FD

Thursday, 21 December 2006


Parallel FD
-----------

                 Key: JGRP-395
                 URL: http://jira.jboss.com/jira/browse/JGRP-395
             Project: JGroups
          Issue Type: Feature Request
    Affects Versions: 2.4
            Reporter: Bela Ban
         Assigned To: Bela Ban
             Fix For: 2.5


With FD, when a cluster has N nodes and the switch crashes, every node takes
roughly (N-1) * TIMEOUT ms to become a singleton cluster. This is because regular FD
only pings the next-in-line member, one at a time, e.g.
- Cluster is A, B, C, D
- The plug is pulled
- Taking B as an example:
- B decides that, after TIMEOUT ms, C is dead and excludes C from the pingable members
- B then starts emitting a SUSPECT(C) until it gets a new view which excludes C
- B switches to pinging D
- After another TIMEOUT ms, B excludes D as well and switches to pinging A
- When all of C, D and A have been excluded, B decides to become a singleton cluster (and
coordinator in it)
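
As a rough illustration of the arithmetic above, the following Java sketch walks
the sequential exclusion for one node. It is not JGroups code: the class and
method names are hypothetical, and it assumes each exclusion costs exactly one
TIMEOUT and that no new view arrives in between.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of JGroups: models next-in-line pinging only.
public class SequentialFdEstimate {

    /** Order in which `self` gives up on the other members, next-in-line first. */
    public static List<String> exclusionOrder(List<String> members, String self) {
        if (!members.contains(self)) {
            throw new IllegalArgumentException(self + " is not a member");
        }
        List<String> pingable = new ArrayList<>(members);
        List<String> excluded = new ArrayList<>();
        while (pingable.size() > 1) {
            // The ping target is the member right after us, wrapping around.
            int next = (pingable.indexOf(self) + 1) % pingable.size();
            excluded.add(pingable.remove(next));
        }
        return excluded;
    }

    /** Assumes one full TIMEOUT is spent on every excluded member. */
    public static long timeToSingleton(List<String> members, String self, long timeoutMs) {
        return exclusionOrder(members, self).size() * timeoutMs;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("A", "B", "C", "D");
        System.out.println(exclusionOrder(cluster, "B"));        // [C, D, A]
        System.out.println(timeToSingleton(cluster, "B", 3000)); // 9000
    }
}
```

With four members and an assumed TIMEOUT of 3000 ms, B needs three timeouts in a
row, i.e. (N-1) * TIMEOUT = 9000 ms, before it is alone.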

SOLUTION:
- Nodes don't actively ping other nodes. Instead, each node periodically multicasts a
HEARTBEAT to the cluster
- The HEARTBEAT is suppressed when a node sends data, because data counts as a heartbeat
as well
- Every node maintains a table of nodes and the last time we received either a message or
a HEARTBEAT from that node
- That timestamp is updated with the current time whenever a message or HEARTBEAT
arrives from the node
- Periodically, we check whether any node has not sent us data/heartbeat for more than
TIMEOUT ms. If so, we suspect it
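
The bookkeeping described in the list above could be sketched as follows. This
is only a minimal illustration under assumptions, not the eventual JGroups
implementation: the class name, the method names, the use of String for member
addresses and the separate heartbeat interval parameter are all hypothetical.
Timestamps are passed in explicitly so the logic can be followed without a
clock or a network.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not part of JGroups.
public class ParallelFdSketch {
    private final long timeoutMs;            // silence longer than this => suspect
    private final long heartbeatIntervalMs;  // assumed period between HEARTBEATs
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private volatile long lastSentMs = Long.MIN_VALUE; // MIN_VALUE = nothing sent yet

    public ParallelFdSketch(long timeoutMs, long heartbeatIntervalMs) {
        this.timeoutMs = timeoutMs;
        this.heartbeatIntervalMs = heartbeatIntervalMs;
    }

    /** Called for every data message or HEARTBEAT received from `sender`. */
    public void received(String sender, long nowMs) {
        lastSeen.put(sender, nowMs);
    }

    /** Called whenever this node sends data or a HEARTBEAT. */
    public void sent(long nowMs) {
        lastSentMs = nowMs;
    }

    /** A HEARTBEAT is only due if nothing was sent during the last interval. */
    public boolean heartbeatDue(long nowMs) {
        return lastSentMs == Long.MIN_VALUE || nowMs - lastSentMs >= heartbeatIntervalMs;
    }

    /** Periodic check: all members silent for more than timeoutMs, sorted by name. */
    public List<String> suspects(long nowMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Because every member is tracked in the same table, a single periodic check can
suspect all silent members at once, rather than one per TIMEOUT as in the
next-in-line scheme.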

