[jboss-jira] [JBoss JIRA] (JGRP-1523) FD_ALL does not unsuspect on heartbeat

Wednesday, 17 October 2012

[ https://issues.jboss.org/browse/JGRP-1523?page=com.atlassian.jira.plugin.... ]

Jan Boehm commented on JGRP-1523:
---------------------------------

More explanation (from IRC):

- suppose a cluster view containing nodes [A B C]
- the node that we look at is C
- first, C suspects A of being at fault (in my use case this is due to an Infinispan bug
that causes OOB messages, in this case heartbeats, to be dropped because no threads are
available to handle them)
- since B comes before C in the member list, C will not propagate the suspicion
- now the load on C decreases and new heartbeats from A arrive, but C still keeps A on
the suspect list
- if B now becomes suspect (for the same reasons as A before), C takes action, with the
stale suspicion of A still in place
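
The sequence above can be replayed in a few lines of Java. This is only a sketch of the
scenario, not the FD_ALL source: the class StaleSuspectDemo and the helper
predictedCoordinator are hypothetical names, and the rule "first non-suspected member in
view order is the next coordinator" is an assumption taken from the description of this
issue.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StaleSuspectDemo {
    // Hypothetical stand-in for the cluster view [A B C]; not JGroups code.
    static final List<String> VIEW = List.of("A", "B", "C");

    /** Returns the first member in view order that is not suspected, or null if none. */
    static String predictedCoordinator(List<String> view, Set<String> suspected) {
        for (String member : view) {
            if (!suspected.contains(member)) {
                return member;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String local = "C";
        Set<String> suspected = new LinkedHashSet<>();

        // C misses heartbeats from A and suspects it.
        suspected.add("A");
        // B is still ahead of C, so C keeps the suspicion to itself.
        System.out.println(predictedCoordinator(VIEW, suspected)); // prints "B"

        // Heartbeats from A arrive again, but nothing removes A from the set here.

        // Later C misses heartbeats from B as well.
        suspected.add("B");
        // C now sees itself as next coordinator and acts on a set that still contains A.
        System.out.println(local.equals(predictedCoordinator(VIEW, suspected))); // prints "true"
        System.out.println(suspected.contains("A")); // prints "true"
    }
}
```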

...
 FD_ALL does not unsuspect on heartbeat
 --------------------------------------

                 Key: JGRP-1523
                 URL: https://issues.jboss.org/browse/JGRP-1523
             Project: JGroups
          Issue Type: Quality Risk
    Affects Versions: 3.0.14
            Reporter: Jan Boehm
            Assignee: Bela Ban
             Fix For: 3.0.15, 3.2

 FD_ALL stores suspected nodes in "suspected_mbrs" when it receives no heartbeats from
them. If it does not predict the local node as the new coordinator, it does not pass this
suspicion upwards. Since an UNSUSPECT event from above is the only event that removes a
node from suspected_mbrs, the set retains all nodes except those that were newly suspected
when the local node became the potential new coordinator.
 This seems wasteful and wrong (it leads to wrong results if there are "stale"
suspects that would be preferred as new coordinators). The timestamps for nodes in
suspected_mbrs should be rechecked in FD_ALL.suspect before adding the new nodes.
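
A minimal sketch of the proposed recheck, under stated assumptions: the class
RecheckSuspects, the lastHeartbeat map, and the timeoutMs parameter are hypothetical
stand-ins for whatever FD_ALL keeps internally, and this is not the fix that was
committed for this issue. Stale entries are dropped first, then the new suspects are
added.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RecheckSuspects {
    /**
     * Removes every suspect whose last heartbeat is no older than timeoutMs,
     * then adds the newly suspected members. All names here are illustrative.
     */
    static Set<String> suspect(Set<String> suspected, Map<String, Long> lastHeartbeat,
                               Collection<String> newSuspects, long now, long timeoutMs) {
        suspected.removeIf(member -> {
            Long ts = lastHeartbeat.get(member);
            // A member with a recent heartbeat is no longer a valid suspect.
            return ts != null && now - ts <= timeoutMs;
        });
        suspected.addAll(newSuspects);
        return suspected;
    }

    public static void main(String[] args) {
        Map<String, Long> lastHeartbeat = new HashMap<>();
        lastHeartbeat.put("A", 9_500L); // A has been heard from recently
        lastHeartbeat.put("B", 1_000L); // B has been silent for a long time

        Set<String> suspected = new LinkedHashSet<>(List.of("A")); // stale suspicion of A
        suspect(suspected, lastHeartbeat, List.of("B"), 10_000L, 5_000L);
        System.out.println(suspected); // prints "[B]"
    }
}
```

With the stale entry for A gone, a node in C's position would again see A as the first
non-suspected member rather than itself.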
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
