[ https://issues.jboss.org/browse/JGRP-1523?page=com.atlassian.jira.plugin.... ]
Jan Boehm commented on JGRP-1523:
---------------------------------
As I understand it, FD_ALL.suspect() only propagates suspicions if the local node believes it is the
(new?) coordinator, i.e. only if local_add.equals(first).
Suppose that, when suspect() is called, the member list is [old_suspect,
new_suspect, me, ...], where:
- old_suspect is a node that was wrongly suspected and has remained in the suspect list.
(The suspicion was never propagated: new_suspect was not suspected at that time, so the
first non-suspected member was not the local node and suspect() did not pass it on.)
- new_suspect is the newly suspected node passed to the current call of suspect().
- me is the local node.
In this case the local node would start verifying its suspicion, even though
old_suspect may be perfectly reachable by now and, according to the "first" check,
should be the new coordinator and thus responsible for checking new_suspect.
Contrast this with the case in which old_suspect was never suspected: the local node
would then not propagate the suspicion about new_suspect, because old_suspect would be
"first". I am not sure whether this inconsistency causes clustering problems, but it
seems unintentional.
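To make the inconsistency concrete, here is a minimal Java sketch of the propagation check as described above. It assumes that "first" is the first view member that is not in the suspected set (with the new suspect already added). The class SuspectPropagationSketch and its method shouldPropagate() are hypothetical, and addresses are modeled as plain Strings instead of org.jgroups.Address; this is not the FD_ALL source.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the "first" check described in the comment above.
// Not the actual FD_ALL code; addresses are plain Strings for simplicity.
public class SuspectPropagationSketch {

    // Returns true if, once newSuspect is added to the suspected set, the first
    // member that is not suspected is the local node, i.e. the local node thinks
    // it is the new coordinator and would propagate the suspicion.
    static boolean shouldPropagate(List<String> members, Set<String> suspected,
                                   String newSuspect, String localAddr) {
        Set<String> all = new LinkedHashSet<>(suspected);
        all.add(newSuspect);
        for (String m : members) {
            if (!all.contains(m)) {
                return m.equals(localAddr); // "first" non-suspected member
            }
        }
        return false; // every member is suspected
    }

    public static void main(String[] args) {
        List<String> members = List.of("old_suspect", "new_suspect", "me", "other");

        // Case 1: a stale entry for old_suspect is still in the suspected set.
        boolean withStale = shouldPropagate(members, Set.of("old_suspect"), "new_suspect", "me");

        // Case 2: old_suspect was never suspected.
        boolean withoutStale = shouldPropagate(members, Set.of(), "new_suspect", "me");

        System.out.println(withStale);    // true
        System.out.println(withoutStale); // false
    }
}
```

The only difference between the two cases is the stale entry for old_suspect, yet it decides whether the local node or old_suspect ends up handling new_suspect.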
FD_ALL does not unsuspect on heartbeat
--------------------------------------
Key: JGRP-1523
URL: https://issues.jboss.org/browse/JGRP-1523
Project: JGroups
Issue Type: Quality Risk
Affects Versions: 3.0.14
Reporter: Jan Boehm
Assignee: Bela Ban
Fix For: 3.0.15, 3.2
FD_ALL stores a node in "suspected_mbrs" when it receives no heartbeats from it. If it
does not predict that the local node will become the new coordinator, it does not pass this
suspicion up the stack. Since an UNSUSPECT event from above is the only event that removes
a node from suspected_mbrs, the set retains every suspected node except those that were
newly suspected when the local node became the potential new coordinator.
This seems wasteful and wrong: it leads to wrong results if there are "stale"
suspects that would otherwise be preferred as new coordinators. The timestamps of the nodes in
suspected_mbrs should be rechecked in FD_ALL.suspect() before the new nodes are added.
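A minimal Java sketch of the proposed fix, assuming heartbeat timestamps are kept per member in milliseconds and that a member counts as alive again if its last heartbeat is younger than the timeout. SuspectRecheckSketch, heartbeatReceived() and the lastHeartbeat map are hypothetical names chosen for illustration; only suspectedMbrs mirrors a field named in this issue (suspected_mbrs). It is not the FD_ALL implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "recheck timestamps before adding new suspects".
// Names are illustrative; this is not the actual FD_ALL code.
public class SuspectRecheckSketch {
    private final long timeoutMs;
    // Last time (ms) a heartbeat was received from each member.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Plays the role of FD_ALL's suspected_mbrs.
    final Set<String> suspectedMbrs = new HashSet<>();

    SuspectRecheckSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void heartbeatReceived(String member, long nowMs) {
        lastHeartbeat.put(member, nowMs);
    }

    // First drop existing suspects whose heartbeat is fresh again,
    // then add the newly suspected members.
    void suspect(List<String> newSuspects, long nowMs) {
        Iterator<String> it = suspectedMbrs.iterator();
        while (it.hasNext()) {
            Long ts = lastHeartbeat.get(it.next());
            if (ts != null && nowMs - ts < timeoutMs) {
                it.remove(); // heard from it recently: no longer suspected
            }
        }
        suspectedMbrs.addAll(newSuspects);
    }

    public static void main(String[] args) {
        SuspectRecheckSketch fd = new SuspectRecheckSketch(1000);
        fd.heartbeatReceived("old_suspect", 0);
        fd.suspect(List.of("old_suspect"), 1500);  // old_suspect missed heartbeats
        fd.heartbeatReceived("old_suspect", 2500); // ...but is reachable again
        fd.suspect(List.of("new_suspect"), 3000);  // stale entry is dropped here
        System.out.println(fd.suspectedMbrs);      // [new_suspect]
    }
}
```

With the stale entry removed before the new suspect is added, a "first" check like the one in the comment above would again pick old_suspect as coordinator, matching the case in which it was never suspected.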