[
https://issues.jboss.org/browse/JGRP-1523?page=com.atlassian.jira.plugin....
]
Jan Boehm commented on JGRP-1523:
---------------------------------
More explanation (from IRC):
- suppose a cluster view containing nodes [A B C]
- the node that we look at is C
- first it suspects A as being at fault (in my use case this is due to an Infinispan bug
that causes OOB messages, in this case heartbeats, to be dropped because no threads are
available to handle them)
- since B is in the member list before C, C will not propagate the suspicion
- now the load on C decreases and new heartbeats from A arrive, but C still keeps A on
the suspect list
- now if B becomes suspect (for the same reasons as A before), C takes action: with A
still on its suspect list and B newly suspected, C sees itself as the next coordinator
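The scenario above can be reproduced with a small stand-alone model. This is only a sketch of the behaviour as described in this comment, not the FD_ALL source: the class name StaleSuspectDemo and its methods are hypothetical, and the coordinator prediction is assumed to be "first member in view order that is not suspected".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the behaviour described above; not JGroups code.
class StaleSuspectDemo {
    final String local;                                   // the node we look at
    final List<String> view;                              // members in view order
    final Set<String> suspected = new LinkedHashSet<>();  // never cleared by heartbeats

    StaleSuspectDemo(String local, List<String> view) {
        this.local = local;
        this.view = view;
    }

    // Called when heartbeats from 'member' are overdue. Returns the suspects
    // passed up the stack, or an empty list if the local node stays silent.
    List<String> suspect(String member) {
        suspected.add(member);
        // Assumption: the first non-suspected member in view order is the
        // predicted coordinator, and only that node propagates suspicions.
        for (String m : view) {
            if (!suspected.contains(m)) {
                return m.equals(local)
                        ? new ArrayList<>(suspected)
                        : Collections.<String>emptyList();
            }
        }
        return Collections.emptyList();
    }

    // Models the reported problem: a heartbeat does not unsuspect the sender.
    void heartbeat(String member) {
        // intentionally empty
    }

    public static void main(String[] args) {
        StaleSuspectDemo c = new StaleSuspectDemo("C", Arrays.asList("A", "B", "C"));
        System.out.println(c.suspect("A")); // B precedes C, so nothing is propagated
        c.heartbeat("A");                   // A is alive again but stays suspected
        System.out.println(c.suspect("B")); // C now propagates both A and B
    }
}
```

In this model the second call reports A as well as B, although A has been sending heartbeats again in the meantime.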
FD_ALL does not unsuspect on heartbeat
--------------------------------------
Key: JGRP-1523
URL: https://issues.jboss.org/browse/JGRP-1523
Project: JGroups
Issue Type: Quality Risk
Affects Versions: 3.0.14
Reporter: Jan Boehm
Assignee: Bela Ban
Fix For: 3.0.15, 3.2
FD_ALL stores suspected nodes in "suspected_mbrs" when it receives no
heartbeats from them. If it does not predict the local node to be the new coordinator, it
does not pass this suspicion upwards. Since an UNSUSPECT event from above is the only event
that removes a node from suspected_mbrs, the set retains all nodes except those that were
newly suspected when the local node became the potential new coordinator.
This seems wasteful and wrong (it leads to wrong results if there are "stale"
suspects that would be preferred as new coordinators). The timestamps for nodes in
suspected_mbrs should be rechecked in FD_ALL.suspect before adding the new nodes.
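A minimal sketch of the recheck proposed above, assuming a per-member timestamp of the last heartbeat and a fixed timeout. The class RecheckingSuspectSet, its fields and its method signatures are hypothetical and only illustrate the idea; they are not the FD_ALL implementation or the eventual fix.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of rechecking timestamps before adding suspects.
class RecheckingSuspectSet {
    final long timeoutMs;                                    // assumed heartbeat timeout
    final Map<String, Long> lastHeartbeat = new HashMap<>(); // member -> last heartbeat time
    final Set<String> suspected = new LinkedHashSet<>();

    RecheckingSuspectSet(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Records the time of the latest heartbeat received from 'member'.
    void heartbeat(String member, long nowMs) {
        lastHeartbeat.put(member, nowMs);
    }

    // Drops stale suspects (members heard from within the timeout) first,
    // then adds the newly suspected members and returns the resulting set.
    Set<String> suspect(Collection<String> newSuspects, long nowMs) {
        suspected.removeIf(m -> {
            Long ts = lastHeartbeat.get(m);
            return ts != null && nowMs - ts < timeoutMs;
        });
        suspected.addAll(newSuspects);
        return new LinkedHashSet<>(suspected);
    }
}
```

With such a recheck, a node that resumed sending heartbeats would no longer influence which member is predicted as the new coordinator.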