[jboss-jira] [JBoss JIRA] (JGRP-1523) FD_ALL does not unsuspect on heartbeat

Bela Ban (JIRA) jira-events at lists.jboss.org
Fri Oct 19 02:44:01 EDT 2012


    [ https://issues.jboss.org/browse/JGRP-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727177#comment-12727177 ] 

Bela Ban edited comment on JGRP-1523 at 10/19/12 2:42 AM:
----------------------------------------------------------

[1:26pm] bela: ok, simple solution:
[1:26pm] bela: if msg_counts_as_heartbeat is true and we do receive a message (or a heartbeat) from a member in suspected_mbrs, we remove it from suspected_mbrs
[1:28pm] jboehm: why not recheck the timestamps in suspect()?
[1:28pm] jboehm: should be faster
[1:29pm] jboehm: suspect is only called if a new member is suspected, not every message, like update() in this case
[1:30pm] bela: So how would you remove A from suspected_mbrs then ?
[1:32pm] jboehm: in FD_ALL.suspect(), instead of eligible_mbrs.removeAll(suspected_mbrs), iterate over suspected_mbrs, get the timestamp, check if it is OK and remove only if not
[1:33pm] jboehm: or something like that 
[1:34pm] bela: aha. And remove members from suspected_mbrs whose timestamps are below the timeout. Hmm, would work I guess
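
The first proposal in the chat (drop a member from suspected_mbrs as soon as traffic from it arrives) could look roughly like the sketch below. This is not the JGroups source: the class UnsuspectOnTrafficSketch, its methods, and the use of String member names are hypothetical stand-ins, and it assumes one reading of the proposal, namely that a heartbeat always counts and a regular message counts only when msg_counts_as_heartbeat is true.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the relevant FD_ALL state; not the JGroups code.
class UnsuspectOnTrafficSketch {
    // Last time (in ms) anything was received from each member.
    final Map<String, Long> timestamps = new ConcurrentHashMap<>();
    // Members currently suspected (plays the role of suspected_mbrs).
    final Set<String> suspectedMbrs = ConcurrentHashMap.newKeySet();
    // Mirrors the msg_counts_as_heartbeat property mentioned in the chat.
    final boolean msgCountsAsHeartbeat;

    UnsuspectOnTrafficSketch(boolean msgCountsAsHeartbeat) {
        this.msgCountsAsHeartbeat = msgCountsAsHeartbeat;
    }

    void suspect(String member) {
        suspectedMbrs.add(member);
    }

    // A heartbeat always refreshes the sender.
    void onHeartbeat(String sender, long nowMs) {
        update(sender, nowMs);
    }

    // A regular message refreshes the sender only if it counts as a heartbeat.
    void onMessage(String sender, long nowMs) {
        if (msgCountsAsHeartbeat)
            update(sender, nowMs);
    }

    // Record the timestamp and unsuspect the sender in the same step.
    private void update(String sender, long nowMs) {
        timestamps.put(sender, nowMs);
        suspectedMbrs.remove(sender);
    }
}
```

The cost jboehm points out is visible here: the remove() call runs on every received message, not only when a new member is suspected.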
                
> FD_ALL does not unsuspect on heartbeat
> --------------------------------------
>
>                 Key: JGRP-1523
>                 URL: https://issues.jboss.org/browse/JGRP-1523
>             Project: JGroups
>          Issue Type: Quality Risk
>    Affects Versions: 3.0.14
>            Reporter: Jan Boehm
>            Assignee: Bela Ban
>             Fix For: 3.2
>
>
> FD_ALL stores suspected nodes in "suspected_mbrs" when it receives no heartbeats from them. If it does not predict the local node as the new coordinator, it does not pass this suspicion upwards. Since UNSUSPECT from upwards is the only event that removes a node from suspected_mbrs, the set retains all nodes except the nodes that were newly suspected when the local node became the potential new coordinator.
> This seems wasteful and wrong (it leads to wrong results if there are "stale" suspects that would be preferred as new coordinators). The timestamps for nodes in suspected_mbrs should be rechecked in FD_ALL.suspect before adding the new nodes.
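
The recheck suggested in the description can be sketched as follows. This is an illustration under stated assumptions, not the actual FD_ALL code: SuspectRecheckSketch, heard(), the String member names, and the explicit nowMs parameter are all hypothetical, and "timestamp is OK" is taken to mean that the member was heard from less than timeoutMs ago.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the relevant FD_ALL state; not the JGroups code.
class SuspectRecheckSketch {
    // Last time (in ms) anything was received from each member.
    final Map<String, Long> timestamps = new HashMap<>();
    // Members currently suspected (plays the role of suspected_mbrs).
    final Set<String> suspectedMbrs = new LinkedHashSet<>();
    final long timeoutMs;

    SuspectRecheckSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Record traffic from a member; note that this does not touch suspectedMbrs.
    void heard(String member, long nowMs) {
        timestamps.put(member, nowMs);
    }

    // Returns the members still eligible as coordinator, in membership order.
    List<String> suspect(String newlySuspected, List<String> members, long nowMs) {
        // Recheck first: drop stale suspects we have heard from within the timeout.
        suspectedMbrs.removeIf(m -> {
            Long last = timestamps.get(m);
            return last != null && nowMs - last < timeoutMs;
        });
        // Only then add the newly suspected member.
        suspectedMbrs.add(newlySuspected);

        List<String> eligible = new ArrayList<>(members);
        eligible.removeAll(suspectedMbrs);
        return eligible;
    }
}
```

With members A, B, C and a timeout of 1000 ms: if A is suspected, then heard from again, and B is suspected afterwards, the recheck removes A from the suspect set, so A is still the first eligible member. Without the recheck the stale entry for A would leave only C eligible.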
