[jboss-jira] [JBoss JIRA] Commented: (JGRP-853) Failure detection: multiple crashes not detected

Bela Ban (JIRA) jira-events at lists.jboss.org
Fri Oct 31 03:27:21 EDT 2008


    [ https://jira.jboss.org/jira/browse/JGRP-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12436430#action_12436430 ] 

Bela Ban commented on JGRP-853:
-------------------------------

The cause is that we use the full membership as the basis for picking the new ping destination. This is wrong because some of those members might already be suspected. In that case we can end up with null as ping_dest, and the pinger thread terminates.
Solution: pick from a set that starts out as a copy of the membership, with all suspected members removed.
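
A minimal sketch of that idea, not the actual FD code: the class name, the method pickPingDest, the use of plain strings as addresses, and the "next member to the right, wrapping around" rule are all assumptions made for illustration. It only shows the selection being done on a copy of the membership from which suspected members have been removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; names and types are not taken from JGroups.
final class PingDestSketch {

    /**
     * Picks the member to ping: the next non-suspected member after localAddr,
     * wrapping around at the end of the list. Returns null only when no other
     * non-suspected member is left.
     */
    public static String pickPingDest(List<String> membership,
                                      Set<String> suspected,
                                      String localAddr) {
        // Start from a copy of the membership, then drop all suspected members.
        List<String> pingable = new ArrayList<>(membership);
        pingable.removeAll(suspected);

        int self = pingable.indexOf(localAddr);
        if (self < 0 || pingable.size() < 2) {
            return null; // nobody else left to ping
        }
        return pingable.get((self + 1) % pingable.size());
    }

    public static void main(String[] args) {
        List<String> membership = Arrays.asList("A", "B", "C", "D", "E");

        // B and C crashed and are already suspected: A skips them and pings D.
        Set<String> suspected = new HashSet<>(Arrays.asList("B", "C"));
        System.out.println(pickPingDest(membership, suspected, "A")); // D

        // Every other member is suspected: there is no one left to ping.
        Set<String> allOthers = new HashSet<>(Arrays.asList("B", "C", "D", "E"));
        System.out.println(pickPingDest(membership, allOthers, "A")); // null

        // No suspects: A pings its direct neighbor.
        System.out.println(pickPingDest(membership, Collections.<String>emptySet(), "A")); // B
    }
}
```

With this selection, a null result means that no pingable member remains, rather than that the chosen neighbor happened to be suspected already.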

> Failure detection: multiple crashes not detected
> ------------------------------------------------
>
>                 Key: JGRP-853
>                 URL: https://jira.jboss.org/jira/browse/JGRP-853
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 2.6.6
>
>
> When many nodes in a cluster crash simultaneously, the surviving node doesn't change its view.
> To reproduce:
> - Start 10 instances of Draw in one shell (shell #1)
> - Start 1 instance in a different shell
> - Kill shell #1, so that all 10 instances die at once
> ==> The view of the last instance doesn't shrink back to 1 member
> This works fine in CVS head (2.7).
