[jboss-jira] [JBoss JIRA] (JGRP-1856) FD_ALL should not update all mbrs timestamp in handleViewChange

Takayoshi Kimura (JIRA) issues at jboss.org
Mon Jun 23 03:35:24 EDT 2014


    [ https://issues.jboss.org/browse/JGRP-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978451#comment-12978451 ] 

Takayoshi Kimura commented on JGRP-1856:
----------------------------------------

For example, suppose we have 6 members: A, B, C, D, E and F. FD_ALL is configured with interval=20 and timeout=60 (seconds in this example).

A, B, C and D die at the same time.

On the FD_ALL timeout scan, the time since the last heartbeat looks like: A (61), B (59), C (56), D (45), E (12), F (15).

A has timed out, so we suspect it and a view change follows. During the view change, FD_ALL resets the timestamps of the remaining nodes: B (0), C (0), D (0), E (0), F (0).

B, C and D, although already dead, stay alive for another 60 sec.
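
The scenario above can be replayed with a small simulation. This is only a sketch, not the JGroups code: the class FdAllScenario, the simulate() method and the resetOnViewChange flag are made up for illustration, times are in seconds, and it assumes a scan every interval with a strict "age > timeout" check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the example; none of these names exist in JGroups.
public class FdAllScenario {
    static final long TIMEOUT = 60;   // seconds, as in the example
    static final long INTERVAL = 20;  // seconds between timeout scans (assumed)

    /** Returns, per dead member, the scan time (seconds after the first scan) at which it is suspected. */
    public static Map<String, Long> simulate(boolean resetOnViewChange) {
        // Time of the last heartbeat, relative to the first scan at t=0.
        Map<String, Long> lastHeard = new LinkedHashMap<>();
        lastHeard.put("A", -61L);
        lastHeard.put("B", -59L);
        lastHeard.put("C", -56L);
        lastHeard.put("D", -45L);
        lastHeard.put("E", -12L);
        lastHeard.put("F", -15L);
        String[] alive = {"E", "F"};  // only E and F keep sending heartbeats
        int deadCount = 4;            // A, B, C and D

        Map<String, Long> suspectedAt = new LinkedHashMap<>();
        for (long now = 0; suspectedAt.size() < deadCount; now += INTERVAL) {
            if (now > 0) {
                // Live members have sent a heartbeat since the previous scan.
                for (String m : alive) lastHeard.put(m, now);
            }
            // Timeout scan: collect every member whose last heartbeat is too old.
            List<String> timedOut = new ArrayList<>();
            for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
                if (now - e.getValue() > TIMEOUT) timedOut.add(e.getKey());
            }
            for (String m : timedOut) {
                lastHeard.remove(m);
                suspectedAt.put(m, now);
            }
            // A suspicion leads to a view change; optionally reset all timestamps.
            if (!timedOut.isEmpty() && resetOnViewChange) {
                for (Map.Entry<String, Long> e : lastHeard.entrySet()) e.setValue(now);
            }
        }
        return suspectedAt;
    }

    public static void main(String[] args) {
        System.out.println("reset on view change:    " + simulate(true));
        System.out.println("no reset on view change: " + simulate(false));
    }
}
```

Under these assumptions, B, C and D are suspected at the next scan (t=20) when timestamps are left alone, but only at t=80 when the view change resets them.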

Does it make sense?

> FD_ALL should not update all mbrs timestamp in handleViewChange
> ---------------------------------------------------------------
>
>                 Key: JGRP-1856
>                 URL: https://issues.jboss.org/browse/JGRP-1856
>             Project: JGroups
>          Issue Type: Enhancement
>      Security Level: Public(Everyone can see) 
>    Affects Versions: 3.4.4
>            Reporter: Takayoshi Kimura
>            Assignee: Bela Ban
>
> In a large cluster, when multiple instances (e.g. A, B, C, D) go down at once, the FD_ALL interval-based checker detects that the first few instances have already timed out (e.g. A, B) and suspects them, and the coord sends a view change.
> In FD_ALL.handleViewChange(), the timestamps of all mbrs are updated, including those of the remaining dead members (e.g. C, D), so they get extra life before they are suspected.
> It would be better not to do this so we can expel them sooner.
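
One way the requested behaviour could look is sketched below. This is an illustration under assumptions, not the FD_ALL implementation: the class HeartbeatTable and its methods are hypothetical, and members are plain strings rather than JGroups addresses. On a view change it drops members that left and adds a timestamp only for members it has not seen before, so existing timestamps are left untouched.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the timestamp table of a failure detector.
public class HeartbeatTable {
    // member -> time of the last heartbeat received from it
    private final Map<String, Long> timestamps = new ConcurrentHashMap<>();

    /** Records a heartbeat from the given member. */
    public void heartbeatReceived(String member, long now) {
        timestamps.put(member, now);
    }

    /** Applies a new view without refreshing timestamps of existing members. */
    public void handleViewChange(Collection<String> members, long now) {
        // Forget members that are no longer in the view.
        timestamps.keySet().retainAll(members);
        // Start the clock for new members only; existing entries keep their age.
        for (String m : members) {
            timestamps.putIfAbsent(m, now);
        }
    }

    /** Returns the last heartbeat time, or null if the member is unknown. */
    public Long lastHeard(String member) {
        return timestamps.get(member);
    }

    public static void main(String[] args) {
        HeartbeatTable table = new HeartbeatTable();
        table.heartbeatReceived("A", 0L);
        table.heartbeatReceived("B", 10L);
        // A left, C joined; B keeps its old timestamp.
        table.handleViewChange(Arrays.asList("B", "C"), 50L);
        System.out.println("A=" + table.lastHeard("A")
                + " B=" + table.lastHeard("B")
                + " C=" + table.lastHeard("C"));
    }
}
```

With this approach a dead member that is still in the view keeps ageing across view changes, so the next timeout scan can suspect it instead of waiting for a fresh timeout period.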



--
This message was sent by Atlassian JIRA
(v6.2.6#6264)

