[jboss-jira] [JBoss JIRA] (JGRP-1856) FD_ALL should not update all mbrs timestamp in handleViewChange
Takayoshi Kimura (JIRA)
issues at jboss.org
Mon Jun 23 03:35:24 EDT 2014
[ https://issues.jboss.org/browse/JGRP-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978451#comment-12978451 ]
Takayoshi Kimura commented on JGRP-1856:
----------------------------------------
For example, suppose we have six members A, B, C, D, E and F, and FD_ALL is configured with interval=20 and timeout=60.
A, B, C and D die at the same time.
On the FD_ALL timeout scan, the time since the last heartbeat looks like: A (61), B (59), C (56), D (45), E (12), F (15).
A has timed out, so we suspect it and a view change follows. During the view change, FD_ALL resets the timestamps of the remaining members: B (0), C (0), D (0), E (0), F (0).
B, C and D are then treated as alive for another 60 sec.
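To make the arithmetic in the example above concrete, here is a minimal sketch in Java. It is not JGroups code: the class and method names are hypothetical, the values are in seconds as in the example, and it ignores how often the timeout scan actually runs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of JGroups.
class FdAllResetExample {
    // Seconds until a silent member's heartbeat age reaches the timeout.
    static long secondsUntilTimeout(long ageSec, long timeoutSec) {
        return Math.max(0, timeoutSec - ageSec);
    }

    public static void main(String[] args) {
        long timeout = 60; // timeout=60 from the example

        // Heartbeat ages of the dead members still in the view after A is suspected.
        Map<String, Long> age = new LinkedHashMap<>();
        age.put("B", 59L);
        age.put("C", 56L);
        age.put("D", 45L);

        for (Map.Entry<String, Long> e : age.entrySet()) {
            long keep = secondsUntilTimeout(e.getValue(), timeout); // timestamps left alone
            long reset = secondsUntilTimeout(0, timeout);           // timestamps reset to 0
            System.out.println(e.getKey() + ": keep=" + keep + "s, reset=" + reset + "s");
        }
    }
}
```

Under these assumptions B, C and D would reach the timeout after 1, 4 and 15 sec if their timestamps were kept, versus 60 sec each after the reset.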
Does it make sense?
> FD_ALL should not update all mbrs timestamp in handleViewChange
> ---------------------------------------------------------------
>
> Key: JGRP-1856
> URL: https://issues.jboss.org/browse/JGRP-1856
> Project: JGroups
> Issue Type: Enhancement
> Security Level: Public(Everyone can see)
> Affects Versions: 3.4.4
> Reporter: Takayoshi Kimura
> Assignee: Bela Ban
>
> In a large cluster, when multiple instances (e.g. A, B, C, D) are gone at once, the FD_ALL interval-based checker detects the first few instances that have already timed out (e.g. A, B), suspects them, and the coord sends a view change.
> FD_ALL.handleViewChange() then updates the timestamps of all mbrs, including the remaining dead members (e.g. C, D), so they get extra life before being suspected.
> It would be better not to do this so we can expel them sooner.
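One way the behaviour requested in the description could look is sketched below in Java. This is an assumption for illustration, not the actual FD_ALL source or the eventual fix: TimestampTable and its methods are hypothetical names, and members are plain strings rather than JGroups addresses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-member timestamp table; not the JGroups source.
class TimestampTable {
    private final Map<String, Long> timestamps = new HashMap<>();

    // Record a heartbeat received from a member.
    void heartbeat(String member, long now) {
        timestamps.put(member, now);
    }

    // On a view change: drop members that left the view, start tracking
    // new members, and leave the timestamps of surviving members untouched.
    void onViewChange(List<String> members, long now) {
        timestamps.keySet().retainAll(members);
        for (String m : members) {
            timestamps.putIfAbsent(m, now);
        }
    }

    // Last recorded timestamp, or null if the member is not tracked.
    Long lastSeen(String member) {
        return timestamps.get(member);
    }
}
```

With this approach, a dead member that stays in the new view keeps its old timestamp, so it can be suspected as soon as its own timeout elapses rather than a full timeout after the view change.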
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)