[jboss-jira] [JBoss JIRA] (JGRP-1902) Simplify failure detection and merge timeout configuration
Dan Berindei (JIRA)
issues at jboss.org
Wed Nov 19 05:25:40 EST 2014
Dan Berindei created JGRP-1902:
----------------------------------
Summary: Simplify failure detection and merge timeout configuration
Key: JGRP-1902
URL: https://issues.jboss.org/browse/JGRP-1902
Project: JGroups
Issue Type: Enhancement
Affects Versions: 3.6
Reporter: Dan Berindei
Assignee: Bela Ban
Fix For: 4.0
The FD/FD_ALL/FD_ALL2/FD_SOCK javadoc gives no guidance on how long it takes to detect a leaving member. The MERGE2/MERGE3 javadoc likewise doesn't say how long it takes to detect that the network has healed.
As an example of how misleading the current settings can be, I have seen MERGE3 take more than 20s to merge two partitions with min_interval=1000 and max_interval=5000. Similarly, FD detects a leaver after {{timeout * max_tries}} in the best case, and after twice that if 2 consecutive nodes (in the members list) leave at the same time.
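The FD arithmetic above can be written out as a small calculation. This is only a sketch of the two cases named in this issue; the class and method names are hypothetical (they are not part of JGroups), and the sample values for timeout and max_tries are illustrative, not defaults taken from the protocol.

```java
// Hypothetical helper, not a JGroups class: it only restates the
// arithmetic described in the issue text.
final class FdDetectionEstimate {
    private FdDetectionEstimate() {}

    // Best case from the issue: timeout * max_tries.
    static long bestCaseMillis(long timeoutMillis, int maxTries) {
        return timeoutMillis * maxTries;
    }

    // Two consecutive members (in the members list) leaving at the
    // same time: twice the best case, as stated in the issue.
    static long twoConsecutiveLeaversMillis(long timeoutMillis, int maxTries) {
        return 2 * bestCaseMillis(timeoutMillis, maxTries);
    }

    public static void main(String[] args) {
        long timeoutMillis = 3000; // illustrative value
        int maxTries = 5;          // illustrative value
        System.out.println("best case (ms): "
                + bestCaseMillis(timeoutMillis, maxTries));
        System.out.println("two consecutive leavers (ms): "
                + twoConsecutiveLeaversMillis(timeoutMillis, maxTries));
    }
}
```

With these sample values, an Infinispan RPC timeout shorter than the second figure could expire before the leaver is detected.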
The maximum time it takes to detect a leaver is of particular interest to Infinispan users, because Infinispan is supposed to protect against nodes leaving. But if users don't configure a high enough RPC timeout in Infinispan, the RPC times out before we get to detect the node leaving.
Ideally, the user should be able to specify a maximum detection time, and the protocol should adjust the existing settings to meet that (most of the time).
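One way to picture the proposal for FD is to invert the {{timeout * max_tries}} relationship: given a maximum detection time, derive the timeout. The sketch below is an assumption about how that could work, not an existing JGroups API; the class name, the method name and the choice to budget for the "2 consecutive leavers" case are all hypothetical, and it ignores any other delays the real protocol may add.

```java
// Hypothetical sketch, not a JGroups class: derives an FD-style timeout
// from a user-specified maximum detection time.
final class DetectionBudget {
    private DetectionBudget() {}

    // Assumes the slower case from the issue (2 consecutive leavers),
    // where detection takes 2 * timeout * max_tries, and solves for timeout.
    static long timeoutForBudgetMillis(long maxDetectionMillis, int maxTries) {
        if (maxDetectionMillis <= 0 || maxTries <= 0) {
            throw new IllegalArgumentException("budget and max_tries must be positive");
        }
        long timeoutMillis = maxDetectionMillis / (2L * maxTries);
        if (timeoutMillis == 0) {
            throw new IllegalArgumentException("budget too small for max_tries=" + maxTries);
        }
        return timeoutMillis;
    }

    public static void main(String[] args) {
        long maxDetectionMillis = 30000; // illustrative budget
        int maxTries = 5;                // illustrative value
        System.out.println("derived timeout (ms): "
                + timeoutForBudgetMillis(maxDetectionMillis, maxTries));
    }
}
```

Integer division rounds the timeout down, so the derived setting stays within the requested budget under the stated assumption.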
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)