OK, thanks. Based on that, my intent is to defer
https://jira.jboss.org/jira/browse/JBAS-6604 until this is sorted. Would
you agree?
Bela Ban wrote:
Yes, the change from FD to FD_ALL is correct.
However, as I've found out on a recent clustering consulting gig, FD_ALL
has some issues with FD.shun="true" on merging. The issue we ran into
was a shunfest, where entire subclusters shunned all nodes in different
subclusters. This led to a situation where the cluster never re-merged
because of the excessive shunning going on.
I've created a JIRA issue [1] to look into getting rid of shunning and
dealing only with merging. However, this is going to be in 2.9 at the
earliest, not in the 2.6 branch.
[1] https://jira.jboss.org/jira/browse/JGRP-937
Brian Stansberry wrote:
> I have a task to convert JBoss AS's UDP-based stacks from FD to
> FD_ALL.[1] I want to make sure I understand the configuration correctly
> such that I can get the same behavior as we have now for the simple case
> where one node crashes. (I realize FD_ALL can detect multiple failures
> more quickly, which is good.)
>
> The AS currently uses:
>
> <FD timeout="6000" max_tries="5" shun="true"/>
>
> Would this be fairly similar:
>
> <FD_ALL timeout="30000" interval="6000" shun="true"/>
>
> In both cases it takes 30 secs to suspect a node, in both cases a message
> is sent every 6 seconds, and in both cases 5 such messages would have to
> be lost for some reason before a healthy node would be suspected.
>
> [1] https://jira.jboss.org/jira/browse/JBAS-6604
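
For reference, the arithmetic behind the FD vs. FD_ALL comparison quoted
above can be sketched roughly as follows. This is a simplified model, not
JGroups code: the function names are hypothetical, and it assumes that FD
suspects a node after max_tries consecutive probes each time out, and that
FD_ALL suspects a node once no heartbeat has been seen for timeout ms.

```python
# Simplified model of the two configurations (hypothetical helper names,
# not part of JGroups). All times are in milliseconds.

def fd_detection_ms(timeout_ms, max_tries):
    # Assumption: FD suspects after max_tries probes, each waiting timeout_ms.
    return timeout_ms * max_tries

def fd_all_missed_heartbeats(timeout_ms, interval_ms):
    # Assumption: FD_ALL suspects once timeout_ms passes with no heartbeat;
    # heartbeats are sent every interval_ms.
    return timeout_ms // interval_ms

# <FD timeout="6000" max_tries="5" shun="true"/>
print(fd_detection_ms(6000, 5))            # 30000

# <FD_ALL timeout="30000" interval="6000" shun="true"/>
print(fd_all_missed_heartbeats(30000, 6000))  # 5
```

Under those assumptions both stacks take about 30 seconds to suspect a
crashed node, and in both cases five consecutive messages would have to be
lost before a healthy node is suspected.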
--
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry(a)redhat.com