[jboss-jira] [JBoss JIRA] (JGRP-2293) Graceful concurrent leaving of coordinator(s) leaves the cluster with stale views

Mon Sep 24 12:54:00 EDT 2018

    [ https://issues.jboss.org/browse/JGRP-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637645#comment-13637645 ] 

Radoslav Husar commented on JGRP-2293:
--------------------------------------

I don't see using FD (failure detection) to handle graceful leaves as a viable option (hence this issue). Especially in cloud environments, where networking is notoriously unreliable, making FD overly aggressive could cause false-positive cluster splits and make the whole situation even worse. The solution seems to be a proper graceful shutdown, in which the leaving coordinator ensures that a successor coordinator that is not itself leaving is elected *and* acknowledged before the node shuts down completely.
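
To illustrate the handover described above, here is a minimal sketch in plain Java. It is a toy model under stated assumptions, not JGroups code and not the actual fix: it assumes the view is an ordered member list whose first member is the coordinator, and the names GracefulLeaveSketch, Acker, nextView and gracefulLeave are all hypothetical. The point it shows is that the leavers get a result (and may shut down) only once a non-leaving successor has acknowledged a view that excludes every leaver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy model only: does not use or reproduce any JGroups API. */
class GracefulLeaveSketch {

    /** Hypothetical callback: the successor confirms it has installed the new view. */
    interface Acker {
        boolean acknowledge(String successor, List<String> newView);
    }

    /** Assumption: the view is ordered and its first member acts as coordinator. */
    static List<String> nextView(List<String> view, Set<String> leaving) {
        List<String> next = new ArrayList<>();
        for (String member : view) {
            if (!leaving.contains(member)) {
                next.add(member); // keep the relative order of the survivors
            }
        }
        return next;
    }

    /**
     * Returns the new view if the leavers may shut down, or an empty Optional
     * if the successor has not acknowledged yet (the leavers must keep running).
     */
    static Optional<List<String>> gracefulLeave(List<String> view, Set<String> leaving, Acker acker) {
        List<String> next = nextView(view, leaving);
        if (next.isEmpty()) {
            return Optional.of(next); // everybody leaves: nothing to hand over
        }
        String successor = next.get(0); // first non-leaving member takes over
        return acker.acknowledge(successor, next) ? Optional.of(next) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C", "D", "E");
        Set<String> leaving = new LinkedHashSet<>(Arrays.asList("A", "B")); // coordinator and next-in-line leave together
        Optional<List<String>> result = gracefulLeave(view, leaving, (successor, newView) -> true);
        System.out.println(result);
    }
}
```

In this model a concurrent leave of A and B is treated as one step: C is chosen directly, so the remaining members never install an intermediate view in which B is coordinator.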

> Graceful concurrent leaving of coordinator(s) leaves the cluster with stale views
> ---------------------------------------------------------------------------------
>
>                 Key: JGRP-2293
>                 URL: https://issues.jboss.org/browse/JGRP-2293
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.14
>            Reporter: Radoslav Husar
>            Assignee: Bela Ban
>            Priority: Critical
>             Fix For: 4.0.15
>
>
> JGroups does not correctly handle nodes leaving concurrently. This is a typical use case in cloud environments, where a cluster is scaled down by an autoscaler or manually, and we need to handle it.
> A simple test can be devised in which the first n nodes (where n>1) leave the cluster concurrently; reproducer PR: https://github.com/belaban/JGroups/pull/397
