[
https://issues.jboss.org/browse/ISPN-4949?page=com.atlassian.jira.plugin....
]
Bela Ban commented on ISPN-4949:
--------------------------------
{quote}
Therefore, we can add another layer or make the JGroups view optionally
'reliable'.
{quote}
I don't like your use of the word _reliable_; it suggests view installation is
_unreliable_, which is not true. View installation is _reliable_, but failure detection
itself is _unreliable_ and always will be: there is no way to implement reliable failure
detection in an asynchronous messaging system [1].
In other words, when failure detection _thinks_ a member has failed, a new view will be
installed _reliably_ (but without consensus).
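The distinction can be made concrete with a toy sketch (not JGroups code; the names,
timeout, and heartbeat map are invented for illustration): a timeout-based detector
cannot tell a slow member from a dead one, so suspicion is inherently unreliable,
while deriving the next view from a suspect set is fully deterministic.

```java
import java.util.*;

public class SuspectDemo {
    static final long TIMEOUT_MS = 3_000;

    /** Suspect any member whose last heartbeat is older than TIMEOUT_MS. */
    static Set<String> suspect(Map<String, Long> lastHeartbeat, long now) {
        Set<String> suspected = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet())
            if (now - e.getValue() > TIMEOUT_MS)
                suspected.add(e.getKey());
        return suspected;
    }

    /** Deterministically derive the next view: current members minus suspects. */
    static List<String> nextView(List<String> view, Set<String> suspected) {
        List<String> next = new ArrayList<>(view);
        next.removeAll(suspected);
        return next;
    }

    public static void main(String[] args) {
        Map<String, Long> hb = new HashMap<>();
        long now = 10_000;
        hb.put("A", now);          // healthy
        hb.put("B", now - 5_000);  // merely slow (e.g. a GC pause), yet suspected
        Set<String> suspects = suspect(hb, now);
        System.out.println(suspects);                              // [B]
        System.out.println(nextView(List.of("A", "B"), suspects)); // [A]
    }
}
```

The false suspicion of B is unavoidable in this model: no timeout value can separate
"dead" from "arbitrarily delayed" in an asynchronous system.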
{quote}
I hadn't considered FD, as detecting such a split would take a long time anyway; I was
really thinking of protocols where detecting any node failure takes constant time
(so FD would have to suspect all other nodes when a failure is detected, and use
VERIFY_SUSPECT to check who is still alive).
{quote}
FD_ALL or FD_ALL2 come closest to this, but they are _unreliable_ as well.
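The "suspect everyone, then verify" idea can be sketched as follows (in the spirit of
VERIFY_SUSPECT, not its actual implementation; the probe predicate is a hypothetical
stand-in for the real are-you-alive message):

```java
import java.util.*;
import java.util.function.Predicate;

public class VerifySuspectSketch {
    /**
     * On any detected failure, suspect every other member, then probe each
     * one and clear those that answer within the probe timeout.
     *
     * @param others      all members except ourselves
     * @param answersPing hypothetical probe: true if the member replied to an
     *                    are-you-alive message before the timeout expired
     * @return members still suspected after verification
     */
    static Set<String> verify(Collection<String> others, Predicate<String> answersPing) {
        Set<String> stillSuspected = new TreeSet<>(others); // suspect everyone first
        stillSuspected.removeIf(answersPing);               // clear those that replied
        return stillSuspected;
    }

    public static void main(String[] args) {
        // C crashed; D is alive but pauses longer than the probe timeout:
        Set<String> result = verify(List.of("B", "C", "D"),
                                    member -> member.equals("B"));
        System.out.println(result); // [C, D] -- D is a false positive
    }
}
```

Note that the verification step inherits the same weakness as the original detection:
the probe's timeout again cannot distinguish a slow member from a dead one.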
{quote}
I'm not sure I understand; the algorithm installs the view as soon as it gets an ack or
a timeout from every member.
{quote}
I think that, according to the semantics implied by your term _reliable view
installation_, you *cannot* install a view unless you get an ack from all members in it.
So in order to install view 1-50, you'd have to get an ack from members [1..50]. If a
single member, e.g. 32, doesn't ack the view, you cannot install that view. In this
case, the best approach is probably to keep retrying until all 50 acks have been
received or a new view is about to be installed.
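These ack semantics could be sketched like this (an assumed model, not JGroups' actual
code): a proposed view is installable only once the collected acks cover every member
listed in it.

```java
import java.util.*;

public class ViewAckSketch {
    /** True once the collected acks cover every member of the proposed view. */
    static boolean canInstall(List<Integer> viewMembers, Set<Integer> acks) {
        return acks.containsAll(viewMembers);
    }

    public static void main(String[] args) {
        List<Integer> view = new ArrayList<>();
        for (int i = 1; i <= 50; i++) view.add(i);

        Set<Integer> acks = new HashSet<>(view);
        acks.remove(32); // member 32 hasn't acked yet
        System.out.println(canInstall(view, acks)); // false: retry, or abandon
                                                    // once a newer view supersedes

        acks.add(32);    // the missing ack finally arrives
        System.out.println(canInstall(view, acks)); // true: install view 1-50
    }
}
```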
[1]
http://www.cs.yale.edu/homes/aspnes/pinewiki/FailureDetectors.html
Split brain: inconsistent data after merge
------------------------------------------
Key: ISPN-4949
URL:
https://issues.jboss.org/browse/ISPN-4949
Project: Infinispan
Issue Type: Bug
Components: State Transfer
Affects Versions: 7.0.0.Final
Reporter: Radim Vansa
Assignee: Dan Berindei
Priority: Critical
1) Cluster A, B, C, D splits into 2 parts:
A, B (coord A) finds this out immediately and enters degraded mode with CH [A, B, C, D]
C, D (coord D) first detects that B is lost, gets view A, C, D and starts a rebalance with
CH [A, C, D]. Segment X is primary-owned by C (its backup was on B, but that was lost)
2) D detects that A was lost as well, and therefore enters degraded mode with CH [A, C, D]
3) C inserts an entry into X: all owners (only C) are present, so the modification is
allowed
4) The cluster is merged and the coordinator finds out that the max stable topology has CH
[A, B, C, D] (the older of the two partitions' topologies, obtained from A, B) - it logs
'No active or unavailable partitions, so all the partitions must be in degraded
mode' (yes, all partitions are in degraded mode, but a write has happened in the
meantime)
5) The old CH is broadcast in the newest topology, and no rebalance happens
6) Inconsistency: a read in X may miss the update
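The six steps above can be modeled with a toy sketch (assumed and heavily simplified;
not Infinispan's real merge logic): each partition holds its own copy of segment X, the
C, D partition accepts a write while split, and a merge that reinstates the older stable
CH without a rebalance never refreshes the A, B copies.

```java
import java.util.*;

public class SplitBrainSketch {
    /**
     * Merge two partitions' copies of a segment. Only when a rebalance (state
     * transfer) runs are the other partition's entries reconciled into the
     * winning partition's copy.
     */
    static Map<String, String> merge(Map<String, String> stable,
                                     Map<String, String> other,
                                     boolean rebalance) {
        Map<String, String> merged = new HashMap<>(stable);
        if (rebalance)
            merged.putAll(other); // state transfer would reconcile the copies
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> partitionAB = new HashMap<>(); // copies on A, B
        Map<String, String> partitionCD = new HashMap<>(); // copies on C, D

        partitionCD.put("X", "updated-during-split");      // step 3: C writes

        // Steps 4-5: the older stable CH from A, B wins and no rebalance runs,
        // so a read routed to A or B misses C's write (step 6):
        System.out.println(merge(partitionAB, partitionCD, false).get("X")); // null

        // A rebalance on merge would have reconciled the copies:
        System.out.println(merge(partitionAB, partitionCD, true).get("X"));
    }
}
```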
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)