[infinispan-issues] [JBoss JIRA] (ISPN-4949) Split brain: inconsistent data after merge

Radim Vansa (JIRA) issues at jboss.org
Tue Nov 18 06:32:39 EST 2014


    [ https://issues.jboss.org/browse/ISPN-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020747#comment-13020747 ] 

Radim Vansa commented on ISPN-4949:
-----------------------------------

{quote}
I don't like your use of the word reliable; it suggests view installation is unreliable, which is not true. View installation is reliable, but failure detection itself is unreliable and will always be. There is no way to write a reliable failure detection in an asynchronous messaging system [1].
In other words, when the failure detection thinks a member has failed, a new view will be installed reliably (but without consensus).{quote}
OK, poor choice of words; let's rather say 'confirmed view'.

{quote}
I think according to the semantics implied by your use of the term reliable view installation, you cannot install a view unless you get an ack from all members in it. So in order to install view 1-50, you'd have to get an ack from members [1..50]. If a single member, e.g. 32, doesn't ack the view, you cannot install that view. In this case, I think retrying to get 50 acks until all acks have been received or a new view is to be installed would probably be best.
{quote}
Sure, if 32 does not ack within the timeout, you'd send [1..31,33..50] and wait for the 49 acks.
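
To make the retry concrete, here is a minimal sketch of such an ack-confirmed installation loop. Member ids are plain ints and sendView/waitForAcks/installView are assumed helpers for illustration only, not the actual JGroups GMS API:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal sketch of "confirmed view" installation: keep proposing a view and
// shrinking its membership until every proposed member has acked it.
class ConfirmedViewInstaller {

    private int viewId;

    void installConfirmedView(List<Integer> members, long ackTimeoutMs) {
        List<Integer> proposed = new ArrayList<>(members);              // e.g. [1..50]
        while (!proposed.isEmpty()) {
            int id = ++viewId;
            sendView(id, proposed);                                     // multicast the proposed view
            Set<Integer> acked = waitForAcks(id, proposed, ackTimeoutMs);
            if (acked.size() == proposed.size()) {
                installView(id, proposed);                              // every member confirmed -> install
                return;
            }
            // e.g. 32 did not ack within the timeout: retry with [1..31,33..50]
            // under a new view id and wait for the remaining 49 acks
            proposed.retainAll(acked);
        }
    }

    // Assumed transport/ack helpers, stubbed out for this sketch.
    void sendView(int viewId, List<Integer> members) { /* multicast VIEW */ }
    Set<Integer> waitForAcks(int viewId, List<Integer> members, long timeoutMs) { return Set.of(); }
    void installView(int viewId, List<Integer> members) { /* deliver the view */ }
}
{code}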

So, can we expect this in JGroups (under JGRP-1901), or should it rather be implemented in Infinispan? ([~dan.berindei], what's your status on this?)


> Split brain: inconsistent data after merge
> ------------------------------------------
>
>                 Key: ISPN-4949
>                 URL: https://issues.jboss.org/browse/ISPN-4949
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State Transfer
>    Affects Versions: 7.0.0.Final
>            Reporter: Radim Vansa
>            Assignee: Dan Berindei
>            Priority: Critical
>
> 1) cluster A, B, C, D splits into 2 parts:
> A, B (coord A) finds this out immediately and enters degraded mode with CH [A, B, C, D]
> C, D (coord D) first detects that B is lost, gets view A, C, D and starts a rebalance with CH [A, C, D]. Segment X is primary-owned by C (it had a backup on B, but that was lost)
> 2) D detects that A was lost as well, therefore enters degraded mode with CH [A, C, D]
> 3) C inserts an entry into X: all owners (only C) are present, therefore the modification is allowed
> 4) cluster is merged and the coordinator finds out that the max stable topology has CH [A, B, C, D] (it is the older of the two partitions' topologies, received from A, B) - logs 'No active or unavailable partitions, so all the partitions must be in degraded mode' (yes, all partitions are in degraded mode, but a write has happened in the meantime)
> 5) The old CH is broadcast in the newest topology, and no rebalance happens
> 6) Inconsistency: a read in X may miss the update
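
For anyone following the scenario, a rough sketch of the merge decision in steps 4-5. Every type and method name here (PartitionStatus, broadcastStableTopology, ...) is a placeholder for illustration, not Infinispan's actual AvailabilityStrategy / ClusterTopologyManager code:

{code:java}
import java.util.List;

// Illustrative sketch of the merge decision in steps 4-5 of the scenario above.
class MergeSketch {

    enum Mode { AVAILABLE, DEGRADED, UNAVAILABLE }

    record CacheTopology(int topologyId, List<String> members) {}
    record PartitionStatus(Mode mode, CacheTopology stableTopology) {}

    void onPartitionMerge(List<PartitionStatus> partitions, CacheTopology maxStableTopology) {
        boolean anyAvailable = partitions.stream()
                .anyMatch(p -> p.mode() != Mode.DEGRADED);
        if (!anyAvailable) {
            // Step 4: "all the partitions must be in degraded mode", so the coordinator
            // just re-broadcasts the max stable topology -- the old CH [A, B, C, D].
            broadcastStableTopology(maxStableTopology);
            // Steps 5-6: no rebalance or state transfer is triggered here, so the entry
            // that C wrote into segment X while degraded is never reconciled and a
            // subsequent read of X may miss the update.
        }
    }

    void broadcastStableTopology(CacheTopology topology) { /* placeholder */ }
}
{code}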



--
This message was sent by Atlassian JIRA
(v6.3.8#6338)

