[infinispan-issues] [JBoss JIRA] (ISPN-4949) Split brain: inconsistent data after merge
Bela Ban (JIRA)
issues at jboss.org
Tue Nov 18 04:49:39 EST 2014
[ https://issues.jboss.org/browse/ISPN-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020687#comment-13020687 ]
Bela Ban commented on ISPN-4949:
--------------------------------
{quote}
Therefore, we can add another layer or make the JGroups view optionally 'reliable'.
{quote}
I don't like your use of the word _reliable_; it suggests view installation is _unreliable_, which is not true. View installation is _reliable_, but failure detection itself is _unreliable_ and always will be. There is no way to implement reliable failure detection in an asynchronous messaging system [1].
In other words, when the failure detection _thinks_ a member has failed, a new view will be installed _reliably_ (but without consensus).
{quote}
I haven't considered FD, as detecting such a split would take long anyway. I was really thinking of protocols where detecting any node failure takes constant time (therefore, FD would have to suspect all other nodes when a failure is detected, and use VERIFY_SUSPECT to check who is still alive).
{quote}
FD_ALL or FD_ALL2 come closest to this, but they are _unreliable_ as well.
{quote}
Not sure I understand; the algorithm installs the view as soon as it gets an ack or a timeout from every member.
{quote}
I think that, according to the semantics implied by your use of the term _reliable view installation_, you *cannot* install a view unless you get an ack from every member in it. So in order to install view 1-50, you'd have to get acks from members [1..50]. If a single member, e.g. 32, doesn't ack the view, you cannot install it. In this case, the best approach is probably to keep retrying until all 50 acks have been received, or until a new view is to be installed instead.
[1] http://www.cs.yale.edu/homes/aspnes/pinewiki/FailureDetectors.html
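To illustrate the ack-based semantics described above, here is a minimal sketch (hypothetical class, not the JGroups API): the coordinator tracks which members of the proposed view have not yet acked, and the view only becomes installable once the pending set is empty.

```java
import java.util.*;

// Hypothetical sketch, not JGroups code: a "reliably installed" view in the
// sense discussed above requires an ack from every member of the view.
public class ViewInstaller {
    private final Set<Integer> pending; // members that have not acked yet

    public ViewInstaller(List<Integer> members) {
        this.pending = new HashSet<>(members);
    }

    /** Record an ack from a member; returns true once all members have acked. */
    public boolean ack(int member) {
        pending.remove(member);
        return pending.isEmpty();
    }

    /** The view is installable only when no acks are outstanding. */
    public boolean canInstall() {
        return pending.isEmpty();
    }

    public static void main(String[] args) {
        // Proposed view with members [1, 2, 3]
        ViewInstaller v = new ViewInstaller(Arrays.asList(1, 2, 3));
        v.ack(1);
        v.ack(3);
        System.out.println(v.canInstall()); // false: member 2 has not acked
        v.ack(2);
        System.out.println(v.canInstall()); // true: all members acked
    }
}
```

In a real implementation the coordinator would resend the view to non-acking members until either the missing acks arrive or a newer view supersedes this one.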
> Split brain: inconsistent data after merge
> ------------------------------------------
>
> Key: ISPN-4949
> URL: https://issues.jboss.org/browse/ISPN-4949
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 7.0.0.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
>
> 1) cluster A, B, C, D splits into 2 parts:
> A, B (coord A) finds this out immediately and enters degraded mode with CH [A, B, C, D]
> C, D (coord D) first detects that B is lost, gets view A, C, D and starts a rebalance with CH [A, C, D]. Segment X is primary-owned by C (it had a backup on B, but that was lost)
> 2) D detects that A was lost as well, therefore enters degraded mode with CH [A, C, D]
> 3) C inserts an entry into X: all owners (only C) are present, therefore the modification is allowed
> 4) cluster is merged and the coordinator finds out that the max stable topology has CH [A, B, C, D] (the older of the two partitions' topologies, obtained from A, B) - logs 'No active or unavailable partitions, so all the partitions must be in degraded mode' (yes, all partitions are in degraded mode, but a write has happened in the meantime)
> 5) The old CH is broadcast in newest topology, no rebalance happens
> 6) Inconsistency: read in X may miss the update
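The inconsistency in the steps above can be sketched as follows (hypothetical classes, not Infinispan code): because the merge keeps the older CH [A, B, C, D] and skips the rebalance, a write made while C was the only reachable owner of segment X never reaches the other owners under the merged CH, so a read served by one of them misses it.

```java
import java.util.*;

// Illustrative sketch of the reported inconsistency, not Infinispan code.
public class MergeSketch {
    // Per-node key/value stores for segment X
    static Map<String, Map<String, String>> nodes = new HashMap<>();

    static void write(String node, String key, String value) {
        nodes.computeIfAbsent(node, n -> new HashMap<>()).put(key, value);
    }

    static String read(String node, String key) {
        return nodes.getOrDefault(node, Map.of()).get(key);
    }

    public static void main(String[] args) {
        // Step 3: during the split, C is the only reachable owner of X
        // and accepts the write
        write("C", "k", "v1");

        // Steps 4-5: after the merge the older CH [A, B, C, D] is kept and
        // no rebalance runs, so the value is never copied to B, a backup
        // owner under that CH.

        // Step 6: a read served by B misses the update
        System.out.println(read("B", "k")); // null: stale read
        System.out.println(read("C", "k")); // v1
    }
}
```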
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)