[infinispan-issues] [JBoss JIRA] (ISPN-4949) Split brain: inconsistent data after merge

Dan Berindei (JIRA) issues at jboss.org
Mon Nov 10 04:47:29 EST 2014


    [ https://issues.jboss.org/browse/ISPN-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018040#comment-13018040 ] 

Dan Berindei edited comment on ISPN-4949 at 11/10/14 4:46 AM:
--------------------------------------------------------------

As commented on IRC by [~dan.berindei], the problem is deeper. If the cluster ABCD can split into views ABC and BCD, both (active) partitions may modify the same entry.

It seems the nodes need to reach consensus about view membership: C must not be part of two views at the same time. That requires a modification in JGroups, not in Infinispan, and may prove troublesome even from a theoretical perspective.
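The membership invariant above can be stated as a simple check over concurrently installed views. This is a hypothetical sketch: `overlapping_members` is an illustrative helper, not a JGroups API, and each view is modeled as a plain set of node names. For the ABC/BCD split it flags both B and C.

```python
from itertools import combinations


def overlapping_members(views):
    """Return the nodes that appear in more than one concurrently installed view.

    `views` maps a view name to the set of node names in that view.
    Toy model for illustration only; not how JGroups represents views.
    """
    overlaps = set()
    # Compare every pair of views and collect the nodes they share.
    for (_, a), (_, b) in combinations(views.items(), 2):
        overlaps |= a & b
    return overlaps


# The split from the comment: ABCD breaks into views ABC and BCD.
print(sorted(overlapping_members({"ABC": {"A", "B", "C"}, "BCD": {"B", "C", "D"}})))  # ['B', 'C']
# A clean split into disjoint views satisfies the invariant.
print(overlapping_members({"AB": {"A", "B"}, "CD": {"C", "D"}}))  # set()
```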


was (Author: rvansa):
As commented on IRC by [~dan.berindei], the problem is deeper. If the cluster ABCD can split into views ABC and CDB, both (active) partitions may modify the same entry.

It seems the nodes need to reach consensus about view membership: C must not be part of two views at the same time. That requires a modification in JGroups, not in Infinispan, and may prove troublesome even from a theoretical perspective.

> Split brain: inconsistent data after merge
> ------------------------------------------
>
>                 Key: ISPN-4949
>                 URL: https://issues.jboss.org/browse/ISPN-4949
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State Transfer
>    Affects Versions: 7.0.0.Final
>            Reporter: Radim Vansa
>            Priority: Critical
>
> 1) Cluster A, B, C, D splits into two partitions:
> A, B (coordinator A) detects this immediately and enters degraded mode with CH [A, B, C, D].
> C, D (coordinator D) first detects that B is lost, installs view A, C, D and starts a rebalance with CH [A, C, D]. Segment X is primary-owned by C (its backup on B was lost).
> 2) D then detects that A was lost as well, and therefore enters degraded mode with CH [A, C, D].
> 3) C inserts an entry into X: all owners (only C) are present, so the modification is allowed.
> 4) The cluster merges, and the coordinator finds that the max stable topology has CH [A, B, C, D] (the older of the two partitions' topologies, obtained from A, B). It logs 'No active or unavailable partitions, so all the partitions must be in degraded mode'. All partitions are indeed in degraded mode, but a write has happened in the meantime.
> 5) The old CH is broadcast in the newest topology, and no rebalance happens.
> 6) Inconsistency: a read in X may miss the update.
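The timeline above can be replayed in a toy model. This is a minimal sketch, not Infinispan code: `X_OWNERS` and `can_write` are hypothetical names, and it assumes that under the old CH [A, B, C, D] segment X is owned by C and B, and that any owner may serve a read. It shows the degraded-mode check letting C's write through, while reinstalling CH [A, B, C, D] without a rebalance leaves B as an owner that never received the write.

```python
# Hypothetical toy model of the ISPN-4949 timeline; not Infinispan code.
# Assumption: under the old CH [A, B, C, D], segment X is owned by C and B;
# under the rebalance CH [A, C, D], X is owned by C alone (backup on B lost).
X_OWNERS = {
    ("A", "B", "C", "D"): ["C", "B"],
    ("A", "C", "D"): ["C"],
}


def can_write(partition_members, ch_members):
    """Degraded-mode rule from the report: allow a write to X only if every
    owner of X under the partition's CH is present in the partition."""
    return all(owner in partition_members for owner in X_OWNERS[ch_members])


# Each node's copy of segment X (None = no value written yet).
data = {node: None for node in "ABCD"}

# Steps 1-2: partition A, B keeps CH [A, B, C, D]; partition C, D ends up
# degraded with CH [A, C, D].
ab = {"members": {"A", "B"}, "ch": ("A", "B", "C", "D")}
cd = {"members": {"C", "D"}, "ch": ("A", "C", "D")}

# Step 3: C's write is allowed (its only owner, C, is present) and lands on C.
assert can_write(cd["members"], cd["ch"])
for owner in X_OWNERS[cd["ch"]]:
    data[owner] = "v1"
# The same write would be rejected in partition A, B, since C is missing.
assert not can_write(ab["members"], ab["ch"])

# Steps 4-5: after the merge the coordinator reinstalls CH [A, B, C, D] and
# starts no rebalance, so B owns X again without ever receiving the value.
merged_ch = ab["ch"]
reads = {owner: data[owner] for owner in X_OWNERS[merged_ch]}
print(reads)  # {'C': 'v1', 'B': None}
```

Step 6 corresponds to a read of X being served by B, which returns nothing even though the write succeeded on C.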




