[infinispan-issues] [JBoss JIRA] (ISPN-4949) Split brain: inconsistent data after merge

Mon Nov 10 05:09:29 EST 2014

    [ https://issues.jboss.org/browse/ISPN-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018556#comment-13018556 ] 

Dan Berindei commented on ISPN-4949:
------------------------------------

I have talked to Bela and he's considering installing the view in two phases. In the first phase, the coordinator would check that each of the view members is available, so it wouldn't be possible for B to install view BCD.

However, we could do the same thing in Infinispan ourselves. Right now, the cache topology is installed with a single asynchronous {{CacheTopologyControlCommand(CH_UPDATE)}}. We could add a prepare phase that checks that all the topology members are available (i.e. responding to the coordinator's messages), and that would also prevent node B from installing cache topology BCD.
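A minimal sketch of the prepare/commit idea follows. It assumes that reachability can be modelled as a simple predicate between two nodes. {{CacheTopology}}, {{install_topology}} and {{same_partition}} are hypothetical names used only for illustration, not Infinispan API. The real CH_UPDATE is also sent asynchronously rather than returned from a function.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class CacheTopology:
    # Hypothetical stand-in for a cache topology: an id plus its member list.
    topology_id: int
    members: Tuple[str, ...]


def install_topology(
    coordinator: str,
    topology: CacheTopology,
    is_reachable: Callable[[str, str], bool],
) -> Optional[CacheTopology]:
    """Install `topology` in two phases; return it on success, None on abort."""
    # Phase 1 (prepare): every other member must respond to the coordinator.
    unreachable = [
        m for m in topology.members
        if m != coordinator and not is_reachable(coordinator, m)
    ]
    if unreachable:
        # Abort: some member cannot confirm, so no member installs the topology.
        return None
    # Phase 2 (commit): stands in for broadcasting the CH_UPDATE to all members.
    return topology


# Usage: the cluster A, B, C, D has split into {A, B} and {C, D}.
PARTITIONS = [{"A", "B"}, {"C", "D"}]


def same_partition(a: str, b: str) -> bool:
    # Two nodes can talk only if they ended up in the same partition.
    return any(a in p and b in p for p in PARTITIONS)


rejected = install_topology("B", CacheTopology(5, ("B", "C", "D")), same_partition)
accepted = install_topology("D", CacheTopology(5, ("C", "D")), same_partition)
print(rejected)  # None
```

B cannot reach C or D, so the prepare phase fails and topology BCD is never committed. Meanwhile, D can still install a topology that contains only the members it can actually reach.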

We could also say that Infinispan shouldn't keep a partition available if it's possible that writes from another partition will also succeed. With {{numOwners == 2}}, if both partitions eliminate one key owner from the CH, they can both update the key. But if we required each partition to hold a majority of a key's owners, and used {{numOwners == 3}}, at most one partition could have 2 of the 3 owners, so it wouldn't be possible to update the key in both partitions. The main problem with that is that we've always pushed {{numOwners = 2}} as the default, and with this change Infinispan would enter degraded mode after a single node crash, because a majority of 2 owners means both of them.
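The majority rule can be checked with a few lines of code. This sketch assumes the check is applied per key (or per segment) against the owner list from the last stable CH. {{has_owner_majority}} and {{at_most_one_side_writable}} are hypothetical helpers, not part of Infinispan's partition handling.

```python
from itertools import combinations
from typing import Sequence, Set

NODES = ("A", "B", "C", "D")


def has_owner_majority(owners: Sequence[str], partition: Set[str]) -> bool:
    """True if the partition holds a strict majority of the key's owners."""
    present = sum(1 for owner in owners if owner in partition)
    return 2 * present > len(owners)


def at_most_one_side_writable(owners: Sequence[str]) -> bool:
    """Check every two-way split of NODES: both sides must never be writable."""
    for size in range(1, len(NODES)):
        for side in combinations(NODES, size):
            left = set(side)
            right = set(NODES) - left
            if has_owner_majority(owners, left) and has_owner_majority(owners, right):
                return False
    return True


# numOwners == 3: a split can never leave both sides with a majority of owners.
print(at_most_one_side_writable(("A", "B", "C")))       # True
print(has_owner_majority(("A", "B", "C"), {"A", "B"}))  # True
print(has_owner_majority(("A", "B", "C"), {"C", "D"}))  # False

# numOwners == 2: losing a single owner (B crashes) already loses the majority,
# so the key would become unavailable (degraded mode).
print(has_owner_majority(("A", "B"), {"A", "C", "D"}))  # False
```

The last line illustrates the availability cost described above. The rule is safe for any {{numOwners}}, but with 2 owners it tolerates no owner loss at all.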

> Split brain: inconsistent data after merge
> ------------------------------------------
>
>                 Key: ISPN-4949
>                 URL: https://issues.jboss.org/browse/ISPN-4949
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State Transfer
>    Affects Versions: 7.0.0.Final
>            Reporter: Radim Vansa
>            Priority: Critical
>
> 1) cluster A, B, C, D splits into 2 parts:
> A, B (coord A) find this out immediately and enter degraded mode with CH [A, B, C, D]
> C, D (coord D) first detect that B is lost, get view A, C, D and start a rebalance with CH [A, C, D]. C is the primary owner of segment X (X had a backup copy on B, but that copy was lost)
> 2) D detects that A was lost as well, and therefore enters degraded mode with CH [A, C, D]
> 3) C inserts an entry into X: all owners (only C) are present, so the modification is allowed
> 4) The cluster merges and the coordinator finds that the max stable topology has CH [A, B, C, D] (the older of the two partitions' topologies, obtained from A, B). It logs 'No active or unavailable partitions, so all the partitions must be in degraded mode' (all partitions are indeed in degraded mode, but a write has happened in the meantime)
> 5) The old CH is broadcast in the newest topology, and no rebalance happens
> 6) Inconsistency: a read in X may miss the update
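The steps above can be reproduced with a tiny in-memory model. It assumes that in the pre-split CH [A, B, C, D], segment X is owned by C (primary) and B (backup), as the "backup on B" remark suggests. The key name and values are hypothetical, and this is not Infinispan code.

```python
# Hypothetical in-memory model of the reported scenario; not Infinispan code.
stores = {node: {} for node in ("A", "B", "C", "D")}

# Assumption: in the pre-split CH [A, B, C, D], segment X is owned by
# C (primary) and B (backup), and key "k" in X holds "v1" on both owners.
OLD_CH_OWNERS_X = ("C", "B")
for owner in OLD_CH_OWNERS_X:
    stores[owner]["k"] = "v1"

# Steps 2-3: partition {C, D} is degraded with CH [A, C, D], where C is the
# only owner of X. All (one) owners are present, so the write is accepted.
DEGRADED_CH_OWNERS_X = ("C",)
for owner in DEGRADED_CH_OWNERS_X:
    stores[owner]["k"] = "v2"

# Steps 4-5: after the merge the old CH is broadcast again and no rebalance
# runs, so nothing copies C's newer value to B.
reads = {owner: stores[owner].get("k") for owner in OLD_CH_OWNERS_X}
print(reads)  # {'C': 'v2', 'B': 'v1'}
```

Under this assumption, whether a read of "k" sees the update depends only on which owner of X serves it (step 6).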

--
This message was sent by Atlassian JIRA
(v6.3.8#6338)