[infinispan-issues] [JBoss JIRA] (ISPN-4949) Split brain: inconsistent data after merge

Radim Vansa (JIRA) issues at jboss.org
Mon Nov 10 07:19:30 EST 2014


    [ https://issues.jboss.org/browse/ISPN-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018592#comment-13018592 ] 

Radim Vansa commented on ISPN-4949:
-----------------------------------

One more note about ease of configuration: It would be great if the FD* + VERIFY_SUSPECT suite had an option to compute and provide guarantees how soon it should report node as non-responsive; any timeouts in upper layers (such as timeout for acking the view as I've outlined above) could be computed automatically.

Though, this it rather a nice-to-have feature; for now we need split-brain working, and we can decide about sensible timeout defaults (and document it).

> Split brain: inconsistent data after merge
> ------------------------------------------
>
>                 Key: ISPN-4949
>                 URL: https://issues.jboss.org/browse/ISPN-4949
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State Transfer
>    Affects Versions: 7.0.0.Final
>            Reporter: Radim Vansa
>            Priority: Critical
>
> 1) cluster A, B, C, D splits into 2 parts:
> A, B (coord A) finds this out immediately and enters degraded mode with CH [A, B, C, D]
> C, D (coord D) first detects that B is lost, gets view A, C, D and starts rebalance with CH [A, C, D]. Segment X is primary owned by C (it had backup on B but this got lost)
> 2) D detects that A was lost as well, therefore enters degraded mode with CH [A, C, D]
> 3) C inserts entry into X: all owners (only C) is present, therefore the modification is allowed
> 4) cluster is merged and coordinator finds out that the max stable topology has CH [A, B, C, D] (it is the older of the two partitions' topologies, got from A, B) - logs 'No active or unavailable partitions, so all the partitions must be in degraded mode' (yes, all partitions are in degraded mode, but write has happened in the meantime)
> 5) The old CH is broadcast in newest topology, no rebalance happens
> 6) Inconsistency: read in X may miss the update



--
This message was sent by Atlassian JIRA
(v6.3.8#6338)


More information about the infinispan-issues mailing list