[JBoss JIRA] (ISPN-5046) PartitionHandling: split during commit can leave the cache inconsistent after merge

Tuesday, 9 December 2014

    [ https://issues.jboss.org/browse/ISPN-5046?page=com.atlassian.jira.plugin.... ]

Bela Ban edited comment on ISPN-5046 at 12/9/14 5:41 AM:
---------------------------------------------------------

What if A and B crashed instead of getting partitioned away?
* How long will C hang on to T while trying to contact B to see whether B committed T?
* What if E joined CD to form CDE?
** At this point, CDE would no longer be degraded and might copy key K from the new
primary owner B to E

Or what happens if there *was* a partition, so we now have AB and CD, but then X joins AB
to form ABX and Y joins CD to form CDY?

NOTE: my comments on new nodes joining a partition are probably moot, as I assume that
your model precludes this and there is always a static set of nodes to work with?

...
 PartitionHandling: split during commit can leave the cache inconsistent after merge
 -----------------------------------------------------------------------------------

                 Key: ISPN-5046
                 URL: https://issues.jboss.org/browse/ISPN-5046
             Project: Infinispan
          Issue Type: Bug
          Components: Core, State Transfer
    Affects Versions: 7.0.2.Final, 7.1.0.Alpha1
            Reporter: Dan Berindei
            Priority: Critical
             Fix For: 7.1.0.Beta1

 Say we have a cluster ABCD and a transaction T started on A, with B as the primary owner
and C as the backup owner. B and C both acknowledge the prepare, and the network splits into
AB and CD right before A sends the commit command. Eventually A suspects C and D, but the
commit still succeeds on B before that happens. Because SuspectExceptions are ignored
for commit commands, the user won't see any error.
 However, C will eventually suspect A and B. When the CD cache topology is installed, C
will roll back transaction T. After the merge, both partitions are in degraded mode, so we
assume that they both have the latest data, and the key is never updated on C.
 From C's point of view, this is very similar to ISPN-3421. The fix should also be
similar: we could delay the transaction rollback on C until we get a confirmation from B
that T was not committed there. Since B is inaccessible, C will eventually get a
SuspectException and the CD cache topology, at which point the cache is in degraded mode
and C can wait for a merge. On merge, C should check the status of the transaction on B
again, and either commit or roll back based on what B did.
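
 As a rough illustration of the proposed decision on C, here is a minimal sketch. It is
not Infinispan code: the names DelayedRollback, PrimaryOutcome and BackupAction are
hypothetical, and B's answer is modelled as an Optional that stays empty while B is
unreachable.

```java
import java.util.Optional;

/** Hypothetical: what the primary owner B reports about transaction T. */
enum PrimaryOutcome { COMMITTED, ROLLED_BACK }

/** Hypothetical: what the backup owner C does with its prepared copy of T. */
enum BackupAction { COMMIT, ROLLBACK, KEEP_PREPARED }

final class DelayedRollback {
    private DelayedRollback() {}

    /**
     * Decide what C does with a prepared transaction.
     *
     * @param primaryOutcome empty while B cannot be reached (degraded mode,
     *                       before the merge); B's answer once it is known
     */
    static BackupAction decide(Optional<PrimaryOutcome> primaryOutcome) {
        if (!primaryOutcome.isPresent()) {
            // B is unreachable: do not roll back yet, keep T prepared and wait.
            return BackupAction.KEEP_PREPARED;
        }
        // B answered (for example after the merge): follow whatever B did.
        return primaryOutcome.get() == PrimaryOutcome.COMMITTED
                ? BackupAction.COMMIT
                : BackupAction.ROLLBACK;
    }
}
```

 In this sketch C would call decide() once when the CD topology is installed (no answer
from B, so T stays prepared) and again after the merge, when B's status is available.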
 We also need to suspend the cleanup of completed transactions while the cache is in
degraded mode; otherwise C might not find T on B after the merge.
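
 The cleanup suspension could look roughly like the sketch below. Again, this is only an
illustration under assumptions: CompletedTxTable, its methods and the retention
parameter are hypothetical names, not the actual Infinispan transaction table.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical table of completed transactions kept on a node such as B. */
final class CompletedTxTable {
    private final Map<String, Boolean> committedByTx = new HashMap<>(); // tx id -> committed?
    private final Map<String, Long> completedAt = new HashMap<>();      // tx id -> completion time
    private final long retentionMillis;
    private boolean degraded;

    CompletedTxTable(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    void markCompleted(String txId, boolean committed, long nowMillis) {
        committedByTx.put(txId, committed);
        completedAt.put(txId, nowMillis);
    }

    void setDegraded(boolean degraded) {
        this.degraded = degraded;
    }

    /** Removes expired entries and returns how many were removed. */
    int cleanup(long nowMillis) {
        if (degraded) {
            // Suspended: another partition may still ask about these transactions.
            return 0;
        }
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = completedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= retentionMillis) {
                committedByTx.remove(entry.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Returns TRUE/FALSE for a known transaction, or null if it was cleaned up. */
    Boolean outcome(String txId) {
        return committedByTx.get(txId);
    }
}
```

 With such a guard, B would still be able to answer C's status query for T after the
merge, even if the normal retention period expired while the partitions were apart.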

--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
