[JBoss JIRA] (ISPN-5021) Nodes that finish the rebalance later can see outdated values

Monday, 7 November 2016

    [ https://issues.jboss.org/browse/ISPN-5021?page=com.atlassian.jira.plugin.... ]

Dan Berindei commented on ISPN-5021:
------------------------------------

This is causing random failures in {{JCacheTwoCachesAnnotationsTest}}. The test has one
per-class node that is reused by all test methods, and 2 per-method nodes that are created
only for the duration of each test method, without waiting for the rebalance to finish.

The scenario is like this:
# Rebalance starts, readCH = \{node1, node2\}, writeCH = \{node1, node2, node3\}
# Rebalance ends, but CH_UPDATE is delayed on node2
# {{cache1.put(k, v)}} updates the value on node1 and node3
# {{cache2.entrySet().iterator()}} sees that all segments are local and returns 0
entries.
# {{cache2.get(k)}} would also return {{null}} at this point, but the test fails before
calling it.
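
The scenario above can be reproduced in a toy model. This is only a sketch: the {{Node}} class and the {{put}}/{{get}} helpers are hypothetical and are not Infinispan classes, and it assumes that the owners of {{k}} are node1 and node2 before the rebalance and node1 and node3 after it. node2 still holds the old view because its CH_UPDATE is delayed, so it answers the read from its own (empty) local data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the scenario; none of these types are Infinispan classes.
final class StaleTopologyDemo {

    static final class Node {
        final String name;
        final Map<String, String> data = new HashMap<>();
        final Set<String> owners; // this node's view of who owns key k

        Node(String name, Set<String> owners) {
            this.name = name;
            this.owners = owners;
        }
    }

    static Set<String> setOf(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    // A write goes to every owner in the writer's view.
    static void put(Node writer, Map<String, Node> cluster, String key, String value) {
        for (String owner : writer.owners) {
            cluster.get(owner).data.put(key, value);
        }
    }

    // A read is served locally when the reader believes it is an owner,
    // otherwise by the first owner in the reader's view.
    static String get(Node reader, Map<String, Node> cluster, String key) {
        if (reader.owners.contains(reader.name)) {
            return reader.data.get(key);
        }
        String firstOwner = reader.owners.iterator().next();
        return cluster.get(firstOwner).data.get(key);
    }

    // Returns {value seen by node2, value seen by node3}.
    static String[] simulate() {
        Set<String> oldOwners = setOf("node1", "node2"); // assumed owners before the rebalance
        Set<String> newOwners = setOf("node1", "node3"); // assumed owners after the rebalance

        Map<String, Node> cluster = new HashMap<>();
        cluster.put("node1", new Node("node1", newOwners));
        cluster.put("node2", new Node("node2", oldOwners)); // CH_UPDATE delayed here
        cluster.put("node3", new Node("node3", newOwners));

        put(cluster.get("node1"), cluster, "k", "v"); // lands on node1 and node3 only
        return new String[] {
            get(cluster.get("node2"), cluster, "k"),
            get(cluster.get("node3"), cluster, "k")
        };
    }

    public static void main(String[] args) {
        String[] seen = simulate();
        // prints "node2 sees null, node3 sees v"
        System.out.println("node2 sees " + seen[0] + ", node3 sees " + seen[1]);
    }
}
```

In this model node2 never receives the write, yet it still considers itself an owner, which is why a locality check on node2 does not detect the problem.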

...
 Nodes that finish the rebalance later can see outdated values
 -------------------------------------------------------------

                 Key: ISPN-5021
                 URL: https://issues.jboss.org/browse/ISPN-5021
             Project: Infinispan
          Issue Type: Bug
          Components: Core, State Transfer
    Affects Versions: 7.0.2.Final
            Reporter: Dan Berindei
            Assignee: Pedro Ruivo
            Priority: Critical
              Labels: testsuite_stability

 Copied from
[ISPN-4444|https://issues.jboss.org/browse/ISPN-4444?focusedCommentId=1302...]
 If the CH_UPDATE command is delayed on the old owner, the new owners might update the key
without the old owner knowing, and a locality check on the old owner won't help.
 One thing that struck me when reading about the Raft algorithm was that they install
configuration changes symmetrically, in 3 phases. We might need to do the same for our
rebalance:
 1. T0: read_ch=old, write_ch=old
 2. start a rebalance
 3. T1: read_ch=old, write_ch=old+new
 4. new owners have all the data
 5. T2: read_ch=new, write_ch=old+new
 6. remove old cache entries and ignore further writes
 7. T3: read_ch=new, write_ch=new 
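
The phases above can be sketched as a small model. This is an illustration under stated assumptions, not Infinispan code: the {{Phase}} enum and the helper methods are hypothetical, and each consistent hash is reduced to the set of owners of a single key. The check shows one way to read the proposal: while two nodes are at most one phase apart, the lagging node's read owners are always a subset of the other node's write owners. Skipping straight from T1 to T3 breaks that property, which matches the failure described in this issue.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the proposed phases; not Infinispan's actual API.
final class RebalancePhases {
    enum Phase { T0, T1, T2, T3 }

    // read_ch is the old CH in T0 and T1, and the new CH in T2 and T3.
    static Set<String> readOwners(Phase p, Set<String> oldCh, Set<String> newCh) {
        return (p == Phase.T0 || p == Phase.T1) ? oldCh : newCh;
    }

    // write_ch is the old CH in T0, old+new in T1 and T2, and the new CH in T3.
    static Set<String> writeOwners(Phase p, Set<String> oldCh, Set<String> newCh) {
        if (p == Phase.T0) {
            return oldCh;
        }
        if (p == Phase.T3) {
            return newCh;
        }
        Set<String> union = new LinkedHashSet<>(oldCh);
        union.addAll(newCh);
        return union;
    }

    // True when every node a reader in phase `reader` may read from
    // is also updated by a writer in phase `writer`.
    static boolean readCovered(Phase reader, Phase writer,
                               Set<String> oldCh, Set<String> newCh) {
        return writeOwners(writer, oldCh, newCh)
                .containsAll(readOwners(reader, oldCh, newCh));
    }

    // Checks every pair of adjacent phases, in both directions.
    static boolean safeWhenOnePhaseApart(Set<String> oldCh, Set<String> newCh) {
        Phase[] phases = Phase.values();
        for (int i = 0; i + 1 < phases.length; i++) {
            if (!readCovered(phases[i], phases[i + 1], oldCh, newCh)
                    || !readCovered(phases[i + 1], phases[i], oldCh, newCh)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> oldCh = new LinkedHashSet<>(Arrays.asList("node1", "node2"));
        Set<String> newCh = new LinkedHashSet<>(Arrays.asList("node1", "node3"));
        System.out.println("adjacent phases safe: " + safeWhenOnePhaseApart(oldCh, newCh));
        System.out.println("reader at T1, writer at T3 safe: "
                + readCovered(Phase.T1, Phase.T3, oldCh, newCh));
    }
}
```

With old = {node1, node2} and new = {node1, node3}, the first check holds and the second does not, because a writer at T3 no longer updates node2 while a reader at T1 may still read from it.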

--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
