[ https://issues.jboss.org/browse/ISPN-5021?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-5021:
------------------------------------
This is causing random failures in {{JCacheTwoCachesAnnotationsTest}}. The test has one
per-class node that is reused by all test methods, and two per-method nodes that are created
during each test method, without waiting for the rebalance to finish.
The scenario is like this:
# Rebalance starts, readCH = \{node1, node2\}, writeCH = \{node1, node2, node3\}
# Rebalance ends, but CH_UPDATE is delayed on node2
# {{cache1.put(k, v)}} updates the value on node1 and node3
# {{cache2.entrySet().iterator()}} sees that all segments are local and returns 0
entries.
# {{cache2.get(k)}} would also return {{null}} at this point, but the test fails before
calling it.
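A toy model of the scenario above may make the stale read easier to see. This is only a sketch: the class and method names are made up for illustration and are not Infinispan API, and it assumes the post-rebalance owners are node1 and node3, which is what the {{put}} in the scenario suggests.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the scenario; none of these names are Infinispan API. */
final class StaleReadScenario {
    // One local data container per node.
    final Map<String, Map<String, String>> stores = new HashMap<>();

    StaleReadScenario() {
        for (String node : List.of("node1", "node2", "node3")) {
            stores.put(node, new HashMap<>());
        }
    }

    /** Writes the entry to every owner in the writer's view of the write CH. */
    void put(List<String> writeOwners, String key, String value) {
        for (String owner : writeOwners) {
            stores.get(owner).put(key, value);
        }
    }

    /** Reads locally if the reader believes it is a read owner, otherwise from the first owner. */
    String get(String reader, List<String> readOwners, String key) {
        String target = readOwners.contains(reader) ? reader : readOwners.get(0);
        return stores.get(target).get(key);
    }

    public static void main(String[] args) {
        StaleReadScenario cluster = new StaleReadScenario();
        // Assumed final CH, already installed on node1 and node3.
        List<String> newCH = List.of("node1", "node3");
        // node2 has not processed CH_UPDATE yet, so it still uses the old read CH.
        List<String> staleReadCH = List.of("node1", "node2");

        cluster.put(newCH, "k", "v");

        // node2 thinks the key is local and misses the write.
        System.out.println(cluster.get("node2", staleReadCH, "k")); // null
        // node1 has the current CH and sees the value.
        System.out.println(cluster.get("node1", newCH, "k"));       // v
    }
}
```

The write only reaches the new owners, while node2 still answers reads from its own container because its outdated read CH says it is an owner.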
Nodes that finish the rebalance later can see outdated values
-------------------------------------------------------------
Key: ISPN-5021
URL: https://issues.jboss.org/browse/ISPN-5021
Project: Infinispan
Issue Type: Bug
Components: Core, State Transfer
Affects Versions: 7.0.2.Final
Reporter: Dan Berindei
Assignee: Pedro Ruivo
Priority: Critical
Labels: testsuite_stability
Copied from [ISPN-4444|https://issues.jboss.org/browse/ISPN-4444?focusedCommentId=1302...]
If the CH_UPDATE command is delayed on the old owner, the new owners might update the key
without the old owner knowing, and a locality check on the old owner won't help.
One thing that struck me when reading about the Raft algorithm was that it installs
configuration changes symmetrically, in 3 phases. We might need to do the same for our
rebalance:
1. T0: read_ch=old, write_ch=old
2. start a rebalance
3. T1: read_ch=old, write_ch=old+new
4. new owners have all the data
5. T2: read_ch=new, write_ch=old+new
6. remove old cache entries and ignore further writes
7. T3: read_ch=new, write_ch=new
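The point of the intermediate steps can be sketched as a small check: a node that is one topology behind should still read only from nodes that receive the writes of a node that is one topology ahead. The code below is a hypothetical model, not Infinispan code; the {{Phase}} enum, the method names, and the owner sets (old = node1, node2; new = node1, node3) are assumptions for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the topologies T0..T3 listed above; not Infinispan API. */
final class RebalancePhases {
    enum Phase { T0, T1, T2, T3 }

    /** Owners consulted for reads: old in T0 and T1, new in T2 and T3. */
    static Set<String> readOwners(Phase p, Set<String> oldOwners, Set<String> newOwners) {
        return (p == Phase.T0 || p == Phase.T1) ? oldOwners : newOwners;
    }

    /** Owners that apply writes: old in T0, new in T3, the union in between. */
    static Set<String> writeOwners(Phase p, Set<String> oldOwners, Set<String> newOwners) {
        switch (p) {
            case T0:
                return oldOwners;
            case T3:
                return newOwners;
            default:
                Set<String> union = new LinkedHashSet<>(oldOwners);
                union.addAll(newOwners);
                return union;
        }
    }

    /** True if every node the reader may read from also receives the writer's updates. */
    static boolean safe(Phase reader, Phase writer, Set<String> oldOwners, Set<String> newOwners) {
        return writeOwners(writer, oldOwners, newOwners)
                .containsAll(readOwners(reader, oldOwners, newOwners));
    }

    public static void main(String[] args) {
        Set<String> oldOwners = Set.of("node1", "node2");
        Set<String> newOwners = Set.of("node1", "node3");
        Phase[] phases = Phase.values();

        // Nodes at most one topology apart, in either direction.
        for (int i = 0; i + 1 < phases.length; i++) {
            Phase a = phases[i];
            Phase b = phases[i + 1];
            System.out.println(a + " reader / " + b + " writer: " + safe(a, b, oldOwners, newOwners));
            System.out.println(b + " reader / " + a + " writer: " + safe(b, a, oldOwners, newOwners));
        }

        // Skipping T2, as in the failing scenario: the reader is still in T1, the writer already in T3.
        System.out.println("T1 reader / T3 writer: " + safe(Phase.T1, Phase.T3, oldOwners, newOwners));
    }
}
```

Under these assumptions every pair of adjacent topologies passes the check, while the direct jump from T1 to T3 fails because node2 is still a read owner for the lagging node but no longer receives writes.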