Dan Berindei commented on ISPN-8480:
------------------------------------
The subject of the bug is too generic; all these scenarios seem to be centered on the
cluster splitting into 2 partitions and merging back.
In Infinispan < 9.1, there was no way to combine entries from the 2 partitions after the
merge: the cache would always select one partition as the "correct" partition
during the merge, and updates made in the other partition would be lost. The only way to
avoid losing updates is to enable partition handling with {{<partition-handling
enabled="true"/>}}, which throws an {{AvailabilityException}} while the
cluster is split if the write could be lost during the merge.
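As a minimal sketch (the cache name "entity" and mode are illustrative, following the Infinispan 8.x schema):
{code:xml}
<replicated-cache name="entity" mode="SYNC">
    <!-- Reject writes with an AvailabilityException while the cluster
         is split, instead of silently losing updates on merge -->
    <partition-handling enabled="true"/>
</replicated-cache>
{code}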
In Infinispan 9.1 we introduced merge policies
that allow combining the entries from the 2 partitions after the merge. This is still a
work in progress, and version 9.3 will include several fixes and improvements.
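For example (a minimal sketch assuming the 9.1+ schema; {{PREFERRED_NON_NULL}} is just one of the built-in policies, alongside {{PREFERRED_ALWAYS}} and {{REMOVE_ALL}}, and a custom {{EntryMergePolicy}} implementation can be plugged in instead):
{code:xml}
<replicated-cache name="entity" mode="SYNC">
    <!-- Keep the cache available during the split and resolve
         conflicting entries with the chosen policy on merge -->
    <partition-handling when-split="ALLOW_READ_WRITES"
                        merge-policy="PREFERRED_NON_NULL"/>
</replicated-cache>
{code}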
Data Inconsistency in case of Topology change in Infinispan Cluster
-------------------------------------------------------------------
Key: ISPN-8480
URL: https://issues.jboss.org/browse/ISPN-8480
Project: Infinispan
Issue Type: Bug
Affects Versions: 8.2.5.Final
Reporter: Rohit Singh
Priority: Blocker
Labels: Infinispan, JGroups, hibernate_2nd_level_cache
{color:red}*Data Inconsistency in case of Topology change in Infinispan Cluster*{color}
*Infinispan Version : 8.2.5*
*Hibernate Version : 5.2.8*
*JGROUPS Version : 3.6.7*
*Clustering Mode : Replication*
We have tested the same scenarios with invalidation mode too.
Refer to the cache configuration below for the Hibernate L2 entity types:
{code:xml}
<replicated-cache-configuration name="entity" mode="SYNC"
                                remote-timeout="20000" statistics="false"
                                statistics-available="false">
    <state-transfer enabled="false" timeout="20000000"/>
    <locking isolation="READ_COMMITTED" concurrency-level="1000"
             acquire-timeout="15000" striping="false"/>
    <transaction mode="NONE" auto-commit="false" locking="OPTIMISTIC"/>
    <eviction size="-1" strategy="NONE"/>
    <expiration max-idle="-1" interval="5000" lifespan="-1"/>
</replicated-cache-configuration>
{code}
When a disconnected node rejoins the cluster, data remains inconsistent on the
reconnected node.
*This is happening for both the Hibernate L2 cache and a custom cache (AdvancedCache).*
The four scenarios below illustrate the issue.
*For Example:*
*Scenario (Issue) 1:*
- Initially the cluster comprises 4 nodes, namely {A,B,C,D}.
- Somehow, node D gets removed from the cluster view.
- Then some updates/inserts in the Hibernate L2 cache are done on Node B.
- These updates/inserts get propagated to all the nodes in the current cluster view, i.e. {A,B,C}.
- These updates/inserts don't get propagated to Node D.
- Now D has a stale state of the L2 cache.
- We expect Node D to get the updated state of the L2 cache from {A,B,C}.
*Scenario (Issue) 2:*
- Initially the cluster comprises 4 nodes, namely {A,B,C,D}.
- Somehow, node D gets removed from the cluster view.
- Then some updates/inserts in the Hibernate L2 cache are done on Node B.
- These updates/inserts get propagated to all the nodes in the current cluster view, i.e. {A,B,C}.
- These updates/inserts don't get propagated to Node D.
- Now D has a stale state of the L2 cache.
- Now D rejoins the cluster {A,B,C}.
- Now the updated cluster view is {A,B,C,D}.
- D still has a stale state of the L2 cache.
- We expect Node D to get the updated state of the L2 cache from {A,B,C}.
*Scenario (Issue) 3:*
- Initially the cluster comprises 4 nodes, namely {A,B,C,D}.
- Somehow, node D gets removed from the cluster view.
- Then some updates/inserts in the Hibernate L2 cache are done on Node B.
- These updates/inserts get propagated to all the nodes in the current cluster view, i.e. {A,B,C}.
- These updates/inserts don't get propagated to Node D.
- Subsequently, some updates/inserts in the Hibernate L2 cache are done on Node D too. These updates are done on other keys, not on the keys updated by Node B.
- Now {A,B,C} and Node D have different but updated states of the L2 cache and are not in sync.
- Now D rejoins the cluster {A,B,C}.
- Now the updated cluster view is {A,B,C,D}.
- {A,B,C} and Node D still have different but updated states of the L2 cache and are not in sync.
- We expect the updates from {A,B,C} and Node D to get merged across all the nodes in the cluster.
*Scenario (Issue) 4:*
- Initially the cluster comprises 4 nodes, namely {A,B,C,D}.
- Somehow, node D gets removed from the cluster view.
- Then some updates/inserts in the Hibernate L2 cache are done on Node B.
- These updates/inserts get propagated to all the nodes in the current cluster view, i.e. {A,B,C}.
- These updates/inserts don't get propagated to Node D.
- Subsequently, some updates/inserts in the Hibernate L2 cache are done on Node D too. These updates might be on the same keys that were updated by Node B.
- Now D has a more up-to-date state of the L2 cache.
- {A,B,C} have a stale state of the L2 cache.
- Now D rejoins the cluster {A,B,C}.
- Now the updated cluster view is {A,B,C,D}.
- {A,B,C} still have a stale state of the L2 cache.
- We expect {A,B,C} to get the updated state of the L2 cache from Node D.