[ https://issues.jboss.org/browse/ISPN-1965?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-1965:
-----------------------------------------------
Dan Berindei <dberinde(a)redhat.com> made a comment on [bug
808623|https://bugzilla.redhat.com/show_bug.cgi?id=808623]
Misha, after reading it again I think it could be a little clearer. So here's another
attempt:
In rare circumstances, when a node leaves the cluster, instead of moving directly to a new
cluster view that contains everyone but the leaver, the cluster splits into two partitions,
which then merge after a short time. While the cluster is split, at least some nodes will
not have access to all the data that previously existed in the cache. After the merge, all
the nodes will again have access to all the data, but changes made during the split may be
lost or visible only to part of the cluster.
Normally, when the view changes because of a join or a leave, the cache data is rebalanced
across the new cluster members. However, if numOwners or more nodes leave in quick
succession, keys whose owners have all left will be lost. The same thing happens during a
network split: regardless of how the partitions form, at least one partition will not hold
all the data (assuming the cluster size is greater than numOwners).
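For reference, here is a minimal sketch of the kind of distributed-cache configuration this
describes, assuming the Infinispan 5.1 programmatic configuration API (the cache name and
class name are made up for illustration). With numOwners=2, a key stays available only while
at least one of its two owners is still in the view.
{code:java}
import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class DistCacheSketch {
    public static void main(String[] args) {
        // Clustered cache manager using the default JGroups transport.
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());

        // Distributed synchronous cache with 2 owners per key: a key survives
        // a view change only if fewer than numOwners of its owners have left.
        Configuration cfg = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                .hash().numOwners(2)
                .build();
        manager.defineConfiguration("testCache", cfg);

        Cache<String, String> cache = manager.getCache("testCache");
        cache.put("key", "value");

        manager.stop();
    }
}
{code}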
While there are multiple partitions, each one can make changes to the data independently,
so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid
does not attempt to resolve these inconsistencies, so different nodes may hold different
values even after the merge.
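To make that last point concrete, one could compare the purely local copy of a key on two
nodes after the merge. The helper below is a hypothetical sketch (the method name and the
two embedded Cache references are assumptions); it uses the CACHE_MODE_LOCAL flag to skip
remote lookups so each node answers only from its own copy.
{code:java}
import org.infinispan.Cache;
import org.infinispan.context.Flag;

// Hypothetical helper: reads the local copy of a key on two different nodes.
// After a merge the two values may differ, because no conflict resolution is
// performed when the partitions come back together.
public static boolean sameValueOnBothNodes(Cache<String, String> cacheOnNodeA,
                                           Cache<String, String> cacheOnNodeB,
                                           String key) {
    String a = cacheOnNodeA.getAdvancedCache().withFlags(Flag.CACHE_MODE_LOCAL).get(key);
    String b = cacheOnNodeB.getAdvancedCache().withFlags(Flag.CACHE_MODE_LOCAL).get(key);
    return a == null ? b == null : a.equals(b);
}
{code}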
Some entries not available during view change
---------------------------------------------
Key: ISPN-1965
URL: https://issues.jboss.org/browse/ISPN-1965
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.1.3.FINAL
Reporter: Michal Linhard
Assignee: Dan Berindei
In the 4-node, dist-mode, num-owners=2 elasticity test
http://www.qa.jboss.com/~mlinhard/hyperion/run44-elas-dist/
there is an approximately 90-second period during which clients get null responses to GET
requests on entries that should exist in the cache.
first occurrence:
hyperion1139.log 05:31:01,202 286.409
last occurrence:
hyperion1135.log 05:32:45,441 390.648
total occurrence count (across all 19 driver nodes): 152241
(this doesn't mean it happens for 152K distinct keys, because each key is retried after an
erroneous attempt)
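As an illustration of that retry behaviour, a hypothetical driver-side loop might look like
the sketch below (the method name, HotRod RemoteCache usage, back-off and timeout values are
assumptions, not the actual driver code); it shows why a single temporarily missing key can
account for many logged occurrences:
{code:java}
import org.infinispan.client.hotrod.RemoteCache;

// Hypothetical sketch of a retry-on-null GET: a key that is temporarily
// unavailable is re-read until it reappears or a deadline passes, so the
// same key can be counted many times in the totals above.
public static String getWithRetry(RemoteCache<String, String> cache,
                                  String key, long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
        String value = cache.get(key);
        if (value != null || System.currentTimeMillis() >= deadline) {
            return value; // null here means the key was still missing at the deadline
        }
        Thread.sleep(100); // assumed back-off between retries
    }
}
{code}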
data doesn't seem to be lost, because these errors cease after a while and the number of
entries returns to normal (see cache_entries.csv)
this happens approximately in the period between node0001 being killed and the cluster
{node0002 - node0004} being formed (and shortly after).