[ https://issues.jboss.org/browse/ISPN-1965?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-1965:
-----------------------------------------------
Dan Berindei <dberinde(a)redhat.com> made a comment on [bug
808623|https://bugzilla.redhat.com/show_bug.cgi?id=808623]
Misha, after reading it again I think it could be a little clearer. So here's another
attempt:
In rare circumstances, when a node leaves the cluster, instead of moving directly to a new
cluster view that contains everyone but the leaver, the cluster splits into two partitions,
which then merge after a short time. While the cluster is split, at least some nodes will
not have access to all the data that previously existed in the cache. After the merge, all
the nodes will again have access to all the data, but changes made during the split may be
lost or visible only to part of the cluster.
Normally, when the view changes because of a join or a leave, the cache data is rebalanced
across the new cluster members. However, if numOwners or more nodes leave in quick
succession, keys whose owners have all left will be lost. The same thing happens during a
network split: regardless of how the partitions form, at least one partition will not hold
all the data (assuming the cluster size is greater than numOwners).
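For reference, here is a minimal sketch of the kind of distributed-cache configuration this
describes, assuming the Infinispan 5.1 programmatic configuration API (the cache name and
class name are made up for illustration). With numOwners=2, a key stays available only while
at least one of its two owners is still in the view.
{code:java}
import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class DistCacheSketch {
    public static void main(String[] args) {
        // Clustered cache manager using the default JGroups transport.
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());

        // Distributed synchronous cache with 2 owners per key: a key survives
        // a view change only if fewer than numOwners of its owners have left.
        Configuration cfg = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                .hash().numOwners(2)
                .build();
        manager.defineConfiguration("testCache", cfg);

        Cache<String, String> cache = manager.getCache("testCache");
        cache.put("key", "value");

        manager.stop();
    }
}
{code}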
While there are multiple partitions, each one can make changes to the data independently,
so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid
does not attempt to resolve these inconsistencies, so different nodes may hold different
values even after the merge.
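To make that last point concrete, one could compare the purely local copy of a key on two
nodes after the merge. The helper below is a hypothetical sketch (the method name and the
two embedded Cache references are assumptions); it uses the CACHE_MODE_LOCAL flag to skip
remote lookups so each node answers only from its own copy.
{code:java}
import org.infinispan.Cache;
import org.infinispan.context.Flag;

// Hypothetical helper: reads the local copy of a key on two different nodes.
// After a merge the two values may differ, because no conflict resolution is
// performed when the partitions come back together.
public static boolean sameValueOnBothNodes(Cache<String, String> cacheOnNodeA,
                                           Cache<String, String> cacheOnNodeB,
                                           String key) {
    String a = cacheOnNodeA.getAdvancedCache().withFlags(Flag.CACHE_MODE_LOCAL).get(key);
    String b = cacheOnNodeB.getAdvancedCache().withFlags(Flag.CACHE_MODE_LOCAL).get(key);
    return a == null ? b == null : a.equals(b);
}
{code}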
Some entries not available during view change
---------------------------------------------
Key: ISPN-1965
URL: https://issues.jboss.org/browse/ISPN-1965
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.1.3.FINAL
Reporter: Michal Linhard
Assignee: Dan Berindei
In the 4-node, dist-mode, num-owners=2 elasticity test
http://www.qa.jboss.com/~mlinhard/hyperion/run44-elas-dist/
there is an approximately 90-second period during which clients get null responses to GET
requests on entries that should exist in the cache.
first occurrence:
hyperion1139.log 05:31:01,202 286.409
last occurrence:
hyperion1135.log 05:32:45,441 390.648
total occurrence count (across all 19 driver nodes): 152241
(this doesn't mean it happens for 152K distinct keys, because each key is retried after an
erroneous attempt)
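As an illustration of that retry behaviour, a hypothetical driver-side loop might look like
the sketch below (the method name, HotRod RemoteCache usage, back-off and timeout values are
assumptions, not the actual driver code); it shows why a single temporarily missing key can
account for many logged occurrences:
{code:java}
import org.infinispan.client.hotrod.RemoteCache;

// Hypothetical sketch of a retry-on-null GET: a key that is temporarily
// unavailable is re-read until it reappears or a deadline passes, so the
// same key can be counted many times in the totals above.
public static String getWithRetry(RemoteCache<String, String> cache,
                                  String key, long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
        String value = cache.get(key);
        if (value != null || System.currentTimeMillis() >= deadline) {
            return value; // null here means the key was still missing at the deadline
        }
        Thread.sleep(100); // assumed back-off between retries
    }
}
{code}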
data doesn't seem to be lost, because these errors cease after a while and the number of
entries returns to normal (see cache_entries.csv)
this happens approximately in the period between node0001 being killed and the cluster
{node0002 - node0004} being formed (and shortly after).