[infinispan-issues] [JBoss JIRA] (ISPN-4444) After state transfer, a node is able to read keys it no longer owns from its data container
Dan Berindei (JIRA)
issues at jboss.org
Tue Nov 25 05:54:39 EST 2014
[ https://issues.jboss.org/browse/ISPN-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022645#comment-13022645 ]
Dan Berindei commented on ISPN-4444:
------------------------------------
[~pruivo] I think I have missed something in the bug description. If the CH_UPDATE command is delayed on the old owner, the new owners might update the key without the old owner knowing, and a locality check on the old owner won't help.
One thing that struck me when reading about the Raft algorithm is that it installs configuration changes symmetrically, in 3 phases. We might need to do the same for our rebalance: start the rebalance with {{read_ch=old, write_ch=old+new}}; once the new owners have all the data, install {{read_ch=new, write_ch=old+new}}; and finally install {{read_ch=new, write_ch=new}}. For this to work, old cache entries would have to be removed during the 2nd topology update, and any further writes would have to be ignored.
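As a rough illustration of the three phases, here is a minimal sketch. It is not Infinispan code: the class, enum and method names are hypothetical, nodes are plain strings, and {{keepsEntry}} encodes only one reading of "old entries are removed during the 2nd topology update".

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the proposed 3-phase rebalance; not Infinispan API.
class ThreePhaseRebalance {

    // The three topology updates, in the order they would be installed.
    enum Phase {
        TRANSFER,   // read_ch=old, write_ch=old+new
        READ_NEW,   // read_ch=new, write_ch=old+new
        STABLE      // read_ch=new, write_ch=new
    }

    // Owners that serve reads for a key in the given phase.
    static Set<String> readOwners(Phase phase, Set<String> oldOwners, Set<String> newOwners) {
        return phase == Phase.TRANSFER ? oldOwners : newOwners;
    }

    // Owners that receive writes for a key in the given phase.
    static Set<String> writeOwners(Phase phase, Set<String> oldOwners, Set<String> newOwners) {
        if (phase == Phase.STABLE) {
            return newOwners;
        }
        Set<String> union = new LinkedHashSet<>(oldOwners);
        union.addAll(newOwners);
        return union;
    }

    // Assumption: from the 2nd topology update on, a node that is not a new
    // owner drops its entry and ignores further writes for that key.
    static boolean keepsEntry(String node, Phase phase, Set<String> oldOwners, Set<String> newOwners) {
        if (phase == Phase.TRANSFER) {
            return writeOwners(phase, oldOwners, newOwners).contains(node);
        }
        return newOwners.contains(node);
    }

    public static void main(String[] args) {
        // Key moves from node A (old owner) to node B (new owner).
        Set<String> oldOwners = Collections.singleton("A");
        Set<String> newOwners = Collections.singleton("B");
        for (Phase phase : Phase.values()) {
            System.out.println(phase
                    + ": read=" + readOwners(phase, oldOwners, newOwners)
                    + " write=" + writeOwners(phase, oldOwners, newOwners)
                    + " A keeps entry=" + keepsEntry("A", phase, oldOwners, newOwners));
        }
    }
}
```

The point of the sketch is that the old owner stops holding the entry at the same topology update in which it stops serving reads, so there is no phase in which it can read a value the new owners have already changed.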
> After state transfer, a node is able to read keys it no longer owns from its data container
> -------------------------------------------------------------------------------------------
>
> Key: ISPN-4444
> URL: https://issues.jboss.org/browse/ISPN-4444
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> When state transfer ends and each node receives a CH_UPDATE command from the coordinator, it first installs the new topology and then it starts invalidating entries it no longer owns.
> However, there are two cases in which the node can still read its stale values:
> 1. If L1 is enabled, it will look in the local DataContainer first, regardless of the key's location.
> 2. If L1 is disabled, but the key was removed on the new owners, the node will still look up the key in the local DataContainer after receiving a null response.
> The problem can be reproduced with {{TxReadAfterLosingOwnershipTest}} and its subclasses, by replacing the {{operation.update(cache(1));}} line with {{operation.update(cache(0));}}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)