[infinispan-issues] [JBoss JIRA] (ISPN-4444) After state transfer, a node is able to read keys it no longer owns from its data container

Dan Berindei (JIRA) issues at jboss.org
Tue Nov 25 05:54:39 EST 2014


    [ https://issues.jboss.org/browse/ISPN-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022645#comment-13022645 ] 

Dan Berindei commented on ISPN-4444:
------------------------------------

[~pruivo] I think I have missed something in the bug description. If the CH_UPDATE command is delayed on the old owner, the new owners might update the key without the old owner knowing, and a locality check on the old owner won't help.

One thing that struck me when reading about the Raft algorithm was that it installs configuration changes symmetrically, in 3 phases. We might need to do the same for our rebalance: start the rebalance with {{read_ch=old, write_ch=old+new}}; once the new owners have all the data, install {{read_ch=new, write_ch=old+new}}; and finally install {{read_ch=new, write_ch=new}}. For this to work, old cache entries have to be removed during the 2nd topology update, and further writes have to be ignored.
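
To make the proposal concrete, here is a rough sketch of the three topologies and of one property that seems to make them attractive. It is not Infinispan code: {{ThreePhaseRebalance}}, {{Topology}}, {{threePhases}} and {{safeWhenOnePhaseApart}} are hypothetical names, owners are plain strings, and the check assumes nodes are never more than one topology update apart. Under that assumption, every node that a lagging reader consults is still a write target for the nodes that are one step ahead, and vice versa.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ThreePhaseRebalance {
    /** Hypothetical topology: the owners that serve reads and the owners that receive writes. */
    static final class Topology {
        final Set<String> readOwners;
        final Set<String> writeOwners;

        Topology(Set<String> readOwners, Set<String> writeOwners) {
            this.readOwners = readOwners;
            this.writeOwners = writeOwners;
        }

        @Override
        public String toString() {
            return "read_ch=" + readOwners + ", write_ch=" + writeOwners;
        }
    }

    /** The three topologies from the comment, in installation order. */
    static List<Topology> threePhases(Set<String> oldOwners, Set<String> newOwners) {
        Set<String> both = new LinkedHashSet<>(oldOwners);
        both.addAll(newOwners);
        List<Topology> phases = new ArrayList<>();
        phases.add(new Topology(oldOwners, both));      // 1. rebalance starts
        phases.add(new Topology(newOwners, both));      // 2. new owners have all the data
        phases.add(new Topology(newOwners, newOwners)); // 3. old owners drop out
        return phases;
    }

    /**
     * Returns true if, for every pair of adjacent topologies, the read owners of
     * one are all write owners of the other. Assumption: nodes are at most one
     * topology update apart.
     */
    static boolean safeWhenOnePhaseApart(List<Topology> phases) {
        for (int i = 0; i + 1 < phases.size(); i++) {
            Topology a = phases.get(i);
            Topology b = phases.get(i + 1);
            if (!b.writeOwners.containsAll(a.readOwners)) return false;
            if (!a.writeOwners.containsAll(b.readOwners)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> oldOwners = new LinkedHashSet<>(List.of("A", "B"));
        Set<String> newOwners = new LinkedHashSet<>(List.of("B", "C"));

        List<Topology> phases = threePhases(oldOwners, newOwners);
        phases.forEach(System.out::println);
        System.out.println("three phases safe: " + safeWhenOnePhaseApart(phases));

        // For comparison: skip the middle topology and go straight to read_ch=new, write_ch=new.
        List<Topology> skipMiddle = List.of(phases.get(0), phases.get(2));
        System.out.println("without middle phase safe: " + safeWhenOnePhaseApart(skipMiddle));
    }
}
```

Skipping the middle topology fails the check because a node still reading from the old owners can coexist with a node that already writes only to the new owners, which looks like the delayed CH_UPDATE scenario described above.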

> After state transfer, a node is able to read keys it no longer owns from its data container
> -------------------------------------------------------------------------------------------
>
>                 Key: ISPN-4444
>                 URL: https://issues.jboss.org/browse/ISPN-4444
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core, State Transfer
>    Affects Versions: 7.0.0.Alpha4
>            Reporter: Dan Berindei
>            Assignee: Pedro Ruivo
>            Priority: Critical
>             Fix For: 7.1.0.Alpha1
>
>
> When state transfer ends and each node receives a CH_UPDATE command from the coordinator, it first installs the new topology and then it starts invalidating entries it no longer owns.
> However, there are two cases in which the node can still read its stale values:
> 1. If L1 is enabled, it will look in the local DataContainer first, regardless of the key's location.
> 2. If L1 is disabled, but the key was removed on the new owners, the node will still look up the key in the local DataContainer after receiving a null response.
> The problem can be reproduced with {{TxReadAfterLosingOwnershipTest}} and its subclasses, by replacing the {{operation.update(cache(1));}} line with {{operation.update(cache(0));}}


