[JBoss JIRA] (ISPN-4908) Clustered cache with unshared store is inconsistent after restarting one node if entries are deleted during restart
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4908?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant commented on ISPN-4908:
---------------------------------------
To keep track of data deleted while we're away, we'd need tombstones.
> Clustered cache with unshared store is inconsistent after restarting one node if entries are deleted during restart
> -------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4908
> URL: https://issues.jboss.org/browse/ISPN-4908
> Project: Infinispan
> Issue Type: Bug
> Environment: Clustered REPL cache, preloaded, no eviction/expiration
> Reporter: Wolf-Dieter Fink
> Assignee: William Burns
>
> If a cache instance with an unshared cache store is down and the cache is changed before the instance comes back and rejoins the cluster, the cache can become inconsistent.
> If entries are deleted during the downtime:
> - the store with stale objects is loaded first if preload=true
> - the local entries are updated with new and changed objects from the cluster
> - entries removed from the cluster are not seen and therefore not deleted
> After the sync completes, only this instance will have stale objects.
> From a consistency and performance perspective, the store should be pruned on cluster join by default in this case.
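
The join sequence described above, and the two remedies mentioned in this thread (tombstones, or pruning the store on join), can be illustrated with a minimal sketch. It models stores as plain maps and is not Infinispan API: the class RejoinSketch, its method names and the sample keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Infinispan API: three ways a restarted node could
// combine its preloaded local store with the state received from the cluster.
final class RejoinSketch {

    // Behaviour described in the report: cluster entries overwrite local ones,
    // but keys deleted during the downtime are never removed locally.
    static Map<String, String> mergeWithoutTombstones(Map<String, String> preloaded,
                                                      Map<String, String> clusterState) {
        Map<String, String> result = new HashMap<>(preloaded);
        result.putAll(clusterState);
        return result;
    }

    // Tombstone variant: first drop preloaded keys that were deleted while the
    // node was away, then apply the cluster state so re-created keys survive.
    static Map<String, String> mergeWithTombstones(Map<String, String> preloaded,
                                                   Map<String, String> clusterState,
                                                   Set<String> tombstones) {
        Map<String, String> result = new HashMap<>(preloaded);
        result.keySet().removeAll(tombstones);
        result.putAll(clusterState);
        return result;
    }

    // Prune-on-join variant: ignore the local store and keep only cluster state.
    static Map<String, String> pruneOnJoin(Map<String, String> clusterState) {
        return new HashMap<>(clusterState);
    }

    public static void main(String[] args) {
        Map<String, String> preloaded = new HashMap<>();
        preloaded.put("a", "1");
        preloaded.put("b", "2"); // deleted cluster-wide during the downtime

        Map<String, String> clusterState = new HashMap<>();
        clusterState.put("a", "1-updated");
        clusterState.put("c", "3");

        Set<String> tombstones = new HashSet<>();
        tombstones.add("b");

        System.out.println("no tombstones, b present: "
                + mergeWithoutTombstones(preloaded, clusterState).containsKey("b"));
        System.out.println("tombstones, b present:    "
                + mergeWithTombstones(preloaded, clusterState, tombstones).containsKey("b"));
        System.out.println("prune on join, b present: "
                + pruneOnJoin(clusterState).containsKey("b"));
    }
}
```

Pruning needs no extra bookkeeping but discards the local store entirely, while tombstones keep the store useful at the cost of tracking deletions somewhere in the cluster.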
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (ISPN-4908) Clustered cache with unshared store is inconsistent after restarting one node if entries are deleted during restart
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4908?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4908:
----------------------------------
Summary: Clustered cache with unshared store is inconsistent after restarting one node if entries are deleted during restart (was: Clustered cache with FileStore (shared=false) is inconsistent after restarting one node if entries are deleted during restart)
> Clustered cache with unshared store is inconsistent after restarting one node if entries are deleted during restart
> -------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4908
> URL: https://issues.jboss.org/browse/ISPN-4908
> Project: Infinispan
> Issue Type: Bug
> Environment: Clustered REPL cache, preloaded, no eviction/expiration
> Reporter: Wolf-Dieter Fink
> Assignee: William Burns
>
> If a cache instance with a cache store is down and the cache is changed before the instance comes back and rejoins the cluster, the cache can become inconsistent.
> If entries are deleted during the downtime:
> - the FileStore with stale objects is loaded first if preload=true
> - the local entries are updated with new and changed objects from the cluster
> - entries removed from the cluster are not seen and therefore not deleted
> After the sync completes, only this instance will have stale objects.
> From a consistency and performance perspective, the FileStore should be pruned on cluster join by default in this case.
[JBoss JIRA] (ISPN-4908) Clustered cache with unshared store is inconsistent after restarting one node if entries are deleted during restart
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4908?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4908:
----------------------------------
Description:
If a cache instance with an unshared cache store is down and the cache is changed before the instance comes back and rejoins the cluster, the cache can become inconsistent.
If entries are deleted during the downtime:
- the store with stale objects is loaded first if preload=true
- the local entries are updated with new and changed objects from the cluster
- entries removed from the cluster are not seen and therefore not deleted
After the sync completes, only this instance will have stale objects.
From a consistency and performance perspective, the store should be pruned on cluster join by default in this case.
was:
If a cache instance with a cache store is down and the cache is changed before the instance comes back and rejoins the cluster, the cache can become inconsistent.
If entries are deleted during the downtime:
- the FileStore with stale objects is loaded first if preload=true
- the local entries are updated with new and changed objects from the cluster
- entries removed from the cluster are not seen and therefore not deleted
After the sync completes, only this instance will have stale objects.
From a consistency and performance perspective, the FileStore should be pruned on cluster join by default in this case.
> Clustered cache with unshared store is inconsistent after restarting one node if entries are deleted during restart
> -------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4908
> URL: https://issues.jboss.org/browse/ISPN-4908
> Project: Infinispan
> Issue Type: Bug
> Environment: Clustered REPL cache, preloaded, no eviction/expiration
> Reporter: Wolf-Dieter Fink
> Assignee: William Burns
>
> If a cache instance with an unshared cache store is down and the cache is changed before the instance comes back and rejoins the cluster, the cache can become inconsistent.
> If entries are deleted during the downtime:
> - the store with stale objects is loaded first if preload=true
> - the local entries are updated with new and changed objects from the cluster
> - entries removed from the cluster are not seen and therefore not deleted
> After the sync completes, only this instance will have stale objects.
> From a consistency and performance perspective, the store should be pruned on cluster join by default in this case.
[JBoss JIRA] (ISPN-4444) After state transfer, a node is able to read keys it no longer owns from its data container
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4444?page=com.atlassian.jira.plugin.... ]
Dan Berindei reopened ISPN-4444:
--------------------------------
> After state transfer, a node is able to read keys it no longer owns from its data container
> -------------------------------------------------------------------------------------------
>
> Key: ISPN-4444
> URL: https://issues.jboss.org/browse/ISPN-4444
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> When state transfer ends and each node receives a CH_UPDATE command from the coordinator, it first installs the new topology and then it starts invalidating entries it no longer owns.
> However, there are two cases when the node can still read its stale values:
> 1. If L1 is enabled, it will look in the local DataContainer first, regardless of the key's location.
> 2. If L1 is disabled, but the key was removed on the new owners, the node will still look up the key in the local DataContainer after receiving a null response.
> The problem can be reproduced with {{TxReadAfterLosingOwnershipTest}} and its subclasses, by replacing the {{operation.update(cache(1));}} line with {{operation.update(cache(0));}}
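
The two stale-read cases above can be modelled with a small sketch that uses plain maps instead of Infinispan's DataContainer and consistent hash. The class OwnershipReadSketch, its fields and its methods are hypothetical, and the locality check shown is only one possible guard, not necessarily the fix that was applied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a node that has lost ownership of a key but has not
// yet invalidated the entry in its local container.
final class OwnershipReadSketch {
    final Map<String, String> dataContainer = new HashMap<>(); // local entries
    final Set<String> ownedKeys = new HashSet<>();             // stand-in for the installed topology
    final Map<String, String> remoteOwners = new HashMap<>();  // stand-in for a remote get

    // Case 1 (L1 enabled): the local container is consulted first,
    // regardless of who owns the key.
    String readLocalFirst(String key) {
        String local = dataContainer.get(key);
        return local != null ? local : remoteOwners.get(key);
    }

    // Case 2 (L1 disabled): remote read first, but a null response
    // falls back to the local container.
    String readRemoteThenLocal(String key) {
        String remote = remoteOwners.get(key);
        return remote != null ? remote : dataContainer.get(key);
    }

    // Guarded read: the local container is trusted only for owned keys.
    String readWithLocalityCheck(String key) {
        return ownedKeys.contains(key) ? dataContainer.get(key) : remoteOwners.get(key);
    }

    public static void main(String[] args) {
        OwnershipReadSketch node = new OwnershipReadSketch();
        node.dataContainer.put("k", "stale"); // not yet invalidated
        // The node no longer owns "k", and the new owners have removed it,
        // so ownedKeys and remoteOwners both stay empty.

        System.out.println("local first:       " + node.readLocalFirst("k"));
        System.out.println("remote then local: " + node.readRemoteThenLocal("k"));
        System.out.println("locality check:    " + node.readWithLocalityCheck("k"));
    }
}
```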
[JBoss JIRA] (ISPN-4444) After state transfer, a node is able to read keys it no longer owns from its data container
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4444?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4444:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
I created ISPN-5021 for my latest comment.
> After state transfer, a node is able to read keys it no longer owns from its data container
> -------------------------------------------------------------------------------------------
>
> Key: ISPN-4444
> URL: https://issues.jboss.org/browse/ISPN-4444
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> When state transfer ends and each node receives a CH_UPDATE command from the coordinator, it first installs the new topology and then it starts invalidating entries it no longer owns.
> However, there are two cases when the node can still read its stale values:
> 1. If L1 is enabled, it will look in the local DataContainer first, regardless of the key's location.
> 2. If L1 is disabled, but the key was removed on the new owners, the node will still look up the key in the local DataContainer after receiving a null response.
> The problem can be reproduced with {{TxReadAfterLosingOwnershipTest}} and its subclasses, by replacing the {{operation.update(cache(1));}} line with {{operation.update(cache(0));}}
[JBoss JIRA] (ISPN-5021) Nodes that finish the rebalance later can see outdated values
by Dan Berindei (JIRA)
Dan Berindei created ISPN-5021:
----------------------------------
Summary: Nodes that finish the rebalance later can see outdated values
Key: ISPN-5021
URL: https://issues.jboss.org/browse/ISPN-5021
Project: Infinispan
Issue Type: Bug
Components: Core, State Transfer
Affects Versions: 7.0.2.Final
Reporter: Dan Berindei
Assignee: Pedro Ruivo
Priority: Critical
Fix For: 7.1.0.Alpha1
Copied from [ISPN-4444|https://issues.jboss.org/browse/ISPN-4444?focusedCommentId=1302...]
If the CH_UPDATE command is delayed on the old owner, the new owners might update the key without the old owner knowing, and a locality check on the old owner won't help.
One thing that struck me when reading about the Raft algorithm was that configuration changes are installed symmetrically, in 3 phases. We might need to do the same for our rebalance: start the rebalance with read_ch=old, write_ch=old+new; once the new owners have all the data, install read_ch=new, write_ch=old+new; and finally install read_ch=new, write_ch=new. For this to work, old cache entries are removed during the 2nd topology update, and further writes should be ignored.
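
The proposed phase sequence can be written down as a small state machine. This is only a sketch of the idea in the comment above: the class RebalancePhasesSketch, the phase names and the use of plain sets of node names as consistent hashes are assumptions, and entry removal during the 2nd update is not modelled.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the proposed 3-phase rebalance. Owner sets stand in
// for the read and write consistent hashes (read_ch, write_ch).
final class RebalancePhasesSketch {

    enum Phase {
        READ_OLD_WRITE_BOTH,  // read_ch=old, write_ch=old+new
        READ_NEW_WRITE_BOTH,  // read_ch=new, write_ch=old+new
        READ_NEW_WRITE_NEW;   // read_ch=new, write_ch=new

        // Advance to the following phase; the last phase is terminal.
        Phase next() {
            return this == READ_NEW_WRITE_NEW ? this : values()[ordinal() + 1];
        }
    }

    static Set<String> readOwners(Phase phase, Set<String> oldOwners, Set<String> newOwners) {
        return phase == Phase.READ_OLD_WRITE_BOTH ? oldOwners : newOwners;
    }

    static Set<String> writeOwners(Phase phase, Set<String> oldOwners, Set<String> newOwners) {
        if (phase == Phase.READ_NEW_WRITE_NEW) {
            return newOwners;
        }
        Set<String> union = new LinkedHashSet<>(oldOwners);
        union.addAll(newOwners);
        return union;
    }

    public static void main(String[] args) {
        Set<String> oldOwners = new LinkedHashSet<>(Arrays.asList("A", "B"));
        Set<String> newOwners = new LinkedHashSet<>(Arrays.asList("B", "C"));

        for (Phase phase : Phase.values()) {
            System.out.println(phase
                    + " read=" + readOwners(phase, oldOwners, newOwners)
                    + " write=" + writeOwners(phase, oldOwners, newOwners));
        }
    }
}
```

In every phase the read owners are a subset of the write owners, which is what would let a node that is one phase behind still read a value that the current writers maintain.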
[JBoss JIRA] (ISPN-5021) Nodes that finish the rebalance later can see outdated values
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5021?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5021:
-------------------------------
Status: Open (was: New)
> Nodes that finish the rebalance later can see outdated values
> -------------------------------------------------------------
>
> Key: ISPN-5021
> URL: https://issues.jboss.org/browse/ISPN-5021
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.2.Final
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> Copied from [ISPN-4444|https://issues.jboss.org/browse/ISPN-4444?focusedCommentId=1302...]
> If the CH_UPDATE command is delayed on the old owner, the new owners might update the key without the old owner knowing, and a locality check on the old owner won't help.
> One thing that struck me when reading about the Raft algorithm was that configuration changes are installed symmetrically, in 3 phases. We might need to do the same for our rebalance: start the rebalance with read_ch=old, write_ch=old+new; once the new owners have all the data, install read_ch=new, write_ch=old+new; and finally install read_ch=new, write_ch=new. For this to work, old cache entries are removed during the 2nd topology update, and further writes should be ignored.
[JBoss JIRA] (ISPN-4286) Two concurrent putIfAbsent operations can both return null during rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4286?page=com.atlassian.jira.plugin.... ]
Dan Berindei reassigned ISPN-4286:
----------------------------------
Assignee: Dan Berindei
> Two concurrent putIfAbsent operations can both return null during rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-4286
> URL: https://issues.jboss.org/browse/ISPN-4286
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> If the cache topology changes while executing a putIfAbsent operation, the old primary owner will throw an OutdatedTopologyException, and the originator will retry on the new owner.
> When retrying the PutKeyValueCommand on the new primary owner, we compare the current value with the command's new value. If they are equal, we assume that the initial attempt of the command already wrote that value, and we return {{null}}.
> However, the value might have been written by another putIfAbsent operation. So we could have two {{putIfAbsent(k, v)}} operations, both returning {{null}}.
> {code}
> A is the originator, B is the primary owner, k = null
> A -> B: putIfAbsent(k, v1)
> B dies before writing v1, C is now primary owner
> D -> C: putIfAbsent(k, v1) // another put operation from D, with the same value
> C -> D: null // correct
> A -> C: retry_putIfAbsent(k, v1)
> C -> A: null // C assumes A is overwriting its own value, so it's also returning null
> {code}
[JBoss JIRA] (ISPN-4286) Two concurrent putIfAbsent operations can both return null during rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4286?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4286:
------------------------------------
We would need versioning to fix this, since there's no way to distinguish between one {{v1}} and the other.
> Two concurrent putIfAbsent operations can both return null during rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-4286
> URL: https://issues.jboss.org/browse/ISPN-4286
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> If the cache topology changes while executing a putIfAbsent operation, the old primary owner will throw an OutdatedTopologyException, and the originator will retry on the new owner.
> When retrying the PutKeyValueCommand on the new primary owner, we compare the current value with the command's new value. If they are equal, we assume that the initial attempt of the command already wrote that value, and we return {{null}}.
> However, the value might have been written by another putIfAbsent operation. So we could have two {{putIfAbsent(k, v)}} operations, both returning {{null}}.
> {code}
> A is the originator, B is the primary owner, k = null
> A -> B: putIfAbsent(k, v1)
> B dies before writing v1, C is now primary owner
> D -> C: putIfAbsent(k, v1) // another put operation from D, with the same value
> C -> D: null // correct
> A -> C: retry_putIfAbsent(k, v1)
> C -> A: null // C assumes A is overwriting its own value, so it's also returning null
> {code}
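
The trace above, and the reason extra metadata is needed to fix it, can be reproduced with a minimal single-threaded sketch. Nothing here is Infinispan code: ValueOnlyOwner models the value comparison on retry, and TaggedOwner stores a hypothetical per-command identifier next to the value as a stand-in for the versioning mentioned in the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a primary owner handling putIfAbsent retries.
final class PutIfAbsentRetrySketch {

    // Retry handling based on value equality only, as described in the issue.
    static final class ValueOnlyOwner {
        private final Map<String, String> data = new HashMap<>();

        String putIfAbsent(String key, String value, boolean isRetry) {
            String current = data.get(key);
            if (current == null) {
                data.put(key, value);
                return null;
            }
            // Equal value on a retry: assume the first attempt wrote it.
            if (isRetry && current.equals(value)) {
                return null;
            }
            return current;
        }
    }

    // Variant that remembers which command wrote the entry (assumed metadata).
    static final class TaggedOwner {
        private static final class Entry {
            final String value;
            final String commandId;

            Entry(String value, String commandId) {
                this.value = value;
                this.commandId = commandId;
            }
        }

        private final Map<String, Entry> data = new HashMap<>();

        String putIfAbsent(String key, String value, String commandId, boolean isRetry) {
            Entry current = data.get(key);
            if (current == null) {
                data.put(key, new Entry(value, commandId));
                return null;
            }
            // Only a retry of the command that actually wrote the entry gets null.
            if (isRetry && current.commandId.equals(commandId)) {
                return null;
            }
            return current.value;
        }
    }

    public static void main(String[] args) {
        ValueOnlyOwner c1 = new ValueOnlyOwner();
        System.out.println("D -> C: " + c1.putIfAbsent("k", "v1", false));
        System.out.println("A -> C (retry, value only): " + c1.putIfAbsent("k", "v1", true));

        TaggedOwner c2 = new TaggedOwner();
        System.out.println("D -> C: " + c2.putIfAbsent("k", "v1", "cmd-D", false));
        System.out.println("A -> C (retry, tagged): " + c2.putIfAbsent("k", "v1", "cmd-A", true));
    }
}
```

With value comparison alone, both D and A are told that their write succeeded. With the tag, A's retry sees that a different command wrote v1 and gets the existing value back.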