[JBoss JIRA] (ISPN-3422) In non-tx caches, write operations may not be atomic during rebalance
by Dan Berindei (JIRA)
Dan Berindei created ISPN-3422:
----------------------------------
Summary: In non-tx caches, write operations may not be atomic during rebalance
Key: ISPN-3422
URL: https://issues.jboss.org/browse/ISPN-3422
Project: Infinispan
Issue Type: Bug
Reporter: Dan Berindei
Assignee: Dan Berindei
If the cache topology changes while a write command is running and before it has actually committed the entry to the data container, we retry the command (see ISPN-3366 and ISPN-3357). But before we detect the topology change, one or more of the backup owners may have already applied the modification.
Retrying the command re-acquires the key lock on the primary owner (even if the primary owner didn't change). That means another command could have modified the same key in the meantime, but the retried command ignores any such change and returns the value from before the first attempt. The command is not retried if the first attempt was unsuccessful, but scenarios like this are still possible:
{code}
thread 1: putIfAbsent(k, v1) -> null
thread 2: putIfAbsent(k, v2) -> null
{code}
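For contrast, the atomicity contract that the scenario above violates can be sketched with a plain java.util.concurrent.ConcurrentHashMap. This is only an illustration of the expected behaviour, not Infinispan code; the class and method names are hypothetical. With an atomic putIfAbsent, exactly one of two concurrent callers should see null.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy illustration (not Infinispan code): the contract a retried putIfAbsent must preserve.
class PutIfAbsentContract {
    // Runs two concurrent putIfAbsent calls on the same key and returns
    // how many of them observed null, i.e. believe they created the entry.
    static int winners(ConcurrentMap<String, String> cache) {
        String[] results = new String[2];
        Thread t1 = new Thread(() -> results[0] = cache.putIfAbsent("k", "v1"));
        Thread t2 = new Thread(() -> results[1] = cache.putIfAbsent("k", "v2"));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int count = 0;
        for (String r : results) {
            if (r == null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // An atomic implementation always reports exactly one winner.
        System.out.println("callers that saw null: " + winners(new ConcurrentHashMap<>()));
    }
}
```

In the scenario reported here, the equivalent count would be 2, which is what makes the operation non-atomic from the caller's point of view.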
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 2 months
[JBoss JIRA] (ISPN-3357) Insufficient owners with putIfAbsent during rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3357?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-3357:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/2015
For now we are completely ignoring the previous value on the primary owner when retrying, not just if the current value matches the final value of the command.
> Insufficient owners with putIfAbsent during rebalance
> -----------------------------------------------------
>
> Key: ISPN-3357
> URL: https://issues.jboss.org/browse/ISPN-3357
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: 7c29bccb.log, ISPN-3357-full-logs-leave.zip
>
>
> Here is the test scenario:
> * DIST numOwners=2, start with a 3-node cluster, then join 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 20074
> * node2: 19888
> * node3: 20114
> * node4: 18885
> The total is 78961, so 1039 entries are missing. There were no errors on the HotRod client side, so 80000 entries (40000 entries x numOwners=2) should be there.
> Let's take a look at an example missing entry, hash(thread01key151) = 7c29bccb.
> Current CH: owners(7c29bccb) are [node1, node2]
> Pending CH: owners(7c29bccb) are [node1, node2, node4]
> Balanced CH: owners(7c29bccb) are [node1, node4]
> The event sequence is:
> * hotrod -> node1
> * node1 -> node2, node4
> * node2 committed entry
> * node4 performed clustered get before write, got a value from node2 and will not commit the entry because this node thinks it's not changed/created
> * node1 committed entry
> * node2 invalidates the entry because it's no longer an owner
> As a result, of owners(7c29bccb) only node1 holds the entry; node4 is missing it. This entry may be lost completely by further rebalances when node4 is the donor for this segment.
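The event sequence above can be replayed in a toy model. This is a minimal sketch under stated assumptions: plain HashMaps stand in for each node's data container, the "clustered get" is reduced to a lookup on node2, and the class and method names are hypothetical. It is not how Infinispan implements any of these steps.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the reported event sequence; maps stand in for data containers.
class LostBackupSketch {
    // Replays the sequence and reports which nodes hold the key afterwards.
    static Map<String, Boolean> replay() {
        String key = "thread01key151";
        String value = "v";
        Map<String, String> node1 = new HashMap<>();
        Map<String, String> node2 = new HashMap<>();
        Map<String, String> node4 = new HashMap<>();

        // node1 (primary) forwards the write to node2 and node4; node2 commits first.
        node2.put(key, value);

        // node4 performs a clustered get before writing and sees node2's copy,
        // so it concludes nothing was created and skips its own commit.
        String seenRemotely = node2.get(key);
        if (seenRemotely == null) {
            node4.put(key, value);
        }

        // node1 commits the entry.
        node1.put(key, value);

        // Balanced CH is [node1, node4]: node2 is no longer an owner and invalidates.
        node2.remove(key);

        Map<String, Boolean> holders = new LinkedHashMap<>();
        holders.put("node1", node1.containsKey(key));
        holders.put("node2", node2.containsKey(key));
        holders.put("node4", node4.containsKey(key));
        return holders;
    }

    public static void main(String[] args) {
        // prints {node1=true, node2=false, node4=false}
        System.out.println(replay());
    }
}
```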
11 years, 2 months
[JBoss JIRA] (ISPN-3421) State Transfer can leave keys on < numOwners nodes.
by Erik Salter (JIRA)
Erik Salter created ISPN-3421:
---------------------------------
Summary: State Transfer can leave keys on < numOwners nodes.
Key: ISPN-3421
URL: https://issues.jboss.org/browse/ISPN-3421
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.7.Final
Reporter: Erik Salter
Assignee: Mircea Markus
There's a hole in the state transfer mechanism that can occur when a node is leaving the cluster while it is creating entries and was only able to replicate the data to some of the nodes.
The problem occurs when the segment ownership of a node doesn't change after the rebalance. Since state transfer does not request state for keys of which the node is already an owner, the cache could be left in a state where a key is resident on < numOwners nodes. In addition, this could be any subset of the primary OR backup nodes.
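One way to observe the condition described above is to compare, per key, the number of nodes holding it with numOwners. The following is a minimal diagnostic sketch, assuming the key sets of each node have already been collected somehow; the class, the helper names and the sample data are all hypothetical and do not correspond to an Infinispan API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical diagnostic: find keys that are resident on fewer than numOwners nodes.
class UnderReplicatedKeys {
    static Set<String> find(Map<String, Set<String>> keysByNode, int numOwners) {
        // Count how many nodes hold each key.
        Map<String, Integer> copies = new TreeMap<>();
        for (Set<String> keys : keysByNode.values()) {
            for (String key : keys) {
                copies.merge(key, 1, Integer::sum);
            }
        }
        // Keep only the keys with too few copies.
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : copies.entrySet()) {
            if (e.getValue() < numOwners) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Made-up sample: k3 reached only node3 before its writer left the cluster.
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> keysByNode = new HashMap<>();
        keysByNode.put("node1", new HashSet<>(Arrays.asList("k1", "k2")));
        keysByNode.put("node2", new HashSet<>(Arrays.asList("k1")));
        keysByNode.put("node3", new HashSet<>(Arrays.asList("k2", "k3")));
        return keysByNode;
    }

    public static void main(String[] args) {
        // prints [k3]
        System.out.println(find(example(), 2));
    }
}
```

A key that reached no node at all cannot be detected this way, since it never appears in any key set.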
11 years, 2 months
[JBoss JIRA] (ISPN-2965) L1 and early invalidation leaves inconsistent state
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-2965?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-2965:
------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> L1 and early invalidation leaves inconsistent state
> ---------------------------------------------------
>
> Key: ISPN-2965
> URL: https://issues.jboss.org/browse/ISPN-2965
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache, Transactions
> Affects Versions: 5.2.1.Final
> Reporter: Sebastian Tusk
> Assignee: William Burns
> Labels: 5.2.x
> Fix For: 6.0.0.Alpha3
>
>
> In a distributed transactional cache with L1 enabled I can observe the following.
> Prepare cache by adding an entry with Cache.put( k, v1 ).
> 1. Node B starts with adding a changed value. Cache.put( k, v2 )
> 2. Node B TxDistributionInterceptor.visitPrepareCommand flushL1Caches sends invalidations.
> 3. Node A calls Cache.get( k ) retrieves v1 and stores this value in L1.
> 4. Node B proceeds with transaction.
> The result is that Node A answers subsequent Cache.get(k) with v1 and Node B answers with v2.
> It seems the invalidation is either sent too early or must be synchronized in some way with the transaction.
> Cache config:
> <namedCache name="entity">
> <jmxStatistics enabled="true" />
> <clustering mode="dist">
> <stateTransfer fetchInMemoryState="false" timeout="20000" />
> <async />
> <l1 enabled="true" />
> <hash numOwners="1"/>
> </clustering>
> <locking isolationLevel="READ_COMMITTED"
> lockAcquisitionTimeout="15000" useLockStriping="false" />
> <eviction maxEntries="10000" strategy="LRU" />
> <expiration maxIdle="100000" wakeUpInterval="5000"/>
> <storeAsBinary storeKeysAsBinary="true" storeValuesAsBinary="false" enabled="false" />
> <transaction transactionMode="TRANSACTIONAL" autoCommit="false" lockingMode="OPTIMISTIC"/>
> </namedCache>
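The interleaving in steps 1 to 4 can be replayed in a toy model. This is a sketch under assumptions that the report does not state: Node B is taken to be the single owner of k (numOwners="1"), and Node A's L1 is modelled as a plain map. The names are hypothetical and nothing here is Infinispan code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy replay of the reported interleaving; maps stand in for the owner's
// data container and for Node A's L1 cache.
class L1EarlyInvalidationSketch {
    // Returns { value Node A serves from L1, value Node B serves } after the replay.
    static String[] replay() {
        Map<String, String> ownerB = new HashMap<>(); // assumption: Node B owns k
        Map<String, String> l1OnA = new HashMap<>();  // Node A's L1 cache

        // Preparation: Cache.put(k, v1).
        ownerB.put("k", "v1");

        // Steps 1-2: Node B prepares put(k, v2) and sends the L1 invalidation
        // before the new value is committed.
        l1OnA.remove("k");

        // Step 3: Node A's get(k) misses L1, fetches the still-committed v1
        // from the owner and stores it in L1.
        l1OnA.put("k", ownerB.get("k"));

        // Step 4: Node B proceeds and commits v2; no further invalidation is sent.
        ownerB.put("k", "v2");

        return new String[] { l1OnA.get("k"), ownerB.get("k") };
    }

    public static void main(String[] args) {
        String[] seen = replay();
        // prints "Node A answers v1, Node B answers v2"
        System.out.println("Node A answers " + seen[0] + ", Node B answers " + seen[1]);
    }
}
```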
11 years, 2 months
[JBoss JIRA] (ISPN-1540) Refactor distribution interceptor
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-1540?page=com.atlassian.jira.plugin.... ]
Work on ISPN-1540 started by William Burns.
> Refactor distribution interceptor
> ---------------------------------
>
> Key: ISPN-1540
> URL: https://issues.jboss.org/browse/ISPN-1540
> Project: Infinispan
> Issue Type: Feature Request
> Components: Distributed Cache
> Affects Versions: 5.1.0.BETA5
> Reporter: Mircea Markus
> Assignee: William Burns
> Fix For: 6.0.0.Final
>
>
> DistributionInterceptor, as it looks now, is unnecessarily complex. Before adding more functionality on top of it (i.e. ISPN-1539) it should be refactored:
> - extract L1 logic into a different interceptor
> - this would require moving the StateTransferLock logic into another interceptor as well
> - now that we have separation between tx and non-tx caches, we can extract the remaining logic into TransactionalDistributionInterceptor and NonTransactional...
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 2 months
[JBoss JIRA] (ISPN-2965) L1 and early invalidation leaves inconsistent state
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-2965?page=com.atlassian.jira.plugin.... ]
William Burns updated ISPN-2965:
--------------------------------
Status: Pull Request Sent (was: Coding In Progress)
11 years, 2 months
[JBoss JIRA] (ISPN-3419) Write Skew check must be performed only in the primary owner
by Pedro Ruivo (JIRA)
Pedro Ruivo created ISPN-3419:
---------------------------------
Summary: Write Skew check must be performed only in the primary owner
Key: ISPN-3419
URL: https://issues.jboss.org/browse/ISPN-3419
Project: Infinispan
Issue Type: Bug
Components: Transactions
Affects Versions: 6.0.0.Alpha2
Reporter: Pedro Ruivo
Assignee: Pedro Ruivo
Fix For: 6.0.0.Alpha3
The following case can create data inconsistency:
A: txA reads k with version v1 and writes on k (say valueA)
primary-owner of k: txA acquires the lock on k, validates and returns v2 (increment(v1))
non-primary-owner of k: txA registers a backup lock, validates and returns v2 (increment(v1))
A: txA commits with version v2.
primary-owner of k: applies txA //commit in the non-primary-owner is delayed
B: txB remotely reads k with version v2, writes on k (say valueB) and prepares the transaction
primary-owner of k: txB waits until the lock is released.
non-primary-owner of k: txB registers a backup lock, validates and returns v2 (increment(v1), because it has not applied txA yet)
primary-owner of k: the lock is released, txB is validated and it returns v3 (increment(v2))
B collects all the responses and merges them. However, map.value() can return the response from the primary-owner (v3) first and then the response from the non-primary-owner (v2). The resulting version map will be k=>v2.
txB will update k with the same version as before but a different value. From here on, the data will be inconsistent.
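The order dependence of the merge step can be shown in isolation. This is a minimal sketch, assuming versions are plain integers and that a later response simply overwrites an earlier one for the same key; the class and method names are hypothetical and this is not the actual merge code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of merging per-node version responses where the last response wins.
class VersionMergeSketch {
    static Map<String, Integer> mergeLastWins(List<Map<String, Integer>> responses) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> response : responses) {
            merged.putAll(response); // a later response overwrites an earlier one
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> fromPrimary = Collections.singletonMap("k", 3); // v3
        Map<String, Integer> fromBackup = Collections.singletonMap("k", 2);  // stale v2

        // Primary's response first, backup's second: the stale v2 wins.
        System.out.println(mergeLastWins(Arrays.asList(fromPrimary, fromBackup))); // prints {k=2}
        // Backup's response first, primary's second: v3 wins.
        System.out.println(mergeLastWins(Arrays.asList(fromBackup, fromPrimary))); // prints {k=3}
    }
}
```

Since the outcome depends only on iteration order, taking the version from the primary owner alone, as the issue title proposes, removes the ambiguity.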
11 years, 2 months