[JBoss JIRA] (ISPN-8227) OptimisticTxPartitionAndMergeDuringPrepareTest.testPrimaryOwnerIsolatedPartition fails randomly
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-8227?page=com.atlassian.jira.plugin.... ]
Radim Vansa updated ISPN-8227:
------------------------------
Attachment: log.txt
> OptimisticTxPartitionAndMergeDuringPrepareTest.testPrimaryOwnerIsolatedPartition fails randomly
> -----------------------------------------------------------------------------------------------
>
> Key: ISPN-8227
> URL: https://issues.jboss.org/browse/ISPN-8227
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 9.1.0.Final
> Reporter: Radim Vansa
> Assignee: Radim Vansa
> Attachments: log.txt
>
>
> The issue seems to be that, during several topology changes, the originator sends two VersionedPrepareCommands, one VersionedCommitCommand and one TxCompletionNotificationCommand. These commands are resent after the cluster merge and then executed in arbitrary order; after the completion notification command removes the tx information from {{TransactionTable}}, another prepare command adds it again.
> Logs are attached. Don't get confused by the fact that
> {code}
> 14:54:29,056 TRACE [org.infinispan.transaction.impl.TransactionTable] (remote-thread-test-NodeC-p146-t4) Created and registered remote transaction RemoteTransaction{modifications=[PutKeyValueCommand{key=MagicKey#k1{8/537A1109/107@test-NodeB-18592}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}, PutKeyValueCommand{key=MagicKey#k2{9/0AD74A7B/101@test-NodeC-10992}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}], lookedUpEntries={}, lockedKeys=[], backupKeyLocks=[], lookedUpEntriesTopology=2147483647, isMarkedForRollback=false, tx=GlobalTx:test-NodeA-55522:15, state=null}
> {code}
> appears in the log before
> {code}
> 14:54:29,056 TRACE [org.infinispan.transaction.impl.TransactionTable] (remote-thread-test-NodeC-p146-t6) Removed remote transaction GlobalTx:test-NodeA-55522:15 ? RemoteTransaction{modifications=[PutKeyValueCommand{key=MagicKey#k1{8/537A1109/107@test-NodeB-18592}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}, PutKeyValueCommand{key=MagicKey#k2{9/0AD74A7B/101@test-NodeC-10992}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}], lookedUpEntries={MagicKey#k2{9/0AD74A7B/101@test-NodeC-10992}=VersionedRepeatableReadEntry(51ecb641){key=MagicKey#k2{9/0AD74A7B/101@test-NodeC-10992}, value=final-value, isCreated=false, isChanged=true, isRemoved=false, isExpired=false, skipLookup=true, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=SimpleClusteredVersion{topologyId=13, version=1}}}, MagicKey#k1{8/537A1109/107@test-NodeB-18592}=VersionedRepeatableReadEntry(7a4c1b3f){key=MagicKey#k1{8/537A1109/107@test-NodeB-18592}, value=final-value, isCreated=false, isChanged=true, isRemoved=false, isExpired=false, skipLookup=true, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=SimpleClusteredVersion{topologyId=13, version=1}}}}, lockedKeys=[MagicKey#k2{9/0AD74A7B/101@test-NodeC-10992}], backupKeyLocks=[MagicKey#k1{8/537A1109/107@test-NodeB-18592}], lookedUpEntriesTopology=14, isMarkedForRollback=false, tx=GlobalTx:test-NodeA-55522:15, state=null}
> {code}
> While we check {{TransactionTable.isTransactionCompleted()}} within {{remoteTransactions.compute}}, there is a race between calling {{TransactionTable.removeRemoteTransaction}} and {{TransactionTable.markTransactionCompleted}}. My suggestion is to mark the transaction as completed from within the {{removeRemoteTransaction}} method, using {{remoteTransactions.compute}}.
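The suggestion above could look roughly like the following sketch. It is not the actual {{TransactionTable}} code: the class name, the String-typed transaction ids and payloads, and the {{completedTransactions}} set are all assumptions made for illustration. The point is only that the removal and the "completed" mark happen inside one {{compute}} call, so a late prepare that runs its own {{compute}} on the same key either sees the transaction or sees it as completed.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RemoteTxTableSketch {
    // Hypothetical stand-ins for the TransactionTable state; not Infinispan's actual fields.
    private final Map<String, String> remoteTransactions = new ConcurrentHashMap<>();
    private final Set<String> completedTransactions = ConcurrentHashMap.newKeySet();

    /** Registers a remote tx unless it already completed; the check runs inside compute. */
    String getOrCreateRemoteTransaction(String gtx, String modifications) {
        return remoteTransactions.compute(gtx, (key, existing) -> {
            if (existing != null) return existing;
            if (completedTransactions.contains(key)) return null; // late prepare: do not re-register
            return modifications;
        });
    }

    /** Removes the tx and marks it completed in the same compute call on that key. */
    String removeRemoteTransaction(String gtx) {
        String[] removed = new String[1];
        remoteTransactions.compute(gtx, (key, existing) -> {
            completedTransactions.add(key); // mark first, while compute holds the key
            removed[0] = existing;
            return null; // returning null removes the mapping
        });
        return removed[0];
    }

    int size() {
        return remoteTransactions.size();
    }
}
```

With this shape, a prepare that is replayed after the completion notification gets {{null}} back instead of re-creating the entry.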
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-3918) Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-3918?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo commented on ISPN-3918:
-----------------------------------
If the primary replies to the originator with a regular message, the reply will be ordered with the backup command, which solves the problem when the topology is stable.
If the topology changes during the putIfAbsent, we would need versioning to keep track of which command-invocation-id generated which version (and return value), in order to decide whether the command should be handled or not.
> Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
> ---------------------------------------------------------------------------------------
>
> Key: ISPN-3918
> URL: https://issues.jboss.org/browse/ISPN-3918
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.0.Final
> Reporter: Dan Berindei
> Labels: consistency
> Fix For: 9.2.0.Final
>
> Attachments: NonTxPutIfAbsentDuringLeaveStressTest.testNodeLeavingDuringPutIfAbsent_8.log.gz, NonTxPutIfAbsentDuringRebalanceStressTest.testPutIfAbsentDuringJoin_1.log.gz, ntpiadjst.log.gz
>
>
> In a non-tx cache, sometimes it's possible for a {{get(k)}} to return {{null}} even though a previous {{putIfAbsent(k, v)}} returned a non-null value and the only concurrent operations on the cache are concurrent putIfAbsent calls.
> Say \[B, A, C] are the owners of k (C just joined)
> 1. A starts a {{putIfAbsent(k, v1)}} command, sends it to B
> 2. B forwards the command to A and C
> 3. C writes {{k=v1}}
> 4. C becomes the primary owner of k (owners are now \[C, A])
> 5. A/B see the new topology before committing and throw an {{OutdatedTopologyException}}
> 6. A retries the command, sends it to C
> 7. C forwards the command to A, which writes {{k=v1}}
> 8. C doesn't have to update the entry, returns null
> If, between steps 3 and 7, another thread on A starts a {{putIfAbsent(k, v2)}} command, the command will fail and return {{v1}} (because the primary owner already has a value). However, a subsequent {{get(k)}} command will return {{null}}, because A is an owner and doesn't have the value.
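The interleaving above can be replayed with a toy model. This is only a sketch under heavy simplification: each node's data container is a plain {{HashMap}}, there is no real topology or messaging, and the class and method names are hypothetical. It shows the state after steps 1-5, when C is the primary and already holds {{v1}} while A has not written it yet.

```java
import java.util.HashMap;
import java.util.Map;

class PutIfAbsentRaceSketch {
    // Local data containers of nodes A and C, modeled as plain maps (hypothetical simplification).
    final Map<String, String> a = new HashMap<>();
    final Map<String, String> c = new HashMap<>();

    /** putIfAbsent as decided by the primary owner: returns the previous value, or null if it wrote. */
    static String putIfAbsentOn(Map<String, String> primary, String key, String value) {
        return primary.putIfAbsent(key, value);
    }

    /** Replays the race and returns {result of putIfAbsent(k, v2), result of get(k) on A}. */
    String[] replay() {
        // Steps 1-3: A -> B (primary) -> backups A and C; only C has written so far.
        c.put("k", "v1");
        // Step 4: C becomes primary. Step 5: A and B throw and do not commit.
        // Before the retry reaches A (step 7), another thread on A runs putIfAbsent(k, v2):
        String previous = putIfAbsentOn(c, "k", "v2"); // primary C already holds v1
        // A is an owner, so get(k) is answered from A's local container, which is still empty.
        String read = a.get("k");
        return new String[] {previous, read};
    }
}
```

In this model {{replay()}} yields {{v1}} for the putIfAbsent and {{null}} for the get, which is the inconsistency described.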
[JBoss JIRA] (ISPN-8226) PutIfAbsent can succeed in scattered cache on non-null entry
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-8226?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-8226:
-----------------------------------
Actually this bug applies only when ISPN-8078 is in place; I'll merge the fix with ISPN-8078 PR instead.
> PutIfAbsent can succeed in scattered cache on non-null entry
> ------------------------------------------------------------
>
> Key: ISPN-8226
> URL: https://issues.jboss.org/browse/ISPN-8226
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.1.0.Final
> Reporter: Radim Vansa
> Assignee: Radim Vansa
>
> When the {{putIfAbsent}} is a retry, the value/metadata in RepeatableReadEntry is reset to the first loaded version. However, if this command caused a prefetch from a remote node after finding {{RemoteMetadata}} locally, the first loaded version is the remote metadata and the value updated by the prefetch is reset, despite the prefetched value having already been inserted into the DC. The command sees {{null}} as the cache value and, since the seen version matches the version in the DC, the write is allowed.
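A toy model of that sequence, as a sketch only. The class, its fields and the numeric versions are hypothetical and do not mirror the real RepeatableReadEntry; the model assumes, as described above, that the prefetched value carries the same version as the locally found remote metadata.

```java
class RetryResetSketch {
    // Hypothetical context entry: the current view plus the first loaded snapshot.
    String value;
    long version;
    String firstValue;
    long firstVersion;

    /** First load into the context; also records the snapshot used on retry. */
    void load(String v, long ver) {
        value = v;
        version = ver;
        firstValue = v;
        firstVersion = ver;
    }

    /** Prefetch from the remote node updates the current view but not the snapshot. */
    void prefetch(String v, long ver) {
        value = v;
        version = ver;
    }

    /** On retry the entry goes back to the first loaded version, dropping the prefetched value. */
    void resetForRetry() {
        value = firstValue;
        version = firstVersion;
    }

    /** putIfAbsent writes only if it sees no value and the seen version matches the container's. */
    boolean putIfAbsentAllowed(long containerVersion) {
        return value == null && version == containerVersion;
    }
}
```

Loading a null value at version 5, prefetching an existing value at version 5 and then calling {{resetForRetry()}} makes {{putIfAbsentAllowed(5)}} return true, although the container holds a non-null entry.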
[JBoss JIRA] (ISPN-8226) PutIfAbsent can succeed in scattered cache on non-null entry
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-8226?page=com.atlassian.jira.plugin.... ]
Radim Vansa closed ISPN-8226.
-----------------------------
Release Notes Text: Actually this bug applies only when ISPN-8078 is in place; I'll merge the fix with ISPN-8078 PR instead.
Resolution: Rejected
[JBoss JIRA] (ISPN-8195) Transaction fails to commit when a node crashes
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8195?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-8195:
------------------------------------
I have also seen this in {{ConcurrentNonOverlappingLeaveTest}}. The problem is that {{TxDistributionInterceptor}} doesn't throw an {{OutdatedTopologyException}} if the current topology is not the same as the command topology set in {{StateTransferInterceptor}}, which makes it possible for {{A}} to process the commit command without waiting to receive the transaction data (or even to become an owner, but that would be much harder to reproduce).
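A minimal sketch of the kind of check that is missing, assuming topologies are identified by plain integer ids. The class, the method and the nested exception are hypothetical stand-ins, not the actual {{TxDistributionInterceptor}} or Infinispan exception code.

```java
class TopologyCheckSketch {
    /** Hypothetical stand-in for Infinispan's OutdatedTopologyException. */
    static class OutdatedTopologyException extends RuntimeException {
        OutdatedTopologyException(String message) {
            super(message);
        }
    }

    /** Rejects a command whose topology id differs from the current one, so that it can be retried. */
    static void checkTopology(int commandTopologyId, int currentTopologyId) {
        if (commandTopologyId != currentTopologyId) {
            throw new OutdatedTopologyException(
                "Command topology " + commandTopologyId + " != current topology " + currentTopologyId);
        }
    }
}
```

The idea is that a rejected commit gets retried in a later topology instead of being processed before the transaction data has arrived.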
> Transaction fails to commit when a node crashes
> -----------------------------------------------
>
> Key: ISPN-8195
> URL: https://issues.jboss.org/browse/ISPN-8195
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Transactions
> Affects Versions: 9.1.0.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Attachments: log.zip
>
>
> Nodes ABC, key is owned by BC:
> 1. C prepares a transaction modifying only one key [BC], prepare succeeds on both
> 2. B crashes
> 3. C tries to send CommitCommand to B and gets {{CacheNotFoundResponse}}
> 4. C throws an {{OutdatedTopologyException}} (OTE), which gets handled by {{StateTransferInterceptor}} (STI) and retried
> 5. A becomes an owner of key in the next topology
> 6. C sends CommitCommand to all owners, including A
> 7. A does not find the transaction prepared and throws {{IllegalStateException: Remote transaction not found: GlobalTx:test-NodeC-45028:1}}
> 8. C fails the transaction because of the {{IllegalStateException}}
> Usually A should request transactions during state transfer, but the CommitCommand is sent in the first topology with a higher id. In this case it's the "Hey we've lost B!" topology, which does not start the rebalance yet.
--