[infinispan-issues] [JBoss JIRA] (ISPN-7960) TxInterceptor.verifyRemoteTransaction ignores partition handling

Dan Berindei (JIRA) issues at jboss.org
Thu Jun 22 09:22:00 EDT 2017


     [ https://issues.jboss.org/browse/ISPN-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Berindei updated ISPN-7960:
-------------------------------
    Status: Open  (was: New)


> TxInterceptor.verifyRemoteTransaction ignores partition handling
> ----------------------------------------------------------------
>
>                 Key: ISPN-7960
>                 URL: https://issues.jboss.org/browse/ISPN-7960
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 9.1.0.Beta1, 9.0.3.Final
>            Reporter: Dan Berindei
>            Assignee: Dan Berindei
>             Fix For: 9.1.0.Final
>
>
> https://github.com/infinispan/infinispan/pull/5143 fixes the random test failures in {{PessimisticTxPartitionAndMergeDuringRuntimeTest.testOriginatorIsolatedPartition}}, but it uncovers another random failure in {{OptimisticTxPartitionAndMergeDuringCommitTest.testDegradedPartitionWithDiscard}}.
> When partition handling is enabled, {{TransactionTable.cleanupLeaverTransactions()}} does not roll back transactions from leavers. Instead, it keeps them in limbo until it sees a stable cache topology (i.e. either until the cache's stable topology is updated, or until all the stable topology's members are re-added to the current topology). By contrast, {{TxInterceptor.verifyRemoteTransaction()}} always rolls back the transaction if the originator is not in the cluster view, so when the originator tries to complete the transaction after the merge, it gets an exception:
> {noformat}
> 10:27:34,880 WARN  (remote-thread-Test-NodeG-p46360-t6:[]) [NonTotalOrderTxPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=16, updatedVersions={MagicKey#k1{168F/00552148/106 at Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35 at Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}}
> org.infinispan.commons.CacheException: ISPN000361: Cannot commit remote transaction GlobalTx:Test-NodeE-10968:31983 as it was already rolled back
> 	at org.infinispan.commands.tx.CommitCommand.invalidRemoteTxReturnValue(CommitCommand.java:49) ~[classes/:?]
> 	at org.infinispan.commands.tx.AbstractTransactionBoundaryCommand.invokeAsync(AbstractTransactionBoundaryCommand.java:98) ~[classes/:?]
> {noformat}
> The test actually tries to ensure that the {{CommitCommand}} is never executed on the owner before the split, only after the merge. But the {{DiscardFilter}} that it uses blocks only one invocation, so it lets the commit proceed when the originator retries:
> {noformat}
> 10:27:34,394 DEBUG (jgroups-6,Test-NodeG-8587:[]) [BaseTxPartitionAndMergeTest] Ignoring command VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=13, updatedVersions={MagicKey#k1{168F/00552148/106 at Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35 at Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}}
> 10:27:34,416 DEBUG (transport-thread-Test-NodeE-p46282-t3:[Topology-opt-cache]) [LocalTopologyManagerImpl] Updating local topology for cache opt-cache: CacheTopology{id=14, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (4)[Test-NodeE-10968: 67+67, Test-NodeF-27031: 59+65, Test-NodeG-8587: 63+57, Test-NodeH-3978: 67+67]}, pendingCH=null, unionCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE-10968, Test-NodeF-27031], persistentUUIDs=[72351dc9-f621-41df-896b-1dc2f26798f5, 61811c2f-4931-49e2-b395-4debd39f6ca1]}
> 10:27:34,442 TRACE (jgroups-6,Test-NodeE-10968:[]) [RpcManagerImpl] Response(s) to VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=13, updatedVersions={MagicKey#k1{168F/00552148/106 at Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35 at Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}} is {Test-NodeG-8587=CacheNotFoundResponse, Test-NodeF-27031=SuccessfulResponse(null)}
> 10:27:34,442 TRACE (jgroups-6,Test-NodeE-10968:[]) [TxDistributionInterceptor] We have a newer topology, ignoring responses and retrying
> 10:27:34,451 TRACE (jgroups-6,Test-NodeE-10968:[]) [RpcManagerImpl] Test-NodeE-10968 invoking VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=14, updatedVersions={MagicKey#k1{168F/00552148/106 at Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35 at Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}} to recipient list [Test-NodeF-27031, Test-NodeG-8587] with options RpcOptions{timeout=15000, unit=MILLISECONDS, deliverOrder=NONE, responseFilter=null, responseMode=SYNCHRONOUS_IGNORE_LEAVERS}
> 10:27:34,637 TRACE (remote-thread-Test-NodeG-p46360-t6:[]) [TxInterceptor] Replaying the transactions received as a result of state transfer VersionedPrepareCommand {modifications=[PutKeyValueCommand{key=MagicKey#k1{168F/00552148/106 at Test-NodeF-27031}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}, PutKeyValueCommand{key=MagicKey#k2{1690/CCE79580/35 at Test-NodeG-8587}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}], onePhaseCommit=false, retried=false, versionsSeen=null, gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache'}
> 10:27:34,661 TRACE (remote-thread-Test-NodeG-p46360-t6:[]) [TxInterceptor] Rolling back remote transaction GlobalTx:Test-NodeE-10968:31983 because either already completed (false) or originator no longer in the cluster (true).
> {noformat}
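
The mismatch described above between {{TransactionTable.cleanupLeaverTransactions()}} and {{TxInterceptor.verifyRemoteTransaction()}} can be illustrated with a minimal sketch. This is not Infinispan code: the class, the enum, the method names and the use of plain strings for node addresses are all hypothetical, and the "keep while the originator is still in the stable topology" rule is a simplification of the behaviour the issue describes.

```java
import java.util.Set;

// Hypothetical model, not Infinispan code: nodes are plain strings.
class RemoteTxVerificationSketch {

    enum Decision { KEEP, ROLL_BACK }

    // Models the behaviour reported for verifyRemoteTransaction():
    // only the current view is consulted, so an originator in another
    // partition is treated like a node that left for good.
    static Decision viewOnlyCheck(String originator, Set<String> currentMembers) {
        return currentMembers.contains(originator) ? Decision.KEEP : Decision.ROLL_BACK;
    }

    // Models the behaviour described for cleanupLeaverTransactions():
    // with partition handling enabled, a transaction whose originator is
    // still a member of the stable topology is kept (assumed simplification).
    static Decision partitionAwareCheck(String originator,
                                        Set<String> currentMembers,
                                        Set<String> stableMembers,
                                        boolean partitionHandlingEnabled) {
        if (currentMembers.contains(originator)) {
            return Decision.KEEP;
        }
        if (partitionHandlingEnabled && stableMembers.contains(originator)) {
            return Decision.KEEP;
        }
        return Decision.ROLL_BACK;
    }

    public static void main(String[] args) {
        // Illustrative membership only: the originator NodeE is in the other partition.
        Set<String> stable = Set.of("NodeE", "NodeF", "NodeG", "NodeH");
        Set<String> partitionView = Set.of("NodeG", "NodeH");
        System.out.println(viewOnlyCheck("NodeE", partitionView));
        System.out.println(partitionAwareCheck("NodeE", partitionView, stable, true));
    }
}
```

In this model the two checks disagree exactly in the situation from the log: the originator is absent from the current view but still part of the stable topology, so one path keeps the transaction while the other rolls it back.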

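The observation that the {{DiscardFilter}} blocks only one invocation can also be sketched in isolation. The classes below are hypothetical and do not use the Infinispan test API; they only contrast a filter that discards the first matching command with one that discards every matching command until it is explicitly released (for example, after the merge). This is an illustration of the difference, not a proposed fix for the issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

// Hypothetical filters keyed by a transaction id string; test() returns
// true when the command should be discarded.
class DiscardFilterSketch {

    // Discards only the first matching command, so a retry of the same
    // commit passes through.
    static final class OneShotDiscard implements Predicate<String> {
        private final String gtx;
        private final AtomicBoolean used = new AtomicBoolean(false);

        OneShotDiscard(String gtx) {
            this.gtx = gtx;
        }

        @Override
        public boolean test(String commandGtx) {
            return gtx.equals(commandGtx) && used.compareAndSet(false, true);
        }
    }

    // Discards every matching command until release() is called.
    static final class UntilReleasedDiscard implements Predicate<String> {
        private final String gtx;
        private volatile boolean released;

        UntilReleasedDiscard(String gtx) {
            this.gtx = gtx;
        }

        void release() {
            released = true;
        }

        @Override
        public boolean test(String commandGtx) {
            return !released && gtx.equals(commandGtx);
        }
    }

    public static void main(String[] args) {
        OneShotDiscard oneShot = new OneShotDiscard("tx-1");
        System.out.println(oneShot.test("tx-1")); // first attempt
        System.out.println(oneShot.test("tx-1")); // retry
    }
}
```

With the one-shot variant, the retried commit in the log (sent again with a newer topology id) is no longer filtered, which matches the sequence the issue reports.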


--
This message was sent by Atlassian JIRA
(v7.2.3#72005)

