[infinispan-issues] [JBoss JIRA] (ISPN-4872) During state transfer, a pessimistic transaction can fail due to a ClusteredGetCommand forcing a lock command

RH Bugzilla Integration (JIRA) issues at jboss.org
Wed Jan 28 08:30:57 EST 2015


    [ https://issues.jboss.org/browse/ISPN-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035761#comment-13035761 ] 

RH Bugzilla Integration commented on ISPN-4872:
-----------------------------------------------

Dave Stahl <dstahl at redhat.com> changed the Status of [bug 1160637|https://bugzilla.redhat.com/show_bug.cgi?id=1160637] from VERIFIED to CLOSED

> During state transfer, a pessimistic transaction can fail due to a ClusteredGetCommand forcing a lock command
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: ISPN-4872
>                 URL: https://issues.jboss.org/browse/ISPN-4872
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 5.2.7.Final, 6.0.2.Final, 7.0.0.CR2
>            Reporter: Erik Salter
>            Assignee: Dan Berindei
>             Fix For: 7.0.0.Final
>
>
> This is a strange one:
> A pessimistic transaction started on the primary owner during state transfer can fail because a backup owner issues a ClusteredGetCommand to the primary while processing the write command.  This can happen if the backup needs to perform a remote get (due to a DELTA_WRITE, conditional, or reliable return values).  In this case, it's a DELTA_WRITE.  
> In this chain of events, state transfer is ongoing, and the union CH is (east-dht5, west-dht5, east-dht6) In this case, the pessimistic transaction originated on east-dht5 must be invoked on west-dht5 and east-dht6:
> {code}
> 2014-10-21 02:15:01,309 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Created a new local transaction: LocalXaTransaction{xid=null} LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, isFromStateTransfer=false} globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, topologyId=33, age(ms)=0
> 2014-10-21 02:15:01,309 TRACE [org.infinispan.util.concurrent.locks.containers.OwnableReentrantPerEntryLockContainer] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Creating and acquiring new lock instance for key EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831]
> 2014-10-21 02:15:01,345 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Replication task sending PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResource [qamId=431831, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33} to addresses [240-west-dht5.comcast.net-35898(CH2-Chicago-IL), 240-east-dht6.comcast.net-47049(CMC-Denver-CO)] with response mode GET_ALL
> {code}
> Let's examine east-dht6.  It gets the write request.  It needs to perform a remote get to pull the full key context, since it receives only the delta.
> {code}
> 2014-10-21 02:15:01,362 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Attempting to execute command: PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResourceDelta [qamId=431831, basePort=2049, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1, baseProgramNumber=1, portStep=1, program=Program [portNumber=2053, programNumber=5, status=UNUSED, sessionToken=, lastUpdatedTime=1413872101310, bandwidth=0, inputId=-1]], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33} [sender=240-east-dht5.comcast.net-26106(CMC-Denver-CO)]
> ...
> 2014-10-21 02:15:01,362 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Created and registered remote transaction RemoteTransaction{modifications=...
> globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:remote, topologyId=33, age(ms)=0
> 2014-10-21 02:15:01,362 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Replication task sending ClusteredGetCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE]} to addresses [240-east-dht5.comcast.net-26106(CMC-Denver-CO), 240-west-dht5.comcast.net-35898(CH2-Chicago-IL)] with response mode GET_FIRST
> {code}
> east-dht5 receives the ClusteredGetCommand request.  Because TxDistributionInterceptor::line 325 is true, the acquireRemoteLock flag is set to true.
> {code}
>         boolean acquireRemoteLock = false;
>          if (ctx.isInTxScope()) {
>             TxInvocationContext txContext = (TxInvocationContext) ctx;
>             acquireRemoteLock = isWrite && isPessimisticCache && !txContext.getAffectedKeys().contains(key);
>          }
> {code}
> Thus, when east-dht5 runs the ClusteredGetCommand. it will create and invoke a LockControlCommand.  Because the LCC is created inline, it will not have its origin set.  But the LCC will create a RemoteTxInvocationContext.  When the LCC runs through the interceptor chain, TxInterceptor::invokeNextInterceptorAndVerifyTransaction will check the originator (because of the remote context).  When it doesn't find it, it will force a rollback.
> {code}
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Calling perform() on ClusteredGetCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE]}
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Created and registered remote transaction RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, topologyId=33, age(ms)=0
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.statetransfer.StateTransferInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) handleTopologyAffectedCommand for command LockControlCommand{cache=qamAllocation, keys=[EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831]], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], unlock=false}
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.locking.PessimisticLockingInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Locking key EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], no need to check for pending locks.
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) invokeNextInterceptorAndVerifyTransaction :: originatorMissing=true, alreadyCompleted=false
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Rolling back remote transaction GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local because either already completed(false) or originator(null) no longer in the cluster(true).
>  
> // ROLLBACK HAPPENS HERE
> {code}
> When the transaction attempts to commit, it fails due to the spurious rollback.  For instance:
> {code}
> 2014-10-21 02:15:01,384 WARN  [org.infinispan.remoting.rpc.RpcManagerImpl] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) ISPN000071: Caught exception when handling command PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResource [qamId=431831, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33}
> org.infinispan.remoting.RemoteException: ISPN000217: Received exception from 240-west-dht5.comcast.net-35898(CH2-Chicago-IL), see cause for remote stack trace
> ...
> 	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
> 	at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343)
> 	at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:175)
> 	at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:122)
> ... 
> com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelOnePhaseCommit(XAResourceRecord.java:636)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.onePhaseCommit(BasicAction.java:2619)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1779)
> 	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:88)
> 	at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:177)
> Caused by: org.infinispan.transaction.xa.InvalidTransactionException: This remote transaction GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local is already rolled back
> 	at org.infinispan.transaction.RemoteTransaction.checkIfRolledBack(RemoteTransaction.java:137)
> 	at org.infinispan.transaction.RemoteTransaction.putLookedUpEntry(RemoteTransaction.java:73)
> 	at org.infinispan.context.impl.RemoteTxInvocationContext.putLookedUpEntry(RemoteTxInvocationContext.java:99)
> 	at org.infinispan.container.EntryFactoryImpl.wrapInternalCacheEntryForPut(EntryFactoryImpl.java:242)
> 	at org.infinispan.container.EntryFactoryImpl.wrapEntryForPut(EntryFactoryImpl.java:166)
> ...
> {code}
> So it looks like there's a couple ways to handle this.  One would be to only acquire a remote lock in TxDistributionInterceptor if the current node was the originator.  Another -- possibly less intrusive -- would be to set the origin of the invoked LCC command to that of the ClusteredGetCommand.  The former should be preferable, but this code tends to be a bit labyrinthine with the various pessimistic use cases out there. 



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)


More information about the infinispan-issues mailing list