[JBoss JIRA] (ISPN-4872) During state transfer, a pessimistic transaction can fail due to a ClusteredGetCommand forcing a lock command
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4872?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4872:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1160637|https://bugzilla.redhat.com/show_bug.cgi?id=1160637] from MODIFIED to ON_QA
> During state transfer, a pessimistic transaction can fail due to a ClusteredGetCommand forcing a lock command
> -------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4872
> URL: https://issues.jboss.org/browse/ISPN-4872
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 5.2.7.Final, 6.0.2.Final, 7.0.0.CR2
> Reporter: Erik Salter
> Assignee: Dan Berindei
> Fix For: 7.0.0.Final
>
>
> This is a strange one:
> A pessimistic transaction started on the primary owner during state transfer can fail because a backup owner issues a ClusteredGetCommand to the primary while processing the write command. This can happen if the backup needs to perform a remote get (due to a DELTA_WRITE, conditional, or reliable return values). In this case, it's a DELTA_WRITE.
> In this chain of events, state transfer is ongoing, and the union CH is (east-dht5, west-dht5, east-dht6). In this case, the pessimistic transaction originating on east-dht5 must be invoked on west-dht5 and east-dht6:
> {code}
> 2014-10-21 02:15:01,309 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Created a new local transaction: LocalXaTransaction{xid=null} LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, isFromStateTransfer=false} globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, topologyId=33, age(ms)=0
> 2014-10-21 02:15:01,309 TRACE [org.infinispan.util.concurrent.locks.containers.OwnableReentrantPerEntryLockContainer] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Creating and acquiring new lock instance for key EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831]
> 2014-10-21 02:15:01,345 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Replication task sending PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResource [qamId=431831, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33} to addresses [240-west-dht5.comcast.net-35898(CH2-Chicago-IL), 240-east-dht6.comcast.net-47049(CMC-Denver-CO)] with response mode GET_ALL
> {code}
> Let's examine east-dht6. It gets the write request. It needs to perform a remote get to pull the full key context, since it receives only the delta.
> {code}
> 2014-10-21 02:15:01,362 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Attempting to execute command: PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResourceDelta [qamId=431831, basePort=2049, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1, baseProgramNumber=1, portStep=1, program=Program [portNumber=2053, programNumber=5, status=UNUSED, sessionToken=, lastUpdatedTime=1413872101310, bandwidth=0, inputId=-1]], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33} [sender=240-east-dht5.comcast.net-26106(CMC-Denver-CO)]
> ...
> 2014-10-21 02:15:01,362 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Created and registered remote transaction RemoteTransaction{modifications=...
> globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:remote, topologyId=33, age(ms)=0
> 2014-10-21 02:15:01,362 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Replication task sending ClusteredGetCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE]} to addresses [240-east-dht5.comcast.net-26106(CMC-Denver-CO), 240-west-dht5.comcast.net-35898(CH2-Chicago-IL)] with response mode GET_FIRST
> {code}
> east-dht5 receives the ClusteredGetCommand request. Because the condition at TxDistributionInterceptor line 325 evaluates to true, the acquireRemoteLock flag is set to true.
> {code}
> boolean acquireRemoteLock = false;
> if (ctx.isInTxScope()) {
>    TxInvocationContext txContext = (TxInvocationContext) ctx;
>    acquireRemoteLock = isWrite && isPessimisticCache && !txContext.getAffectedKeys().contains(key);
> }
> {code}
> Thus, when east-dht5 runs the ClusteredGetCommand, it will create and invoke a LockControlCommand (LCC). Because the LCC is created inline, it will not have its origin set. But the LCC will create a RemoteTxInvocationContext. When the LCC runs through the interceptor chain, TxInterceptor::invokeNextInterceptorAndVerifyTransaction will check the originator (because of the remote context). When it doesn't find it, it will force a rollback.
> {code}
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Calling perform() on ClusteredGetCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE]}
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Created and registered remote transaction RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, topologyId=33, age(ms)=0
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.statetransfer.StateTransferInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) handleTopologyAffectedCommand for command LockControlCommand{cache=qamAllocation, keys=[EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831]], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], unlock=false}
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.locking.PessimisticLockingInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Locking key EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], no need to check for pending locks.
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) invokeNextInterceptorAndVerifyTransaction :: originatorMissing=true, alreadyCompleted=false
> 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Rolling back remote transaction GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local because either already completed(false) or originator(null) no longer in the cluster(true).
>
> // ROLLBACK HAPPENS HERE
> {code}
> When the transaction attempts to commit, it fails due to the spurious rollback. For instance:
> {code}
> 2014-10-21 02:15:01,384 WARN [org.infinispan.remoting.rpc.RpcManagerImpl] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) ISPN000071: Caught exception when handling command PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResource [qamId=431831, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33}
> org.infinispan.remoting.RemoteException: ISPN000217: Received exception from 240-west-dht5.comcast.net-35898(CH2-Chicago-IL), see cause for remote stack trace
> ...
> at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343)
> at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:175)
> at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:122)
> ...
> com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelOnePhaseCommit(XAResourceRecord.java:636)
> at com.arjuna.ats.arjuna.coordinator.BasicAction.onePhaseCommit(BasicAction.java:2619)
> at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1779)
> at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:88)
> at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:177)
> Caused by: org.infinispan.transaction.xa.InvalidTransactionException: This remote transaction GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local is already rolled back
> at org.infinispan.transaction.RemoteTransaction.checkIfRolledBack(RemoteTransaction.java:137)
> at org.infinispan.transaction.RemoteTransaction.putLookedUpEntry(RemoteTransaction.java:73)
> at org.infinispan.context.impl.RemoteTxInvocationContext.putLookedUpEntry(RemoteTxInvocationContext.java:99)
> at org.infinispan.container.EntryFactoryImpl.wrapInternalCacheEntryForPut(EntryFactoryImpl.java:242)
> at org.infinispan.container.EntryFactoryImpl.wrapEntryForPut(EntryFactoryImpl.java:166)
> ...
> {code}
> So it looks like there are a couple of ways to handle this. One would be to only acquire a remote lock in TxDistributionInterceptor if the current node was the originator. Another, possibly less intrusive, would be to set the origin of the invoked LCC to that of the ClusteredGetCommand. The former should be preferable, but this code tends to be a bit labyrinthine with the various pessimistic use cases out there.
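The second option above can be illustrated with a small, self-contained sketch. This is not Infinispan code: `OriginCheckSketch`, `Command`, `wouldRollBack`, and both factory methods are hypothetical names made up for illustration, and the check is a simplified model of the "originatorMissing" condition seen in the trace, assuming a missing (null) origin is treated the same as an originator that has left the cluster.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Simplified, hypothetical model of the originator check; not Infinispan's implementation. */
final class OriginCheckSketch {

    /** Hypothetical minimal command that only carries the address it came from. */
    static final class Command {
        final String name;
        final String origin; // null when the command was created inline and no origin was set

        Command(String name, String origin) {
            this.name = name;
            this.origin = origin;
        }
    }

    /** Assumed rule: a remote transaction is rolled back when its originator is unknown or gone. */
    static boolean wouldRollBack(Command cmd, Set<String> members) {
        return cmd.origin == null || !members.contains(cmd.origin);
    }

    /** Behaviour described in the report: the inline lock command has no origin. */
    static Command inlineLockCommand(Command get) {
        return new Command("LockControlCommand", null);
    }

    /** Second proposed option: the lock command inherits the origin of the get command. */
    static Command inlineLockCommandWithOrigin(Command get) {
        return new Command("LockControlCommand", get.origin);
    }

    public static void main(String[] args) {
        Set<String> members = new HashSet<>(Arrays.asList("east-dht5", "west-dht5", "east-dht6"));
        Command get = new Command("ClusteredGetCommand", "east-dht6");
        System.out.println("no origin   -> rollback? " + wouldRollBack(inlineLockCommand(get), members));
        System.out.println("with origin -> rollback? " + wouldRollBack(inlineLockCommandWithOrigin(get), members));
    }
}
```

The sketch only shows why propagating the origin avoids the spurious rollback in this simplified model; it says nothing about which of the two options is the better fix in the real interceptor chain.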
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (ISPN-4678) Trim down ExpandableMarshalledValueByteStream to the actual data length before storing in cache
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4678?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4678:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1122631|https://bugzilla.redhat.com/show_bug.cgi?id=1122631] from MODIFIED to ON_QA
> Trim down ExpandableMarshalledValueByteStream to the actual data length before storing in cache
> -----------------------------------------------------------------------------------------------
>
> Key: ISPN-4678
> URL: https://issues.jboss.org/browse/ISPN-4678
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.Beta1
> Reporter: Martin Gencur
> Assignee: Martin Gencur
> Fix For: 7.0.0.CR1
>
>
> When the storeAsBinary configuration flag is used, the ExpandableMarshalledValueByteStream is used to serialize objects to byte arrays.
> However, for value sizes up to 4kB, almost half of the space for each cache entry can be wasted because the ByteStream is stored as is, with the additional space reserved for future expansion.
> For value sizes greater than 4kB, only an additional 25% of the actual value size can be wasted. But as these values can be quite large, the wasted space gets bigger too.
> It is not necessary to store the ExpandableMarshalledValueByteStream as is; it can be shrunk to the actual data size before being stored in the cache.
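A minimal sketch of the idea, assuming a growth policy like the one described above (doubling below 4kB, growing by 25% above it). `GrowableBuffer` is a hypothetical stand-in written for this illustration and is not the ExpandableMarshalledValueByteStream class itself; the 4kB threshold and growth factors are taken from the description, everything else is assumed.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an expandable byte stream; not Infinispan's class. */
final class GrowableBuffer {
    private static final int DOUBLING_LIMIT = 4 * 1024; // assumed threshold, per the description

    private byte[] buf;
    private int count; // number of bytes actually written

    GrowableBuffer(int initialCapacity) {
        buf = new byte[Math.max(1, initialCapacity)];
    }

    void write(byte[] data) {
        ensureCapacity(count + data.length);
        System.arraycopy(data, 0, buf, count, data.length);
        count += data.length;
    }

    private void ensureCapacity(int needed) {
        if (needed <= buf.length) {
            return;
        }
        // Double while small, grow by 25% once the buffer has reached the limit.
        int grown = buf.length < DOUBLING_LIMIT ? buf.length * 2 : buf.length + buf.length / 4;
        buf = Arrays.copyOf(buf, Math.max(grown, needed));
    }

    int size() { return count; }

    int capacity() { return buf.length; }

    /** Returns a copy holding exactly the written bytes, dropping the spare capacity. */
    byte[] trimmed() {
        return Arrays.copyOf(buf, count);
    }

    public static void main(String[] args) {
        GrowableBuffer b = new GrowableBuffer(128);
        for (int i = 0; i < 21; i++) {
            b.write(new byte[100]);
        }
        System.out.println("size=" + b.size() + " capacity=" + b.capacity()
                + " trimmed=" + b.trimmed().length);
    }
}
```

Storing the result of `trimmed()` instead of the backing array trades one extra copy at store time for not keeping the unused capacity in memory for the lifetime of the entry.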
--
[JBoss JIRA] (ISPN-4693) Locks acquired before a partition will never be released
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4693?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4693:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1155692|https://bugzilla.redhat.com/show_bug.cgi?id=1155692] from MODIFIED to ON_QA
> Locks acquired before a partition will never be released
> --------------------------------------------------------
>
> Key: ISPN-4693
> URL: https://issues.jboss.org/browse/ISPN-4693
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 6.0.2.Final
> Reporter: Erik Salter
> Assignee: Dan Berindei
> Fix For: 7.0.0.CR2, 7.0.0.Final
>
>
> Consider the following case:
> 1. A prepares a transaction for key K on nodes B and C.
> 2. The primary owner B acquires and registers this lock.
> 3. The cluster partitions: A and C remain in one partition, and B ends up in a separate partition.
> 4. The transaction rolls back. However, since B left the cluster, the rollback will not be sent to B.
> 5. B rejoins the cluster. It still has a lock acquired for key K.
> This can lead to a lock being held indefinitely.
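The five steps above can be replayed with a toy model. Everything here is hypothetical and only for illustration: `PartitionLockSketch`, `Node`, and `rollback` are invented names, and the model assumes the rollback is delivered only to nodes in the sender's current view, as step 4 states. It does not reflect how Infinispan's lock manager is actually implemented.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy replay of the scenario; a hypothetical model, not Infinispan code. */
final class PartitionLockSketch {

    static final class Node {
        final String name;
        final Map<String, String> locks = new HashMap<>(); // key -> transaction holding the lock

        Node(String name) {
            this.name = name;
        }
    }

    /** Delivers the rollback only to targets that are in the sender's current view (step 4). */
    static void rollback(String tx, Collection<Node> targets, Set<String> view) {
        for (Node n : targets) {
            if (view.contains(n.name)) {
                n.locks.values().removeIf(tx::equals);
            }
        }
    }

    public static void main(String[] args) {
        Node b = new Node("B");
        Node c = new Node("C");
        b.locks.put("K", "tx1");                                          // steps 1-2: B locks K for tx1
        Set<String> viewOfA = new HashSet<>(Arrays.asList("A", "C"));     // step 3: B is partitioned away
        rollback("tx1", Arrays.asList(b, c), viewOfA);                    // step 4: B never sees the rollback
        System.out.println("B still holds lock on K: " + b.locks.containsKey("K")); // step 5
    }
}
```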
--
[JBoss JIRA] (ISPN-4900) Split-brain: cancelled ST results in missing data
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4900?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4900:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1158535|https://bugzilla.redhat.com/show_bug.cgi?id=1158535] from MODIFIED to ON_QA
> Split-brain: cancelled ST results in missing data
> -------------------------------------------------
>
> Key: ISPN-4900
> URL: https://issues.jboss.org/browse/ISPN-4900
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.0.0.Final
>
> Attachments: log.txt
>
>
> 1. Cluster [A, B, C, D]; in CH 1, segment X is owned by [D, C].
> 2. Split brain [A, B], [C, D]: A and B detect that D is missing, so they get view [A, B, C] and start rebalancing; in CH 2, segment X is owned by [C, B].
> 3. A and B get a new view [A, B] (C is missing), the state transfer of X is cancelled, and the nodes enter degraded mode.
> 4. The split brain is fixed, and all nodes find each other and merge. B becomes AVAILABLE but still does not have the data for X.
> 5. Subsequent cache.get() requests on B return null.
--