]
Paul Ferraro commented on ISPN-5955:
------------------------------------
[~dan.berindei] Can we backport this fix to 8.1.x?
FAIL_SILENTLY doesn't always prevent rollback with pessimistic
transactions
---------------------------------------------------------------------------
Key: ISPN-5955
URL:
https://issues.jboss.org/browse/ISPN-5955
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 8.0.1.Final, 8.1.0.Beta1
Reporter: Dan Berindei
Assignee: Dan Berindei
Labels: testsuite_stability
Fix For: 8.2.0.Beta1
The {{FAIL_SILENTLY}} should "prevent a failing operation from affecting any ongoing
JTA transactions", but sometimes this doesn't work properly.
{{FlagsReplicationTest}} has a transaction using both {{FAIL_SILENTLY}} and
{{SKIP_LOCKING}} in the same transaction:
# A {{lock(FAIL_SILENTLY)(key)}} operation fails.
Both on the primary owner and on the originator, PessimisticLockingInterceptor sends a
TxCompletionNotificationCommand to all the nodes affected by the tx so far, to release the
locks. This marks the transaction as completed, but doesn't mark it as rolled back.
# A {{remove(SKIP_LOCKING)(key)}} operation should then succeed.
But the operation will invoke a {{ClusteredGetCommand(key, acquireRemoteLock=true,
SKIP_LOCKING)}} on both owners, which will in turn invoke locally a
{{LockControlCommand(key, SKIP_LOCKING)}}.
At this point, {{TxInterceptor}} sees that the transaction is already completed, and
invokes a {{RollbackCommand}} to mark it as rolled back, then remove it from the
transaction table. The command still succeeds.
# The test then tries to commit the transaction. Usually, the {{PrepareCommand}}
doesn't find the remote transaction in the transaction table, and it succeeds.
But some of the time, because the {{ClusteredGetCommand}} command only uses the first
response, the remote transaction is not removed from the transaction table by the time the
{{PrepareCommand}} is executed on one of the owners.
If that happens, the {{PrepareCommand}} executes with a remote transaction instance
that's already marked for rollback, and fails when trying to put the key in the
context.
{noformat}
15:42:56,736 ERROR (remote-thread-NodeF-p1112-t5:GlobalTx:NodeG-6318:216)
[InvocationContextInterceptor] ISPN000136: Error executing command LockControlCommand,
writing keys []
org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after
0 milliseconds for key replication.FlagsReplicationTest and requestor
GlobalTx:NodeG-6318:216. Lock is held by GlobalTx:NodeG-6318:215
15:42:56,802 TRACE (OOB-1,NodeF-64741:) [NonTotalOrderTxPerCacheInboundInvocationHandler]
Calling perform() on ClusteredGetCommand{key=replication.FlagsReplicationTest,
flags=[SKIP_LOCKING]}
15:42:56,803 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216)
[InvocationContextInterceptor] Invoked with command LockControlCommand{cache=dist,
keys=[replication.FlagsReplicationTest], flags=[SKIP_LOCKING], unlock=false,
gtx=GlobalTx:NodeG-6318:216} and InvocationContext
[org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
15:42:56,822 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216)
[InvocationContextInterceptor] Invoked with command PrepareCommand
{modifications=[RemoveCommand{key=replication.FlagsReplicationTest, value=null,
flags=[SKIP_LOCKING], valueMatcher=MATCH_ALWAYS}], onePhaseCommit=true,
gtx=GlobalTx:NodeG-6318:216, cacheName='dist', topologyId=6} and InvocationContext
[org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
15:42:56,832 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216) [TxInterceptor] Rolling
back remote transaction GlobalTx:NodeG-6318:216 because either already completed (true) or
originator no longer in the cluster (false).
15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216)
[EntryFactoryImpl] Creating new entry for key replication.FlagsReplicationTest
15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216)
[BaseSequentialInvocationContext] Interceptor class
org.infinispan.interceptors.EntryWrappingInterceptor threw exception
org.infinispan.transaction.xa.InvalidTransactionException: This remote transaction
GlobalTx:NodeG-6318:216 is already rolled back
15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor]
invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false,
alreadyCompleted=true
15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor]
Rolling back remote transaction GlobalTx:NodeG-6318:216 because either already completed
(true) or originator no longer in the cluster (false).
{noformat}
Possible actions:
* Never try to release locks from a {{LockControlCommand}}, wait for the
{{RollbackCommand}} instead. This could cause problems if the primary owner dies, the
transaction tries to lock the same entry again, and the backup owner that became primary
wrongfully assumes that it is a proper owner.
* Use a {{LockControlCommand(unlock=true)}} instead of a
{{TxCompletionNotificationCommand}} to release the backup locks after an operation with
{{FAIL_SILENTLY}} failed.
* Don't set {{acquireRemoteLock=true}} when the {{SKIP_LOCKING}} flag has been set.