Dan Berindei updated ISPN-3803:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request:
Stale locks during state transfer in non-tx caches
--------------------------------------------------
Key: ISPN-3803
URL:
https://issues.jboss.org/browse/ISPN-3803
Project: Infinispan
Issue Type: Bug
Components: Locking and Concurrency, State transfer
Affects Versions: 6.0.0.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Labels: testsuite_stability
Fix For: 7.0.0.Final
The problem is that {{SingleKeyNonTxInvocationContext.addLockedKey()}} only sets an
{{isLocked}} flag; the actual key is set later, when the entry is wrapped and inserted into
the context. If the topology changes between the lock acquisition and the entry wrapping,
{{SingleKeyNonTxInvocationContext.getLockedKeys()}} will return an empty set and the lock
won't be released.
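The sequence can be reproduced with a small standalone model. This is only a sketch of the failure mode, not the Infinispan code: {{StaleLockSketch}}, {{SingleKeyContext}}, {{wrapEntry()}} and {{lockLeaks()}} are hypothetical names, and a plain {{ReentrantLock}} stands in for the per-key lock.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class StaleLockSketch {

    /** Hypothetical, simplified stand-in for a single-key non-tx invocation context. */
    static class SingleKeyContext {
        private boolean isLocked; // set when the lock is acquired
        private Object key;       // set only when the entry is wrapped

        void addLockedKey(Object ignoredKey) {
            // Only the flag is recorded; the key itself is not stored here.
            isLocked = true;
        }

        void wrapEntry(Object key) {
            this.key = key;
        }

        Set<Object> getLockedKeys() {
            // With no wrapped entry there is no key to report, even if isLocked is true.
            if (isLocked && key != null) {
                return Collections.singleton(key);
            }
            return Collections.<Object>emptySet();
        }
    }

    /** Returns true if the lock is still held after the release step. */
    static boolean lockLeaks(boolean topologyChangedBeforeWrapping) {
        ReentrantLock lock = new ReentrantLock(); // stands in for the per-key lock
        SingleKeyContext ctx = new SingleKeyContext();

        lock.lock();
        ctx.addLockedKey("key_52");
        if (!topologyChangedBeforeWrapping) {
            // Assumption: wrapping is skipped when the topology has changed.
            ctx.wrapEntry("key_52");
        }

        // Release step: unlock once for every key the context reports as locked.
        for (Object k : ctx.getLockedKeys()) {
            lock.unlock();
        }
        return lock.isLocked();
    }

    public static void main(String[] args) {
        System.out.println("normal path leaks lock: " + lockLeaks(false));        // false
        System.out.println("topology change leaks lock: " + lockLeaks(true));     // true
    }
}
```

In this model the release step relies entirely on {{getLockedKeys()}}, so a set {{isLocked}} flag without a stored key leaves the lock held.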
After the topology change this node is no longer the primary owner of the key, so future
commands won't try to acquire the lock here and the stale lock is harmless most of the time.
However, if a command was already waiting for the lock, it will time out instead of retrying
on the new primary owner.
This causes random failures in NonTxPutIfAbsentDuringJoinStressTest:
{noformat}
12:34:50,960 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key_52,
value=value_52_2, flags=null, putIfAbsent=true, valueMatcher=MATCH_EXPECTED,
metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext
[org.infinispan.context.SingleKeyNonTxInvocationContext@478c9796]
12:34:50,960 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[NonTransactionalLockingInterceptor] Are (NodeA-19338) we the lock owners for key
'key_52'? true
12:34:50,960 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[LockManagerImpl] Attempting to lock key_52 with acquisition timeout of 10000 millis
12:34:50,962 TRACE (remote-thread-1,NodeA:___defaultcache)
[NonTransactionalLockingInterceptor] Are (NodeA-19338) we the lock owners for key
'key_52'? true
12:34:50,962 TRACE (remote-thread-1,NodeA:___defaultcache) [LockManagerImpl] Attempting
to lock key_52 with acquisition timeout of 10000 millis
12:34:50,965 TRACE (asyncTransportThread-0,NodeA:___defaultcache) [StateConsumerImpl]
Received new topology for cache ___defaultcache, isRebalance = false, isMember = true,
topology = CacheTopology{id=4, currentCH=DefaultConsistentHash{numSegments=60,
numOwners=2, members=[NodeA-19338, NodeB-49967, NodeC-56763]}, pendingCH=null}
12:34:50,966 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[LockManagerImpl] Successfully acquired lock key_52!
*** 12:34:50,966 TRACE
(ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[EntryWrappingInterceptor] Wrapping entry 'key_52'? false
12:34:50,966 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[StateTransferInterceptor] Retrying command because of topology change:
PutKeyValueCommand{key=key_52, value=value_52_2, flags=null, putIfAbsent=true,
valueMatcher=MATCH_EXPECTED, metadata=EmbeddedMetadata{version=null}, successful=true}
12:34:50,978 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[NonTransactionalLockingInterceptor] Are (NodeA-19338) we the lock owners for key
'key_52'? false
12:34:50,978 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[EntryWrappingInterceptor] Wrapping entry 'key_52'? false
12:34:50,978 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[BaseDistributionInterceptor] I'm not the primary owner, so sending the command to the
primary owner(NodeC-56763) in order to be forwarded
12:34:51,007 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache)
[EntryWrappingInterceptor] The return value is value_52_0
12:35:00,963 TRACE (remote-thread-1,NodeA:___defaultcache)
[ReentrantPerEntryLockContainer] Timed out attempting to acquire lock for key key_52 after
10 seconds
12:35:00,963 DEBUG (remote-thread-1,NodeA:___defaultcache) [LockManagerImpl] Failed to
acquire lock key_52, owner is
Thread[ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest,5,]
12:35:00,963 DEBUG (remote-thread-1,NodeA:___defaultcache) [LockManagerImpl] This
transaction (Thread[remote-thread-1,NodeA,5,main]) already owned locks []
12:35:00,966 ERROR (testng-NonTxPutIfAbsentDuringJoinStressTest:)
[UnitTestTestNGListener] Test
testNodeJoiningDuringPutIfAbsent(org.infinispan.distribution.rehash.NonTxPutIfAbsentDuringJoinStressTest)
failed.
java.util.concurrent.ExecutionException: org.infinispan.remoting.RemoteException:
ISPN000217: Received exception from NodeA-19338, see cause for remote stack trace
Caused by: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after
[10 seconds] on key [key_52] for requestor [Thread[remote-thread-1,NodeA,5,main]]! Lock
held by [Thread[ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest,5,]]{noformat}