[infinispan-issues] [JBoss JIRA] (ISPN-3803) Stale locks during state transfer in non-tx caches

Dan Berindei (JIRA) jira-events at lists.jboss.org
Tue Dec 10 02:41:05 EST 2013


Dan Berindei created ISPN-3803:
----------------------------------

             Summary: Stale locks during state transfer in non-tx caches
                 Key: ISPN-3803
                 URL: https://issues.jboss.org/browse/ISPN-3803
             Project: Infinispan
          Issue Type: Bug
          Components: Locking and Concurrency, State transfer
    Affects Versions: 6.0.0.Final
            Reporter: Dan Berindei
            Assignee: Dan Berindei
            Priority: Critical
             Fix For: 7.0.0.Final


The problem is that {{SingleKeyNonTxInvocationContext.addLockedKey()}} only sets an {{isLocked}} flag; the actual key is set only when the entry is wrapped and inserted into the context. If the topology changes between the lock acquisition and the entry wrapping, the entry is never wrapped, so {{SingleKeyNonTxInvocationContext.getLockedKeys()}} returns an empty set and the lock is never released.
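
To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the Infinispan code: {{SingleKeyContextSketch}}, {{StaleLockDemo}}, {{simulate()}} and the {{heldLocks}} set are hypothetical stand-ins, and the real context and lock manager are considerably more involved. It only assumes what the description states: locking sets a flag, wrapping records the key, and the release step relies on {{getLockedKeys()}}.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of a single-key non-tx invocation context.
// This is NOT the actual Infinispan class.
class SingleKeyContextSketch {
    private Object key;        // recorded only when the entry is wrapped
    private boolean isLocked;  // recorded when the lock is acquired

    void addLockedKey(Object lockedKey) {
        // Only remembers that a lock is held, not which key it protects.
        isLocked = true;
    }

    void wrapEntry(Object wrappedKey) {
        key = wrappedKey;
    }

    Set<Object> getLockedKeys() {
        if (isLocked && key != null) {
            return Collections.singleton(key);
        }
        // No wrapped entry: the key is unknown, so nothing is reported.
        return Collections.<Object>emptySet();
    }
}

class StaleLockDemo {
    // Returns the locks still held after the command finishes.
    static Set<Object> simulate(boolean topologyChangedBeforeWrapping) {
        Set<Object> heldLocks = new HashSet<>(); // stand-in for the lock manager
        SingleKeyContextSketch ctx = new SingleKeyContextSketch();

        // 1. Acquire the lock and tell the context about it.
        heldLocks.add("key_52");
        ctx.addLockedKey("key_52");

        // 2. Wrapping is skipped if the topology changed in the meantime.
        if (!topologyChangedBeforeWrapping) {
            ctx.wrapEntry("key_52");
        }

        // 3. Release whatever the context reports as locked.
        heldLocks.removeAll(ctx.getLockedKeys());
        return heldLocks;
    }

    public static void main(String[] args) {
        System.out.println("no topology change: " + simulate(false));
        System.out.println("topology change:    " + simulate(true));
    }
}
```

In this sketch the second call leaves {{key_52}} in {{heldLocks}}, which corresponds to the stale lock. One way to avoid it in the sketch would be for {{addLockedKey()}} to store the key itself; whether that matches the eventual fix is not stated here.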

Future commands won't try to acquire the lock on this node, because it is no longer the primary owner of the key, so the stale lock is harmless most of the time. But if there was already a command waiting for the lock, that command will time out (instead of retrying on the new primary owner).
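
The effect on a waiting command can be sketched with a plain {{java.util.concurrent.locks.ReentrantLock}}. This is only an illustration under assumptions: {{StaleLockWaiterDemo}} and {{waiterAcquires()}} are hypothetical names, the timeout is shortened from 10 seconds to a few hundred milliseconds, and the real lock container does more than this.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class StaleLockWaiterDemo {
    // Returns true if the waiting thread obtains the lock within the timeout.
    static boolean waiterAcquires(boolean ownerReleases, long timeoutMillis)
            throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();

        // The "owner" acquires the lock and, in the buggy case, finishes
        // without ever releasing it.
        Thread owner = new Thread(() -> {
            lock.lock();
            if (ownerReleases) {
                lock.unlock();
            }
        });
        owner.start();
        owner.join();

        // The "waiter" (this thread) tries to lock with a timeout. A
        // ReentrantLock is not released when its owner thread ends, so in
        // the buggy case this waits for the full timeout and then fails.
        boolean acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.unlock();
        }
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("owner releases: " + waiterAcquires(true, 200));
        System.out.println("owner leaks:    " + waiterAcquires(false, 200));
    }
}
```

The waiter has no way to learn that the lock is stale, so it can only wait out the acquisition timeout.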

This causes random failures in NonTxPutIfAbsentDuringJoinStressTest:

{noformat}
12:34:50,960 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key_52, value=value_52_2, flags=null, putIfAbsent=true, valueMatcher=MATCH_EXPECTED, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@478c9796]
12:34:50,960 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [NonTransactionalLockingInterceptor] Are (NodeA-19338) we the lock owners for key 'key_52'? true
12:34:50,960 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [LockManagerImpl] Attempting to lock key_52 with acquisition timeout of 10000 millis
12:34:50,962 TRACE (remote-thread-1,NodeA:___defaultcache) [NonTransactionalLockingInterceptor] Are (NodeA-19338) we the lock owners for key 'key_52'? true
12:34:50,962 TRACE (remote-thread-1,NodeA:___defaultcache) [LockManagerImpl] Attempting to lock key_52 with acquisition timeout of 10000 millis
12:34:50,965 TRACE (asyncTransportThread-0,NodeA:___defaultcache) [StateConsumerImpl] Received new topology for cache ___defaultcache, isRebalance = false, isMember = true, topology = CacheTopology{id=4, currentCH=DefaultConsistentHash{numSegments=60, numOwners=2, members=[NodeA-19338, NodeB-49967, NodeC-56763]}, pendingCH=null}
12:34:50,966 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [LockManagerImpl] Successfully acquired lock key_52!
*** 12:34:50,966 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [EntryWrappingInterceptor] Wrapping entry 'key_52'? false
12:34:50,966 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [StateTransferInterceptor] Retrying command because of topology change: PutKeyValueCommand{key=key_52, value=value_52_2, flags=null, putIfAbsent=true, valueMatcher=MATCH_EXPECTED, metadata=EmbeddedMetadata{version=null}, successful=true}
12:34:50,978 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [NonTransactionalLockingInterceptor] Are (NodeA-19338) we the lock owners for key 'key_52'? false
12:34:50,978 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [EntryWrappingInterceptor] Wrapping entry 'key_52'? false
12:34:50,978 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [BaseDistributionInterceptor] I'm not the primary owner, so sending the command to the primary owner(NodeC-56763) in order to be forwarded
12:34:51,007 TRACE (ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest:___defaultcache) [EntryWrappingInterceptor] The return value is value_52_0
12:35:00,963 TRACE (remote-thread-1,NodeA:___defaultcache) [ReentrantPerEntryLockContainer] Timed out attempting to acquire lock for key key_52 after 10 seconds
12:35:00,963 DEBUG (remote-thread-1,NodeA:___defaultcache) [LockManagerImpl] Failed to acquire lock key_52, owner is Thread[ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest,5,]
12:35:00,963 DEBUG (remote-thread-1,NodeA:___defaultcache) [LockManagerImpl] This transaction (Thread[remote-thread-1,NodeA,5,main]) already owned locks []
12:35:00,966 ERROR (testng-NonTxPutIfAbsentDuringJoinStressTest:) [UnitTestTestNGListener] Test testNodeJoiningDuringPutIfAbsent(org.infinispan.distribution.rehash.NonTxPutIfAbsentDuringJoinStressTest) failed.
java.util.concurrent.ExecutionException: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from NodeA-19338, see cause for remote stack trace
Caused by: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [key_52] for requestor [Thread[remote-thread-1,NodeA,5,main]]! Lock held by [Thread[ForkThread-3,NonTxPutIfAbsentDuringJoinStressTest,5,]]{noformat}


