Pedro Ruivo updated ISPN-5885:
------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request:
NonTotalOrderPerCacheInvocationHandlerImpl should release locks
---------------------------------------------------------------
Key: ISPN-5885
URL:
https://issues.jboss.org/browse/ISPN-5885
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 8.0.1.Final, 8.1.0.Alpha2
Reporter: Dan Berindei
Assignee: Pedro Ruivo
Labels: testsuite_stability
Fix For: 8.1.0.Final, 8.0.3.Final
Attachments: ThreeNodesReplicatedSplitAndMergeTest.log.zip
Traditionally, the locking interceptor is the one that releases key locks. But after the
ISPN-2849 fix, the locks are acquired by {{NonTotalOrderPerCacheInvocationHandlerImpl}}
before the interceptor chain even starts executing. If there's an exception between
the lock acquisition and the locking interceptor, the locks will never be released.
There is currently one situation where I have reproduced this, in
{{ThreeNodesReplicatedSplitAndMergeTest.testSplitAndMerge0}}. Node {{C}} is split into a
partition by itself, and then it is merged with the {{AB}} partition. After the merge,
JGroups sometimes replays on {{C}} the put command that was broadcast in the {{AB}} partition.
{{C}}'s cache is still in degraded mode, so the write fails, but its lock is never
released.
{noformat}
12:53:41,088 INFO (testng-ThreeNodesReplicatedSplitAndMergeTest:[]) [JGroupsTransport]
ISPN000093: Received new, MERGED cluster view for channel ISPN:
MergeView::[NodeA-44116|10] (3) [NodeA-44116, NodeB-58097, NodeC-3920], 2 subgroups:
[NodeA-44116|8] (2) [NodeA-44116, NodeB-58097], [NodeC-3920|9] (1) [NodeC-3920]
12:53:42,143 TRACE (OOB-1,NodeC-3920:[]) [GlobalInboundInvocationHandler] Attempting to
execute CacheRpcCommand: SingleRpcCommand{cacheName='___defaultcache',
command=PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22,
flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true}} [sender=NodeB-58097]
12:53:42,144 TRACE (OOB-1,NodeC-3920:[]) [DefaultLockManager] Lock
key=MagicKey#null{ced8b1f2@NodeC-3920/9} for owner=CommandUUID{address=NodeB-58097,
id=48899}. timeout=10000 (MILLISECONDS)
12:53:42,144 TRACE (OOB-1,NodeC-3920:[]) [InfinispanLock]
LockPlaceHolder{lockState=ACQUIRED, owner=CommandUUID{address=NodeB-58097, id=48899}}
successfully acquired the lock.
12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [InvocationContextInterceptor]
Invoked with command PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9},
value=v22, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true} and InvocationContext
[org.infinispan.context.SingleKeyNonTxInvocationContext@67bb1b50]
12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [PartitionHandlingManagerImpl]
Checking availability for key=MagicKey#null{ced8b1f2@NodeC-3920/9}, status=DEGRADED_MODE
12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [PartitionHandlingManagerImpl]
Partition is in DEGRADED_MODE mode, access is not allowed for key
MagicKey#null{ced8b1f2@NodeC-3920/9}
12:53:42,144 ERROR (remote-thread-NodeC-p36993-t5:[]) [InvocationContextInterceptor]
ISPN000136: Execution error
org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key
'MagicKey#null{ced8b1f2@NodeC-3920/9}' is not available. Not all owners are in
this partition
12:53:42,144 WARN (remote-thread-NodeC-p36993-t5:[])
[NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling
command SingleRpcCommand{cacheName='___defaultcache',
command=PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22,
flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true}}
{noformat}
Of course, knowing that {{C}} is in degraded mode, perhaps we should not try to acquire
the lock at all. But then we'd move even more responsibility into the invocation
handler. There's also the possibility of having exceptions caused by -topology changes
or- programming errors, so a solution that handles any exception would be preferable.
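
One way to picture a solution that handles any exception is the minimal Java sketch below. It is not Infinispan code: {{SimpleLockManager}}, {{handle}} and the {{Supplier}} standing in for the interceptor chain are all hypothetical names made up for illustration. The only point is that whoever acquires the lock before the chain starts also releases it in a {{finally}} block, and that the release is a no-op if the lock has already been released further down the chain.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class LockReleaseSketch {

    /** Hypothetical stand-in for a lock manager; NOT Infinispan's LockManager API. */
    static final class SimpleLockManager {
        private final Map<Object, Object> owners = new ConcurrentHashMap<>();

        /** Returns true if the key is now held by the given owner. */
        boolean tryLock(Object key, Object owner) {
            Object current = owners.putIfAbsent(key, owner);
            return current == null || current.equals(owner);
        }

        /** Idempotent: removes the entry only if this owner still holds the key. */
        void unlock(Object key, Object owner) {
            owners.remove(key, owner);
        }

        boolean isLocked(Object key) {
            return owners.containsKey(key);
        }
    }

    /**
     * Hypothetical handler: acquires the lock before the "chain" runs and
     * releases it in finally, so any exception thrown by the chain
     * (availability check, programming error, ...) cannot leak the lock.
     */
    static <T> T handle(SimpleLockManager locks, Object key, Object owner, Supplier<T> chain) {
        if (!locks.tryLock(key, owner)) {
            throw new IllegalStateException("Could not lock " + key);
        }
        try {
            return chain.get();
        } finally {
            // Safe even if the chain already released the lock itself.
            locks.unlock(key, owner);
        }
    }

    public static void main(String[] args) {
        SimpleLockManager locks = new SimpleLockManager();
        try {
            handle(locks, "k", "cmd-1", () -> {
                // Simulates the write being rejected before any lock cleanup runs.
                throw new IllegalStateException("simulated availability failure");
            });
        } catch (IllegalStateException expected) {
            // The failure still propagates to the caller.
        }
        System.out.println("locked after failure: " + locks.isLocked("k"));
    }
}
```

In this sketch the exception still reaches the caller, but the key is no longer locked afterwards. Whether the real handler should release unconditionally or only on failure depends on how lock ownership is shared with the locking interceptor, which the sketch does not model.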