[JBoss JIRA] (ISPN-5885) NonTotalOrderPerCacheInvocationHandlerImpl should release locks
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5885?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5885:
----------------------------------
Fix Version/s: 8.0.3.Final
(was: 8.0.2.Final)
> NonTotalOrderPerCacheInvocationHandlerImpl should release locks
> ---------------------------------------------------------------
>
> Key: ISPN-5885
> URL: https://issues.jboss.org/browse/ISPN-5885
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.0.1.Final, 8.1.0.Alpha2
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Labels: testsuite_stability
> Fix For: 8.1.0.Final, 8.0.3.Final
>
> Attachments: ThreeNodesReplicatedSplitAndMergeTest.log.zip
>
>
> Traditionally, the locking interceptor is the one that releases key locks. But after the ISPN-2849 fix, the locks are acquired by {{NonTotalOrderPerCacheInvocationHandlerImpl}} before the interceptor chain even starts executing. If there's an exception between the lock acquisition and the locking interceptor, the locks will never be released.
> There is currently one situation where I have reproduced this, in {{ThreeNodesReplicatedSplitAndMergeTest.testSplitAndMerge0}}. Node {{C}} is split off into a partition by itself, and then it is merged back with the {{AB}} partition. After the merge, JGroups sometimes replays on {{C}} the put command broadcast in the {{AB}} partition. {{C}}'s cache is still in degraded mode, so the write fails, but its lock is never released.
> {noformat}
> 12:53:41,088 INFO (testng-ThreeNodesReplicatedSplitAndMergeTest:[]) [JGroupsTransport] ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[NodeA-44116|10] (3) [NodeA-44116, NodeB-58097, NodeC-3920], 2 subgroups: [NodeA-44116|8] (2) [NodeA-44116, NodeB-58097], [NodeC-3920|9] (1) [NodeC-3920]
> 12:53:42,143 TRACE (OOB-1,NodeC-3920:[]) [GlobalInboundInvocationHandler] Attempting to execute CacheRpcCommand: SingleRpcCommand{cacheName='___defaultcache', command=PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}} [sender=NodeB-58097]
> 12:53:42,144 TRACE (OOB-1,NodeC-3920:[]) [DefaultLockManager] Lock key=MagicKey#null{ced8b1f2@NodeC-3920/9} for owner=CommandUUID{address=NodeB-58097, id=48899}. timeout=10000 (MILLISECONDS)
> 12:53:42,144 TRACE (OOB-1,NodeC-3920:[]) [InfinispanLock] LockPlaceHolder{lockState=ACQUIRED, owner=CommandUUID{address=NodeB-58097, id=48899}} successfully acquired the lock.
> 12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@67bb1b50]
> 12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [PartitionHandlingManagerImpl] Checking availability for key=MagicKey#null{ced8b1f2@NodeC-3920/9}, status=DEGRADED_MODE
> 12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [PartitionHandlingManagerImpl] Partition is in DEGRADED_MODE mode, access is not allowed for key MagicKey#null{ced8b1f2@NodeC-3920/9}
> 12:53:42,144 ERROR (remote-thread-NodeC-p36993-t5:[]) [InvocationContextInterceptor] ISPN000136: Execution error
> org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'MagicKey#null{ced8b1f2@NodeC-3920/9}' is not available. Not all owners are in this partition
> 12:53:42,144 WARN (remote-thread-NodeC-p36993-t5:[]) [NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command SingleRpcCommand{cacheName='___defaultcache', command=PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}}
> {noformat}
> Of course, knowing that {{C}} is in degraded mode, perhaps we should not try to acquire the lock at all. But then we'd move even more responsibility into the invocation handler. There's also the possibility of having exceptions caused by -topology changes or- programming errors, so a solution that handles any exception would be preferable.
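A minimal sketch of the leak and the generic fix described above, using hypothetical, simplified classes (not the actual Infinispan types): the handler acquires the key lock before the interceptor chain runs, so any exception thrown before the locking interceptor takes over must release the lock, or it leaks forever.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the real lock manager.
class LockManager {
    private final Set<Object> held = new HashSet<>();
    void lock(Object key)        { held.add(key); }
    void unlock(Object key)      { held.remove(key); }
    boolean isLocked(Object key) { return held.contains(key); }
}

class InboundHandler {
    final LockManager lockManager = new LockManager();

    void handle(Object key, Runnable interceptorChain) {
        lockManager.lock(key);       // acquired before the chain starts
        try {
            interceptorChain.run();  // may throw, e.g. an AvailabilityException
                                     // while the cache is in degraded mode
        } catch (RuntimeException e) {
            lockManager.unlock(key); // the fix: release on any failure path
            throw e;
        }
        // On success, the locking interceptor remains responsible for the release.
    }
}
```

Catching any {{RuntimeException}} here is what makes the fix cover programming errors as well, not just the degraded-mode case reproduced in the test.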
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-5955) FAIL_SILENTLY doesn't always prevent rollback with pessimistic transactions
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5955?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5955:
----------------------------------
Fix Version/s: 8.1.0.Final
(was: 8.1.0.CR1)
> FAIL_SILENTLY doesn't always prevent rollback with pessimistic transactions
> ---------------------------------------------------------------------------
>
> Key: ISPN-5955
> URL: https://issues.jboss.org/browse/ISPN-5955
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.0.1.Final, 8.1.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 8.1.0.Final
>
>
> The {{FAIL_SILENTLY}} flag should "prevent a failing operation from affecting any ongoing JTA transactions", but sometimes this doesn't work properly.
> {{FlagsReplicationTest}} has a transaction using both {{FAIL_SILENTLY}} and {{SKIP_LOCKING}} in the same transaction:
> # A {{lock(FAIL_SILENTLY)(key)}} operation fails.
> Both on the primary owner and on the originator, {{PessimisticLockingInterceptor}} sends a {{TxCompletionNotificationCommand}} to all the nodes affected by the tx so far, to release the locks. This marks the transaction as completed, but doesn't mark it as rolled back.
> # A {{remove(SKIP_LOCKING)(key)}} operation should then succeed.
> But the operation will invoke a {{ClusteredGetCommand(key, acquireRemoteLock=true, SKIP_LOCKING)}} on both owners, which will in turn invoke locally a {{LockControlCommand(key, SKIP_LOCKING)}}.
> At this point, {{TxInterceptor}} sees that the transaction is already completed, and invokes a {{RollbackCommand}} to mark it as rolled back, then removes it from the transaction table. The command still succeeds.
> # The test then tries to commit the transaction. Usually, the {{PrepareCommand}} doesn't find the remote transaction in the transaction table, and it succeeds.
> But some of the time, because {{ClusteredGetCommand}} only uses the first response, the remote transaction is not removed from the transaction table by the time the {{PrepareCommand}} is executed on one of the owners.
> If that happens, the {{PrepareCommand}} executes with a remote transaction instance that's already marked for rollback, and fails when trying to put the key in the context.
> {noformat}
> 15:42:56,736 ERROR (remote-thread-NodeF-p1112-t5:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] ISPN000136: Error executing command LockControlCommand, writing keys []
> org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key replication.FlagsReplicationTest and requestor GlobalTx:NodeG-6318:216. Lock is held by GlobalTx:NodeG-6318:215
> 15:42:56,802 TRACE (OOB-1,NodeF-64741:) [NonTotalOrderTxPerCacheInboundInvocationHandler] Calling perform() on ClusteredGetCommand{key=replication.FlagsReplicationTest, flags=[SKIP_LOCKING]}
> 15:42:56,803 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] Invoked with command LockControlCommand{cache=dist, keys=[replication.FlagsReplicationTest], flags=[SKIP_LOCKING], unlock=false, gtx=GlobalTx:NodeG-6318:216} and InvocationContext [org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
> 15:42:56,822 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] Invoked with command PrepareCommand {modifications=[RemoveCommand{key=replication.FlagsReplicationTest, value=null, flags=[SKIP_LOCKING], valueMatcher=MATCH_ALWAYS}], onePhaseCommit=true, gtx=GlobalTx:NodeG-6318:216, cacheName='dist', topologyId=6} and InvocationContext [org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
> 15:42:56,832 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216) [TxInterceptor] Rolling back remote transaction GlobalTx:NodeG-6318:216 because either already completed (true) or originator no longer in the cluster (false).
> 15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [EntryFactoryImpl] Creating new entry for key replication.FlagsReplicationTest
> 15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [BaseSequentialInvocationContext] Interceptor class org.infinispan.interceptors.EntryWrappingInterceptor threw exception org.infinispan.transaction.xa.InvalidTransactionException: This remote transaction GlobalTx:NodeG-6318:216 is already rolled back
> 15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor] invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false, alreadyCompleted=true
> 15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor] Rolling back remote transaction GlobalTx:NodeG-6318:216 because either already completed (true) or originator no longer in the cluster (false).
> {noformat}
> Possible actions:
> * Never try to release locks from a {{LockControlCommand}}, wait for the {{RollbackCommand}} instead. This could cause problems if the primary owner dies, the transaction tries to lock the same entry again, and the backup owner that became primary wrongfully assumes that it is a proper owner.
> * Use a {{LockControlCommand(unlock=true)}} instead of a {{TxCompletionNotificationCommand}} to release the backup locks after an operation with {{FAIL_SILENTLY}} failed.
> * Don't set {{acquireRemoteLock=true}} when the {{SKIP_LOCKING}} flag has been set.
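Whichever action is chosen, the contract to preserve can be modeled in a few lines. The sketch below uses hypothetical classes (not the Infinispan API): a failed lock under {{FAIL_SILENTLY}} returns false but leaves the ongoing transaction committable, whereas an ordinary lock failure marks it rollback-only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a transaction's rollback state.
class Transaction {
    boolean rollbackOnly;
}

class PessimisticLockTable {
    private final Map<String, Transaction> locks = new HashMap<>();

    /** Returns true if the lock was acquired by {@code tx}. */
    boolean lock(Transaction tx, String key, boolean failSilently) {
        Transaction holder = locks.get(key);
        if (holder != null && holder != tx) {
            if (!failSilently) {
                tx.rollbackOnly = true; // normal path poisons the transaction
            }
            return false; // FAIL_SILENTLY: report the failure, nothing else
        }
        locks.put(key, tx);
        return true;
    }

    boolean canCommit(Transaction tx) {
        return !tx.rollbackOnly;
    }
}
```

The bug in {{FlagsReplicationTest}} amounts to the "report the failure, nothing else" branch also marking the transaction completed on remote nodes, which later forces the rollback.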
--
[JBoss JIRA] (ISPN-5937) HotRod client ExpiryTest.testGlobalExpiryPutWithFlag random failures
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5937?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5937:
----------------------------------
Fix Version/s: 8.1.0.Final
(was: 8.1.0.CR1)
> HotRod client ExpiryTest.testGlobalExpiryPutWithFlag random failures
> --------------------------------------------------------------------
>
> Key: ISPN-5937
> URL: https://issues.jboss.org/browse/ISPN-5937
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Server
> Affects Versions: 8.1.0.Beta1
> Reporter: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 8.1.0.Final
>
>
> The put operation succeeds, but the get immediately afterwards doesn't find the entry:
> {noformat}
> java.lang.AssertionError: expected:<v1> but was:<null>
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:101)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:108)
> at org.infinispan.client.hotrod.ExpiryTest.expectCachedThenExpired(ExpiryTest.java:103)
> at org.infinispan.client.hotrod.ExpiryTest.expectExpiryAfterRequest(ExpiryTest.java:99)
> at org.infinispan.client.hotrod.ExpiryTest.testGlobalExpiryPutWithFlag(ExpiryTest.java:46){noformat}
> http://ci.infinispan.org/viewLog.html?buildId=31770&tab=buildResultsDiv&b...
--
[JBoss JIRA] (ISPN-5929) InfinispanQueryIT.testQueryOnFirstNode random failures
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5929?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5929:
----------------------------------
Fix Version/s: 8.1.0.Final
(was: 8.1.0.CR1)
> InfinispanQueryIT.testQueryOnFirstNode random failures
> ------------------------------------------------------
>
> Key: ISPN-5929
> URL: https://issues.jboss.org/browse/ISPN-5929
> Project: Infinispan
> Issue Type: Bug
> Components: Integration, Test Suite - Query
> Affects Versions: 8.1.0.Alpha2
> Reporter: Dan Berindei
> Assignee: Adrian Nistor
> Priority: Blocker
> Fix For: 8.1.0.Final
>
>
> {{InfinispanQueryIT.testQueryOnFirstNode()}} and {{InfinispanQueryIT.testQueryOnSecondNode()}} fail randomly in CI with this assertion:
> {noformat}
> java.lang.AssertionError: expected:<3> but was:<2>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.infinispan.test.integration.as.query.InfinispanQueryIT.testQueryOnFirstNode(InfinispanQueryIT.java:99)
> {noformat}
> Example: http://ci.infinispan.org/viewLog.html?buildId=31810&tab=buildResultsDiv&b...
--
[JBoss JIRA] (ISPN-5989) Use DC.keyEquivalence().hashCode() for hashing into segments
by Radim Vansa (JIRA)
Radim Vansa created ISPN-5989:
---------------------------------
Summary: Use DC.keyEquivalence().hashCode() for hashing into segments
Key: ISPN-5989
URL: https://issues.jboss.org/browse/ISPN-5989
Project: Infinispan
Issue Type: Enhancement
Components: Core
Affects Versions: 8.1.0.Beta1
Reporter: Radim Vansa
Assignee: Radim Vansa
A custom Equivalence is not taken into account in the key -> segment decision.
This is also related to ISPN-3905, the fix should remove the expensive string encoding as well.
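A sketch of the proposed change, using hypothetical helper names rather than the real Infinispan internals: route the key -> segment decision through the configured Equivalence instead of the key's own hashCode(), so that e.g. byte[] keys compared with a byte-array equivalence hash to the same segment regardless of array identity.

```java
import java.util.Arrays;

// Hypothetical minimal Equivalence interface, modeled on the idea of a
// pluggable hashCode/equals pair for keys.
interface Equivalence<T> {
    int hashCode(T obj);
    boolean equals(T a, Object b);
}

final class ByteArrayEquivalence implements Equivalence<byte[]> {
    public int hashCode(byte[] obj) { return Arrays.hashCode(obj); }
    public boolean equals(byte[] a, Object b) {
        return b instanceof byte[] && Arrays.equals(a, (byte[]) b);
    }
}

final class Segments {
    // Spread the equivalence-aware hash, then map it into [0, numSegments).
    static <T> int segmentOf(T key, Equivalence<T> eq, int numSegments) {
        int h = eq.hashCode(key);
        h ^= (h >>> 16);
        return (h & Integer.MAX_VALUE) % numSegments;
    }
}
```

With this routing in place, two distinct byte[] instances with equal contents land in the same segment, which the default identity-based hashCode() cannot guarantee.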
--
[JBoss JIRA] (ISPN-5988) Fine-/Coarse-grained AtomicMap info is not preserved in cache
by Radim Vansa (JIRA)
Radim Vansa created ISPN-5988:
---------------------------------
Summary: Fine-/Coarse-grained AtomicMap info is not preserved in cache
Key: ISPN-5988
URL: https://issues.jboss.org/browse/ISPN-5988
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 8.1.0.Beta1
Reporter: Radim Vansa
Assignee: Pedro Ruivo
In `AtomicMapAPITest#testAtomicMapAfterFineGrainedAtomicMap` we test that one entry cannot be mapped to both a fine-grained and a coarse-grained atomic map. However, the test relies on the local node being an owner, because the check is based on the type of the proxy - and the proxy is not persisted in the cache.
So in practice you can access a fine-grained map as coarse-grained and vice versa.
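One way to make the check node-independent can be sketched as follows, using hypothetical types (not the Infinispan API): persist the map flavour alongside the value, so it replicates to all owners instead of living only in a locally cached proxy.

```java
import java.util.HashMap;
import java.util.Map;

enum MapFlavour { FINE_GRAINED, COARSE_GRAINED }

final class AtomicMapValue {
    final MapFlavour flavour; // stored with the entry, replicated to owners
    AtomicMapValue(MapFlavour flavour) { this.flavour = flavour; }
}

final class AtomicMapRegistry {
    // Stand-in for the cache holding atomic map entries.
    private final Map<String, AtomicMapValue> cache = new HashMap<>();

    AtomicMapValue get(String key, MapFlavour requested) {
        AtomicMapValue existing =
            cache.computeIfAbsent(key, k -> new AtomicMapValue(requested));
        if (existing.flavour != requested) {
            throw new IllegalStateException(
                key + " already holds a " + existing.flavour + " atomic map");
        }
        return existing;
    }
}
```

Since the flavour travels with the entry, a non-owner node performing the lookup sees the same flavour as the owners and can reject the mismatched access.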
--