[JBoss JIRA] (ISPN-5955) FAIL_SILENTLY doesn't always prevent rollback with pessimistic transactions
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5955?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5955:
----------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 8.2.0.Final
8.1.1.Final
Resolution: Done
> FAIL_SILENTLY doesn't always prevent rollback with pessimistic transactions
> ---------------------------------------------------------------------------
>
> Key: ISPN-5955
> URL: https://issues.jboss.org/browse/ISPN-5955
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.0.1.Final, 8.1.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 8.2.0.Beta1, 8.2.0.Final, 8.1.1.Final
>
>
> The {{FAIL_SILENTLY}} flag should "prevent a failing operation from affecting any ongoing JTA transactions", but sometimes this doesn't work properly.
> {{FlagsReplicationTest}} uses both {{FAIL_SILENTLY}} and {{SKIP_LOCKING}} in the same transaction:
> # A {{lock(FAIL_SILENTLY)(key)}} operation fails.
> Both on the primary owner and on the originator, PessimisticLockingInterceptor sends a TxCompletionNotificationCommand to all the nodes affected by the tx so far, to release the locks. This marks the transaction as completed, but doesn't mark it as rolled back.
> # A {{remove(SKIP_LOCKING)(key)}} operation should then succeed.
> But the operation will invoke a {{ClusteredGetCommand(key, acquireRemoteLock=true, SKIP_LOCKING)}} on both owners, which will in turn invoke locally a {{LockControlCommand(key, SKIP_LOCKING)}}.
> At this point, {{TxInterceptor}} sees that the transaction is already completed, and invokes a {{RollbackCommand}} to mark it as rolled back, then remove it from the transaction table. The command still succeeds.
> # The test then tries to commit the transaction. Usually, the {{PrepareCommand}} doesn't find the remote transaction in the transaction table, and it succeeds.
> But some of the time, because the {{ClusteredGetCommand}} only uses the first response, the remote transaction has not yet been removed from the transaction table when the {{PrepareCommand}} is executed on one of the owners.
> If that happens, the {{PrepareCommand}} executes with a remote transaction instance that's already marked for rollback, and fails when trying to put the key in the context.
> {noformat}
> 15:42:56,736 ERROR (remote-thread-NodeF-p1112-t5:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] ISPN000136: Error executing command LockControlCommand, writing keys []
> org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key replication.FlagsReplicationTest and requestor GlobalTx:NodeG-6318:216. Lock is held by GlobalTx:NodeG-6318:215
> 15:42:56,802 TRACE (OOB-1,NodeF-64741:) [NonTotalOrderTxPerCacheInboundInvocationHandler] Calling perform() on ClusteredGetCommand{key=replication.FlagsReplicationTest, flags=[SKIP_LOCKING]}
> 15:42:56,803 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] Invoked with command LockControlCommand{cache=dist, keys=[replication.FlagsReplicationTest], flags=[SKIP_LOCKING], unlock=false, gtx=GlobalTx:NodeG-6318:216} and InvocationContext [org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
> 15:42:56,822 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] Invoked with command PrepareCommand {modifications=[RemoveCommand{key=replication.FlagsReplicationTest, value=null, flags=[SKIP_LOCKING], valueMatcher=MATCH_ALWAYS}], onePhaseCommit=true, gtx=GlobalTx:NodeG-6318:216, cacheName='dist', topologyId=6} and InvocationContext [org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
> 15:42:56,832 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216) [TxInterceptor] Rolling back remote transaction GlobalTx:NodeG-6318:216 because either already completed (true) or originator no longer in the cluster (false).
> 15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [EntryFactoryImpl] Creating new entry for key replication.FlagsReplicationTest
> 15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [BaseSequentialInvocationContext] Interceptor class org.infinispan.interceptors.EntryWrappingInterceptor threw exception org.infinispan.transaction.xa.InvalidTransactionException: This remote transaction GlobalTx:NodeG-6318:216 is already rolled back
> 15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor] invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false, alreadyCompleted=true
> 15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor] Rolling back remote transaction GlobalTx:NodeG-6318:216 because either already completed (true) or originator no longer in the cluster (false).
> {noformat}
> Possible actions:
> * Never try to release locks from a {{LockControlCommand}}, wait for the {{RollbackCommand}} instead. This could cause problems if the primary owner dies, the transaction tries to lock the same entry again, and the backup owner that became primary wrongfully assumes that it is a proper owner.
> * Use a {{LockControlCommand(unlock=true)}} instead of a {{TxCompletionNotificationCommand}} to release the backup locks after an operation with {{FAIL_SILENTLY}} failed.
> * Don't set {{acquireRemoteLock=true}} when the {{SKIP_LOCKING}} flag has been set.
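
The ordering problem described in the steps above can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions: {{RemoteTx}}, {{Owner}} and their methods are hypothetical stand-ins, not Infinispan classes, and the model keeps just the two facts the description relies on (a prepare succeeds when it finds no entry in the transaction table, and fails when it finds one already marked for rollback).

```java
import java.util.HashMap;
import java.util.Map;

public class FailSilentlyRaceSketch {

    /** Hypothetical stand-in for a remote transaction entry; not an Infinispan class. */
    static final class RemoteTx {
        boolean completed;         // set by the completion notification
        boolean markedForRollback; // set when the rollback starts
    }

    /** Hypothetical stand-in for the transaction table on one owner. */
    static final class Owner {
        final Map<String, RemoteTx> txTable = new HashMap<>();

        // The failed lock(FAIL_SILENTLY) releases locks: completed, but not rolled back.
        void completionNotification(String gtx) {
            txTable.computeIfAbsent(gtx, k -> new RemoteTx()).completed = true;
        }

        // First half of the rollback triggered by the later lock control command.
        void markForRollback(String gtx) {
            RemoteTx tx = txTable.get(gtx);
            if (tx != null && tx.completed) {
                tx.markedForRollback = true;
            }
        }

        // Second half of the rollback: forget the transaction.
        void removeFromTable(String gtx) {
            txTable.remove(gtx);
        }

        // Prepare succeeds unless it finds an entry already marked for rollback.
        boolean prepare(String gtx) {
            RemoteTx tx = txTable.get(gtx);
            return tx == null || !tx.markedForRollback;
        }
    }

    /** Replays the steps on one owner; returns whether the prepare succeeded. */
    static boolean simulate(boolean prepareBeforeRemoval) {
        String gtx = "GlobalTx:NodeG-6318:216";
        Owner owner = new Owner();
        owner.completionNotification(gtx);
        owner.markForRollback(gtx);
        if (prepareBeforeRemoval) {
            // Racy ordering: the prepare arrives while the entry is still in the table.
            boolean ok = owner.prepare(gtx);
            owner.removeFromTable(gtx);
            return ok;
        }
        // Usual ordering: the entry is gone before the prepare arrives.
        owner.removeFromTable(gtx);
        return owner.prepare(gtx);
    }

    public static void main(String[] args) {
        System.out.println("usual ordering, prepare succeeds: " + simulate(false));
        System.out.println("racy ordering, prepare succeeds: " + simulate(true));
    }
}
```

The only difference between the two runs is whether the prepare is handled before or after the entry is removed, which is the window the originator leaves open by continuing after the first {{ClusteredGetCommand}} response.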
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-5481) ConfigurationOverrideTest random failures
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5481?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5481:
----------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 8.2.0.Final
Resolution: Done
> ConfigurationOverrideTest random failures
> -----------------------------------------
>
> Key: ISPN-5481
> URL: https://issues.jboss.org/browse/ISPN-5481
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 7.2.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 8.2.0.Beta1, 8.2.0.Final
>
>
> {{ConfigurationOverrideTest}} uses the default global configuration, and it fails when another test has already registered a cache manager MBean in JMX with the same name:
> {noformat}
> org.infinispan.jmx.JmxDomainConflictException: ISPN000034: There's already a JMX MBean instance type=CacheManager,name="DefaultCacheManager" already registered under 'org.infinispan' JMX domain. If you want to allow multiple instances configured with same JMX domain enable 'allowDuplicateDomains' attribute in 'globalJmxStatistics' config element
> at org.infinispan.jmx.JmxUtil.buildJmxDomain(JmxUtil.java:51)
> at org.infinispan.jmx.CacheManagerJmxRegistration.updateDomain(CacheManagerJmxRegistration.java:79)
> at org.infinispan.jmx.CacheManagerJmxRegistration.buildRegistrar(CacheManagerJmxRegistration.java:73)
> at org.infinispan.jmx.AbstractJmxRegistration.registerMBeans(AbstractJmxRegistration.java:37)
> at org.infinispan.jmx.CacheManagerJmxRegistration.start(CacheManagerJmxRegistration.java:41)
> at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:625)
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:218)
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:199)
> at org.infinispan.configuration.ConfigurationOverrideTest.testOverrideWithStore(ConfigurationOverrideTest.java:80)
> {noformat}
> We should verify the other tests as well, to make sure they all use the {{PerThreadMBeanServerLookup}} and/or a unique JMX domain.
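
The conflict itself can be reproduced with nothing but the JDK's {{javax.management}} API. The sketch below is an illustration under assumptions, not the test-suite fix: {{FakeCacheManager}} and the {{org.infinispan.ConfigurationOverrideTest}} domain are hypothetical names, and a private {{MBeanServer}} is used instead of the platform one. It shows that a second registration under the same domain and name is rejected, while a per-test domain is accepted.

```java
import javax.management.InstanceAlreadyExistsException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class JmxDomainSketch {

    /** Hypothetical management interface; standard MBeans need a public *MBean interface. */
    public interface FakeCacheManagerMBean {
        String getStatus();
    }

    /** Hypothetical MBean standing in for a cache manager. */
    public static class FakeCacheManager implements FakeCacheManagerMBean {
        @Override
        public String getStatus() {
            return "RUNNING";
        }
    }

    /** Returns true if the registration succeeded, false if the name was already taken. */
    static boolean register(MBeanServer server, String domain) {
        try {
            ObjectName name = new ObjectName(
                    domain + ":type=CacheManager,name=\"DefaultCacheManager\"");
            server.registerMBean(new FakeCacheManager(), name);
            return true;
        } catch (InstanceAlreadyExistsException e) {
            return false;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A private server keeps the demo away from the platform MBean server.
        MBeanServer server = MBeanServerFactory.newMBeanServer();
        System.out.println("first, default domain: " + register(server, "org.infinispan"));
        System.out.println("second, default domain: " + register(server, "org.infinispan"));
        // Hypothetical per-test domain, one way of making the name unique.
        System.out.println("per-test domain: "
                + register(server, "org.infinispan.ConfigurationOverrideTest"));
    }
}
```

Using a separate {{MBeanServer}} per test thread avoids the clash in a similar way, because the two registrations no longer share a server.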
--
[JBoss JIRA] (ISPN-6107) State transfer should not fetch segments that were added during a rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6107?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6107:
-------------------------------
Status: Open (was: New)
> State transfer should not fetch segments that were added during a rebalance
> ---------------------------------------------------------------------------
>
> Key: ISPN-6107
> URL: https://issues.jboss.org/browse/ISPN-6107
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 8.1.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 8.2.0.Beta1
>
>
> When the last owner of a segment leaves the cache, the coordinator will update the consistent hash and replace that owner with {{numOwners}} owners (so that a segment always has at least 1 owner). If there is a rebalance in progress, it could be that both the current and the pending CH lost all the owners of a segment, and the coordinator will assign new owners in both CHs (not necessarily the same).
> Sometimes, this causes tests that create clusters with many nodes to spend a lot of time shutting down the cluster. Here's an example:
> # Cluster ABCDE, coordinator A, topology id = 0, currentCH = \{0: CD, 1: BC\}, pendingCH = null
> # D leaves
> # A broadcasts a REBALANCE_START command with topology id 1, members = ABCE, currentCH = \{0: C, 1: BC\}, pendingCH = \{0: BC, 1: BC\}
> # A and E confirm that they finished the rebalance
> # C leaves before sending the data for segment 0 to B
> # A broadcasts a CH_UPDATE command with topology id 2, members = ABE, currentCH = \{0: AE, 1: B\}, pendingCH = \{0: B, 1: B\}
> # A now owns segment 0 in the writeCH (which is the union of currentCH and pendingCH).
> # A tries to request segment 0 from the other owner in the currentCH, E
> # B confirms that it finished the rebalance
> # A broadcasts a new topology: topology id 3, currentCH = \{0: B, 1: B\}, pendingCH = null
> # E installs topology 3, and throws an IllegalArgumentException when handling A's request for segments
> # A is not able to install topology 3, because it requests the transactions data while holding the lock on the LocalCacheStatus
> # A receives the IllegalArgumentException from E and retries. But because it still has the old topology, it retries on E ad infinitum - using a lot of CPU in the process.
> A requesting segment 0 from E is not a problem in itself - normally E would just send back an empty set of transactions and entries. The problem is that the cluster is able to install a new topology, because A already confirmed receiving all the data, but A is stuck with the old topology.
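
To make the ownership change in the example concrete, the sketch below recomputes the writeCH for topologies 1 and 2 as the union of currentCH and pendingCH, and lists the segments a node gains between them. It is a plain-Java illustration under assumptions: the {{ch}}, {{union}} and {{addedSegments}} helpers are hypothetical and do not correspond to Infinispan's consistent hash classes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class WriteChSketch {

    /** Builds a consistent hash from one owner string per segment, e.g. ch("C", "BC"). */
    static Map<Integer, List<String>> ch(String... ownersPerSegment) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (int seg = 0; seg < ownersPerSegment.length; seg++) {
            result.put(seg, Arrays.asList(ownersPerSegment[seg].split("")));
        }
        return result;
    }

    /** The writeCH of a topology: per segment, the owners from currentCH plus pendingCH. */
    static Map<Integer, Set<String>> union(Map<Integer, List<String>> currentCH,
                                           Map<Integer, List<String>> pendingCH) {
        Map<Integer, Set<String>> writeCH = new TreeMap<>();
        for (Map<Integer, List<String>> hash : Arrays.asList(currentCH, pendingCH)) {
            for (Map.Entry<Integer, List<String>> e : hash.entrySet()) {
                writeCH.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                       .addAll(e.getValue());
            }
        }
        return writeCH;
    }

    /** Segments that the node owns in the new writeCH but did not own in the old one. */
    static Set<Integer> addedSegments(String node,
                                      Map<Integer, Set<String>> oldWriteCH,
                                      Map<Integer, Set<String>> newWriteCH) {
        Set<Integer> added = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> e : newWriteCH.entrySet()) {
            Set<String> before = oldWriteCH.get(e.getKey());
            boolean ownedBefore = before != null && before.contains(node);
            if (e.getValue().contains(node) && !ownedBefore) {
                added.add(e.getKey());
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // Topology 1: currentCH = {0: C, 1: BC}, pendingCH = {0: BC, 1: BC}
        Map<Integer, Set<String>> write1 = union(ch("C", "BC"), ch("BC", "BC"));
        // Topology 2: currentCH = {0: AE, 1: B}, pendingCH = {0: B, 1: B}
        Map<Integer, Set<String>> write2 = union(ch("AE", "B"), ch("B", "B"));
        System.out.println("writeCH in topology 1: " + write1);
        System.out.println("writeCH in topology 2: " + write2);
        System.out.println("segments added for A: " + addedSegments("A", write1, write2));
    }
}
```

In this model A gains segment 0 only because the coordinator reassigned owners after C left, which is the kind of segment the issue title says state transfer should not try to fetch.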
--
[JBoss JIRA] (ISPN-6107) State transfer should not fetch segments that were added during a rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6107?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6107:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/3962
--
[JBoss JIRA] (ISPN-5481) ConfigurationOverrideTest random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5481?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5481:
-------------------------------
Status: Pull Request Sent (was: Reopened)
Git Pull Request: https://github.com/infinispan/infinispan/pull/3877, https://github.com/infinispan/infinispan/pull/3961 (was: https://github.com/infinispan/infinispan/pull/3877)
> ConfigurationOverrideTest random failures
> -----------------------------------------
>
> Key: ISPN-5481
> URL: https://issues.jboss.org/browse/ISPN-5481
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 7.2.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 8.2.0.Beta1
>
>
> {{ConfigurationOverrideTest}} uses the default global configuration, and it fails when another test has already registered a cache manager MBean in JMX with the same name:
> {noformat}
> org.infinispan.jmx.JmxDomainConflictException: ISPN000034: There's already a JMX MBean instance type=CacheManager,name="DefaultCacheManager" already registered under 'org.infinispan' JMX domain. If you want to allow multiple instances configured with same JMX domain enable 'allowDuplicateDomains' attribute in 'globalJmxStatistics' config element
> at org.infinispan.jmx.JmxUtil.buildJmxDomain(JmxUtil.java:51)
> at org.infinispan.jmx.CacheManagerJmxRegistration.updateDomain(CacheManagerJmxRegistration.java:79)
> at org.infinispan.jmx.CacheManagerJmxRegistration.buildRegistrar(CacheManagerJmxRegistration.java:73)
> at org.infinispan.jmx.AbstractJmxRegistration.registerMBeans(AbstractJmxRegistration.java:37)
> at org.infinispan.jmx.CacheManagerJmxRegistration.start(CacheManagerJmxRegistration.java:41)
> at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:625)
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:218)
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:199)
> at org.infinispan.configuration.ConfigurationOverrideTest.testOverrideWithStore(ConfigurationOverrideTest.java:80)
> {noformat}
> We should verify the other tests as well, to make sure they all use the {{PerThreadMBeanServerLookup}} and/or a unique JMX domain.
--
[JBoss JIRA] (ISPN-5481) ConfigurationOverrideTest random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5481?page=com.atlassian.jira.plugin.... ]
Dan Berindei reopened ISPN-5481:
--------------------------------
Found more test methods in {{SecurityXmlFileParsingTest}}, {{ConfigurationOverrideTest}}, and {{ConfigurationUnitTest}} that were using {{new DefaultCacheManager(new GlobalConfigurationBuilder().build())}}.
--
[JBoss JIRA] (ISPN-6118) CacheNotifierImplInitialTransferDistTest random failures
by Dan Berindei (JIRA)
Dan Berindei created ISPN-6118:
----------------------------------
Summary: CacheNotifierImplInitialTransferDistTest random failures
Key: ISPN-6118
URL: https://issues.jboss.org/browse/ISPN-6118
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 8.1.0.Final
Reporter: Dan Berindei
Fix For: 8.2.0.Beta1
The failures are probably related to the default consistent hash factory change.
{noformat}
18:54:56,255 ERROR (testng-CacheNotifierImplInitialTransferDistTest:) [UnitTestTestNGListener] Test testCreateAfterIterationBeganAndSegmentNotCompleteValueNonOwnerClustered(org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest) failed.
java.util.concurrent.TimeoutException: Timed out waiting for event pre_complete_segment_invoked
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:43) ~[test-classes/:?]
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:33) ~[test-classes/:?]
at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testIterationBeganAndSegmentNotComplete(CacheNotifierImplInitialTransferDistTest.java:519) ~[test-classes/:?]
at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testCreateAfterIterationBeganAndSegmentNotCompleteValueNonOwnerClustered(CacheNotifierImplInitialTransferDistTest.java:622) ~[test-classes/:?]
{noformat}
{noformat}
18:55:06,637 ERROR (testng-CacheNotifierImplInitialTransferDistTest:) [UnitTestTestNGListener] Test testCreateAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered(org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest) failed.
java.util.concurrent.TimeoutException: Timed out waiting for event pre_complete_segment_invoked
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:43) ~[test-classes/:?]
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:33) ~[test-classes/:?]
at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testIterationBeganAndSegmentNotComplete(CacheNotifierImplInitialTransferDistTest.java:519) ~[test-classes/:?]
at org.infinispan.notifications.cachelistener.CacheNotifierImplInitialTransferDistTest.testCreateAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered(CacheNotifierImplInitialTransferDistTest.java:633) ~[test-classes/:?]
{noformat}
--