[JBoss JIRA] (ISPN-5883) Node can apply new topology after sending status response
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5883?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-5883:
------------------------------------
This is another way the test can fail, with the same cause:
{noformat}
12:35:10,499 ERROR (testng-OptimisticTxPartitionAndMergeDuringCommitTest:) [UnitTestTestNGListener] Test testDegradedPartition(org.infinispan.partitionhandling.OptimisticTxPartitionAndMergeDuringCommitTest) failed.
java.lang.AssertionError: There are pending transactions!
at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.8.8.jar:?]
at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.8.8.jar:?]
at org.infinispan.test.AbstractInfinispanTest.eventually(AbstractInfinispanTest.java:109) ~[test-classes/:?]
at org.infinispan.test.AbstractInfinispanTest.eventually(AbstractInfinispanTest.java:404) ~[test-classes/:?]
at org.infinispan.test.MultipleCacheManagersTest.assertNoTransactions(MultipleCacheManagersTest.java:560) ~[test-classes/:?]
at org.infinispan.partitionhandling.BaseTxPartitionAndMergeTest.finalAsserts(BaseTxPartitionAndMergeTest.java:90) ~[test-classes/:?]
at org.infinispan.partitionhandling.BaseOptimisticTxPartitionAndMergeTest.doTest(BaseOptimisticTxPartitionAndMergeTest.java:77) ~[test-classes/:?]
at org.infinispan.partitionhandling.OptimisticTxPartitionAndMergeDuringCommitTest.testDegradedPartition(OptimisticTxPartitionAndMergeDuringCommitTest.java:29) ~[test-classes/:?]
{noformat}
If the test cache ({{opt-cache}} or {{pes-cache}}) is not rebalanced, the partially committed transaction is not cleaned up either.
> Node can apply new topology after sending status response
> ---------------------------------------------------------
>
> Key: ISPN-5883
> URL: https://issues.jboss.org/browse/ISPN-5883
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 8.0.1.Final, 7.2.5.Final, 8.1.0.Alpha2
> Reporter: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 8.1.0.Beta2
>
>
> {{LocalTopologyManagerImpl}} is responsible for sending the {{ClusterTopologyControlCommand(GET_STATUS)}} response, and when it sends the response it doesn't check the current view id against the new coordinator's view id. If the old coordinator already sent a topology update before the merge, that topology update might be processed after sending the status response. The new coordinator will send a topology update with a topology id of {{max(status response topology ids) + 1}}. The node will then process the topology update from the old coordinator, but it will ignore the topology update from the new coordinator with the same topology id.
> This is especially common in the partition handling tests, e.g. {{BasePessimisticTxPartitionAndMergeTest}} subclasses, because the test "injects" the JGroups view on each node serially, and often the 4th node sends the status response before it gets the new view.
> {noformat}
> 22:16:37,776 DEBUG (remote-thread-NodeD-p26-t6:[]) [LocalTopologyManagerImpl] Sending cluster status response for view 10
> // Topology from NodeC
> 22:16:37,778 DEBUG (transport-thread-NodeD-p28-t2:[]) [LocalTopologyManagerImpl] Updating local topology for cache pes-cache: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeC-46467, NodeD-30486]}
> // Later, topology from NodeA
> 22:16:37,827 DEBUG (transport-thread-NodeD-p28-t1:[]) [LocalTopologyManagerImpl] Ignoring late consistent hash update for cache pes-cache, current topology is 8: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeA-37631, NodeB-47846, NodeC-46467, NodeD-30486]}
> {noformat}
> As a solution, we can delay sending the status response until we have the same view as the coordinator (or a later one). We already check that the sender is the current coordinator before applying a topology update, so this guarantees that we don't apply further topology updates from the old coordinator. Since the status request is only sent after the new view has been installed, this will not introduce any delays in the vast majority of cases.
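The proposed fix (hold back the status response until the local view has caught up with the coordinator's) could look roughly like the sketch below. This is only an illustration under assumptions: {{ViewGate}}, {{viewInstalled}} and {{awaitView}} are hypothetical names, not existing {{LocalTopologyManagerImpl}} methods, and the real change may be structured differently.

```java
// Hypothetical helper; the class and method names are illustrative, not Infinispan's API.
final class ViewGate {
    private int currentViewId = -1;

    /** Records a newly installed view and wakes up any waiting responders. */
    synchronized void viewInstalled(int viewId) {
        if (viewId > currentViewId) {
            currentViewId = viewId;
            notifyAll();
        }
    }

    /**
     * Blocks until the local view id is at least requiredViewId.
     * Returns false on timeout or interruption, so the caller can fail the
     * status request instead of answering with stale state.
     */
    synchronized boolean awaitView(int requiredViewId, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (currentViewId < requiredViewId) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Round up so we never call wait(0), which would block forever.
                wait((remainingNanos + 999_999L) / 1_000_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A status request handler would call {{awaitView}} with the view id carried by the coordinator's request before building the response. In the common case the view is already installed and the call returns immediately.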
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-5954) Adjust Embedded and Remote Uber Jar packaging
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-5954?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec edited comment on ISPN-5954 at 11/12/15 1:45 PM:
---------------------------------------------------------------------
And another one: java.lang.NoClassDefFoundError: com/sun/jdi/request/EventRequest
{code}
19:38:48,515 INFO [org.jboss.weld.ClassLoading] (MSC service thread 1-8) WELD-000119 Not generating any bean definitions from protostream.javassist.util.HotSwapper because of underlying class loading error
19:38:48,516 DEBUG [org.jboss.weld.ClassLoading] (MSC service thread 1-8) catching: org.jboss.weld.resources.spi.ResourceLoadingException: Error while loading class protostream.javassist.util.HotSwapper
at org.jboss.weld.resources.ClassTransformer.getWeldClass(ClassTransformer.java:199) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.resources.ClassTransformer.loadClass(ClassTransformer.java:151) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.bootstrap.BeanDeployer.loadWeldClass(BeanDeployer.java:118) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.bootstrap.BeanDeployer.addClass(BeanDeployer.java:81) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.bootstrap.BeanDeployer.addClasses(BeanDeployer.java:137) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.bootstrap.BeanDeployment.createBeans(BeanDeployment.java:184) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.bootstrap.WeldBootstrap.deployBeans(WeldBootstrap.java:349) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.as.weld.WeldStartService.start(WeldStartService.java:63) [jboss-as-weld-7.5.4.Final-redhat-4.jar:7.5.4.Final-redhat-4]
at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1980)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1913)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_40]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_40]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_40]
Caused by: java.lang.NoClassDefFoundError: com/sun/jdi/request/EventRequest
at java.lang.Class.getDeclaredFields0(Native Method) [rt.jar:1.8.0_40]
at java.lang.Class.privateGetDeclaredFields(Class.java:2583) [rt.jar:1.8.0_40]
at java.lang.Class.getDeclaredFields(Class.java:1916) [rt.jar:1.8.0_40]
at org.jboss.weld.util.reflection.SecureReflections$4.work(SecureReflections.java:105) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.util.reflection.SecureReflections$4.work(SecureReflections.java:102) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.util.reflection.SecureReflectionAccess.run(SecureReflectionAccess.java:52) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.util.reflection.SecureReflectionAccess.runAndWrap(SecureReflectionAccess.java:63) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.util.reflection.SecureReflections.getDeclaredFields(SecureReflections.java:102) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.introspector.jlr.WeldClassImpl.<init>(WeldClassImpl.java:160) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.introspector.jlr.WeldClassImpl.of(WeldClassImpl.java:126) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.resources.ClassTransformer$TransformTypeToWeldClass.load(ClassTransformer.java:61) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.resources.ClassTransformer$TransformTypeToWeldClass.load(ClassTransformer.java:52) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
at org.jboss.weld.util.cache.LoadingCacheUtils.getCacheValue(LoadingCacheUtils.java:49) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.util.cache.LoadingCacheUtils.getCastCacheValue(LoadingCacheUtils.java:73) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
at org.jboss.weld.resources.ClassTransformer.getWeldClass(ClassTransformer.java:188) [weld-core-1.1.31.Final-redhat-1.jar:1.1.31.Final-redhat-1]
... 12 more
Caused by: java.lang.ClassNotFoundException: com.sun.jdi.request.EventRequest from [Module "deployment.CustomCacheStore-1.0.war:main" from Service Module Loader]
at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:213) [infinispan-embedded-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:459) [infinispan-embedded-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:408) [infinispan-embedded-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:389) [infinispan-embedded-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:134) [infinispan-embedded-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
... 34 more
{code}
was (Author: sebastian.laskawiec):
And another one: java.lang.NoClassDefFoundError: com/sun/jdi/request/EventRequest
> Adjust Embedded and Remote Uber Jar packaging
> ---------------------------------------------
>
> Key: ISPN-5954
> URL: https://issues.jboss.org/browse/ISPN-5954
> Project: Infinispan
> Issue Type: Bug
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
>
> During the tests it turned out that both the Remote and Embedded Uber Jars contain some classes that cannot be loaded by the JVM (because their dependencies are missing). This is not critical, but we should get rid of them.
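One way to find such classes is to try loading and introspecting every class name found in the jar, and to record the ones that fail. The sketch below is an assumption about how that check could be done, not the procedure actually used for this issue. {{UnloadableClassFinder}} is a hypothetical name, and the caller is expected to supply the class loader (e.g. one pointing at the uber jar) and the list of class names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Infinispan.
final class UnloadableClassFinder {

    /** Returns the names of the classes that cannot be loaded or linked by the given loader. */
    static List<String> findUnloadable(ClassLoader loader, List<String> classNames) {
        List<String> failed = new ArrayList<>();
        for (String name : classNames) {
            try {
                // Load without running static initializers.
                Class<?> c = Class.forName(name, false, loader);
                // Reflection forces the types in member signatures to be resolved,
                // which is where a missing dependency shows up as NoClassDefFoundError.
                c.getDeclaredFields();
                c.getDeclaredMethods();
                c.getDeclaredConstructors();
            } catch (ClassNotFoundException | LinkageError e) {
                failed.add(name);
            }
        }
        return failed;
    }
}
```

The reflection calls mirror what Weld does in the stack trace above ({{Class.getDeclaredFields}}), so a class that trips Weld during deployment should also be reported here.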
[JBoss JIRA] (ISPN-5954) Adjust Embedded and Remote Uber Jar packaging
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-5954?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec commented on ISPN-5954:
-------------------------------------------
Currently there is one issue that cannot be fixed:
* org/apache/logging/log4j/util/Activator
** It requires org/osgi/framework/BundleActivator
** We probably won't be able to fix it, since this activator is required in OSGi use cases
[JBoss JIRA] (ISPN-5957) Simple cache fails to start when statistics is enabled
by Paul Ferraro (JIRA)
Paul Ferraro created ISPN-5957:
----------------------------------
Summary: Simple cache fails to start when statistics is enabled
Key: ISPN-5957
URL: https://issues.jboss.org/browse/ISPN-5957
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 8.0.1.Final
Reporter: Paul Ferraro
Priority: Critical
Fix For: 8.0.2.Final
The failure also happens when statistics are disabled but still available.
{noformat}
13:29:08,909 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 25) MSC000001: Failed to start service jboss.infinispan.hibernate.local-query: org.jboss.msc.service.StartException in service jboss.infinispan.hibernate.local-query: org.infinispan.commons.CacheException: Unable to construct a ComponentRegistry!
at org.wildfly.clustering.service.AsynchronousServiceBuilder$1.run(AsynchronousServiceBuilder.java:107)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at org.jboss.threads.JBossThread.run(JBossThread.java:320)
Caused by: org.infinispan.commons.CacheException: Unable to construct a ComponentRegistry!
at org.infinispan.factories.ComponentRegistry.<init>(ComponentRegistry.java:91)
at org.infinispan.factories.InternalCacheFactory$1.<init>(InternalCacheFactory.java:89)
at org.infinispan.factories.InternalCacheFactory.createSimpleCache(InternalCacheFactory.java:89)
at org.infinispan.factories.InternalCacheFactory.createCache(InternalCacheFactory.java:54)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:621)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:580)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:445)
at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:117)
at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:112)
at org.wildfly.clustering.infinispan.spi.service.CacheBuilder.start(CacheBuilder.java:80)
at org.wildfly.clustering.service.AsynchronousServiceBuilder$1.run(AsynchronousServiceBuilder.java:102)
... 4 more
Caused by: org.infinispan.commons.CacheConfigurationException: Cannot auto-instantiate factory class org.infinispan.stats.impl.StatsCollector$Factory as it doesn't implement AutoInstantiableFactory! Debug stack: null
at org.infinispan.factories.AbstractComponentRegistry.instantiateFactory(AbstractComponentRegistry.java:350)
at org.infinispan.factories.AbstractComponentRegistry.createComponentFactoryInternal(AbstractComponentRegistry.java:321)
at org.infinispan.factories.ComponentRegistry.createComponentFactoryInternal(ComponentRegistry.java:198)
at org.infinispan.factories.AbstractComponentRegistry.getFactory(AbstractComponentRegistry.java:304)
at org.infinispan.factories.ComponentRegistry.getFactory(ComponentRegistry.java:179)
at org.infinispan.factories.AbstractComponentRegistry.getOrCreateComponent(AbstractComponentRegistry.java:270)
at org.infinispan.factories.ComponentRegistry.getOrCreateComponent(ComponentRegistry.java:152)
at org.infinispan.factories.AbstractComponentRegistry.invokeInjectionMethod(AbstractComponentRegistry.java:227)
at org.infinispan.factories.AbstractComponentRegistry.access$000(AbstractComponentRegistry.java:65)
at org.infinispan.factories.AbstractComponentRegistry$Component.injectDependencies(AbstractComponentRegistry.java:797)
at org.infinispan.factories.AbstractComponentRegistry.registerComponentInternal(AbstractComponentRegistry.java:201)
at org.infinispan.factories.ComponentRegistry.registerComponentInternal(ComponentRegistry.java:189)
at org.infinispan.factories.AbstractComponentRegistry.registerComponent(AbstractComponentRegistry.java:156)
at org.infinispan.factories.AbstractComponentRegistry.getOrCreateComponent(AbstractComponentRegistry.java:277)
at org.infinispan.factories.ComponentRegistry.getOrCreateComponent(ComponentRegistry.java:152)
at org.infinispan.factories.AbstractComponentRegistry.invokeInjectionMethod(AbstractComponentRegistry.java:227)
at org.infinispan.factories.AbstractComponentRegistry.access$000(AbstractComponentRegistry.java:65)
at org.infinispan.factories.AbstractComponentRegistry$Component.injectDependencies(AbstractComponentRegistry.java:797)
at org.infinispan.factories.AbstractComponentRegistry.registerComponentInternal(AbstractComponentRegistry.java:201)
at org.infinispan.factories.ComponentRegistry.registerComponentInternal(ComponentRegistry.java:189)
at org.infinispan.factories.AbstractComponentRegistry.registerComponent(AbstractComponentRegistry.java:156)
at org.infinispan.factories.AbstractComponentRegistry.getOrCreateComponent(AbstractComponentRegistry.java:277)
at org.infinispan.factories.ComponentRegistry.getOrCreateComponent(ComponentRegistry.java:152)
at org.infinispan.factories.AbstractComponentRegistry.invokeInjectionMethod(AbstractComponentRegistry.java:227)
at org.infinispan.factories.AbstractComponentRegistry.access$000(AbstractComponentRegistry.java:65)
at org.infinispan.factories.AbstractComponentRegistry$Component.injectDependencies(AbstractComponentRegistry.java:797)
at org.infinispan.factories.AbstractComponentRegistry.registerComponentInternal(AbstractComponentRegistry.java:201)
at org.infinispan.factories.ComponentRegistry.registerComponentInternal(ComponentRegistry.java:189)
at org.infinispan.factories.AbstractComponentRegistry.registerNonVolatileComponent(AbstractComponentRegistry.java:164)
at org.infinispan.factories.ComponentRegistry.<init>(ComponentRegistry.java:85)
... 14 more
{noformat}
[JBoss JIRA] (ISPN-5956) OptimisticTxPartitionAndMergeDuringRollbackTest.testDegradedPartition random failures
by Dan Berindei (JIRA)
Dan Berindei created ISPN-5956:
----------------------------------
Summary: OptimisticTxPartitionAndMergeDuringRollbackTest.testDegradedPartition random failures
Key: ISPN-5956
URL: https://issues.jboss.org/browse/ISPN-5956
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 8.1.0.Beta1
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Blocker
Fix For: 8.1.0.Beta2
Unlike ISPN-5883, this is an issue with the test itself. The {{BasePartitionHandlingTest.Partition.merge(partition)}} method merges two partitions and waits for the default cache to rebalance with all the members in the merged partition. The test then assumes that all caches have been rebalanced, and fails when that is not true:
{noformat}
22:16:37,902 ERROR (testng-PessimisticTxPartitionAndMergeDuringRollbackTest:[]) [UnitTestTestNGListener] Test testDegradedPartition(org.infinispan.partitionhandling.PessimisticTxPartitionAndMergeDuringRollbackTest) failed.
org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'MagicKey#k1{bbea31da@NodeB-47846/40}' is not available. Not all owners are in this partition
at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.doCheck(PartitionHandlingManagerImpl.java:250) ~[classes/:?]
at org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl.checkRead(PartitionHandlingManagerImpl.java:98) ~[classes/:?]
at org.infinispan.partitionhandling.impl.PartitionHandlingInterceptor.postOperationPartitionCheck(PartitionHandlingInterceptor.java:184) ~[classes/:?]
at org.infinispan.partitionhandling.impl.PartitionHandlingInterceptor.visitGetKeyValueCommand(PartitionHandlingInterceptor.java:131) ~[classes/:?]
...
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40) ~[classes/:?]
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:336) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.get(CacheImpl.java:411) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.get(CacheImpl.java:403) ~[classes/:?]
at org.infinispan.partitionhandling.BaseTxPartitionAndMergeTest.assertValue(BaseTxPartitionAndMergeTest.java:114) ~[test-classes/:?]
at org.infinispan.partitionhandling.BaseTxPartitionAndMergeTest.finalAsserts(BaseTxPartitionAndMergeTest.java:94) ~[test-classes/:?]
at org.infinispan.partitionhandling.BasePessimisticTxPartitionAndMergeTest.doTest(BasePessimisticTxPartitionAndMergeTest.java:81) ~[test-classes/:?]
at org.infinispan.partitionhandling.PessimisticTxPartitionAndMergeDuringRollbackTest.testDegradedPartition(PessimisticTxPartitionAndMergeDuringRollbackTest.java:29) ~[test-classes/:?]
{noformat}
Example in CI:
http://ci.infinispan.org/viewLog.html?buildId=32028&tab=buildResultsDiv&b...
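A test-side fix would be to wait until every cache, not only the default one, reports the full member list before running the final asserts. The sketch below shows the idea in isolation. It is an assumption, not the change that was committed: {{RebalanceWaiter}} and {{awaitAllCaches}} are hypothetical names, and the lookup function stands in for whatever the test uses to read a cache's current members.

```java
import java.util.Collection;
import java.util.Set;
import java.util.function.Function;

// Hypothetical test helper for illustration; not part of the Infinispan test suite.
final class RebalanceWaiter {

    /**
     * Polls until every named cache reports exactly the expected members.
     * Returns false if the timeout expires or the thread is interrupted first.
     */
    static boolean awaitAllCaches(Collection<String> cacheNames,
                                  Function<String, Set<String>> currentMembers,
                                  Set<String> expectedMembers,
                                  long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            boolean allRebalanced = true;
            for (String name : cacheNames) {
                if (!expectedMembers.equals(currentMembers.apply(name))) {
                    allRebalanced = false;
                    break;
                }
            }
            if (allRebalanced) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With the cache names of these tests, the merge step would pass the default cache plus {{opt-cache}} or {{pes-cache}}, so the asserts only run once the transactional cache has been rebalanced as well.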
[JBoss JIRA] (ISPN-5956) OptimisticTxPartitionAndMergeDuringRollbackTest.testDegradedPartition random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5956?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5956:
-------------------------------
Status: Open (was: New)
> OptimisticTxPartitionAndMergeDuringRollbackTest.testDegradedPartition random failures
> -------------------------------------------------------------------------------------
>
> Key: ISPN-5956
> URL: https://issues.jboss.org/browse/ISPN-5956
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 8.1.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 8.1.0.Beta2
[JBoss JIRA] (ISPN-5883) Node can apply new topology after sending status response
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5883?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-5883:
------------------------------------
This issue causes failures like the following in {{PessimisticTxPartitionAndMergeDuringRuntimeTest}} and {{OptimisticTxPartitionAndMergeDuringRollbackTest}}:
{noformat}
13:12:08,513 ERROR (testng-PessimisticTxPartitionAndMergeDuringRuntimeTest:) [UnitTestTestNGListener] Test testOriginatorIsolatedPartitionWithDiscard(org.infinispan.partitionhandling.PessimisticTxPartitionAndMergeDuringRuntimeTest) failed.
java.lang.RuntimeException: Timed out waiting for rebalancing to complete on node NodeO-20310, expected member list is [NodeM-34309, NodeN-21012, NodeO-20310, NodeP-21283], current member list is [NodeN-21012, NodeO-20310, NodeP-21283]!
at org.infinispan.test.TestingUtil.waitForRehashToComplete(TestingUtil.java:239) ~[test-classes/:?]
at org.infinispan.test.TestingUtil.waitForRehashToComplete(TestingUtil.java:249) ~[test-classes/:?]
at org.infinispan.partitionhandling.BasePartitionHandlingTest$Partition.waitForPartitionToForm(BasePartitionHandlingTest.java:229) ~[test-classes/:?]
at org.infinispan.partitionhandling.BasePartitionHandlingTest$Partition.merge(BasePartitionHandlingTest.java:207) ~[test-classes/:?]
at org.infinispan.partitionhandling.BaseTxPartitionAndMergeTest.mergeCluster(BaseTxPartitionAndMergeTest.java:85) ~[test-classes/:?]
at org.infinispan.partitionhandling.BasePessimisticTxPartitionAndMergeTest.doTest(BasePessimisticTxPartitionAndMergeTest.java:80) ~[test-classes/:?]
at org.infinispan.partitionhandling.PessimisticTxPartitionAndMergeDuringRuntimeTest.testOriginatorIsolatedPartitionWithDiscard(PessimisticTxPartitionAndMergeDuringRuntimeTest.java:29) ~[test-classes/:?]
{noformat}