[JBoss JIRA] (ISPN-9087) Timeout during put operation when a node is blocked
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-9087?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9087:
----------------------------------
Sprint: Sprint 9.3.0.Beta1, Sprint 9.3.0.CR1 (was: Sprint 9.3.0.Beta1)
> Timeout during put operation when a node is blocked
> ---------------------------------------------------
>
> Key: ISPN-9087
> URL: https://issues.jboss.org/browse/ISPN-9087
> Project: Infinispan
> Issue Type: Bug
> Reporter: Diego Lovison
> Assignee: Dan Berindei
>
> {noformat}
> 2018-04-17 13:30:02.782 ERROR 14932 --- [timeout-thread--p3-t1] o.i.i.impl.InvocationContextInterceptor : ISPN000136: Error executing command PutKeyValueCommand, writing keys [5db796a3-3f65-468a-b86a-6d5ef8b4b330]
>
> org.infinispan.util.concurrent.TimeoutException: ISPN000427: Timeout after 15 seconds waiting for acks. Id=100000
> at org.infinispan.util.concurrent.CommandAckCollector.createTimeoutException(CommandAckCollector.java:188) ~[infinispan-embedded-8.5.0.Final-redhat-6.jar:8.5.0.Final-redhat-6]
> at org.infinispan.util.concurrent.CommandAckCollector.access$300(CommandAckCollector.java:51) ~[infinispan-embedded-8.5.0.Final-redhat-6.jar:8.5.0.Final-redhat-6]
> at org.infinispan.util.concurrent.CommandAckCollector$BaseCollector.call(CommandAckCollector.java:214) [infinispan-embedded-8.5.0.Final-redhat-6.jar:8.5.0.Final-redhat-6]
> at org.infinispan.util.concurrent.CommandAckCollector$BaseCollector.call(CommandAckCollector.java:191) [infinispan-embedded-8.5.0.Final-redhat-6.jar:8.5.0.Final-redhat-6]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_161]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_161]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_161]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> {noformat}
> After some investigation together with [~dan.berindei], we found that the FD_ALL is just too slow.
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-7977) Clean up NullPointerExceptions in the core test logs
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-7977?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-7977:
----------------------------------
Sprint: Sprint 9.3.0.Beta1, Sprint 9.3.0.CR1 (was: Sprint 9.3.0.Beta1)
> Clean up NullPointerExceptions in the core test logs
> ----------------------------------------------------
>
> Key: ISPN-7977
> URL: https://issues.jboss.org/browse/ISPN-7977
> Project: Infinispan
> Issue Type: Task
> Components: Core, Test Suite - Core
> Affects Versions: 9.1.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Attachments: ClusterTopologyManagerImpl_NullPointerException.txt
>
>
> Even benign {{NullPointerException}} s are a bad smell, so tests should never trigger them. But they seem pretty common in the test logs nowadays. Some examples from the core test suite:
> {noformat}
> 23:30:26,503 ERROR (stateTransferExecutor-thread-SingleJoinTest[DIST_SYNC, tx=true]-NodeC-p583-t3:[StateRequest-dist]) [RpcManagerImpl] ISPN000073: Unexpected error while replicating
> java.lang.NullPointerException: null
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotelyAsync(JGroupsTransport.java:650) ~[classes/:?]
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotelyAsync(RpcManagerImpl.java:127) ~[classes/:?]
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:168) ~[classes/:?]
> at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:132) ~[classes/:?]
> at org.infinispan.statetransfer.StateConsumerImpl.lambda$addTransfer$7(StateConsumerImpl.java:1055) ~[classes/:?]
> at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:101) ~[classes/:?]
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144) [classes/:?]
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33) [classes/:?]
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174) [classes/:?]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> {noformat}
> {noformat}
> 23:37:52,617 WARN (testng-AsynchronousInvocationTest:[]) [TrianglePerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command StreamRequestCommand{cacheName='___defaultcache'}
> java.lang.NullPointerException: null
> at org.infinispan.stream.impl.StreamRequestCommand.invokeAsync(StreamRequestCommand.java:89) ~[classes/:?]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:101) ~[classes/:?]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runSync(BaseBlockingRunnable.java:54) [classes/:?]
> at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:38) [classes/:?]
> at org.infinispan.remoting.inboundhandler.TrianglePerCacheInboundInvocationHandler.handleDefaultCommand(TrianglePerCacheInboundInvocationHandler.java:183) [classes/:?]
> at org.infinispan.remoting.inboundhandler.TrianglePerCacheInboundInvocationHandler.handle(TrianglePerCacheInboundInvocationHandler.java:112) [classes/:?]
> at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleCacheRpcCommand(GlobalInboundInvocationHandler.java:120) [classes/:?]
> at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleFromCluster(GlobalInboundInvocationHandler.java:79) [classes/:?]
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:175) [classes/:?]
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:149) [classes/:?]
> at org.infinispan.remoting.AsynchronousInvocationTest.assertDispatchForCommand(AsynchronousInvocationTest.java:176) [test-classes/:?]
> at org.infinispan.remoting.AsynchronousInvocationTest.testCacheRpcCommands(AsynchronousInvocationTest.java:140) [test-classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84) [testng-6.8.8.jar:?]
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:714) [testng-6.8.8.jar:?]
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901) [testng-6.8.8.jar:?]
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231) [testng-6.8.8.jar:?]
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127) [testng-6.8.8.jar:?]
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111) [testng-6.8.8.jar:?]
> at org.testng.TestRunner.privateRun(TestRunner.java:767) [testng-6.8.8.jar:?]
> at org.testng.TestRunner.run(TestRunner.java:617) [testng-6.8.8.jar:?]
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:348) [testng-6.8.8.jar:?]
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:38) [testng-6.8.8.jar:?]
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382) [testng-6.8.8.jar:?]
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64) [testng-6.8.8.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> {noformat}
> {noformat}
> 23:37:54,821 TRACE (ForkThread-2,ThreadLocalLeakTest:[]) [InvocationContextInterceptor] Exception while executing code, but we're shutting down so failing silently.
> org.infinispan.persistence.spi.PersistenceException: java.lang.NullPointerException
> at org.infinispan.persistence.file.SingleFileStore.write(SingleFileStore.java:365) ~[classes/:?]
> at org.infinispan.persistence.manager.PersistenceManagerImpl.lambda$writeToAllNonTxStores$9(PersistenceManagerImpl.java:518) ~[classes/:?]
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[?:1.8.0_121]
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_121]
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_121]
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[?:1.8.0_121]
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_121]
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_121]
> at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:1.8.0_121]
> at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:1.8.0_121]
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_121]
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[?:1.8.0_121]
> at org.infinispan.persistence.manager.PersistenceManagerImpl.writeToAllNonTxStores(PersistenceManagerImpl.java:518) ~[classes/:?]
> at org.infinispan.interceptors.impl.CacheWriterInterceptor.storeEntry(CacheWriterInterceptor.java:452) ~[classes/:?]
> at org.infinispan.interceptors.impl.CacheWriterInterceptor.lambda$visitPutKeyValueCommand$1(CacheWriterInterceptor.java:187) ~[classes/:?]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:109) ~[classes/:?]
> at org.infinispan.interceptors.impl.CacheWriterInterceptor.visitPutKeyValueCommand(CacheWriterInterceptor.java:179) ~[classes/:?]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:74) ~[classes/:?]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58) ~[classes/:?]
> at org.infinispan.interceptors.impl.CacheLoaderInterceptor.visitDataCommand(CacheLoaderInterceptor.java:208) ~[classes/:?]
> at org.infinispan.interceptors.impl.CacheLoaderInterceptor.visitPutKeyValueCommand(CacheLoaderInterceptor.java:135) ~[classes/:?]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:74) ~[classes/:?]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:102) ~[classes/:?]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForDataCommand(EntryWrappingInterceptor.java:689) ~[classes/:?]
> at org.infinispan.interceptors.impl.EntryWrappingInterceptor.visitPutKeyValueCommand(EntryWrappingInterceptor.java:310) ~[classes/:?]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:74) ~[classes/:?]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndFinally(BaseAsyncInterceptor.java:154) ~[classes/:?]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitNonTxDataWriteCommand(AbstractLockingInterceptor.java:136) ~[classes/:?]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitDataWriteCommand(NonTransactionalLockingInterceptor.java:38) ~[classes/:?]
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutKeyValueCommand(AbstractLockingInterceptor.java:86) ~[classes/:?]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:74) ~[classes/:?]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58) ~[classes/:?]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:197) ~[classes/:?]
> at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:162) ~[classes/:?]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:74) ~[classes/:?]
> at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:127) [classes/:?]
> at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:97) [classes/:?]
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:248) [classes/:?]
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1606) [classes/:?]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1271) [classes/:?]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1720) [classes/:?]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:276) [classes/:?]
> at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) [classes/:?]
> at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:652) [classes/:?]
> at org.infinispan.util.ThreadLocalLeakTest$2.run(ThreadLocalLeakTest.java:115) [test-classes/:?]
> at org.infinispan.test.AbstractInfinispanTest$RunnableWrapper.run(AbstractInfinispanTest.java:510) [test-classes/:?]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> Caused by: java.lang.NullPointerException
> at org.infinispan.persistence.file.SingleFileStore.allocate(SingleFileStore.java:232) ~[classes/:?]
> at org.infinispan.persistence.file.SingleFileStore.write(SingleFileStore.java:332) ~[classes/:?]
> ... 46 more
> {noformat}
> {noformat}
> 23:38:21,145 ERROR (transport-thread-OptimisticTxFailureAfterLockingTest-NodeD-p33826-t2:[Topology-___defaultcache]) [LocalTopologyManagerImpl] ISPN000452: Failed to update topology for cache ___defaultcache
> java.lang.NullPointerException: null
> at org.infinispan.topology.PersistentUUIDManagerImpl.mapAddresses(PersistentUUIDManagerImpl.java:70) ~[classes/:?]
> at org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:348) ~[classes/:?]
> at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleTopologyUpdate$1(LocalTopologyManagerImpl.java:266) ~[classes/:?]
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144) [classes/:?]
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33) [classes/:?]
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174) [classes/:?]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> {noformat}
> {noformat}
> 23:39:19,342 ERROR (transport-thread-ConcurrentStartChanelLookupTest-NodeA-p40485-t3:[Merge-1]) [ClusterCacheStatus] ISPN000228: Failed to recover cache repl state after the current node became the coordinator
> java.lang.NullPointerException: null
> at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.lambda$static$0(PreferAvailabilityStrategy.java:34) ~[classes/:?]
> at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355) ~[?:1.8.0_121]
> at java.util.TimSort.sort(TimSort.java:220) ~[?:1.8.0_121]
> at java.util.Arrays.sort(Arrays.java:1512) ~[?:1.8.0_121]
> at java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:348) ~[?:1.8.0_121]
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_121]
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_121]
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_121]
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_121]
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[?:1.8.0_121]
> at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:122) ~[classes/:?]
> at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:570) ~[classes/:?]
> at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$4(ClusterTopologyManagerImpl.java:499) ~[classes/:?]
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144) [classes/:?]
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33) [classes/:?]
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174) [classes/:?]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8008) Add Fault-tolerance to write-behind stores
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-8008?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-8008:
----------------------------------
Sprint: Sprint 9.3.0.Beta1, Sprint 9.3.0.CR1 (was: Sprint 9.3.0.Beta1)
> Add Fault-tolerance to write-behind stores
> ------------------------------------------
>
> Key: ISPN-8008
> URL: https://issues.jboss.org/browse/ISPN-8008
> Project: Infinispan
> Issue Type: Enhancement
> Components: Loaders and Stores
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Fix For: 9.3.0.CR1
>
>
> Currently when a store is configured to be write-behind, three attempts are made to write the queued modifications to the store. If all three attempts fail, then this error is simply logged and the modifications are never written to the store.
> We should allow users to configure a write-behind store so that: In the event that a write-behind fails, further operations on the cache are not allowed and the modifications which failed to write to the store are queued indefinitely. Once the underlying store becomes available, the queued modifications should be written and then the cache should become available.
> To determine whether a store has become available again, the AsyncCacheWriter will have to poll the store's availability. Currently the CacheWriter interface does not allow such behaviour, therefore there are two options:
> * Attempt a write and catch an exception if the store is not currently available
> * Add a method to return the current status of a store to the CacheWriter interface
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months
[JBoss JIRA] (ISPN-8731) Write command times out waiting for wrong topology
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8731?page=com.atlassian.jira.plugin.... ]
Dan Berindei edited comment on ISPN-8731 at 5/14/18 3:08 AM:
-------------------------------------------------------------
The failures happen because the {{DistributionManagerImpl}} topology is updated after the {{StateConsumerImpl}} topology (ISPN-7682).
{{BaseDistributionInterceptor.checkTopologyId()}} doesn't expect the command topology id (5) to be higher than the DM's current topology id, so it throws an {{OutdatedTopologyException}} without a topology id. {{StateTransferInterceptor.handleNonTxWriteCommand()}} then increments the command's topology id and times out waiting for topology 6.
{noformat}
09:47:27,497 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache]) [StateConsumerImpl] Received new topology for cache TestCache, isRebalance = false, isMember = true, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1): 134+122, test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2, machine-id=m2): 122+134]}, pendingCH=null, unionCH=null, actualMembers=[test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1), test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2, machine-id=m2)], persistentUUIDs=[1ffbf082-035b-432d-8715-197ab9fbcab2, 7b2ba190-f559-4dc1-b874-5c43c3f4ce29]}
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] handleNonTxWriteCommand for command PutKeyValueCommand{key=key1, value=value1, flags=[], commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=-1}, topology id -1
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] Setting command topology to 5
09:47:27,524 TRACE (testng-test:[]) [BaseDistributionInterceptor] Current topology 4, command topology 5
09:47:27,526 TRACE (testng-test:[]) [StateTransferInterceptor] Retrying command because of org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expected 5, got 4, current topology is 5 (requested: 6): PutKeyValueCommand{key=key1, value=value1, flags=[], commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=5}
09:47:27,526 TRACE (testng-test:[]) [StateTransferLockImpl] Waiting for transaction data for topology 6, current topology is 4
09:47:27,545 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache]) [StateTransferLockImpl] Signalling topology 5 is installed
09:47:42,549 ERROR (testng-test:[]) [TestSuiteProgress] Test failed: org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1686) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1349) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1812) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:223) ~[classes/:?]
at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) ~[classes/:?]
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:673) ~[classes/:?]
at org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.executeDifferentExecutionPolicies(DistributedExecutorExecutionPolicyTest.java:115) ~[test-classes/:?]
at org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine(DistributedExecutorExecutionPolicyTest.java:59) ~[test-classes/:?]
{noformat}
was (Author: dan.berindei):
The failures happen because the {{DistributionManagerImpl}} topology is updated after the {{StateConsumerImpl}} topology (ISPN-7682).
{noformat}
09:47:27,497 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache]) [StateConsumerImpl] Received new topology for cache TestCache, isRebalance = false, isMember = true, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1): 134+122, test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2, machine-id=m2): 122+134]}, pendingCH=null, unionCH=null, actualMembers=[test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1), test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2, machine-id=m2)], persistentUUIDs=[1ffbf082-035b-432d-8715-197ab9fbcab2, 7b2ba190-f559-4dc1-b874-5c43c3f4ce29]}
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] handleNonTxWriteCommand for command PutKeyValueCommand{key=key1, value=value1, flags=[], commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=-1}, topology id -1
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] Setting command topology to 5
09:47:27,524 TRACE (testng-test:[]) [BaseDistributionInterceptor] Current topology 4, command topology 5
09:47:27,526 TRACE (testng-test:[]) [StateTransferInterceptor] Retrying command because of org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expected 5, got 4, current topology is 5 (requested: 6): PutKeyValueCommand{key=key1, value=value1, flags=[], commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=5}
09:47:27,526 TRACE (testng-test:[]) [StateTransferLockImpl] Waiting for transaction data for topology 6, current topology is 4
09:47:27,545 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache]) [StateTransferLockImpl] Signalling topology 5 is installed
09:47:42,549 ERROR (testng-test:[]) [TestSuiteProgress] Test failed: org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1686) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1349) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1812) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:223) ~[classes/:?]
at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) ~[classes/:?]
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:673) ~[classes/:?]
at org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.executeDifferentExecutionPolicies(DistributedExecutorExecutionPolicyTest.java:115) ~[test-classes/:?]
at org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine(DistributedExecutorExecutionPolicyTest.java:59) ~[test-classes/:?]
{noformat}
> Write command times out waiting for wrong topology
> --------------------------------------------------
>
> Key: ISPN-8731
> URL: https://issues.jboss.org/browse/ISPN-8731
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Query, Test Suite - Server
> Affects Versions: 9.2.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.3.0.Final
>
>
> {{MultipleCacheManagersTest.waitForClusterToForm(new String[]{})}} doesn't actually wait for the rebalance to finish everywhere: https://github.com/infinispan/infinispan/pull/5705/files#diff-f9f535214b9...
> Most tests should work regardless of whether we wait for the rebalance to finish or not, it's just a way to reduce the number of test failures when the command retry doesn't work. Waiting for the rebalance to finish is only required when the test needs to install new topologies in a specific order.
> However, some recent test failures in CI show that the effectiveness of {{MultipleCacheManagersTest.waitForClusterToForm(new String[]{})}} does make a difference. Most likely there is a bug in the retry logic during the latter phases of a rebalance, and the retry-specific tests do not cover all the scenarios.
> {noformat}
> 00:23:20,668 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache ___defaultcache, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}, availability mode = AVAILABLE
> 00:23:20,669 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[]) [ClusterTopologyManagerImpl] Updating cluster-wide stable topology for cache ___defaultcache, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}
> 00:23:20,669 DEBUG (transport-thread-ClusteredScriptingTest-NodeI-p220-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}
> 00:23:20,669 DEBUG (transport-thread-ClusteredScriptingTest-NodeJ-p234-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}
> 00:23:35,673 ERROR (timeout-thread-ClusteredScriptingTest-NodeI-p219-t1:[]) [InvocationContextInterceptor] ISPN000136: Error executing command PutKeyValueCommand, writing keys [/macbeth.txt0]
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
> at org.infinispan.interceptors.impl.BaseStateTransferInterceptor$CancellableRetry.run(BaseStateTransferInterceptor.java:333) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> 00:23:35,673 DEBUG (testng-ClusteredScriptingTest:[]) [DefaultCacheManager] Stopping cache manager ISPN on ClusteredScriptingTest-NodeJ-9982
> 00:23:35,682 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache ___defaultcache, topology = CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[ClusteredScriptingTest-NodeI-40041: 256]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5]}, availability mode = AVAILABLE
> 00:23:35,699 ERROR (testng-ClusteredScriptingTest:[]) [TestSuiteProgress] Test failed: org.infinispan.scripting.ClusteredScriptingTest.testDistributedMapReduceStreamWithFlag([REPL_SYNC])
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1636) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1284) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1750) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:217) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:671) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.scripting.utils.ScriptingUtils.loadData(ScriptingUtils.java:29) ~[test-classes/:?]
> at org.infinispan.scripting.ClusteredScriptingTest$6.call(ClusteredScriptingTest.java:144) ~[test-classes/:?]
> at org.infinispan.test.TestingUtil.withCacheManagers(TestingUtil.java:1522) ~[infinispan-core-9.2.0-SNAPSHOT-tests.jar:9.2.0-SNAPSHOT]
> at org.infinispan.scripting.ClusteredScriptingTest.testDistributedMapReduceStreamWithFlag(ClusteredScriptingTest.java:137) ~[test-classes/:?]
> {noformat}
> {noformat}
> 18:04:17,097 DEBUG (transport-thread-ClusteredCacheTest-NodeB-p3689-t1:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredCacheTest-NodeA-53655: 134, ClusteredCacheTest-NodeB-27398: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredCacheTest-NodeA-53655, ClusteredCacheTest-NodeB-27398], persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362, b623ac7a-3c77-471d-88de-e151da823c0c]}
> 18:04:17,097 INFO (testng-ClusteredCacheTest:[]) [TestSuiteProgress] Test starting: org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner
> 18:04:17,097 DEBUG (transport-thread-ClusteredCacheTest-NodeA-p3679-t5:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredCacheTest-NodeA-53655: 134, ClusteredCacheTest-NodeB-27398: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredCacheTest-NodeA-53655, ClusteredCacheTest-NodeB-27398], persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362, b623ac7a-3c77-471d-88de-e151da823c0c]}
> 18:04:32,100 ERROR (timeout-thread-ClusteredCacheTest-NodeA-p3678-t1:[]) [InvocationContextInterceptor] ISPN000136: Error executing command PutKeyValueCommand, writing keys [WrappedByteArray{bytes=[B0x010201054E617669..[9], hashCode=-1707624030}]
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
> at org.infinispan.interceptors.impl.BaseStateTransferInterceptor$CancellableRetry.run(BaseStateTransferInterceptor.java:333) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> 18:04:32,101 ERROR (testng-ClusteredCacheTest:[]) [TestSuiteProgress] Test failed: org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
> at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1636) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1284) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1750) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:217) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:671) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
> at org.infinispan.query.blackbox.ClusteredCacheTest.prepareTestData(ClusteredCacheTest.java:140) ~[test-classes/:?]
> at org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFrom(ClusteredCacheTest.java:334) ~[test-classes/:?]
> at org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner(ClusteredCacheTest.java:298) ~[test-classes/:?]
> 18:04:32,214 DEBUG (testng-ClusteredCacheTest:[]) [DefaultCacheManager] Stopping cache manager ISPN on ClusteredCacheTest-NodeB-27398
> 18:04:32,479 DEBUG (remote-thread-ClusteredCacheTest-NodeA-p3677-t6:[]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache ___defaultcache, topology = CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[ClusteredCacheTest-NodeA-53655: 256]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredCacheTest-NodeA-53655], persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362]}, availability mode = AVAILABLE
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 4 months