[JBoss JIRA] (ISPN-10236) Hotrod client error releasing channel after server cache stop
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-10236?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-10236:
-----------------------------------
Fix Version/s: 10.0.0.CR1
(was: 10.0.0.Beta5)
> Hotrod client error releasing channel after server cache stop
> -------------------------------------------------------------
>
> Key: ISPN-10236
> URL: https://issues.jboss.org/browse/ISPN-10236
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Server
> Affects Versions: 10.0.0.Beta3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 10.0.0.CR1
>
> Attachments: ISPN-10137_package_private_scope_20190524-1732_ServerFailureRetryTest-infinispan-client-hotrod.log.gz
>
>
> Random failure in {{ServerFailureRetryTest.testRetryCacheStopped}} caused by an assert statement in {{ChannelPool.release()}}.
> {noformat}
> 17:37:36,562 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.client.hotrod.retry.ServerFailureRetryTest.testRetryCacheStopped
> java.lang.AssertionError: Error releasing [id: 0x5d9755e6, L:/127.0.0.1:42472 ! R:127.0.0.1/127.0.0.1:44865]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.release(ChannelPool.java:170) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.releaseChannel(ChannelFactory.java:309) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.HotRodOperation.releaseChannel(HotRodOperation.java:105) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.invoke(RetryOnFailureOperation.java:80) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.activateChannel(ChannelPool.java:217) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.acquire(ChannelPool.java:86) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:259) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:297) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.fetchChannelAndInvoke(AbstractKeyOperation.java:41) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:61) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.putAsync(RemoteCacheImpl.java:366) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:334) ~[classes/:?]
> at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:79) ~[classes/:?]
> at org.infinispan.client.hotrod.retry.ServerFailureRetryTest.testRetryCacheStopped(ServerFailureRetryTest.java:63) ~[test-classes/:?]
> {noformat}
> I investigated a bit and I couldn't find an obvious mistake in the way {{ChannelPool.created}} is incremented and decremented, but I think it would help if access to it and {{ChannelPool.active}} were centralized in a smaller number of methods.
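One way the suggested centralization could look is sketched below. This is a hypothetical illustration, not the actual {{ChannelPool}} code: the class {{PoolCounters}}, its method names, and the invariant {{0 <= active <= created}} are assumptions. It also uses a lock for simplicity, which the real pool may not be able to afford.

```java
/**
 * Hypothetical sketch, not the real ChannelPool: every change to the two
 * counters goes through update(), which validates before it mutates.
 */
final class PoolCounters {
    private int created; // channels currently open
    private int active;  // open channels currently handed out to operations

    /** A new channel was opened and handed straight to an operation. */
    void channelCreated() { update(+1, +1); }

    /** An idle channel was handed to an operation. */
    void channelAcquired() { update(0, +1); }

    /** An operation returned its channel to the pool. */
    void channelReleased() { update(0, -1); }

    /** A channel was closed; wasActive says whether an operation still held it. */
    void channelClosed(boolean wasActive) { update(-1, wasActive ? -1 : 0); }

    synchronized int created() { return created; }

    synchronized int active() { return active; }

    /** Single mutation point: rejects any change that would break 0 <= active <= created. */
    private synchronized void update(int createdDelta, int activeDelta) {
        int newCreated = created + createdDelta;
        int newActive = active + activeDelta;
        if (newActive < 0 || newActive > newCreated) {
            throw new IllegalStateException(
                  "Invalid counters: created=" + newCreated + ", active=" + newActive);
        }
        created = newCreated;
        active = newActive;
    }
}
```

With a single mutation point, a bad release fails at the call that caused it and leaves the counters unchanged, instead of tripping an assert later in an unrelated code path.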
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 5 months
[JBoss JIRA] (ISPN-10238) RemoteCacheManager.stop() hangs if a client thread is waiting for a server response
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-10238?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-10238:
-----------------------------------
Fix Version/s: 10.0.0.CR1
(was: 10.0.0.Beta5)
> RemoteCacheManager.stop() hangs if a client thread is waiting for a server response
> -----------------------------------------------------------------------------------
>
> Key: ISPN-10238
> URL: https://issues.jboss.org/browse/ISPN-10238
> Project: Infinispan
> Issue Type: Bug
> Components: Server, Test Suite - Server
> Affects Versions: 10.0.0.Beta3, 9.4.14.Final
> Reporter: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 10.0.0.CR1
>
>
> One of our integration tests performs a blocking {{RemoteCache.size()}} operation on the thread where another asynchronous operation was completed (a {{HotRod-client-async-pool}} thread):
> {code:title=EvictionIT}
> CompletableFuture res = rc.putAllAsync(entries);
> res.thenRun(() -> assertEquals(3, rc.size()));
> {code}
> The test then finishes, but doesn't stop the {{RemoteCacheManager}}. When I changed the test to stop the {{RemoteCacheManager}}, the test started hanging:
> {noformat}
> "HotRod-client-async-pool-139-1" #2880 daemon prio=5 os_prio=0 cpu=434.56ms elapsed=1621.24s tid=0x00007f43a6b99800 nid=0x19c0 waiting on condition [0x00007f42ec9fd000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.3/Native Method)
> - parking to wait for <0x00000000d3321350> (a java.util.concurrent.CompletableFuture$Signaller)
> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.3/LockSupport.java:234)
> at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.3/CompletableFuture.java:1798)
> at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.3/ForkJoinPool.java:3128)
> at java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.3/CompletableFuture.java:1868)
> at java.util.concurrent.CompletableFuture.get(java.base@11.0.3/CompletableFuture.java:2021)
> at org.infinispan.client.hotrod.impl.Util.await(Util.java:46)
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.size(RemoteCacheImpl.java:307)
> at org.infinispan.server.test.eviction.EvictionIT.lambda$testPutAllAsyncEviction$0(EvictionIT.java:73)
> at org.infinispan.server.test.eviction.EvictionIT$$Lambda$347/0x000000010074a440.run(Unknown Source)
> at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@11.0.3/CompletableFuture.java:783)
> at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.3/CompletableFuture.java:506)
> at java.util.concurrent.CompletableFuture.complete(java.base@11.0.3/CompletableFuture.java:2073)
> at org.infinispan.client.hotrod.impl.operations.HotRodOperation.complete(HotRodOperation.java:162)
> at org.infinispan.client.hotrod.impl.operations.PutAllOperation.acceptResponse(PutAllOperation.java:83)
> at org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.decode(HeaderDecoder.java:144)
> at org.infinispan.client.hotrod.impl.transport.netty.HintedReplayingDecoder.callDecode(HintedReplayingDecoder.java:94)
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
> at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:421)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:321)
> at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
> Locked ownable synchronizers:
> - <0x00000000ca248c30> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "main" #1 prio=5 os_prio=0 cpu=37300.10ms elapsed=2911.99s tid=0x00007f43a4023000 nid=0x37f7 in Object.wait() [0x00007f43a9c21000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(java.base@11.0.3/Native Method)
> - waiting on <no object reference available>
> at java.lang.Object.wait(java.base@11.0.3/Object.java:328)
> at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:231)
> - waiting to re-lock in wait() <0x00000000ca174af8> (a io.netty.util.concurrent.DefaultPromise)
> at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:33)
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:32)
> at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.destroy(ChannelFactory.java:216)
> at org.infinispan.client.hotrod.RemoteCacheManager.stop(RemoteCacheManager.java:365)
> at org.infinispan.client.hotrod.RemoteCacheManager.close(RemoteCacheManager.java:513)
> at org.infinispan.commons.junit.ClassResource.lambda$new$0(ClassResource.java:24)
> at org.infinispan.commons.junit.ClassResource$$Lambda$286/0x0000000100573040.accept(Unknown Source)
> at org.infinispan.commons.junit.ClassResource.after(ClassResource.java:41)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.jboss.arquillian.junit.Arquillian.run(Arquillian.java:167)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:27)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
> at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
> at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
> at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
> at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
> at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Locked ownable synchronizers:
> - None
> {noformat}
> {{HotRod-client-async-pool}} threads are not appropriate for doing blocking cache operations at any time, but we need to do more than just change the test:
> * We need an asynchronous {{RemoteCache.size()}} alternative
> * Blocking operations like {{size()}} currently wait up to 1 day for a response from the server; they should use a much shorter (and configurable) timeout.
> * {{RemoteCacheManager.stop()}} should have a timeout as well, but more importantly it should cancel any pending operation.
> * We should consider running all application code on a separate thread pool.
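The first, second, and fourth points can be illustrated with plain {{java.util.concurrent}} classes. This is a minimal sketch under stated assumptions, not Hot Rod client code: {{blockingSize()}} is a hypothetical stand-in for {{RemoteCache.size()}}, {{appExecutor}} stands in for a separate application thread pool, and {{orTimeout}} requires Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class CallbackOffloadSketch {
    /** Hypothetical stand-in for a blocking remote call such as RemoteCache.size(). */
    static int blockingSize() {
        return 3;
    }

    /**
     * Runs the blocking call on appExecutor instead of the thread that
     * completed the write, and fails the result if it takes too long.
     */
    static CompletableFuture<Integer> sizeAfterWrite(CompletableFuture<Void> write,
                                                     ExecutorService appExecutor,
                                                     long timeout, TimeUnit unit) {
        return write
              .thenApplyAsync(ignored -> blockingSize(), appExecutor)
              .orTimeout(timeout, unit); // completes exceptionally with TimeoutException
    }

    public static void main(String[] args) {
        ExecutorService appExecutor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Void> write = CompletableFuture.runAsync(() -> { });
            int size = sizeAfterWrite(write, appExecutor, 10, TimeUnit.SECONDS).join();
            System.out.println("size = " + size);
        } finally {
            appExecutor.shutdownNow();
        }
    }
}
```

Because the continuation never runs on the thread that delivered the response, that thread stays free to process further responses and shutdown requests.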
--
[JBoss JIRA] (ISPN-10365) PreferAvailabilityStrategy assertion failure
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-10365?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-10365:
-----------------------------------
Fix Version/s: 10.0.0.CR1
(was: 10.0.0.Beta5)
> PreferAvailabilityStrategy assertion failure
> --------------------------------------------
>
> Key: ISPN-10365
> URL: https://issues.jboss.org/browse/ISPN-10365
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 10.0.0.Beta3, 9.4.15.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 10.0.0.CR1, 9.4.17.Final
>
>
> This scenario happens unintentionally in {{RebalancePolicyJmxTest}}, because the test waits for the default cache to finish rebalancing before killing the coordinator but doesn't care about the {{CONFIG}} cache:
> * A and B are running, rebalancing is disabled, then C and D join
> * Re-enable rebalance, but stop B and A before the rebalance is done
> * C sees the finished rebalance, D sees the READ_OLD phase
> * C becomes coordinator and should recover with C's topology, but instead has an assertion failure and doesn't install a stable topology
> {noformat}
> 16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG keeping partition from [Test-NodeC-27509(rack-id=r2)]: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeC-27509(rack-id=r2): 125, Test-NodeD-62603(rack-id=r2): 131]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}
> 16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG keeping partition from [Test-NodeD-62603(rack-id=r2)]: CacheTopology{id=6, phase=READ_OLD_WRITE_ALL, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (4)[Test-NodeA-4515(rack-id=r1): 63, Test-NodeB-42590(rack-id=r1): 62, Test-NodeC-27509(rack-id=r2): 64, Test-NodeD-62603(rack-id=r2): 67]}, unionCH=null, actualMembers=[Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[e9dcc3da-07a2-4159-a8b1-94e6428011c4, 3f27ddaa-1146-483e-8473-d79a5ba347f5, 59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}
> 16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG, resolveConflicts=false, newMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], possibleOwners=[Test-NodeD-62603(rack-id=r2), Test-NodeC-27509(rack-id=r2)], preferredTopology=CacheTopology{id=6, phase=READ_OLD_WRITE_ALL, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (4)[Test-NodeA-4515(rack-id=r1): 63, Test-NodeB-42590(rack-id=r1): 62, Test-NodeC-27509(rack-id=r2): 64, Test-NodeD-62603(rack-id=r2): 67]}, unionCH=null, actualMembers=[Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[e9dcc3da-07a2-4159-a8b1-94e6428011c4, 3f27ddaa-1146-483e-8473-d79a5ba347f5, 59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}, mergeTopologyId=10
> 16:48:48,454 WARN (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] ISPN000517: Ignoring cache topology from [Test-NodeC-27509(rack-id=r2)] during merge: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeC-27509(rack-id=r2): 125, Test-NodeD-62603(rack-id=r2): 131]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}
> 16:48:48,454 DEBUG (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [CLUSTER] ISPN000521: Cache org.infinispan.CONFIG recovered after merge with topology = CacheTopology{id=10, phase=NO_REBALANCE, rebalanceId=4, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=null, unionCH=null, actualMembers=[], persistentUUIDs=[]}, availability mode null
> 16:48:48,454 FATAL (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [CLUSTER] [Context=org.infinispan.CONFIG] ISPN000313: Lost data because of abrupt leavers [Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)]
> 16:48:48,455 ERROR (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [LimitedExecutor] Exception in task
> java.lang.AssertionError: null
> at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:217) ~[classes/:?]
> at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:647) ~[classes/:?]
> at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$4(ClusterTopologyManagerImpl.java:500) ~[classes/:?]
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [classes/:?]
> {noformat}
> Eventually the missing stable topology makes the test fail:
> {noformat}
> 16:48:49,349 DEBUG (testng-Test:[null]) [ClusterCacheStatus] ISPN000519: Updating stable topology for cache org.infinispan.CONFIG, topology null
> 16:48:49,349 WARN (testng-Test:[null]) [CacheTopologyControlCommand] ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=null, type=POLICY_ENABLE, sender=Test-NodeC-27509(rack-id=r2), joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=5}
> java.lang.NullPointerException: null
> at org.infinispan.topology.CacheTopologyControlCommand.<init>(CacheTopologyControlCommand.java:147) ~[classes/:?]
> at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastStableTopologyUpdate(ClusterTopologyManagerImpl.java:659) ~[classes/:?]
> at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:806) ~[classes/:?]
> at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) ~[?:?]
> at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:702) ~[classes/:?]
> at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:682) ~[classes/:?]
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:215) ~[classes/:?]
> at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:163) [classes/:?]
> at org.infinispan.commands.ReplicableCommand.invoke(ReplicableCommand.java:44) [classes/:?]
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnClusterSync(LocalTopologyManagerImpl.java:752) [classes/:?]
> at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:623) [classes/:?]
> at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:581) [classes/:?]
> 16:48:49,355 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.RebalancePolicyJmxTest.testJoinAndLeaveWithRebalanceSuspendedAwaitingInitialTransfer[DIST_SYNC]
> javax.management.MBeanException: Error invoking setter for attribute rebalancingEnabled
> at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:358) ~[classes/:?]
> at org.infinispan.jmx.ResourceDMBean.setAttribute(ResourceDMBean.java:216) ~[classes/:?]
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.setAttribute(DefaultMBeanServerInterceptor.java:736) ~[?:?]
> at com.sun.jmx.mbeanserver.JmxMBeanServer.setAttribute(JmxMBeanServer.java:739) ~[?:?]
> at org.infinispan.statetransfer.RebalancePolicyJmxTest.doTest(RebalancePolicyJmxTest.java:163) ~[test-classes/:?]
> at org.infinispan.statetransfer.RebalancePolicyJmxTest.testJoinAndLeaveWithRebalanceSuspendedAwaitingInitialTransfer(RebalancePolicyJmxTest.java:44) ~[test-classes/:?]
> Caused by: java.lang.reflect.InvocationTargetException
> at jdk.internal.reflect.GeneratedMethodAccessor495.invoke(Unknown Source) ~[?:?]
> at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
> at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
> at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:422) ~[classes/:?]
> at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:355) ~[classes/:?]
> ... 28 more
> Caused by: org.infinispan.commons.CacheException: Unsuccessful local response
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnClusterSync(LocalTopologyManagerImpl.java:757) ~[classes/:?]
> at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:623) ~[classes/:?]
> at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:581) ~[classes/:?]
> at jdk.internal.reflect.GeneratedMethodAccessor495.invoke(Unknown Source) ~[?:?]
> at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
> at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
> at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:422) ~[classes/:?]
> at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:355) ~[classes/:?]
> ... 28 more
> {noformat}
--
[JBoss JIRA] (ISPN-10362) Unify remove command initialization and invocation
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-10362?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-10362:
-----------------------------------
Fix Version/s: 10.0.0.CR1
(was: 10.0.0.Beta5)
> Unify remove command initialization and invocation
> --------------------------------------------------
>
> Key: ISPN-10362
> URL: https://issues.jboss.org/browse/ISPN-10362
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 10.0.0.Beta3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 10.0.0.CR1, 10.0.0.Final
>
>
> ISPN-10322 unified command initialization with {{InitializableCommand}}, but we should go further and unify initialization with invocation.
> We can replace the current {{ReplicableCommand.invokeAsync}} and {{InitializableCommand.init(ComponentRegistry)}} methods with a single method, {{CacheRpcCommand.invokeAsync(ComponentRegistry)}} (or maybe {{execute}}?).
> For global commands we can create a {{GlobalRpcCommand}} interface with a method {{invokeAsync(GlobalComponentRegistry)}}.
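The proposed shape might look roughly like the sketch below. Everything in it is an assumption made for illustration: the two registry classes are empty placeholders that merely reuse the real names, the {{CompletionStage<?>}} return type is a guess, and {{CacheNameCommand}} is an invented example command.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

/** Placeholder for the per-cache registry; the real class has many more members. */
final class ComponentRegistry {
    final String cacheName;

    ComponentRegistry(String cacheName) {
        this.cacheName = cacheName;
    }
}

/** Placeholder for the cache-manager-wide registry. */
final class GlobalComponentRegistry {
}

/** Per-cache command: receives its components at invocation time, with no separate init step. */
interface CacheRpcCommand {
    CompletionStage<?> invokeAsync(ComponentRegistry registry);
}

/** Global command: same idea, but against the global registry. */
interface GlobalRpcCommand {
    CompletionStage<?> invokeAsync(GlobalComponentRegistry registry);
}

/** Invented example command that looks up what it needs from the registry when invoked. */
final class CacheNameCommand implements CacheRpcCommand {
    @Override
    public CompletionStage<String> invokeAsync(ComponentRegistry registry) {
        return CompletableFuture.completedFuture(registry.cacheName);
    }
}
```

In this shape a command holds no injected components between deserialization and invocation, so there is no window in which it exists but has not been initialized.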
--
[JBoss JIRA] (ISPN-10301) Deprecate server media type application/x-java-object
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-10301?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-10301:
-----------------------------------
Fix Version/s: 10.0.0.CR1
(was: 10.0.0.Beta5)
> Deprecate server media type application/x-java-object
> -----------------------------------------------------
>
> Key: ISPN-10301
> URL: https://issues.jboss.org/browse/ISPN-10301
> Project: Infinispan
> Issue Type: Task
> Components: Server
> Affects Versions: 10.0.0.Beta3
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 10.0.0.CR1
>
>
> {{application/x-java-object}} is only useful if the application classes are deployed on the server, and the server must also be able to use reflection on those classes.
> We should steer users towards always using {{application/x-protostream}} instead, because the protobuf schemas are much easier to deploy to the server. The first step would be to make protostream the default marshalling mechanism in the client.
--
[JBoss JIRA] (ISPN-10366) ScatteredStateConsumerImpl sets segment state to OWNED before applying values
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-10366?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-10366:
-----------------------------------
Fix Version/s: 10.0.0.CR1
(was: 10.0.0.Beta5)
> ScatteredStateConsumerImpl sets segment state to OWNED before applying values
> -----------------------------------------------------------------------------
>
> Key: ISPN-10366
> URL: https://issues.jboss.org/browse/ISPN-10366
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 10.0.0.Beta3, 9.4.15.Final
> Reporter: Dan Berindei
> Assignee: Radim Vansa
> Priority: Major
> Labels: testsuite_stability
> Fix For: 10.0.0.CR1
>
> Attachments: ISPN-10363_LazyInitializingExecutorService_94x_20190627-2010_PrefetchTest-infinispan-core.log.gz
>
>
> {{ScatteredStateConsumerImpl}} uses {{InboundTransferTask}} only to request keys. After it has received all the keys of a segment, it changes the segment state to {{VALUE_TRANSFER}} and starts an asynchronous request to fetch the values and replace the {{RemoteMetadata}} entries with real entries.
> {{ScatteredStateConsumerImpl.chunkCounter}} is supposed to delay the end of state transfer and the segment state change to {{OWNED}}, but on rare occasions this doesn't happen.
> This happened in {{PrefetchTest.testPrefetch12}} while running the test suite with {{taskset -c 1-2}}:
> {noformat}
> 21:54:43,304 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache]) [StateConsumerImpl] Received new topology for cache ___defaultcache, isRebalance = true, isMember = true, topology = CacheTopology{id=9, phase=TRANSITORY, rebalanceId=5, currentCH=PartitionerConsistentHash:ScatteredConsistentHash{ns=1, rebalanced=false, owners = (2)[Test-NodeA-39104: 1, Test-NodeC-3746: 0]}, pendingCH=PartitionerConsistentHash:ScatteredConsistentHash{ns=1, rebalanced=true, owners = (2)[Test-NodeA-39104: 0, Test-NodeC-3746: 1]}, unionCH=PartitionerConsistentHash:ScatteredConsistentHash{ns=1, rebalanced=false, owners = (2)[Test-NodeA-39104: 0, Test-NodeC-3746: 1]}, actualMembers=[Test-NodeA-39104, Test-NodeC-3746], persistentUUIDs=[f58e0a9a-dd4e-429a-8464-da64bf001d4e, 1471096f-c59a-4dc9-8f4d-31fbf399a2aa]}
> 21:54:43,305 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[StateRequest-___defaultcache]) [ScatteredStateConsumerImpl] Requesting keys for segments {0} from Test-NodeA-39104
> 21:54:43,313 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache]) [StateConsumerImpl] Topology update processed, stateTransferTopologyId = 9, startRebalance = true, pending CH = PartitionerConsistentHash:ScatteredConsistentHash{ns=1, rebalanced=true, owners = (2)[Test-NodeA-39104: 0, Test-NodeC-3746: 1]}
> 21:54:43,313 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache]) [StateTransferLockImpl] Signalling transaction data received for topology 9
> 21:54:43,313 TRACE (remote-thread-Test-NodeC-p69905-t2:[]) [TrianglePerCacheInboundInvocationHandler] Calling perform() on StateResponseCommand{cache=___defaultcache, pushTransfer=false, stateChunks=[StateChunk{segmentId=0, cacheEntries=1, isLastChunk=true}], origin=Test-NodeA-39104, topologyId=9, applyState=true}
> 21:54:43,313 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [StateConsumerImpl] Applying new state chunk for segment 0 of cache ___defaultcache from node Test-NodeA-39104: received 1 cache entries
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [ScatteredVersionManagerImpl] Finished transfer for segment 0 = KEY_TRANSFER -> VALUE_TRANSFER
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [ScatteredVersionManagerImpl] Node Test-NodeC-3746, segment 0 has all keys in, expects value transfer
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [ScatteredStateConsumerImpl] Requesting values from segments {0}, for in-memory keys
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [ScatteredStateConsumerImpl] Retrieving values, chunk counter is 1
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [JGroupsTransport] Test-NodeC-3746 sending request 11 to Test-NodeA-39104: ClusteredGetAllCommand{keys=[key], flags=[SKIP_OWNERSHIP_CHECK], topologyId=9}
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [ScatteredStateConsumerImpl] Invalidating versions on Test-NodeC-3746, chunk counter incremented to 2
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [ScatteredStateConsumerImpl] Versions invalidated on Test-NodeC-3746, chunk counter decremented to 1
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [StateConsumerImpl] Removing inbound transfers from node {0} for segments Test-NodeA-39104
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [ScatteredStateConsumerImpl] Inbound transfer removed, chunk counter is 1
> 21:54:43,314 TRACE (stateTransferExecutor-thread-Test-NodeC-p69908-t6:[]) [StateConsumerImpl] Latch 0
> 21:54:43,315 TRACE (jgroups-7,Test-NodeC-3746:[]) [JGroupsTransport] Test-NodeC-3746 received response for request 11 from Test-NodeA-39104: SuccessfulResponse([MetadataImmortalCacheValue {value=v0, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=SimpleClusteredVersion{topologyId=7, version=1}}}])
> 21:54:43,316 TRACE (jgroups-7,Test-NodeC-3746:[]) [BlockingInterceptor] Command blocking before completion of PutKeyValueCommand{key=key, value=v0, flags=[CACHE_MODE_LOCAL, SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, SKIP_SHARED_CACHE_STORE, SKIP_OWNERSHIP_CHECK, IGNORE_RETURN_VALUES, SKIP_XSITE_BACKUP], commandInvocationId=CommandInvocation:Test-NodeC-3746:121294, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=InternalMetadataImpl{actual=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=SimpleClusteredVersion{topologyId=7, version=1}}, created=-1, lastUsed=-1}, successful=true, topologyId=-1}
> 21:54:43,316 TRACE (remote-thread-Test-NodeC-p69905-t2:[___defaultcache]) [StateConsumerImpl] After applying the received state the data container of cache ___defaultcache has 1 keys
> 21:54:43,316 TRACE (remote-thread-Test-NodeC-p69905-t2:[___defaultcache]) [StateConsumerImpl] Segments not received yet for cache ___defaultcache: {}
> 21:54:43,316 DEBUG (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache]) [StateConsumerImpl] Finished receiving of segments for cache ___defaultcache for topology 9.
> 21:54:43,316 DEBUG (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache]) [ScatteredVersionManagerImpl] Node Test-NodeC-3746 received values for all segments in topology 9
> 21:54:43,316 TRACE (transport-thread-Test-NodeC-p69907-t5:[Topology-___defaultcache]) [StateConsumerImpl] Stop keeping track of changed keys for state transfer in topology 9
> {noformat}
> The test then starts a put operation and expects it to prefetch the previous value, but because the segment is {{OWNED}}, the {{RemoteMetadata}} is ignored:
> {noformat}
> 21:54:43,316 TRACE (ForkThread-1,Test:[]) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key, value=v1, flags=[], commandInvocationId=CommandInvocation:Test-NodeC-3746:121295, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=-1} and InvocationContext [SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=null, lockOwner=CommandInvocation:Test-NodeC-3746:121295}]
> 21:54:43,316 TRACE (ForkThread-1,Test:[]) [EntryFactoryImpl] Retrieved from container MetadataImmortalCacheEntry{key=key, value=null, metadata=RemoteMetadata{address=Test-NodeA-39104, version=1}}
> 21:54:43,316 TRACE (ForkThread-1,Test:[]) [ScatteredDistributionInterceptor] Committing entry RepeatableReadEntry(108d175b){key=key, value=v1, isCreated=false, isChanged=true, isRemoved=false, isExpired=false, skipLookup=true, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=SimpleClusteredVersion{topologyId=9, version=1}}}, replaced MetadataImmortalCacheEntry{key=key, value=null, metadata=RemoteMetadata{address=Test-NodeA-39104, version=1}}
> 21:54:53,316 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.scattered.statetransfer.PrefetchTest.testPrefetch12
> org.infinispan.test.TestException: java.util.concurrent.TimeoutException
> at org.infinispan.util.ControlledRpcManager.uncheckedGet(ControlledRpcManager.java:259) ~[test-classes/:?]
> at org.infinispan.util.ControlledRpcManager.expectCommand(ControlledRpcManager.java:124) ~[test-classes/:?]
> at org.infinispan.scattered.statetransfer.PrefetchTest.testPrefetch(PrefetchTest.java:110) ~[test-classes/:?]
> at org.infinispan.scattered.statetransfer.PrefetchTest.testPrefetch12(PrefetchTest.java:67) ~[test-classes/:?]
> {noformat}
> On a related note, {{StateConsumerImpl.applyState(pushTransfer=true)}} initializes a {{CountDownLatch(stateChunks.size())}}, but doesn't actually count down if {{stateChunk.getCacheEntries() == null}}, potentially hanging state transfer until it times out.
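The latch problem in the last paragraph can be avoided by counting down on every path, including the one where a chunk carries no entries. The sketch below is a generic illustration, not the {{StateConsumerImpl}} code: {{Chunk}}, {{applyAll}}, and the use of {{CompletableFuture.runAsync}} for the asynchronous apply step are all assumptions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ChunkLatchSketch {
    /** Hypothetical stand-in for a state chunk; entries may be null. */
    static final class Chunk {
        final List<String> entries;

        Chunk(List<String> entries) {
            this.entries = entries;
        }
    }

    /** Returns true if every chunk was accounted for before the timeout expired. */
    static boolean applyAll(List<Chunk> chunks, long timeout, TimeUnit unit) {
        CountDownLatch latch = new CountDownLatch(chunks.size());
        for (Chunk chunk : chunks) {
            if (chunk.entries == null) {
                // Nothing to apply, but the latch was sized for this chunk too.
                latch.countDown();
                continue;
            }
            CompletableFuture
                  .runAsync(() -> chunk.entries.forEach(e -> { /* apply the entry */ }))
                  // Count down whether the apply step succeeded or failed.
                  .whenComplete((ignored, error) -> latch.countDown());
        }
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Counting down in {{whenComplete}} also covers the case where applying a chunk throws, which would otherwise leave the latch stuck in the same way.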
--