[JBoss JIRA] (ISPN-9127) Remote commands can access components before they are started
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9127?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9127:
-------------------------------
Status: Pull Request Sent (was: Coding In Progress)
Git Pull Request: https://github.com/infinispan/infinispan/pull/5965, https://github.com/infinispan/infinispan/pull/6232 (was: https://github.com/infinispan/infinispan/pull/5965)
> Remote commands can access components before they are started
> -------------------------------------------------------------
>
> Key: ISPN-9127
> URL: https://issues.jboss.org/browse/ISPN-9127
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.2.Final, 9.3.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Attachments: server0, server1, server2, trace.tar.gz
>
>
> {{PerCacheInboundInvocationHandler.handle()}} may be called before the component has started, because {{GlobalInboundInvocationHandler}} fetches it from the component registry without any checks. {{CommandsFactoryImpl.initializeReplicableCommand()}} doesn't wait for the components that it injects into remote commands to be started, either.
> This started causing random test failures in {{ConcurrentStartForkChannelTest}} after ISPN-8515, which moved most initialization work from {{init()}} methods to {{start()}} methods. Because {{StateProviderImpl}} starts after {{StateTransferManagerImpl}}, it's possible for a node to receive a {{StateRequestCommand}} before {{StateProviderImpl}} has initialized:
> {noformat}
> 16:15:09,549 TRACE (remote-thread-Test-NodeB-p51957-t2:[org.infinispan.CONFIG]) [StateProviderImpl] Starting outbound transfer to node Test-NodeA for cache null, topology id 2, segments {0-255}
> 16:15:09,551 WARN (remote-thread-Test-NodeB-p51957-t2:[]) [NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command StateRequestCommand{cache=org.infinispan.CONFIG, origin=Test-NodeA, type=START_STATE_TRANSFER, topologyId=2, segments={0-255}}
> java.lang.IllegalArgumentException: chunkSize must be greater than 0
> at org.infinispan.statetransfer.OutboundTransferTask.<init>(OutboundTransferTask.java:114) ~[classes/:?]
> at org.infinispan.statetransfer.StateProviderImpl.startOutboundTransfer(StateProviderImpl.java:273) ~[classes/:?]
> at org.infinispan.statetransfer.StateRequestCommand.invokeAsync(StateRequestCommand.java:101) ~[classes/:?]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[classes/:?]
> {noformat}
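>
> To illustrate the missing guard, here is a minimal, self-contained sketch of a start gate, using hypothetical names ({{StartedGate}}, {{awaitStarted()}}) rather than the real component registry API: the inbound path would block until the per-cache component's {{start()}} has completed instead of using a half-initialized component fetched straight from the registry.
> {code}
> import java.util.concurrent.CountDownLatch;
>
> // Hypothetical holder: remote command handling waits until the component
> // has finished start(), instead of racing with its initialization.
> public class StartedGate<T> {
>    private final CountDownLatch started = new CountDownLatch(1);
>    private volatile T component;
>
>    // Called at the end of the component's start() method.
>    public void markStarted(T readyComponent) {
>       this.component = readyComponent;
>       started.countDown();
>    }
>
>    // Called on the inbound invocation path before dispatching a command.
>    public T awaitStarted() throws InterruptedException {
>       started.await();
>       return component;
>    }
> }
> {code}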
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9517) State transfer times out if initiated with yet to be verified suspected member and reincarnated member
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/ISPN-9517?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on ISPN-9517:
--------------------------------
[~pferraro] Can you resolve this issue? It flags JGRP-2294 as a warning...
> State transfer times out if initiated with yet to be verified suspected member and reincarnated member
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-9517
> URL: https://issues.jboss.org/browse/ISPN-9517
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 9.3.3.Final
> Reporter: Paul Ferraro
> Assignee: Paul Ferraro
> Attachments: Test.java, node-1.zip, node-2.zip
>
>
> Here's the scenario:
> 1. Cluster contains caches on 2 members, node-1 and node-2
> 2. node-2 is killed
> 3. node-2 is restarted (using same physical address)
> 4. State transfer initiates, view contains node-1, suspected node-2, and reincarnated node-2
> 5. State transfer times out
> Log of node-1 includes:
> {noformat}
> 12:09:51,882 WARN [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p14-t4) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 3 from node-2
> at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_181]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [rt.jar:1.8.0_181]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [rt.jar:1.8.0_181]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
> at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_181]
> Suppressed: org.infinispan.util.logging.TraceException
> at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:75)
> at org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:525)
> at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:508)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:321)
> at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87)
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
> at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
> ... 1 more
> {noformat}
> I've attached trace logs from node-1 and node-2.
> Changing ClusterTopologyManagerImpl.confirmMembersAvailable() to use ResponseMode.SYNCHRONOUS_IGNORE_LEAVERS instead of ResponseMode.SYNCHRONOUS allows state transfer to complete successfully.
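>
> Roughly, the suggested change looks like this (a sketch only; the actual argument list of {{Transport.invokeRemotely()}} inside {{confirmMembersAvailable()}} is more involved):
> {code}
> // Before: a suspected-but-not-yet-removed member never responds, so the
> // whole membership-confirmation request times out.
> transport.invokeRemotely(expectedMembers, command, ResponseMode.SYNCHRONOUS, timeout);
>
> // After: responses from members that have left (or are suspected) are ignored,
> // so state transfer can proceed with the remaining members.
> transport.invokeRemotely(expectedMembers, command, ResponseMode.SYNCHRONOUS_IGNORE_LEAVERS, timeout);
> {code}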
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-4075) State transfer should preserve the creation timestamp of entries
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4075?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4075:
-------------------------------
Fix Version/s: 10.0.0.Final
(was: 9.4.0.Final)
> State transfer should preserve the creation timestamp of entries
> ----------------------------------------------------------------
>
> Key: ISPN-4075
> URL: https://issues.jboss.org/browse/ISPN-4075
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Affects Versions: 6.0.1.Final
> Reporter: Dan Berindei
> Fix For: 10.0.0.Final
>
>
> State transfer inserts values with the current time as the creation time. Since the entries store the expected lifespan and not the expected expiration time, entries on the receiving node could expire much later than intended.
> The argument probably doesn't apply to the timestamp of the last usage. Since the state transfer process could be interpreted as a reader, it should be fine to update the time of the last usage both on the sending node and on the receiving node.
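>
> A small worked example of the expiration drift (illustrative numbers only):
> {code}
> long lifespanMillis  = 10 * 60 * 1000; // entry configured to live 10 minutes
> long originalCreated = 0;              // created at t = 0 on the sending node
> long stateTransferAt = 8 * 60 * 1000;  // entry transferred at t = 8 minutes
>
> // Intended expiry, based on the original creation time:
> long intendedExpiry = originalCreated + lifespanMillis;  // t = 10 minutes
>
> // Actual expiry if the receiving node stamps the entry with the transfer time:
> long actualExpiry = stateTransferAt + lifespanMillis;    // t = 18 minutes
> {code}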
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9541) Module initialization is not thread-safe
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9541?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9541:
-------------------------------
Summary: Module initialization is not thread-safe (was: Modules should not replace components after they were registered)
> Module initialization is not thread-safe
> ----------------------------------------
>
> Key: ISPN-9541
> URL: https://issues.jboss.org/browse/ISPN-9541
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Server
> Affects Versions: 9.4.0.CR3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.4.0.Final
>
>
> In my ISPN-9127 fix I created a {{BasicComponentRegistry}} interface that represents a mostly-read-only collection of components. It has a {{replaceComponent()}} method and a {{rewire()}} method for testing purposes, but it turns out that modules were also relying on the ability to replace existing components in order to work.
> Replacing global components is normally safe during {{ModuleLifecycle.cacheManagerStarting()}}, because none of the components have started yet, so when a component starts later we can still start its dependencies first. But because some modules start some global components, e.g. by calling {{manager.getCache(name)}}, that assumption breaks.
> The {{infinispan-server-event-logger}} module is a bit sneakier: it doesn't replace a component; instead, it replaces the actual implementation of the event logger inside the {{EventLogManager}} component. Events that happen before the module's {{cacheManagerStarting()}} or after its {{cacheManagerStopping()}} are silently dropped from the persistent event log.
> I am investigating making the module a factory of factories. Instead of having a monolithic {{cacheManagerStarting()}} method, it could define a set of components that it can create, and a set of components that should be started before the cache manager is "running". We probably need a way to depend on other modules as well, maybe reusing the {{@Inject}} and {{@ComponentName}} annotations.
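>
> A rough sketch of that direction, with entirely hypothetical interfaces (this is not the existing {{ModuleLifecycle}} API):
> {code}
> import java.util.Set;
>
> // Instead of one monolithic cacheManagerStarting() callback, a module would
> // declare which components it can build and which of them must be running
> // before the cache manager is considered started.
> public interface ModuleComponentFactory {
>    // Component types this module knows how to create, built lazily on demand.
>    Set<Class<?>> providedComponents();
>
>    // Components that must be started eagerly, before the manager is "running".
>    Set<Class<?>> eagerComponents();
>
>    // Create one of the provided components; dependencies (possibly declared by
>    // other modules) would be injected via @Inject / @ComponentName as today.
>    Object createComponent(Class<?> componentType);
> }
> {code}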
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9542) DistributedStreamRehashStressTest fails many of the tests
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-9542?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-9542:
-------------------------------------
Another failure is:
{code}
java.lang.AssertionError: We didn't get a matching size! Expected 25000 but was 24247 expected [24247] but found [25000]
at org.testng.Assert.fail(Assert.java:94)
at org.testng.Assert.failNotEquals(Assert.java:496)
at org.testng.Assert.assertEquals(Assert.java:125)
at org.testng.Assert.assertEquals(Assert.java:267)
at org.infinispan.stream.stress.DistributedStreamRehashStressTest.lambda$testStressNodesLeavingWhileMultipleCount$1(DistributedStreamRehashStressTest.java:96)
at org.infinispan.stream.stress.DistributedStreamRehashStressTest.lambda$testStressNodesLeavingWhilePerformingCallable$5(DistributedStreamRehashStressTest.java:190)
at org.infinispan.commands.StressTest.lambda$forkWorkerThreads$1(StressTest.java:96)
at org.infinispan.test.AbstractInfinispanTest$CallableWrapper.call(AbstractInfinispanTest.java:528)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
> DistributedStreamRehashStressTest fails many of the tests
> ---------------------------------------------------------
>
> Key: ISPN-9542
> URL: https://issues.jboss.org/browse/ISPN-9542
> Project: Infinispan
> Issue Type: Bug
> Components: Streams
> Affects Versions: 9.4.0.CR3
> Reporter: William Burns
> Assignee: William Burns
>
> Many of the stress tests either fail, time out, or report that they processed no iterations. We need to ensure these are working properly. One of the causes was https://github.com/infinispan/infinispan/pull/6270/files#r220378591. That, luckily, is a simple fix; however, we still need to track down the others.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)