[infinispan-issues] [JBoss JIRA] (ISPN-9536) Distributed stream iteration timeout during join

Dan Berindei (JIRA) issues at jboss.org
Mon Sep 24 06:53:00 EDT 2018


Dan Berindei created ISPN-9536:
----------------------------------

             Summary: Distributed stream iteration timeout during join
                 Key: ISPN-9536
                 URL: https://issues.jboss.org/browse/ISPN-9536
             Project: Infinispan
          Issue Type: Bug
          Components: Core
    Affects Versions: 9.4.0.CR3
            Reporter: Dan Berindei
            Assignee: Dan Berindei
             Fix For: 9.4.0.Final


{{LocalStreamManagerImpl}} checks in several places that the cache status is {{RUNNING}}, and if not it suspects all local segments with this log message:

{noformat}
[LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
{noformat}

This check is incorrect, because a node may receive distributed streaming requests before it is running: the initial state transfer could finish before all the components in the cache finish starting.

If the check fails because the cache is still starting, the originator of the distributed stream iteration will assume that the remote cache has a newer topology. Because there is no newer topology, the originator blocks for {{DistributedCacheTimeout.timeout()}} and then fails.

{noformat}
10:48:06,137 DEBUG (transport-thread-Test-NodeE-p660-t3:[Topology-org.infinispan.CONFIG]) [LocalTopologyManagerImpl] Updating local topology for cache org.infinispan.CONFIG: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (5)[Test-NodeC-45735: 55, Test-NodeA-61957: 55, Test-NodeE-27687: 52, Test-NodeB-13641: 45, Test-NodeD-25737: 49]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-45735, Test-NodeA-61957, Test-NodeE-27687, Test-NodeB-13641, Test-NodeD-25737], persistentUUIDs=[5fccba8a-802a-423a-8e3b-ec9c1892a5ae, 5d04de93-79ce-4eff-a827-2a4bbdf54080, 70b79733-dbbe-4622-a5f6-38d3ec7771ff, 87e8c232-21a0-4416-8c81-4829f4619414, 78146ad7-7594-4161-9a7e-c4da0cc79f42]}
10:48:06,156 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Request Test-NodeD-257370 completed for segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} with {} suspected segments
10:48:06,157 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
10:48:06,158 TRACE (ForkThread-5,CounterConcurrentStartTest:[org.infinispan.CONFIG]) [DefaultCacheManager] Cache org.infinispan.CONFIG started

10:48:06,165 TRACE (remote-thread-Test-NodeD-p649-t2:[]) [ClusterStreamManagerImpl] Received response from Test-NodeE-27687 with a completed response true for id Test-NodeD-257370 with {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} suspected segments.
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Found {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} lost segments for identifier Test-NodeD-257370
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Waiting for topology 10 to continue stream operation with segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255}
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [StateTransferLockImpl] Waiting for topology 10 to be installed, current topology is 9
10:48:36,168 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [BasicComponentRegistryImpl] Changed status of org.infinispan.counter.api.CounterManager to FAILED
org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.counter.impl.manager.EmbeddedCounterManager.start() on object of type EmbeddedCounterManager
...
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_171]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_171]
	at org.infinispan.stream.impl.AbstractCacheStream.performOperationRehashAware(AbstractCacheStream.java:330) ~[classes/:?]
	at org.infinispan.stream.impl.AbstractCacheStream.performOperation(AbstractCacheStream.java:229) ~[classes/:?]
	at org.infinispan.stream.impl.DistributedCacheStream.anyMatch(DistributedCacheStream.java:406) ~[classes/:?]
	at org.infinispan.util.AbstractDelegatingCacheStream.anyMatch(AbstractDelegatingCacheStream.java:300) ~[classes/:?]
	at org.infinispan.CacheStream.anyMatch(CacheStream.java:462) ~[classes/:?]
	at org.infinispan.counter.impl.manager.CounterConfigurationManager.start(CounterConfigurationManager.java:90) ~[classes/:?]
	at org.infinispan.counter.impl.manager.EmbeddedCounterManager.start(EmbeddedCounterManager.java:89) ~[classes/:?]
{noformat}



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)


More information about the infinispan-issues mailing list