[infinispan-issues] [JBoss JIRA] (ISPN-9536) Distributed stream iteration timeout during join

Dan Berindei (JIRA) issues at jboss.org
Mon Sep 24 08:26:01 EDT 2018


     [ https://issues.jboss.org/browse/ISPN-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Berindei updated ISPN-9536:
-------------------------------
    Status: Open  (was: New)


> Distributed stream iteration timeout during join
> ------------------------------------------------
>
>                 Key: ISPN-9536
>                 URL: https://issues.jboss.org/browse/ISPN-9536
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 9.4.0.CR3
>            Reporter: Dan Berindei
>            Assignee: Dan Berindei
>             Fix For: 9.4.0.Final
>
>
> {{LocalStreamManagerImpl}} checks in several places that the cache status is {{RUNNING}}, and if not it suspects all local segments with this log message:
> {noformat}
> [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
> {noformat}
> This check is incorrect, because a node may receive distributed streaming requests before it is running: the initial state transfer could finish before all the components in the cache finish starting.
> If the check fails because the cache is still starting, the originator of the distributed stream iteration will assume that the remote cache has a newer topology. Because there is no newer topology, the originator blocks for {{DistributedCacheTimeout.timeout()}} and then fails.
> {noformat}
> 10:48:06,137 DEBUG (transport-thread-Test-NodeE-p660-t3:[Topology-org.infinispan.CONFIG]) [LocalTopologyManagerImpl] Updating local topology for cache org.infinispan.CONFIG: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (5)[Test-NodeC-45735: 55, Test-NodeA-61957: 55, Test-NodeE-27687: 52, Test-NodeB-13641: 45, Test-NodeD-25737: 49]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-45735, Test-NodeA-61957, Test-NodeE-27687, Test-NodeB-13641, Test-NodeD-25737], persistentUUIDs=[5fccba8a-802a-423a-8e3b-ec9c1892a5ae, 5d04de93-79ce-4eff-a827-2a4bbdf54080, 70b79733-dbbe-4622-a5f6-38d3ec7771ff, 87e8c232-21a0-4416-8c81-4829f4619414, 78146ad7-7594-4161-9a7e-c4da0cc79f42]}
> 10:48:06,156 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Request Test-NodeD-257370 completed for segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} with {} suspected segments
> 10:48:06,157 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
> 10:48:06,158 TRACE (ForkThread-5,CounterConcurrentStartTest:[org.infinispan.CONFIG]) [DefaultCacheManager] Cache org.infinispan.CONFIG started
> 10:48:06,165 TRACE (remote-thread-Test-NodeD-p649-t2:[]) [ClusterStreamManagerImpl] Received response from Test-NodeE-27687 with a completed response true for id Test-NodeD-257370 with {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} suspected segments.
> 10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Found {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} lost segments for identifier Test-NodeD-257370
> 10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Waiting for topology 10 to continue stream operation with segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255}
> 10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [StateTransferLockImpl] Waiting for topology 10 to be installed, current topology is 9
> 10:48:36,168 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [BasicComponentRegistryImpl] Changed status of org.infinispan.counter.api.CounterManager to FAILED
> org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.counter.impl.manager.EmbeddedCounterManager.start() on object of type EmbeddedCounterManager
> ...
> Caused by: java.util.concurrent.TimeoutException
> 	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_171]
> 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_171]
> 	at org.infinispan.stream.impl.AbstractCacheStream.performOperationRehashAware(AbstractCacheStream.java:330) ~[classes/:?]
> 	at org.infinispan.stream.impl.AbstractCacheStream.performOperation(AbstractCacheStream.java:229) ~[classes/:?]
> 	at org.infinispan.stream.impl.DistributedCacheStream.anyMatch(DistributedCacheStream.java:406) ~[classes/:?]
> 	at org.infinispan.util.AbstractDelegatingCacheStream.anyMatch(AbstractDelegatingCacheStream.java:300) ~[classes/:?]
> 	at org.infinispan.CacheStream.anyMatch(CacheStream.java:462) ~[classes/:?]
> 	at org.infinispan.counter.impl.manager.CounterConfigurationManager.start(CounterConfigurationManager.java:90) ~[classes/:?]
> 	at org.infinispan.counter.impl.manager.EmbeddedCounterManager.start(EmbeddedCounterManager.java:89) ~[classes/:?]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)


More information about the infinispan-issues mailing list