[infinispan-issues] [JBoss JIRA] (ISPN-9536) Distributed stream iteration timeout during join
Dan Berindei (JIRA)
issues at jboss.org
Mon Sep 24 06:53:00 EDT 2018
Dan Berindei created ISPN-9536:
----------------------------------
Summary: Distributed stream iteration timeout during join
Key: ISPN-9536
URL: https://issues.jboss.org/browse/ISPN-9536
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.4.0.CR3
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.4.0.Final
{{LocalStreamManagerImpl}} checks in several places that the cache status is {{RUNNING}}. If it is not, it marks all local segments as suspect and logs this message:
{noformat}
[LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
{noformat}
This check is incorrect because a node may receive distributed stream requests before its cache is running: the initial state transfer can finish before all the components in the cache have finished starting.
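A minimal Java sketch of the check, under loudly stated assumptions: {{SegmentSuspicion}}, {{CacheStatus}}, {{suspectStrict}} and {{suspectRelaxed}} are hypothetical names, not Infinispan's {{LocalStreamManagerImpl}} code, and the enum only loosely mirrors a component lifecycle. It contrasts the behaviour described above, where any status other than {{RUNNING}} makes every requested segment suspect, with one possible relaxation that only suspects segments once the cache is stopping or has failed. This issue does not say the actual fix takes this form.

```java
import java.util.BitSet;

// Hypothetical, simplified model of the status check; not the real Infinispan API.
public class SegmentSuspicion {
    // Assumed lifecycle states, loosely modelled on a component lifecycle.
    public enum CacheStatus { INSTANTIATED, INITIALIZING, RUNNING, STOPPING, TERMINATED, FAILED }

    // Behaviour as reported: any status other than RUNNING suspects every
    // requested segment, even while the cache is still starting.
    public static BitSet suspectStrict(CacheStatus status, BitSet requested) {
        return status == CacheStatus.RUNNING ? new BitSet() : (BitSet) requested.clone();
    }

    // One possible relaxation (assumption): only suspect segments once the cache
    // is shutting down or broken, because a starting node may already own its
    // segments after the initial state transfer.
    public static BitSet suspectRelaxed(CacheStatus status, BitSet requested) {
        switch (status) {
            case STOPPING:
            case TERMINATED:
            case FAILED:
                return (BitSet) requested.clone();
            default:
                return new BitSet();
        }
    }

    public static void main(String[] args) {
        BitSet requested = new BitSet();
        requested.set(2);
        requested.set(4);
        requested.set(12);
        System.out.println(suspectStrict(CacheStatus.INITIALIZING, requested));  // {2, 4, 12}
        System.out.println(suspectRelaxed(CacheStatus.INITIALIZING, requested)); // {}
    }
}
```

The relaxed variant trades the false suspicion for the risk of serving a request before every component has started, so treat it as a direction to explore rather than a vetted fix.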
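If the check fails only because the cache is still starting, the originator of the distributed stream iteration assumes that the remote node has a newer topology and waits for it. Since no newer topology is coming, the originator blocks for {{DistributedCacheTimeout.timeout()}} and then fails with a {{TimeoutException}}. In the log below, NodeE marks all segments as suspect at 10:48:06,157, one millisecond before {{org.infinispan.CONFIG}} reports that it has started, and the originator NodeD waits for topology 10 from 10:48:06,165 until it times out at 10:48:36,168.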
{noformat}
10:48:06,137 DEBUG (transport-thread-Test-NodeE-p660-t3:[Topology-org.infinispan.CONFIG]) [LocalTopologyManagerImpl] Updating local topology for cache org.infinispan.CONFIG: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (5)[Test-NodeC-45735: 55, Test-NodeA-61957: 55, Test-NodeE-27687: 52, Test-NodeB-13641: 45, Test-NodeD-25737: 49]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-45735, Test-NodeA-61957, Test-NodeE-27687, Test-NodeB-13641, Test-NodeD-25737], persistentUUIDs=[5fccba8a-802a-423a-8e3b-ec9c1892a5ae, 5d04de93-79ce-4eff-a827-2a4bbdf54080, 70b79733-dbbe-4622-a5f6-38d3ec7771ff, 87e8c232-21a0-4416-8c81-4829f4619414, 78146ad7-7594-4161-9a7e-c4da0cc79f42]}
10:48:06,156 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Request Test-NodeD-257370 completed for segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} with {} suspected segments
10:48:06,157 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
10:48:06,158 TRACE (ForkThread-5,CounterConcurrentStartTest:[org.infinispan.CONFIG]) [DefaultCacheManager] Cache org.infinispan.CONFIG started
10:48:06,165 TRACE (remote-thread-Test-NodeD-p649-t2:[]) [ClusterStreamManagerImpl] Received response from Test-NodeE-27687 with a completed response true for id Test-NodeD-257370 with {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} suspected segments.
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Found {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} lost segments for identifier Test-NodeD-257370
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Waiting for topology 10 to continue stream operation with segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255}
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [StateTransferLockImpl] Waiting for topology 10 to be installed, current topology is 9
10:48:36,168 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [BasicComponentRegistryImpl] Changed status of org.infinispan.counter.api.CounterManager to FAILED
org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.counter.impl.manager.EmbeddedCounterManager.start() on object of type EmbeddedCounterManager
...
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_171]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_171]
at org.infinispan.stream.impl.AbstractCacheStream.performOperationRehashAware(AbstractCacheStream.java:330) ~[classes/:?]
at org.infinispan.stream.impl.AbstractCacheStream.performOperation(AbstractCacheStream.java:229) ~[classes/:?]
at org.infinispan.stream.impl.DistributedCacheStream.anyMatch(DistributedCacheStream.java:406) ~[classes/:?]
at org.infinispan.util.AbstractDelegatingCacheStream.anyMatch(AbstractDelegatingCacheStream.java:300) ~[classes/:?]
at org.infinispan.CacheStream.anyMatch(CacheStream.java:462) ~[classes/:?]
at org.infinispan.counter.impl.manager.CounterConfigurationManager.start(CounterConfigurationManager.java:90) ~[classes/:?]
at org.infinispan.counter.impl.manager.EmbeddedCounterManager.start(EmbeddedCounterManager.java:89) ~[classes/:?]
{noformat}
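To illustrate why the originator hangs instead of retrying, here is a small hypothetical Java model of the waiting side. The {{TopologyWaitSketch}} class and its methods are invented for illustration; in the logs above the wait is reported by {{StateTransferLockImpl}}, and the failing call is {{CompletableFuture.get}} with a timeout inside {{AbstractCacheStream.performOperationRehashAware}}. The originator is at topology 9, asks for topology 10 after seeing the suspected segments, and blocks on a future that nothing completes. The sketch uses a 200 ms timeout instead of the 30 seconds seen in the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of an originator waiting for a newer topology; not Infinispan code.
public class TopologyWaitSketch {
    private int currentTopologyId;
    private final Map<Integer, CompletableFuture<Void>> waiters = new HashMap<>();

    public TopologyWaitSketch(int initialTopologyId) {
        this.currentTopologyId = initialTopologyId;
    }

    // Returns a future that completes once topologyId (or a later one) is installed.
    public synchronized CompletableFuture<Void> topologyFuture(int topologyId) {
        if (topologyId <= currentTopologyId) {
            return CompletableFuture.completedFuture(null);
        }
        return waiters.computeIfAbsent(topologyId, id -> new CompletableFuture<>());
    }

    // Installs a topology and releases every waiter for that id or an older one.
    public void installTopology(int topologyId) {
        List<CompletableFuture<Void>> ready = new ArrayList<>();
        synchronized (this) {
            currentTopologyId = topologyId;
            waiters.entrySet().removeIf(e -> {
                if (e.getKey() <= topologyId) {
                    ready.add(e.getValue());
                    return true;
                }
                return false;
            });
        }
        // Complete outside the lock so callbacks cannot run while we hold it.
        ready.forEach(f -> f.complete(null));
    }

    // Mirrors the failing path: block on the topology future with a timeout.
    public boolean awaitTopology(int topologyId, long timeoutMillis) {
        try {
            topologyFuture(topologyId).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // no newer topology arrived in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        TopologyWaitSketch waiter = new TopologyWaitSketch(9);
        // Suspected segments make the originator wait for topology 10, which never comes.
        System.out.println(waiter.awaitTopology(10, 200)); // false
        // Had topology 10 actually been installed, the same wait would succeed.
        waiter.installTopology(10);
        System.out.println(waiter.awaitTopology(10, 200)); // true
    }
}
```

The point of the model is that a spurious suspicion turns into a full timeout: nothing in the cluster will ever install the topology the originator is waiting for.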
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)