]
Dan Berindei updated ISPN-9536:
-------------------------------
Sprint: Sprint 9.4.0.Final
Distributed stream iteration timeout during join
------------------------------------------------
Key: ISPN-9536
URL:
https://issues.jboss.org/browse/ISPN-9536
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.4.0.CR3
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.4.0.Final
{{LocalStreamManagerImpl}} checks in several places that the cache status is {{RUNNING}},
and if not it suspects all local segments with this log message:
{noformat}
[LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect
for Test-NodeD-257370
{noformat}
This check is incorrect, because a node may receive distributed streaming requests before
it is running: the initial state transfer could finish before all the components in the
cache finish starting.
If the check fails because the cache is still starting, the originator of the distributed
stream iteration will assume that the remote cache has a newer topology. Because there is
no newer topology, the originator blocks for {{DistributedCacheTimeout.timeout()}} and
then fails.
{noformat}
10:48:06,137 DEBUG (transport-thread-Test-NodeE-p660-t3:[Topology-org.infinispan.CONFIG])
[LocalTopologyManagerImpl] Updating local topology for cache org.infinispan.CONFIG:
CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3,
currentCH=ReplicatedConsistentHash{ns = 256, owners = (5)[Test-NodeC-45735: 55,
Test-NodeA-61957: 55, Test-NodeE-27687: 52, Test-NodeB-13641: 45, Test-NodeD-25737: 49]},
pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-45735, Test-NodeA-61957,
Test-NodeE-27687, Test-NodeB-13641, Test-NodeD-25737],
persistentUUIDs=[5fccba8a-802a-423a-8e3b-ec9c1892a5ae,
5d04de93-79ce-4eff-a827-2a4bbdf54080, 70b79733-dbbe-4622-a5f6-38d3ec7771ff,
87e8c232-21a0-4416-8c81-4829f4619414, 78146ad7-7594-4161-9a7e-c4da0cc79f42]}
10:48:06,156 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Request
Test-NodeD-257370 completed for segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77
87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216
219-220 222 226 233 236-237 239 242-244 255} with {} suspected segments
10:48:06,157 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Cache
status is no longer running, all segments are now suspect for Test-NodeD-257370
10:48:06,158 TRACE (ForkThread-5,CounterConcurrentStartTest:[org.infinispan.CONFIG])
[DefaultCacheManager] Cache org.infinispan.CONFIG started
10:48:06,165 TRACE (remote-thread-Test-NodeD-p649-t2:[]) [ClusterStreamManagerImpl]
Received response from Test-NodeE-27687 with a completed response true for id
Test-NodeD-257370 with {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101
109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226
233 236-237 239 242-244 255} suspected segments.
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream]
Found {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148
150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239
242-244 255} lost segments for identifier Test-NodeD-257370
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream]
Waiting for topology 10 to continue stream operation with segments {2 4 12 20 24 27 31 36
38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188
194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255}
10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [StateTransferLockImpl]
Waiting for topology 10 to be installed, current topology is 9
10:48:36,168 TRACE (ForkThread-9,CounterConcurrentStartTest:[])
[BasicComponentRegistryImpl] Changed status of org.infinispan.counter.api.CounterManager
to FAILED
org.infinispan.manager.EmbeddedCacheManagerStartupException:
org.infinispan.commons.CacheException: Unable to invoke method public void
org.infinispan.counter.impl.manager.EmbeddedCounterManager.start() on object of type
EmbeddedCounterManager
...
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
~[?:1.8.0_171]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
~[?:1.8.0_171]
at
org.infinispan.stream.impl.AbstractCacheStream.performOperationRehashAware(AbstractCacheStream.java:330)
~[classes/:?]
at
org.infinispan.stream.impl.AbstractCacheStream.performOperation(AbstractCacheStream.java:229)
~[classes/:?]
at
org.infinispan.stream.impl.DistributedCacheStream.anyMatch(DistributedCacheStream.java:406)
~[classes/:?]
at
org.infinispan.util.AbstractDelegatingCacheStream.anyMatch(AbstractDelegatingCacheStream.java:300)
~[classes/:?]
at org.infinispan.CacheStream.anyMatch(CacheStream.java:462) ~[classes/:?]
at
org.infinispan.counter.impl.manager.CounterConfigurationManager.start(CounterConfigurationManager.java:90)
~[classes/:?]
at
org.infinispan.counter.impl.manager.EmbeddedCounterManager.start(EmbeddedCounterManager.java:89)
~[classes/:?]
{noformat}