[
https://issues.jboss.org/browse/ISPN-8731?page=com.atlassian.jira.plugin....
]
Dan Berindei edited comment on ISPN-8731 at 5/14/18 3:08 AM:
-------------------------------------------------------------
The failures happen because the {{DistributionManagerImpl}} topology is updated after the
{{StateConsumerImpl}} topology (ISPN-7682).
{{BaseDistributionInterceptor.checkTopologyId()}} doesn't expect the command topology
id (5) to be higher than the DM's current topology id, so it throws an
{{OutdatedTopologyException}} without a topology id.
{{StateTransferInterceptor.handleNonTxWriteCommand()}} then increments the command's
topology id and times out waiting for topology 6.
{noformat}
09:47:27,497 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache])
[StateConsumerImpl] Received new topology for cache TestCache, isRebalance = false,
isMember = true, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners =
(2)[test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1): 134+122,
test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2, machine-id=m2): 122+134]},
pendingCH=null, unionCH=null, actualMembers=[test-NodeA-14300 (flags=0), site-id=s1,
rack-id=r1, machine-id=m1), test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2,
machine-id=m2)], persistentUUIDs=[1ffbf082-035b-432d-8715-197ab9fbcab2,
7b2ba190-f559-4dc1-b874-5c43c3f4ce29]}
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] handleNonTxWriteCommand for
command PutKeyValueCommand{key=key1, value=value1, flags=[],
commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1,
machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true, topologyId=-1}, topology id -1
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] Setting command topology to
5
09:47:27,524 TRACE (testng-test:[]) [BaseDistributionInterceptor] Current topology 4,
command topology 5
09:47:27,526 TRACE (testng-test:[]) [StateTransferInterceptor] Retrying command because of
org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the
command was executing: expected 5, got 4, current topology is 5 (requested: 6):
PutKeyValueCommand{key=key1, value=value1, flags=[],
commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1,
machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true, topologyId=5}
09:47:27,526 TRACE (testng-test:[]) [StateTransferLockImpl] Waiting for transaction data
for topology 6, current topology is 4
09:47:27,545 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache])
[StateTransferLockImpl] Signalling topology 5 is installed
09:47:42,549 ERROR (testng-test:[]) [TestSuiteProgress] Test failed:
org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at
org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259)
~[classes/:?]
at
org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1686)
~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1349) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1812) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:223) ~[classes/:?]
at
org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358)
~[classes/:?]
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:673) ~[classes/:?]
at
org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.executeDifferentExecutionPolicies(DistributedExecutorExecutionPolicyTest.java:115)
~[test-classes/:?]
at
org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine(DistributedExecutorExecutionPolicyTest.java:59)
~[test-classes/:?]
{noformat}
was (Author: dan.berindei):
The failures happen because the {{DistributionManagerImpl}} topology is updated after the
{{StateConsumerImpl}} topology (ISPN-7682).
{noformat}
09:47:27,497 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache])
[StateConsumerImpl] Received new topology for cache TestCache, isRebalance = false,
isMember = true, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners =
(2)[test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1, machine-id=m1): 134+122,
test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2, machine-id=m2): 122+134]},
pendingCH=null, unionCH=null, actualMembers=[test-NodeA-14300 (flags=0), site-id=s1,
rack-id=r1, machine-id=m1), test-NodeB-59947 (flags=0), site-id=s2, rack-id=r2,
machine-id=m2)], persistentUUIDs=[1ffbf082-035b-432d-8715-197ab9fbcab2,
7b2ba190-f559-4dc1-b874-5c43c3f4ce29]}
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] handleNonTxWriteCommand for
command PutKeyValueCommand{key=key1, value=value1, flags=[],
commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1,
machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true, topologyId=-1}, topology id -1
09:47:27,524 TRACE (testng-test:[]) [StateTransferInterceptor] Setting command topology to
5
09:47:27,524 TRACE (testng-test:[]) [BaseDistributionInterceptor] Current topology 4,
command topology 5
09:47:27,526 TRACE (testng-test:[]) [StateTransferInterceptor] Retrying command because of
org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the
command was executing: expected 5, got 4, current topology is 5 (requested: 6):
PutKeyValueCommand{key=key1, value=value1, flags=[],
commandInvocationId=CommandInvocation:test-NodeA-14300 (flags=0), site-id=s1, rack-id=r1,
machine-id=m1):36, putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true, topologyId=5}
09:47:27,526 TRACE (testng-test:[]) [StateTransferLockImpl] Waiting for transaction data
for topology 6, current topology is 4
09:47:27,545 TRACE (transport-thread-test-NodeA-p106-t6:[Topology-TestCache])
[StateTransferLockImpl] Signalling topology 5 is installed
09:47:42,549 ERROR (testng-test:[]) [TestSuiteProgress] Test failed:
org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at
org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259)
~[classes/:?]
at
org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1686)
~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1349) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1812) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:223) ~[classes/:?]
at
org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358)
~[classes/:?]
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:673) ~[classes/:?]
at
org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.executeDifferentExecutionPolicies(DistributedExecutorExecutionPolicyTest.java:115)
~[test-classes/:?]
at
org.infinispan.distexec.DistributedExecutorExecutionPolicyTest.testExecutionPolicyNotSameMachine(DistributedExecutorExecutionPolicyTest.java:59)
~[test-classes/:?]
{noformat}
Write command times out waiting for wrong topology
--------------------------------------------------
Key: ISPN-8731
URL:
https://issues.jboss.org/browse/ISPN-8731
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Query, Test Suite - Server
Affects Versions: 9.2.0.CR1
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Labels: testsuite_stability
Fix For: 9.3.0.Final
{{MultipleCacheManagersTest.waitForClusterToForm(new String[]{})}} doesn't actually
wait for the rebalance to finish everywhere:
https://github.com/infinispan/infinispan/pull/5705/files#diff-f9f535214b9...
Most tests should work regardless of whether we wait for the rebalance to finish or not,
it's just a way to reduce the number of test failures when the command retry
doesn't work. Waiting for the rebalance to finish is only required when the test needs
to install new topologies in a specific order.
However, some recent test failures in CI show that the effectiveness of
{{MultipleCacheManagersTest.waitForClusterToForm(new String[]{})}} does make a difference.
Most likely there is a bug in the retry logic during the latter phases of a rebalance, and
the retry-specific tests do not cover all the scenarios.
{noformat}
00:23:20,668 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[])
[ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache
___defaultcache, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners =
(2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]},
pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041,
ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5,
d2200930-5793-4b74-beea-e001a2b414c1]}, availability mode = AVAILABLE
00:23:20,669 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[])
[ClusterTopologyManagerImpl] Updating cluster-wide stable topology for cache
___defaultcache, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners =
(2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]},
pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041,
ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5,
d2200930-5793-4b74-beea-e001a2b414c1]}
00:23:20,669 DEBUG
(transport-thread-ClusteredScriptingTest-NodeI-p220-t6:[Topology-___defaultcache])
[LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache:
CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners =
(2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]},
pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041,
ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5,
d2200930-5793-4b74-beea-e001a2b414c1]}
00:23:20,669 DEBUG
(transport-thread-ClusteredScriptingTest-NodeJ-p234-t6:[Topology-___defaultcache])
[LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache:
CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners =
(2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]},
pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041,
ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5,
d2200930-5793-4b74-beea-e001a2b414c1]}
00:23:35,673 ERROR (timeout-thread-ClusteredScriptingTest-NodeI-p219-t1:[])
[InvocationContextInterceptor] ISPN000136: Error executing command PutKeyValueCommand,
writing keys [/macbeth.txt0]
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at
org.infinispan.interceptors.impl.BaseStateTransferInterceptor$CancellableRetry.run(BaseStateTransferInterceptor.java:333)
[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
00:23:35,673 DEBUG (testng-ClusteredScriptingTest:[]) [DefaultCacheManager] Stopping
cache manager ISPN on ClusteredScriptingTest-NodeJ-9982
00:23:35,682 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[])
[ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache
___defaultcache, topology = CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners =
(1)[ClusteredScriptingTest-NodeI-40041: 256]}, pendingCH=null, unionCH=null,
actualMembers=[ClusteredScriptingTest-NodeI-40041],
persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5]}, availability mode = AVAILABLE
00:23:35,699 ERROR (testng-ClusteredScriptingTest:[]) [TestSuiteProgress] Test failed:
org.infinispan.scripting.ClusteredScriptingTest.testDistributedMapReduceStreamWithFlag([REPL_SYNC])
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at
org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at
org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1636)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1284)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1750)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:217)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at
org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:671)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.scripting.utils.ScriptingUtils.loadData(ScriptingUtils.java:29)
~[test-classes/:?]
at
org.infinispan.scripting.ClusteredScriptingTest$6.call(ClusteredScriptingTest.java:144)
~[test-classes/:?]
at org.infinispan.test.TestingUtil.withCacheManagers(TestingUtil.java:1522)
~[infinispan-core-9.2.0-SNAPSHOT-tests.jar:9.2.0-SNAPSHOT]
at
org.infinispan.scripting.ClusteredScriptingTest.testDistributedMapReduceStreamWithFlag(ClusteredScriptingTest.java:137)
~[test-classes/:?]
{noformat}
{noformat}
18:04:17,097 DEBUG
(transport-thread-ClusteredCacheTest-NodeB-p3689-t1:[Topology-___defaultcache])
[LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache:
CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredCacheTest-NodeA-53655:
134, ClusteredCacheTest-NodeB-27398: 122]}, pendingCH=null, unionCH=null,
actualMembers=[ClusteredCacheTest-NodeA-53655, ClusteredCacheTest-NodeB-27398],
persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362,
b623ac7a-3c77-471d-88de-e151da823c0c]}
18:04:17,097 INFO (testng-ClusteredCacheTest:[]) [TestSuiteProgress] Test starting:
org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner
18:04:17,097 DEBUG
(transport-thread-ClusteredCacheTest-NodeA-p3679-t5:[Topology-___defaultcache])
[LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache:
CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredCacheTest-NodeA-53655:
134, ClusteredCacheTest-NodeB-27398: 122]}, pendingCH=null, unionCH=null,
actualMembers=[ClusteredCacheTest-NodeA-53655, ClusteredCacheTest-NodeB-27398],
persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362,
b623ac7a-3c77-471d-88de-e151da823c0c]}
18:04:32,100 ERROR (timeout-thread-ClusteredCacheTest-NodeA-p3678-t1:[])
[InvocationContextInterceptor] ISPN000136: Error executing command PutKeyValueCommand,
writing keys [WrappedByteArray{bytes=[B0x010201054E617669..[9], hashCode=-1707624030}]
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at
org.infinispan.interceptors.impl.BaseStateTransferInterceptor$CancellableRetry.run(BaseStateTransferInterceptor.java:333)
[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
18:04:32,101 ERROR (testng-ClusteredCacheTest:[]) [TestSuiteProgress] Test failed:
org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
at
org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at
org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1636)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1284)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1750)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:217)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at
org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:671)
~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
at
org.infinispan.query.blackbox.ClusteredCacheTest.prepareTestData(ClusteredCacheTest.java:140)
~[test-classes/:?]
at
org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFrom(ClusteredCacheTest.java:334)
~[test-classes/:?]
at
org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner(ClusteredCacheTest.java:298)
~[test-classes/:?]
18:04:32,214 DEBUG (testng-ClusteredCacheTest:[]) [DefaultCacheManager] Stopping cache
manager ISPN on ClusteredCacheTest-NodeB-27398
18:04:32,479 DEBUG (remote-thread-ClusteredCacheTest-NodeA-p3677-t6:[])
[ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache
___defaultcache, topology = CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2,
currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[ClusteredCacheTest-NodeA-53655:
256]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredCacheTest-NodeA-53655],
persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362]}, availability mode = AVAILABLE
{noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)