Tristan Tarrant updated ISPN-11605:
-----------------------------------
Fix Version/s: 11.0.0.Final (was: 11.0.0.CR1)
Remove TopologyUpdateStableCommand
----------------------------------
Key: ISPN-11605
URL: https://issues.redhat.com/browse/ISPN-11605
Project: Infinispan
Issue Type: Task
Components: Core
Affects Versions: 11.0.0.Dev03
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Major
Labels: testsuite_stability
Fix For: 11.0.0.Final
Having a separate command for stable topologies means nodes may receive a topology update
that should be stable without also receiving the confirmation that the topology is stable,
complicating cluster recovery.
As an example, when the coordinator node shuts down, it first installs a new cache
topology without itself for all of the running caches.
If the number of remaining owners is less than {{numOwners}}, no rebalance is needed, and the
old coordinator immediately sends a stable topology update as well.
However, any topology updates (stable or not) from the old coordinator that have not been
processed by the time the {{CacheStatusRequestCommand}} from the new coordinator arrives are ignored.
If the cluster had only 2 nodes, as in {{StatefulSetRollingUpgradeTest}} and
{{StatefulSetRollingUpgradeIT}}, the stable topology update is vital to keep the cache in
{{AVAILABLE}} mode; without it, the cache goes into {{DEGRADED_MODE}} and no new nodes can join.
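The race described above can be sketched with a small model. This is only an illustration: the {{Node}} class and its methods are hypothetical stand-ins, not Infinispan's {{LocalTopologyManagerImpl}} or its command classes. The topology ids (25 and 27) and node names are taken from the log in this issue. The "combined" variant assumes that the stable flag travels inside the topology update itself, which is one possible way to remove the separate command.

```java
public class StableTopologyRaceSketch {

    /** Hypothetical node-side state; not Infinispan's real topology manager. */
    static final class Node {
        private String coordinator;
        private int currentTopologyId;
        private int stableTopologyId;

        Node(String coordinator, int topologyId) {
            this.coordinator = coordinator;
            this.currentTopologyId = topologyId;
            this.stableTopologyId = topologyId;
        }

        /** Models the status request from a new coordinator: older senders are ignored afterwards. */
        void statusRequestedBy(String newCoordinator) {
            this.coordinator = newCoordinator;
        }

        private boolean accepts(String sender) {
            return sender.equals(coordinator);
        }

        /** Two-command protocol, first message: install a topology. */
        void onTopologyUpdate(String sender, int topologyId) {
            if (accepts(sender)) {
                currentTopologyId = topologyId;
            }
        }

        /** Two-command protocol, second message: mark a topology as stable. */
        void onStableUpdate(String sender, int topologyId) {
            if (accepts(sender)) {
                stableTopologyId = topologyId;
            }
        }

        /** Assumed single-command alternative: the update carries its own stable flag. */
        void onCombinedUpdate(String sender, int topologyId, boolean stable) {
            if (!accepts(sender)) {
                return;
            }
            currentTopologyId = topologyId;
            if (stable) {
                stableTopologyId = topologyId;
            }
        }

        int currentTopologyId() { return currentTopologyId; }
        int stableTopologyId() { return stableTopologyId; }
    }

    /** Separate commands: the status request lands between the two messages. */
    static Node separateCommands() {
        Node nodeE = new Node("Test-NodeD", 25);
        nodeE.onTopologyUpdate("Test-NodeD", 27);
        nodeE.statusRequestedBy("Test-NodeE");
        nodeE.onStableUpdate("Test-NodeD", 27); // dropped: sender is the old coordinator
        return nodeE;
    }

    /** Combined command: current and stable topology can only change together. */
    static Node combinedCommand() {
        Node nodeE = new Node("Test-NodeD", 25);
        nodeE.onCombinedUpdate("Test-NodeD", 27, true);
        nodeE.statusRequestedBy("Test-NodeE");
        return nodeE;
    }

    public static void main(String[] args) {
        Node separate = separateCommands();
        System.out.println("separate: current=" + separate.currentTopologyId()
                + ", stable=" + separate.stableTopologyId());
        Node combined = combinedCommand();
        System.out.println("combined: current=" + combined.currentTopologyId()
                + ", stable=" + combined.stableTopologyId());
    }
}
```

With separate commands the node ends up with current topology 27 but stable topology 25, which matches the "Max stable partition topology" and "Max active partition topology" entries in the log. In the combined variant the two ids cannot diverge, whether the message is processed or ignored.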
{noformat}
19:48:14,671 TRACE (jgroups-4,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received
command from Test-NodeD: ConsistentHashUpdateCommand{cacheName='testCache',
origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]},
pendingCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE],
persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], availabilityMode=AVAILABLE,
rebalanceId=5, topologyId=27, viewId=7}
19:48:14,672 TRACE (jgroups-5,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received
command from Test-NodeD: StableTopologyUpdateCommand{cacheName='testCache',
origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]},
pendingCH=null, actualMembers=[Test-NodeE],
persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27,
viewId=7}
19:48:14,675 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[])
[LocalTopologyManagerImpl] Acquired cache status testCache
19:48:14,675 DEBUG (non-blocking-thread-Test-NodeE-p9541-t5:[])
[LocalTopologyManagerImpl] Updating local topology for cache testCache:
CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5,
currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null,
unionCH=null, actualMembers=[Test-NodeE],
persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster
view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[])
[LocalTopologyManagerImpl] Released cache status testCache
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8])
[PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25,
phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners =
(2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null,
actualMembers=[Test-NodeD, Test-NodeE],
persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287,
72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8])
[PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27,
phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners =
(1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE],
persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[])
[LocalTopologyManagerImpl] Acquired cache status testCache
19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[])
[LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator
Test-NodeD
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[])
[LocalTopologyManagerImpl] Released cache status testCache
19:48:14,704 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8])
[ClusterCacheStatus] Cache testCache availability changed: AVAILABLE -> DEGRADED_MODE
19:48:25,194 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed:
org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart[nodes=2]
org.infinispan.commons.CacheException: Initial state transfer timed out for cache
testCache on Test-NodeF
    at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:240) ~[classes/:?]
    at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1049) ~[classes/:?]
    at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:513) ~[classes/:?]
    at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:692) ~[classes/:?]
    at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:631) ~[classes/:?]
    at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:516) ~[classes/:?]
    at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:497) ~[classes/:?]
    at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:490) ~[classes/:?]
    at org.infinispan.test.MultipleCacheManagersTest.getCaches(MultipleCacheManagersTest.java:322) ~[test-classes/:?]
    at org.infinispan.test.MultipleCacheManagersTest.waitForClusterToForm(MultipleCacheManagersTest.java:329) ~[test-classes/:?]
    at org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart(StatefulSetRollingUpgradeTest.java:78) ~[test-classes/:?]
{noformat}
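The trace shows why the stale stable topology matters: the merge compares the members of the stable topology (id=25, {{Test-NodeD}} and {{Test-NodeE}}) with the members that are actually present ({{Test-NodeE}} only). The sketch below approximates that decision with a strict-majority rule. The rule and the {{afterMerge}} helper are assumptions made for illustration, not the actual logic of {{PreferConsistencyStrategy}}.

```java
import java.util.Arrays;
import java.util.List;

public class MergeAvailabilitySketch {

    enum Mode { AVAILABLE, DEGRADED_MODE }

    /**
     * Simplified rule (an assumption, not Infinispan's implementation):
     * stay AVAILABLE only if strictly more than half of the stable
     * topology's members are still present.
     */
    static Mode afterMerge(List<String> stableMembers, List<String> actualMembers) {
        long surviving = stableMembers.stream().filter(actualMembers::contains).count();
        return surviving * 2 > stableMembers.size() ? Mode.AVAILABLE : Mode.DEGRADED_MODE;
    }

    public static void main(String[] args) {
        List<String> actual = Arrays.asList("Test-NodeE");

        // Stable update ignored: the stable topology still lists both nodes (id=25 in the log).
        System.out.println(afterMerge(Arrays.asList("Test-NodeD", "Test-NodeE"), actual));

        // Stable update applied: the stable topology lists only the survivor (id=27 in the log).
        System.out.println(afterMerge(Arrays.asList("Test-NodeE"), actual));
    }
}
```

Under this simplified rule, one surviving member out of two stable members is not a majority, so the cache is degraded. Had the stable update for topology 27 been applied, the single remaining node would be the entire stable membership and the cache would stay available.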