[infinispan-issues] [JBoss JIRA] (ISPN-11605) Remove TopologyUpdateStableCommand

Tristan Tarrant (Jira) issues at jboss.org
Mon May 4 02:25:03 EDT 2020


     [ https://issues.redhat.com/browse/ISPN-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tristan Tarrant updated ISPN-11605:
-----------------------------------
    Fix Version/s: 11.0.0.CR1
                       (was: 11.0.0.Dev05)


> Remove TopologyUpdateStableCommand
> ----------------------------------
>
>                 Key: ISPN-11605
>                 URL: https://issues.redhat.com/browse/ISPN-11605
>             Project: Infinispan
>          Issue Type: Task
>          Components: Core
>    Affects Versions: 11.0.0.Dev03
>            Reporter: Dan Berindei
>            Assignee: Dan Berindei
>            Priority: Major
>              Labels: testsuite_stability
>             Fix For: 11.0.0.CR1
>
>
> Having a separate command for stable topologies means nodes may receive a topology update that should be stable without also receiving the confirmation that the topology is stable, complicating cluster recovery.
> As an example, when the coordinator node shuts down, it first installs a new cache topology without itself for all of the running caches.
> If the number of remaining owners is {{< numOwners}}, no rebalance is needed, and the old coordinator immediately sends a stable topology update as well.
> But any (stable) topology updates from the old coordinator not yet processed by the time of the {{CacheStatusRequestCommand}} from the new coordinator will be ignored.
> If the cluster had only 2 nodes, as in {{StatefulSetRollingUpgradeTest}} and {{StatefulSetRollingUpgradeIT}}, the stable topology update is vital to keep the cache in {{AVAILABLE}} mode, otherwise it goes into {{DEGRADED}} mode and no new nodes can join.
> {noformat}
> 19:48:14,671 TRACE (jgroups-4,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: ConsistentHashUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], availabilityMode=AVAILABLE, rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,672 TRACE (jgroups-5,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: StableTopologyUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,675 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
> 19:48:14,675 DEBUG (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Updating local topology for cache testCache: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,686 INFO  (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Released cache status testCache
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
> 19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Released cache status testCache
> 19:48:14,704 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [ClusterCacheStatus] Cache testCache availability changed: AVAILABLE -> DEGRADED_MODE
> 19:48:25,194 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart[nodes=2]
> org.infinispan.commons.CacheException: Initial state transfer timed out for cache testCache on Test-NodeF
> 	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:240) ~[classes/:?]
> 	at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1049) ~[classes/:?]
> 	at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:513) ~[classes/:?]
> 	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:692) ~[classes/:?]
> 	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:631) ~[classes/:?]
> 	at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:516) ~[classes/:?]
> 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:497) ~[classes/:?]
> 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:490) ~[classes/:?]
> 	at org.infinispan.test.MultipleCacheManagersTest.getCaches(MultipleCacheManagersTest.java:322) ~[test-classes/:?]
> 	at org.infinispan.test.MultipleCacheManagersTest.waitForClusterToForm(MultipleCacheManagersTest.java:329) ~[test-classes/:?]
> 	at org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart(StatefulSetRollingUpgradeTest.java:78) ~[test-classes/:?]
> {noformat}



--
This message was sent by Atlassian Jira
(v7.13.8#713008)


More information about the infinispan-issues mailing list