[Red Hat JIRA] (ISPN-11625) CLI commands to interact with Operator
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11625?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11625:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> CLI commands to interact with Operator
> --------------------------------------
>
> Key: ISPN-11625
> URL: https://issues.redhat.com/browse/ISPN-11625
> Project: Infinispan
> Issue Type: Feature Request
> Components: CLI, Operator
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Priority: Major
> Fix For: 12.1.0.Final
>
>
> The Infinispan Server CLI should be extended so that it can be used to control/configure the operator and custom resource definitions.
> The CLI should act as a `kubectl` plugin, i.e. installed as `kubectl-infinispan` so that it can be invoked as `kubectl infinispan ...` or `oc infinispan ...`
> The CLI should be able to perform the following operations:
> * install the operator CRD
> * create CRs (cache service, caches, etc)
> * transparently connect to a running cluster to perform interactive operations
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-11605) Remove TopologyUpdateStableCommand
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11605?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11605:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Remove TopologyUpdateStableCommand
> ----------------------------------
>
> Key: ISPN-11605
> URL: https://issues.redhat.com/browse/ISPN-11605
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 11.0.0.Dev03
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 12.1.0.Final
>
>
> Having a separate command for stable topologies means nodes may receive a topology update that should be stable without also receiving the confirmation that the topology is stable, complicating cluster recovery.
> As an example, when the coordinator node shuts down, it first installs a new cache topology without itself for all of the running caches.
> If the number of remaining owners is {{< numOwners}}, no rebalance is needed, and the old coordinator immediately sends a stable topology update as well.
> But any (stable) topology updates from the old coordinator not yet processed by the time of the {{CacheStatusRequestCommand}} from the new coordinator will be ignored.
> If the cluster had only 2 nodes, as in {{StatefulSetRollingUpgradeTest}} and {{StatefulSetRollingUpgradeIT}}, the stable topology update is vital to keep the cache in {{AVAILABLE}} mode, otherwise it goes into {{DEGRADED}} mode and no new nodes can join.
> {noformat}
> 19:48:14,671 TRACE (jgroups-4,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: ConsistentHashUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], availabilityMode=AVAILABLE, rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,672 TRACE (jgroups-5,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: StableTopologyUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,675 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
> 19:48:14,675 DEBUG (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Updating local topology for cache testCache: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Released cache status testCache
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
> 19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Released cache status testCache
> 19:48:14,704 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [ClusterCacheStatus] Cache testCache availability changed: AVAILABLE -> DEGRADED_MODE
> 19:48:25,194 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart[nodes=2]
> org.infinispan.commons.CacheException: Initial state transfer timed out for cache testCache on Test-NodeF
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:240) ~[classes/:?]
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1049) ~[classes/:?]
> at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:513) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:692) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:631) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:516) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:497) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:490) ~[classes/:?]
> at org.infinispan.test.MultipleCacheManagersTest.getCaches(MultipleCacheManagersTest.java:322) ~[test-classes/:?]
> at org.infinispan.test.MultipleCacheManagersTest.waitForClusterToForm(MultipleCacheManagersTest.java:329) ~[test-classes/:?]
> at org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart(StatefulSetRollingUpgradeTest.java:78) ~[test-classes/:?]
> {noformat}
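The ordering in the trace above can be replayed with a small model. This is only a sketch under stated assumptions: the `Node`, `Command`, and `Kind` types are hypothetical, not Infinispan classes, and "ignore anything tagged with an older view id" stands in for the real old-coordinator check in LocalTopologyManagerImpl.

```java
// Hypothetical model of the receiving node; not Infinispan's actual classes.
final class SeparateStableCommandSketch {

    enum Kind { TOPOLOGY_UPDATE, STABLE_TOPOLOGY_UPDATE }

    static final class Command {
        final Kind kind;
        final int topologyId;
        final int viewId;

        Command(Kind kind, int topologyId, int viewId) {
            this.kind = kind;
            this.topologyId = topologyId;
            this.viewId = viewId;
        }
    }

    static final class Node {
        int viewId;
        int currentTopologyId;
        int stableTopologyId;
        int ignored;

        Node(int viewId, int topologyId) {
            this.viewId = viewId;
            this.currentTopologyId = topologyId;
            this.stableTopologyId = topologyId;
        }

        void installView(int newViewId) {
            viewId = newViewId;
        }

        // Assumption: a command tagged with an older view id was sent by the
        // old coordinator and is dropped.
        void handle(Command command) {
            if (command.viewId < viewId) {
                ignored++;
                return;
            }
            if (command.kind == Kind.TOPOLOGY_UPDATE) {
                currentTopologyId = command.topologyId;
            } else {
                stableTopologyId = command.topologyId;
            }
        }
    }

    public static void main(String[] args) {
        Node nodeE = new Node(7, 25);
        nodeE.handle(new Command(Kind.TOPOLOGY_UPDATE, 27, 7));        // applied in view 7
        nodeE.installView(8);                                          // NodeE is now coordinator
        nodeE.handle(new Command(Kind.STABLE_TOPOLOGY_UPDATE, 27, 7)); // dropped
        System.out.println("current=" + nodeE.currentTopologyId
                + " stable=" + nodeE.stableTopologyId); // prints current=27 stable=25
    }
}
```

In this model the stable topology stays at 25 while the current topology is 27, which matches the "Max stable partition topology" and "Max active partition topology" lines in the trace.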
[Red Hat JIRA] (ISPN-11634) Merge TopologyUpdateStableCommand and TopologyUpdateCommand
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11634?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11634:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Merge TopologyUpdateStableCommand and TopologyUpdateCommand
> -----------------------------------------------------------
>
> Key: ISPN-11634
> URL: https://issues.redhat.com/browse/ISPN-11634
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 11.0.0.Dev04
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 12.1.0.Final
>
>
> {code:java}
> 19:48:14,672 TRACE (testng-Test:[]) [TopologyManagementHelper] Attempting to execute command on self: StableTopologyUpdateCommand{cacheName='testCache', origin=Test-NodeD, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
> {code}
> The old coordinator (NodeD) sent both the topology update and the stable topology update, and NodeE received both. However, NodeE may ignore the stable topology update because it processes it only after installing the new cluster view, in which NodeE itself is the coordinator. The proper fix is to get rid of the StableTopologyUpdateCommand and add a stable=true/false field in CacheTopology or in TopologyUpdateCommand.
> There is still a small chance that the coordinator shuts down before sending the topology update command to any of the other nodes. It is much more likely that a separate stable topology update command is ignored, because LocalTopologyManagerImpl queues it after the regular topology update command, and the regular topology update has a lot of processing to do in StateConsumerImpl.
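A minimal sketch of the proposed fix, assuming the `stable` flag is carried on the update command itself. The `TopologyUpdate` and `LocalState` types below are hypothetical and only follow the names used in the issue text; the view-id comparison is an assumed stand-in for the real old-coordinator check.

```java
// Hypothetical sketch: class and field names follow the issue text,
// not Infinispan's real implementation.
final class StableFlagSketch {

    // One command carries both the topology and the "stable" confirmation.
    static final class TopologyUpdate {
        final int topologyId;
        final int viewId;
        final boolean stable;

        TopologyUpdate(int topologyId, int viewId, boolean stable) {
            this.topologyId = topologyId;
            this.viewId = viewId;
            this.stable = stable;
        }
    }

    // Receiver state: current topology, last stable topology, installed view.
    static final class LocalState {
        int currentTopologyId;
        int stableTopologyId;
        int viewId;

        LocalState(int topologyId, int viewId) {
            this.currentTopologyId = topologyId;
            this.stableTopologyId = topologyId;
            this.viewId = viewId;
        }

        // Assumption: updates tagged with an older view come from the old
        // coordinator and are dropped.
        boolean apply(TopologyUpdate update) {
            if (update.viewId < viewId) {
                return false;
            }
            currentTopologyId = update.topologyId;
            if (update.stable) {
                stableTopologyId = update.topologyId;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        LocalState nodeE = new LocalState(25, 7);
        nodeE.apply(new TopologyUpdate(27, 7, true)); // processed before view 8
        nodeE.viewId = 8;                             // NodeE becomes coordinator
        System.out.println("current=" + nodeE.currentTopologyId
                + " stable=" + nodeE.stableTopologyId); // prints current=27 stable=27
    }
}
```

Because the flag travels with the update, the topology and its stable confirmation are either both applied or both dropped, so the receiver cannot end up with topology 27 as current and topology 25 as stable.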