[infinispan-issues] [JBoss JIRA] (ISPN-11634) Merge StableTopologyUpdateCommand and TopologyUpdateCommand

Tue Apr 14 08:54:32 EDT 2020

Ryan Emerson created ISPN-11634:
-----------------------------------

             Summary: Merge StableTopologyUpdateCommand and TopologyUpdateCommand
                 Key: ISPN-11634
                 URL: https://issues.redhat.com/browse/ISPN-11634
             Project: Infinispan
          Issue Type: Enhancement
          Components: Core
    Affects Versions: 11.0.0.Dev04
            Reporter: Ryan Emerson
            Assignee: Ryan Emerson
             Fix For: 11.0.0.Dev05

{code:java}
19:48:14,672 TRACE (testng-Test:[]) [TopologyManagementHelper] Attempting to execute command on self: StableTopologyUpdateCommand{cacheName='testCache', origin=Test-NodeD, currentCH=DefaultConsistentHash

{ns=256, owners = (1)[Test-NodeE: 256+0]}
, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash

{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}
, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash

{ns=256, owners = (1)[Test-NodeE: 256+0]}
, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}

19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
{code}

The old coordinator (NodeD) sent the topology update and the stable topology update, and NodeE received both. However NodeE might ignores the stable topology update because it was processed after it got the new cluster view, with itself as the coordinator. The proper fix is to get rid of the StableTopologyUpdateCommand and add a stable=true/false field in CacheTopology or in TopologyUpdateCommand.

There's still a small chance that the coordinator will shut down before sending the topology update command to any of the other nodes, but it's a small chance. It's much more likely that a separate stable topology update command will be ignored because it is queued by LocalTopologyManagerImpl after the regular topology update command, and the regular topology update has a lot of processing to do in StateConsumerImpl.

--
This message was sent by Atlassian Jira
(v7.13.8#713008)