Tristan Tarrant updated ISPN-11634:
-----------------------------------
Fix Version/s: 11.0.0.Final
(was: 11.0.0.CR1)
Merge TopologyUpdateStableCommand and TopologyUpdateCommand
-----------------------------------------------------------
Key: ISPN-11634
URL: https://issues.redhat.com/browse/ISPN-11634
Project: Infinispan
Issue Type: Enhancement
Components: Core
Affects Versions: 11.0.0.Dev04
Reporter: Ryan Emerson
Assignee: Ryan Emerson
Priority: Major
Fix For: 11.0.0.Final
{code:java}
19:48:14,672 TRACE (testng-Test:[]) [TopologyManagementHelper] Attempting to execute
command on self: StableTopologyUpdateCommand{cacheName='testCache',
origin=Test-NodeD, currentCH=DefaultConsistentHash
{ns=256, owners = (1)[Test-NodeE: 256+0]}
, pendingCH=null, actualMembers=[Test-NodeE],
persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27,
viewId=7}
19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster
view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8])
[PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25,
phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash
{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}
, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE],
persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287,
72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8])
[PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27,
phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash
{ns=256, owners = (1)[Test-NodeE: 256+0]}
, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE],
persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[])
[LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator
Test-NodeD
{code}
The old coordinator (NodeD) sent both the topology update and the stable topology update,
and NodeE received both. However, NodeE ignored the stable topology update because it was
processed after NodeE got the new cluster view, with itself as the coordinator. The proper
fix is to get rid of the StableTopologyUpdateCommand and add a stable=true/false field to
CacheTopology or to TopologyUpdateCommand.
There is still a small chance that the coordinator will shut down before sending the
topology update command to any of the other nodes. However, it is much more likely that a
separate stable topology update command will be ignored, because LocalTopologyManagerImpl
queues it after the regular topology update command, and the regular topology update has a
lot of processing to do in StateConsumerImpl.
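To make the difference concrete, here is a minimal, self-contained sketch of the race described above. It is not Infinispan code: the TopologyUpdate and Receiver classes, the stable field, and the "ignore commands from an older view" check are simplified, hypothetical stand-ins for TopologyUpdateCommand and LocalTopologyManagerImpl. The ids (stable topology 25, new topology 27, views 7 and 8) are taken from the log excerpt.
{code:java}
class MergedTopologyUpdateSketch {

    // Hypothetical merged command: the stable flag travels with the topology update.
    static final class TopologyUpdate {
        final int topologyId;
        final int viewId;      // view in which the sender was coordinator
        final boolean stable;  // assumed new field, as proposed in this issue

        TopologyUpdate(int topologyId, int viewId, boolean stable) {
            this.topologyId = topologyId;
            this.viewId = viewId;
            this.stable = stable;
        }
    }

    // Simplified stand-in for the receiving node, not the real LocalTopologyManagerImpl.
    static final class Receiver {
        private int viewId;
        private int currentTopologyId;
        private int stableTopologyId;

        Receiver(int viewId, int topologyId) {
            this.viewId = viewId;
            this.currentTopologyId = topologyId;
            this.stableTopologyId = topologyId;
        }

        void installView(int newViewId) {
            viewId = newViewId;
        }

        // Returns false when the command was sent in an older view and is ignored.
        boolean handle(TopologyUpdate update) {
            if (update.viewId < viewId) {
                return false;
            }
            currentTopologyId = update.topologyId;
            if (update.stable) {
                stableTopologyId = update.topologyId;
            }
            return true;
        }

        int currentTopologyId() { return currentTopologyId; }
        int stableTopologyId() { return stableTopologyId; }
    }

    public static void main(String[] args) {
        // Two separate commands: the view change lands between them,
        // so only the regular update is applied.
        Receiver split = new Receiver(7, 25);
        split.handle(new TopologyUpdate(27, 7, false)); // regular update, applied
        split.installView(8);                           // NodeE becomes coordinator
        split.handle(new TopologyUpdate(27, 7, true));  // stable update, ignored
        System.out.println("split:  current=" + split.currentTopologyId()
                + " stable=" + split.stableTopologyId());

        // One merged command: it is applied or ignored as a whole.
        Receiver merged = new Receiver(7, 25);
        merged.handle(new TopologyUpdate(27, 7, true));
        merged.installView(8);
        System.out.println("merged: current=" + merged.currentTopologyId()
                + " stable=" + merged.stableTopologyId());
    }
}
{code}
In this model the split case ends with current topology 27 but stable topology still 25, while the merged case ends with both at 27. If the merged command arrives after the view change it is dropped entirely, so the current and stable topologies never diverge on the receiver.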