Dan Berindei updated ISPN-8240:
-------------------------------
Priority: Critical (was: Minor)
Coordinator sends REBALANCE_START command when there is already a
rebalance in progress
---------------------------------------------------------------------------------------
Key: ISPN-8240
URL:
https://issues.jboss.org/browse/ISPN-8240
Project: Infinispan
Issue Type: Bug
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Normally the {{REBALANCE_START}} command should only be sent at the start of a rebalance,
and any topology updates sent before all the nodes confirm the rebalance phase should be
sent as {{CH_UPDATE}}.
Since the change to 4 phases, this is no longer true:
{{ClusterCacheStatus.updateTopologyMembers}} first clears the
{{RebalanceConfirmationCollector}}, then it broadcasts a {{CH_UPDATE}}. Then
{{queueRebalance}} immediately creates a new {{RCC}} and broadcasts a {{REBALANCE_START}},
instead of waiting for the current rebalance to finish.
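The sequence above can be sketched with a toy model. This is not the actual {{ClusterCacheStatus}} code: the class, its fields, and in particular the {{collector == null}} check in {{queueRebalance}} are assumptions made only to show how clearing the collector could lead to a second {{REBALANCE_START}}.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the coordinator's per-cache state. All names are hypothetical;
// this is not Infinispan's ClusterCacheStatus.
final class RebalanceSequenceSketch {
    // Stands in for the RebalanceConfirmationCollector; non-null while confirmations are pending.
    private Object collector;
    final List<String> broadcasts = new ArrayList<>();

    void startRebalance() {
        collector = new Object();
        broadcasts.add("REBALANCE_START");
    }

    // A member leaves mid-rebalance: the collector is cleared, then CH_UPDATE goes out.
    void updateTopologyMembers() {
        collector = null;
        broadcasts.add("CH_UPDATE");
    }

    // Assumed guard: with the collector gone, nothing signals that the
    // earlier rebalance is still running, so a new one starts right away.
    void queueRebalance() {
        if (collector == null) {
            startRebalance();
        }
    }

    static List<String> simulate() {
        RebalanceSequenceSketch status = new RebalanceSequenceSketch();
        status.startRebalance();          // first rebalance begins
        status.updateTopologyMembers();   // a node leaves before all confirmations arrive
        status.queueRebalance();          // second REBALANCE_START is broadcast
        return status.broadcasts;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this model the nodes see {{REBALANCE_START}}, {{CH_UPDATE}}, {{REBALANCE_START}}, which is the ordering described in this issue.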
I propose we remove {{REBALANCE_START}}, as it was just a crude version of
{{CacheTopology.Phase}}. We should also remove the {{isRebalance}} parameter from
{{StateConsumerImpl.onTopologyUpdate()}}.
I'm still not sure if rebalancing the pending CH immediately is ok. On the one hand,
I would like the rebalance to finish with {{updateMembers(union(currentCH, pendingCH))}}
as the new pending CH, so that segments that were already transferred keep an extra copy.
On the other hand, that would only help for segments that have at least one owner in the
current CH: if the current CH has 0 owners and {{updateMembers}} allocates new ones, those
new owners won't request data from the pending CH owners anyway. Fixing that case
would require the coordinator to fetch the transfer status from all the nodes before
removing a node from the topology. But if the coordinator knew exactly which segments were
transferred, it could finish the rebalance immediately and start a new one -- so it would
be more similar to the current approach.
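A rough illustration of the trade-off for a single segment, as a minimal sketch. It assumes owners are plain lists of node names and that the toy {{updateMembers}} only drops nodes that left and never allocates new owners; the helper names are hypothetical and the real consistent hash classes are more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-segment model; not the Infinispan ConsistentHash API.
final class UnionOwnersSketch {
    // Owners of one segment in union(currentCH, pendingCH): current owners first, then pending ones.
    static List<String> union(List<String> currentOwners, List<String> pendingOwners) {
        Set<String> merged = new LinkedHashSet<>(currentOwners);
        merged.addAll(pendingOwners);
        return new ArrayList<>(merged);
    }

    // Toy updateMembers: keeps only owners that are still members (no new allocation).
    static List<String> updateMembers(List<String> owners, Set<String> members) {
        List<String> kept = new ArrayList<>(owners);
        kept.retainAll(members);
        return kept;
    }
}
```

For example, with current owners A and B, pending owners B and C, and A leaving the cluster, {{updateMembers}} on the current owners alone keeps only B, while {{updateMembers}} on the union keeps B and C, so the copy already transferred to C is not discarded. If the only current owner was A, the current-CH result is empty, which is the 0-owners case discussed above.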
Note: the {{SyncConsistentHashFactory}} allocation is not 100% stable, even when nodes
are not added, so A ∈ owners(segment) in topology ABCD does not guarantee that A ∈
owners(segment) in topology ABC. But it should be good enough to keep A an owner in 90% of
the cases.