[infinispan-issues] [JBoss JIRA] (ISPN-8240) Coordinator sends REBALANCE_START command when there is already a rebalance in progress
Dan Berindei (JIRA)
issues at jboss.org
Tue Aug 29 06:57:01 EDT 2017
[ https://issues.jboss.org/browse/ISPN-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Berindei updated ISPN-8240:
-------------------------------
Priority: Critical (was: Minor)
> Coordinator sends REBALANCE_START command when there is already a rebalance in progress
> ---------------------------------------------------------------------------------------
>
> Key: ISPN-8240
> URL: https://issues.jboss.org/browse/ISPN-8240
> Project: Infinispan
> Issue Type: Bug
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
>
> Normally the {{REBALANCE_START}} command should only be sent at the start of a rebalance, and any topology updates sent before all the nodes confirm the rebalance phase should use {{CH_UPDATE}} instead.
> Since the change to 4 phases, this is no longer true: {{ClusterCacheStatus.updateTopologyMembers}} first clears the {{RebalanceConfirmationCollector}}, then it broadcasts a {{CH_UPDATE}}. Then {{queueRebalance}} immediately creates a new {{RebalanceConfirmationCollector}} and broadcasts a {{REBALANCE_START}}, instead of waiting for the current rebalance to finish.
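
The sequence described above can be reproduced in a toy model. This is only a sketch under stated assumptions: the class, its fields and its methods are hypothetical stand-ins and not Infinispan code, and the real coordinator logic is considerably more involved. It only shows how clearing the collector makes the coordinator believe that no rebalance is running.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names are Infinispan APIs.
final class ToyCoordinator {
    enum MessageKind { REBALANCE_START, CH_UPDATE }

    final List<MessageKind> sent = new ArrayList<>();
    // Stands in for the confirmation collector; null means "no collector".
    private Set<String> awaitingConfirmation;
    // True from a REBALANCE_START until every member has confirmed it.
    private boolean nodesStillRebalancing;
    int overlappingStarts;

    // Starts a rebalance if the coordinator believes none is running.
    void queueRebalance(Set<String> members) {
        if (awaitingConfirmation != null) {
            return; // a rebalance is tracked, so wait for it to finish
        }
        awaitingConfirmation = new HashSet<>(members);
        if (nodesStillRebalancing) {
            overlappingStarts++; // the situation the issue describes
        }
        nodesStillRebalancing = true;
        sent.add(MessageKind.REBALANCE_START);
    }

    // A node confirms; the rebalance ends once all members have confirmed.
    void confirm(String node) {
        if (awaitingConfirmation != null && awaitingConfirmation.remove(node)
                && awaitingConfirmation.isEmpty()) {
            awaitingConfirmation = null;
            nodesStillRebalancing = false;
        }
    }

    // Same order as in the description: clear the collector, send CH_UPDATE, queue a rebalance.
    void updateMembers(Set<String> members) {
        awaitingConfirmation = null;
        sent.add(MessageKind.CH_UPDATE);
        queueRebalance(members);
    }

    static ToyCoordinator demo() {
        ToyCoordinator c = new ToyCoordinator();
        c.queueRebalance(Set.of("A", "B", "C", "D"));
        c.confirm("A");                         // B, C and D have not confirmed yet
        c.updateMembers(Set.of("A", "B", "C")); // D leaves mid-rebalance
        return c;
    }

    public static void main(String[] args) {
        ToyCoordinator c = demo();
        System.out.println(c.sent + " overlappingStarts=" + c.overlappingStarts);
    }
}
```

Running `main` prints `[REBALANCE_START, CH_UPDATE, REBALANCE_START] overlappingStarts=1`: the second `REBALANCE_START` goes out while B and C are still working on the first rebalance.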
> I propose we remove {{REBALANCE_START}}, as it was just a crude version of {{CacheTopology.Phase}}. We should also remove the {{isRebalance}} parameter from {{StateConsumerImpl.onTopologyUpdate()}}.
> I'm still not sure if rebalancing the pending CH immediately is OK. On the one hand, I would like the rebalance to finish with {{updateMembers(union(currentCH, pendingCH))}} as the new pending CH, so that segments that were already transferred keep an extra copy. On the other hand, that would only help for segments that have at least one owner in the current CH: if the current CH has 0 owners and {{updateMembers}} allocates new ones, those new owners won't request data from the pending CH owners anyway. Fixing that case would require the coordinator to fetch the transfer status from all the nodes before removing a node from the topology. But if the coordinator knew exactly which segments were transferred, it could finish the rebalance immediately and start a new one -- so it would be more similar to the current approach.
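
To make the two cases concrete, here is a minimal sketch that models a consistent hash as a plain map from segment number to owner list. Everything in it is an assumption for illustration: `ToyOwners`, `union` and `updateMembers` are hypothetical helpers, not the Infinispan implementations, and this `updateMembers` only drops owners that left, whereas the real one would also allocate replacement owners.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: a "CH" here is just segment -> ordered owner list.
final class ToyOwners {
    // Per-segment union: current owners first, then pending owners not already listed.
    static Map<Integer, List<String>> union(Map<Integer, List<String>> currentCH,
                                            Map<Integer, List<String>> pendingCH) {
        Set<Integer> segments = new TreeSet<>(currentCH.keySet());
        segments.addAll(pendingCH.keySet());
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Integer segment : segments) {
            Set<String> owners = new LinkedHashSet<>(currentCH.getOrDefault(segment, List.of()));
            owners.addAll(pendingCH.getOrDefault(segment, List.of()));
            result.put(segment, new ArrayList<>(owners));
        }
        return result;
    }

    // Drops owners that are no longer members. Simplification: no new owners are allocated.
    static Map<Integer, List<String>> updateMembers(Map<Integer, List<String>> ch,
                                                    Set<String> members) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> entry : ch.entrySet()) {
            List<String> kept = new ArrayList<>(entry.getValue());
            kept.retainAll(members);
            result.put(entry.getKey(), kept);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> currentCH = Map.of(0, List.of("A", "B"), 1, List.of("D"));
        Map<Integer, List<String>> pendingCH = Map.of(0, List.of("B", "C"), 1, List.of("C"));
        Set<String> members = Set.of("A", "B", "C"); // D has left the cluster

        System.out.println(updateMembers(union(currentCH, pendingCH), members));
        System.out.println(updateMembers(currentCH, members));
    }
}
```

The first line printed is `{0=[A, B, C], 1=[C]}`: in segment 0, C stays an owner and keeps whatever it already received. The second line is `{0=[A, B], 1=[]}`: segment 1 has no current owner left once D is gone, which is the case where newly allocated owners would not fetch data from the pending owners.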
> Note: the {{SyncConsistentHashFactory}} allocation is not 100% stable, even when nodes are not added, so A ∈ owners(segment) in topology ABCD does not guarantee that A ∈ owners(segment) in topology ABC. But it should be good enough to keep A an owner in 90% of the cases.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)