Galder Zamarreño updated ISPN-4743:
-----------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
Rebalance can hang after the coordinator and another node leave
---------------------------------------------------------------
Key: ISPN-4743
URL: https://issues.jboss.org/browse/ISPN-4743
Project: Infinispan
Issue Type: Bug
Components: Core, State Transfer
Affects Versions: 7.0.0.Beta2
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Labels: testsuite_stability
Fix For: 7.0.0.CR1
This caused a failure in {{ClusterTopologyManagerTest.testAbruptLeaveAfterGetStatus}}.
When the coordinator changes, the new coordinator first sends a
{{CacheTopologyControlCommand(type=CH_UPDATE)}} to reset any ongoing rebalance, then a
{{CacheTopologyControlCommand(type=REBALANCE_START)}} to start a new rebalance with the
remaining members. If another node leaves afterwards, the coordinator sends yet another
{{CacheTopologyControlCommand(type=CH_UPDATE)}} to remove the leaver from the CHs.
If one node (in this case, the coordinator itself) processes the last {{CH_UPDATE}} before
the other two commands, it will fail to confirm the rebalance, and the cache will stay in
the "rebalancing" state until another node joins or leaves.
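
The ordering problem can be illustrated with a toy model. This is only a sketch, not the Infinispan implementation: the {{Command}} and {{Node}} classes, the topology ids 1 to 3, and the rule that a node drops any command whose topology id is not newer than its current one are all assumptions made for illustration.

```java
import java.util.List;

public class RebalanceRaceSketch {
    enum Type { CH_UPDATE, REBALANCE_START }

    /** Hypothetical stand-in for CacheTopologyControlCommand: a type plus a topology id. */
    static final class Command {
        final Type type;
        final int topologyId;

        Command(Type type, int topologyId) {
            this.type = type;
            this.topologyId = topologyId;
        }
    }

    /** Toy node that applies commands in arrival order. */
    static final class Node {
        int currentTopologyId = 0;
        boolean rebalanceConfirmed = false;

        void handle(Command c) {
            // Assumption: commands carrying an older (or equal) topology id are ignored.
            if (c.topologyId <= currentTopologyId) {
                return;
            }
            currentTopologyId = c.topologyId;
            if (c.type == Type.REBALANCE_START) {
                // In a real node, state transfer would run here before confirming.
                rebalanceConfirmed = true;
            }
        }
    }

    /** Returns whether a node that sees the commands in this order confirms the rebalance. */
    static boolean confirmsRebalance(List<Command> arrivalOrder) {
        Node node = new Node();
        for (Command c : arrivalOrder) {
            node.handle(c);
        }
        return node.rebalanceConfirmed;
    }

    public static void main(String[] args) {
        Command reset = new Command(Type.CH_UPDATE, 1);        // reset the ongoing rebalance
        Command start = new Command(Type.REBALANCE_START, 2);  // rebalance with remaining members
        Command removeLeaver = new Command(Type.CH_UPDATE, 3); // remove the second leaver

        System.out.println("in order:  " + confirmsRebalance(List.of(reset, start, removeLeaver)));
        System.out.println("reordered: " + confirmsRebalance(List.of(removeLeaver, reset, start)));
    }
}
```

Under these assumptions, the in-order delivery confirms the rebalance, while the reordered delivery never does, because the {{REBALANCE_START}} arrives after the node has already moved to a newer topology.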