Tristan Tarrant updated ISPN-7749:
----------------------------------
Fix Version/s: 9.2.0.Final
(was: 9.1.0.Final)
A node leaving during rebalance causes extra segment transfers
--------------------------------------------------------------
Key: ISPN-7749
URL: https://issues.jboss.org/browse/ISPN-7749
Project: Infinispan
Issue Type: Bug
Components: Core, State Transfer
Affects Versions: 9.0.0.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.2.0.Final
If a node leaves during a rebalance, the only change is that other nodes will no longer
request segments from that node. Otherwise the rebalance proceeds as usual, and at the end
a node may delete its copy of a segment even if that leaves fewer than {{numOwners}} copies
in the cluster.
When the leaver is the joiner, a 2nd rebalance is very likely to return to the initial
consistent hash (CH), but only after transferring some segments twice:
{noformat}
Rebalance starts: current_owners(s) = AB, pending_owners(s) = AC
C leaves: current_owners(s) = AB, pending_owners(s) = A
Rebalance finishes: current_owners(s) = A, pending_owners(s) = A
2nd rebalance starts: current_owners(s) = A, pending_owners(s) = AB
{noformat}
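The trace above can be replayed with a small model. This is only a sketch: the class {{LeaveDuringRebalance}} and the helper {{withoutLeaver}} are hypothetical names made up for illustration, owners are plain strings, and none of this is Infinispan's actual ConsistentHash API. It assumes {{numOwners}} = 2, as the traces imply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the owner lists of a single segment s.
// This is NOT Infinispan code; it only replays the trace above.
final class LeaveDuringRebalance {
    // Behavior described in this issue: the leaver is simply dropped
    // from an owner list, and nothing else changes.
    static List<String> withoutLeaver(List<String> owners, String leaver) {
        List<String> result = new ArrayList<>(owners);
        result.remove(leaver);
        return result;
    }

    public static void main(String[] args) {
        int numOwners = 2; // assumed value
        List<String> current = List.of("A", "B");
        List<String> pending = List.of("A", "C");

        // C (the joiner) leaves while the rebalance is in progress.
        current = withoutLeaver(current, "C");
        pending = withoutLeaver(pending, "C");

        // When the rebalance finishes, the pending owners become the current owners.
        current = pending;

        System.out.println("owners after rebalance: " + current
                + ", numOwners = " + numOwners);
    }
}
```

In this model B drops out of the owner list when the rebalance finishes, so the 2nd rebalance has to send segment s back to B.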
Even if the leaver is one of the old owners and the 2 segment transfers are necessary,
the cluster stays too long with fewer than {{numOwners}} copies of the segment:
{noformat}
Rebalance starts: current_owners(s) = AB, pending_owners(s) = AC
A leaves: current_owners(s) = B, pending_owners(s) = C
C transfers segment s from B
Rebalance finishes: current_owners(s) = C, pending_owners(s) = C
2nd rebalance starts: current_owners(s) = C, pending_owners(s) = DC
{noformat}
We can fix this by making the pending CH the union of the current and pending CH whenever a
node leaves, and only removing extra segment copies after the 2nd rebalance.
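A minimal sketch of the proposed union, applied to the two scenarios above. The class {{UnionOnLeave}} and the method {{pendingAfterLeave}} are hypothetical names made up for illustration, and owners are modeled as plain strings. The real fix would operate on Infinispan's consistent hash objects, which are not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed fix for a single segment s.
// This is NOT Infinispan code.
final class UnionOnLeave {
    // Pending owners after a leave: the union of the current and pending
    // owners, minus the node that left. Insertion order is preserved.
    static List<String> pendingAfterLeave(List<String> current,
                                          List<String> pending,
                                          String leaver) {
        Set<String> union = new LinkedHashSet<>(current);
        union.addAll(pending);
        union.remove(leaver);
        return new ArrayList<>(union);
    }

    public static void main(String[] args) {
        List<String> current = List.of("A", "B");
        List<String> pending = List.of("A", "C");

        // Joiner C leaves: B stays an owner, so s is not transferred twice.
        System.out.println(pendingAfterLeave(current, pending, "C")); // prints [A, B]

        // Old owner A leaves: B keeps its copy while C fetches s from B.
        System.out.println(pendingAfterLeave(current, pending, "A")); // prints [B, C]
    }
}
```

In general the union can hold more than {{numOwners}} nodes for a segment, which is why the extra copies would only be removed after the 2nd rebalance.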