[infinispan-dev] Rebalancing flag as part of the CacheStatusResponse

Dan Berindei dan.berindei at gmail.com
Fri Nov 7 10:24:20 EST 2014


Hi Erik

This makes a lot of sense. In fact, I was really close to implementing it
while I was replacing RebalancePolicy with AvailabilityStrategy.
Unfortunately I hit some problems and I had to postpone it (mostly because
I was also trying to make the flag per-cache).

The only question is what happens after a merge, if one partition has
rebalancing enabled, and the other has rebalancing disabled.

I think I would prefer to keep it disabled if at least one partition had it
disabled. E.g. if you start a new node and it doesn't join properly, you
wouldn't want it to trigger a rebalance when it finally finds the cluster;
the rebalance should only happen after you re-enable rebalancing yourself.
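
In code, the merge resolution could be as simple as AND-ing the flag from
each partition's status. A rough sketch, with made-up names (PartitionStatus
and friends -- the per-cache flag doesn't exist yet), just to show the rule:

    import java.util.Arrays;
    import java.util.List;

    public class RebalancingMergeSketch {

       // Made-up stand-in for whatever per-partition state the coordinator
       // collects when partitions merge.
       static class PartitionStatus {
          final boolean rebalancingEnabled;
          PartitionStatus(boolean rebalancingEnabled) {
             this.rebalancingEnabled = rebalancingEnabled;
          }
       }

       // Rebalancing stays enabled after the merge only if *every*
       // partition had it enabled before the merge.
       static boolean resolveAfterMerge(List<PartitionStatus> partitions) {
          boolean enabled = true;
          for (PartitionStatus p : partitions) {
             enabled &= p.rebalancingEnabled;
          }
          return enabled;
       }

       public static void main(String[] args) {
          List<PartitionStatus> merged = Arrays.asList(
                new PartitionStatus(true),    // partition with rebalancing on
                new PartitionStatus(false));  // partition where it was disabled
          System.out.println(resolveAfterMerge(merged));  // prints false
       }
    }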

Cheers
Dan


On Tue, Oct 28, 2014 at 12:00 AM, Erik Salter <an1310 at hotmail.com> wrote:

> Hi all,
>
> This topic came up in a separate discussion with Mircea, and he suggested
> I post something on the mailing list for a wider audience.
>
> I have a business case where I need the value of the rebalancing flag read
> by the joining nodes.  Let's say we have a TACH where we want our keys
> striped across machines, racks, etc.  Due to how NBST works, if we start a
> bunch of nodes on one side of the topology marker, we'll end up with
> the case where all keys will dog-pile on the first node that joins before
> being disseminated to the other nodes.  In other words, the first joining
> node on the other side of the topology acts as a "pivot."  That's bad,
> especially if the key is marked as DELTA_WRITE, where the receiving node
> must pull the key from the readCH before applying the changelog.
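>
> For reference, the kind of setup I mean looks roughly like this (only a
> sketch, not our exact config -- the cluster/machine/rack values are
> placeholders, and it assumes the cache ends up using the topology-aware
> CH factory):
>
>     import org.infinispan.configuration.cache.CacheMode;
>     import org.infinispan.configuration.cache.ConfigurationBuilder;
>     import org.infinispan.configuration.global.GlobalConfigurationBuilder;
>     import org.infinispan.manager.DefaultCacheManager;
>
>     public class TopologyAwareSetup {
>        public static void main(String[] args) {
>           // Each node advertises its machine/rack so the topology-aware CH
>           // can stripe the owners of a key across machines and racks.
>           GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
>           global.transport().defaultTransport()
>                 .clusterName("demo-cluster")   // placeholder
>                 .machineId("machine-1")        // placeholder
>                 .rackId("rack-a");             // placeholder
>
>           ConfigurationBuilder cfg = new ConfigurationBuilder();
>           cfg.clustering().cacheMode(CacheMode.DIST_SYNC)
>              .hash().numOwners(2);
>
>           DefaultCacheManager cm =
>                 new DefaultCacheManager(global.build(), cfg.build());
>           cm.getCache();
>           cm.stop();
>        }
>     }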
>
> So not only do we have a single choke-point, but it's made worse by the
> initial burst of every write requiring numOwners threads for remote reads.
>
> If we disable rebalancing and start up the nodes on the other side of the
> topology, we can process this in a single view change.  But there's a
> catch -- and this is the reason I added the state of the flag.  We've run
> into a case where the current coordinator changed (crash or a MERGE) while
> the other nodes were starting up.  And the new coordinator was elected from
> the new side of the topology.  So we had two separate but balanced CHs on
> both sides of the topology.  And data integrity went out the window.
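>
> The operational procedure itself is just "flip the global flag off, start
> the new side, flip it back on".  Roughly like this (a sketch only -- we
> drive it through JMX in practice, and the component lookup below is an
> assumption about the internal API rather than a recommendation):
>
>     import org.infinispan.manager.EmbeddedCacheManager;
>     import org.infinispan.topology.LocalTopologyManager;
>
>     public class RebalancingToggle {
>        // Flip the cluster-wide rebalancing flag before/after bringing up
>        // the new side of the topology.
>        public static void setRebalancing(EmbeddedCacheManager cm,
>              boolean enabled) throws Exception {
>           LocalTopologyManager ltm = cm.getGlobalComponentRegistry()
>                 .getComponent(LocalTopologyManager.class);
>           ltm.setRebalancingEnabled(enabled);
>        }
>     }
>
>     // setRebalancing(cm, false);  // then start the nodes on the other side
>     // setRebalancing(cm, true);   // one rebalance once everyone has joined
>
> The trouble is that a joining node -- or a coordinator elected from the new
> side -- has no way of reading the current value of that flag.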
>
> Hence the flag.  Note also that this deployment requires the
> awaitInitialTransfer flag to be false.
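>
> (With the programmatic config that's just the state-transfer builder --
> same ConfigurationBuilder as in the sketch above:
>
>     cfg.clustering().stateTransfer().awaitInitialTransfer(false);
>
> so a joining node doesn't block waiting for an initial state transfer
> that won't happen while rebalancing is disabled.)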
>
> In a real production environment, this has saved me more times than I can
> count.  Node failover/failback is now reasonably deterministic with a
> simple operational procedure for our customer(s) to follow.
>
>
> The question is whether this feature would be useful for the community.
> Even with the new partition handling, I think this implementation is still
> viable and may warrant inclusion into 7.0 (or 7.1).  What does the team
> think?  I welcome any and all feedback.
>
> Regards,
>
> Erik Salter
> Cisco Systems, SPVTG
> (404) 317-0693
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>