<div dir="ltr">Hi Erik<div><br></div><div>This makes a lot of sense. In fact, I was really close to implementing it while I was replacing RebalancePolicy with AvailabilityStrategy. Unfortunately I hit some problems and I had to postpone it (mostly because I was also trying to make the flag per-cache).</div><div><br></div><div>The only question is what happens after a merge, if one partition has rebalancing enabled, and the other has rebalancing disabled. </div><div><br></div><div>I think I would prefer to keep it disabled if at least one partition had it disabled. E.g. if you start a new node and it doesn&#39;t join properly, you wouldn&#39;t want it to trigger a rebalance when it finally finds the cluster, only after you enable rebalancing yourself. </div><div><br></div><div>Cheers</div><div>Dan</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 28, 2014 at 12:00 AM, Erik Salter <span dir="ltr">&lt;<a href="mailto:an1310@hotmail.com" target="_blank">an1310@hotmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

This topic came up in a separate discussion with Mircea, and he suggested<br>

I post something on the mailing list for a wider audience.<br>

<br>

I have a business case where I need the value of the rebalancing flag read<br>

by the joining nodes.  Let&#39;s say we have a TACH where we want our keys<br>

striped across machines, racks, etc.  Due to how NBST works, if we start a<br>

bunch of nodes on one side of the topology marker, we will end up with<br>

the case where all keys will dog-pile on the first node that joins before<br>

being disseminated to the other nodes.  In other words, the first joining<br>

node on the other side of the topology acts as a &quot;pivot.&quot;  That&#39;s bad,<br>

especially if the key is marked as DELTA_WRITE, where the receiving node<br>

must pull the key from the readCH before applying the changelog.<br>

<br>

So not only do we have a single choke-point, but it&#39;s made worse by the<br>

initial burst of every write requiring numOwners threads for remote reads.<br>

<br>

If we disable rebalancing and start up the nodes on the other side of the<br>

topology, we can process this in a single view change.  But there&#39;s a<br>

catch -- and this is the reason I added the state of the flag.  We&#39;ve run<br>

into a case where the current coordinator changed (crash or a MERGE) as<br>

the other nodes were starting up, and the new coordinator was elected from<br>

the new side of the topology.  So we had two separate but balanced CHs on<br>

both sides of the topology.  And data integrity went out the window.<br>

<br>

Hence the flag.  Note also that this deployment requires the<br>

awaitInitialTransfer flag to be false.<br>

<br>

In a real production environment, this has saved me more times than I can<br>

count.  Node failover/failback is now reasonably deterministic with a<br>

simple operational procedure for our customer(s) to follow.<br>

<br>

<br>

The question is whether this feature would be useful for the community.<br>

Even with the new partition handling, I think this implementation is still<br>

viable and may warrant inclusion into 7.0 (or 7.1).  What does the team<br>

think?  I welcome any and all feedback.<br>

<br>

Regards,<br>

<br>

Erik Salter<br>

Cisco Systems, SPVTG<br>

(404) 317-0693<br>

<br>

<br>


</blockquote></div><br></div>
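<div>For illustration, here is a minimal sketch of the merge rule proposed in the reply above: after a merge, rebalancing stays disabled if at least one partition had it disabled. The class and method names are hypothetical and are not part of Infinispan's API; the sketch also assumes each partition's flag is available as a plain boolean.</div>

```java
import java.util.Collection;

// Hypothetical helper, NOT an Infinispan class: it only illustrates the
// "disabled wins" rule for combining per-partition rebalancing flags.
final class RebalanceMergePolicy {
    private RebalanceMergePolicy() {
    }

    // Returns true only if every merged partition had rebalancing enabled.
    // A single partition with the flag disabled keeps it disabled for the
    // merged cluster, so a late-joining node cannot trigger a rebalance.
    // Assumption: an empty collection is treated as "enabled" (allMatch
    // returns true when there are no elements).
    static boolean rebalancingEnabledAfterMerge(Collection<Boolean> partitionFlags) {
        return partitionFlags.stream().allMatch(Boolean::booleanValue);
    }
}
```

<div>With this rule, merging one partition that has rebalancing enabled with one that has it disabled yields a disabled flag, and rebalancing resumes only once an operator enables it explicitly.</div>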