[infinispan-dev] Suppressing state transfer via JMX

Dan Berindei dan.berindei at gmail.com
Tue Jun 11 11:27:02 EDT 2013


On Tue, Jun 11, 2013 at 2:01 PM, Manik Surtani <msurtani at redhat.com> wrote:

>
> On 10 Jun 2013, at 15:12, Dan Berindei <dan.berindei at gmail.com> wrote:
>
> Erik, I think in your case you'd be better served by a
> ConsistentHashFactory that always assigns at most one owner from each
> machine for each segment.
>
> I guess the fix for ISPN-3140 should work as well, but it wouldn't be very
> straightforward: you'd have to keep the rebalancingEnabled attribute set to
> false by default, and you'd have to enable it temporarily every time you
> have a topology change that you do want to process.
>
>
> Why?  Does the workflow detailed in ISPN-3140 not work?
>
>
ISPN-3140 is geared toward planned shutdowns; my understanding is that
Erik's scenario involves an unexpected failure.

Say we have a cluster of 4 nodes spread across 2 machines: A(m1), B(m1),
C(m2), D(m2), with numOwners = 2.
If m2 fails, rebalancing will start automatically and m1 will end up with 2
copies of each entry (one on A and one on B).
Trying to suspend rebalancing after m2 has already failed won't have any
effect - if state transfer is already in progress, it won't be cancelled.
To avoid the unnecessary transfers, rebalancing would have to be suspended
before the failure - i.e. rebalancing would have to be suspended by default.
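
For illustration, here's a minimal sketch of what a script or monitoring
daemon could do over JMX: keep rebalancing suspended by default and enable
it only around topology changes you do want processed. The service URL, JMX
domain, cache manager name and attribute name below are assumptions that
depend on the actual setup and Infinispan version:

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RebalancingToggle {
   public static void main(String[] args) throws Exception {
      // Connect to the coordinator's JMX server (host/port are assumptions).
      JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://coordinator:9999/jmxrmi");
      JMXConnector connector = JMXConnectorFactory.connect(url);
      try {
         MBeanServerConnection mbsc = connector.getMBeanServerConnection();
         // Assumed ObjectName of the coordinator's topology manager MBean.
         ObjectName topo = new ObjectName("org.infinispan:type=CacheManager,"
               + "name=\"DefaultCacheManager\",component=LocalTopologyManager");
         // Enable rebalancing just before a planned topology change...
         mbsc.setAttribute(topo, new Attribute("rebalancingEnabled", true));
         // ... wait here for the rebalance to finish ...
         // ... then suspend it again so a machine failure won't trigger one.
         mbsc.setAttribute(topo, new Attribute("rebalancingEnabled", false));
      } finally {
         connector.close();
      }
   }
}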


> It's certainly possible to do this automatically from your app or from a
> monitoring daemon, but I'm pretty sure an enhanced topology-aware CHF would
> be a better fit.
>
>
> Do explain.
>
>
A custom ConsistentHashFactory could distribute segments so that a machine
never has more than 1 copy of each segment. If m2 failed, there would be
just one machine in the cluster, and just one copy of each segment. The
factory would not change the consistent hash, and there wouldn't be any
state transfer.

It could be even simpler - the existing
TopologyAwareConsistentHashFactory/TopologyAwareSyncConsistentHashFactory
implementations already ensure at most one copy per machine as long as the
number of machines is >= numOwners. So a custom ConsistentHashFactory could
just extend one of these and skip calling super.rebalance() when the number
of machines in the cluster is < numOwners.
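
A rough sketch of that factory against the 5.x API (the class name is
hypothetical, and the exact packages and signatures may differ between
versions):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.infinispan.distribution.ch.DefaultConsistentHash;
import org.infinispan.distribution.ch.TopologyAwareSyncConsistentHashFactory;
import org.infinispan.remoting.transport.Address;
import org.infinispan.remoting.transport.TopologyAwareAddress;

public class RedundancyGatedCHF extends TopologyAwareSyncConsistentHashFactory {

   @Override
   public DefaultConsistentHash rebalance(DefaultConsistentHash baseCH) {
      // With fewer machines than numOwners, any rebalance would put more
      // than one copy of some segment on the same machine, so keep the
      // current CH unchanged - no rebalance means no state transfer.
      if (countMachines(baseCH.getMembers()) < baseCH.getNumOwners())
         return baseCH;
      return super.rebalance(baseCH);
   }

   private static int countMachines(List<Address> members) {
      // Requires a topology-aware transport, so members carry machine ids.
      Set<String> machines = new HashSet<String>();
      for (Address member : members)
         machines.add(((TopologyAwareAddress) member).getMachineId());
      return machines.size();
   }
}

The factory would then be plugged into the cache configuration's hash()
settings in place of the default one, and once m2 comes back (machines >=
numOwners again) the next rebalance restores full redundancy automatically.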



>
>
>
> On Fri, Jun 7, 2013 at 1:45 PM, Erik Salter <an1310 at hotmail.com> wrote:
>
>> I'd like something similar.  If I have equal keys on two machines (given
>> an
>> orthogonal setup and a TACH), I'd like to suppress state transfer and run
>> with only one copy until I can recover my machines.  The business case is
>> that in a degraded scenario, additional replicas aren't going to buy me
>> anything, as a failure will most likely be at the machine level and will
>> cause me to lose data.  Once I've recovered the other machine, I can turn
>> back on state transfer to get my data redundancy.
>>
>> Erik
>>
>> -----Original Message-----
>> From: infinispan-dev-bounces at lists.jboss.org
>> [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Mircea
>> Markus
>> Sent: Tuesday, June 04, 2013 5:44 AM
>> To: infinispan-dev List
>> Subject: Re: [infinispan-dev] Suppressing state transfer via JMX
>>
>> Manik, what's wrong with Dan's suggestion of clearing the cache before
>> shutdown?
>>
>> On 31 May 2013, at 14:20, Manik Surtani <msurtani at redhat.com> wrote:
>>
>> >>
>> >> If we only want to deal with full cluster shutdown, then I think
>> >> stopping all application requests, calling Cache.clear() on one node,
>> >> and then shutting down all the nodes should be simpler. On start,
>> >> assuming no cache store, the caches will start empty, so starting all
>> >> the nodes at once and only allowing application requests when they've
>> >> all joined should also work without extra work.
>> >>
>> >> If we only want to stop a part of the cluster, suppressing rebalancing
>> >> would be better, because we wouldn't lose all the data. But we'd still
>> >> lose the keys whose owners are all among the nodes we want to stop. I've
>> >> discussed this with Adrian, and we think if we want to stop a part of
>> >> the cluster without losing data we need a JMX operation on the
>> >> coordinator that will "atomically" remove a set of nodes from the CH.
>> >> After the operation completes, the user will know it's safe to stop
>> >> those nodes without losing data.
>> >
>> > I think the no-data-loss option is bigger scope, perhaps part of
>> > ISPN-1394. And that's not what I am asking about.
>> >
>> >> When it comes to starting a part of the cluster, a "pause rebalancing"
>> >> option would probably be better - but again, on the coordinator, not on
>> >> each joining node. And clearly, if more than numOwners nodes leave while
>> >> rebalancing is suspended, data will be lost.
>> >
>> > Yup. This sort of option would only be used where data loss isn't an
>> > issue (such as a distributed cache). Where data loss is an issue, we'd
>> > need more control - ISPN-1394.
>> >
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>
>
>  --
> Manik Surtani
> manik at jboss.org
> twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
>
>