[infinispan-dev] Suppressing state transfer via JMX

Dan Berindei dan.berindei at gmail.com
Wed Jun 12 08:04:29 EDT 2013


On Wed, Jun 12, 2013 at 2:57 PM, Manik Surtani <msurtani at redhat.com> wrote:

>
> On 11 Jun 2013, at 16:27, Dan Berindei <dan.berindei at gmail.com> wrote:
>
> On Tue, Jun 11, 2013 at 2:01 PM, Manik Surtani <msurtani at redhat.com> wrote:
>
>>
>> On 10 Jun 2013, at 15:12, Dan Berindei <dan.berindei at gmail.com> wrote:
>>
>> Erik, I think in your case you'd be better served by a
>> ConsistentHashFactory that always assigns at most one owner from each
>> machine for each segment.
>>
>> I guess the fix for ISPN-3140 should work as well, but it wouldn't be
>> very straightforward: you'd have to keep the rebalancingEnabled attribute
>> set to false by default, and you'd have to enable it temporarily every time
>> you have a topology change that you do want to process.
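
For illustration, toggling that attribute from a monitoring script would look
roughly like this (untested sketch; I'm assuming the ISPN-3140 attribute ends
up exposed as rebalancingEnabled on the LocalTopologyManager component of the
CacheManager MBean, and the JMX URL, domain and cache manager name below are
made up - adjust them to your setup):

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RebalancingToggle {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint - replace host/port with your own.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            // Hypothetical ObjectName - the domain and cache manager name
            // depend on your globalJmxStatistics configuration.
            ObjectName topologyManager = new ObjectName(
                    "org.infinispan:type=CacheManager,name=\"DefaultCacheManager\","
                            + "component=LocalTopologyManager");

            // Keep rebalancing suspended by default...
            server.setAttribute(topologyManager,
                    new Attribute("rebalancingEnabled", false));

            // ...and re-enable it only around topology changes you actually
            // want to process (e.g. adding a node on purpose).
            server.setAttribute(topologyManager,
                    new Attribute("rebalancingEnabled", true));
        } finally {
            connector.close();
        }
    }
}
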
>>
>>
>> Why?  Does the workflow detailed in ISPN-3140 not work?
>>
>>
> ISPN-3140 is geared toward planned shutdowns; my understanding was that
> Erik's scenario involves an unexpected failure.
>
> Say we have a cluster with 4 nodes spread across 2 machines: A(m1), B(m1),
> C(m2), D(m2).
> If m2 fails, rebalancing will start automatically and m1 will have 2
> copies of each entry (one on A and one on B).
> Trying to suspend rebalancing after m2 has already failed won't have any
> effect - if state transfer is already in progress it won't be cancelled.
> In order to avoid the unnecessary transfers, rebalancing would have to be
> suspended before the failure - i.e. rebalancing should be suspended by
> default.
>
>
>> It's certainly possible to do this automatically from your app or from a
>> monitoring daemon, but I'm pretty sure an enhanced topology-aware CHF would
>> be a better fit.
>>
>>
>> Do explain.
>>
>>
> A custom ConsistentHashFactory could distribute segments so that a machine
> never has more than 1 copy of each segment. If m2 failed, there would be
> just one machine in the cluster, and just one copy of each segment. The
> factory would not change the consistent hash, and there wouldn't be any
> state transfer.
>
>
> But that's bad for unplanned failures, as you lose data in that case.
>
>
Erik's assumption was that machine failures were much more likely than
individual node failures, and having a second copy on m1 wouldn't help if
m1 failed as well.


>
> It could be even simpler - the existing
> TopologyAwareConsistentHashFactory/TopologyAwareSyncConsistentHashFactory
> implementations already ensure just one copy per machine if the number of
> machines is >= numOwners. So a custom ConsistentHashFactory could just
> extend one of these and skip calling super.rebalance() when the number of
> machines in the cluster is < numOwners.
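
Roughly like this - just a sketch, not tested; the exact package names and
rebalance() signature differ a bit between Infinispan versions, and the
MachineAwareConsistentHashFactory name is made up:

import java.util.HashSet;
import java.util.Set;

import org.infinispan.distribution.ch.DefaultConsistentHash;
import org.infinispan.distribution.ch.TopologyAwareSyncConsistentHashFactory;
import org.infinispan.remoting.transport.Address;
import org.infinispan.remoting.transport.TopologyAwareAddress;

public class MachineAwareConsistentHashFactory
        extends TopologyAwareSyncConsistentHashFactory {

    @Override
    public DefaultConsistentHash rebalance(DefaultConsistentHash baseCH) {
        // Count the distinct machines among the current members.
        Set<String> machines = new HashSet<String>();
        for (Address member : baseCH.getMembers()) {
            machines.add(((TopologyAwareAddress) member).getMachineId());
        }
        // With fewer machines than numOwners, rebalancing would only create
        // extra copies on the surviving machine(s), so keep the current CH
        // and skip the state transfer entirely.
        if (machines.size() < baseCH.getNumOwners()) {
            return baseCH;
        }
        return super.rebalance(baseCH);
    }
}

It would then be plugged in through the hash configuration, e.g.
new ConfigurationBuilder().clustering().hash()
    .consistentHashFactory(new MachineAwareConsistentHashFactory()).
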
>
>
>
>>
>>
>>
>> On Fri, Jun 7, 2013 at 1:45 PM, Erik Salter <an1310 at hotmail.com> wrote:
>>
>>> I'd like something similar.  If I have equal keys on two machines (given
>>> an
>>> orthogonal setup and a TACH), I'd like to suppress state transfer and run
>>> with only one copy until I can recover my machines.  The business case is
>>> that in a degraded scenario, additional replicas aren't going to buy me
>>> anything, as a failure will most likely be at the machine level and will
>>> cause me to lose data.  Once I've recovered the other machine, I can turn
>>> back on state transfer to get my data redundancy.
>>>
>>> Erik
>>>
>>> -----Original Message-----
>>> From: infinispan-dev-bounces at lists.jboss.org
>>> [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Mircea
>>> Markus
>>> Sent: Tuesday, June 04, 2013 5:44 AM
>>> To: infinispan -Dev List
>>> Subject: Re: [infinispan-dev] Suppressing state transfer via JMX
>>>
>>> Manik, what's wrong with Dan's suggestion of clearing the cache before
>>> shutdown?
>>>
>>> On 31 May 2013, at 14:20, Manik Surtani <msurtani at redhat.com> wrote:
>>>
>>> >>
>>> >> If we only want to deal with full cluster shutdown, then I think
>>> >> stopping all application requests, calling Cache.clear() on one node,
>>> >> and then shutting down all the nodes should be simpler. On start,
>>> >> assuming no cache store, the caches will start empty, so starting all
>>> >> the nodes at once and only allowing application requests when they've
>>> >> all joined should also work without extra work.
>>> >>
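
Something along these lines, for illustration (embedded mode; the config file
and cache name are made up, and this glosses over how you'd fence off
application requests):

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class FullClusterShutdown {
    public static void main(String[] args) throws Exception {
        // Run on one of the nodes, after the application has stopped
        // sending requests. File and cache names are placeholders.
        EmbeddedCacheManager cacheManager = new DefaultCacheManager("infinispan.xml");
        Cache<Object, Object> cache = cacheManager.getCache("myCache");

        // clear() is cluster-wide, so clearing on a single node empties
        // the cache on every node.
        cache.clear();

        // Then stop the cache manager on each node; with no cache store,
        // the caches start empty again the next time the cluster comes up.
        cacheManager.stop();
    }
}
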
>>> >> If we only want to stop a part of the cluster, suppressing rebalancing
>>> >> would be better, because we wouldn't lose all the data. But we'd still
>>> >> lose the keys whose owners are all among the nodes we want to stop.
>>> >> I've discussed this with Adrian, and we think that if we want to stop
>>> >> a part of the cluster without losing data, we need a JMX operation on
>>> >> the coordinator that will "atomically" remove a set of nodes from the
>>> >> CH. After the operation completes, the user will know it's safe to
>>> >> stop those nodes without losing data.
>>> >
>>> > I think the no-data-loss option is bigger scope, perhaps part of
>>> > ISPN-1394.  And that's not what I am asking about.
>>> >
>>> >> When it comes to starting a part of the cluster, a "pause rebalancing"
>>> >> option would probably be better - but again, on the coordinator, not
>>> >> on each joining node. And clearly, if more than numOwners nodes leave
>>> >> while rebalancing is suspended, data will be lost.
>>> >
>>> > Yup.  This sort of option would only be used where data loss isn't an
>>> > issue (such as a distributed cache).  Where data loss is an issue, we'd
>>> > need more control - ISPN-1394.
>>> >
>>>
>>> Cheers,
>>> --
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>>