On 11 Jun 2013, at 16:27, Dan Berindei <dan.berindei(a)gmail.com> wrote:
On Tue, Jun 11, 2013 at 2:01 PM, Manik Surtani <msurtani(a)redhat.com> wrote:
On 10 Jun 2013, at 15:12, Dan Berindei <dan.berindei(a)gmail.com> wrote:
> Erik, I think in your case you'd be better served by a ConsistentHashFactory that
> always assigns at most one owner from each machine for each segment.
>
> I guess the fix for ISPN-3140 should work as well, but it wouldn't be very
> straightforward: you'd have to keep the rebalancingEnabled attribute set to false by
> default, and you'd have to enable it temporarily every time you have a topology change
> that you do want to process.
Why? Does the workflow detailed in ISPN-3140 not work?
ISPN-3140 is geared toward planned shutdowns; my understanding was that Erik's
scenario involves an unexpected failure.
Say we have a cluster with 4 nodes spread across 2 machines: A(m1), B(m1), C(m2), D(m2).
If m2 fails, rebalancing will start automatically and m1 will have 2 copies of each entry
(one on A and one on B).
Trying to suspend rebalancing after m2 has already failed won't have any effect: if
state transfer is already in progress, it won't be cancelled.
In order to avoid the unnecessary transfers, rebalancing would have to be suspended
before the failure, i.e. rebalancing should be suspended by default.
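The ordering argument can be made concrete with a toy model. This is a hypothetical sketch: `RebalanceSwitchModel`, `onTopologyChange` and `setRebalancingEnabled` are illustrative names, not Infinispan's coordinator code. The only rule it encodes is the one stated above: turning rebalancing off does not cancel a state transfer that has already started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a coordinator with a "rebalancingEnabled" switch.
// Not Infinispan code: it only encodes the rule that disabling rebalancing
// does not cancel a state transfer that has already started.
final class RebalanceSwitchModel {
    private boolean rebalancingEnabled;
    private boolean rebalanceInProgress;
    private boolean rebalancePending;
    private final List<String> events = new ArrayList<>();

    RebalanceSwitchModel(boolean enabledByDefault) {
        this.rebalancingEnabled = enabledByDefault;
    }

    // Called when a node joins or leaves, e.g. when machine m2 crashes.
    void onTopologyChange(String reason) {
        if (rebalancingEnabled) {
            rebalanceInProgress = true;
            events.add("started: " + reason);
        } else {
            // Remember the change; no state transfer happens yet.
            rebalancePending = true;
            events.add("deferred: " + reason);
        }
    }

    // Turning the switch off only affects future topology changes;
    // turning it on starts any rebalance that was deferred.
    void setRebalancingEnabled(boolean enabled) {
        rebalancingEnabled = enabled;
        if (enabled && rebalancePending) {
            rebalancePending = false;
            rebalanceInProgress = true;
            events.add("started: deferred change");
        }
    }

    boolean isRebalanceInProgress() {
        return rebalanceInProgress;
    }

    List<String> events() {
        return events;
    }
}
```

With the switch on by default, calling `setRebalancingEnabled(false)` after m2 fails leaves the transfer running; with it off by default, the failure is only recorded until an operator re-enables rebalancing.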
> It's certainly possible to do this automatically from your app or from a
> monitoring daemon, but I'm pretty sure an enhanced topology-aware CHF would be a
> better fit.
Do explain.
A custom ConsistentHashFactory could distribute segments so that a machine never has more
than 1 copy of each segment. If m2 failed, there would be just one machine in the cluster,
and just one copy of each segment. The factory would not change the consistent hash, and
there wouldn't be any state transfer.
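A minimal sketch of that placement rule, assuming plain Java collections in place of Infinispan's consistent hash types (`OnePerMachineCH`, `assign` and `removeMembers` are hypothetical names, not the ConsistentHashFactory API): each segment gets owners on distinct machines, and a crash only drops the departed owners without adding new ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Infinispan ConsistentHashFactory API.
// Owners are node names; machineOf maps each node to the machine it runs on.
final class OnePerMachineCH {

    // Assign up to numOwners owners to every segment, never two on the same machine.
    // Starting the scan at a different node per segment spreads owners round-robin.
    static List<List<String>> assign(List<String> members, Map<String, String> machineOf,
                                     int numOwners, int numSegments) {
        List<List<String>> owners = new ArrayList<>();
        for (int s = 0; s < numSegments; s++) {
            List<String> segmentOwners = new ArrayList<>();
            Set<String> usedMachines = new HashSet<>();
            for (int i = 0; i < members.size() && segmentOwners.size() < numOwners; i++) {
                String candidate = members.get((s + i) % members.size());
                // Set.add returns false if this machine already holds a copy.
                if (usedMachines.add(machineOf.get(candidate))) {
                    segmentOwners.add(candidate);
                }
            }
            owners.add(segmentOwners);
        }
        return owners;
    }

    // On a crash, just drop the departed nodes; no new owners are added,
    // so nothing has to be transferred.
    static List<List<String>> removeMembers(List<List<String>> owners, Set<String> departed) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> segmentOwners : owners) {
            List<String> kept = new ArrayList<>(segmentOwners);
            kept.removeAll(departed);
            result.add(kept);
        }
        return result;
    }
}
```

For the A(m1), B(m1), C(m2), D(m2) cluster with numOwners = 2 and 4 segments, every segment gets one owner on each machine; after C and D disappear, each segment keeps a single owner on m1 and no data moves.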
But that's bad for unplanned failures, as you lose data in that case.
It could be even simpler: the existing
TopologyAwareConsistentHashFactory/TopologyAwareSyncConsistentHashFactory implementations
already ensure just one copy per machine if the number of machines is >= numOwners. So
a custom ConsistentHashFactory could just extend one of these and skip calling
super.rebalance() when the number of machines in the cluster is < numOwners.
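The override suggested above can be sketched as follows. All types here are hypothetical stand-ins (`SimpleCH`, `TopologyAwareFactoryStandIn`, `NoDegradedRebalanceFactory`); the real `TopologyAwareConsistentHashFactory` has a different, generic signature, so this only illustrates the control flow. The stand-in's own `rebalance()` is a simplified placeholder that tops each segment up to `numOwners` owners, preferring machines without a copy and otherwise accepting a second copy on the same machine, which mimics the two-copies-on-m1 outcome described earlier.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical simplified consistent hash: owners per segment, plus the machine
// of every current member (node name -> machine id). Assumes every owner is a member.
final class SimpleCH {
    final List<List<String>> segmentOwners;
    final Map<String, String> machineOf;

    SimpleCH(List<List<String>> segmentOwners, Map<String, String> machineOf) {
        this.segmentOwners = segmentOwners;
        this.machineOf = machineOf;
    }

    int machineCount() {
        return new HashSet<>(machineOf.values()).size();
    }
}

// Stand-in for a topology-aware factory; NOT the real Infinispan class or signature.
class TopologyAwareFactoryStandIn {
    final int numOwners;

    TopologyAwareFactoryStandIn(int numOwners) {
        this.numOwners = numOwners;
    }

    // Placeholder rebalance: top up every segment to numOwners owners, preferring
    // machines without a copy, then accepting extra copies on the same machine.
    SimpleCH rebalance(SimpleCH base) {
        List<String> members = new ArrayList<>(base.machineOf.keySet());
        members.sort(null); // natural order, for deterministic choices
        List<List<String>> result = new ArrayList<>();
        for (List<String> owners : base.segmentOwners) {
            List<String> updated = new ArrayList<>(owners);
            for (String node : members) {
                String machine = base.machineOf.get(node);
                boolean machineHasCopy = updated.stream()
                        .anyMatch(o -> base.machineOf.get(o).equals(machine));
                if (updated.size() < numOwners && !machineHasCopy) updated.add(node);
            }
            for (String node : members) {
                if (updated.size() < numOwners && !updated.contains(node)) updated.add(node);
            }
            result.add(updated);
        }
        return new SimpleCH(result, base.machineOf);
    }
}

// The override described above: skip super.rebalance() while machines < numOwners.
final class NoDegradedRebalanceFactory extends TopologyAwareFactoryStandIn {
    NoDegradedRebalanceFactory(int numOwners) {
        super(numOwners);
    }

    @Override
    SimpleCH rebalance(SimpleCH base) {
        if (base.machineCount() < numOwners) {
            return base; // keep the current CH: no new owners, no state transfer
        }
        return super.rebalance(base);
    }
}
```

Once the failed machine rejoins and the machine count reaches `numOwners` again, the guarded factory falls through to the parent's rebalance and redundancy is restored.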
>
> On Fri, Jun 7, 2013 at 1:45 PM, Erik Salter <an1310(a)hotmail.com> wrote:
> I'd like something similar. If I have equal keys on two machines (given an
> orthogonal setup and a TACH), I'd like to suppress state transfer and run
> with only one copy until I can recover my machines. The business case is
> that in a degraded scenario, additional replicas aren't going to buy me
> anything, as a failure will most likely be at the machine level and will
> cause me to lose data. Once I've recovered the other machine, I can turn
> state transfer back on to restore my data redundancy.
>
> Erik
>
> -----Original Message-----
> From: infinispan-dev-bounces(a)lists.jboss.org
> [mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Mircea Markus
> Sent: Tuesday, June 04, 2013 5:44 AM
> To: infinispan -Dev List
> Subject: Re: [infinispan-dev] Suppressing state transfer via JMX
>
> Manik, what's wrong with Dan's suggestion of clearing the cache before
> shutdown?
>
> On 31 May 2013, at 14:20, Manik Surtani <msurtani(a)redhat.com> wrote:
>
> >>
> >> If we only want to deal with full cluster shutdown, then I think stopping
> >> all application requests, calling Cache.clear() on one node, and then
> >> shutting down all the nodes should be simpler. On start, assuming no cache
> >> store, the caches will start empty, so starting all the nodes at once and
> >> only allowing application requests when they've all joined should also work
> >> without extra work.
> >>
> >> If we only want to stop a part of the cluster, suppressing rebalancing
> >> would be better, because we wouldn't lose all the data. But we'd still lose
> >> the keys whose owners are all among the nodes we want to stop. I've
> >> discussed this with Adrian, and we think if we want to stop a part of the
> >> cluster without losing data we need a JMX operation on the coordinator that
> >> will "atomically" remove a set of nodes from the CH. After the operation
> >> completes, the user will know it's safe to stop those nodes without losing
> >> data.
> >
> > I think the no-data-loss option is of bigger scope, perhaps part of
> > ISPN-1394. And that's not what I am asking about.
> >
> >> When it comes to starting a part of the cluster, a "pause rebalancing"
> >> option would probably be better, but again, on the coordinator, not on each
> >> joining node. And clearly, if more than numOwners nodes leave while
> >> rebalancing is suspended, data will be lost.
> >
> > Yup. This sort of option would only be used where data loss isn't an
> > issue (such as a distributed cache). Where data loss is an issue, we'd need
> > more control (ISPN-1394).
> >
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Platform Architect, JBoss Data Grid
http://red.ht/data-grid
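The data-loss condition raised in the partial-shutdown discussion (losing the keys whose owners are all among the nodes being stopped) can be checked against a segment-to-owners table. The sketch below uses a hypothetical `SafeStopCheck.segmentsLost` helper on plain Java collections; it is not an Infinispan JMX operation, only the condition such a tool would need to verify before declaring a set of nodes safe to stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an Infinispan API.
final class SafeStopCheck {
    // Returns the indexes of segments whose owners are all in `stopping`.
    // With rebalancing suspended, the entries in those segments would be lost
    // if the nodes were stopped before ownership moved elsewhere.
    // A segment with no owners at all is also reported.
    static List<Integer> segmentsLost(List<List<String>> segmentOwners, Set<String> stopping) {
        List<Integer> lost = new ArrayList<>();
        for (int s = 0; s < segmentOwners.size(); s++) {
            if (stopping.containsAll(segmentOwners.get(s))) {
                lost.add(s);
            }
        }
        return lost;
    }
}
```

With numOwners = 2 and one owner per machine, stopping every node on one machine loses nothing, while stopping two nodes that co-own a segment does, which is why suspended rebalancing is only safe while no segment loses all of its owners.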