Manik, what's wrong with Dan's suggestion of clearing the cache before
shutdown?
On 31 May 2013, at 14:20, Manik Surtani <msurtani(a)redhat.com> wrote:
>
> If we only want to deal with full cluster shutdown, then I think stopping all
> application requests, calling Cache.clear() on one node, and then shutting down all the
> nodes should be simpler. On start, assuming no cache store, the caches will start empty,
> so starting all the nodes at once and only allowing application requests when they've
> all joined should also work without extra work.
>
> If we only want to stop a part of the cluster, suppressing rebalancing would be
> better, because we wouldn't lose all the data. But we'd still lose the keys whose
> owners are all among the nodes we want to stop. I've discussed this with Adrian, and
> we think if we want to stop a part of the cluster without losing data we need a JMX
> operation on the coordinator that will "atomically" remove a set of nodes from
> the CH. After the operation completes, the user will know it's safe to stop those
> nodes without losing data.
I think the no-data-loss option is bigger scope, perhaps part of ISPN-1394. And
that's not what I am asking about.
> When it comes to starting a part of the cluster, a "pause rebalancing"
> option would probably be better - but again, on the coordinator, not on each joining node.
> And clearly, if more than numOwner nodes leave while rebalancing is suspended, data will
> be lost.
Yup. This sort of option would only be used where data loss isn't an issue (such as
a distributed cache). Where data loss is an issue, we'd need more control -
ISPN-1394.
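
To make the partial-shutdown condition quoted above concrete (a key is lost when all of its owners are among the nodes being stopped), here is a minimal plain-Java sketch. It does not use the Infinispan API: the class name PartialShutdownCheck, the method keysLostOnStop, the node names A to D and the hand-written owner map are all hypothetical stand-ins for whatever the consistent hash would report.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PartialShutdownCheck {

    /**
     * Returns the keys whose owners are ALL in the set of nodes being stopped.
     * With rebalancing suppressed, these are the keys that would be lost.
     * Hypothetical helper, not part of Infinispan.
     */
    static Set<String> keysLostOnStop(Map<String, Set<String>> ownersByKey,
                                      Set<String> stopping) {
        Set<String> lost = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Set<String>> e : ownersByKey.entrySet()) {
            // A key survives if at least one owner stays up.
            if (stopping.containsAll(e.getValue())) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Assumed ownership with two owners per key (illustrative only).
        Map<String, Set<String>> owners = new HashMap<>();
        owners.put("k1", new HashSet<>(Arrays.asList("A", "B")));
        owners.put("k2", new HashSet<>(Arrays.asList("B", "C")));
        owners.put("k3", new HashSet<>(Arrays.asList("C", "D")));

        Set<String> stopping = new HashSet<>(Arrays.asList("A", "B"));
        System.out.println(keysLostOnStop(owners, stopping)); // prints [k1]
    }
}
```

Stopping A and B loses only k1, since k2 still has an owner on C. A coordinator-side operation like the one proposed would, presumably, have to move such keys to surviving nodes before reporting that the stop is safe.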
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)