If we only want to deal with full cluster shutdown, then I think stopping all application requests, calling Cache.clear() on one node, and then shutting down all the nodes should be simpler. On startup, assuming no cache store, the caches will start empty, so starting all the nodes at once and only allowing application requests once they've all joined should also work with no extra effort.
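For the full-shutdown case, the only application-side piece on restart is holding requests until every node is back. A minimal sketch, assuming the application already learns about joins somehow (for example from a view-change listener, wiring not shown) and knows the expected cluster size up front; `StartupGate` and its methods are hypothetical, not Infinispan API:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical application-side helper (not an Infinispan API): holds back
// application requests until every expected node has joined the cluster.
class StartupGate {
    private final CountDownLatch allJoined;
    private final Set<String> joined = ConcurrentHashMap.newKeySet();

    StartupGate(int expectedNodes) {
        this.allJoined = new CountDownLatch(expectedNodes);
    }

    // Called once per join notification; a node reported twice is counted once.
    void nodeJoined(String nodeName) {
        if (joined.add(nodeName)) {
            allJoined.countDown();
        }
    }

    // True once all expected nodes have joined.
    boolean isOpen() {
        return allJoined.getCount() == 0;
    }

    // Blocks a request thread until the whole cluster is up; false on timeout.
    boolean awaitOpen(long timeout, TimeUnit unit) throws InterruptedException {
        return allJoined.await(timeout, unit);
    }
}
```

A request thread would call `gate.awaitOpen(...)` before touching the cache; a timeout signals that some node never came back.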

If we only want to stop a part of the cluster, suppressing rebalancing would be better, because we wouldn't lose all the data. But we'd still lose the keys whose owners are all among the nodes we want to stop. I've discussed this with Adrian, and we think if we want to stop a part of the cluster without losing data we need a JMX operation on the coordinator that will "atomically" remove a set of nodes from the CH. After the operation completes, the user will know it's safe to stop those nodes without losing data.
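To make the data-loss condition concrete, here is a toy model (plain Java, not Infinispan's `ConsistentHash` API; class and method names are hypothetical) in which each segment maps to its list of owners. `segmentsLostIfStopped` finds the segments whose owners all fall inside the set being stopped, and `withoutNodes` sketches what the proposed coordinator operation might compute: a new CH with every leaving node removed in a single step, each segment topped back up to its previous owner count from the remaining nodes. A real implementation would also have to wait for state transfer to finish before reporting success; that part is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of a consistent hash: segment -> ordered list of owners.
// Hypothetical illustration only, not Infinispan's ConsistentHash API.
class ToyConsistentHash {
    final Map<Integer, List<String>> owners = new LinkedHashMap<>();

    // Segments whose owners are ALL in the leaving set: stopping those
    // nodes now would lose every key in these segments.
    List<Integer> segmentsLostIfStopped(Set<String> leaving) {
        List<Integer> lost = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : owners.entrySet()) {
            if (leaving.containsAll(e.getValue())) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    // Sketch of the proposed coordinator operation: remove all leaving nodes
    // in one step and top each segment back up to its previous owner count
    // from `remaining` (assumed to be every member that is not leaving).
    // State transfer to the new owners is not modelled.
    ToyConsistentHash withoutNodes(Set<String> leaving, List<String> remaining) {
        ToyConsistentHash next = new ToyConsistentHash();
        int cursor = 0; // rotates the starting node so new copies are spread out
        for (Map.Entry<Integer, List<String>> e : owners.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String owner : e.getValue()) {
                if (!leaving.contains(owner)) {
                    kept.add(owner);
                }
            }
            int target = Math.min(e.getValue().size(), remaining.size());
            for (int i = 0; i < remaining.size() && kept.size() < target; i++) {
                String candidate = remaining.get((cursor + i) % remaining.size());
                if (!kept.contains(candidate)) {
                    kept.add(candidate);
                }
            }
            cursor++;
            next.owners.put(e.getKey(), kept);
        }
        return next;
    }
}
```

For example, with owners {0: A,B; 1: B,C; 2: C,D}, stopping {B, C} directly would lose segment 1; after `withoutNodes`, no segment is owned only by B or C, so those two nodes can be stopped without losing data.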

When it comes to starting a part of the cluster, a "pause rebalancing" option would probably be better, but again on the coordinator, not on each joining node. And clearly, if numOwners or more nodes leave while rebalancing is suspended, data can be lost.
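The same toy approach can illustrate a coordinator-side pause switch. In this sketch (hypothetical names, not an Infinispan API), joiners that arrive while rebalancing is suspended are queued and only added when it resumes, and a node that leaves is simply dropped from each owner list with no replacement. With numOwners = 2, two leaves that hit both owners of a segment are enough to lose it, whereas the same leaves with rebalancing active (each rebalance assumed to finish before the next leave) lose nothing. The rebalance here only restores missing copies; a real one would also move segments onto new nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical coordinator-side model of a "pause rebalancing" switch.
// Not an Infinispan API; names and behaviour are assumptions for illustration.
class ToyCoordinator {
    private final int numOwners;
    private final List<String> members = new ArrayList<>();
    private final Deque<String> pendingJoiners = new ArrayDeque<>();
    private final Map<Integer, List<String>> owners = new TreeMap<>();
    private boolean rebalanceSuspended = false;

    ToyCoordinator(int numSegments, int numOwners, List<String> initialMembers) {
        this.numOwners = numOwners;
        members.addAll(initialMembers);
        int n = members.size();
        // Initial round-robin assignment: segment s -> members s, s+1, ...
        for (int s = 0; s < numSegments; s++) {
            List<String> segOwners = new ArrayList<>();
            for (int i = 0; i < Math.min(numOwners, n); i++) {
                segOwners.add(members.get((s + i) % n));
            }
            owners.put(s, segOwners);
        }
    }

    void suspendRebalance() {
        rebalanceSuspended = true;
    }

    void resumeRebalance() {
        rebalanceSuspended = false;
        members.addAll(pendingJoiners); // admit everyone who joined while paused
        pendingJoiners.clear();
        rebalance();
    }

    void nodeJoined(String node) {
        if (rebalanceSuspended) {
            pendingJoiners.add(node); // joiner waits; the CH is not touched
        } else {
            members.add(node);
            rebalance();
        }
    }

    void nodeLeft(String node) {
        members.remove(node);
        pendingJoiners.remove(node);
        for (List<String> segOwners : owners.values()) {
            segOwners.remove(node); // this copy is gone either way
        }
        if (!rebalanceSuspended) {
            rebalance();
        }
    }

    List<String> members() {
        return new ArrayList<>(members);
    }

    // Segments with no owner left: their data is lost.
    List<Integer> lostSegments() {
        List<Integer> lost = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : owners.entrySet()) {
            if (e.getValue().isEmpty()) lost.add(e.getKey());
        }
        return lost;
    }

    // Restores missing copies only; empty segments have nothing to copy from.
    private void rebalance() {
        int target = Math.min(numOwners, members.size());
        for (List<String> segOwners : owners.values()) {
            if (segOwners.isEmpty()) continue;
            for (String m : members) {
                if (segOwners.size() >= target) break;
                if (!segOwners.contains(m)) segOwners.add(m);
            }
        }
    }
}
```

With 4 segments, numOwners = 2 and members A, B, C, D, suspending rebalancing and then losing B and C leaves segment 1 (owned by B and C) with no copy; resuming afterwards admits any queued joiners but cannot bring that segment back.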

Cheers
Dan



On Fri, May 31, 2013 at 12:17 PM, Manik Surtani <msurtani@redhat.com> wrote:
Guys

We've discussed ISPN-3140 elsewhere before; I'm bringing it to this forum now.

https://issues.jboss.org/browse/ISPN-3140

Any thoughts/concerns?  Particularly looking to hear from Dan or Adrian about viability, complexity, ease of implementation.

Thanks
Manik
--
Manik Surtani
manik@jboss.org
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid


_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev