On Mon, Feb 6, 2012 at 3:49 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
> numOwners==2 is and will very likely remain the most common case,
> particularly for small clusters.
>
> But if we have two sites, it makes sense to configure 2 owners per
> site. If only one node goes down, the surviving owner will supply
> state to the new owner. If both nodes go down, the new owners will
> fetch the data from the other site. So while 2 nodes going down will
> be quite costly, it should be infrequent enough that it's worth
> optimizing for the more frequent "1 node goes down and then comes
> back up" case.
>
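A rough sketch of the state-source rule in the quoted two-sites proposal, just to make the decision concrete. Everything in it (the Node record, the REMOTE_SITE marker, the site and node names) is hypothetical and not Infinispan's API: a new owner pulls state from a surviving local owner if there is one, and only falls back to the other site when both local owners are gone.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Infinispan's API: pick where a new owner should
// pull a segment's state from, assuming 2 owners per site.
public class StateSourceSketch {
    record Node(String name, String site, boolean alive) {}

    // Placeholder meaning "fetch the state from the other site".
    static final String REMOTE_SITE = "remote-site";

    static String chooseStateSource(List<Node> previousOwners, String localSite) {
        Optional<Node> survivor = previousOwners.stream()
                .filter(n -> n.alive() && n.site().equals(localSite))
                .findFirst();
        // One owner down: the surviving local owner supplies the state.
        // Both owners down: fall back to fetching from the other site.
        return survivor.map(Node::name).orElse(REMOTE_SITE);
    }

    public static void main(String[] args) {
        List<Node> oneDown = List.of(new Node("A", "LON", false), new Node("B", "LON", true));
        List<Node> bothDown = List.of(new Node("A", "LON", false), new Node("B", "LON", false));
        System.out.println(chooseStateSource(oneDown, "LON"));  // B
        System.out.println(chooseStateSource(bothDown, "LON")); // remote-site
    }
}
```

The expensive cross-site path is only taken in the rare double-failure case, which is the trade-off the quoted paragraph argues for.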
Agreed; this mixed batching (leaves with joins) makes sense for non-site clusters as
well.
Yup, even with numOwners == 2 you could say that 2 nodes dying in 1
minute is highly improbable. So you can delay rehashes triggered by
leaves by 1 minute just in case the node comes back up.
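Something like the following is what I have in mind for delaying leave-triggered rehashes. It's a minimal sketch assuming a hypothetical rehash hook (a plain Runnable) and a per-node grace period; the class and method names are made up for illustration and don't exist in Infinispan. A leave schedules the rehash instead of running it, and a rejoin within the grace period cancels it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Infinispan code: delay the rehash caused by a
// leave for a grace period, and skip it if the node rejoins in time.
public class DelayedLeaveRehash {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    // Nodes that left recently, mapped to their pending rehash task.
    private final Map<String, ScheduledFuture<?>> pendingLeaves = new HashMap<>();
    private final long graceMillis;
    private final Runnable rehash; // hypothetical hook that starts a rehash

    public DelayedLeaveRehash(long graceMillis, Runnable rehash) {
        this.graceMillis = graceMillis;
        this.rehash = rehash;
    }

    /** A node left: schedule the rehash instead of starting it right away. */
    public synchronized void onLeave(String node) {
        if (pendingLeaves.containsKey(node)) {
            return; // already waiting for this node
        }
        pendingLeaves.put(node, scheduler.schedule(
                () -> fire(node), graceMillis, TimeUnit.MILLISECONDS));
    }

    /**
     * A node joined. Returns true if its leave was still pending, i.e. the
     * rehash for that leave was cancelled.
     */
    public synchronized boolean onJoin(String node) {
        ScheduledFuture<?> pending = pendingLeaves.remove(node);
        if (pending == null) {
            return false; // a genuinely new node; normal join handling applies
        }
        pending.cancel(false);
        return true;
    }

    // Grace period expired without a rejoin: run the delayed rehash.
    // Synchronized so it cannot race with onJoin for the same node.
    private synchronized void fire(String node) {
        if (pendingLeaves.remove(node) != null) {
            rehash.run();
        }
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With a one-minute grace period, `new DelayedLeaveRehash(60_000, rehashHook)` would turn "node restarts within a minute" into no rehash at all, at the cost of running with one fewer copy of that node's data for up to a minute when it really is gone.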
> > For total shutdown, I guess we can use other means than rehash,
> > e.g. a specific command that would disable it and start flushing
> > to the store.
> >
>
> I think just stopping the cache is enough to get it to flush data to
> the store with passivation enabled.
ATM, wouldn't the shutdown of a cluster of servers trigger a rehash storm?
Right. I was commenting only on the cache store part, which should be
completely orthogonal to graceful shutdown.
I think we need a mechanism to better handle partial cluster shutdown,
and that mechanism can be used for full shutdown as well.
> But for now any data saved to a
> private store in distributed mode is useless after restart, because we
> have no safe way to push data that we don't own to other nodes (and by
> safe I mean avoiding overwriting newer data or resurrecting deleted
> data).
I think that should work with a clustered cache store though.
Yep, cluster shutdown with a shared cache store should work - even
without any changes to how the cache is flushed to the store.
The application (could be an AS instance or a HotRod server) just
needs to shut down cleanly on each node, because Infinispan is just a
library and can't decide to kill the entire application on its own.
Cheers
Dan