On Mon, Feb 6, 2012 at 2:01 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
> We could extend the policy for crashes as well, by adding a
> minNumOwners setting and only triggering an automatic rehash when
> there is a segment on the hash wheel with <= minNumOwners owners.
> minNumOwners should be at least 1, which in the default use case (numOwners==2)
> would mean that we react to every leave.
> Whilst this functionality is very nice, I don't think it should have a high
> priority as most of the use cases I'm aware of use numOwners=2.
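For illustration, here is a minimal sketch of the proposed trigger, assuming a
per-segment owner list; the names (needsRebalance, segmentOwners, minNumOwners)
are made up for the example and are not existing Infinispan API:

    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class RebalanceTrigger {
        // Hypothetical check: start an automatic rehash only when some segment
        // on the hash wheel is left with <= minNumOwners live owners.
        static <A> boolean needsRebalance(Map<Integer, List<A>> segmentOwners,
                                          Set<A> liveMembers, int minNumOwners) {
            for (List<A> owners : segmentOwners.values()) {
                long alive = owners.stream().filter(liveMembers::contains).count();
                if (alive <= minNumOwners) {
                    return true;   // this segment dropped to the safety threshold
                }
            }
            return false;          // every segment still has enough live owners
        }
    }

With numOwners=2 and minNumOwners=1, a single leave already brings some segments
down to one live owner, so the check would fire on every leave, as noted above.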
numOwners==2 is and will very likely remain the most common case,
particularly for small clusters.
But if we have two sites, it makes sense to configure 2 owners per
site. If only one node goes down, the surviving owner will supply
state to the new owner. If both nodes go down, the new owners will
fetch the data from the other site. So while 2 nodes going down will
be quite costly, it should be infrequent enough that it's worth
optimizing for the more frequent "1 node goes down and then comes back
up" case.
>
> We would have a properly installed CH that guarantees at some point
> in the past each key had numOwners owners, and a filter on top of it
> that removes any leavers from the result of DM.locate(),
> DM.getPrimaryLocation() etc.
>
> It would probably undo our recent optimizations around locate and
> getPrimaryLocation, so it's slowing down the normal case (without any
> leavers) in order to make the exceptional case (organized shutdown of
> a part of the cluster) faster.
> The question is how big the cluster has to get before the exceptional
> case becomes common enough that it's worth optimizing for...
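As an illustration of the filtering idea quoted above, here is a rough sketch;
the wrapper class and its fields are assumptions for this example, and the real
DistributionManager.locate()/getPrimaryLocation() signatures may differ:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Illustrative only: answer ownership lookups from the last properly
    // installed CH, but strip any leavers from the results.
    class LeaverFilteringLocator<A> {
        private final Map<Integer, List<A>> installedOwners; // last properly installed CH
        private final Set<A> currentMembers;                 // cluster view after the leave

        LeaverFilteringLocator(Map<Integer, List<A>> installedOwners, Set<A> currentMembers) {
            this.installedOwners = installedOwners;
            this.currentMembers = currentMembers;
        }

        // locate(): the owners according to the old CH, minus any node that left
        List<A> locate(int segment) {
            List<A> result = new ArrayList<>();
            for (A owner : installedOwners.get(segment)) {
                if (currentMembers.contains(owner)) {
                    result.add(owner);
                }
            }
            return result;
        }

        // getPrimaryLocation(): the first surviving owner acts as the primary
        A getPrimaryLocation(int segment) {
            List<A> owners = locate(segment);
            return owners.isEmpty() ? null : owners.get(0);
        }
    }

The extra membership check on every lookup is exactly the cost mentioned above:
it slows down the common no-leaver case to make the leaver case cheaper.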
> re: partial shutdown, due to consistency constraints it won't be possible to
> shut down more than numOwners-1 nodes in a controlled manner at any time,
> so I'm not sure if this optimisation has a broad scope.
Again, in a multi-site scenario it makes sense to shut down an entire
site. The data will remain safe on the surviving site(s), regardless
of the number of nodes being shut down.
> For total shutdown, I guess we can use other means than rehash, e.g.
> a specific command that would disable it and start flushing to the store.
I think just stopping the cache is enough to get it to flush data to
the store with passivation enabled. But for now any data saved to a
private store in distributed mode is useless after restart, because we
have no safe way to push data that we don't own to other nodes (and by
safe I mean avoiding overwriting newer data or resurrecting deleted
data).
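For illustration only, a sketch of that hazard; the naiveRestore helper is
hypothetical and not something Infinispan actually does:

    import java.util.Map;
    import org.infinispan.Cache;

    class NaiveRestoreExample {
        // Hypothetical, unsafe restore: blindly pushing the contents of a
        // private store back into the cluster after a restart.
        static void naiveRestore(Cache<String, String> cache, Map<String, String> privateStore) {
            for (Map.Entry<String, String> e : privateStore.entrySet()) {
                // While this node was down, other nodes may have updated or
                // removed this key: a plain put() overwrites the newer value
                // or resurrects the deleted entry, which is exactly the
                // safety problem described above.
                cache.put(e.getKey(), e.getValue());
            }
        }
    }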
Cheers
Dan