[infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2

Mon Feb 6 14:41:32 EST 2012


----- Original Message -----
> From: "Dan Berindei" <dan.berindei at gmail.com>
> >> numOwners==2 is and will very likely remain the most common case,
> >> particularly for small clusters.
> >>
> >> But if we have two sites, it makes sense to configure 2 owners per
> >> site. If only one node goes down, the surviving owner will supply
> >> state to the new owner. If both nodes go down, the new owners will
> >> fetch the data from the other site. So while 2 nodes going down
> >> will be quite costly, it should be infrequent enough that it's
> >> worth optimizing for the more frequent "1 node goes down and then
> >> comes back up" case.
> >>
> > Agreed; this mixed batching (leaves with joins) makes sense for
> > non-site clusters as well.
> >
> 
> Yup, even with numOwners == 2 you could say that 2 nodes dying in 1
> minute is highly improbable. So you can delay rehashes triggered by
> leaves by 1 minute just in case the node comes back up.
> 
> 
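To make the delayed-leave idea concrete, here is a rough sketch of the bookkeeping. It is an illustration only, not existing Infinispan code: the class name DelayedLeaveRehash, its methods, and the use of plain strings for node addresses are all made up for this mail. Leaves are parked for a grace period; if the node rejoins in time the pending leave is dropped so the leave and the join can be handled together, otherwise the leave is handed on for a normal rehash.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Infinispan: defers the rehash for a
// departed node until a grace period expires without the node returning.
final class DelayedLeaveRehash {
    private final long graceMillis;
    // Departed node -> time (ms) at which its leave was first observed.
    private final Map<String, Long> pendingLeaves = new LinkedHashMap<>();

    DelayedLeaveRehash(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Record a leave; if the node is already pending, keep the earlier time.
    void onLeave(String node, long nowMillis) {
        pendingLeaves.putIfAbsent(node, nowMillis);
    }

    // Returns true if this join cancelled a pending leave rehash.
    boolean onJoin(String node) {
        return pendingLeaves.remove(node) != null;
    }

    // Removes and returns the nodes whose grace period has expired;
    // the caller would trigger one rehash covering all of them.
    List<String> drainExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pendingLeaves.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= graceMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

The caller passes the current time in, so the 1 minute delay is just a constructor argument (60000 ms). Whether a returning node still needs state pushed to it is a separate question that this sketch does not address.
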
> >> > For total shutdown, I guess we can use other means than rehash,
> >> > e.g. a specific command that would disable it and start flushing
> >> > to the store.
> >> >
> >>
> >> I think just stopping the cache is enough to get it to flush data
> >> to the store with passivation enabled.
> > ATM, wouldn't the shutdown of a cluster of servers trigger a rehash
> > storm?
> >
> 
> Right. I was commenting only on the cache store part, which should be
> completely orthogonal to graceful shutdown.
> 
> I think we need a mechanism to better handle partial cluster
> shutdown, and that mechanism can be used for full shutdown as well.
+1. Shutting down a cluster can be done much more efficiently than stopping individual nodes in sequence.