On 9/28/11 1:48 PM, Dan Berindei wrote:
On Wed, Sep 28, 2011 at 12:59 PM, Bela Ban <bban(a)redhat.com> wrote:
> My 5 cents:
> - Are you clubbing (virtual) view updates and rebalancing together? And
> if so (I should probably read on first...), can't you have view
> installations *without* rebalancing?
>
I'm not sure what benefit you would get from joining the cache view
but not receiving state, compared to not sending the join request at
all. Non-members are allowed to send and receive commands, so in the
future we could even have separate "server" nodes (which join the cache
view) and "client" nodes (which join the JGroups cluster to send
commands but not the cache view, and so hold no data except in
L1).
I had in mind the scenario where 100 members join and only *then* a
state transfer (rebalancing) is done.
My idea was that the cache view represents the caches that are able
to service requests, so it doesn't make sense to include caches that
don't hold data in the view.
OK. So with periodic rebalancing, you'd hold back (virtual) views and state
*until* the trigger fires, which then installs the new virtual views and
rebalances the state? In that case, tying view delivery and rebalancing
together makes sense...
> - Do we need the complex PREPARE_VIEW / ROLLBACK_VIEW / COMMIT_VIEW 2PC
> handling? This adds a lot of complexity. Is it only used when we have a
> transactional cache?
>
Nope, this doesn't have anything to do with transactional caches;
instead, it is all about computing the owner that will push each key
during the rebalance operation.
In order to do it deterministically, we need a common "last
good consistent hash" (CH) from the last rebalance that finished
successfully, and each node must decide whether or not to push a key
based on that last good CH.
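To illustrate why a shared last good CH makes the decision deterministic: if every node applies the same rule to the same committed hash and the same list of surviving members, exactly one node decides to push each key. The Java sketch below is hypothetical (`Ring`, `owners` and `shouldPush` are invented names, not Infinispan's API). It assumes a toy ring ordered by `String.hashCode()` and assumes the pusher is the first owner of the key in the last good CH that is still alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified consistent hash: nodes are placed on a ring by
// hashCode, and a key is owned by the next `numOwners` distinct nodes clockwise.
final class Ring {
    private final TreeMap<Integer, String> positions = new TreeMap<>();
    private final int numOwners;

    Ring(List<String> members, int numOwners) {
        for (String m : members) {
            positions.put(m.hashCode(), m);
        }
        this.numOwners = Math.min(numOwners, positions.size());
    }

    List<String> owners(Object key) {
        List<String> result = new ArrayList<>();
        if (numOwners == 0) return result; // empty ring owns nothing
        Integer pos = positions.ceilingKey(key.hashCode());
        if (pos == null) pos = positions.firstKey(); // wrap around the ring
        while (result.size() < numOwners) {
            result.add(positions.get(pos));
            pos = positions.higherKey(pos);
            if (pos == null) pos = positions.firstKey();
        }
        return result;
    }
}

final class PushDecision {
    // Hypothetical rule: every node evaluates this against the same last good CH
    // and the same set of surviving members, so all nodes agree on one pusher.
    static boolean shouldPush(String self, Object key, Ring lastGoodCh, Set<String> survivors) {
        for (String owner : lastGoodCh.owners(key)) {
            if (survivors.contains(owner)) {
                return owner.equals(self); // first surviving old owner pushes
            }
        }
        return false; // every old owner left: nobody holds the data to push
    }
}
```

The rule never consults the pending CH to choose the pusher, which is what lets nodes agree without extra coordination.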
OK. I just hope this makes sense for large clusters, as it is a 2PC,
which doesn't scale to a large number of nodes. I mean, we don't use
FLUSH in large clusters for the same reason.
Hmm, on the upside, you don't run this algorithm often, so running it
only occasionally may amortize its cost.
With this algorithm, I assume you won't need the transitory view anymore
(UnionConsistentHashFunction or whatever it was called), which includes
both the current and the new owners of a key?
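For comparison, the transitory "union" view mentioned here can be pictured as an owner list containing every node that owns the key in either the old or the new hash. This is a hypothetical sketch (`UnionOwners.unionOwners` is an invented name, and putting old owners first is an assumption); it is not the actual UnionConsistentHashFunction.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class UnionOwners {
    // Hypothetical transitory owner list during a rebalance: all owners from
    // the old hash, followed by any new owners not already present.
    static List<String> unionOwners(List<String> oldOwners, List<String> newOwners) {
        Set<String> union = new LinkedHashSet<>(oldOwners); // keeps order, drops duplicates
        union.addAll(newOwners);
        return new ArrayList<>(union);
    }
}
```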
A rebalance operation can also fail for various reasons (e.g. the
coordinator dies). If that happens, the new owners won't have all the
state, so they should not receive requests for the state that they
would have had in the pending CH.
OK, fair enough
> - State is to be transferred *within* this 2PC time frame. Hmm, again,
> this ties rebalancing and view installation together (see my argument
> above)...
>
If view installation weren't tied to state transfer, we'd have to
keep the last rebalanced view somewhere else. We would hold the
"last pending view" (pending rebalance, that is) in the
CacheViewsManager and the "last rebalanced view" in another component,
and that component would need its own mechanism for synchronizing the
"last rebalanced view" among cache members. So I think the 2PC
approach in CacheViewsManager actually simplifies things.
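On the member side, the PREPARE_VIEW / COMMIT_VIEW / ROLLBACK_VIEW handling discussed in this thread could look roughly like the sketch below. It is hypothetical: `CacheViewState` and its methods are invented for illustration and are not the CacheViewsManager API. The member keeps the last committed ("last rebalanced") view next to a pending one and routes requests by the committed view. When the coordinator decides, the member either promotes the pending view or discards it, so after a failed rebalance nobody is routed to new owners that may be missing state.

```java
import java.util.List;

// Hypothetical per-member state for a two-phase cache view installation.
final class CacheViewState {
    private List<String> committed; // last view whose rebalance finished successfully
    private List<String> pending;   // view being prepared, or null if none

    CacheViewState(List<String> initial) {
        this.committed = List.copyOf(initial);
    }

    // PREPARE_VIEW: remember the proposed view; state transfer runs against it.
    synchronized void prepare(List<String> proposed) {
        if (pending != null) throw new IllegalStateException("a view is already pending");
        pending = List.copyOf(proposed);
    }

    // COMMIT_VIEW: the rebalance succeeded, so the pending view becomes the last good one.
    synchronized void commit() {
        if (pending == null) throw new IllegalStateException("nothing to commit");
        committed = pending;
        pending = null;
    }

    // ROLLBACK_VIEW: the rebalance failed (e.g. the coordinator died); keep the old view.
    synchronized void rollback() {
        pending = null;
    }

    // Requests are routed by the committed view, never by the pending one.
    synchronized List<String> viewForRouting() {
        return committed;
    }

    synchronized boolean hasPending() {
        return pending != null;
    }
}
```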
OK, agreed. I would not like this if it were run on every view
installation, but since we're running it after a cooldown period, or
after having received N JOIN requests or M LEAVE requests, I guess it
should be fine. +1 on simplification
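The trigger described above (a cooldown period, or N JOIN requests, or M LEAVE requests) can be expressed as a small policy object. The sketch is hypothetical. The class name, the parameters and the way the conditions combine are all assumptions, and the thread does not specify them. Here a rebalance fires once changes are pending and any one of the three conditions holds, with the cooldown measured from the first pending request.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical policy deciding when to install a new cache view and rebalance.
final class RebalanceTrigger {
    private final Duration cooldown;
    private final int maxJoins;   // "N" JOIN requests
    private final int maxLeaves;  // "M" LEAVE requests

    private int joins;
    private int leaves;
    private Instant firstPendingRequest; // null when nothing is pending

    RebalanceTrigger(Duration cooldown, int maxJoins, int maxLeaves) {
        this.cooldown = cooldown;
        this.maxJoins = maxJoins;
        this.maxLeaves = maxLeaves;
    }

    void onJoin(Instant now) { joins++; markPending(now); }

    void onLeave(Instant now) { leaves++; markPending(now); }

    private void markPending(Instant now) {
        if (firstPendingRequest == null) firstPendingRequest = now;
    }

    boolean shouldRebalance(Instant now) {
        if (firstPendingRequest == null) return false; // no membership change to apply
        boolean cooledDown = !now.isBefore(firstPendingRequest.plus(cooldown));
        return cooledDown || joins >= maxJoins || leaves >= maxLeaves;
    }

    // Call once the new view has been installed.
    void reset() {
        joins = 0;
        leaves = 0;
        firstPendingRequest = null;
    }
}
```

With a large N, many members can join before a single view installation and rebalance runs, which matches the "join 100 members, then rebalance" scenario raised earlier in the thread.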
--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat