[infinispan-dev] Asymmetric caches and manual rehashing design
Bela Ban
bban at redhat.com
Thu Sep 29 03:01:42 EDT 2011
On 9/28/11 1:48 PM, Dan Berindei wrote:
> On Wed, Sep 28, 2011 at 12:59 PM, Bela Ban <bban at redhat.com> wrote:
>> My 5 cents:
>> - Are you clubbing (virtual) view updates and rebalancing together ? And
>> if so (I should probably read on first...), can't you have view
>> installations *without* rebalancing ?
>>
>
> I'm not sure what benefits you would get from joining the cache view
> but not receiving state compared to not sending the join request at
> all. Non-members are still allowed to send/receive commands, so in the
> future we could even have separate "server" nodes (which join the
> cache view and hold data) and "client" nodes (which join the JGroups
> cluster so they can send commands, but not the cache view, and so hold
> no data except L1).
I had in mind the scenario where 100 members join and only *then* do
you run a single state transfer (rebalancing).
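
To make sure I'm reading the server/client distinction correctly, this
is roughly what I picture (just a sketch; CacheViewsManager and join()
are made-up names here, not actual API):

// A "server" node joins both the JGroups cluster and the cache view,
// so it owns data. A "client" node only connects the JGroups channel
// and never calls join(), so it can send commands but holds nothing
// except L1 entries.
interface CacheViewsManager {
    void join(String cacheName); // ask to be included in the cache view
}

final class ServerNode {
    private final CacheViewsManager viewsManager;

    ServerNode(CacheViewsManager viewsManager) {
        this.viewsManager = viewsManager;
    }

    void start() {
        // joining the JGroups cluster happens elsewhere (channel.connect(...));
        // only nodes that are supposed to hold data also join the cache view
        viewsManager.join("myCache");
    }
}
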
> My idea was that the cache view was a representation of the caches
> that are able to service requests, so it doesn't make sense to include
> in the view caches that don't hold data.
OK. So with periodic rebalancing, you'd hold on to the current
(virtual) views and state *until* the trigger fires, and only then
install the new virtual view and rebalance the state? In that case,
tying view delivery and rebalancing together makes sense...
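
In other words, something along these lines on the coordinator (rough
sketch, the class and the thresholds are invented for illustration):

// Invented trigger policy for periodic/manual rehashing: membership
// changes are accumulated, and a new cache view is installed (plus a
// rebalance run) only when the cooldown expires or enough JOIN/LEAVE
// requests have piled up.
final class RebalanceTriggerPolicy {
    private final long cooldownMillis;
    private final int joinThreshold;
    private final int leaveThreshold;

    private long lastRebalance = System.currentTimeMillis();
    private int pendingJoins;
    private int pendingLeaves;

    RebalanceTriggerPolicy(long cooldownMillis,
                           int joinThreshold, int leaveThreshold) {
        this.cooldownMillis = cooldownMillis;
        this.joinThreshold = joinThreshold;
        this.leaveThreshold = leaveThreshold;
    }

    synchronized void onJoin()  { pendingJoins++; }
    synchronized void onLeave() { pendingLeaves++; }

    // called periodically by the coordinator
    synchronized boolean shouldRebalance() {
        boolean pendingChanges = pendingJoins + pendingLeaves > 0;
        boolean cooldownExpired =
            System.currentTimeMillis() - lastRebalance >= cooldownMillis;
        boolean thresholdHit =
            pendingJoins >= joinThreshold || pendingLeaves >= leaveThreshold;
        if (pendingChanges && (cooldownExpired || thresholdHit)) {
            pendingJoins = 0;
            pendingLeaves = 0;
            lastRebalance = System.currentTimeMillis();
            return true;
        }
        return false;
    }
}
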
>> - Do we need the complex PREPARE_VIEW / ROLLBACK_VIEW / COMMIT_VIEW 2PC
>> handling? This adds a lot of complexity. Is it only used when we have a
>> transactional cache?
>>
>
> Nope, this doesn't have anything to do with transactional caches;
> instead it is all about computing the owner that will push each key
> during the rebalance operation.
>
> In order to do that deterministically we need a common "last good
> consistent hash", i.e. the CH of the last rebalance that finished
> successfully, and each node must decide whether it should push a key
> or not based on that last good CH.
OK. I just hope this makes sense for large clusters, as it is a 2PC,
which doesn't scale to large numbers of nodes. I mean, we don't use
FLUSH in large clusters for the same reason.
Hmm, on the upside, this algorithm doesn't run often, so running it
only a few times should amortize its cost.
With this algorithm, I assume you won't need the transitory view
(UnionConsistentHashFunction or whatever it was called) anymore, which
included both the current and the new owners of a key?
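
As I understand the "last good CH" part, the push decision would then
look roughly like this on every node (sketch only; the ConsistentHash
interface below is a stand-in, not the real one):

import java.util.List;

// Every node evaluates this with the *same* last good CH, so exactly
// one node (the first old owner) ends up pushing each key to the new
// owners.
interface ConsistentHash {
    List<String> locateOwners(Object key); // owner addresses for a key
}

final class PushDecision {
    static boolean shouldPush(Object key, String self,
                              ConsistentHash lastGoodCH,
                              ConsistentHash pendingCH) {
        List<String> oldOwners = lastGoodCH.locateOwners(key);
        List<String> newOwners = pendingCH.locateOwners(key);
        boolean firstOldOwner =
            !oldOwners.isEmpty() && oldOwners.get(0).equals(self);
        boolean ownersChanged = !newOwners.equals(oldOwners);
        return firstOldOwner && ownersChanged;
    }
}
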
> A rebalance operation can also fail for various reasons (e.g. the
> coordinator died). If that happens, the new owners won't have all the
> state, so they should not receive requests for the state they would
> have owned under the pending CH.
OK, fair enough
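
If I got that right, on the member side this boils down to something
like the following (names invented; the point is just that requests
never see the pending CH):

// Reads and writes always use the committed ("last good") CH; the
// pending CH only becomes visible once the rebalance commits, and is
// simply dropped on rollback.
final class CHHolder<CH> {
    private volatile CH committedCH; // last good CH, always safe to serve from
    private volatile CH pendingCH;   // set on PREPARE_VIEW, never used for routing

    CHHolder(CH initial) {
        this.committedCH = initial;
    }

    // requests never hit the new owners early
    CH routingCH() {
        return committedCH;
    }

    // PREPARE_VIEW
    void prepare(CH newCH) {
        this.pendingCH = newCH;
    }

    // COMMIT_VIEW: the pending CH becomes the new last good CH
    void commit() {
        committedCH = pendingCH;
        pendingCH = null;
    }

    // ROLLBACK_VIEW: drop the pending CH, keep serving from the old one
    void rollback() {
        pendingCH = null;
    }
}
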
>> - State is to be transferred *within* this 2PC time frame. Hmm, again,
>> this ties rebalancing and view installation together (see my argument
>> above)...
>>
>
> If view installation wasn't tied to state transfer, then we'd have to
> keep the last rebalanced view somewhere else. We would hold the
> "last pending view" (pending rebalance, that is) in the
> CacheViewsManager and the "last rebalanced view" in another component,
> and that component would need its own mechanism for synchronizing the
> "last rebalanced view" among cache members. So I think the 2PC
> approach in CacheViewsManager actually simplifies things.
OK, agreed. I would not like this if it was run on every view
installation, but since we're running it after a cooldown period, or
after having received N JOIN requests or M LEAVE requests, I guess it
should be fine. +1 on simplification
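
Just to make sure we mean the same thing by the 2PC here, a very rough
coordinator-side sketch (ViewInstaller, Member and the method names are
all invented):

import java.util.List;

final class ViewInstaller {
    interface Member {
        void prepareView(int viewId, List<String> newMembers) throws Exception;
        void commitView(int viewId);
        void rollbackView(int viewId);
    }

    // Runs only after the cooldown / N JOINs / M LEAVEs trigger, so the
    // 2PC cost is amortized over many membership changes. State is
    // pushed by the old owners inside the prepare phase; only a commit
    // makes the new CH the "last good" one.
    static void installView(int viewId, List<String> newMembers,
                            List<Member> members) {
        try {
            for (Member m : members)
                m.prepareView(viewId, newMembers); // state transfer happens here
            for (Member m : members)
                m.commitView(viewId);              // new CH becomes the last good CH
        } catch (Exception e) {
            for (Member m : members)
                m.rollbackView(viewId);            // keep serving from the old CH
        }
    }
}
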
--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat