[infinispan-dev] Asymmetric caches and manual rehashing design
Bela Ban
bban at redhat.com
Thu Sep 29 03:01:42 EDT 2011
On 9/28/11 1:48 PM, Dan Berindei wrote:
> On Wed, Sep 28, 2011 at 12:59 PM, Bela Ban <bban at redhat.com> wrote:
>> My 5 cents:
>> - Are you clubbing (virtual) view updates and rebalancing together ? And
>> if so (I should probably read on first...), can't you have view
>> installations *without* rebalancing ?
>>
>
> I'm not sure what benefits you would get from joining the cache view
> but not receiving state compared to not sending the join request at
> all. Non-members are still allowed to send/receive commands, so in the
> future we could even have separate "server" nodes (which join the
> cache view and hold data) and "client" nodes (which join the JGroups
> cluster so they can send commands, but not the cache view, and so hold
> no data except L1).
I had in mind the scenario where 100 members join and only *then* do
you run a single state transfer (rebalancing).
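
To make sure I'm reading the server/client distinction correctly, this
is roughly what I picture (just a sketch; CacheViewsManager and join()
are made-up names here, not actual API):

// A "server" node joins both the JGroups cluster and the cache view,
// so it owns data. A "client" node only connects the JGroups channel
// and never calls join(), so it can send commands but holds nothing
// except L1 entries.
interface CacheViewsManager {
    void join(String cacheName); // ask to be included in the cache view
}

final class ServerNode {
    private final CacheViewsManager viewsManager;

    ServerNode(CacheViewsManager viewsManager) {
        this.viewsManager = viewsManager;
    }

    void start() {
        // joining the JGroups cluster happens elsewhere (channel.connect(...));
        // only nodes that are supposed to hold data also join the cache view
        viewsManager.join("myCache");
    }
}
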
> My idea was that the cache view was a representation of the caches
> that are able to service requests, so it doesn't make sense to include
> in the view caches that don't hold data.
OK. So with periodic rebalancing, you'd hold on to the current
(virtual) views and state *until* the trigger fires, and only then
install the new virtual view and rebalance the state? In that case,
tying view delivery and rebalancing together makes sense...
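
In other words, something along these lines on the coordinator (rough
sketch, the class and the thresholds are invented for illustration):

// Invented trigger policy for periodic/manual rehashing: membership
// changes are accumulated, and a new cache view is installed (plus a
// rebalance run) only when the cooldown expires or enough JOIN/LEAVE
// requests have piled up.
final class RebalanceTriggerPolicy {
    private final long cooldownMillis;
    private final int joinThreshold;
    private final int leaveThreshold;

    private long lastRebalance = System.currentTimeMillis();
    private int pendingJoins;
    private int pendingLeaves;

    RebalanceTriggerPolicy(long cooldownMillis,
                           int joinThreshold, int leaveThreshold) {
        this.cooldownMillis = cooldownMillis;
        this.joinThreshold = joinThreshold;
        this.leaveThreshold = leaveThreshold;
    }

    synchronized void onJoin()  { pendingJoins++; }
    synchronized void onLeave() { pendingLeaves++; }

    // called periodically by the coordinator
    synchronized boolean shouldRebalance() {
        boolean pendingChanges = pendingJoins + pendingLeaves > 0;
        boolean cooldownExpired =
            System.currentTimeMillis() - lastRebalance >= cooldownMillis;
        boolean thresholdHit =
            pendingJoins >= joinThreshold || pendingLeaves >= leaveThreshold;
        if (pendingChanges && (cooldownExpired || thresholdHit)) {
            pendingJoins = 0;
            pendingLeaves = 0;
            lastRebalance = System.currentTimeMillis();
            return true;
        }
        return false;
    }
}
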
>> - Do we need the complex PREPARE_VIEW / ROLLBACK_VIEW / COMMIT_VIEW 2PC
>> handling? This adds a lot of complexity. Is it only used when we have a
>> transactional cache?
>>
>
> Nope, this doesn't have anything to do with transactional caches;
> instead it is all about computing the owner that will push each key
> during the rebalance operation.
>
> In order to do that deterministically we need a common "last good
> consistent hash", i.e. the CH of the last rebalance that finished
> successfully, and each node must decide whether it should push a key
> or not based on that last good CH.
OK. I just hope this makes sense for large clusters, as it is a 2PC,
which doesn't scale to large numbers of nodes. I mean, we don't use
FLUSH in large clusters for the same reason.
Hmm, on the upside, this algorithm doesn't run often, so running it
only a few times should amortize its cost.
With this algorithm, I assume you won't need the transitory view
(UnionConsistentHashFunction or whatever it was called) anymore, which
included both the current and the new owners of a key?
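
As I understand the "last good CH" part, the push decision would then
look roughly like this on every node (sketch only; the ConsistentHash
interface below is a stand-in, not the real one):

import java.util.List;

// Every node evaluates this with the *same* last good CH, so exactly
// one node (the first old owner) ends up pushing each key to the new
// owners.
interface ConsistentHash {
    List<String> locateOwners(Object key); // owner addresses for a key
}

final class PushDecision {
    static boolean shouldPush(Object key, String self,
                              ConsistentHash lastGoodCH,
                              ConsistentHash pendingCH) {
        List<String> oldOwners = lastGoodCH.locateOwners(key);
        List<String> newOwners = pendingCH.locateOwners(key);
        boolean firstOldOwner =
            !oldOwners.isEmpty() && oldOwners.get(0).equals(self);
        boolean ownersChanged = !newOwners.equals(oldOwners);
        return firstOldOwner && ownersChanged;
    }
}
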
> A rebalance operation can also fail for various reasons (e.g. the
> coordinator died). If that happens, the new owners won't have all the
> state, so they should not receive requests for the state they would
> have owned under the pending CH.
OK, fair enough
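
If I got that right, on the member side this boils down to something
like the following (names invented; the point is just that requests
never see the pending CH):

// Reads and writes always use the committed ("last good") CH; the
// pending CH only becomes visible once the rebalance commits, and is
// simply dropped on rollback.
final class CHHolder<CH> {
    private volatile CH committedCH; // last good CH, always safe to serve from
    private volatile CH pendingCH;   // set on PREPARE_VIEW, never used for routing

    CHHolder(CH initial) {
        this.committedCH = initial;
    }

    // requests never hit the new owners early
    CH routingCH() {
        return committedCH;
    }

    // PREPARE_VIEW
    void prepare(CH newCH) {
        this.pendingCH = newCH;
    }

    // COMMIT_VIEW: the pending CH becomes the new last good CH
    void commit() {
        committedCH = pendingCH;
        pendingCH = null;
    }

    // ROLLBACK_VIEW: drop the pending CH, keep serving from the old one
    void rollback() {
        pendingCH = null;
    }
}
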
>> - State is to be transferred *within* this 2PC time frame. Hmm, again,
>> this ties rebalancing and view installation together (see my argument
>> above)...
>>
>
> If view installation wasn't tied to state transfer, then we'd have to
> keep the last rebalanced view somewhere else. We would hold the
> "last pending view" (pending rebalance, that is) in the
> CacheViewsManager and the "last rebalanced view" in another component,
> and that component would need its own mechanism for synchronizing the
> "last rebalanced view" among cache members. So I think the 2PC
> approach in CacheViewsManager actually simplifies things.
OK, agreed. I would not like this if it was run on every view
installation, but since we're running it after a cooldown period, or
after having received N JOIN requests or M LEAVE requests, I guess it
should be fine. +1 on simplification
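
Just to make sure we mean the same thing by the 2PC here, a very rough
coordinator-side sketch (ViewInstaller, Member and the method names are
all invented):

import java.util.List;

final class ViewInstaller {
    interface Member {
        void prepareView(int viewId, List<String> newMembers) throws Exception;
        void commitView(int viewId);
        void rollbackView(int viewId);
    }

    // Runs only after the cooldown / N JOINs / M LEAVEs trigger, so the
    // 2PC cost is amortized over many membership changes. State is
    // pushed by the old owners inside the prepare phase; only a commit
    // makes the new CH the "last good" one.
    static void installView(int viewId, List<String> newMembers,
                            List<Member> members) {
        try {
            for (Member m : members)
                m.prepareView(viewId, newMembers); // state transfer happens here
            for (Member m : members)
                m.commitView(viewId);              // new CH becomes the last good CH
        } catch (Exception e) {
            for (Member m : members)
                m.rollbackView(viewId);            // keep serving from the old CH
        }
    }
}
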
--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat