[infinispan-dev] Asymmetric caches and manual rehashing design

Dan Berindei dan.berindei at gmail.com
Fri Sep 30 03:42:11 EDT 2011


On Thu, Sep 29, 2011 at 10:01 AM, Bela Ban <bban at redhat.com> wrote:
>
>
> On 9/28/11 1:48 PM, Dan Berindei wrote:
>> On Wed, Sep 28, 2011 at 12:59 PM, Bela Ban<bban at redhat.com>  wrote:
>>> My 5 cents:
>
>>> - Are you clubbing (virtual) view updates and rebalancing together ? And
>>> if so (I should probably read on first...), can't you have view
>>> installations *without* rebalancing ?
>>>
>>
>> I'm not sure what benefits you would get from joining the cache view
>> but not receiving state compared to not sending the join request at
>> all. Non-members are allowed to send/receive commands, so in the
>> future we could even have separate "server" nodes (that join the cache
>> view) and "client" nodes (joining the JGroups cluster to send
>> commands, but not the cache view and so not holding any data except
>> L1).
>
>
> I had the scenario in mind where you join 100 members and only *then* do
> a state transfer (rebalancing).
>
>
>
>> My idea was that the cache view was a representation of the caches
>> that are able to service requests, so it doesn't make sense to include
>> in the view caches that don't hold data.
>
>
> OK. So with periodic rebalancing, you'd hold (virtual) views and state
> *until* the trigger fires, which then installs the new virtual views and
> rebalances the state ? In this case, tying view delivery and rebalancing
> together makes sense...
>

Yes, I'm installing the views only when the trigger fires, and
installing a view also includes transferring state.
So the 100 members might ask the coordinator to join the virtual view,
but they will only actually join (and receive state) when the new view
that includes them is installed.
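
To make the deferred join concrete, here is a rough sketch of the idea.
It is not Infinispan code: the class DeferredJoinCoordinator and its
method names are made up for illustration, and node addresses are plain
strings. Join requests are only queued, and the member list changes
when the trigger fires.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the coordinator queues join requests and only
// folds them into the virtual view when the rebalance trigger fires.
class DeferredJoinCoordinator {
    private int viewId = 0;
    private List<String> members;  // currently installed virtual view
    private final Set<String> pendingJoiners = new LinkedHashSet<>();

    DeferredJoinCoordinator(List<String> initialMembers) {
        this.members = new ArrayList<>(initialMembers);
    }

    // A join request is only recorded; the node is not a member yet.
    synchronized void requestJoin(String node) {
        if (!members.contains(node)) {
            pendingJoiners.add(node);
        }
    }

    // Called when the trigger fires: all queued joiners enter the view
    // together. A real implementation would transfer state as part of
    // this step; the sketch only updates the member list.
    synchronized List<String> triggerFired() {
        if (!pendingJoiners.isEmpty()) {
            List<String> newMembers = new ArrayList<>(members);
            newMembers.addAll(pendingJoiners);
            pendingJoiners.clear();
            members = newMembers;
            viewId++;
        }
        return List.copyOf(members);
    }

    synchronized List<String> currentView() {
        return List.copyOf(members);
    }

    synchronized int viewId() {
        return viewId;
    }
}
```

With this shape, 100 join requests between two triggers result in a
single new view and a single round of state transfer.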


>
>>> - Do we need the complex PREPARE_VIEW / ROLLBACK_VIEW / COMMIT_VIEW 2PC
>>> handling ? This adds a lot of complexity. Is it only used when we have a
>>> transactional cache ?
>>>
>>
>> Nope, this doesn't have anything to do with transactional caches,
>> instead it is all about computing the owner that will push the key
>> during the rebalance operation.
>>
>> In order to do it deterministically we need to have a common "last
>> good consistent hash" for the last rebalance that finished
>> successfully, and each node must determine if it should push a key or
>> not based on that last good CH.
>
>
> OK. I just hope this makes sense for large clusters, as it is a 2PC,
> which doesn't scale to a larger number of nodes. I mean, we don't use
> FLUSH in large clusters for the same reason.
> Hmm, on the upside, you don't run this algorithm a lot though, so maybe
> running it only a few times amortizes the cost of it.
>

Yeah, it is kind of bad with large clusters, at least with the
blocking state transfer we have now. But in order to ensure we don't
lose data we need to serialize the rebalance operations, and I don't
think we can do that without 2PC.
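
Roughly, the serialization I have in mind looks like the sketch below.
Only the PREPARE_VIEW / COMMIT_VIEW / ROLLBACK_VIEW phase names come
from the design; the classes ViewMember and ViewCoordinator, the
canPrepare flag and the direct method calls (instead of RPCs) are
assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical member-side state: one committed view and at most one
// pending (prepared but not yet committed) view.
class ViewMember {
    private List<String> committedView;
    private List<String> pendingView;  // null when nothing is prepared
    private final boolean canPrepare;  // stand-in for any reason a node cannot ack

    ViewMember(List<String> initialView, boolean canPrepare) {
        this.committedView = initialView;
        this.canPrepare = canPrepare;
    }

    // PREPARE_VIEW: refuse if another view is already pending, which is
    // what serializes the rebalance operations.
    boolean prepareView(List<String> newView) {
        if (!canPrepare || pendingView != null) {
            return false;
        }
        pendingView = newView;
        return true;
    }

    // COMMIT_VIEW: the pending view becomes the "last good" view.
    void commitView() {
        if (pendingView != null) {
            committedView = pendingView;
            pendingView = null;
        }
    }

    // ROLLBACK_VIEW: drop the pending view and keep the committed one.
    void rollbackView() {
        pendingView = null;
    }

    List<String> committedView() {
        return committedView;
    }

    boolean hasPendingView() {
        return pendingView != null;
    }
}

class ViewCoordinator {
    // Returns true if every member prepared and the view was committed.
    static boolean installView(List<ViewMember> members, List<String> newView) {
        List<ViewMember> prepared = new ArrayList<>();
        for (ViewMember m : members) {
            if (m.prepareView(newView)) {
                prepared.add(m);
            } else {
                // Roll back only the members that accepted the prepare.
                for (ViewMember p : prepared) {
                    p.rollbackView();
                }
                return false;
            }
        }
        for (ViewMember m : members) {
            m.commitView();
        }
        return true;
    }
}
```

After a failed prepare every node still has the same committed view, so
they all compute the pushing owner from the same last good CH.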

If we switch to the non-blocking state transfer proposal it shouldn't
matter any more: the installation of the view will be delayed until we
get consensus, but transactions will still be able to proceed while
a view is in the prepare phase.


> With this algorithm, I assume you won't need the transitory view anymore
> (UnionConsistentHashFunction or whatever it was called), which includes
> both current and new owners of a key ?
>

We are going to need something similar: since we're waiting for the
coordinator to tell us to install a new CH, our CH may contain nodes
that are already stopped. So we will have to check the addresses
returned by the CH against the list of nodes still running.
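
The check itself would be something like the following sketch. The
class and method names are invented for the example, and addresses are
plain strings rather than JGroups addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: drop owners that the CH still lists but that
// are no longer running, keeping the CH's owner order.
class LiveOwnerFilter {
    static List<String> liveOwners(List<String> chOwners, Set<String> runningNodes) {
        List<String> result = new ArrayList<>();
        for (String owner : chOwners) {
            if (runningNodes.contains(owner)) {
                result.add(owner);
            }
        }
        return result;
    }
}
```

Note that the result can be empty if every owner in the stale CH has
already stopped, so callers would need to handle that case.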

But other than that we are just going to use the CH of the latest
committed view (with blocking state transfer) or the CH of the latest
pending view (with non-blocking state transfer). With blocking state
transfer we just know that the old owners will hold the key until the
new view is committed. The non-blocking state transfer proposal works
by letting the new owners fetch the key from the old owners, so there
is no need for the callers to fall back to the previous owners.
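
As a sketch of that selection rule (all names here are hypothetical,
and the CH type is left generic):

```java
// Hypothetical selector for the CH that callers use to locate owners.
class LookupChSelector {
    enum StateTransferMode { BLOCKING, NON_BLOCKING }

    // Blocking state transfer: always the latest committed CH.
    // Non-blocking state transfer: the latest pending CH if there is
    // one, otherwise the committed CH.
    static <T> T chForLookups(StateTransferMode mode, T committedCh, T pendingCh) {
        if (mode == StateTransferMode.NON_BLOCKING && pendingCh != null) {
            return pendingCh;
        }
        return committedCh;
    }
}
```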

Cheers
Dan


