[infinispan-dev] Non-blocking state transfer (ISPN-1424)

Dan Berindei dan.berindei at gmail.com
Thu Mar 22 06:33:36 EDT 2012


On Tue, Mar 20, 2012 at 2:51 PM, Mircea Markus <mircea.markus at jboss.com> wrote:
> Hi Dan,
>
> Some notes:
>
> Ownership Information:
> - I remember the discussion with Sanne about an algorithm that wouldn't
> require view prepare/rollback, but I think it would be very interesting to
> see it described in detail somewhere, as all the points you raised in the
> document are very closely tied to it

Yes, actually I'm trying to describe precisely that. Of course, there
are some complications...

> - "However, it's not that clear what to do when a node leaves while the
> cache is in "steady state"" --> if (numOwners-1) nodes crash a state
> transfer is (most likely) wanted in order not to loose consistency in the
> eventuality another node goes down. By default numOwners is 2, so in the
> first release, can't we just assume that for every leave we'd want
> immediately issue a state transfer? I think this would cover most of our
> user's needs and simplify the problem considerably.
>

Not necessarily: if the user has a TopologyAwareConsistentHash and has
split his cluster in two "sites" or "racks", he can bring down an
entire site/rack before rebalancing, without the risk of losing any
data.
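
For illustration, here's a rough sketch of the kind of setup I mean. The
class name, the site/rack ids and the exact builder methods are assumptions
on my part (based on the 5.x programmatic configuration API), so treat it as
a sketch rather than exact code:

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class TopologyAwareSetup {
   public static void main(String[] args) {
      // Tag this node with its site/rack/machine so the topology-aware CH
      // places the numOwners copies of each key on different sites/racks.
      GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
      global.transport()
            .siteId("site-a")
            .rackId("rack-1")
            .machineId("node-1");

      ConfigurationBuilder cache = new ConfigurationBuilder();
      cache.clustering()
           .cacheMode(CacheMode.DIST_SYNC)
           .hash().numOwners(2);

      DefaultCacheManager cm = new DefaultCacheManager(global.build(), cache.build());
      // With the owners spread across sites, shutting down all of "site-b"
      // before rebalancing doesn't lose any data.
      cm.getCache();
   }
}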

> Recovery
> - I agree with your point re: recovery. This shouldn't be considered high
> prio in the first release. The recovery info is kept in an independent
> cache, which allows a lot of flexibility: e.g. can point to a shared cache
> store so that recovery caches on other nodes can read that info when needed.
>

I don't agree about the shared cache store, because then every cache
transaction would have to write to that store. I think that would be
cost-prohibitive.

>
> Locking/sync..
> "The state transfer task will acquire the DCL in exclusive mode after
> updating the cache view id and release it after obtaining the data container
> snapshot. This will ensure the state transfer task can't proceed on the old
> owner until all write commands that knew about the old CH have finished
> executing." <-- that means that incoming writes would block for the duration
> of iterating the DataContainer? That shouldn't be too bad, unless a cache
> store is present..
>

OK, first of all, in the new draft I merged the data container lock and
the cache view lock. I don't think keeping them separate actually helped
with anything, but it did make things harder to explain.

The idea was that incoming writes would only block while the state
transfer task obtains the iterator over the data container, which is a
very short window. But the state transfer task will also block until all
in-flight writes to the data container have finished, to make sure we
don't miss any writes.

The algorithm is like this:

Write command
===
1. acquire read lock
2. check cache view id
3. write entry to data container
4. release read lock

State Transfer
===
1. acquire write lock
2. update CH
3. update cache view id
4. obtain data container snapshot (iterator)
5. release write lock
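
To make the interplay concrete, here's a toy sketch in Java. The class and
field names are mine, not the actual Infinispan code; it just illustrates the
read/write-lock choreography above:

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy sketch, not the actual implementation: write commands hold the shared
// lock while they touch the data container, and the state transfer task holds
// the exclusive lock only while it installs the new view id and grabs the
// snapshot iterator.
public class CacheViewLockSketch {
   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
   private final ConcurrentHashMap<Object, Object> dataContainer =
         new ConcurrentHashMap<Object, Object>();
   private volatile int cacheViewId;

   // Write command path
   public void write(Object key, Object value, int commandViewId) {
      lock.readLock().lock();
      try {
         if (commandViewId != cacheViewId) {
            // the topology changed since the command was routed; the caller
            // would forward/retry against the new owners
            throw new IllegalStateException("Cache view changed, retry");
         }
         dataContainer.put(key, value);
      } finally {
         lock.readLock().unlock();
      }
   }

   // State transfer path: new writes block only for the duration of this method.
   public Iterator<Map.Entry<Object, Object>> startStateTransfer(int newViewId) {
      lock.writeLock().lock();
      try {
         // update CH elided; update cache view id
         cacheViewId = newViewId;
         // obtain the data container snapshot while no writes are in flight
         return dataContainer.entrySet().iterator();
      } finally {
         lock.writeLock().unlock();
      }
   }
}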

> "CacheViewInterceptor [..] also needs to block in the interval between a
> node noticing a leave and actually receiving the new cache view from the
> coordinator" <-- why can't the local cache star computing the new CH
> independently and not wait for the coordinator..?
>

Because the local cache doesn't listen for join and leave requests,
only the coordinator does, and the cache view is computed based on the
requests that the coordinator has seen (and not on the JGroups cluster
joins and leaves).
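
To make that concrete, here's a toy sketch of the coordinator-side
bookkeeping (all names are made up, this isn't the real CacheViewsManager
code): the members of the next cache view are whoever has sent a join request
and not yet a leave request, and only the coordinator has that information.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy sketch: only the coordinator sees the explicit join/leave requests,
// so only it can compute the next cache view.
public class CoordinatorViewSketch {
   private final Set<String> pendingMembers = new LinkedHashSet<String>();
   private int lastViewId;

   public synchronized void onJoinRequest(String node) {
      pendingMembers.add(node);
   }

   public synchronized void onLeaveRequest(String node) {
      pendingMembers.remove(node);
   }

   // The coordinator computes the new view from the requests it has seen,
   // not from the JGroups cluster view, and then pushes it to the members.
   public synchronized List<String> computeNextView() {
      lastViewId++;
      return new ArrayList<String>(pendingMembers);
   }
}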

Cheers
Dan


