[infinispan-dev] ISPN-30: DIST cache mode
Bela Ban
bban at redhat.com
Thu Apr 16 03:02:04 EDT 2009
Thinking more about the wait time, this might be problematic:
* V1={A,B,C,D}
* K maps to C and D, repl-count=2
* V2={A,B,C}
* V3={A,B}
* If C and D crashed (say within 10 seconds), then we need to
rebalance K immediately upon receiving V2 because repl-count < 2
Why not handle views immediately and rebalance elements ? We can always
think of optimizations later...
Rebalancing should occur a lot when starting the cluster, but at this
time, we don't have many elements in the cache anyway. During operation,
views should be infrequent. And, if we pick a good consistent hashing
algorithm, rehashing should only affect 1/N of all elements anyway.
And remember: at the JGroups level, we can bundle views (= process
multiple LEAVE or JOIN requests together and generate only 1 view) with
GMS.view_bundling, this might help reducing churn without any code
changes, at least for the moment.
Manik Surtani wrote:
>
> On 15 Apr 2009, at 13:49, Bela Ban wrote:
>
>> Looks good. A few suggestions though:
>>
>> * I'd move the old part (Data Partitioning) to a separate wiki and
>> reference it rather than including it
>
> This is the old wiki. :-) It is the same thing, I have just updated
> it and this is why I don't want to remove the old stuff in case people
> have bookmarked it or are still referencing it. Lets see though, if
> it gets too cluttered I'll archive it somewhere.
>
>> * The wait time for rehashing is problematic
>> o On the good side, we don't have too much rehashing, but...
>> o Clients using the old and new view might get stale data when
>> we've just rehashed but the client fetched the data from the
>> old view
>
> The way I see it a union of views will be used.
>
> E.g.,
>
> V1: {A, B, C, D}
> V2: {A, B, C, D, E}
>
> K maps to {A, D} in V1 and {D, E} in V2.
>
> The moment the view change is detected, a timer is started that runs
> for rehashWait millis. During this period, all puts, gets, etc. go to
> a union of where keys are hashed (minus dead hosts), e.g., in this
> case, {A, D, E}. This way these 3 nodes are kept in sync for all new
> puts. Could be more than 2 views by this stage.
>
> The only tricky situation is dealing with a nonexistent value. (How
> do we know whether K is null, or whether it just hasn't been moved
> over to the new owner yet?). To deal with this, the only thing I can
> think of is to use a RspFilter and "double-check" all nulls by waiting
> for others in the replication group to respond as well.
>
> Now at what point to we switch over to the new view completely?
> Rather than broadcast a "new view installed" message or something of
> the sort, I reckon we just use the timeout. From the time the new
> view was received, everyone starts a timer (they all have to rehash
> anyway) and at the end of this, we stop using a combined view and
> switch to a single, latest view.
>
> When dealing with the race in switching to a single view, I suggest a
> new response type (NOT_OWNER?) which a node can reply with, causing
> the requestor to wait for the next response. Similar to receiving a
> null in the case above.
>
>> o In general, I like the idea of 'bundling' views, to avoid
>> potentially nenecessary rehashing
>> o In case of a GET during a rehashing, maybe simply execute a
>> multicast and pick the response with the highest view ID... ?
>
> A possible fallback for when a null response is seen. Is view id
> available in a RspFilter? Since this is where we decide whether to
> accept the answer or wait for a better one.
>
> Cheers
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat
More information about the infinispan-dev
mailing list