[infinispan-dev] ISPN-30: DIST cache mode

Thu Apr 16 03:02:04 EDT 2009

Thinking more about the wait time, this might be problematic:

    * V1={A,B,C,D}
    * K maps to C and D, repl-count=2
    * V2={A,B,C}
    * V3={A,B}
    * If C and D crashed (say within 10 seconds), then we need to
      rebalance K immediately upon receiving V2, because K then has
      fewer live copies than repl-count (2). If we waited until V3,
      both copies would be gone
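
To make the scenario concrete, here is a minimal Java sketch of the check I have in mind. It is only an illustration: ReplicaCheckSketch, liveOwners and needsRebalance are made-up names, not existing Infinispan or JGroups API, and members are plain strings rather than real addresses.

```java
import java.util.List;
import java.util.Set;

final class ReplicaCheckSketch {

    // Number of owners of a key that are still members of the given view.
    static long liveOwners(List<String> owners, Set<String> view) {
        return owners.stream().filter(view::contains).count();
    }

    // True when the key has fewer live copies than the configured repl-count.
    static boolean needsRebalance(List<String> owners, Set<String> view, int replCount) {
        return liveOwners(owners, view) < replCount;
    }
}
```

With owners {C, D} and repl-count=2, the check is false for V1, true for V2 (one live copy left), and by V3 no live copy remains, which is why waiting is risky.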

Why not handle views immediately and rebalance elements? We can always 
think of optimizations later...

Rebalancing should occur a lot when starting the cluster, but at that 
time we don't have many elements in the cache anyway. During operation, 
view changes should be infrequent. And, if we pick a good consistent 
hashing algorithm, rehashing should only affect about 1/N of all 
elements anyway (N = number of nodes).
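
To illustrate the 1/N point, here is a rough Java sketch of a consistent hash ring with virtual nodes. It is not the hashing Infinispan uses or will use. HashRingSketch, the 200 virtual nodes per member and the mixing function are all assumptions picked for the example.

```java
import java.util.List;
import java.util.TreeMap;

final class HashRingSketch {
    // Assumed number of ring positions per member; more positions even out the load.
    private static final int VIRTUAL_NODES = 200;

    private final TreeMap<Integer, String> ring = new TreeMap<>();

    HashRingSketch(List<String> nodes) {
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(mix((node + "#" + i).hashCode()), node);
            }
        }
    }

    // Murmur3-style finalizer, used here only to spread String.hashCode() values.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // The owner is the first ring position at or after the key's hash, wrapping around.
    String ownerOf(String key) {
        Integer slot = ring.ceilingKey(mix(key.hashCode()));
        return ring.get(slot != null ? slot : ring.firstKey());
    }

    // Fraction of the sample keys "key-0".."key-(n-1)" whose owner differs between two views.
    static double movedFraction(List<String> before, List<String> after, int keys) {
        HashRingSketch oldRing = new HashRingSketch(before);
        HashRingSketch newRing = new HashRingSketch(after);
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            String key = "key-" + i;
            if (!oldRing.ownerOf(key).equals(newRing.ownerOf(key))) {
                moved++;
            }
        }
        return (double) moved / keys;
    }
}
```

Calling movedFraction with {A,B,C,D} and {A,B,C,D,E} should give a value in the neighbourhood of 1/5, and every key that moves lands on the joiner E. Keys owned by the other members stay where they are.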

And remember: at the JGroups level, we can bundle views (= process 
multiple LEAVE or JOIN requests together and generate only 1 view) with 
GMS.view_bundling. This might help reduce churn without any code 
changes, at least for the moment.

Manik Surtani wrote:
>
> On 15 Apr 2009, at 13:49, Bela Ban wrote:
>
>> Looks good. A few suggestions though:
>>
>>   * I'd move the old part (Data Partitioning) to a separate wiki and
>>     reference it rather than including it
>
> This is the old wiki.  :-)  It is the same thing, I have just updated 
> it and this is why I don't want to remove the old stuff in case people 
> have bookmarked it or are still referencing it.  Let's see though, if 
> it gets too cluttered I'll archive it somewhere.
>
>>   * The wait time for rehashing is problematic
>>         o On the good side, we don't have too much rehashing, but...
>>         o Clients using the old and new view might get stale data when
>>           we've just rehashed but the client fetched the data from the
>>           old view
>
> The way I see it a union of views will be used.
>
> E.g.,
>
> V1: {A, B, C, D}
> V2: {A, B, C, D, E}
>
> K maps to {A, D} in V1 and {D, E} in V2.
>
> The moment the view change is detected, a timer is started that runs 
> for rehashWait millis.  During this period, all puts, gets, etc. go to 
> a union of where keys are hashed (minus dead hosts), e.g., in this 
> case, {A, D, E}.  This way these 3 nodes are kept in sync for all new 
> puts.  Could be more than 2 views by this stage.
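
A minimal Java sketch of the union-of-owners routing described above, as I understand it. UnionOfOwnersSketch and targets are hypothetical names, the per-view owner lists are passed in rather than computed by a real hash function, and "dead hosts" is taken to mean members missing from the latest view.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class UnionOfOwnersSketch {

    // Nodes an operation on a key should go to while rehashing is in progress:
    // the union of the key's owners in every pending view, minus dead hosts.
    static Set<String> targets(List<List<String>> ownersPerView, Set<String> latestMembers) {
        Set<String> union = new LinkedHashSet<>();
        for (List<String> owners : ownersPerView) {
            union.addAll(owners);
        }
        union.retainAll(latestMembers); // drop members that are no longer in the latest view
        return union;
    }
}
```

For the example above, owners {A, D} in V1 and {D, E} in V2 with all five members alive give {A, D, E}. Because the method takes a list of owner lists, more than two pending views need no special handling.
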
>
> The only tricky situation is dealing with a nonexistent value.  (How 
> do we know whether K is null, or whether it just hasn't been moved 
> over to the new owner yet?).  To deal with this, the only thing I can 
> think of is to use a RspFilter and "double-check" all nulls by waiting 
> for others in the replication group to respond as well.
>
> Now at what point do we switch over to the new view completely?  
> Rather than broadcast a "new view installed" message or something of 
> the sort, I reckon we just use the timeout.  From the time the new 
> view was received, everyone starts a timer (they all have to rehash 
> anyway) and at the end of this, we stop using a combined view and 
> switch to a single, latest view.
>
> When dealing with the race in switching to a single view, I suggest a 
> new response type (NOT_OWNER?) which a node can reply with, causing 
> the requestor to wait for the next response.  Similar to receiving a 
> null in the case above.
>
>>         o In general, I like the idea of 'bundling' views, to avoid
>>           potentially unnecessary rehashing
>>         o In case of a GET during a rehashing, maybe simply execute a
>>           multicast and pick the response with the highest view ID... ?
>
> A possible fallback for when a null response is seen.  Is view id 
> available in a RspFilter?  Since this is where we decide whether to 
> accept the answer or wait for a better one.
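
Here is a rough Java sketch of the selection logic discussed above, deliberately written without any JGroups types. It assumes each reply carries the responder's view ID, which is exactly the open question. Reply, its fields and pick are hypothetical names. Nulls and NOT_OWNER replies are treated as "keep waiting", and among the remaining replies the one with the highest view ID wins.

```java
import java.util.List;
import java.util.Optional;

final class ResponsePickerSketch {

    /** Hypothetical reply shape: a null value means "I have nothing for K (yet)". */
    static final class Reply {
        final String sender;
        final String value;
        final long viewId;
        final boolean notOwner;

        Reply(String sender, String value, long viewId, boolean notOwner) {
            this.sender = sender;
            this.value = value;
            this.viewId = viewId;
            this.notOwner = notOwner;
        }
    }

    // Skips null and NOT_OWNER replies and returns the usable reply with the highest view ID.
    // An empty result means every responder was unable to answer.
    static Optional<Reply> pick(List<Reply> replies) {
        Reply best = null;
        for (Reply r : replies) {
            if (r.notOwner || r.value == null) {
                continue; // not a usable answer, wait for the next response
            }
            if (best == null || r.viewId > best.viewId) {
                best = r;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The same decision could presumably live in an RspFilter, accepting the first non-null answer from an owner instead of collecting all replies first. The sketch collects everything only to keep the logic easy to read.
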
>
> Cheers
> -- 
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat