Re: [infinispan-dev] Re: Distributed hashing - heterogenous nodes

Tuesday, 7 July 2009

Hi Alex

Yeah the DIST code is pretty fledgling and a lot of it does need to be  
sorted out.  I hope to spend a lot of time on it this week.

Thanks for your willing to contribute a better CH impl.  I agree with  
your points below.  Perhaps you could directly email me a patch which  
I can look at?

Cheers
Manik

On 23 Jun 2009, at 19:47, Alex Kluge wrote:

...

 Hi,

   I have been working on a use of Jboss cache, which has a lot of  
 overlap with the Infinispan project. I expect to be able to employ  
 significant parts of this work in Infinispan as well. One point of  
 overlap is the use of a consistent hash.

 I have looked at the  
 org.infinispan.distribution.DefaultConsistentHash, and this is  
 actually a simple hash, and not a consistent hash. Luckily I have a  
 version of a consistent hash that can almost be dropped in here.  
 There are a number of properties of a consistent hash that make it  
 valuable for a distributed hash table.

  - If a server is removed, the number of keys that shift to a  
 different
    bin (different server) is minimal.
  - The same key is always mapped to the same server.
  - If a server is added, the number of keys that shift is minimal.

 The current DefaultConsistentHash doesn't deliver on these. I hope  
 you don't mind if I go into some details here.

   For example, the hash is sensitive to the order in which servers  
 appear in the initial collection of caches. If one cache is built  
 with a list of servers (S1, S2, S3), and another is built with a  
 list (S3, S2, S1), keys will be mapped to different servers, even  
 though the set of servers is actually the same.

 If one server is removed, many, or even all, keys will be shifted.  
 For example one hash with the set of servers (S1, S2, S3) will map  
 many keys to different servers than one with (S2, S3). In a true  
 consistent hash, the keys originally mapped to S2 will remain mapped  
 to S2, and those mapped to S3 will remain mapped to S3. The keys  
 that were mapped to S1 will (depending on the exact implementation)  
 will be divided between S1 and S2.

  There are a few differences, specifically, I work with arrays  
 rather than collections – in part for performance, I also support  
 weights for the servers, and the replication count is an instance  
 variable rather than an argument to the locate method. How wedded  
 are you to supplying the replication count as part of the locate  
 method? Other than this, it looks like an adaptation of my  
 implementation to Infinispan would be fairly painless, and I suggest  
 replacing the current implementation with it.

                                        Alex Kluge

 _______________________________________________
 infinispan-dev mailing list
 infinispan-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev 
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [infinispan-dev] Re: Distributed hashing - heterogenous nodes