Hi Alex
Yeah the DIST code is pretty fledgling and a lot of it does need to be
sorted out. I hope to spend a lot of time on it this week.
Thanks for your willing to contribute a better CH impl. I agree with
your points below. Perhaps you could directly email me a patch which
I can look at?
Cheers
Manik
On 23 Jun 2009, at 19:47, Alex Kluge wrote:
Hi,
I have been working on a use of Jboss cache, which has a lot of
overlap with the Infinispan project. I expect to be able to employ
significant parts of this work in Infinispan as well. One point of
overlap is the use of a consistent hash.
I have looked at the
org.infinispan.distribution.DefaultConsistentHash, and this is
actually a simple hash, and not a consistent hash. Luckily I have a
version of a consistent hash that can almost be dropped in here.
There are a number of properties of a consistent hash that make it
valuable for a distributed hash table.
- If a server is removed, the number of keys that shift to a
different
bin (different server) is minimal.
- The same key is always mapped to the same server.
- If a server is added, the number of keys that shift is minimal.
The current DefaultConsistentHash doesn't deliver on these. I hope
you don't mind if I go into some details here.
For example, the hash is sensitive to the order in which servers
appear in the initial collection of caches. If one cache is built
with a list of servers (S1, S2, S3), and another is built with a
list (S3, S2, S1), keys will be mapped to different servers, even
though the set of servers is actually the same.
If one server is removed, many, or even all, keys will be shifted.
For example one hash with the set of servers (S1, S2, S3) will map
many keys to different servers than one with (S2, S3). In a true
consistent hash, the keys originally mapped to S2 will remain mapped
to S2, and those mapped to S3 will remain mapped to S3. The keys
that were mapped to S1 will (depending on the exact implementation)
will be divided between S1 and S2.
There are a few differences, specifically, I work with arrays
rather than collections – in part for performance, I also support
weights for the servers, and the replication count is an instance
variable rather than an argument to the locate method. How wedded
are you to supplying the replication count as part of the locate
method? Other than this, it looks like an adaptation of my
implementation to Infinispan would be fairly painless, and I suggest
replacing the current implementation with it.
Alex Kluge
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev