On 22 Mar 2010, at 11:16, Galder Zamarreno wrote:
See below:
On Thu, 18 Mar 2010 22:26:44 +0100, Alex Kluge <java_kluge(a)yahoo.com> wrote:
> </snip>
>
>> Firstly, this is hard when consumed by non-Java clients as you'd need to
>> implement the way the JDK calculates the hash code of a byte array.
>
> This is a much easier problem if you don't use the built in hashing. There are
> a number of hash algorithms that can be used, including FNV 1
> (http://www.isthe.com/chongo/tech/comp/fnv/) and others
> (http://www.azillionmonkeys.com/qed/hash.html).
>
> These are implementable in other languages, are fast, and provide good
> distributions of results. We can use, and similarly document, an integer hash
> used to further spread the hash values if needed. If these are chosen to be
> implementable in multiple languages, clearly documented, and don't change too
> often, it should be reasonable to put them into the client.
>
> Using this approach removes the dependency on the hash that the VM happens to
> be using. Indeed, the hash for a byte array may simply be the address of the
> array, which makes it very poor for our use.
This is a very good point. This could even be done internally. If you're
starting Infinispan normally, use the default CHA. If you're starting a Hot
Rod server, the implementation could inject one of these CHAs, which are
clearly documented and easy to implement in other languages. That way
you'd get the best of both worlds: you don't need to expose your internal
details for normal Infinispan, and we have a robust, stable algorithm that is
easy to implement in other languages.
Oh no, it shouldn't even be as complex as that. What I suggest is this:
DefaultConsistentHash currently uses a Wang/Jenkins hash as a bit-spreader on
key.hashCode() and address.hashCode().
This is poor, since the result is JVM dependent. Secondly (and for a different reason), the W/J
hash isn't providing us with adequate spread, as we've found out.
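To make the JVM dependence concrete, here is a small sketch (the class and method names are made up for illustration). A Java array does not override hashCode(), so key.hashCode() on a byte[] is the identity hash of that particular array instance, whereas Arrays.hashCode(byte[]) is computed from the contents. Arrays.hashCode is used here only as a contrast, not as a suggested replacement.

```java
import java.util.Arrays;

final class ByteArrayHashDemo {
    /** Content-based hash: equal byte sequences always give the same value. */
    static int contentHash(byte[] key) {
        return Arrays.hashCode(key);
    }

    /**
     * What key.hashCode() returns for an array: the identity hash inherited
     * from Object. It depends on the array instance and the running JVM,
     * not on the bytes, so a remote client cannot reproduce it.
     */
    static int identityHash(byte[] key) {
        return key.hashCode();
    }
}
```

Two arrays holding the same bytes get the same contentHash but, in general, unrelated identityHash values, which is why the latter is useless for locating a key from another process.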
So step 1 is to identify a hash with better spread. Some have been suggested on this thread;
preferably we'd pick one that can operate directly on byte[]s and eliminate the need for a bit
spreader.
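As an illustration of how small such a function can be, here is a minimal sketch of 32-bit FNV-1, one of the candidates mentioned earlier in the thread, operating directly on a byte[]. The class name is hypothetical, and nothing here implies FNV-1 is the function that will be chosen.

```java
final class Fnv1 {
    private static final int OFFSET_BASIS = 0x811c9dc5; // 32-bit FNV offset basis
    private static final int PRIME = 0x01000193;        // 32-bit FNV prime

    /** 32-bit FNV-1: multiply by the prime, then XOR in the next byte. */
    static int hash32(byte[] data) {
        int hash = OFFSET_BASIS;
        for (byte b : data) {
            hash *= PRIME;        // int overflow wraps, i.e. arithmetic mod 2^32
            hash ^= (b & 0xff);   // treat the byte as unsigned
        }
        return hash;
    }
}
```

The same loop is a few lines in C, Python or Ruby, which is the property that matters for non-Java clients.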
The next step would be to change DefaultConsistentHash to:
* For addresses, use ${HASH_FUNCTION} on address.hashCode()
* For keys which are byte[]s, use ${HASH_FUNCTION} directly on the byte[] (this would
directly benefit use via HotRod)
* For keys which are Strings, use ${HASH_FUNCTION} directly on the String (this is an
optimisation)
* For keys which are Objects, use ${HASH_FUNCTION} on object.hashCode() (for in-VM use)
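The dispatch described in the list above could be sketched roughly as follows. This is not the DefaultConsistentHash code: KeyHasher is a hypothetical class, ${HASH_FUNCTION} is passed in as a plain function over byte[], and two details the list does not specify are assumed here, namely that Strings are hashed over their UTF-8 bytes and that an int hash code is fed to the function as 4 big-endian bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.ToIntFunction;

final class KeyHasher {
    private final ToIntFunction<byte[]> hashFunction; // stand-in for ${HASH_FUNCTION}

    KeyHasher(ToIntFunction<byte[]> hashFunction) {
        this.hashFunction = hashFunction;
    }

    /** Assumption: an int hash code is hashed as 4 big-endian bytes. */
    private static byte[] intToBytes(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    int hashKey(Object key) {
        if (key instanceof byte[]) {
            // HotRod case: hash the raw bytes directly.
            return hashFunction.applyAsInt((byte[]) key);
        }
        if (key instanceof String) {
            // Assumption: UTF-8 encoding of the String.
            return hashFunction.applyAsInt(((String) key).getBytes(StandardCharsets.UTF_8));
        }
        // In-VM case: fall back to the object's own hashCode().
        return hashFunction.applyAsInt(intToBytes(key.hashCode()));
    }

    int hashAddress(Object address) {
        return hashFunction.applyAsInt(intToBytes(address.hashCode()));
    }
}
```

Only the byte[] branch is reproducible by a non-Java client without extra information; the Object branch still depends on hashCode(), which is why the list restricts it to in-VM use.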
We would need to document ${HASH_FUNCTION} as a part of the HotRod protocol, and to
successfully locate entries, clients would need the following info:
* Server endpoints on the backend and their address.hashCode() values
* Hash space size (the modulus for all modular arithmetic; hard-coded for now, may change
in future). An int.
* Hash function version. This could point to details in the spec. This could be a
short.
* Num owners the servers have been configured to use. This again could be a short as far
as HotRod is concerned.
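To show how a client might use these four pieces of information, here is a rough sketch. TopologyView and its methods are hypothetical, and the ownership rule is an assumption, since the exact rule is not spelled out in this thread: each endpoint sits at its hash value modulo the hash space, and a key is owned by the first numOwners endpoints found at or after the key's position, wrapping around.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopologyView {
    // Position on the hash wheel -> endpoint. Endpoints that collide on the
    // same position overwrite each other in this sketch.
    private final TreeMap<Integer, String> wheel = new TreeMap<>();
    final int hashSpace;             // modulus for all modular arithmetic
    final short hashFunctionVersion; // which documented hash function to apply
    final short numOwners;           // owners configured on the servers

    TopologyView(Map<String, Integer> endpointHashes, int hashSpace,
                 short hashFunctionVersion, short numOwners) {
        this.hashSpace = hashSpace;
        this.hashFunctionVersion = hashFunctionVersion;
        this.numOwners = numOwners;
        for (Map.Entry<String, Integer> e : endpointHashes.entrySet()) {
            int endpointHash = e.getValue();
            wheel.put(Math.floorMod(endpointHash, hashSpace), e.getKey());
        }
    }

    /** Assumed rule: walk clockwise from the key's position, wrapping once. */
    List<String> locate(int keyHash) {
        int position = Math.floorMod(keyHash, hashSpace);
        List<String> owners = new ArrayList<>();
        for (String endpoint : wheel.tailMap(position, true).values()) {
            if (owners.size() == numOwners) return owners;
            owners.add(endpoint);
        }
        for (String endpoint : wheel.headMap(position, false).values()) {
            if (owners.size() == numOwners) return owners;
            owners.add(endpoint);
        }
        return owners; // fewer endpoints than numOwners
    }
}
```

The keyHash passed to locate would come from applying the documented hash function, identified by hashFunctionVersion, to the key bytes on the client side.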
WDYT?
Cheers
Manik
Cheers,
> Alex
>
> --- On Thu, 3/18/10, Manik Surtani <manik(a)jboss.org> wrote:
>
>> From: Manik Surtani <manik(a)jboss.org>
>> Subject: [infinispan-dev] HotRod, ClientIntelligence and client-side
>> key location
>> To: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org>
>> Date: Thursday, March 18, 2010, 11:59 AM
>> I've been thinking about how we
>> handle this, and I think we have a problem with smart
>> clients where clients have the ability to locate the key on
>> the server cluster in order to direct the request to the
>> specific node.
>>
>> The problem is in hash code calculation. The HotRod
>> protocol caters for this with regards to calculating node
>> address hash code by passing this in the topology map (see
>> "Hasher Client Topology Change Header" in [1]), but the only
>> way this can be meaningfully used is if the client has the
>> ability to calculate the hash code of the key in the same
>> manner the servers do. Firstly, this is hard when
>> consumed by non-Java clients as you'd need to implement the
>> way the JDK calculates the hash code of a byte array. Secondly, you'd need
>> detailed and specific knowledge of any
>> bit spreading that takes place within Infinispan - and this
>> is an internal implementation detail which may change from
>> release to release.
>>
>> So as I see it, there is no way non-Java clients will
>> be able to locate keys and then direct requests to the
>> necessary nodes. In fact, even with Java clients the
>> only way this could be done would be to send back marshalled
>> Addresses in the topology map, *and* have the same version
>> of the Infinispan server libs installed on the client, *and*
>> ensure that the same JDK/JVM version is used on the
>> client.
>> Can we think of a better way to do this? If not, is
>> it worth still supporting client-side consistent hash based
>> key location for the weird but vaguely workable scenario for
>> Java-based clients?
>>
>> Thoughts?
>>
>> Cheers
>> Manik
>>
>>
>> [1] http://community.jboss.org/wiki/HotRodProtocol
>> --
>> Manik Surtani
>> manik(a)jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache