[infinispan-dev] HotRod, ClientIntelligence and client-side key location

Tue Mar 23 07:04:48 EDT 2010

On 23 Mar 2010, at 09:48, Galder Zamarreno wrote:

> See below:
> 
> On Mon, 22 Mar 2010 13:02:15 +0100, Manik Surtani <manik at jboss.org> wrote:
> 
>> 
>> On 22 Mar 2010, at 11:16, Galder Zamarreno wrote:
>> 
>>> See below:
>>> 
>>> On Thu, 18 Mar 2010 22:26:44 +0100, Alex Kluge <java_kluge at yahoo.com>
>>> wrote:
>>> 
>>>> </snip>
>>>> 
>>>>> Firstly, this is hard when consumed by non-Java clients as you'd need to
>>>>> implement the way the JDK calculates the hash code of a byte array.
>>>> 
>>>> This is a much easier problem if you don't use the built in hashing. There
>>>> are a number of hash algorithms that can be used, including
>>>> FNV 1 (http://www.isthe.com/chongo/tech/comp/fnv/)
>>>> and others http://www.azillionmonkeys.com/qed/hash.html.
>>>> 
>>>> These are implementable in other languages, are fast, and provide good
>>>> distributions of results. We can use, and similarly document, an integer
>>>> hash used to further spread the hash values if needed. If these are chosen
>>>> to be implementable in multiple languages, clearly documented, and don't
>>>> change too often, it should be reasonable to put them into the client.
>>>> 
>>>> Using this approach removes the dependency on the hash that the VM happens
>>>> to be using. Indeed, the hash for a byte array may simply be the address of
>>>> the array, which makes it very poor for our use.
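
To make the suggestion above concrete, here is a minimal sketch of 32-bit FNV-1 over a byte[] in Java. The class and method names (Fnv1Sketch, hash, position) are made up for illustration, and nothing in this thread has settled on FNV-1 or on a hash space size, so treat it as one possible candidate rather than what Infinispan does. The only point is that the whole function is a multiply and an XOR per byte, which ports easily to other languages.

```java
/** Sketch of 32-bit FNV-1 over raw bytes. Hypothetical helper, not Infinispan code. */
final class Fnv1Sketch {
    private static final int OFFSET_BASIS = 0x811C9DC5; // 2166136261 as an unsigned value
    private static final int PRIME = 0x01000193;        // 16777619

    /** Returns the 32-bit FNV-1 hash; int overflow gives the required mod 2^32 arithmetic. */
    static int hash(byte[] data) {
        int h = OFFSET_BASIS;
        for (byte b : data) {
            h *= PRIME;      // FNV-1 multiplies first...
            h ^= (b & 0xFF); // ...then XORs in the octet, treated as unsigned
        }
        return h;
    }

    /** Maps a key to a non-negative position in a hash space of the given size (assumed > 0). */
    static int position(byte[] key, int hashSpace) {
        return Math.floorMod(hash(key), hashSpace);
    }
}
```

A client in any language that follows the same two steps per byte, and the same modulus, would arrive at the same position for the same key bytes.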
>>> 
>>> This is a very good point. This could even be done internally. If you're
>>> starting Infinispan normally, use the default CHA. If you're starting a Hot
>>> Rod server, the implementation could inject one of these CHAs, which are
>>> clearly documented and easy to implement in other languages. That way
>>> you'd get the best of both worlds: you don't need to expose your internal
>>> details for normal Infinispan, and we have a robust, stable algorithm that
>>> is easy to implement in other languages.
>> 
>> Oh no, it shouldn't even be as complex as that.  What I suggest is this:
>> 
>> DefaultConsistentHash currently uses a Wang/Jenkins hash as a  
>> bit-spreader on key.hashcode(), and address.hashcode().
>> 
>> This is poor, since this is JVM dependent.  Secondly (and for a  
>> different reason) the W/J hash isn't providing us with adequate spread,  
>> as we've found out.
>> 
>> So step 1 is to identify a better-spread hash, some have been suggested  
>> on this thread, preferably one that can operate directly on byte[]'s and  
>> eliminate the need for a bit spreader.
>> 
>> The next step would be to change DefaultConsistentHash to:
>> 
>> * For addresses, use ${HASH_FUNCTION} on address.hashcode()
Not sure how CH will look in the near future (are we adding virtual nodes? If so, a hash code on the address might be useless).
It might be a good idea to drop this from the first HR version and approach it once we are clear on how the improved consistent hashing will look.
>> * For keys which are byte[]s, use ${HASH_FUNCTION} directly on the  
>> byte[] (this would directly benefit use via HotRod)
>> * For keys which are Strings, use ${HASH_FUNCTION} directly on the  
>> String (this is an optimisation)
>> * For keys which are Objects, use ${HASH_FUNCTION} on object.hashcode()  
>> (for in-VM use)
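
The per-type rules listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: FNV-1 stands in for ${HASH_FUNCTION}, Strings are hashed via their UTF-8 bytes, and an int hash code is fed to the function as four big-endian bytes. None of these choices has been agreed on this thread, and the names (KeyHashSketch, hashKey, hashAddress) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: chooses a hashing path by key type, following the list above. */
final class KeyHashSketch {

    // Stand-in for ${HASH_FUNCTION}: 32-bit FNV-1 (an assumption, nothing is decided).
    static int hashBytes(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h *= 0x01000193;
            h ^= (b & 0xFF);
        }
        return h;
    }

    // Hashes an int (e.g. a hashCode() result) via its 4 bytes, big-endian (assumed order).
    static int hashInt(int value) {
        return hashBytes(new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
        });
    }

    static int hashKey(Object key) {
        if (key instanceof byte[]) {
            // byte[] keys (the Hot Rod case): hash the bytes directly
            return hashBytes((byte[]) key);
        }
        if (key instanceof String) {
            // String keys: hash the encoded characters (UTF-8 is an assumption)
            return hashBytes(((String) key).getBytes(StandardCharsets.UTF_8));
        }
        // Any other object (in-VM use): fall back to its JVM hash code
        return hashInt(key.hashCode());
    }

    // Addresses: hash the address hash code, as in the first bullet
    static int hashAddress(Object address) {
        return hashInt(address.hashCode());
    }
}
```

Only the byte[] and String paths are reproducible outside the JVM; the Object and address paths still depend on hashCode(), which is why the servers would have to ship the address values to the clients.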
> 
> Yeah, this would be much simpler to implement and maintain, hence +1
> 
>> 
>> We would need to document ${HASH_FUNCTION} as a part of the HotRod  
>> protocol, and to successfully locate entries, clients would need the  
>> following info:
>> 
>> * Server endpoints on the backend and their address.hashcode() values
> 
> We have that in the protocol.
> 
>> * Hash space size (the modulus for all modular arithmetic, hard-coded  
>> for now, may change in future).  An int.
> 
> That would need adding to the response header, wouldn't it?  
> http://community.jboss.org/wiki/HotRodProtocol#HashDistributionAware_Client_Topology_Change_Header
> 
>> * Hash function version.  This could point to details on the spec.  This  
>> could be a short.
> 
> You've included that, good.
> 
>> * Num owners the servers have been configured to use.  This again could  
>> be a short as far as HotRod is concerned.
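
To show how a client might combine the pieces of information listed above, here is a rough sketch of a generic consistent-hash ring lookup. It is an assumption about how key location could work, not a description of DefaultConsistentHash: servers are placed on a ring at their hash code modulo the hash space, and the owners of a key are the first numOwners servers found walking clockwise from the key's position. KeyLocatorSketch and its methods are hypothetical names, and two servers landing on the same position is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative client-side locator built from the topology data a server could send. */
final class KeyLocatorSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int hashSpace;
    private final int numOwners;

    /** serverHashCodes maps a server endpoint to the hash code sent for it. */
    KeyLocatorSketch(Map<String, Integer> serverHashCodes, int hashSpace, int numOwners) {
        this.hashSpace = hashSpace;
        this.numOwners = numOwners;
        for (Map.Entry<String, Integer> e : serverHashCodes.entrySet()) {
            // Normalise every server hash into [0, hashSpace)
            ring.put(Math.floorMod(e.getValue(), hashSpace), e.getKey());
        }
    }

    /** Returns up to numOwners endpoints, starting at the key's position and wrapping around. */
    List<String> locate(int keyHash) {
        List<String> owners = new ArrayList<>();
        int position = Math.floorMod(keyHash, hashSpace);
        int wanted = Math.min(numOwners, ring.size());
        // Servers at or after the key's position...
        for (String server : ring.tailMap(position, true).values()) {
            if (owners.size() == wanted) break;
            owners.add(server);
        }
        // ...then wrap around to the start of the ring
        for (String server : ring.headMap(position, false).values()) {
            if (owners.size() == wanted) break;
            owners.add(server);
        }
        return owners;
    }
}
```

For example, with three servers at positions 100, 5000 and 9000, a made-up hash space of 10240 and two owners, a key hashing to 4000 would go to the servers at 5000 and 9000, while a key at 9500 would wrap around to the servers at 100 and 5000. The key hash passed in would have to come from the documented hash function, with the hash function version telling the client which one applies.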
>> 
>> WDYT?
>> 
>> Cheers
>> Manik
>> 
>>> Cheers,
>>> 
>>>> Alex
>>>> 
>>>> --- On Thu, 3/18/10, Manik Surtani <manik at jboss.org> wrote:
>>>> 
>>>>> From: Manik Surtani <manik at jboss.org>
>>>>> Subject: [infinispan-dev] HotRod, ClientIntelligence and client-side
>>>>> key location
>>>>> To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
>>>>> Date: Thursday, March 18, 2010, 11:59 AM
>>>>> I've been thinking about how we
>>>>> handle this, and I think we have a problem with smart
>>>>> clients where clients have the ability to locate the key on
>>>>> the server cluster in order to direct the request to the
>>>>> specific node.
>>>>> 
>>>>> The problem is in hash code calculation.  The HotRod
>>>>> protocol caters for this with regards to calculating node
>>>>> address hash code by passing this in the topology map (see
>>>>> "Hasher Client Topology Change Header" in [1]), but the only
>>>>> way this can be meaningfully used is if the client has the
>>>>> ability to calculate the hash code of the key in the same
>>>>> manner the servers do.  Firstly, this is hard when
>>>>> consumed by non-Java clients as you'd need to implement the
>>>>> way the JDK calculates the hash code of a byte array.  Second, you'd
>>>>> need detailed and specific knowledge of any
>>>>> bit spreading that takes place within Infinispan - and this
>>>>> is internal implementation detail which may change from
>>>>> release to release.
>>>>> 
>>>>> So the way I see it I can't see how non-Java clients will
>>>>> be able to locate keys and then direct requests to the
>>>>> necessary nodes.  In fact, even with Java clients the
>>>>> only way this could be done would be to send back marshalled
>>>>> Addresses in the topology map, *and* have the same version
>>>>> of the Infinispan server libs installed on the client, *and*
>>>>> ensure that the same JDK/JVM version is used on the
>>>>> client.
>>>>> Can we think of a better way to do this?  If not, is
>>>>> it worth still supporting client-side consistent hash based
>>>>> key location for the weird but vaguely workable scenario for
>>>>> Java-based clients?
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Cheers
>>>>> Manik
>>>>> 
>>>>> 
>>>>> [1] http://community.jboss.org/wiki/HotRodProtocol
>>>>> --
>>>>> Manik Surtani
>>>>> manik at jboss.org
>>>>> Lead, Infinispan
>>>>> Lead, JBoss Cache
>>>>> http://www.infinispan.org
>>>>> http://www.jbosscache.org
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Galder Zamarreño
>>> Sr. Software Engineer
>>> Infinispan, JBoss Cache
>>> 
>> 
> 
> 