[infinispan-dev] HotRod, ClientIntelligence and client-side key location
Mircea Markus
mircea.markus at jboss.com
Tue Mar 23 07:04:48 EDT 2010
On 23 Mar 2010, at 09:48, Galder Zamarreno wrote:
> See below:
>
> On Mon, 22 Mar 2010 13:02:15 +0100, Manik Surtani <manik at jboss.org> wrote:
>
>>
>> On 22 Mar 2010, at 11:16, Galder Zamarreno wrote:
>>
>>> See below:
>>>
>>> On Thu, 18 Mar 2010 22:26:44 +0100, Alex Kluge <java_kluge at yahoo.com> wrote:
>>>
>>>> </snip>
>>>>
>>>>> Firstly, this is hard when consumed by non-Java clients as you'd need to
>>>>> implement the way the JDK calculates the hash code of a byte array.
>>>>
>>>> This is a much easier problem if you don't use the built-in hashing.
>>>> There are a number of hash algorithms that can be used, including
>>>> FNV-1 (http://www.isthe.com/chongo/tech/comp/fnv/)
>>>> and others (http://www.azillionmonkeys.com/qed/hash.html).
>>>>
>>>> These are implementable in other languages, are fast, and provide good
>>>> distributions of results. We can use, and similarly document, an integer
>>>> hash to further spread the hash values if needed. If these are chosen to
>>>> be implementable in multiple languages, clearly documented, and don't
>>>> change too often, it should be reasonable to put them into the client.
>>>>
>>>> Using this approach removes the dependency on the hash that the VM
>>>> happens to be using. Indeed, the default hash for a byte array is
>>>> identity-based (it may simply be derived from the address of the array),
>>>> so two arrays with identical contents usually hash differently, which
>>>> makes it very poor for our use.
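A minimal Java sketch of the kind of hash Alex describes: 32-bit FNV-1 computed directly over the key's bytes, using the published offset basis and prime from the FNV page linked above. The class and method names are hypothetical, and whether FNV-1 (rather than another function from the linked pages) should be the one documented for HotRod is exactly what this thread leaves open.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: 32-bit FNV-1 over raw bytes. Only plain int
 * arithmetic is used, so the result is easy to reproduce in other languages.
 */
final class Fnv1 {
    private static final int OFFSET_BASIS = 0x811C9DC5; // 2166136261 unsigned
    private static final int PRIME = 0x01000193;        // 16777619

    private Fnv1() {}

    static int hash(byte[] data) {
        int h = OFFSET_BASIS;
        for (byte b : data) {
            h *= PRIME;          // FNV-1 multiplies first (wraps modulo 2^32)...
            h ^= (b & 0xFF);     // ...then XORs in the byte as an unsigned value
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] k1 = "user:42".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "user:42".getBytes(StandardCharsets.UTF_8);
        // Equal contents give equal hashes, unlike k1.hashCode() vs k2.hashCode().
        System.out.printf("%08x %08x%n", hash(k1), hash(k2));
    }
}
```

Because Java int multiplication wraps modulo 2^32, the same loop ports directly to C with uint32_t, or to Python by masking with 0xFFFFFFFF after each multiply.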
>>>
>>> This is a very good point. This could even be done internally. If you're
>>> starting Infinispan normally, use the default CHA. If you're starting a
>>> Hot Rod server, the implementation could inject one of these CHAs, which
>>> are clearly documented and easy to implement in other languages. That
>>> way you'd get the best of both worlds: you don't need to expose your
>>> internal details for normal Infinispan, and we have a robust, stable
>>> algorithm that is easy to implement in other languages.
>>
>> Oh no, it shouldn't even be as complex as that. What I suggest is this:
>>
>> DefaultConsistentHash currently uses a Wang/Jenkins hash as a
>> bit spreader on key.hashCode() and address.hashCode().
>>
>> This is poor, since it is JVM-dependent. Secondly (and for a
>> different reason) the W/J hash isn't providing us with adequate spread,
>> as we've found out.
>>
>> So step 1 is to identify a better-spread hash, some have been suggested
>> on this thread, preferably one that can operate directly on byte[]'s and
>> eliminate the need for a bit spreader.
>>
>> The next step would be to change DefaultConsistentHash to:
>>
>> * For addresses, use ${HASH_FUNCTION} on address.hashCode()
Not sure how CH will look in the near future (are we adding virtual nodes? If so, the hash code of the address might be useless).
Might it be a good idea to drop this from the first HR version and revisit it once we are clear on how the improved consistent hashing will look?
>> * For keys which are byte[]s, use ${HASH_FUNCTION} directly on the
>> byte[] (this would directly benefit use via HotRod)
>> * For keys which are Strings, use ${HASH_FUNCTION} directly on the
>> String (this is an optimisation)
>> * For keys which are Objects, use ${HASH_FUNCTION} on object.hashCode()
>> (for in-VM use)
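The dispatch Manik proposes could look roughly like the sketch below. It is hypothetical: ${HASH_FUNCTION} is still undecided in this thread, so 32-bit FNV-1 stands in for it, and hashing Strings via their UTF-8 bytes and ints via their 4 big-endian bytes are assumptions, not part of the proposal. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical key/address hashing following the proposed dispatch. */
final class KeyHashing {
    private KeyHashing() {}

    /** Stand-in for ${HASH_FUNCTION}: 32-bit FNV-1 (an assumption). */
    static int hashFunction(byte[] data) {
        int h = 0x811C9DC5;              // FNV offset basis
        for (byte b : data) {
            h *= 0x01000193;             // FNV prime, wraps modulo 2^32
            h ^= (b & 0xFF);
        }
        return h;
    }

    /** Hash of a cache key, choosing the hash input by key type. */
    static int keyHash(Object key) {
        if (key instanceof byte[]) {
            // HotRod keys arrive as raw bytes: hash them directly.
            return hashFunction((byte[]) key);
        }
        if (key instanceof String) {
            // Optimisation: hash the string's UTF-8 bytes (encoding is assumed).
            return hashFunction(((String) key).getBytes(StandardCharsets.UTF_8));
        }
        // In-VM objects: hashCode() is the only input available.
        return hashFunction(intBytes(key.hashCode()));
    }

    /** Hash of a node address, derived from address.hashCode() as proposed. */
    static int addressHash(Object address) {
        return hashFunction(intBytes(address.hashCode()));
    }

    /** The int's 4 bytes in big-endian order (ByteBuffer's default). */
    private static byte[] intBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }
}
```

One consequence of the UTF-8 assumption is that a String key and a byte[] key holding the same UTF-8 bytes map to the same hash, which is what a non-Java client sending UTF-8 strings as byte[] would want. The Object and address paths still start from hashCode(), so they are only as portable as that value.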
>
> Yeah, this would be much simpler to implement and maintain, hence +1
>
>>
>> We would need to document ${HASH_FUNCTION} as a part of the HotRod
>> protocol, and to successfully locate entries, clients would need the
>> following info:
>>
>> * Server endpoints on the backend and their address.hashCode() values
>
> We have that in the protocol.
>
>> * Hash space size (the modulus for all modular arithmetic, hard-coded
>> for now, may change in future). An int.
>
> That would need adding to the response header, wouldn't it?
> http://community.jboss.org/wiki/HotRodProtocol#HashDistributionAware_Client_Topology_Change_Header
>
>> * Hash function version. This could point to details on the spec. This
>> could be a short.
>
> You've included that, good.
>
>> * Num owners the servers have been configured to use. This again could
>> be a short as far as HotRod is concerned.
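With the pieces listed above, a client could locate a key's owners with the usual consistent-hash walk. The sketch below rests on assumptions about how the servers place nodes and keys: positions are taken as the hash modulo the hash space (via Math.floorMod), the primary owner is the first node at or after the key's position, and the remaining numOwners - 1 owners are the next nodes clockwise. A real client would have to match whatever the servers actually do. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side hash wheel built from the topology info above. */
final class ClientHashWheel {
    private final TreeMap<Integer, String> wheel = new TreeMap<>(); // position -> server
    private final int hashSpace;
    private final int numOwners;

    /** serverHashes maps each server endpoint to the address hash it reported. */
    ClientHashWheel(Map<String, Integer> serverHashes, int hashSpace, int numOwners) {
        this.hashSpace = hashSpace;
        this.numOwners = numOwners;
        for (Map.Entry<String, Integer> e : serverHashes.entrySet()) {
            // Assumes no two servers collide on the same position.
            wheel.put(position(e.getValue()), e.getKey());
        }
    }

    /** Maps any 32-bit hash onto [0, hashSpace). */
    private int position(int hash) {
        return Math.floorMod(hash, hashSpace);
    }

    /** Owners of a key whose ${HASH_FUNCTION} value is keyHash, primary first. */
    List<String> owners(int keyHash) {
        int wanted = Math.min(numOwners, wheel.size());
        List<String> owners = new ArrayList<>(wanted);
        // First the servers at or after the key's position...
        for (String server : wheel.tailMap(position(keyHash), true).values()) {
            if (owners.size() == wanted) {
                return owners;
            }
            owners.add(server);
        }
        // ...then wrap around to the start of the wheel.
        for (String server : wheel.values()) {
            if (owners.size() == wanted) {
                break;
            }
            owners.add(server);
        }
        return owners;
    }
}
```

For example, with servers at positions 100, 200 and 300 in a hash space of 1000 and numOwners = 2, a key at position 150 is owned by the servers at 200 and 300, while a key at 350 wraps around to the servers at 100 and 200.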
>>
>> WDYT?
>>
>> Cheers
>> Manik
>>
>>> Cheers,
>>>
>>>> Alex
>>>>
>>>> --- On Thu, 3/18/10, Manik Surtani <manik at jboss.org> wrote:
>>>>
>>>>> From: Manik Surtani <manik at jboss.org>
>>>>> Subject: [infinispan-dev] HotRod, ClientIntelligence and client-side
>>>>> key location
>>>>> To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
>>>>> Date: Thursday, March 18, 2010, 11:59 AM
>>>>> I've been thinking about how we
>>>>> handle this, and I think we have a problem with smart
>>>>> clients where clients have the ability to locate the key on
>>>>> the server cluster in order to direct the request to the
>>>>> specific node.
>>>>>
>>>>> The problem is in hash code calculation. The HotRod
>>>>> protocol caters for this with regards to calculating node
>>>>> address hash code by passing this in the topology map (see
>>>>> "Hasher Client Topology Change Header" in [1]), but the only
>>>>> way this can be meaningfully used is if the client has the
>>>>> ability to calculate the hash code of the key in the same
>>>>> manner the servers do. Firstly, this is hard when
>>>>> consumed by non-Java clients as you'd need to implement the
>>>>> way the JDK calculates the hash code of a byte array. Secondly,
>>>>> you'd need detailed and specific knowledge of any
>>>>> bit spreading that takes place within Infinispan - and this
>>>>> is internal implementation detail which may change from
>>>>> release to release.
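To illustrate the first point: in Java, byte[].hashCode() is inherited from Object and is identity-based, so neither another language nor even another JVM can reproduce it. A content-based hash such as java.util.Arrays.hashCode(byte[]) is documented and can be re-implemented elsewhere, as sketched below, but that only helps if it is what the server actually hashes (an assumption here) and if any bit spreading applied afterwards is specified too. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical demo contrasting identity and content hashes of byte[]. */
final class ByteArrayHashes {
    private ByteArrayHashes() {}

    /**
     * Re-implementation of the contract documented for Arrays.hashCode(byte[]):
     * start at 1, then h = 31 * h + b for each signed byte (32-bit wrap-around).
     * A non-Java client would have to port exactly this loop.
     */
    static int arraysHashCode(byte[] a) {
        if (a == null) {
            return 0; // matches Arrays.hashCode((byte[]) null)
        }
        int h = 1;
        for (byte b : a) {
            h = 31 * h + b;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] k1 = {1, 2, 3};
        byte[] k2 = {1, 2, 3};
        System.out.println(Arrays.hashCode(k1) == Arrays.hashCode(k2)); // true
        System.out.println(arraysHashCode(k1) == Arrays.hashCode(k1));  // true
        // Identity-based, inherited from Object: these usually differ.
        System.out.println(k1.hashCode() + " vs " + k2.hashCode());
    }
}
```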
>>>>>
>>>>> So, as things stand, I can't see how non-Java clients will
>>>>> be able to locate keys and then direct requests to the
>>>>> necessary nodes. In fact, even with Java clients the
>>>>> only way this could be done would be to send back marshalled
>>>>> Addresses in the topology map, *and* have the same version
>>>>> of the Infinispan server libs installed on the client, *and*
>>>>> ensure that the same JDK/JVM version is used on the
>>>>> client.
>>>>> Can we think of a better way to do this? If not, is
>>>>> it worth still supporting client-side consistent hash based
>>>>> key location for the weird but vaguely workable scenario for
>>>>> Java-based clients?
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Cheers
>>>>> Manik
>>>>>
>>>>>
>>>>> [1] http://community.jboss.org/wiki/HotRodProtocol
>>>>> --
>>>>> Manik Surtani
>>>>> manik at jboss.org
>>>>> Lead, Infinispan
>>>>> Lead, JBoss Cache
>>>>> http://www.infinispan.org
>>>>> http://www.jbosscache.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>
>>>
>>>
>>> --
>>> Galder Zamarreño
>>> Sr. Software Engineer
>>> Infinispan, JBoss Cache
>>