On Mon, Nov 23, 2015 at 2:16 PM, Radim Vansa <rvansa(a)redhat.com> wrote:
On 11/23/2015 01:07 PM, Sanne Grinovero wrote:
> +1
>
> See also
https://issues.jboss.org/browse/ISPN-3905 ; although I was
> mostly concerned on it allocating on a (very) hot path and didn't look
> at it in terms of compatibility modes.
Yes, due to compatibility we cannot remove the UTF-8 encoding from
MurmurHash3 since compatibility with clients (in other languages)
depends on this as well, though, we could theoretically merge encoding
and hashing into one function - UTF-8 encoder implementation looks quite
simple (60 lines of code?) - could be worth it even if used only for
server. However, my proposal was to remove that computation from
hot-code path completely.
>
> Rather than xorring with "magic numbers" don't you think we
> Equivalence implementation should be able to rule on that?
We shouldn't require user to provide a pair of hashCode functions, I
don't think that would work well in practice. Though, we could make the
second function Java 8-default method (with return hashCode() ^
0xWHATEVER), still allowing it to be overridable.
The JDK team learned long ago to use a spreader on top of
user-supplied hashCode() implementations, as user-supplied hash codes
are usually very clustered. In the case of strings, many times a
common prefix makes up most of the key, and the hash codes of the keys
are again clustered. A XOR with a magic value would definitely not
help with the clustering issue, that's why java.util.HashMap doesn't
use it.
Note that our consistent hashes map adjacent keys to the same segment:
we use hash / buckets, whereas HashMap uses hash % buckets. So we
require a better spread across the hash space than HashMap does, and
because of that I think we really need MurmurHash3. Still, we could
change it to work on the result of Equivalence.hashCode(Object),
instead of dealing with the contents of byte[] and String directly,
but maintaining compatibility with old clients may not be possible.
Regarding client-server divergences, I think we require compatibility
mode to be enabled in order to access a cache both via HotRod and with
the embedded API (because the server casts keys and values to byte[]).
That means the distribution interceptor sees only the unmarshalled
key, and getting the same hash code from the marshalled byte[] (on the
client) and the unmarshalled Object (in the distribution interceptor)
is going to be quite complex - either with a custom Object.hashCode()
implementation, or with a custom Equivalence.hash(). I think the only
way around this would be to change compatibility mode to store keys
and values as byte[].
Cheers
Dan
Radim
>
> On 23 November 2015 at 10:26, Radim Vansa <rvansa(a)redhat.com> wrote:
>> Hi guys,
>>
>> I have noticed that even in library mode we use MurmurHash3 to find out
>> the segment for particular key. For strings, this involves encoding into
>> UTF-8 and computation of hashCode, instead of just reading the cached
>> value in string. Common objects just remix the bits of hashCode. When
>> user provides custom Equivalence with non-default hashCode, it is not
>> used to determine the segment.
>>
>> I think that in library mode we should rather use Equivalence.hashCode,
>> maybe XORed with some magic number so that there are less collisions in
>> DataContainer.
>>
>> If we simply replaced the function in CH, we would break the case when
>> user starts HR server on top of library mode, as the clients expect key
>> location based on MurmurHash3. ATM user only has to set
>> AnyServerEquivalence for keys in DC; we would need to detect
>> configuration with server equivalence and set CH function to MH3, and
>> probably also log some warning if the equivalence is set to unknown
>> class and CH function is not specified.
>>
>> WDYT?
>>
>> Radim
>>
>> --
>> Radim Vansa <rvansa(a)redhat.com>
>> JBoss Performance Team
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev