[infinispan-dev] CHM or CHMv8?

Galder Zamarreño galder at redhat.com
Thu Apr 25 06:44:51 EDT 2013


On Apr 22, 2013, at 1:37 PM, Sanne Grinovero <sanne at infinispan.org> wrote:

> We also have been toying with the idea to hash each key only once,
> instead of both with the consistent hash (to assign the node owner)
> and once in the CHM backing the datacontainer.

^ With the new EquivalentCHMv8, I think we can make this happen quite easily, cos we provide a function that is used to calculate the hash of an object. Hence, we can think of a way to plug what you suggest there.

> I doubt we need the datacontainer to implement Map at all, but at
> least if we go this way we don't want the hash to be affected by the
> VM instance or different nodes won't agree on the expected owner ;-)
> 
> Also there where reports of it having a very bad impact on
> performance, I'm not sure if they where resolved yet, or are going to
> be resolved at all as it was important for security reasons.
> 
> 
> On 22 April 2013 12:19, Dan Berindei <dan.berindei at gmail.com> wrote:
>> Right. If we have anywhere a map that's initialized from a single thread and
>> then accessed only for reading from many threads, it probably makes sense to
>> use a HashMap and wrap it in an UnmodifiableMap. But if it can be written
>> from multiple threads as well, I think we should use a CHMV8.
>> 
>> BTW, the HashMap implementation in OpenJDK 1.7 seems to have some
>> anti-collision features (a VM-dependent hash code generator for Strings),
>> but our version of CHMV8 doesn't. Perhaps we need to upgrade to the latest
>> CHMV8 version?
>> 
>> 
>> 
>> On Fri, Apr 19, 2013 at 4:32 PM, David M. Lloyd <david.lloyd at redhat.com>
>> wrote:
>>> 
>>> On 04/19/2013 08:22 AM, Sanne Grinovero wrote:
>>>> On 19 April 2013 13:52, David M. Lloyd <david.lloyd at redhat.com> wrote:
>>>>> On 04/19/2013 05:17 AM, Sanne Grinovero wrote:
>>>>>> On 19 April 2013 11:10, Dan Berindei <dan.berindei at gmail.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Apr 19, 2013 at 12:58 PM, Sanne Grinovero
>>>>>>> <sanne at infinispan.org>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> On 19 April 2013 10:37, Dan Berindei <dan.berindei at gmail.com> wrote:
>>>>>>>>> Testing mixed read/write performance with capacity 100000, keys
>>>>>>>>> 300000,
>>>>>>>>> concurrency level 32, threads 12, read:write ratio 99:1
>>>>>>>>> Container CHM           Ops/s 5178894.77  Gets/s 5127105.82  Puts/s
>>>>>>>>> 51788.95  HitRatio      86.23  Size     177848  stdDev   60896.42
>>>>>>>>> Container CHMV8         Ops/s 5768824.37  Gets/s 5711136.13  Puts/s
>>>>>>>>> 57688.24  HitRatio      84.72  Size     171964  stdDev   60249.99
>>>>>>>> 
>>>>>>>> Nice, thanks.
>>>>>>>>> 
>>>>>>>>> The test is probably limited by the 1% writes, but I think it does
>>>>>>>>> show
>>>>>>>>> that
>>>>>>>>> reads in CHMV8 are not slower than reads in OpenJDK7's CHM.
>>>>>>>>> I haven't measured it, but the memory footprint should also be
>>>>>>>>> better,
>>>>>>>>> because it doesn't use segments any more.
>>>>>>>>> 
>>>>>>>>> AFAIK the memoryCHMV8 also uses copy-on-write at the bucket level,
>>>>>>>>> but
>>>>>>>>> we
>>>>>>>>> could definitely do a pure read test with a HashMap to see how big
>>>>>>>>> the
>>>>>>>>> performance difference is.
>>>>>>>> 
>>>>>>>> By copy-on-write I didn't mean on the single elements, but on the
>>>>>>>> whole map instance:
>>>>>>>> 
>>>>>>>> private volatile HashMap configuration;
>>>>>>>> 
>>>>>>>> synchronized addConfigurationProperty(String, String) {
>>>>>>>>       HashMap newcopy = new HashMap( configuration ):
>>>>>>>>       newcopy.put(..);
>>>>>>>>       configuration = newcopy;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> Of course that is never going to scale for writes, but if writes
>>>>>>>> stop
>>>>>>>> at runtime after all services are started I would expect that the
>>>>>>>> simplicity of the non-threadsafe HashMap should have some benefit
>>>>>>>> over
>>>>>>>> CHM{whatever}, or it would have been removed already?
>>>>>>>> 
>>>>>>> 
>>>>>>> Right, we should be able to tell whether that's worth doing with a
>>>>>>> pure read
>>>>>>> test with a CHMV8 and a HashMap :)
>>>>>> 
>>>>>> IFF you find out CHMV8 is as good as HashMap for read only, you have
>>>>>> two options:
>>>>>>   - ask the JDK team to drop the HashMap code as it's no longer
>>>>>> needed
>>>>>>   - fix your benchmark :-P
>>>>>> 
>>>>>> In other words, I'd consider it highly surprising and suspicious
>>>>>> (still interesting though!)
>>>>> 
>>>>> It's not as surprising as you think.  On x86, volatile reads are the
>>>>> same as regular reads (not counting some possible reordering magic).
>>>>> So
>>>>> if a CHM read is a hash, an array access, and a list traversal, and so
>>>>> is HM (and I believe this is true though I'd have to review the code
>>>>> again to be sure), I'd expect very similar execution performance on
>>>>> read.  I think some of the anti-collision features in V8 might come
>>>>> into
>>>>> play under some circumstances though which might affect performance in
>>>>> a
>>>>> negative way (wrt the constant big-O component) but overall in a
>>>>> positive way (by turning the linear big-O component into a logarithmic
>>>>> one).
>>>> 
>>>> Thanks David. I know about the cost of a volatile read, what I'm
>>>> referring to
>>>> is that I would expect the non-concurrent Maps to generally contain some
>>>> simpler code than a conccurrent one. If this was not the case,
>>>> why would any JDK team maintain two different implementations?
>>>> That's why I would consider it surprising if it turned out that the
>>>> CHMV8 was
>>>> superior over a regular one on all fronts: there certainly is some
>>>> scenario in which the regular one would be a more appropriate choice,
>>>> which directly proofs that blindly replacing all usages in a large
>>>> project
>>>> is not optimal. Of course, it might be close to optimal..
>>> 
>>> You are right, it is not superior on all fronts.  It is definitely
>>> similar in terms of read, but writes will have a substantially higher
>>> cost, involving (at the very least) multiple volatile writes which are
>>> orders of magnitude more expensive than normal writes (on Intel they
>>> have the costly impact of memory fence instructions).  So I don't think
>>> anyone will want to drop HashMap any time soon. :-)
>>> 
>>> --
>>> - DML
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> 
>> 
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org




More information about the infinispan-dev mailing list