[hibernate-dev] Hibernate Developer IRC meeting - 5/03

Fri May 4 07:57:56 EDT 2012

+1!
Agreed: to be a good library in this area it's critical to be user
friendly (good APIs, good docs and good quality), but to be the best
out there that's not enough: users demand top efficiency and are
extremely annoyed when they find a performance issue in our code, to
the point that they will advertise problems loudly and discourage the
use of any ORM.

It shouldn't be hard to defend the claim that using Hibernate won't
slow you down significantly. Still, currently when this subject comes
up in public bar meetings or talks it's like opening Pandora's box.
Plus, OGM would be nonsense if we weren't sure we can improve on this.
It's not bad at all currently, but since you're having fun rewriting a
critical area it makes sense to design things from the ground up
keeping these "new age" requirements in consideration.

I don't think striving for top design and excellent performance are
necessarily conflicting goals. Make something to be very proud of!
;-)

Sanne

On 4 May 2012 12:41, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> Yes, but we can't blindly go for the nicer approach: we are a library and hence used by many.
> Look at the bottleneck we found in Hibernate ORM due to the change of magnitude Hibernate OGM involved. Look at the perf issues we found in OGM itself just because I used the builder pattern for some objects used in critical paths. Same for hashCode, which was not cached in some critical objects.
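
A minimal sketch of the hashCode caching mentioned above, assuming an immutable key object. The class name `CachedHashKey` and its fields are hypothetical and are not taken from the Hibernate ORM or OGM code base; the point is only that the hash is computed once in the constructor instead of on every map lookup.

```java
import java.util.Arrays;

// Hypothetical immutable key; the name and fields are illustrative only.
final class CachedHashKey {
    private final String table;
    private final Object[] columnValues;
    private final int hash; // computed once, in the constructor

    CachedHashKey(String table, Object[] columnValues) {
        this.table = table;
        // Defensive copy so later changes to the caller's array cannot
        // invalidate the cached hash.
        this.columnValues = columnValues.clone();
        this.hash = 31 * table.hashCode() + Arrays.hashCode(this.columnValues);
    }

    @Override
    public int hashCode() {
        return hash; // no recomputation on each lookup
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CachedHashKey)) return false;
        CachedHashKey that = (CachedHashKey) other;
        // Comparing the cached hashes first is a cheap early exit.
        return hash == that.hash
                && table.equals(that.table)
                && Arrays.equals(columnValues, that.columnValues);
    }
}
```

Caching like this is only safe if the fields (and the values stored in the array) do not change after construction; whether it pays off in a given code path still has to be measured.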
>
> I'm all fine with the nice approach if it's followed by a perf test before being pushed :)
>
> On 4 May 2012, at 12:39, Hardy Ferentschik wrote:
>
>> Even at the risk of pouring oil onto the fire, I think a simpler data structure wins in most cases over
>> parallel arrays. The latter are much harder to use and make mistakes easier, which leads to more
>> bugs and higher maintenance costs.
>>
>> As Sanne is saying, performance questions are tricky. So many things happen to the code these days
>> before it gets executed on the bare metal that it is hard to know what performance impact a certain
>> change has. In the end you just have to measure.
>>
>> Personally I think we should primarily strive for a better and easier to use API. Opportunities to optimize
>> then often arise naturally.
>>
>> And now my dear disciples let me close with:
>> "The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet." — Michael A. Jackson
>>
>> :-)
>>
>> --Hardy
>>
>> On May 4, 2012, at 11:58 AM, Sanne Grinovero wrote:
>>
>>> Tricky subject.
>>> I'm confident that there are many cases in which we should have used
>>> arrays rather than maps, especially for temporary objects which aren't
>>> short lived enough (a HashMap living in the scope of a single method
>>> is going to be cheap). We should have objects that are either allocated
>>> for very long (like forever in the scope of the SessionFactory), or very
>>> short lived.
>>>
>>> In the case of how we keep metadata, I think performance would be
>>> dominated not so much by the fact it's a slightly bigger object as
>>> by prefetching and by what is going to be available in the cache lines
>>> you have just filled: obviously cache is way faster than memory, so
>>> being clever about the order in which you lay out your data structure
>>> could speed you up by a couple of orders of magnitude.
>>>
>>> Using primitives and array matrices makes the data smaller, hence more
>>> likely to fit in the cache; but using an array of objects in which
>>> each object collects the needed fields in one group is likely
>>> going to be faster. I'm making assumptions, though, about how this
>>> structure is going to be read most frequently.
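
To make the two layouts under discussion concrete, here is a small sketch. `ColumnInfo` and `MetadataLayouts` are hypothetical names invented for this illustration and are not the persister or loader classes; which layout is faster depends on the access pattern and would have to be measured.

```java
// Hypothetical metadata holder, used only for illustration.
final class ColumnInfo {
    final String name;
    final int sqlType;
    final boolean nullable;

    ColumnInfo(String name, int sqlType, boolean nullable) {
        this.name = name;
        this.sqlType = sqlType;
        this.nullable = nullable;
    }
}

final class MetadataLayouts {
    // Layout A: parallel arrays. Index i refers to the same column in each
    // array, so reading one column touches three separate arrays.
    static String describeParallel(String[] names, int[] sqlTypes,
                                   boolean[] nullable, int i) {
        return names[i] + ":" + sqlTypes[i] + (nullable[i] ? "?" : "");
    }

    // Layout B: one array of objects. All fields of a column sit together
    // in one object, at the cost of an object header and a reference each.
    static String describeObject(ColumnInfo[] columns, int i) {
        ColumnInfo c = columns[i];
        return c.name + ":" + c.sqlType + (c.nullable ? "?" : "");
    }
}
```

Layout A keeps the primitive data compact; layout B keeps everything needed for one column reachable through a single reference, which is also harder to get wrong when an entry is added or removed.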
>>>
>>> For example when declaring a matrix as an [ ][ ], performance will be
>>> very different depending on whether you read by columns or by rows (I
>>> forgot which one is better now), but in that case if the common use case
>>> is using the slower path it's usually a good idea to invert (i.e.
>>> transpose) the matrix.
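
A sketch of the row-versus-column point, with hypothetical names. In Java an `int[][]` is an array of row arrays, and each row is its own contiguous `int[]`, so walking row by row reads memory sequentially while walking column by column jumps to a different row array at every step. The row-wise walk is therefore usually the cache-friendlier one, though the actual difference has to be measured on the target JVM and hardware.

```java
final class MatrixTraversal {
    // Row-wise: the inner loop walks one contiguous int[] at a time.
    static long sumByRows(int[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++) {
            int[] row = m[r];
            for (int c = 0; c < row.length; c++) {
                sum += row[c];
            }
        }
        return sum;
    }

    // Column-wise: every inner step dereferences a different row array.
    // Assumes a rectangular matrix (all rows have the same length).
    static long sumByColumns(int[][] m) {
        long sum = 0;
        int cols = m.length == 0 ? 0 : m[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < m.length; r++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    // Transpose so that what used to be a column becomes a contiguous row.
    // Assumes a rectangular matrix.
    static int[][] transpose(int[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        int[][] t = new int[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }
}
```

If the common access pattern reads by column, storing the transposed matrix once lets that pattern use the row-wise loop.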
>>>
>>> I'd love it if we could enter this space, or even if it's not suited
>>> for it, at least be considered "lite":
>>> http://stackoverflow.com/questions/10374735/lucene-and-ormlite-on-android
>>>
>>> Sanne
>>>
>>> On 4 May 2012 10:07, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>>>> Performance I don't know, you are probably right. But memory-wise, that could be way different.
>>>> Even ignoring the overhead of the object + pointer in memory, the alignment of boolean or other small objects would make a significant impact.
>>>>
>>>> Of course if we are talking about 20 values, we should not bother. But persisters and the like store more than 20 values and we have more than one persister / loader. It might be inconsequential in the end but that might be worth testing.
>>>>
>>>> On a related note it's up for debate whether or not putting data in a hash map for faster lookup later is worth it in all cases:
>>>>
>>>> - it takes much more space than raw arrays
>>>> - array scan might be as fast or faster for a small enough array. As we have seen in Infinispan and OGM, computing a hash is not a cheap operation.
>>>>
>>>> Again this requires testing, but I am guilty as charged of using collections in AnnotationBinder when doing some computations that would be better off written as an array + array scan.
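
The trade-off between an array scan and a hash map lookup can be sketched as follows. `SmallLookup` and its methods are hypothetical and are not code from AnnotationBinder; the sketch assumes the names are unique and non-null, and it makes no claim about where the break-even size lies.

```java
import java.util.HashMap;
import java.util.Map;

final class SmallLookup {
    // Linear scan: no hashing and no extra objects.
    // Returns -1 when the name is absent.
    static int indexOfByScan(String[] names, String wanted) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    // Map-based alternative: build the index once up front. This allocates
    // a table plus one entry and one boxed Integer per name.
    static Map<String, Integer> buildIndex(String[] names) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            index.put(names[i], i);
        }
        return index;
    }

    // Each lookup hashes the key, then compares it within one bucket.
    // Returns -1 when the name is absent.
    static int indexOfByMap(Map<String, Integer> index, String wanted) {
        Integer i = index.get(wanted);
        return i == null ? -1 : i;
    }
}
```

For a handful of entries the scan does a few `equals` calls on a compact array, while the map pays for hashing and extra memory; for larger collections the map usually wins. Only a measurement on realistic data settles it.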
>>>>
>>>>
>>>> On 3 May 2012, at 19:32, Steve Ebersole wrote:
>>>>
>>>>> I seriously doubt the performance cost of 20 'parallel arrays' versus 1 array of Objects holding those 20 values is anything but negligible at best.
>>
>>
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev