[hibernate-dev] Runtime efficiency: sharing some profiling output

Tuesday, 2 October 2012

Hi all,
I had this draft email which I apparently never sent, about some
profiling metrics I've taken in May; so relatively outdated. Also
while it stresses Hibernate ORM heavily, my goal was to check sanity
of Hibernate Search; I'm not sure anymore what version it was exactly,
but for sure it was whatever was just released in beginning of May.
Since we're discussing performance now, maybe it's worth sharing after all.
To get to these observations I run it several times with a mixture of
JProfiler, OProfile, JConsole.

Cheers,
Sanne

======

I'm having some profiler runs on a MassIndexer job, loading the
Wikipedia dump out of a MySQL instance and dumping it into a Lucene
index; it's all run on my laptop, the required disk size is at around
120GB to run this test.

This job can take a while: just running the initial COUNT(*) operation
used by Hibernate Search to estimate to overall effort takes ~40
minutes; both MySQL and Lucene indexes are sharing the same single SSD
drive (a Plextor M3).

The good news is that we're pretty efficient, I don't see any mayor
bottleneck - glad so as we didn't have big problems last time I
checked a month ago - it's actually very nice to see how efficient we
are today: my performance is constrained by I/O and the average CPU
usage fluctuates between 5% and 20%.

As expected the Lucene text processing is the peak operations; the top
5 CPU consumers are in the org.apache.lucene code; adding them up,
Lucene takes around 33% of total CPU time (relative to all the CPU
this JVM is taking). Looking into what Lucene is doing, it seems that
we could help it a bit by pre-interning the field names and keeping
these in a constants pool -> HSEARCH-1127.

Looking further for resource consumers, a note for personal
satisfaction is that no classes from org.hibernate.search can be found
on the first page of CPU consumers, but LuceneOptionsImpl is being
highlighted as a hotspot in the object allocation view -> HSEARCH-1128

=== Looking now into Hibernate ORM:

# IdentityMap
Some performance is lost in the well known
org.hibernate.engine.internal.StatefulPersistenceContext; especially
the IdentityMap and it's need for System.identityHashCode is
demanding.
New in this run is the surprising cost of #clear() : 0,5% CPU.. not
alarming, but it looks like I can blame again IdentityMap,
specifically: concurrentEntries.

# DefaultEntityAliases.intern
Why are we interning this?
org.hibernate.loader.DefaultEntityAliases.intern(String[])
If they are constants, I guess they could be reused rather than
re-computed all the time.. in fact, see next point:

# Aliases and other string generations
It seems Hibernate ORM is generating property aliases and other little
Strings all the time; this test isn't using any HQL, just direct
loading of entities via primary keys or by following relations from
other managed entities.

=== Looking into allocation hot-spots:

 #
org.hibernate.type.descriptor.sql.IntegerTypeDescriptor.getExtractor(JavaTypeDescriptor<X>)
 Could we avoid returning a new instance on each demand ?

 # iterators on
org.hibernate.internal.util.collections.ConcurrentReferenceHashMap.ConcurrentReferenceHashMap
 Why are we having a concurrent Map in the StatefulPersistenceContext
? I thought this class wasn't meant to be shared among multiple
threads?

 # Proxy of resultSets
 This one was tricky. It seems we generate small sized, but huge
amounts of small primitive wrappers.
It turns out that this happens while extracting values from the
resulting java.sql.ResultSet after any query: the return types of
methods such as #getInt and #getBoolean are obviously defeined as
primitives.
Consistently to this API (couldn't be otherwise as it's the interface)
all points in our code using ResultSet expect a primitive. But the
InvocationHandler#invoke mandates returning an Object, and we're
applying an interceptor to the actual ResultSet via
org.hibernate.engine.jdbc.internal.proxy.AbstractProxyHandler.
  In short, it turns out that every time such a method is invoked, the
JVM seems to do autoboxing+unboxing instead of being clever.. could
there be a way to avoid autoboxing for each field being read from the
database? Hironically, most of the times we actually need to box it
right after again, so an alternative would be to have it box only
once?

 # Hibernate Search DocumentBuilder
 Creation of Lucene Documents is relatively allocation-intensive.. I
guess it's expected but maybe worth a closer look in the future.

=== Thread contention:
it looks like the BlockingQueue is a huge waste of CPU - I'll need to
change this structure for something better suited.

== Conclusions..
Nothing which needs immediate attention, but hope it gives some
ideas.. Why I did it then? Need to run some tests on the Infinispan
Lucene Directory, need to rebuild my big index to collect some up to
date metrics.. might as well have a look if I can make it work a bit
faster :D

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

[hibernate-dev] Runtime efficiency: sharing some profiling output