Hello,
I've been experimenting with a "standard" FieldCache, with the goal of
speeding up the extraction of the two values we need most: classtype
and primary id.
Focusing on classtype only, it seems that we have to be careful with
FieldCaches:
1 - Strings are extracted as different instances, so the same value is
copied over and over
2 - Loading via a FieldCache always loads *all* elements from a given
segment, so the result has IndexReader.maxDoc() entries
So it's quite possible that this approach ties up a lot of memory. It's
all held via soft references, so it shouldn't really be dangerous, but
if these soft references are cleared frequently that defeats the
purpose: with the FieldCache enabled, all values have to be extracted
again each time they are needed. That doesn't reload just the values we
need, but all values from the index, including all those not matched by
the current query.
So to cope with this, I've now built a custom FieldCache and a
specialized Collector, which extract the data and keep it in memory
not as Strings but as Class instances, which has the advantage that
equal values point to the same instance.
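
To illustrate the idea of keeping shared Class instances instead of one
String copy per document, here is a minimal sketch in plain Java. It is
not the code from the branch: ClassTypeCache, classNamePerDoc and
classOf are hypothetical names, and the String[] input stands in for
whatever the index returns per document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolves each stored class name once and keeps one
// shared Class reference per document instead of a separate String copy.
class ClassTypeCache {
    // One entry per distinct class name seen so far.
    private final Map<String, Class<?>> byName = new HashMap<>();
    // One slot per document id, pointing at the shared Class instance.
    private final Class<?>[] perDoc;

    public ClassTypeCache(String[] classNamePerDoc) {
        perDoc = new Class<?>[classNamePerDoc.length];
        for (int docId = 0; docId < classNamePerDoc.length; docId++) {
            perDoc[docId] = resolve(classNamePerDoc[docId]);
        }
    }

    private Class<?> resolve(String name) {
        Class<?> type = byName.get(name);
        if (type == null) {
            try {
                type = Class.forName(name);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Unknown classtype: " + name, e);
            }
            byName.put(name, type);
        }
        return type;
    }

    public Class<?> classOf(int docId) {
        return perDoc[docId];
    }
}
```

The map is only touched while the cache is being built; lookups by
document id afterwards are plain array reads.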
This gives more reasonable numbers in terms of memory usage, but I
then tried to get some performance numbers to justify that memory
usage.
Very bad news: it's around 100x slower when using the cache than
without (measuring with a warmed-up cache only).
According to the profiler, it's all down to an innocent-looking
Map.put() which I'm now using in the custom collector.
Basically, I'm going to need to change the approach, or kill the
feature. I'm now collecting values into an array and will see how it
goes.
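
For reference, collecting into arrays rather than calling Map.put() per
hit could look roughly like the sketch below. This is an assumption
about the general shape only, not the code in the branch:
ArrayHitCollector and its methods are hypothetical names, it does not
implement the Lucene Collector API, and I make no claim about how it
performs.

```java
import java.util.Arrays;

// Hypothetical sketch: records hits in parallel growable arrays, so each
// collected document costs two array writes instead of a Map.put().
class ArrayHitCollector {
    // Warmed-up cache, indexed by document id.
    private final Class<?>[] classTypeByDoc;
    private int[] docIds = new int[16];
    private Class<?>[] types = new Class<?>[16];
    private int size;

    public ArrayHitCollector(Class<?>[] classTypeByDoc) {
        this.classTypeByDoc = classTypeByDoc;
    }

    public void collect(int docId) {
        if (size == docIds.length) {
            // Double the capacity of both arrays when full.
            docIds = Arrays.copyOf(docIds, size * 2);
            types = Arrays.copyOf(types, size * 2);
        }
        docIds[size] = docId;
        types[size] = classTypeByDoc[docId];
        size++;
    }

    public int size() {
        return size;
    }

    public int docIdAt(int hit) {
        return docIds[hit];
    }

    public Class<?> typeAt(int hit) {
        return types[hit];
    }
}
```

Hits are addressed by their collection order (0 .. size() - 1), so no
boxing of document ids and no hashing is involved.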
In case someone wants to have a look, there's a working branch here:
https://github.com/Sanne/hibernate-search/tree/HSEARCH-531
and this is the line which now takes 93% of CPU time during a full-text search:
https://github.com/Sanne/hibernate-search/commit/735c4644834cc516effa33f6...
(Hardy, beware of your map usage! This is worse than we expected; you
definitely have to switch to a per-field collector instance if
possible.)
Cheers,
Sanne