Sanne Grinovero commented on HSEARCH-194:
I should raise the priority of my Reader, the one I spoke about in Orlando; I have
a lot of code ready, but it is not based on Search's architecture at all, so it needs
more work.
I froze development on it because I'm not sure it can offer the transactional safety you
want to guarantee in Search: with my design I really can't guarantee
that the IndexReader is always up to date, so it must be acceptable not to see some
changes that are already committed, even in new transactions.
Repeatable reads are easy; consistency with the database contents is not. Also, in my
specific case we never remove anything from the DB, which makes things simpler.
I don't understand your last post: if you think the scalability bottleneck is the
IndexReader, why would it behave differently from using Lucene directly?
I rather suspected the DirectoryProvider locks (which I still don't fully
understand, as the locking code is spread all around),
or maybe the isCurrent() check you do on each new request for
I'm going to experiment with a SharedReaderProvider capable of skipping isCurrent() if
it has already done so "recently" (a configurable parameter?),
but I don't like to commit experiments; is there some way for us to share
experimental/concept code (a sandbox)?
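To make the "skip isCurrent() if checked recently" idea concrete, here is a minimal sketch. It is not code from Hibernate Search: the class name ThrottledFreshnessCheck, the injectable clock and the interval parameter are all assumptions made for illustration, and the BooleanSupplier merely stands in for a call to IndexReader.isCurrent().

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of Hibernate Search or Lucene. */
final class ThrottledFreshnessCheck {
    private static final long NEVER = Long.MIN_VALUE;

    private final BooleanSupplier isCurrent;  // stands in for IndexReader.isCurrent()
    private final LongSupplier clockMillis;   // injectable clock, e.g. System::currentTimeMillis
    private final long minIntervalMillis;     // the "recently" threshold (assumed configurable)
    private final AtomicLong lastCheck = new AtomicLong(NEVER);

    ThrottledFreshnessCheck(BooleanSupplier isCurrent, LongSupplier clockMillis, long minIntervalMillis) {
        this.isCurrent = isCurrent;
        this.clockMillis = clockMillis;
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true only when a real check was performed and the reader turned out to be stale. */
    boolean needsReopen() {
        long now = clockMillis.getAsLong();
        long last = lastCheck.get();
        // A check happened recently: skip the expensive call and assume the reader is fine.
        if (last != NEVER && now - last < minIntervalMillis) {
            return false;
        }
        // Only the thread that wins the CAS pays for the check; the others keep the old reader.
        if (!lastCheck.compareAndSet(last, now)) {
            return false;
        }
        return !isCurrent.getAsBoolean();
    }
}
```

The trade-off is the one described above: a query may run against a reader that is stale by up to the configured interval, so this only fits use cases where not seeing very recent commits is acceptable.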
Also the tests Emmanuel made would be useful.
Also could someone explain to me this comment in SharedReaderProvider:
"directoryProviderLock.lock(); //needed for same problem as the double-checked
I know about the (in)famous double-checked locking issue but don't see the
Surely this comment is about visibility? The lock does not stop other threads from
concurrently building several readers,
and every reader being opened wants to load its data, locking the FS.
It is possible that the performance bottleneck would be resolved by actually keeping the
lock until the ==null and isCurrent() checks are done and (if needed) a new reader is created.
Am I wrong? Probably there is some other lock around that I'm not seeing;
in that case we could use a ConcurrentHashMap instead of the HashMap and remove the locking.
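A rough sketch of the ConcurrentHashMap variant follows. It is an illustration only: ReaderCache and the opener function are hypothetical names, the key and reader types are left generic so the example does not depend on Lucene, and computeIfAbsent requires Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical per-directory reader cache; K would be the directory provider, R the reader. */
final class ReaderCache<K, R> {
    private final ConcurrentMap<K, R> readers = new ConcurrentHashMap<>();
    private final Function<K, R> opener;  // assumed to open a new reader for the given key

    ReaderCache(Function<K, R> opener) {
        this.opener = opener;
    }

    /** Returns the cached reader, opening it at most once per key even under concurrency. */
    R get(K directory) {
        // ConcurrentHashMap applies the mapping function atomically, so two threads asking
        // for the same missing key do not both open a reader.
        return readers.computeIfAbsent(directory, opener);
    }
}
```

Note that the opener runs while the map holds an internal lock for that key, so threads asking for the same directory still wait for the first open to finish; requests for other directories are mostly unaffected. The sketch does not cover the isCurrent() check or closing outdated readers.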
Inconsistent performance between hibernate search and pure lucene
Key: HSEARCH-194
Project: Hibernate Search
Issue Type: Bug
Components: query
Affects Versions: 3.0.1.GA
Environment: Linux - Hibernate 3.2.6, Hibernate Annotations 3.3.1 - Lucene
Reporter: Stephane Nicoll
Priority: Critical
Attachments: Monitor_Usage_Statistics.html
I have a simple index that contains:
* id (pk of the entity)
* keywords (a list of tokens)
The index contains 100.000 objects and the keywords field has 2 tokens from a list of 40
different values.
What I want to do is retrieve all the IDs that match a given Lucene query on the
keywords. So for that I'm doing something like:
FullTextSession fullTextSession = Search.createFullTextSession(session);
QueryParser parser = new QueryParser("keywords", luceneAnalyzer);
org.apache.lucene.search.Query hibernateQuery = parser.parse("foo AND bar");
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(hibernateQuery,
Iterator it = fullTextQuery.iterate();
Where ResultTransformer is
private static class FirstObjectResultTransformer implements ResultTransformer {
    public Object transformTuple(Object[] objects, String[] strings) {
        return objects[0];
    }
    public List transformList(List list) {
        return list;
    }
}
If I run a load test with a single thread, the execution time of my Lucene query is around
200 msec. If I run a load test with 10 threads, the execution time is 2 sec (per user!). If
I run the profiler on the service, I see lots of deadlocks on SegmentReader.
Switching to a "non-shared" strategy removes the deadlocks, but it's still
slow (1.5 sec).
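For anyone trying to reproduce these numbers, a load test of this shape could look like the sketch below. It is not the reporter's actual harness: QueryLoadTest and meanLatencyMillis are hypothetical names, and the Runnable is a placeholder for whatever executes the query (through Hibernate Search or through plain Lucene).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness: runs the same query from several threads and reports mean latency. */
final class QueryLoadTest {
    /** Runs query `iterations` times on each of `threads` threads; returns mean ms per call. */
    static double meanLatencyMillis(int threads, int iterations, Runnable query) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                workers.add(() -> {
                    long totalNanos = 0;
                    for (int i = 0; i < iterations; i++) {
                        long start = System.nanoTime();
                        query.run();
                        totalNanos += System.nanoTime() - start;
                    }
                    return totalNanos;
                });
            }
            long totalNanos = 0;
            // invokeAll blocks until every worker has finished.
            for (Future<Long> result : pool.invokeAll(workers)) {
                totalNanos += result.get();
            }
            return totalNanos / 1_000_000.0 / ((long) threads * iterations);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling it once with 1 thread and once with 10 threads against the same query gives the per-user comparison described above. The sketch has no warm-up phase, so the first iterations include JIT and cache effects.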
Now, if I execute the same query on the same index and the same host with only the Lucene
API, the query takes around 100 msec with 10 concurrent users. I tried to use the Lucene
API from Hibernate Search but it did not change anything.
What am I missing? The profiling result is attached.