[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-194) Inconsistent performance between hibernate search and pure lucene access

Thu May 29 10:20:33 EDT 2008

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_30277 ] 

Sanne Grinovero commented on HSEARCH-194:
-----------------------------------------

I should raise the priority of my Reader, the one I spoke about in Orlando; I have a lot of code ready, but it is not based on Search's architecture at all, so it needs more work.
I froze development on it because I'm not sure what transactional safety you want to guarantee in Search. With my design I really can't guarantee
that the IndexReader is always up to date: it must be acceptable not to see some already-committed changes, even in new transactions.
Repeatable reads are easy, consistency with the database contents is not; also, in my specific case we never remove from the DB, which makes things simpler.

I don't understand your last post: if you think the scalability bottleneck is the IndexReader, why does it behave differently from using Lucene directly?
I suspected the DirectoryProvider locks more (which I still don't fully understand, as the locking code is spread all around),
or maybe the isCurrent() check you do on each new request in SharedReaderProvider.openReader.

I'm going to experiment with a SharedReaderProvider that can skip isCurrent() if it has already run it "recently" (a configurable parameter?),
but I don't like to commit experiments; is there some way for us to share experimental/concept code (a sandbox)?
The tests Emmanuel made would also be useful.
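
A minimal sketch of the "skip isCurrent() if checked recently" idea, independent of the actual SharedReaderProvider code. ThrottledFreshnessCheck, needsRefresh() and the injected clock are hypothetical names; the BooleanSupplier stands in for a call to IndexReader.isCurrent(). The trade-off is the one described above: within the interval a stale reader may be handed out.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical helper: runs an expensive freshness check at most once per interval. */
final class ThrottledFreshnessCheck {
    private final BooleanSupplier isCurrent;  // assumption: wraps IndexReader.isCurrent()
    private final long minIntervalNanos;      // the "recently" window, configurable
    private final LongSupplier clock;         // e.g. System::nanoTime; injectable for tests
    private final AtomicLong lastCheckNanos;

    ThrottledFreshnessCheck(BooleanSupplier isCurrent, long minIntervalNanos, LongSupplier clock) {
        this.isCurrent = isCurrent;
        this.minIntervalNanos = minIntervalNanos;
        this.clock = clock;
        // Back-date the last check so that the first call always performs the real check.
        this.lastCheckNanos = new AtomicLong(clock.getAsLong() - minIntervalNanos);
    }

    /** Returns true only when the real check ran and reported the reader as stale. */
    boolean needsRefresh() {
        long now = clock.getAsLong();
        long last = lastCheckNanos.get();
        if (now - last < minIntervalNanos) {
            return false; // checked recently: assume the reader is still current
        }
        if (!lastCheckNanos.compareAndSet(last, now)) {
            return false; // another thread has just claimed this check
        }
        return !isCurrent.getAsBoolean();
    }
}
```

With System::nanoTime as the clock and an interval of, say, one second, concurrent queries would pay for the real check at most about once per second instead of once per request.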

Also, could someone explain this comment in SharedReaderProvider to me:
"directoryProviderLock.lock(); //needed for same problem as the double-checked locking"
I know about the (in)famous double-checked locking issue but I don't see the comparison.
Is this comment only about visibility? The lock does not stop other threads from concurrently building several readers,
and every reader being opened wants to load its data, locking the FS.
Possibly the performance bottleneck would actually be resolved by holding the lock longer,
until the ==null and isCurrent() checks are done and (if needed) a new reader is created.
Am I wrong? Probably there is some other lock around that I'm not seeing;
in that case we could use a ConcurrentHashMap in place of the HashMap and remove the locking.
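
A rough sketch of how the two suggestions could combine, not taken from the Search code base: ConcurrentHashMap.compute() runs the null check, the freshness check and the reopen atomically per key, so threads asking for the same directory never open several readers at once, while different directories do not block each other. PerKeyReaderCache, opener and isCurrent are hypothetical stand-ins (for opening an IndexReader and for isCurrent()); closing or reference-counting the replaced reader is deliberately left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical per-directory reader cache without an explicit lock. */
final class PerKeyReaderCache<K, R> {
    private final ConcurrentHashMap<K, R> readers = new ConcurrentHashMap<>();
    private final Function<K, R> opener;   // assumption: opens a new reader; must not return null
    private final Predicate<R> isCurrent;  // assumption: wraps reader.isCurrent()

    PerKeyReaderCache(Function<K, R> opener, Predicate<R> isCurrent) {
        this.opener = opener;
        this.isCurrent = isCurrent;
    }

    /** Returns the cached reader if present and current, otherwise opens a new one. */
    R get(K key) {
        // compute() is atomic per key: check and reopen cannot interleave with
        // another thread's check and reopen for the same key.
        return readers.compute(key, (k, existing) ->
                (existing != null && isCurrent.test(existing)) ? existing : opener.apply(k));
    }
}
```

Opening a reader inside compute() blocks other updates to that key (and possibly to keys in the same bin) for the duration, so this only helps if reopening is rare compared to lookups.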

> Inconsistent performance between hibernate search and pure lucene access
> ------------------------------------------------------------------------
>
>                 Key: HSEARCH-194
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-194
>             Project: Hibernate Search
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 3.0.1.GA
>         Environment: Linux - Hibernate 3.2.6, Hibernate Annotations 3.3.1 - Lucene 2.3.1
>            Reporter: Stephane Nicoll
>            Priority: Critical
>         Attachments: Monitor_Usage_Statistics.html
>
>
> I have a simple index that contains:
> * id (pk of the entity)
> * keywords (a list of tokens)
> The index contains 100.000 objects and the keywords field has 2 tokens from a list of 40 different values
> What I want to do is retrieve all the IDs that match a given Lucene query on the keywords. So for that I'm doing something like:
> FullTextSession fullTextSession = Search.createFullTextSession(session);
> QueryParser parser = new QueryParser("keywords", luceneAnalyzer);
> org.apache.lucene.search.Query hibernateQuery = parser.parse("foo AND bar");
> FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(hibernateQuery, target);
> fullTextQuery.setProjection("id");
> fullTextQuery.setResultTransformer(resultTransformer);
> Iterator it = fullTextQuery.iterate();
> Where ResultTransformer is
> private static class FirstObjectResultTransformer implements ResultTransformer {
>         public Object transformTuple(Object[] objects, String[] strings) {
>             return objects[0];
>         }
>         public List transformList(List list) {
>             return list;
>         }
>     }
> If I do a load test with a single thread, the execution time of my Lucene query is around 200 msec. If I do a load test with 10 threads, the execution time is 2 sec (per user!). If I run the profiler on the service, I see lots of deadlocks on SegmentReader.
> Switching to a "non-shared" strategy removes the deadlocks but it's still slow (1.5 sec).
> Now, if I execute the same query on the same index and the same host with only the Lucene API, the query takes around 100 msec with 10 concurrent users. I tried to use the Lucene API from Hibernate Search but it did not change anything.
> What am I missing? Attached the profiling result.
