[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-194) Inconsistent performance between hibernate search and pure lucene access

Emmanuel Bernard (JIRA) noreply at atlassian.com
Thu May 29 14:29:33 EDT 2008


    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_30279 ] 

Emmanuel Bernard commented on HSEARCH-194:
------------------------------------------

About a new ReaderProvider: let's start a discussion on the dev mailing list, with some design ideas.

The scalability gap I saw against pure Lucene came from the fact that my Lucene test did not share the IndexSearcher: it used one searcher per thread. In the end that made it much faster on my dual-core machine in this extreme test (100 parallel queries).
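
To make the two setups concrete, here is a minimal Java sketch of "one shared searcher" versus "one searcher per thread". It is only an illustration: DemoSearcher and SearcherStrategies are hypothetical names, DemoSearcher is an empty stand-in rather than Lucene's IndexSearcher, and nothing here is taken from the actual test.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a searcher; a real test would wrap Lucene's IndexSearcher.
final class DemoSearcher { }

final class SearcherStrategies {
    // Strategy A: every thread receives the same single instance.
    static Supplier<DemoSearcher> shared() {
        final DemoSearcher single = new DemoSearcher();
        return () -> single;
    }

    // Strategy B: each thread lazily creates and keeps its own instance.
    static Supplier<DemoSearcher> perThread() {
        final ThreadLocal<DemoSearcher> local = ThreadLocal.withInitial(DemoSearcher::new);
        return local::get;
    }

    // Starts `threads` workers and counts how many distinct instances they obtained.
    static int distinctInstances(Supplier<DemoSearcher> strategy, int threads) {
        Set<DemoSearcher> seen = Collections.synchronizedSet(
                Collections.newSetFromMap(new IdentityHashMap<DemoSearcher, Boolean>()));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> seen.add(strategy.get()));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }
}
```

With the shared strategy all workers contend on one object; with the per-thread strategy each worker has its own, so there is no contention between them at the cost of holding more instances.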

The DirectoryProvider itself is locked when you update data, not when you read. Pure reads (through the ReaderProvider) are not locked in that regard. But the structure that keeps the cached IndexReaders for a given DirectoryProvider does need a lock around it. The "//needed for same problem as the double-checked locking" comment is basically about ensuring a memory barrier, so that the instructions are not reordered by the VM.
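
As a rough sketch of that idea (not the Hibernate Search code), the structure holding the cached readers can be guarded by a lock as below. ReaderCache and CachedReader are hypothetical names, and CachedReader is a stand-in for an opened IndexReader. In Java, releasing a lock happens-before a later acquisition of the same lock, which is what provides the memory barrier: a thread that finds a reader in the map also sees it fully constructed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an opened index reader; not a Lucene class.
final class CachedReader {
    final String directoryName;

    CachedReader(String directoryName) {
        this.directoryName = directoryName;
    }
}

// Hypothetical cache keeping one shared reader per directory name.
final class ReaderCache {
    private final Map<String, CachedReader> readers = new HashMap<String, CachedReader>();
    private final ReentrantLock lock = new ReentrantLock();
    private int opened = 0;

    // Returns the cached reader for the directory, opening it on first use.
    CachedReader getReader(String directoryName) {
        lock.lock(); // acquiring the lock makes earlier writes under it visible
        try {
            CachedReader reader = readers.get(directoryName);
            if (reader == null) {
                reader = new CachedReader(directoryName);
                readers.put(directoryName, reader);
                opened++;
            }
            return reader;
        } finally {
            lock.unlock(); // releasing the lock publishes the writes made above
        }
    }

    // Number of readers opened so far, read under the same lock.
    int openedCount() {
        lock.lock();
        try {
            return opened;
        } finally {
            lock.unlock();
        }
    }
}
```

Only the lookup in the cache structure is serialized here; whatever the caller then does with the returned reader runs outside the lock.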

I will commit my test, probably today or tomorrow. You can commit the experiment in the test directory and we can move it eventually. From the tests I've done, isCurrent was not really the issue.



> Inconsistent performance between hibernate search and pure lucene access
> ------------------------------------------------------------------------
>
>                 Key: HSEARCH-194
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-194
>             Project: Hibernate Search
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 3.0.1.GA
>         Environment: Linux - Hibernate 3.2.6, Hibernate Annotations 3.3.1 - Lucene 2.3.1
>            Reporter: Stephane Nicoll
>            Priority: Critical
>         Attachments: Monitor_Usage_Statistics.html
>
>
> I have a simple index that contains:
> * id (pk of the entity)
> * keywords (a list of tokens)
> The index contains 100,000 objects and the keywords field has 2 tokens from a list of 40 different values.
> What I want to do is retrieve all the IDs that match a given lucene query on the keywords. So for that I'm doing something like:
> FullTextSession fullTextSession = Search.createFullTextSession(session);
> QueryParser parser = new QueryParser("keywords", luceneAnalyzer);
> org.apache.lucene.search.Query hibernateQuery = parser.parse("foo AND bar");
> FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(hibernateQuery, target);
> fullTextQuery.setProjection("id");
> fullTextQuery.setResultTransformer(resultTransformer);
> Iterator it = fullTextQuery.iterate();
> Where ResultTransformer is
> private static class FirstObjectResultTransformer implements ResultTransformer {
>         public Object transformTuple(Object[] objects, String[] strings) {
>             return objects[0];
>         }
>         public List transformList(List list) {
>             return list;
>         }
>     }
> If I do a load test with a single thread, the execution time of my lucene query is around 200 msec. If I do a load test with 10 threads, the execution time is 2 sec (per user!). If I run the profiler on the service, I see lots of deadlocks on SegmentReader.
> Switching to a "non-shared" strategy removes the deadlocks but it's still slow (1.5 sec).
> Now, if I execute the same query on the same index and the same host with only the lucene API, the query takes around 100 msec with 10 concurrent users. I tried to use the lucene API from Hibernate Search but it did not change anything.
> What am I missing? Attached the profiling result.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://opensource.atlassian.com/projects/hibernate/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


