[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-121) Should optimize set batch to true (for indexWriters)?

Tue Oct 23 13:26:38 EDT 2007

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_28519 ] 

John Griffin commented on HSEARCH-121:
--------------------------------------

OK, guys, here's my 2 1/2 cents. Everything I say here is in general terms. Search speed depends on the number of files the index is spread across: the larger the segment file count, the slower searches are. Indexing speed also depends on the segment file count, but in inverse proportion to search speed. As search speed goes up, indexing speed goes down, and vice versa. That said, the three settings maxMergeDocs, maxBufferedDocs and mergeFactor all affect these two speeds AND each other. I have never had occasion to change maxMergeDocs from its default, so I won't talk about that one. If you need me to, let me know.

I've included an OpenOffice spreadsheet with some test data in which I vary each of the other two quantities (mergeFactor and maxBufferedDocs) while keeping the other one constant (basic experimental method: keep all variables constant except the one you're testing).

Looking at graph 1, as mergeFactor increases, the number of segment files also increases almost in direct proportion to the mergeFactor value. Searches are slower because of the increased file count, but indexing is faster because less merging takes place.
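To make the mergeFactor trade-off concrete, here is a small Java sketch. It is a simplified model of log-style merging, not Lucene's actual IndexWriter code, and the names MergeModel and simulate are hypothetical. It reports the most segments present at once (a rough proxy for search cost) and the number of documents copied by merges (a rough proxy for indexing cost).

```java
/**
 * Simplified model (not Lucene's actual implementation) of log-style merging:
 * documents are flushed from RAM as a new segment every maxBufferedDocs
 * documents, and whenever mergeFactor segments accumulate at one level they
 * are merged into a single segment at the next level. Requires mergeFactor >= 2.
 */
public class MergeModel {

    /** peakSegments: most segments on disk at once; docsRewritten: total merge work. */
    public record Result(int peakSegments, long docsRewritten) {}

    public static Result simulate(int totalDocs, int maxBufferedDocs, int mergeFactor) {
        int[] perLevel = new int[64];              // segments waiting at each merge level
        int segments = 0;
        int peak = 0;
        long docsRewritten = 0;
        int flushes = totalDocs / maxBufferedDocs; // a partial final buffer is ignored
        for (int f = 0; f < flushes; f++) {
            perLevel[0]++;                         // RAM buffer flushed as a new segment
            segments++;
            long size = maxBufferedDocs;
            // Cascade: mergeFactor segments at one level become one segment a level up.
            for (int level = 0; perLevel[level] == mergeFactor; level++) {
                perLevel[level] = 0;
                perLevel[level + 1]++;
                segments -= mergeFactor - 1;
                size *= mergeFactor;
                docsRewritten += size;             // every merged document is copied once
            }
            peak = Math.max(peak, segments);
        }
        return new Result(peak, docsRewritten);
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] {2, 10, 50}) {
            Result r = simulate(100_000, 10, mergeFactor);
            System.out.printf("mergeFactor=%d: peak segments=%d, docs rewritten by merges=%d%n",
                    mergeFactor, r.peakSegments(), r.docsRewritten());
        }
    }
}
```

Under this model, with 100,000 documents and maxBufferedDocs=10, raising mergeFactor from 2 to 10 to 50 raises the peak segment count (13, 36, 101) while the merge work drops (1,234,560, 400,000 and 200,000 documents rewritten). These are model outputs, not the measurements in the attached spreadsheet.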

Graph 2 shows the effect of maxBufferedDocs as documents are added. This variable governs how many segment files are open at any one time and also the total number of files allowed open in the index directory. This graph is a little hard to read, so I'll try to come up with something better (emmanuel ;>) ). It shows that as maxBufferedDocs becomes larger, it takes longer for the segment count to increase. Because more documents end up in a smaller number of files and merges take place in a RAMDirectory, indexing is faster, but the increased count of other files (besides segments) causes searching to be slower. The bottom line is that users are going to have to experiment with these quantities to determine the best settings for their environment. The values given in the Javadocs are just starting points.
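A similar hypothetical sketch shows why a larger maxBufferedDocs slows the growth of the segment count: under a simplified log-merge model, a segment is only written when the RAM buffer flushes, and the segments left after k flushes correspond to the digits of k written in base mergeFactor. BufferModel, segmentsAfter and docsToReach are illustrative names, and the model ignores Lucene's real file layout and RAMDirectory details.

```java
/**
 * Simplified model (not Lucene code): documents are buffered in RAM and
 * flushed as a new on-disk segment every maxBufferedDocs documents; with
 * log-style merging, the segments left after k flushes correspond to the
 * digits of k written in base mergeFactor. Requires mergeFactor >= 2.
 */
public class BufferModel {

    /** Number of on-disk segments after totalDocs documents have been added. */
    public static int segmentsAfter(int totalDocs, int maxBufferedDocs, int mergeFactor) {
        int flushes = totalDocs / maxBufferedDocs; // docs still in the RAM buffer are not counted
        int segments = 0;
        while (flushes > 0) {
            segments += flushes % mergeFactor;     // segments waiting at this merge level
            flushes /= mergeFactor;
        }
        return segments;
    }

    /** First document count at which the segment count reaches target, or -1 within limit. */
    public static int docsToReach(int target, int maxBufferedDocs, int mergeFactor, int limit) {
        for (int docs = 0; docs <= limit; docs++) {
            if (segmentsAfter(docs, maxBufferedDocs, mergeFactor) >= target) {
                return docs;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;
        for (int maxBufferedDocs : new int[] {10, 100, 1000}) {
            System.out.printf("maxBufferedDocs=%d: 5 segments after %d docs%n",
                    maxBufferedDocs, docsToReach(5, maxBufferedDocs, mergeFactor, 100_000));
        }
    }
}
```

With mergeFactor=10, the model needs 50, 500 and 5,000 documents to reach 5 segments for maxBufferedDocs of 10, 100 and 1,000.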

These values DO affect optimization, since optimization is basically a merge process. That's why I've always suggested that batch builds use one set of values and interactive indexing another. That way you get the best performance regardless of what you're doing.
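The "one set of values for batch builds, another for interactive indexing" idea can be sketched as two tuning profiles. The IndexTuning class, its enum and record, and all the numbers below are hypothetical illustrations, not Hibernate Search or Lucene API or recommended defaults; per the comment above, the real values have to come from experimenting in your own environment.

```java
/**
 * Hypothetical sketch: separate IndexWriter tuning profiles for batch
 * (re)builds and for interactive, incremental indexing. Names and numbers
 * are illustrative assumptions only.
 */
public class IndexTuning {

    public enum Mode { BATCH, INTERACTIVE }

    /** The three knobs discussed in this comment. */
    public record Settings(int mergeFactor, int maxBufferedDocs, int maxMergeDocs) {}

    // Interactive: a small mergeFactor keeps the segment count (and search cost) low.
    static final Settings INTERACTIVE = new Settings(10, 10, Integer.MAX_VALUE);

    // Batch: larger values mean more documents per flush and less merging while
    // the index is built, at the price of more segments until it is optimized.
    static final Settings BATCH = new Settings(50, 1000, Integer.MAX_VALUE);

    public static Settings settingsFor(Mode mode) {
        return switch (mode) {
            case BATCH -> BATCH;
            case INTERACTIVE -> INTERACTIVE;
        };
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + settingsFor(mode));
        }
    }
}
```

Because optimization is itself a merge, an optimize run is affected by whichever profile is in effect when it executes.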

Hope this helps. If you need anything else let me know.

John G.

> Should optimize set batch to true (for indexWriters)?
> -----------------------------------------------------
>
>                 Key: HSEARCH-121
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-121
>             Project: Hibernate Search
>          Issue Type: Bug
>          Components: documentation, engine
>            Reporter: Emmanuel Bernard
>            Assignee: John Griffin
>         Attachments: mergefactorGraph.ods
>
>
> John, can you shed some light on that issue?
