[
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-121?pag...
]
John Griffin commented on HSEARCH-121:
--------------------------------------
OK, guys, here's my 2 1/2 cents. Everything I say here is in general terms. Search speed
depends on the number of files the index is spread across. The larger the
segment file count, the slower searches are. Indexing speed also depends on segment file
count, but in inverse proportion to search speed: as search speed goes up, indexing speed
goes down, and vice versa. That said, the three settings
maxMergeDocs, maxBufferedDocs and mergeFactor all affect these two speeds AND each other.
I have never had occasion to change maxMergeDocs from its default, so I won't talk
about that one. If you need me to, let me know.
I've included an Open Office spreadsheet with some test data where I vary these two
quantities (mergeFactor and maxBufferedDocs), changing one while keeping the other
constant (keep all variables constant except the one you're testing).
Looking at graph 1, as mergeFactor increases the number of segment files also increases,
almost in direct proportion to the mergeFactor value. Searches are slower because of the
file count increase, but indexing is faster due to less merging.
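To make the mergeFactor trade-off concrete, here is a toy Java model of a logarithmic
merge scheme. It is a simplified sketch, not Lucene's actual merge code: it assumes a new
segment is flushed every maxBufferedDocs documents and that mergeFactor segments of the
same level are merged into one segment of the next level. The names MergeFactorModel and
segmentCount are made up for illustration.

```java
class MergeFactorModel {

    /**
     * Segments left on disk after adding docCount documents, under the toy
     * assumptions stated above. Hypothetical helper, not part of Lucene.
     */
    static int segmentCount(int docCount, int maxBufferedDocs, int mergeFactor) {
        if (docCount < 0 || maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException(
                    "need docCount >= 0, maxBufferedDocs >= 1, mergeFactor >= 2");
        }
        // perLevel[i] = number of segments currently sitting at merge level i
        int[] perLevel = new int[32];
        int flushes = docCount / maxBufferedDocs;
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            perLevel[level]++;                 // a full buffer becomes a level-0 segment
            while (perLevel[level] == mergeFactor) {
                perLevel[level] = 0;           // mergeFactor segments are merged away...
                level++;
                perLevel[level]++;             // ...into one segment at the next level
            }
        }
        int total = 0;
        for (int n : perLevel) {
            total += n;
        }
        // a partially filled buffer is written out as one more segment on close
        if (docCount % maxBufferedDocs != 0) {
            total++;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] {2, 10, 100}) {
            System.out.println("mergeFactor=" + mergeFactor + " -> "
                    + segmentCount(9990, 10, mergeFactor) + " segments");
        }
    }
}
```

With 9990 documents and a buffer of 10, this model leaves 8, 27 and 108 segments for
mergeFactor 2, 10 and 100 respectively. The exact numbers depend on where you stop adding
documents, but a larger mergeFactor generally leaves more segments behind and does fewer
merges along the way.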
Graph 2 shows maxBufferedDocs as documents are added. This variable governs how many
segment files are open at any one time and also the total number of files allowed open in
the index directory. This graph is a little hard to read so I'll try to come up with
something better (emmanuel ;>) ). This graph shows that as maxBufferedDocs becomes
larger it takes longer for the segment count to increase. Because more documents end up
in a smaller number of files and merges take place in a RAMDirectory, indexing is faster,
but the increased count of other files (besides segments) causes
searching to be slower. The bottom line is that users are going to have to experiment with
these quantities to determine the best settings for their environment. The values given in
the Javadocs are just for starters.
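The "takes longer for the segment count to increase" effect can be sketched with a small
Java toy model. This is an illustration under stated assumptions, not Lucene's real
behaviour: it assumes one segment is written each time maxBufferedDocs documents have been
buffered, and that mergeFactor same-level segments collapse into one. The names
BufferedDocsModel and docsUntilSegments are hypothetical.

```java
class BufferedDocsModel {

    /**
     * Documents that must be added before the toy index first holds
     * targetSegments on-disk segments. Hypothetical helper, not part of Lucene.
     * Meant for small targets: the loop walks document by document.
     */
    static long docsUntilSegments(int targetSegments, int maxBufferedDocs, int mergeFactor) {
        if (targetSegments < 0 || maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException(
                    "need targetSegments >= 0, maxBufferedDocs >= 1, mergeFactor >= 2");
        }
        int[] perLevel = new int[64];  // perLevel[i] = segments at merge level i
        long docs = 0;
        int buffered = 0;
        int segments = 0;
        while (segments < targetSegments) {
            docs++;
            buffered++;
            if (buffered == maxBufferedDocs) {
                buffered = 0;
                int level = 0;
                perLevel[level]++;             // flush the buffer as a new segment
                segments++;
                while (perLevel[level] == mergeFactor) {
                    perLevel[level] = 0;       // merge mergeFactor segments...
                    segments -= mergeFactor;
                    level++;
                    perLevel[level]++;         // ...into one at the next level
                    segments++;
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        for (int maxBufferedDocs : new int[] {10, 100, 1000}) {
            System.out.println("maxBufferedDocs=" + maxBufferedDocs + " -> "
                    + docsUntilSegments(12, maxBufferedDocs, 10) + " docs for 12 segments");
        }
    }
}
```

In this model, the number of documents needed to reach a given segment count grows in
direct proportion to maxBufferedDocs (390, 3900 and 39000 documents for 12 segments with
mergeFactor 10). It says nothing about RAM use or search cost, which still have to be
measured.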
These values DO affect optimization, since optimization is basically a merge process.
That's why I've always suggested that batch builds have one set of values and
interactive indexes have others. That way you get the best performance regardless of what
you're doing.
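One possible way to keep the two sets of values apart in application code is a pair of
named profiles. The sketch below is only an assumption about how that could look; the
class, enum and method names are hypothetical, and the numbers are placeholders to be
replaced by values measured in your own environment.

```java
class IndexingProfiles {

    enum Profile {
        // Placeholder numbers only, not recommendations.
        BATCH(100, 1000),      // bulk builds: favour indexing throughput
        INTERACTIVE(10, 10);   // live updates: favour search speed

        final int mergeFactor;
        final int maxBufferedDocs;

        Profile(int mergeFactor, int maxBufferedDocs) {
            this.mergeFactor = mergeFactor;
            this.maxBufferedDocs = maxBufferedDocs;
        }
    }

    /** Picks the profile for the kind of work about to be done. */
    static Profile choose(boolean batchBuild) {
        return batchBuild ? Profile.BATCH : Profile.INTERACTIVE;
    }

    public static void main(String[] args) {
        Profile p = choose(true);
        System.out.println(p + ": mergeFactor=" + p.mergeFactor
                + ", maxBufferedDocs=" + p.maxBufferedDocs);
    }
}
```

The chosen profile's fields would then be passed to whatever configures the index writer.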
Hope this helps. If you need anything else let me know.
John G.
Should optimize set batch to true (for indexWriters)?
-----------------------------------------------------
Key: HSEARCH-121
URL:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-121
Project: Hibernate Search
Issue Type: Bug
Components: documentation, engine
Reporter: Emmanuel Bernard
Assignee: John Griffin
Attachments: mergefactorGraph.ods
John, can you shed some light on this issue?