Author: hardy.ferentschik
Date: 2008-12-02 11:33:47 -0500 (Tue, 02 Dec 2008)
New Revision: 15642
Modified:
search/trunk/doc/reference/en/modules/batchindex.xml
search/trunk/doc/reference/en/modules/lucene-native.xml
search/trunk/doc/reference/en/modules/optimize.xml
Log:
HSEARCH-303
Modified: search/trunk/doc/reference/en/modules/batchindex.xml
===================================================================
--- search/trunk/doc/reference/en/modules/batchindex.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/batchindex.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,8 +22,8 @@
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
-
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter id="search-batchindex">
<!-- $Id$ -->
@@ -32,37 +32,36 @@
<section id="search-batchindex-indexing">
<title>Indexing</title>
- <para>It is sometimes useful to index an object even if this object is not
- inserted nor updated to the database. This is especially true when you
- want to build your index for the first time. You can achieve that goal
- using the <classname>FullTextSession</classname>.</para>
+ <para>It is sometimes useful to index an entity even if this entity is not
+ inserted into or updated in the database. This is, for example, the case when
+ you want to build your index for the first time.
+ <classname>FullTextSession</classname>.<methodname>index()</methodname>
+ allows you to do so.</para>
- <programlisting>FullTextSession fullTextSession =
Search.getFullTextSession(session);
+ <example>
+ <title>Indexing an entity via
+ <methodname>FullTextSession.index()</methodname></title>
+
+ <programlisting>FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
<emphasis
role="bold">fullTextSession.index(customer);</emphasis>
}
tx.commit(); //index are written at commit time </programlisting>
+ </example>
<para>For maximum efficiency, Hibernate Search batches index operations
- and executes them at commit time (Note: you don't need to use
- <classname>org.hibernate.Transaction</classname> in a JTA
- environment).</para>
+ and executes them at commit time. If you expect to index a lot of data,
+ however, you need to be careful about memory consumption since all
+ documents are kept in a queue until the transaction commit. You can
+ potentially face an <classname>OutOfMemoryError</classname>. To avoid
+ this error, you can use
+ <methodname>fullTextSession.flushToIndexes()</methodname>. Every time
+ <methodname>fullTextSession.flushToIndexes()</methodname> is called (or if
+ the transaction is committed), the batch queue is processed (freeing
+ memory) and all index changes are applied. Be aware that, once flushed,
+ changes cannot be rolled back.</para>
- <para>If you expect to index a lot of data, you need to be careful about
- memory consumption: since all documents are kept in a queue until the
- transaction commit, you can potentially face an
- <classname>OutOfMemoryException</classname>.</para>
-
- <para>To avoid that, you can use
- <methodname>fullTextSession.flushToIndexes()</methodname>: all index
- operations are queued until
- <methodname>fullTextSession.flushToIndexes()</methodname> is called.
Every
- time <methodname>fullTextSession.flushToIndexes()</methodname> is called
- (or if the transaction is committed), the queue is processed (freeing
- memory) and emptied. Be aware that changes made before a flush cannot be
- rollbacked. </para>
-
<note>
<para><literal>hibernate.search.worker.batch_size</literal> has
been
deprecated in favor of this explicit API which provides better
@@ -70,26 +69,43 @@
</note>
<para>Other parameters which also can affect indexing time and memory
- consumption are
- <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.max_buffered_docs</literal>
- ,
- <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.max_field_length</literal>
- ,
- <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.max_merge_docs</literal>
- ,
- <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.merge_factor</literal>
- ,
- <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.ram_buffer_size</literal>
- and
- <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.batch.term_index_interval</literal>
- . These parameters are Lucene specific and Hibernate Search is just
+ consumption are:</para>
+
+ <itemizedlist>
+ <listitem>
+ <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[batch|transaction].max_buffered_docs</literal>
+ </listitem>
+
+ <listitem>
+ <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[batch|transaction].max_field_length</literal>
+ </listitem>
+
+ <listitem>
+ <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[batch|transaction].max_merge_docs</literal>
+ </listitem>
+
+ <listitem>
+ <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[batch|transaction].merge_factor</literal>
+ </listitem>
+
+ <listitem>
+ <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[batch|transaction].ram_buffer_size</literal>
+ </listitem>
+
+ <listitem>
+ <literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.[batch|transaction].term_index_interval</literal>
+ </listitem>
+ </itemizedlist>
+
+ <para>These parameters are Lucene specific and Hibernate Search is just
passing these parameters through - see <xref
linkend="lucene-indexing-performance" /> for more details.</para>
- <para>Here is an especially efficient way to index a given class (useful
- for index (re)initialization):</para>
+ <example>
+ <title>Efficiently indexing a given class (useful for index
+ (re)initialization)</title>
- <programlisting>fullTextSession.setFlushMode(FlushMode.MANUAL);
+ <programlisting>fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
@@ -106,9 +122,10 @@
}
}
transaction.commit();</programlisting>
+ </example>
- <para>Try to use a batch size that guaranty that your application will not
- run out of memory.</para>
+ <para>Try to use a batch size that guarantees that your application will
+ not run out of memory.</para>
</section>
<section>
@@ -116,29 +133,38 @@
<para>It is equally possible to remove an entity or all entities of a
given type from a Lucene index without the need to physically remove them
- from the database. This operation is named purging and is done through the
- <classname>FullTextSession</classname>.</para>
+ from the database. This operation is named purging and is also done
+ through the <classname>FullTextSession</classname>.</para>
- <programlisting>FullTextSession fullTextSession =
Search.getFullTextSession(session);
+ <example>
+ <title>Purging a specific instance of an entity from the index</title>
+
+ <programlisting>FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
<emphasis role="bold">fullTextSession.purge( Customer.class,
customer.getId() );</emphasis>
}
tx.commit(); //index are written at commit time </programlisting>
+ </example>
<para>Purging will remove the entity with the given id from the Lucene
index but will not touch the database.</para>
<para>If you need to remove all entities of a given type, you can use the
- <methodname>purgeAll</methodname> method. This operation remove all
entities of the type passed
- as a parameter as well as all its subtypes.</para>
+ <methodname>purgeAll</methodname> method. This operation removes all
+ entities of the type passed as a parameter as well as all its
+ subtypes.</para>
- <programlisting>FullTextSession fullTextSession =
Search.getFullTextSession(session);
+ <example>
+ <title>Purging all instances of an entity from the index</title>
+
+ <programlisting>FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
<emphasis role="bold">fullTextSession.purgeAll( Customer.class
);</emphasis>
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //index are written at commit time </programlisting>
+ </example>
<para>It is recommended to optimize the index after such an
operation.</para>
@@ -150,4 +176,4 @@
well.</para>
</note>
</section>
-</chapter>
\ No newline at end of file
+</chapter>
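
Note on the batchindex.xml change above: the memory argument for
flushToIndexes() can be illustrated without Hibernate Search on the
classpath. The following is a minimal sketch under stated assumptions, not
Hibernate Search code. BatchFlushSketch, its queue and all of its methods are
hypothetical stand-ins that only mimic the documented behaviour (index work
is queued until a flush or the commit). It shows that flushing every
batchSize entities keeps the queue at no more than batchSize entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the queue of pending index work.
// This is NOT a Hibernate Search class; it only mimics the documented behaviour.
final class BatchFlushSketch {
    private final List<String> queue = new ArrayList<>();   // pending index work
    private final List<String> written = new ArrayList<>(); // work applied to the "index"
    private int peakQueueSize = 0;

    // Mimics fullTextSession.index(entity): the work is only queued.
    void index(String doc) {
        queue.add(doc);
        peakQueueSize = Math.max(peakQueueSize, queue.size());
    }

    // Mimics fullTextSession.flushToIndexes(): apply queued work, free the queue.
    void flushToIndexes() {
        written.addAll(queue);
        queue.clear();
    }

    int peakQueueSize() {
        return peakQueueSize;
    }

    int writtenCount() {
        return written.size();
    }

    // Indexes totalDocs documents, flushing after every batchSize documents.
    static BatchFlushSketch run(int totalDocs, int batchSize) {
        BatchFlushSketch sketch = new BatchFlushSketch();
        for (int i = 1; i <= totalDocs; i++) {
            sketch.index("doc-" + i);
            if (i % batchSize == 0) {
                sketch.flushToIndexes();
            }
        }
        sketch.flushToIndexes(); // stands in for the final transaction commit
        return sketch;
    }
}
```

With totalDocs = 10 and batchSize = 3, the queue never holds more than 3
entries and all 10 documents are written after the final flush. Without
intermediate flushes (batchSize larger than totalDocs) the queue grows to the
full data set, which is the situation the documentation warns about.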
Modified: search/trunk/doc/reference/en/modules/lucene-native.xml
===================================================================
--- search/trunk/doc/reference/en/modules/lucene-native.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/lucene-native.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,8 +22,8 @@
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
-
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter id="search-lucene-native">
<!-- $Id$ -->
@@ -37,8 +37,12 @@
way to access Lucene natively. The <classname>SearchFactory</classname>
can be accessed from a
<classname>FullTextSession</classname>:</para>
- <programlisting>FullTextSession fullTextSession =
Search.getFullTextSession(regularSession);
+ <example>
+ <title>Accessing the <classname>SearchFactory</classname></title>
+
+ <programlisting>FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();</programlisting>
+ </example>
</section>
<section>
@@ -51,12 +55,16 @@
<classname>DirectoryProvider</classname>s per indexed class. One
directory
provider can be shared amongst several indexed classes if the classes
share the same underlying index directory. While usually not the case, a
- given entity can have several <classname>DirectoryProvider</classname>s
is
+ given entity can have several <classname>DirectoryProvider</classname>s if
the index is sharded (see <xref
linkend="search-configuration-directory-sharding" />).</para>
- <programlisting>DirectoryProvider[] provider =
searchFactory.getDirectoryProviders(Order.class);
+ <example>
+ <title>Accessing the Lucene <classname>Directory</classname></title>
+
+ <programlisting>DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory =
provider[0].getDirectory();</programlisting>
+ </example>
<para>In this example, directory points to the lucene index storing
<classname>Order</classname>s information. Note that the obtained Lucene
@@ -68,11 +76,14 @@
<title>Using an IndexReader</title>
<para>Queries in Lucene are executed on an
<literal>IndexReader</literal>.
- Hibernate Search caches such index readers to maximize performances. Your
- code can access such cached / shared resources. You will just have to
- follow some "good citizen" rules.</para>
+ Hibernate Search caches all index readers to maximize performance. Your
+ code can access these cached resources, but you have to follow some "good
+ citizen" rules.</para>
- <programlisting>DirectoryProvider orderProvider =
searchFactory.getDirectoryProviders(Order.class)[0];
+ <example>
+ <title>Accessing an <classname>IndexReader</classname></title>
+
+ <programlisting>DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];
ReaderProvider readerProvider = searchFactory.getReaderProvider();
@@ -84,24 +95,26 @@
finally {
readerProvider.closeReader(reader);
}</programlisting>
+ </example>
<para>The ReaderProvider (described in <xref
linkend="search-architecture-readerstrategy" />), will open an
IndexReader
- on top of the index(es) referenced by the directory providers. This
- IndexReader being shared amongst several clients, you must adhere to the
- following rules:</para>
+ on top of the index(es) referenced by the directory providers. Because
+ this <classname>IndexReader</classname> is shared amongst several clients,
+ you must adhere to the following rules:</para>
<itemizedlist>
<listitem>
<para>Never call indexReader.close(), but always call
- readerProvider.closeReader(reader); (a finally block is the best
- area).</para>
+ readerProvider.closeReader(reader), preferably in a finally
+ block.</para>
</listitem>
<listitem>
- <para>This indexReader can't be used for modification operations
- (you would get an exception). If you want to use a read/write index reader,
- open one from the Lucene Directory object.</para>
+ <para>Don't use this <classname>IndexReader</classname> for
+ modification operations (you would get an exception). If you want to
+ use a read/write index reader, open one from the Lucene Directory
+ object.</para>
</listitem>
</itemizedlist>
@@ -156,10 +169,10 @@
</row>
<row>
- <entry align="left">queryNorm(q) </entry>
+ <entry align="left">queryNorm(q)</entry>
<entry>Normalizing factor used to make scores between queries
- comparable. </entry>
+ comparable.</entry>
</row>
<row>
@@ -178,7 +191,7 @@
</tgroup>
</informaltable>It is beyond the scope of this manual to explain this
formula in more detail. Please refer to
- <classname>Similarity</classname>'s Javadocs for more information.
</para>
+ <classname>Similarity</classname>'s Javadocs for more information.</para>
<para>Hibernate Search provides two ways to modify Lucene's similarity
calculation. First you can set the default similarity by specifying the
@@ -196,6 +209,6 @@
term appears in a document. Documents with a single occurrence of the term
should be scored the same as documents with multiple occurrences. In this
case your custom implementation of the method <methodname>tf(float
- freq)</methodname> should return 1.0. </para>
+ freq)</methodname> should return 1.0.</para>
</section>
-</chapter>
\ No newline at end of file
+</chapter>
Modified: search/trunk/doc/reference/en/modules/optimize.xml
===================================================================
--- search/trunk/doc/reference/en/modules/optimize.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/optimize.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,23 +22,23 @@
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
-
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter id="search-optimize">
<!-- $Id$ -->
<title>Index Optimization</title>
<para>From time to time, the Lucene index needs to be optimized. The process
- is essentially a defragmentation: until the optimization occurs deleted
- documents are just marked as such, no physical deletion is applied; the
- optimization can also adjust the number of files in the Lucene
- Directory.</para>
+ is essentially a defragmentation. Until an optimization is triggered, Lucene
+ only marks deleted documents as such; no physical deletions are applied.
+ During the optimization process the deletions are applied, which also
+ affects the number of files in the Lucene Directory.</para>
- <para>The optimization speeds up searches but in no way speeds up indexation
- (update). During an optimization, searches can be performed (but will most
- likely be slowed down), and all index updates will be stopped. Prefer
- optimizing:</para>
+ <para>Optimizing the Lucene index speeds up searches but has no effect on
+ indexing (update) performance. During an optimization, searches can be
+ performed, but will most likely be slowed down. All index updates will be
+ stopped. It is recommended to schedule optimization:</para>
<itemizedlist>
<listitem>
@@ -46,40 +46,42 @@
</listitem>
<listitem>
- <para>after a lot of index modifications (doing so before will not speed
- up the indexation process)</para>
+ <para>after a lot of index modifications</para>
</listitem>
</itemizedlist>
<section>
<title>Automatic optimization</title>
- <para>Hibernate Search can optimize automatically an index after:</para>
+ <para>Hibernate Search can automatically optimize an index after:</para>
<itemizedlist>
<listitem>
- <para>a certain amount of operations have been applied (insertion,
- deletion)</para>
+ <para>a certain number of operations (insertion, deletion)</para>
</listitem>
<listitem>
- <para>or a certain amout of transactions have been applied</para>
+ <para>or a certain number of transactions</para>
</listitem>
</itemizedlist>
- <para>The configuration can be global or defined at the index
- level:</para>
+ <para>The configuration for automatic index optimization can be defined on
+ a global level or per index:</para>
- <programlisting>hibernate.search.default.optimizer.operation_limit.max = 1000
+ <example>
+ <title>Defining automatic optimization parameters</title>
+
+ <programlisting>hibernate.search.default.optimizer.operation_limit.max = 1000
hibernate.search.default.optimizer.transaction_limit.max = 100
hibernate.search.Animal.optimizer.transaction_limit.max = 50</programlisting>
+ </example>
<para>An optimization will be triggered to the
<literal>Animal</literal>
index as soon as either:</para>
<itemizedlist>
<listitem>
- <para>the number of addition and deletion reaches 1000</para>
+ <para>the number of additions and deletions reaches 1000</para>
</listitem>
<listitem>
@@ -100,22 +102,25 @@
<para>You can programmatically optimize (defragment) a Lucene index from
Hibernate Search through the
<classname>SearchFactory</classname>:</para>
- <programlisting>searchFactory.optimize(Order.class);</programlisting>
+ <example>
+ <title>Programmatic index optimization</title>
- <programlisting>searchFactory.optimize();</programlisting>
+ <programlisting>FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
+SearchFactory searchFactory = fullTextSession.getSearchFactory();
+searchFactory.optimize(Order.class);
+// or
+searchFactory.optimize();</programlisting>
+ </example>
+
<para>The first example optimizes the Lucene index holding
<classname>Order</classname>s; the second, optimizes all
indexes.</para>
- <para>The <classname>SearchFactory</classname> can be accessed from
a
- <classname>FullTextSession</classname>:</para>
-
- <programlisting>FullTextSession fullTextSession =
Search.getFullTextSession(regularSession);
-SearchFactory searchFactory = fullTextSession.getSearchFactory();</programlisting>
-
- <para>Note that <literal>searchFactory.optimize()</literal> has no
effect
- on a JMS backend. You must apply the optimize operation on the Master
- node.</para>
+ <note>
+ <para><literal>searchFactory.optimize()</literal> has no effect on a JMS
+ backend. You must apply the optimize operation on the Master
+ node.</para>
+ </note>
</section>
<section>
@@ -151,4 +156,4 @@
</itemizedlist> See <xref linkend="lucene-indexing-performance"
/> for
more details.</para>
</section>
-</chapter>
\ No newline at end of file
+</chapter>
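
Note on the optimize.xml change above: the "as soon as either" rule for the
Animal index can be pictured as a pair of counters. This is an illustration
only. OptimizerTriggerSketch is a hypothetical class, and both the point at
which the limits are checked (here, after each transaction) and the reset of
the counters after an optimization are assumptions made for the sketch, not
documented Hibernate Search behaviour. The limits 1000 and 50 are the ones
that apply to Animal in the configuration example.

```java
// Hypothetical illustration of the "either limit triggers an optimization" rule.
// Not a Hibernate Search class; the check point and the counter reset are assumptions.
final class OptimizerTriggerSketch {
    private final int operationLimit;   // e.g. optimizer.operation_limit.max
    private final int transactionLimit; // e.g. optimizer.transaction_limit.max
    private int operations = 0;
    private int transactions = 0;
    private int optimizations = 0;

    OptimizerTriggerSketch(int operationLimit, int transactionLimit) {
        this.operationLimit = operationLimit;
        this.transactionLimit = transactionLimit;
    }

    // Records one finished transaction that applied the given number of
    // additions and deletions, then checks both limits.
    void transactionCommitted(int operationsInTransaction) {
        operations += operationsInTransaction;
        transactions++;
        if (operations >= operationLimit || transactions >= transactionLimit) {
            optimizations++;
            operations = 0;   // assumption: counters restart after an optimization
            transactions = 0;
        }
    }

    int optimizations() {
        return optimizations;
    }
}
```

For example, with new OptimizerTriggerSketch(1000, 50), fifty small
transactions trigger one optimization through the transaction limit, and a
single later transaction carrying 1000 operations triggers another through
the operation limit.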