Author: epbernard
Date: 2007-09-03 23:05:10 -0400 (Mon, 03 Sep 2007)
New Revision: 13993
Modified:
search/trunk/doc/reference/en/modules/batchindex.xml
search/trunk/doc/reference/en/modules/query.xml
Log:
documentation on purge
Modified: search/trunk/doc/reference/en/modules/batchindex.xml
===================================================================
--- search/trunk/doc/reference/en/modules/batchindex.xml 2007-09-04 02:57:31 UTC (rev 13992)
+++ search/trunk/doc/reference/en/modules/batchindex.xml 2007-09-04 03:05:10 UTC (rev 13993)
@@ -1,66 +1,60 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id$ -->
<chapter id="search-batchindex">
+ <title>Manual indexing</title>
- <title>Indexing</title>
+ <section id="search-batchindex-indexing">
+ <title>Indexing</title>
- <para>It is sometimes useful to index an object event if this object is not
- inserted nor updated to the database. This is especially true when you want
- to build your index the first time. You can achieve that goal using the
- <classname>FullTextSession</classname> .</para>
+ <para>It is sometimes useful to index an object even if this object is
+ neither inserted nor updated in the database. This is especially true when
+ you want to build your index for the first time. You can achieve that goal
+ using the <classname>FullTextSession</classname>.</para>
- <programlisting>FullTextSession fullTextSession = Search.createFullTextSession(session);
+ <programlisting>FullTextSession fullTextSession = Search.createFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
<emphasis role="bold">fullTextSession.index(customer);</emphasis>
}
tx.commit(); //index changes are written at commit time </programlisting>
- <para>For maximum efficiency, Hibernate Search batch index operations which
- and execute them at commit time (Note: you don't need to use
- <classname>org.hibernate.Transaction</classname> in a JTA
- environment).</para>
+ <para>For maximum efficiency, Hibernate Search batches index operations
+ and executes them at commit time (note: you don't need to use
+ <classname>org.hibernate.Transaction</classname> in a JTA
+ environment).</para>
- <para>If you expect to index a lot of data, you need to be careful about
- memory consumption: since all documents are kept in a queue until the
- transaction commit, you can potentially face an OutOfMemoryException.</para>
+ <para>If you expect to index a lot of data, you need to be careful about
+ memory consumption: since all documents are kept in a queue until the
+ transaction commits, you can potentially face an
+ <classname>OutOfMemoryError</classname>.</para>
- <para>To avoid that, you can set up the
- <literal>hibernate.search.worker.batch_size</literal> property to a
- sensitive value: all index operations are queued until
- <literal>batch_size</literal> is reached. Every time
- <literal>batch_size</literal> is reached (or if the transaction is
- committed), the queue is processed (freeing memory) and emptied. Be aware
- that the changes cannot be rollbacked if the number of index elements goes
- beyond <literal>batch_size</literal>. Be also aware that the queue limits are
- also applied on regular transparent indexing (and not only when
- <literal>session.index()</literal> is used). That's why a sensitive
- <literal>batch_size</literal> value is expected.</para>
+ <para>To avoid that, you can set the
+ <literal>hibernate.search.worker.batch_size</literal> property to a
+ sensible value: all index operations are queued until
+ <literal>batch_size</literal> is reached. Every time
+ <literal>batch_size</literal> is reached (or when the transaction is
+ committed), the queue is processed (freeing memory) and emptied. Be aware
+ that the changes cannot be rolled back if the number of index operations
+ goes beyond <literal>batch_size</literal>. Also be aware that the queue
+ limits apply to regular transparent indexing as well, not only when
+ <literal>fullTextSession.index()</literal> is used. That's why a sensible
+ <literal>batch_size</literal> value is expected.</para>
- <para>Other parameters which also can effect indexing time and memory consumption are
+ <para>Other parameters which can also affect indexing time and memory
+ consumption are
+ <literal>hibernate.search.[default|&lt;indexname&gt;].batch.merge_factor</literal>,
+ <literal>hibernate.search.[default|&lt;indexname&gt;].batch.max_merge_docs</literal>
+ and
+ <literal>hibernate.search.[default|&lt;indexname&gt;].batch.max_buffered_docs</literal>.
+ These parameters are Lucene-specific and Hibernate Search simply passes
+ them through; see <xref linkend="lucene-indexing-performance" /> for
+ more details.</para>
- <literal>hibernate.search.[default|<indexname>].batch.merge_factor</literal>
+ <para>Here is an especially efficient way to index a given class (useful
+ for index (re)initialization):</para>
- ,
-
- <literal>hibernate.search.[default|<indexname>].batch.max_merge_docs</literal>
-
- and
-
- <literal>hibernate.search.[default|<indexname>].batch.max_buffered_docs</literal>
-
- . These parameters are Lucene specific and Hibernate Search is just passing these paramters through - see
-
- <xref linkend="lucene-indexing-performance" />
-
- for more details.
-</para>
- <para>Here is an especially efficient way to index a given class (useful for
- index (re)initialization):</para>
-
-
-
- <programlisting>fullTextSession.setFlushMode(FlushMode.MANUAL);
+ <programlisting>fullTextSession.setFlushMode(FlushMode.MANUAL);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class ).scroll( ScrollMode.FORWARD_ONLY );
@@ -72,11 +66,40 @@
}
transaction.commit();</programlisting>
-
+ <para>It is critical that <literal>batchSize</literal> in the previous
+ example matches the <literal>batch_size</literal> property value described
+ above.</para>
+ </section>
- <para>It is critical that <literal>batchSize</literal> in the previous
- example matches the <literal>batch_size</literal> value described
- previously.</para>
+ <section>
+ <title>Purging</title>
-
+ <para>It is also possible to remove an entity, or all entities of a
+ given type, from a Lucene index without physically removing them from
+ the database. This operation is called purging and is done through the
+ <classname>FullTextSession</classname>.</para>
+
+ <programlisting>FullTextSession fullTextSession = Search.createFullTextSession(session);
+Transaction tx = fullTextSession.beginTransaction();
+for (Customer customer : customers) {
+ <emphasis role="bold">fullTextSession.purge( Customer.class, customer.getId() );</emphasis>
+}
+tx.commit(); //index changes are written at commit time </programlisting>
+
+ <para>Purging will remove the entity with the given id from the Lucene
+ index but will not touch the database.</para>
+
+ <para>If you need to remove all entities of a given type, you can use the
+ <methodname>purgeAll</methodname> method.</para>
+
+ <programlisting>FullTextSession fullTextSession = Search.createFullTextSession(session);
+Transaction tx = fullTextSession.beginTransaction();
+<emphasis role="bold">fullTextSession.purgeAll( Customer.class );</emphasis>
+//optionally optimize the index
+//fullTextSession.getSearchFactory().optimize( Customer.class );
+tx.commit(); //index changes are written at commit time </programlisting>
+
+ <para>It is recommended to optimize the index after such an
+ operation.</para>
+ </section>
</chapter>
\ No newline at end of file
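The batching and purging behaviour described in this chapter can be pictured with a small in-memory Java model. It is a hypothetical sketch under simplifying assumptions, not Hibernate Search code: `WorkQueueModel`, `Work`, `indexedIds` and `flush` are invented names, entities are identified by a type name and a numeric id, and a map stands in for the Lucene index. Operations are queued until `batch_size` of them are pending, at which point they are applied early and can no longer be rolled back; `purge` and `purgeAll` only remove entries from the index map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a per-transaction index work queue.
// This is NOT Hibernate Search code; all names here are made up for illustration.
class WorkQueueModel {
    enum Type { INDEX, PURGE, PURGE_ALL }

    // One queued index operation; id is null for PURGE_ALL.
    static final class Work {
        final Type type;
        final String entity;
        final Long id;

        Work(Type type, String entity, Long id) {
            this.type = type;
            this.entity = entity;
            this.id = id;
        }
    }

    private final int batchSize; // stands in for hibernate.search.worker.batch_size
    private final List<Work> queue = new ArrayList<>();
    // Stand-in for the Lucene index: entity type -> ids currently indexed.
    final Map<String, Set<Long>> indexedIds = new HashMap<>();
    int flushes = 0; // number of times the queue has been processed

    WorkQueueModel(int batchSize) {
        this.batchSize = batchSize;
    }

    void index(String entity, long id) { enqueue(new Work(Type.INDEX, entity, id)); }
    void purge(String entity, long id) { enqueue(new Work(Type.PURGE, entity, id)); }
    void purgeAll(String entity)       { enqueue(new Work(Type.PURGE_ALL, entity, null)); }

    int pending() { return queue.size(); }

    // Queue the operation; once batchSize operations are pending, apply them
    // immediately to bound memory. Applied work can no longer be rolled back.
    private void enqueue(Work work) {
        queue.add(work);
        if (queue.size() >= batchSize) {
            flush();
        }
    }

    void commit()   { flush(); }       // remaining work is applied at commit time
    void rollback() { queue.clear(); } // only work still queued is discarded

    private void flush() {
        if (queue.isEmpty()) {
            return;
        }
        for (Work work : queue) {
            Set<Long> ids = indexedIds.computeIfAbsent(work.entity, k -> new HashSet<>());
            switch (work.type) {
                case INDEX:     ids.add(work.id); break;
                case PURGE:     ids.remove(work.id); break;
                case PURGE_ALL: ids.clear(); break;
            }
        }
        queue.clear();
        flushes++;
    }
}
```

With a batch size of 3, the third `index` call triggers a flush on its own, while a later `rollback()` only discards work that is still queued.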
Modified: search/trunk/doc/reference/en/modules/query.xml
===================================================================
--- search/trunk/doc/reference/en/modules/query.xml 2007-09-04 02:57:31 UTC (rev 13992)
+++ search/trunk/doc/reference/en/modules/query.xml 2007-09-04 03:05:10 UTC (rev 13993)
@@ -85,10 +85,10 @@
<programlisting>FullTextSession fullTextSession = Search.createFullTextSession( session );
org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery );</programlisting>
- <para>If not specified otherwise, the query will be executed against all indexed entities,
- potentially returning all types of indexed classes. It is advised,
- from a performance point of view, to restrict the returned
- types:</para>
+ <para>Unless specified otherwise, the query will be executed against
+ all indexed entities, potentially returning all types of indexed
+ classes. For performance reasons, it is advisable to restrict the
+ returned types:</para>
<programlisting>org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery, Customer.class );
//or
@@ -280,11 +280,18 @@
Search has to process all Lucene Hits elements (within the pagination)
when using <methodname>list()</methodname>,
<methodname>uniqueResult()</methodname> and
- <methodname>iterate()</methodname>. If you wish to minimize Lucene
- document loading, <methodname>scroll()</methodname> is more appropriate,
- Don't forget to close the <classname>ScrollableResults</classname>
- object when you're done, since it keeps Lucene resources. Pagination is
- a preferred method over scrolling though.</para>
+ <methodname>iterate()</methodname>.</para>
+
+ <para>If you wish to minimize Lucene document loading,
+ <methodname>scroll()</methodname> is more appropriate. Don't forget to
+ close the <classname>ScrollableResults</classname> object when you're
+ done, since it holds Lucene resources. If you expect to use
+ <methodname>scroll()</methodname> but wish to load objects in batches,
+ you can use <methodname>query.setFetchSize()</methodname>: when an object
+ is accessed and is not already loaded, Hibernate Search will load the next
+ <literal>fetchSize</literal> objects in one pass.</para>
+
+ <para>Pagination is nevertheless preferred over scrolling.</para>
</section>
<section>
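The `setFetchSize()` behaviour described in the scrolling paragraph above can be pictured with a small in-memory model. This is a hypothetical sketch, not Hibernate Search code: `BatchLoadingScroll`, `get` and `roundTrips` are invented names, a list of ids stands in for the Lucene hits, and the `loader` function stands in for one database round trip. Accessing a hit that is not yet in memory loads it together with the following ids, `fetchSize` of them in total, in a single call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of batch loading while scrolling; NOT Hibernate Search code.
class BatchLoadingScroll<T> {
    private final List<Long> ids;                            // ids of the Lucene hits, in order
    private final int fetchSize;                             // plays the role of query.setFetchSize()
    private final Function<List<Long>, Map<Long, T>> loader; // one call = one database round trip
    private final Map<Long, T> loaded = new HashMap<>();
    int roundTrips = 0;

    BatchLoadingScroll(List<Long> ids, int fetchSize, Function<List<Long>, Map<Long, T>> loader) {
        this.ids = ids;
        this.fetchSize = fetchSize;
        this.loader = loader;
    }

    // Returns the object at the given position. If it is not loaded yet, it is
    // loaded together with the following ids, fetchSize ids in total (fewer at the end).
    T get(int position) {
        Long id = ids.get(position);
        if (!loaded.containsKey(id)) {
            int end = Math.min(position + fetchSize, ids.size());
            List<Long> batch = new ArrayList<>(ids.subList(position, end));
            loaded.putAll(loader.apply(batch));
            roundTrips++;
        }
        return loaded.get(id);
    }
}
```

With ten hits and a fetch size of 4, scrolling forward over every hit triggers three loader calls (4 + 4 + 2 ids) instead of ten.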
@@ -445,7 +452,7 @@
fullTextQuery.enableFullTextFilter("security")<emphasis role="bold">.setParameter( "level", 5 )</emphasis>;</programlisting>
<para>Each parameter name should have an associated setter on either the
- filter or filter factory of the targeted named filter definition. </para>
+ filter or filter factory of the targeted named filter definition.</para>
<programlisting>public class SecurityFilterFactory {
private Integer level;
@@ -498,8 +505,8 @@
implementation to each of the parameters' equals and hashCode
methods.</para>
- <para>Why should filters be cached? There are two area where filter caching
- shines:</para>
+ <para>Why should filters be cached? There are two areas where filter
+ caching shines:</para>
<itemizedlist>
<listitem>