From hibernate-commits at lists.jboss.org Tue Dec 2 11:33:47 2008
Content-Type: multipart/mixed; boundary="===============5097598525766419249=="
MIME-Version: 1.0
From: hibernate-commits at lists.jboss.org
To: hibernate-commits at lists.jboss.org
Subject: [hibernate-commits] Hibernate SVN: r15642 - search/trunk/doc/reference/en/modules.
Date: Tue, 02 Dec 2008 11:33:47 -0500
Message-ID:
--===============5097598525766419249==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Author: hardy.ferentschik
Date: 2008-12-02 11:33:47 -0500 (Tue, 02 Dec 2008)
New Revision: 15642
Modified:
search/trunk/doc/reference/en/modules/batchindex.xml
search/trunk/doc/reference/en/modules/lucene-native.xml
search/trunk/doc/reference/en/modules/optimize.xml
Log:
HSEARCH-303
Modified: search/trunk/doc/reference/en/modules/batchindex.xml
===================================================================
--- search/trunk/doc/reference/en/modules/batchindex.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/batchindex.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,8 +22,8 @@
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
-
- =
+
=
@@ -32,37 +32,36 @@
Indexing
=
- It is sometimes useful to index an object even if this object is not
- inserted nor updated to the database. This is especially true when you
- want to build your index for the first time. You can achieve that goal
- using the FullTextSession.
+ It is sometimes useful to index an entity even if this entity is not
+ inserted or updated to the database. This is for example the case when you
+ want to build your index for the first time.
+ FullTextSession.index()
+ allows you to do so.
=
- FullTextSession fullTextSession = Search.getFullTextSession(session);
+
+ Indexing an entity via
+ FullTextSession.index()
+
+ FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    fullTextSession.index(customer);
}
tx.commit(); //the index is written at commit time
+
=
For maximum efficiency, Hibernate Search batches index operations
- and executes them at commit time (Note: you don't need to use
- org.hibernate.Transaction in a JTA
- environment).
+ and executes them at commit time. If you expect to index a lot of data,
+ however, you need to be careful about memory consumption since all
+ documents are kept in a queue until the transaction commit. You can
+ potentially face an OutOfMemoryException. To avoid
+ this exception, you can use
+ fullTextSession.flushToIndexes(). Every time
+ fullTextSession.flushToIndexes() is called (or if
+ the transaction is committed), the batch queue is processed (freeing
+ memory) applying all index changes. Be aware that once flushed, changes
+ cannot be rolled back.
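The memory effect of periodic flushing can be illustrated with a small stand-alone model. The sketch below does not use Hibernate Search: `BatchQueueModel` and its `index` and `flushToIndexes` methods are hypothetical stand-ins that only mimic the queue-until-flush behaviour described above, so that the peak queue size can be compared with and without a flush every `batchSize` documents.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the batch queue described above.
 * This is NOT the Hibernate Search API; it only models "queue until flush".
 */
final class BatchQueueModel {
    private final List<String> queue = new ArrayList<>();
    private int peakQueueSize = 0;
    private int written = 0;

    /** Queues one index operation, as happens before a flush or commit. */
    void index(String document) {
        queue.add(document);
        peakQueueSize = Math.max(peakQueueSize, queue.size());
    }

    /** Applies all queued changes and empties the queue. */
    void flushToIndexes() {
        written += queue.size();
        queue.clear();
    }

    int peakQueueSize() {
        return peakQueueSize;
    }

    int written() {
        return written;
    }

    /** Indexes `total` documents, flushing every `batchSize` documents (0 = only at the end). */
    static BatchQueueModel run(int total, int batchSize) {
        BatchQueueModel model = new BatchQueueModel();
        for (int i = 1; i <= total; i++) {
            model.index("doc-" + i);
            if (batchSize > 0 && i % batchSize == 0) {
                model.flushToIndexes();
            }
        }
        model.flushToIndexes(); // stands in for the final commit
        return model;
    }
}
```

In this model, `BatchQueueModel.run(1000, 0)` holds every document in the queue until the final flush, while `BatchQueueModel.run(1000, 100)` never holds more than one batch. The batch size of 100 is an illustrative value, not a recommendation.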
=
- If you expect to index a lot of data, you need to be careful about
- memory consumption: since all documents are kept in a queue until the
- transaction commit, you can potentially face an
- OutOfMemoryException.
-
- To avoid that, you can use
- fullTextSession.flushToIndexes(): all index
- operations are queued until
- fullTextSession.flushToIndexes() is called. Every
- time fullTextSession.flushToIndexes() is called
- (or if the transaction is committed), the queue is processed (freeing
- memory) and emptied. Be aware that changes made before a flush cannot be
- rollbacked.
-
hibernate.search.worker.batch_size has been
deprecated in favor of this explicit API which provides better
@@ -70,26 +69,43 @@
=
Other parameters which also can affect indexing time and memory
- consumption are
- hibernate.search.[default|<indexname>].indexwriter.batch.max_buffered_docs
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.max_field_length
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.max_merge_docs
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.merge_factor
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.ram_buffer_size
- and
- hibernate.search.[default|<indexname>].indexwriter.batch.term_index_interval
- . These parameters are Lucene specific and Hibernate Search is just
+ consumption are:
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_buffered_docs
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_field_length
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_merge_docs
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].merge_factor
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].ram_buffer_size
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].term_index_interval
+
+
+
+ These parameters are Lucene specific and Hibernate Search is just
passing these parameters through - see for more details.
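As a rough illustration of the naming pattern listed above, the following sketch assembles a few of these keys with `java.util.Properties`. `IndexWriterSettings` and its `key` helper are hypothetical names introduced here, the values are arbitrary placeholders rather than tuning advice, and how the resulting properties are handed to Hibernate Search is not shown.

```java
import java.util.Properties;

/** Hypothetical helper; only the key pattern comes from the list above. */
final class IndexWriterSettings {

    /** Builds hibernate.search.<index>.indexwriter.<mode>.<parameter>. */
    static String key(String indexName, String mode, String parameter) {
        return "hibernate.search." + indexName + ".indexwriter." + mode + "." + parameter;
    }

    /** Returns a small example configuration; all values are placeholders. */
    static Properties example() {
        Properties props = new Properties();
        // "batch" settings apply to batch indexing, "transaction" to regular work
        props.setProperty(key("default", "batch", "ram_buffer_size"), "64");
        props.setProperty(key("default", "batch", "merge_factor"), "20");
        props.setProperty(key("default", "transaction", "max_buffered_docs"), "10");
        return props;
    }
}
```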
=
- Here is an especially efficient way to index a given class (useful
- for index (re)initialization):
+
+ Efficiently indexing a given class (useful for index
+ (re)initialization)
=
- fullTextSession.setFlushMode(FlushMode.MANUAL);
+ fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction =3D fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
@@ -106,9 +122,10 @@
}
}
transaction.commit();
+
=
- Try to use a batch size that guaranty that your application will not
- run out of memory.
+ Try to use a batch size that guarantees that your application will
+ not run out of memory.
=
@@ -116,29 +133,38 @@
=
It is equally possible to remove an entity or all entities of a
given type from a Lucene index without the need to physically remove them
- from the database. This operation is named purging and is done through the
- FullTextSession.
+ from the database. This operation is named purging and is also done
+ through the FullTextSession.
=
- FullTextSession fullTextSession = Search.getFullTextSession(session);
+
+ Purging a specific instance of an entity from the index
+
+ FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    fullTextSession.purge( Customer.class, customer.getId() );
}
tx.commit(); //the index is written at commit time
+
=
Purging will remove the entity with the given id from the Lucene
index but will not touch the database.
=
If you need to remove all entities of a given type, you can use the
- purgeAll method. This operation remove all entities of the type passed
- as a parameter as well as all its subtypes.
+ purgeAll method. This operation removes all
+ entities of the type passed as a parameter as well as all its
+ subtypes.
=
- FullTextSession fullTextSession = Search.getFullTextSession(session);
+
+ Purging all instances of an entity from the index
+
+ FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
fullTextSession.purgeAll( Customer.class );
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //the index is written at commit time
+
=
It is recommended to optimize the index after such an
operation.
@@ -150,4 +176,4 @@
well.
-
\ No newline at end of file
+
Modified: search/trunk/doc/reference/en/modules/lucene-native.xml
===================================================================
--- search/trunk/doc/reference/en/modules/lucene-native.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/lucene-native.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,8 +22,8 @@
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
-
- =
+
=
@@ -37,8 +37,12 @@
way to access Lucene natively. The SearchFactory
can be accessed from a FullTextSession:
=
- FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
+
+ Accessing the SearchFactory
+
+ FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory =3D fullTextSession.getSearchFactory();
+
=
@@ -51,12 +55,16 @@
DirectoryProviders per indexed class. One directory
provider can be shared amongst several indexed classes if the classes
share the same underlying index directory. While usually not the case, a
- given entity can have several DirectoryProviders is
+ given entity can have several DirectoryProviders if
the index is sharded (see ).

- DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
+
+ Accessing the Lucene Directory
+
+ DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory = provider[0].getDirectory();
+
=
In this example, directory points to the lucene index storing
Orders information. Note that the obtained Lucene
@@ -68,11 +76,14 @@
Using an IndexReader
=
Queries in Lucene are executed on an IndexReader.
- Hibernate Search caches such index readers to maximize performances. Your
- code can access such cached / shared resources. You will just have to
- follow some "good citizen" rules.
+ Hibernate Search caches all index readers to maximize performance. Your
+ code can access these cached resources, but you have to follow some "good
+ citizen" rules.
=
- DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
+
+ Accessing an IndexReader
+
+ DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];
=
ReaderProvider readerProvider =3D searchFactory.getReaderProvider();
@@ -84,24 +95,26 @@
finally {
readerProvider.closeReader(reader);
}
+
=
The ReaderProvider (described in ), will open an IndexReader
- on top of the index(es) referenced by the directory providers. This
- IndexReader being shared amongst several clients, you must adhere to the
- following rules:
+ on top of the index(es) referenced by the directory providers. Because
+ this IndexReader is shared amongst several clients,
+ you must adhere to the following rules:
=
Never call indexReader.close(), but always call
- readerProvider.closeReader(reader); (a finally block is the best
- area).
+ readerProvider.closeReader(reader), preferably in a finally
+ block.
=
- This indexReader can't be used for modification operations
- (you would get an exception). If you want to use a read/write index reader,
- open one from the Lucene Directory object.
+ Don't use this IndexReader for
+ modification operations (you would get an exception). If you want to
+ use a read/write index reader, open one from the Lucene Directory
+ object.
=
@@ -156,10 +169,10 @@
=
- queryNorm(q)
+ queryNorm(q)
=
Normalizing factor used to make scores between queries
- comparable.
+ comparable.
=
@@ -178,7 +191,7 @@
It is beyond the scope of this manual to explain this
formula in more detail. Please refer to
- Similarity's Javadocs for more information.
+ Similarity's Javadocs for more information.
=
Hibernate Search provides two ways to modify Lucene's similarity
calculation. First you can set the default similarity by specifying the
@@ -196,6 +209,6 @@
term appears in a document. Documents with a single occurrence of the term
should be scored the same as documents with multiple occurrences. In this
case your custom implementation of the method tf(float
- freq) should return 1.0.
+ freq) should return 1.0.
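The effect of such a constant tf can be sketched without any Lucene dependency. In the stand-alone example below, `TermFrequencyScorer`, `SqrtTf` and `ConstantTf` are hypothetical names; a real implementation would extend one of Lucene's Similarity classes and be registered as described above. `SqrtTf` assumes the square-root behaviour of Lucene's default similarity for comparison.

```java
/** Hypothetical stand-in for the tf(float freq) part of a Similarity. */
interface TermFrequencyScorer {
    float tf(float freq);
}

/** Assumed default behaviour: the score grows with the square root of the frequency. */
final class SqrtTf implements TermFrequencyScorer {
    @Override
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

/** Custom behaviour from the text: every matching document gets the same tf. */
final class ConstantTf implements TermFrequencyScorer {
    @Override
    public float tf(float freq) {
        return 1.0f;
    }
}
```

With `ConstantTf`, a document containing the term once and a document containing it 25 times receive the same tf factor, whereas `SqrtTf` would favour the second document.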
-
\ No newline at end of file
+
Modified: search/trunk/doc/reference/en/modules/optimize.xml
===================================================================
--- search/trunk/doc/reference/en/modules/optimize.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/optimize.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,23 +22,23 @@
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
-
- =
+
=
Index Optimization
=
From time to time, the Lucene index needs to be optimized. The process
- is essentially a defragmentation: until the optimization occurs deleted
- documents are just marked as such, no physical deletion is applied; the
- optimization can also adjust the number of files in the Lucene
- Directory.
+ is essentially a defragmentation. Until an optimization is triggered, Lucene
+ only marks deleted documents as such; no physical deletions are applied.
+ During the optimization process the deletions will be applied, which also
+ affects the number of files in the Lucene Directory.

- The optimization speeds up searches but in no way speeds up indexation
- (update). During an optimization, searches can be performed (but will most
- likely be slowed down), and all index updates will be stopped. Prefer
- optimizing:
+ Optimizing the Lucene index speeds up searches but has no effect on
+ the indexation (update) performance. During an optimization, searches can be
+ performed, but will most likely be slowed down. All index updates will be
+ stopped. It is recommended to schedule optimization:
=
@@ -46,40 +46,42 @@
=
- after a lot of index modifications (doing so before will not speed
- up the indexation process)
+ after a lot of index modifications
=
Automatic optimization
=
- Hibernate Search can optimize automatically an index after:
+ Hibernate Search can automatically optimize an index after:
=
- a certain amount of operations have been applied (insertion,
- deletion)
+ a certain amount of operations (insertion, deletion)
=
- or a certain amout of transactions have been applied
+ or a certain amount of transactions
=
- The configuration can be global or defined at the index
- level:
+ The configuration for automatic index optimization can be defined on
+ a global level or per index:

- hibernate.search.default.optimizer.operation_limit.max = 1000
+
+ Defining automatic optimization parameters
+
+ hibernate.search.default.optimizer.operation_limit.max = 1000
hibernate.search.default.optimizer.transaction_limit.max =3D 100
hibernate.search.Animal.optimizer.transaction_limit.max =3D 50
+
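How the three properties above combine for a given index can be sketched as a simple lookup. This sketch assumes that a value defined for a named index takes precedence over the `default` scope. `OptimizerLimits` and `resolve` are hypothetical names used for illustration and do not reflect how Hibernate Search reads its configuration internally.

```java
import java.util.Properties;

/** Hypothetical lookup helper; not part of Hibernate Search. */
final class OptimizerLimits {

    /** The three example properties from the configuration above. */
    static Properties exampleConfiguration() {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.optimizer.operation_limit.max", "1000");
        props.setProperty("hibernate.search.default.optimizer.transaction_limit.max", "100");
        props.setProperty("hibernate.search.Animal.optimizer.transaction_limit.max", "50");
        return props;
    }

    /**
     * Resolves a limit ("operation_limit" or "transaction_limit") for an index,
     * preferring the index-specific key and falling back to the default scope.
     * Returns -1 when neither key is set.
     */
    static int resolve(Properties props, String indexName, String limit) {
        String specific = props.getProperty(
                "hibernate.search." + indexName + ".optimizer." + limit + ".max");
        String fallback = props.getProperty(
                "hibernate.search.default.optimizer." + limit + ".max");
        String value = specific != null ? specific : fallback;
        return value == null ? -1 : Integer.parseInt(value.trim());
    }
}
```

Under that assumption, the `Animal` index resolves to a transaction limit of 50 and an operation limit of 1000, while any other index falls back to the default transaction limit of 100.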
=
An optimization will be triggered for the Animal
index as soon as either:
=
- the number of addition and deletion reaches 1000
+ the number of additions and deletions reaches 1000
=
@@ -100,22 +102,25 @@
You can programmatically optimize (defragment) a Lucene index from
Hibernate Search through the SearchFactory:
=
- searchFactory.optimize(Order.class);
+
+ Programmatic index optimization
=
- searchFactory.optimize();
+ FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
+SearchFactory searchFactory = fullTextSession.getSearchFactory();
=
+searchFactory.optimize(Order.class);
+// or
+searchFactory.optimize();
+
+
The first example optimizes the Lucene index holding
Orders; the second optimizes all indexes.
=
- The SearchFactory can be accessed from a
- FullTextSession:
-
- FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
-SearchFactory searchFactory = fullTextSession.getSearchFactory();
-
- Note that searchFactory.optimize() has no effect
- on a JMS backend. You must apply the optimize operation on the Master
- node.
+
+ searchFactory.optimize() has no effect on a JMS
+ backend. You must apply the optimize operation on the Master
+ node.
+
=
@@ -151,4 +156,4 @@
See for
more details.
-
\ No newline at end of file
+
--===============5097598525766419249==--