From: hibernate-commits at lists.jboss.org
To: hibernate-commits at lists.jboss.org
Subject: [hibernate-commits] Hibernate SVN: r15642 - search/trunk/doc/reference/en/modules.
Date: Tue, 02 Dec 2008 11:33:47 -0500

Author: hardy.ferentschik
Date: 2008-12-02 11:33:47 -0500 (Tue, 02 Dec 2008)
New Revision: 15642

Modified:
   search/trunk/doc/reference/en/modules/batchindex.xml
   search/trunk/doc/reference/en/modules/lucene-native.xml
   search/trunk/doc/reference/en/modules/optimize.xml
Log:
HSEARCH-303

Modified: search/trunk/doc/reference/en/modules/batchindex.xml
===================================================================
--- search/trunk/doc/reference/en/modules/batchindex.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/batchindex.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,8 +22,8 @@
 ~ 51 Franklin Street, Fifth Floor
 ~ Boston, MA 02110-1301 USA
 -->
-
-

+

@@ -32,37 +32,36 @@
 Indexing

- It is sometimes useful to index an object even if this object is not
- inserted nor updated to the database. This is especially true when you
- want to build your index for the first time. You can achieve that goal
- using the FullTextSession.
+ It is sometimes useful to index an entity even if this entity is not
+ inserted or updated to the database. This is for example the case when you
+ want to build your index for the first time.
+ FullTextSession.index()
+ allows you to do so.

- FullTextSession fullTextSession = Search.getFullTextSession(session);
+
+ Indexing an entity via
+ <methodname>FullTextSession.index()</methodname>
+
+ FullTextSession fullTextSession = Search.getFullTextSession(session);
 Transaction tx = fullTextSession.beginTransaction();
 for (Customer customer : customers) {
     fullTextSession.index(customer);
 }
 tx.commit(); //index are written at commit time
+

 For maximum efficiency, Hibernate Search batches index operations
- and executes them at commit time (Note: you don't need to use
- org.hibernate.Transaction in a JTA
- environment).
+ and executes them at commit time. If you expect to index a lot of data,
+ however, you need to be careful about memory consumption since all
+ documents are kept in a queue until the transaction commit. You can
+ potentially face an OutOfMemoryException. To avoid
+ this exception, you can use
+ fullTextSession.flushToIndexes(). Every time
+ fullTextSession.flushToIndexes() is called (or if
+ the transaction is committed), the batch queue is processed (freeing
+ memory) applying all index changes. Be aware that once flushed changes
+ cannot be rolled back.

- If you expect to index a lot of data, you need to be careful about
- memory consumption: since all documents are kept in a queue until the
- transaction commit, you can potentially face an
- OutOfMemoryException.
-
- To avoid that, you can use
- fullTextSession.flushToIndexes(): all index
- operations are queued until
- fullTextSession.flushToIndexes() is called. Every
- time fullTextSession.flushToIndexes() is called
- (or if the transaction is committed), the queue is processed (freeing
- memory) and emptied. Be aware that changes made before a flush cannot be
- rollbacked.
-
 hibernate.search.worker.batch_size has been
 deprecated in favor of this explicit API which provides better
@@ -70,26 +69,43 @@

 Other parameters which also can affect indexing time and memory
- consumption are
- hibernate.search.[default|<indexname>].indexwriter.batch.max_buffered_docs
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.max_field_length
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.max_merge_docs
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.merge_factor
- ,
- hibernate.search.[default|<indexname>].indexwriter.batch.ram_buffer_size
- and
- hibernate.search.[default|<indexname>].indexwriter.batch.term_index_interval
- . These parameters are Lucene specific and Hibernate Search is just
+ consumption are:
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_buffered_docs
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_field_length
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_merge_docs
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].merge_factor
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].ram_buffer_size
+
+
+
+ hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].term_index_interval
+
+
+
+ These parameters are Lucene specific and Hibernate Search is just
 passing these parameters through - see for more details.

- Here is an especially efficient way to index a given class (useful
- for index (re)initialization):
+
+ Efficiently indexing a given class (useful for index
+ (re)initialization)

- fullTextSession.setFlushMode(FlushMode.MANUAL);
+ fullTextSession.setFlushMode(FlushMode.MANUAL);
 fullTextSession.setCacheMode(CacheMode.IGNORE);
 transaction = fullTextSession.beginTransaction();
 //Scrollable results will avoid loading too many objects in memory
@@ -106,9 +122,10 @@
     }
 }
 transaction.commit();
+

- Try to use a batch size that guaranty that your application will not
- run out of memory.
+ Try to use a batch size that guarantees that your application will
+ not run out of memory.

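Reader's note on the flushToIndexes() wording changed in the hunks above: the following is a minimal, self-contained Java sketch of why flushing at regular intervals bounds the number of queued index operations. It does not use Hibernate Search. The class names BatchFlushSketch and PendingQueue, the helper indexAll, and the figures 250 and 100 are hypothetical stand-ins chosen for illustration; only the method names index and flushToIndexes echo the API named in the diff.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a work queue that is drained at explicit flush points. */
class BatchFlushSketch {

    /** Hypothetical stand-in for the queue of pending index operations. */
    static final class PendingQueue {
        private final List<String> pending = new ArrayList<>();
        private int peak = 0;     // largest number of queued operations observed
        private int applied = 0;  // operations already written to the "index"

        void index(String document) {
            pending.add(document);
            peak = Math.max(peak, pending.size());
        }

        /** Applies and discards everything queued so far; in this model it cannot be undone. */
        void flushToIndexes() {
            applied += pending.size();
            pending.clear();
        }

        int peak() { return peak; }

        int applied() { return applied; }
    }

    /** Queues 'total' documents, flushing after every 'batchSize' of them. */
    static PendingQueue indexAll(int total, int batchSize) {
        PendingQueue queue = new PendingQueue();
        for (int i = 1; i <= total; i++) {
            queue.index("doc-" + i);
            if (i % batchSize == 0) {
                queue.flushToIndexes(); // drain periodically so the queue stays small
            }
        }
        queue.flushToIndexes(); // models the final flush performed at commit
        return queue;
    }

    public static void main(String[] args) {
        PendingQueue queue = indexAll(250, 100);
        System.out.println("applied=" + queue.applied() + " peak=" + queue.peak());
    }
}
```

In this model the peak queue size equals the batch size rather than the total number of documents, which is the reason the documentation advises choosing a batch size that fits in memory.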
@@ -116,29 +133,38 @@

 It is equally possible to remove an entity or all entities of a
 given type from a Lucene index without the need to physically remove them
- from the database. This operation is named purging and is done through the
- FullTextSession.
+ from the database. This operation is named purging and is also done
+ through the FullTextSession.

- FullTextSession fullTextSession = Search.getFullTextSession(session);
+
+ Purging a specific instance of an entity from the index</title>
+
+ <programlisting>FullTextSession fullTextSession = Search.getFullTextSession(session);
 Transaction tx = fullTextSession.beginTransaction();
 for (Customer customer : customers) {
     <emphasis role="bold">fullTextSession.purge( Customer.class, customer.getId() );</emphasis>
 }
 tx.commit(); //index are written at commit time </programlisting>
+ </example>

 <para>Purging will remove the entity with the given id from the Lucene
 index but will not touch the database.</para>

 <para>If you need to remove all entities of a given type, you can use the
- <methodname>purgeAll</methodname> method. This operation remove all entities of the type passed
- as a parameter as well as all its subtypes.</para>
+ <methodname>purgeAll</methodname> method. This operation remove all
+ entities of the type passed as a parameter as well as all its
+ subtypes.</para>

- <programlisting>FullTextSession fullTextSession = Search.getFullTextSession(session);
+ <example>
+ <title>Purging all instances of an entity from the index
+
+ FullTextSession fullTextSession = Search.getFullTextSession(session);
 Transaction tx = fullTextSession.beginTransaction();
 fullTextSession.purgeAll( Customer.class );
 //optionally optimize the index
 //fullTextSession.getSearchFactory().optimize( Customer.class );
 tx.commit(); //index are written at commit time
+

 It is recommended to optimize the index after such an operation.
@@ -150,4 +176,4 @@
 well.
-
\ No newline at end of file
+

Modified: search/trunk/doc/reference/en/modules/lucene-native.xml
===================================================================
--- search/trunk/doc/reference/en/modules/lucene-native.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/lucene-native.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,8 +22,8 @@
 ~ 51 Franklin Street, Fifth Floor
 ~ Boston, MA 02110-1301 USA
 -->
-
-

+

@@ -37,8 +37,12 @@
 way to access Lucene natively. The SearchFactory can be accessed from a
 FullTextSession:

- FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
+
+ Accessing the <classname>SearchFactory</classname>
+
+ FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
 SearchFactory searchFactory = fullTextSession.getSearchFactory();
+

@@ -51,12 +55,16 @@
 DirectoryProviders per indexed class. One directory
 provider can be shared amongst several indexed classes if the classes
 share the same underlying index directory. While usually not the case, a
- given entity can have several DirectoryProviders is
+ given entity can have several DirectoryProviders if
 the index is sharded (see ).

- DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
+
+ Accessing the Lucene <classname>Directory</classname>
+
+ DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
 org.apache.lucene.store.Directory directory = provider[0].getDirectory();
+

 In this example, directory points to the lucene index storing
 Orders information. Note that the obtained Lucene
@@ -68,11 +76,14 @@
 Using an IndexReader

 Queries in Lucene are executed on an IndexReader.
- Hibernate Search caches such index readers to maximize performances. Your
- code can access such cached / shared resources. You will just have to
- follow some "good citizen" rules.
+ Hibernate Search caches all index readers to maximize performance. Your
+ code can access this cached resources, but you have to follow some "good
+ citizen" rules.

- DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
+
+ Accesing an <classname>IndexReader</classname>
+
+ DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
 DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];

 ReaderProvider readerProvider = searchFactory.getReaderProvider();
@@ -84,24 +95,26 @@
 finally {
     readerProvider.closeReader(reader);
 }
+

 The ReaderProvider (described in ), will open an IndexReader
- on top of the index(es) referenced by the directory providers. This
- IndexReader being shared amongst several clients, you must adhere to the
- following rules:
+ on top of the index(es) referenced by the directory providers. Because
+ this IndexReader is shared amongst several clients,
+ you must adhere to the following rules:

 Never call indexReader.close(), but always call
- readerProvider.closeReader(reader); (a finally block is the best
- area).
+ readerProvider.closeReader(reader), preferably in a finally
+ block.

- This indexReader can't be used for modification operations
- (you would get an exception). If you want to use a read/write index reader,
- open one from the Lucene Directory object.
+ Don't use this IndexReader for
+ modification operations (you would get an exception). If you want to
+ use a read/write index reader, open one from the Lucene Directory
+ object.

@@ -156,10 +169,10 @@

- queryNorm(q)
+ queryNorm(q)

 Normalizing factor used to make scores between queries
- comparable.
+ comparable.

@@ -178,7 +191,7 @@
 It is beyond the scope of this manual to explain this
 formula in more detail. Please refer to
- Similarity's Javadocs for more information.
+ Similarity's Javadocs for more information.

 Hibernate Search provides two ways to modify Lucene's similarity
 calculation. First you can set the default similarity by specifying the
@@ -196,6 +209,6 @@
 term appears in a document. Documents with a single occurrence of the term
 should be scored the same as documents with multiple occurrences. In this
 case your custom implementation of the method tf(float
- freq) should return 1.0.
+ freq) should return 1.0.
-
\ No newline at end of file
+

Modified: search/trunk/doc/reference/en/modules/optimize.xml
===================================================================
--- search/trunk/doc/reference/en/modules/optimize.xml 2008-12-02 15:11:04 UTC (rev 15641)
+++ search/trunk/doc/reference/en/modules/optimize.xml 2008-12-02 16:33:47 UTC (rev 15642)
@@ -22,23 +22,23 @@
 ~ 51 Franklin Street, Fifth Floor
 ~ Boston, MA 02110-1301 USA
 -->
-
-

+

 Index Optimization

 From time to time, the Lucene index needs to be optimized. The process
- is essentially a defragmentation: until the optimization occurs deleted
- documents are just marked as such, no physical deletion is applied; the
- optimization can also adjust the number of files in the Lucene
- Directory.
+ is essentially a defragmentation. Until an optimization is triggered Lucene
+ only marks deleted documents as such, no physical deletions are applied.
+ During the optimization process the deletions will be applied which also
+ effects the number of files in the Lucene Directory.

- The optimization speeds up searches but in no way speeds up indexation
- (update). During an optimization, searches can be performed (but will most
- likely be slowed down), and all index updates will be stopped. Prefer
- optimizing:
+ Optimising the Lucene index speeds up searches but has no effect on
+ the indexation (update) performance. During an optimization, searches can be
+ performed, but will most likely be slowed down. All index updates will be
+ stopped. It is recommended to schedule optimization:

@@ -46,40 +46,42 @@

- after a lot of index modifications (doing so before will not speed
- up the indexation process)
+ after a lot of index modifications

 Automatic optimization

- Hibernate Search can optimize automatically an index after:
+ Hibernate Search can automatically optimize an index after:

- a certain amount of operations have been applied (insertion,
- deletion)
+ a certain amount of operations (insertion, deletion)

- or a certain amout of transactions have been applied
+ or a certain amout of transactions

- The configuration can be global or defined at the index
- level:
+ The configuration for automatic index optimization can be defined on
+ a global level or per index:

- hibernate.search.default.optimizer.operation_limit.max = 1000
+
+ Defining automatic optimization parameters
+
+ hibernate.search.default.optimizer.operation_limit.max = 1000
 hibernate.search.default.optimizer.transaction_limit.max = 100
 hibernate.search.Animal.optimizer.transaction_limit.max = 50
+

 An optimization will be triggered to the Animal
 index as soon as either:

- the number of addition and deletion reaches 1000
+ the number of additions and deletions reaches 1000

@@ -100,22 +102,25 @@
 You can programmatically optimize (defragment) a Lucene index from
 Hibernate Search through the SearchFactory:

- searchFactory.optimize(Order.class);
+
+ Programmatic index optimization

- searchFactory.optimize();
+ FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
+SearchFactory searchFactory = fullTextSession.getSearchFactory();

+searchFactory.optimize(Order.class);
+// or
+searchFactory.optimize();
+
+
 The first example optimizes the Lucene index holding
 Orders; the second, optimizes all indexes.

- The SearchFactory can be accessed from a
- FullTextSession:
-
- FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
-SearchFactory searchFactory = fullTextSession.getSearchFactory();
-
- Note that searchFactory.optimize() has no effect
- on a JMS backend. You must apply the optimize operation on the Master
- node.
+
+ searchFactory.optimize() has no effect on a JMS
+ backend. You must apply the optimize operation on the Master
+ node.
+

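Reader's note on the automatic optimization settings shown in the hunks above: the sketch below models the "either limit" rule in plain Java, assuming that an optimization fires as soon as the operation count or the transaction count since the last optimization reaches its limit, and that both counters then restart from zero. The counter reset is an assumption of this model, not something the diff states. OptimizerLimitSketch and transactionCommitted are hypothetical names, not Hibernate Search classes; the limits 1000 and 50 are the Animal values from the example configuration (operation limit inherited from the default, transaction limit overridden).

```java
/** Toy model of the "optimize after N operations or M transactions" rule. */
class OptimizerLimitSketch {
    private final int operationLimit;
    private final int transactionLimit;
    private int operations = 0;     // operations since the last optimization
    private int transactions = 0;   // transactions since the last optimization
    private int optimizations = 0;  // how many optimizations were triggered

    OptimizerLimitSketch(int operationLimit, int transactionLimit) {
        this.operationLimit = operationLimit;
        this.transactionLimit = transactionLimit;
    }

    /** Records one committed transaction that applied the given number of index operations. */
    void transactionCommitted(int operationsInTransaction) {
        operations += operationsInTransaction;
        transactions++;
        // Either limit is sufficient to trigger an optimization.
        if (operations >= operationLimit || transactions >= transactionLimit) {
            optimizations++;
            operations = 0;    // assumed: counters restart after an optimization
            transactions = 0;
        }
    }

    int optimizations() { return optimizations; }

    public static void main(String[] args) {
        // Animal index: operation limit 1000 (default), transaction limit 50 (override).
        OptimizerLimitSketch animal = new OptimizerLimitSketch(1000, 50);
        for (int i = 0; i < 50; i++) {
            animal.transactionCommitted(1);   // 50 small transactions hit the transaction limit
        }
        animal.transactionCommitted(1000);    // one large transaction hits the operation limit
        System.out.println("optimizations=" + animal.optimizations());
    }
}
```

The per-index override only replaces the setting it names, so in the example configuration the Animal index keeps the default operation limit while using its own transaction limit.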
@@ -151,4 +156,4 @@
 See for more details.
-
\ No newline at end of file
+