From hibernate-commits at lists.jboss.org Thu Nov 23 17:41:32 2006
Content-Type: multipart/mixed; boundary="===============5526014745254583156=="
MIME-Version: 1.0
From: hibernate-commits at lists.jboss.org
To: hibernate-commits at lists.jboss.org
Subject: [hibernate-commits] Hibernate SVN: r10866 - branches/Lucene_Integration/HibernateExt/metadata/doc/reference/en/modules
Date: Thu, 23 Nov 2006 17:41:31 -0500
Message-ID:

--===============5526014745254583156==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Author: epbernard
Date: 2006-11-23 17:41:27 -0500 (Thu, 23 Nov 2006)
New Revision: 10866

Modified:
   branches/Lucene_Integration/HibernateExt/metadata/doc/reference/en/modules/lucene.xml
Log:
Hibernate Search documentation

Modified: branches/Lucene_Integration/HibernateExt/metadata/doc/reference/en/modules/lucene.xml
===================================================================
--- branches/Lucene_Integration/HibernateExt/metadata/doc/reference/en/modules/lucene.xml 2006-11-23 22:30:01 UTC (rev 10865)
+++ branches/Lucene_Integration/HibernateExt/metadata/doc/reference/en/modules/lucene.xml 2006-11-23 22:41:27 UTC (rev 10866)
@@ -1,91 +1,63 @@
-
-  Hibernate Lucene Integration
+
+  Hibernate Search: Apache <trademark>Lucene</trademark>
+  Integration

-  Lucene is a high-performance Java search engine library available from
-  the Apache Software Foundation. Hibernate Annotations includes a package of
-  annotations that allows you to mark any domain model object as indexable and
-  have Hibernate maintain a Lucene index of any instances persisted via
-  Hibernate.
+  Apache Lucene is a
+  high-performance Java search engine library available at the Apache Software
+  Foundation.
Hibernate Annotations includes a package of annotations that
+  allows you to mark any domain model object as indexable and have Hibernate
+  maintain a Lucene index of any instances persisted via Hibernate. Apache
+  Lucene is also integrated with the Hibernate query facility.

-  Hibernate Lucene is a work in progress and new features are cooking in
+  Hibernate Search is a work in progress and new features are cooking in
   this area. So expect some compatibility changes in subsequent versions.

-
- Mapping the entities to the index +
+    Architecture

-    First, we must declare a persistent class as indexable. This is done
-    by annotating the class with @Indexed:
+    Hibernate Search is made of an indexing engine and an index search
+    engine. Both are backed by Apache Lucene.

-    @Entity
-@Indexed(index="indexes/essays")
-public class Essay {
-    ...
-}
+    When an entity is inserted into, updated in or removed from the database,
+    Hibernate Search keeps track of this event
+    (through the Hibernate event system) and schedules an index update. When
+    out of transaction, the update is executed right after the actual database
+    operation. It is however recommended, for both your database and Hibernate
+    Search, to execute your operations in a transaction (whether JDBC or JTA).
+    When in a transaction, the index update is scheduled for the transaction
+    commit (and discarded in case of transaction rollback). You can think of
+    this as the regular (infamous) autocommit vs transactional behavior. From
+    a performance perspective, the in-transaction mode is
+    recommended. All the index updates are handled for you without you having
+    to use the Apache Lucene APIs.

-    The index attribute tells Hibernate what the
-    lucene directory name is (usually a directory on your file system). If you
-    wish to define a base directory for all lucene indexes, you can use the
-    hibernate.lucene.default.indexDir property in your
-    configuration file.
+    To interact with Apache Lucene indexes, Hibernate Search has the
+    notion of DirectoryProvider. A directory provider
+    will manage a given Lucene Directory type. You can
+    configure directory providers to adjust the directory target.

-    Lucene indexes contain four kinds of fields:
-    keyword fields, text fields,
-    unstored fields and unindexed
-    fields. Hibernate Annotations provides annotations to mark a property of
-    an entity as one of the first three kinds of indexed fields.
-
-    @Entity
-@Indexed(index="indexes/essays")
-public class Essay {
-    ...
-
-    @Id
-    @Keyword(id=true)
-    public Long getId() { return id; }
-
-    @Text(name="Abstract")
-    public String getSummary() { return summary; }
-
-    @Lob
-    @Unstored
-    public String getText() { return text; }
-
-}
-
-    These annotations define an index with three fields:
-    id, Abstract and
-    text. Note that by default the field name is
-    decapitalized, following the JavaBean specification.
-
-    Note: you must specify
-    @Keyword(id=true) on the identifier property of your
-    entity class.
-
-    Lucene has the notion of boost factor. It's a
-    way to give more weigth to a field or to an indexed element over an other
-    during the indexation process. You can use @Boost at
-    the field or the class level.
-
-    The analyzer class used to index the elements is configurable
-    through the hibernate.lucene.analyzer property. If none
-    defined,
-    org.apache.lucene.analysis.standard.StandardAnalyzer
-    is used as the default.
+    Hibernate Search can also use a Lucene
+    index to search for entities and return a list of managed entities, saving you
+    from the tedious Object / Lucene Document mapping and the low-level Lucene
+    APIs. The application code uses the unified
+    org.hibernate.Query API exactly the way an HQL or
+    native query would.
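
The scheduling behavior described in the Architecture section can be pictured with a small stand-alone sketch. Nothing here is Hibernate Search code: `IndexWorkQueueSketch` and its methods are hypothetical names, and a plain list stands in for the Lucene index. It only illustrates the stated behavior: work is applied immediately outside a transaction, applied at commit inside one, and dropped on rollback.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an index work queue; not a Hibernate Search class. */
class IndexWorkQueueSketch {
    private final List<String> pending = new ArrayList<>(); // work scheduled in the current transaction
    private final List<String> index = new ArrayList<>();   // stands in for the Lucene index
    private boolean inTransaction;

    void begin() {
        inTransaction = true;
    }

    /** Called for each insert, update or delete event. */
    void onEntityChange(String work) {
        if (inTransaction) {
            pending.add(work); // deferred until commit
        } else {
            index.add(work);   // autocommit-like: applied right away
        }
    }

    void commit() {
        index.addAll(pending); // all scheduled work is applied together
        pending.clear();
        inTransaction = false;
    }

    void rollback() {
        pending.clear();       // scheduled work is discarded
        inTransaction = false;
    }

    List<String> indexed() {
        return new ArrayList<>(index);
    }

    public static void main(String[] args) {
        IndexWorkQueueSketch queue = new IndexWorkQueueSketch();
        queue.onEntityChange("add Essay#1");    // outside a transaction
        queue.begin();
        queue.onEntityChange("add Essay#2");
        queue.rollback();                        // Essay#2 never reaches the index
        queue.begin();
        queue.onEntityChange("update Essay#1");
        queue.commit();
        System.out.println(queue.indexed());     // [add Essay#1, update Essay#1]
    }
}
```

Deferring the work to commit time is what lets a rollback leave the index untouched.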

    Configuration

-      directory configuration
+      Directory configuration

-      Lucene has a notion of Directory where the index is stored. The
-      Directory implementation can be customized but Lucene comes bundled with
-      a file system and a full memory implementation. Hibernate Lucene has the
-      notion of DirectoryProvider that handle the
-      configuration and the initialization of the Lucene Directory.
+      Apache Lucene has a notion of Directory where the index is stored.
+      The Directory implementation can be customized but Lucene comes bundled
+      with a file system and a full memory implementation.
+      Hibernate Search has the notion of
+      DirectoryProvider that handles the configuration and
+      the initialization of the Lucene Directory.

       List of built-in Directory Providers
@@ -103,19 +75,19 @@


-              org.hibernate.lucene.store.FSDirectoryProvider
+              org.hibernate.search.store.FSDirectoryProvider

               File system based directory. The directory used will be
-              <indexBase>/<@Index.name>
+              <indexBase>/<@Indexed.name>

               indexBase: Base directory


-              org.hibernate.lucene.store.RAMDirectoryProvider
+              org.hibernate.search.store.RAMDirectoryProvider

               Memory based directory, the directory will be uniquely
-              indentified by the @Index.name
+              identified by the @Indexed.name
               element

               none
@@ -132,17 +104,17 @@
       Each indexed entity is associated to a Lucene index (an index can
       be shared by several entities but this is not usually the case). You can
       configure the index through properties prefixed by
-      hibernate.lucene.<indexname>.
+      hibernate.search.<indexname>.
       Default properties inherited to all indexes can be defined using the
-      prefix hibernate.lucene.default.
+      prefix hibernate.search.default.

       To define the directory provider of a given index, you use the
-      hibernate.lucene.<indexname>.directory_provider
+      hibernate.search.<indexname>.directory_provider

-      hibernate.lucene.default.directory_provider org.hibernate.lucene.store.FSDirectoryProvider
-hibernate.lucene.default.indexDir=/usr/lucene/indexes
+      hibernate.search.default.directory_provider org.hibernate.search.store.FSDirectoryProvider
+hibernate.search.default.indexDir=/usr/lucene/indexes

-hibernate.lucene.Rules.directory_provider org.hibernate.lucene.store.RAMDirectoryProvider
+hibernate.search.Rules.directory_provider org.hibernate.search.store.RAMDirectoryProvider

       applied on
@@ -162,32 +134,537 @@
       and base directory, and override those defaults later on on a per index
       basis.

-      Writing your own DirectoryProvider, you can benefit this
-      configuration mechanism too.
+      Writing your own DirectoryProvider, you can
+      benefit from this configuration mechanism too.

-
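
The inheritance of default properties can be sketched as a two-step lookup. This is only an illustration of the rule stated above, not the actual configuration code: `IndexPropertySketch` and `resolve` are hypothetical names, and the `Essays` index name is made up for the example.

```java
import java.util.Properties;

/** Hypothetical helper; it illustrates the lookup order only. */
class IndexPropertySketch {
    static final String PREFIX = "hibernate.search.";

    /** Returns the index-specific value if present, else the default value, else null. */
    static String resolve(Properties config, String indexName, String key) {
        String specific = config.getProperty(PREFIX + indexName + "." + key);
        return specific != null ? specific : config.getProperty(PREFIX + "default." + key);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.FSDirectoryProvider");
        config.setProperty("hibernate.search.default.indexDir", "/usr/lucene/indexes");
        config.setProperty("hibernate.search.Rules.directory_provider",
                "org.hibernate.search.store.RAMDirectoryProvider");

        // "Rules" overrides the provider; "Essays" (hypothetical) inherits the default.
        System.out.println(resolve(config, "Rules", "directory_provider"));
        System.out.println(resolve(config, "Essays", "directory_provider"));
    }
}
```

With the sample configuration, the `Rules` index resolves to the RAM provider while any other index falls back to the file system provider.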
+
       Enabling automatic indexing

-      Finally, we enable the LuceneEventListener for
-      the three Hibernate events that occur after changes are committed to the
+      Finally, we enable the FullTextIndexEventListener for
+      the three Hibernate events that occur after changes are executed against the
       database.

       <hibernate-configuration>
     ...
-    <event type="post-commit-update"
-        <listener
-            class="org.hibernate.lucene.event.LuceneEventListener"/>
+    <event type="post-update">
+        <listener class="org.hibernate.search.event.FullTextIndexEventListener"/>
     </event>
-    <event type="post-commit-insert"
-        <listener
-            class="org.hibernate.lucene.event.LuceneEventListener"/>
+    <event type="post-insert">
+        <listener class="org.hibernate.search.event.FullTextIndexEventListener"/>
     </event>
-    <event type="post-commit-delete"
-        <listener
-            class="org.hibernate.lucene.event.LuceneEventListener"/>
+    <event type="post-delete">
+        <listener class="org.hibernate.search.event.FullTextIndexEventListener"/>
     </event>
 </hibernate-configuration>
+ +
+    Mapping entities to the index structure
+
+    All the metadata information related to indexed entities is
+    described through Java annotations. There is no need for XML mapping
+    files nor a list of indexed entities. The list is discovered at startup
+    time by scanning the Hibernate mapped entities.
+
+    First, we must declare a persistent class as indexable. This is done
+    by annotating the class with @Indexed (all entities not
+    annotated with @Indexed will be ignored by the indexing
+    process):
+
+    @Entity
+@Indexed(index="indexes/essays")
+public class Essay {
+    ...
+}
+
+    The index attribute tells Hibernate what the
+    Lucene directory name is (usually a directory on your file system). If you
+    wish to define a base directory for all Lucene indexes, you can use the
+    hibernate.search.default.indexDir property in your
+    configuration file. Each entity instance will be represented by a Lucene
+    Document inside the given index (aka
+    Directory).
+
+    For each property (or attribute) of your entity, you have the
+    ability to describe how it will be indexed. The default (ie no annotation)
+    means that the property is completely ignored by the indexing process.
+    @Field declares a property as indexed. When
+    indexing an element to a Lucene document you can specify how it is
+    indexed:
+
+
+
+        name: describes under which name the property
+        should be stored in the Lucene Document. The default value is the
+        property name (following the JavaBeans convention)
+
+
+
+        store: describes whether or not the property
+        is stored in the Lucene index. You can store the value
+        Store.YES (consuming more space in the index),
+        store it in a compressed way Store.COMPRESS (this
+        does consume more CPU), or avoid any storage
+        Store.NO (this is the default value). When a
+        property is stored, you can retrieve it from the Lucene Document (note
+        that this is not related to whether the element is indexed or
+        not).
+
+
+        index: describes how the element is indexed (ie the process used
+        to index the property and the type of information stored). The
+        different values are Index.NO (no indexing, ie
+        cannot be found by a query), Index.TOKENIZED (use
+        an analyzer to process the property),
+        Index.UN_TOKENIZED (no analyzer pre-processing),
+        Index.NO_NORM (do not store the normalization
+        data).
+
+
+
+    These attributes are part of the @Field
+    annotation.
+
+    Whether or not you want to store the data depends on how you wish to
+    use the index query result. As of today, for a pure Hibernate
+    Search usage, storing is not necessary. Whether or not you
+    want to tokenize a property depends on whether you wish to search
+    the element as is, or only normalized parts of it. It makes sense to
+    tokenize a text field, but not a date field (or an id
+    field).
+
+    Finally, the id property of an entity is a special property used by
+    Hibernate Search to ensure index uniqueness of a
+    given entity. By design, an id has to be stored and must not be tokenized.
+    To mark a property as index id, use the @DocumentId
+    annotation.
+
+    @Entity
+@Indexed(index="indexes/essays")
+public class Essay {
+    ...
+
+    @Id
+    @DocumentId
+    public Long getId() { return id; }
+
+    @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
+    public String getSummary() { return summary; }
+
+    @Lob
+    @Field(index=Index.TOKENIZED)
+    public String getText() { return text; }
+
+}
+
+    These annotations define an index with three fields:
+    id, Abstract and
+    text. Note that by default the field name is
+    decapitalized, following the JavaBean specification.
+
+    Note: you must specify
+    @DocumentId on the identifier property of your entity
+    class.
+
+    Lucene has the notion of boost factor. It's a
+    way to give more weight to a field or to an indexed element over another
+    during the indexing process.
You can use @Boost at
+    the field or the class level.
+
+    @Entity
+@Indexed(index="indexes/essays")
+@Boost(2)
+public class Essay {
+    ...
+
+    @Id
+    @DocumentId
+    public Long getId() { return id; }
+
+    @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
+    @Boost(2.5f)
+    public String getSummary() { return summary; }
+
+    @Lob
+    @Field(index=Index.TOKENIZED)
+    public String getText() { return text; }
+
+}
+
+    In our example, Essay's probability to reach the top of the search
+    list will be multiplied by 2 and the summary field will be 2.5 times more
+    important than the text field. Note that this explanation is actually
+    wrong, but it is simple and close enough to reality. Please check the
+    Lucene documentation or the excellent Lucene In
+    Action from Otis Gospodnetic and Erik Hatcher.
+
+    The analyzer class used to index the elements is configurable
+    through the hibernate.search.analyzer property. If none
+    is defined,
+    org.apache.lucene.analysis.standard.StandardAnalyzer
+    is used as the default.
+
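
To make the annotation-driven mapping more concrete, here is a rough stand-alone sketch of how annotated getters could be turned into named fields. It deliberately avoids the real annotations: `SketchIndexed`, `SketchField` and `SketchDocumentId` are hypothetical stand-ins, and a sorted map stands in for a Lucene Document. It only illustrates three rules from the text: unannotated properties are ignored, an explicit name wins, and the default name is the decapitalized property name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for @Indexed, @Field and @DocumentId (not the real annotations).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface SketchIndexed { String index(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface SketchField { String name() default ""; }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface SketchDocumentId { }

@SketchIndexed(index = "indexes/essays")
class SketchEssay {
    @SketchDocumentId public Long getId() { return 1L; }
    @SketchField(name = "Abstract") public String getSummary() { return "short summary"; }
    @SketchField public String getText() { return "full text"; }
    public String getInternalNote() { return "not indexed"; } // no annotation: ignored
}

class MappingSketch {
    /** Builds a field-name to value map, standing in for a Lucene Document. */
    static Map<String, String> toDocument(Object entity) {
        Map<String, String> document = new TreeMap<>();
        Class<?> type = entity.getClass();
        if (!type.isAnnotationPresent(SketchIndexed.class)) {
            return document; // classes not marked as indexed are skipped entirely
        }
        for (Method getter : type.getDeclaredMethods()) {
            SketchField field = getter.getAnnotation(SketchField.class);
            boolean isId = getter.isAnnotationPresent(SketchDocumentId.class);
            if (field == null && !isId) {
                continue;
            }
            // Default field name: getter name without "get", first letter lowered.
            String property = getter.getName().substring("get".length());
            String defaultName = Character.toLowerCase(property.charAt(0)) + property.substring(1);
            String name = (field != null && !field.name().isEmpty()) ? field.name() : defaultName;
            try {
                document.put(name, String.valueOf(getter.invoke(entity)));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + getter.getName(), e);
            }
        }
        return document;
    }

    public static void main(String[] args) {
        // Prints the three fields id, Abstract and text (sorted by name).
        System.out.println(toDocument(new SketchEssay()));
    }
}
```

The sketch produces the same three field names as the Essay example: `id`, `Abstract` and `text`.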
+ +
+    Property/Field Bridge
+
+    All fields of a full text index in Lucene have to be represented as
+    Strings. Hence Java properties have to be indexed in String form. For
+    most of your properties, Hibernate Search does
+    the translation job for you thanks to a built-in set of bridges. In some
+    cases, though, you need fine-grained control over the translation
+    process.
+
+
+      Built-in bridges
+
+      Hibernate Search comes bundled with a set of
+      built-in bridges between a Java property type and its full text
+      representation.
+
+      Null elements are not indexed (Lucene does not
+      support null elements and it does not make much sense either)
+
+
+
+          null
+
+
+            null elements are not indexed. Lucene does not support null
+            elements and this does not make much sense either.
+
+
+
+
+          java.lang.String
+
+
+            Strings are indexed as is
+
+
+
+
+          short, Short, int, Integer, long, Long, float, Float,
+          double, Double, BigInteger, BigDecimal
+
+
+            Numbers are converted to their String representation. Note
+            that numbers cannot be compared by Lucene (ie used in range
+            queries) out of the box: they have to be padded
+                Using a Range query is debatable and has drawbacks; an
+                alternative approach is to use a Filter query which will
+                filter the result query to the appropriate range.
+
+                Hibernate Search will support
+                a padding mechanism
+
+
+
+
+
+          java.util.Date
+
+
+            Dates are stored as yyyyMMddHHmmssSSS in GMT time
+            (20061107210300012 for Nov 7th of 2006 4:03PM and 12ms EST). You
+            shouldn't really bother with the internal format. What is
+            important is that when using a DateRange Query, you should know
+            that the dates have to be expressed in GMT time.
+
+            Usually, storing the date up to the millisecond is not
+            necessary. @DateBridge defines the appropriate
+            resolution you are willing to store in the index
+            (@DateBridge(resolution=Resolution.DAY)).
+            The date pattern will then be truncated accordingly.
+
+            @Entity @Indexed
+public class Meeting {
+    @Field(index=Index.UN_TOKENIZED)
+    @DateBridge(resolution=Resolution.MINUTE)
+    private Date date;
+    ...
+}
+
+
+              A Date whose resolution is lower than
+              MILLISECOND cannot be a
+              @DocumentId
+
+
+
+
+
+
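
The date representation can be reproduced with the JDK alone. The following is a sketch under assumptions, not the built-in date bridge: `DateStringSketch` is a hypothetical name, and truncation is modelled simply as keeping a prefix of the `yyyyMMddHHmmssSSS` string, which is how this sketch reads "truncated accordingly".

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper, not the built-in date bridge. */
class DateStringSketch {
    /** Resolution names mirror the text; the prefix lengths are this sketch's assumption. */
    enum Resolution {
        YEAR(4), MONTH(6), DAY(8), HOUR(10), MINUTE(12), SECOND(14), MILLISECOND(17);

        final int length;

        Resolution(int length) {
            this.length = length;
        }
    }

    static String toIndexString(Date date, Resolution resolution) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmssSSS", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("GMT")); // index strings are expressed in GMT
        return format.format(date).substring(0, resolution.length);
    }

    public static void main(String[] args) {
        // Nov 7th 2006, 4:03:00.012 PM at UTC-5 (EST)
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT-05:00"));
        cal.clear();
        cal.set(2006, Calendar.NOVEMBER, 7, 16, 3, 0);
        cal.set(Calendar.MILLISECOND, 12);
        Date date = cal.getTime();

        System.out.println(toIndexString(date, Resolution.MILLISECOND)); // 20061107210300012
        System.out.println(toIndexString(date, Resolution.MINUTE));      // 200611072103
        System.out.println(toIndexString(date, Resolution.DAY));         // 20061107
    }
}
```

Because the strings are fixed-width and ordered from most to least significant unit, their lexicographic order matches chronological order, which is what makes range queries on them meaningful.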
+ +
+      Custom Bridge
+
+      It can happen that the built-in bridges of Hibernate Search do
+      not cover some of your property types, or that the String representation
+      used is not what you expect.
+
+
+        StringBridge
+
+        The simplest custom solution is to give Hibernate
+        Search an implementation of your expected
+        object to String bridge. To do so you need to
+        implement the
+        org.hibernate.search.bridge.StringBridge
+        interface
+
+        /**
+ * Padding Integer bridge.
+ * All numbers will be padded with 0 to match 5 digits
+ *
+ * @author Emmanuel Bernard
+ */
+public class PaddedIntegerBridge implements StringBridge {
+
+    private int PADDING = 5;
+
+    public String objectToString(Object object) {
+        String rawInteger = ( (Integer) object ).toString();
+        if (rawInteger.length() > PADDING) throw new IllegalArgumentException( "Try to pad on a number too big" );
+        StringBuilder paddedInteger = new StringBuilder( );
+        for ( int padIndex = rawInteger.length() ; padIndex < PADDING ; padIndex++ ) {
+            paddedInteger.append('0');
+        }
+        return paddedInteger.append( rawInteger ).toString();
+    }
+}
+
+        Then any property or field can use this bridge thanks to the
+        @FieldBridge annotation
+
+        @FieldBridge(impl = PaddedIntegerBridge.class)
+private Integer length;
+
+        Parameters can be passed to the Bridge implementation, making it
+        more flexible. The Bridge implementation implements a
+        ParameterizedBridge interface, and the
+        parameters are passed through the @FieldBridge
+        annotation.
+
+        public class PaddedIntegerBridge implements StringBridge, ParameterizedBridge {
+
+    public static String PADDING_PROPERTY = "padding";
+    private int padding = 5; //default
+
+    public void setParameterValues(Map parameters) {
+        Object padding = parameters.get( PADDING_PROPERTY );
+        //values declared through @Parameter are Strings, so parse rather than cast
+        if (padding != null) this.padding = Integer.parseInt( padding.toString() );
+    }
+
+    public String objectToString(Object object) {
+        String rawInteger = ( (Integer) object ).toString();
+        if (rawInteger.length() > padding) throw new IllegalArgumentException( "Try to pad on a number too big" );
+        StringBuilder paddedInteger = new StringBuilder( );
+        for ( int padIndex = rawInteger.length() ; padIndex < padding ; padIndex++ ) {
+            paddedInteger.append('0');
+        }
+        return paddedInteger.append( rawInteger ).toString();
+    }
+}
+
+
+//property
+@FieldBridge(impl = PaddedIntegerBridge.class,
+        params = @Parameter(name="padding", value="10") )
+private Integer length;
+
+        The ParameterizedBridge interface can be
+        implemented by StringBridge,
+        TwoWayStringBridge,
+        FieldBridge implementations (see
+        below).
+
+        If you expect to use your bridge implementation for an id
+        property (ie annotated with @DocumentId), you need
+        to use a slightly extended version of StringBridge
+        named TwoWayStringBridge. Hibernate
+        Search needs to read the string representation of the
+        identifier and generate the object out of it. There is no difference
+        in the way the @FieldBridge annotation is
+        used.
+
+        public class PaddedIntegerBridge implements TwoWayStringBridge, ParameterizedBridge {
+
+    public static String PADDING_PROPERTY = "padding";
+    private int padding = 5; //default
+
+    public void setParameterValues(Map parameters) {
+        Object padding = parameters.get( PADDING_PROPERTY );
+        //values declared through @Parameter are Strings, so parse rather than cast
+        if (padding != null) this.padding = Integer.parseInt( padding.toString() );
+    }
+
+    public String objectToString(Object object) {
+        String rawInteger = ( (Integer) object ).toString();
+        if (rawInteger.length() > padding) throw new IllegalArgumentException( "Try to pad on a number too big" );
+        StringBuilder paddedInteger = new StringBuilder( );
+        for ( int padIndex = rawInteger.length() ; padIndex < padding ; padIndex++ ) {
+            paddedInteger.append('0');
+        }
+        return paddedInteger.append( rawInteger ).toString();
+    }
+
+    public Object stringToObject(String stringValue) {
+        return new Integer(stringValue);
+    }
+}
+
+
+//id property
+@DocumentId
+@FieldBridge(impl = PaddedIntegerBridge.class,
+        params = @Parameter(name="padding", value="10") )
+private Integer id;
+
+        It is critically important for the two-way process to be
+        idempotent (ie object = stringToObject( objectToString( object ) )
+        ).
+
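
The round-trip requirement can be checked without any framework. The sketch below uses a hypothetical local interface, `TwoWaySketch`, that copies the two method signatures shown above; it is not the real `TwoWayStringBridge`. It also shows why padding matters: unpadded number strings do not sort in numeric order.

```java
/** Hypothetical local copy of the two-way contract; not the real interface. */
interface TwoWaySketch {
    String objectToString(Object object);
    Object stringToObject(String stringValue);
}

class PaddedIntegerSketch implements TwoWaySketch {
    private final int padding;

    PaddedIntegerSketch(int padding) {
        this.padding = padding;
    }

    public String objectToString(Object object) {
        String raw = ((Integer) object).toString();
        if (raw.length() > padding) {
            throw new IllegalArgumentException("Try to pad on a number too big");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < padding; i++) {
            padded.append('0');
        }
        return padded.append(raw).toString();
    }

    public Object stringToObject(String stringValue) {
        return Integer.valueOf(stringValue); // leading zeros are ignored when parsing
    }

    public static void main(String[] args) {
        PaddedIntegerSketch bridge = new PaddedIntegerSketch(5);
        Integer id = 42;
        String indexed = bridge.objectToString(id);
        System.out.println(indexed);                                    // 00042
        System.out.println(id.equals(bridge.stringToObject(indexed))); // true: the round trip is stable

        // Unpadded strings sort wrongly ("9" comes after "10"); padded ones do not.
        System.out.println("9".compareTo("10") > 0);                                            // true
        System.out.println(bridge.objectToString(9).compareTo(bridge.objectToString(10)) < 0); // true
    }
}
```

Note that this simple padding only behaves for non-negative values; a negative number would need a different encoding.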
+ +
+        FieldBridge
+
+        Some use cases require more than a simple object to string
+        translation when mapping a property to a Lucene index. To give you
+        the most flexibility, you can also implement a bridge as a
+        FieldBridge. This interface gives you a property
+        value and lets you map it the way you want in your Lucene
+        Document. This interface is very similar in its
+        concept to the Hibernate
+        UserType.
+
+        You can for example store a given property in several different
+        document fields
+
+        /**
+ * Store the date in 3 different fields (year, month, day)
+ * to ease Range Query per year, month or day
+ * (eg get all the elements of december for the last 5 years)
+ *
+ * @author Emmanuel Bernard
+ */
+public class DateSplitBridge implements FieldBridge {
+    private final static TimeZone GMT = TimeZone.getTimeZone("GMT");
+
+    public void set(String name, Object value, Document document, Field.Store store, Field.Index index, Float boost) {
+        Date date = (Date) value;
+        Calendar cal = GregorianCalendar.getInstance( GMT );
+        cal.setTime( date );
+        int year = cal.get( Calendar.YEAR );
+        int month = cal.get( Calendar.MONTH ) + 1;
+        int day = cal.get( Calendar.DAY_OF_MONTH );
+        //set year
+        Field field = new Field( name + ".year", String.valueOf(year), store, index );
+        if ( boost != null ) field.setBoost( boost );
+        document.add( field );
+        //set month and pad it if needed (parentheses keep the digits when padding)
+        field = new Field( name + ".month", ( month < 10 ? "0" : "" ) + String.valueOf(month), store, index );
+        if ( boost != null ) field.setBoost( boost );
+        document.add( field );
+        //set day and pad it if needed
+        field = new Field( name + ".day", ( day < 10 ? "0" : "" ) + String.valueOf(day), store, index );
+        if ( boost != null ) field.setBoost( boost );
+        document.add( field );
+    }
+}
+
+
+//property
+@FieldBridge(impl = DateSplitBridge.class)
+private Date date;
+
+
+
+
+
+ +
+    Querying
+
+    The second most important capability of Hibernate
+    Search is the ability to execute a Lucene query and retrieve
+    entities managed by a Hibernate session, providing the power of Lucene
+    without leaving the Hibernate paradigm, and giving another dimension to the
+    Hibernate classic search mechanisms (HQL, Criteria query, native SQL
+    query).
+
+    To access the Hibernate Search querying
+    facilities, you have to use a Hibernate
+    FullTextSession. A FullTextSession wraps a regular
+    org.hibernate.Session to provide query and indexing
+    capabilities.
+
+    Session session = sessionFactory.openSession();
+...
+FullTextSession fullTextSession = Search.createFullTextSession(session);
+
+    The search facility is built on native Lucene queries.
+
+    org.apache.lucene.queryParser.QueryParser parser = new QueryParser("title", new StopAnalyzer() );
+
+org.apache.lucene.search.Query luceneQuery = parser.parse( "summary:Festina OR brand:Seiko" );
+org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery );
+
+List result = fullTextQuery.list(); //return a list of managed objects
+
+    The Hibernate query built on top of the Lucene query is a regular
+    org.hibernate.Query, so you are in the same paradigm as
+    the other Hibernate query facilities (HQL, Native or Criteria). The
+    regular list(), uniqueResult(),
+    iterate() and scroll() can be
+    used.
+
+    If you expect a reasonable number of results and expect to work on all
+    of them, list() or
+    uniqueResult() are recommended.
+    list() works best if the entity
+    batch-size is set up properly. Note that Hibernate
+    Search has to process all Lucene Hits elements when using
+    list(), uniqueResult()
+    and iterate(). If you wish to minimize Lucene
+    document loading, scroll() is more appropriate.
+    Don't forget to close the ScrollableResults object
+    when you're done, since it keeps Lucene resources.
+
+    An efficient way to work with queries is to use pagination.
The
+    pagination API is exactly the one available in
+    org.hibernate.Query:
+
+    org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery );
+fullTextQuery.setFirstResult(30);
+fullTextQuery.setMaxResults(20);
+fullTextQuery.list(); //returns up to 20 elements, starting at offset 30 (offsets are zero-based)
+
+    Only the relevant Lucene Documents are accessed.
+
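
The window selected by a first-result offset and a maximum result count can be sketched on a plain list. `PaginationSketch.page` is a hypothetical helper, not part of any query API; it assumes the zero-based offset convention of `org.hibernate.Query.setFirstResult`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper showing the offset/limit window; not part of any query API. */
class PaginationSketch {
    static <T> List<T> page(List<T> hits, int firstResult, int maxResults) {
        if (firstResult >= hits.size()) {
            return Collections.emptyList(); // the window starts past the last hit
        }
        int end = Math.min(hits.size(), firstResult + maxResults);
        return new ArrayList<>(hits.subList(firstResult, end));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            hits.add(i); // 100 matching documents at positions 0..99
        }
        List<Integer> window = page(hits, 30, 20);
        System.out.println(window.size());  // 20
        System.out.println(window.get(0));  // 30
        System.out.println(window.get(19)); // 49
    }
}
```

Only the elements inside the window are copied, which mirrors the point that only the relevant documents need to be accessed.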
+ +
+    Indexing
+
+    It is sometimes useful to index an object even if this object is
+    neither inserted into nor updated in the database. This is especially true when you
+    want to build your index the first time. You can achieve that goal using
+    the FullTextSession.
+
+    FullTextSession fullTextSession = Search.createFullTextSession(session);
+Transaction tx = fullTextSession.beginTransaction();
+for (Customer customer : customers) {
+    fullTextSession.index(customer);
+}
+tx.commit(); //indexes are written at commit time
+
+    For maximum efficiency, Hibernate Search batches index operations
+    and executes them at commit time (Note: you don't need to use
+    org.hibernate.Transaction in a JTA
+    environment).
+
\ No newline at end of file --===============5526014745254583156==--