[hibernate-dev] Hibernate Lucene massive rework

Emmanuel Bernard emmanuel at hibernate.org
Thu Nov 2 18:23:15 EST 2006


Hi all,
I have had time to work on Hibernate Lucene recently and have finished
the work I wanted to do. This is a major rework: it breaks both the API
and existing indexes, but for the better. It implements the core ideas
that have been floating around for a while.

*What's new*

o Index querying
Keeping the index up to date is nice, but querying it is even better.
A LuceneSession has been introduced (a wrapper around a Hibernate
session). You can now create a Lucene query and get managed objects back.
luceneSession.createLuceneQuery(luceneQuery).list(); //get all matching entities
   list() initializes all entities one by one: setting a sensible
   batch-size is critical
   future evolution: use of the Hibernate Core fetch profile once it is
   exposed to the user
luceneSession.createLuceneQuery(luceneQuery, Book.class, Clock.class).iterate(); //get all matching entities of the given types
   iterate() reads all of the Lucene index but initializes the entities
   on demand
luceneSession.createLuceneQuery(luceneQuery).scroll(); //use a scrollable resultset to maximize performance when only a subset is needed
   scroll() keeps the index open (Hits) and lets you walk through it;
   this method is the most IO/memory efficient
   do not forget to close the scrollable resultset

All three query methods support setFirstResult/setMaxResults. As a matter
of fact, the returned LuceneQuery is an implementation of
org.hibernate.Query, so your code is unaware of Lucene.
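
To make the difference between list() and iterate() concrete, here is a
minimal, self-contained sketch. It does not use the Hibernate Lucene API:
MatchWindow, its loader function and the list of ids are hypothetical
stand-ins for the index hits and for the session loading an entity. It only
shows the idea of paging over the matches and of loading eagerly versus on
demand.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a query result: the ids come from the index,
// the entities are loaded through a caller-supplied function.
class MatchWindow<T> {
    private final List<Long> matchingIds;     // what the index returned
    private final Function<Long, T> loader;   // stand-in for an entity load
    private int firstResult = 0;
    private int maxResults = Integer.MAX_VALUE;

    MatchWindow(List<Long> matchingIds, Function<Long, T> loader) {
        this.matchingIds = matchingIds;
        this.loader = loader;
    }

    MatchWindow<T> setFirstResult(int firstResult) {
        this.firstResult = firstResult;
        return this;
    }

    MatchWindow<T> setMaxResults(int maxResults) {
        this.maxResults = maxResults;
        return this;
    }

    // The ids that fall inside the requested page.
    private List<Long> window() {
        int from = Math.min(firstResult, matchingIds.size());
        int to = (int) Math.min((long) from + maxResults, matchingIds.size());
        return matchingIds.subList(from, to);
    }

    // list(): every entity in the window is loaded before returning.
    List<T> list() {
        List<T> result = new ArrayList<>();
        for (Long id : window()) {
            result.add(loader.apply(id));
        }
        return result;
    }

    // iterate(): the ids are known up front, each entity is loaded on next().
    Iterator<T> iterate() {
        final Iterator<Long> ids = window().iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return ids.hasNext();
            }

            @Override
            public T next() {
                return loader.apply(ids.next());
            }
        };
    }
}
```

With five matching ids, setFirstResult(1) and setMaxResults(3), list() calls
the loader three times up front, while iterate() calls it only when next()
is invoked.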

o Object (re)indexing
You can now index an object even if you have not changed it
(luceneSession.index()). The index operations are batched to maximize
speed: if no tx is in progress, the indexing is done immediately;
otherwise the operations are batched and done right after the
transaction commit.

o FieldBridge
Like the Hibernate UserType, a FieldBridge is an interface that does
the translation work between a property and its indexed (i.e. String)
representation. This interface is very flexible and even allows you to
map a property into several index fields.
For the simple cases, i.e. most cases, a StringBridge has been introduced:
it converts your property into a String to be indexed. The API is much
simpler to implement, so I expect most custom bridges to use this
approach.
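
As an illustration of the StringBridge idea, here is a minimal sketch of a
property-to-String conversion. The interface below is declared locally and
is only an assumption about the general shape (one method taking the
property value and returning its indexed String); SimpleStringBridge,
objectToString and PaddedIntegerBridge are hypothetical names, not the
actual org.hibernate.lucene API. The example pads numbers with zeros so
that the lexicographic order of the indexed strings matches numeric order.

```java
// Hypothetical one-method bridge, modeled on the StringBridge idea above.
interface SimpleStringBridge {
    String objectToString(Object value);
}

// Pads non-negative integers with leading zeros so that comparing the
// indexed strings gives the same order as comparing the numbers.
class PaddedIntegerBridge implements SimpleStringBridge {
    private final int width;

    PaddedIntegerBridge(int width) {
        this.width = width;
    }

    @Override
    public String objectToString(Object value) {
        if (value == null) {
            return null; // nothing to index
        }
        long number = ((Number) value).longValue();
        if (number < 0) {
            throw new IllegalArgumentException("negative values are not handled by this sketch");
        }
        String digits = Long.toString(number);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value is wider than " + width + " digits");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(digits).toString();
    }
}
```

For example, new PaddedIntegerBridge(5).objectToString(42) yields "00042",
which sorts before "00100" as a plain string.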

o Built-in bridges
There is built-in support for Date (with resolution), Numbers (i.e.
java.lang.Number and its subclasses), and String.
I'm willing to expand the support; please tell me what you need.

o New event listener / lucene interaction
The event listener has been reimplemented. It is now thread-safe within
a single VM (i.e. it does not depend on the underlying Directory locking
mechanism).
It fixes a flaw in the previous implementation, which indexed entities
even when the transaction was rolled back (yuck!). You should no longer
use the post-commit-* events but the post-* events.
If no tx is in progress, the indexing is done immediately; otherwise the
operations are done right after the transaction commit.
This reimplementation opens the doors to:
 - a better batching system (needs some adjustments in Hibernate Core)
 - the ability to delegate the actual indexing to a remote machine
(through JMS or any other messaging mechanism).
I say 'opens the doors' because the actual implementation is not there 
yet (but is not hard I think).
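
The behaviour described in this section (index immediately when no tx is
in progress, otherwise wait for the commit, and never index work from a
rolled-back tx) can be sketched as a small work queue. This is only an
illustration under simplifying assumptions: IndexWorkQueue and its methods
are hypothetical names, there is a single queue instead of one per
transaction, and index work is modeled as a plain Runnable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: index work runs at once outside a transaction,
// or is queued and run only if the transaction commits.
class IndexWorkQueue {
    private final List<Runnable> pending = new ArrayList<>();
    private boolean inTransaction = false;

    synchronized void begin() {
        inTransaction = true;
    }

    synchronized void submit(Runnable indexWork) {
        if (inTransaction) {
            pending.add(indexWork); // deferred until commit
        } else {
            indexWork.run();        // no transaction: index immediately
        }
    }

    synchronized void commit() {
        inTransaction = false;
        List<Runnable> batch = new ArrayList<>(pending);
        pending.clear();
        for (Runnable work : batch) {
            work.run();             // the whole batch runs after the commit
        }
    }

    synchronized void rollback() {
        inTransaction = false;
        pending.clear();            // rolled-back changes never reach the index
    }
}
```

Because the queued work is just a list of operations, the same structure
could hand the batch to another component instead of running it locally,
which is the direction the two points above hint at.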

o New annotations
First of all, the project has been repackaged: the annotations are now
all under org.hibernate.lucene.annotations.
Second, the previous annotations have been deprecated to align with the
Lucene 2.0 APIs: @DocumentId and @Field() are now to be used. The old
ones are still there but will be removed in a future release.
You can also annotate a property to use a custom field bridge
(@FieldBridge) and inject parameters.
@DateBridge allows you to define the resolution (YEAR, MONTH, ...) of a
Date to be indexed.
@Boost can be defined on an entity and on a property.
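
To show what a date resolution means for the indexed value, here is a
small sketch that truncates a Date to a chosen resolution before turning
it into a String. The DateResolution enum, its method name and the exact
string patterns are assumptions made for illustration; they are not taken
from the @DateBridge implementation.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical resolution enum; the constant names mirror the YEAR, MONTH, ...
// values mentioned above, the patterns are an assumption.
enum DateResolution {
    YEAR("yyyy"), MONTH("yyyyMM"), DAY("yyyyMMdd");

    private final String pattern;

    DateResolution(String pattern) {
        this.pattern = pattern;
    }

    // Renders the date in UTC, keeping only the parts this resolution covers.
    String toIndexedString(Date date) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }
}
```

A coarser resolution maps more dates to the same indexed term: with MONTH,
every date in November 2006 is indexed under the same value.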


o Support for annotated fields
Previously only annotated properties were supported; you can now
annotate fields as well.

o Indexing
As described earlier, the interaction with Lucene has been reworked to
allow better efficiency.
Several entities per index, as well as class hierarchy indexing, are now
supported.

o DirectoryProvider
A pluggable directory provider is still present, with 2 default
implementations (memory and file system).


*When do we get it, and feedback?*
I expect to release all this right after Hibernate Core 3.2.1 and as
soon as I have updated the documentation.
Please have a look at the API and the feature set. Nothing is cast in
stone yet. I'll follow up with a What's next email.
Can I have a preview? Yes, you can get the code from
http://anonsvn.jboss.org/repos/hibernate/branches/Lucene_Integration/
(you need to get Hibernate Core from trunk or branch_3_2).
I have also uploaded a snapshot version of the javadoc at
http://www.hibernate.org/~emmanuel/lucenesnapshot20061102/doc/api/
(check the org.hibernate.lucene.* packages).


Question:
Should it be part of the Hibernate Annotations 3.2.x series?
Hibernate Lucene is considered experimental (i.e. still evolving). This
rework does break applications that use Hibernate Lucene right now, but
the migration will bring a big plus.
If I release it as part of the 3.3.x series, that binds me to a Core
release and will delay adoption.
Maybe it is time for a separate package (even if I don't think this
will really solve the problem).