Hibernate Lucene: What's next
by Emmanuel Bernard
Hi again,
Here are some ideas I have in mind for Hibernate Lucene
o User feedback
First and most important, I need and will use user feedback on the
changes: Usability, Use case coverage, etc
I know our model works well for simple querying requirements. I would
like to see it evolve to be flexible enough to cope with more complex
ones. The design of Hibernate Lucene tries not to hide the Lucene APIs.
LuceneSession currently uses the delegation pattern. Is that OK, or does
anyone have a better idea? Right now you have to do new LuceneSession(session);
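
For readers less familiar with the delegation pattern mentioned above, here is a minimal, self-contained sketch of the idea. It does not use the real Hibernate or Hibernate Lucene classes: SimpleSession, InMemorySession and SearchSession are hypothetical stand-ins invented for the illustration, and the real LuceneSession may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.hibernate.Session; only two methods are modelled.
interface SimpleSession {
    void save(Object entity);
    List<Object> saved();
}

// Plain in-memory implementation, present only to make the sketch runnable.
class InMemorySession implements SimpleSession {
    private final List<Object> entities = new ArrayList<>();
    public void save(Object entity) { entities.add(entity); }
    public List<Object> saved() { return entities; }
}

// Delegation: the wrapper implements the same interface, forwards every
// existing call to the wrapped session, and adds a search-specific method.
class SearchSession implements SimpleSession {
    private final SimpleSession delegate;
    private final List<Object> indexed = new ArrayList<>();

    SearchSession(SimpleSession delegate) { this.delegate = delegate; }

    public void save(Object entity) { delegate.save(entity); }
    public List<Object> saved() { return delegate.saved(); }

    // Extra capability that the wrapped session does not offer.
    void index(Object entity) { indexed.add(entity); }
    List<Object> indexed() { return indexed; }
}
```

The cost of this approach is the explicit wrapping step (new SearchSession(session) in the sketch); the benefit is that code written against the plain session interface keeps working unchanged.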
o More built-in types
I want to increase the number of built-in types, speak up!
o Distributed Directory(Provider)
Using JBossCache as a distributed Lucene Directory is something I have
in mind.
o Distributed read / centralized update
Push the indexing work to a centralized machine and do it asynchronously.
This is a fairly common architectural practice for Lucene.
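
As a rough illustration of the "submit now, index later on one central worker" idea, here is a small sketch in which a single worker thread stands in for the centralized machine. CentralIndexer and its methods are hypothetical names, and a real setup would use a remote queue rather than an in-process executor.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one worker thread plays the role of the central indexer.
class CentralIndexer {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> index = new CopyOnWriteArrayList<>();

    // Called by application nodes; returns immediately, the update is applied later.
    void submit(String documentId) {
        worker.execute(() -> index.add(documentId));
    }

    // Stops accepting work and waits for the queued updates to be applied.
    boolean shutdownAndWait(long seconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> indexedIds() { return index; }
}
```

Because all updates go through a single worker, they are applied in submission order and never compete for the index write lock.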
o Analyzer
Today we have one analyzer per DirectoryProvider. This is most likely
too coarse-grained.
o Better integration of the bridge system and the query mechanism
A subclass of the Lucene QueryParser might be a good place to hook
in some parameter translation between the user view and the index view.
Alternatively, query parameters are a good candidate.
o Filtered Lucene query (Ales' idea)
Apply filters on all elements queried through lucene, basically an
additional BooleanQuery to the initial query
o More access to the Lucene information
When a query is executed by Lucene, you can have access to:
- the number of results
- the normalized score of a given document
It might be interesting to give access to this information
o Use of the lucene index to build the object (ie avoid DB access)
http://www.mail-archive.com/hibernate-devel@lists.sourceforge.net/msg0601...
o Default QueryParser
Right now you need a query parser to create a Lucene query, then pass
this query to the luceneSession. This needs to be simplified in the
default cases.
Some additional elements are available on JIRA. Look for open issues
on HibernateAnnotation/lucene
o Name your needed feature here...
18 years, 1 month
Hibernate Lucene massive rework
by Emmanuel Bernard
Hi all,
I have had time to work on Hibernate Lucene recently and finished the
work I wanted to do. This is a major rework and will bring both API and
index breaks, but for the good. This implements the core ideas that were
floating around for a while.
*What's new*
o Index querying
Keeping the index up to date is nice, but querying it is even better.
A LuceneSession has been introduced (a wrapper around a Hibernate session).
You can now create a Lucene query and get managed objects back.
luceneSession.createLuceneQuery(luceneQuery).list(); //get all matching entities
- list() initializes all entities one by one: setting a sensible batch-size is
critical
- future evolution: use of the Hibernate Core fetch profile once it is
exposed to the user
luceneSession.createLuceneQuery(luceneQuery, Book.class, Clock.class).iterate(); //get all matching entities of the given types
- iterate() reads all of the Lucene index but initializes the entities on demand
luceneSession.createLuceneQuery(luceneQuery).scroll(); //use a scrollable resultset to maximize performance when only a subset is needed
- scroll() keeps the index open (Hits) and lets you walk through it;
this method is the most IO/memory efficient
- do not forget to close the scrollable resultset
All three query methods support setFirstResult/setMaxResults. As a matter
of fact, the LuceneQuery returned is an implementation of
org.hibernate.Query, so your code is unaware of Lucene.
o object (re)indexation.
You can now index an object, even if you do not apply any change to it
(luceneSession.index()).
The index operation is batched to maximize speed: if no tx is in
progress, the indexing is done immediately, otherwise the operations are
batched and done right after the transaction commit.
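
The rule above (index immediately outside a transaction, otherwise queue until commit) can be sketched as follows. This is only an illustration: IndexWorkQueue and its methods are hypothetical and are not part of Hibernate Lucene. Dropping the queued work on rollback is an assumption taken from the event listener description later in this mail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batching rule; none of these names come from Hibernate Lucene.
class IndexWorkQueue {
    private final List<String> pending = new ArrayList<>();
    private final List<String> applied = new ArrayList<>();
    private boolean inTransaction;

    void begin() { inTransaction = true; }

    // No transaction in progress: index immediately. Otherwise: queue until commit.
    void index(String entityId) {
        if (inTransaction) {
            pending.add(entityId);
        } else {
            applied.add(entityId);
        }
    }

    // Apply the whole batch right after the commit.
    void commit() {
        applied.addAll(pending);
        pending.clear();
        inTransaction = false;
    }

    // Assumption: queued work is discarded when the transaction is rolled back.
    void rollback() {
        pending.clear();
        inTransaction = false;
    }

    List<String> applied() { return applied; }
}
```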
o FieldBridge
Like the Hibernate UserType, a FieldBridge is an interface aiming to do
the translation work between a property and its indexed (ie String)
representation. This interface is very flexible and even allows you to
map a property into several index fields.
For the simple cases, ie most cases, a StringBridge has been introduced.
It converts your property into a String to be indexed. The API is much
simpler to implement, so I expect most custom bridges to use this
approach.
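
To give an idea of what a simple string-based bridge could look like, here is a self-contained sketch. SimpleStringBridge is a hypothetical, simplified interface written for this example; the real StringBridge in org.hibernate.lucene may differ in name and signature, so check the javadoc before copying this.

```java
// Hypothetical, simplified version of the string bridge contract described above.
interface SimpleStringBridge {
    String objectToString(Object value);
}

// Pads integers to a fixed width so that the lexicographic order of the indexed
// strings matches numeric order (non-negative values only).
class PaddedIntegerBridge implements SimpleStringBridge {
    private final int width;

    PaddedIntegerBridge(int width) { this.width = width; }

    public String objectToString(Object value) {
        if (value == null) {
            return null;
        }
        int number = (Integer) value;
        if (number < 0) {
            throw new IllegalArgumentException("negative values are not supported");
        }
        return String.format("%0" + width + "d", number);
    }
}
```

With a width of 5, the value 42 is indexed as "00042", which sorts before "00100" as plain text.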
o Built-in bridges
There is built-in support for Date (with resolution), Numbers (ie
java.lang.Number and its subclasses), and String.
I'm willing to expand the support, please tell me what you need.
o New event listener / lucene interaction
The event listener has been reimplemented. It is now threadsafe (ie it
does not depend on the underlying Directory locking mechanism - in a
single VM).
It fixes a flaw in the previous implementation that indexed entities
even when the transaction was rolled back (yuk!). You should no longer
use the post-commit-* events but the post-* events.
If no tx is in progress, the indexing is done immediately, otherwise the
operations are done right after the transaction commit.
This reimplementation opens the doors to:
- a better batching system (need some adjustments in Hibernate Core)
- the ability to delegate the actual indexation to a remote machine
(through JMS or any other messaging mechanism).
I say 'opens the doors' because the actual implementation is not there
yet (but is not hard I think).
o New annotations
First of all, the project has been repackaged: the annotations are now all
under org.hibernate.lucene.annotations
Second, the previous annotations have been deprecated to align with the
Lucene 2.0 APIs. @DocumentId and @Field() are now to be used. The old
ones are still here but will be removed in a future release.
You can also annotate a property to use a custom field bridge
(@FieldBridge) and inject parameters
@DateBridge allows you to define the resolution (YEAR, MONTH, ...) of a
Date to be indexed
@Boost can be defined on an entity and on a property
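
To illustrate what a date resolution means for the indexed value (see @DateBridge above), here is a small sketch that truncates a date to the chosen resolution when turning it into a string. The Resolution enum, the DAY value and the string patterns are assumptions made for the example; they are not taken from the Hibernate Lucene code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical resolution enum; YEAR and MONTH are named above, DAY is an assumption.
enum Resolution {
    YEAR("yyyy"), MONTH("yyyyMM"), DAY("yyyyMMdd");

    final DateTimeFormatter formatter;

    Resolution(String pattern) {
        this.formatter = DateTimeFormatter.ofPattern(pattern);
    }
}

class DateResolutionSketch {
    // Everything finer than the resolution is dropped from the indexed string.
    static String toIndexString(LocalDateTime date, Resolution resolution) {
        return date.format(resolution.formatter);
    }
}
```

A coarser resolution produces fewer distinct terms in the index, at the price of less precise date queries.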
o Support for annotated fields
Previously only annotated properties were supported; you can now annotate
fields as well.
o indexing
As described earlier, the interaction with Lucene has been reworked to
allow better efficiency.
Several entities per index, as well as class hierarchy indexing, are now
supported.
o DirectoryProvider
A pluggable directory provider is still present, with 2 default
implementations (Memory and File system).
*When do we get it, and feedback*
I expect to release all this right after Hibernate Core 3.2.1 and as
soon as I update the documentation.
Please have a look at the API and the feature set. Nothing is cast in
stone yet. I'll follow up with a What's next email.
Can I have a preview? Yes, you can get the code from
http://anonsvn.jboss.org/repos/hibernate/branches/Lucene_Integration/
(you need to get Hibernate Core from trunk or branch_3_2).
I have also uploaded a snapshot version of the javadoc:
http://www.hibernate.org/~emmanuel/lucenesnapshot20061102/doc/api/
(check the org.hibernate.lucene.* packages)
Question:
Should it be part of the Hibernate Annotations 3.2.x series?
Hibernate Lucene is considered experimental (ie still evolving). It does
break the applications using Hibernate Lucene right now, but the
migration will bring a big plus.
If I release it as part of the 3.3.x series, that binds me to a Core
release and will delay adoption.
Maybe it is time for a separate package (even if I don't think this
will really solve the problem).