[hibernate-dev] Hibernate Search 3.5 or 4

Sanne Grinovero sanne at hibernate.org
Wed Apr 20 04:21:57 EDT 2011


Hi,
About changing contracts, we don't get this chance very often so we
should make sure we don't miss any.
I have some favourites I'd like to discuss:

- work list sent to backend
 -- As you know, Lucene dropped all guarantees about serializability, so
supporting backends like JMS requires a format change; in particular,
NumericField doesn't work right now because it was never serializable
(HSEARCH-681)
 -- Lucene is becoming more flexible about updates. Rather than keep
remapping an "update" operation as a delete+add pair, we should
transmit the "update" operation itself and let the backend figure out
what's best.
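A minimal sketch of the kind of format change meant above, assuming a hypothetical message class: `WorkMessages`, `LuceneWorkMessage`, `Operation`, `apply` and the serialization helpers are illustrative names, not existing Hibernate Search API. Field values travel as plain JDK types, so the message serializes without depending on Lucene's Document or NumericField, and "update" travels as its own operation for the receiving backend to interpret.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;

// Hypothetical backend message format: not an existing Hibernate Search API.
public class WorkMessages {

    // "UPDATE" is sent as its own operation instead of a DELETE + ADD pair.
    enum Operation { ADD, DELETE, UPDATE }

    // Field values are plain JDK types (String, Integer, Long, ...), so the
    // message serializes without relying on Lucene's Document or NumericField.
    static final class LuceneWorkMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        final Operation operation;
        final String entityClass;
        final String id;
        final LinkedHashMap<String, Serializable> fields;

        LuceneWorkMessage(Operation operation, String entityClass, String id,
                          LinkedHashMap<String, Serializable> fields) {
            this.operation = operation;
            this.entityClass = entityClass;
            this.id = id;
            this.fields = fields;
        }
    }

    // The receiving backend decides how to apply each operation; here it only
    // describes the IndexWriter call it would make (Lucene's IndexWriter offers
    // addDocument, deleteDocuments and updateDocument).
    static String apply(LuceneWorkMessage msg) {
        switch (msg.operation) {
            case ADD:    return "addDocument(id:" + msg.id + ")";
            case DELETE: return "deleteDocuments(id:" + msg.id + ")";
            case UPDATE: return "updateDocument(id:" + msg.id + ")";
            default:     throw new IllegalStateException("Unknown operation " + msg.operation);
        }
    }

    static byte[] serialize(Serializable o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LinkedHashMap<String, Serializable> fields = new LinkedHashMap<>();
        fields.put("title", "some title");
        fields.put("pages", 300); // numeric value kept as an Integer, not a NumericField
        LuceneWorkMessage sent = new LuceneWorkMessage(Operation.UPDATE, "Book", "42", fields);
        // Simulates the JMS hop: serialize on the sending node, deserialize on the receiving one.
        LuceneWorkMessage received = (LuceneWorkMessage) deserialize(serialize(sent));
        System.out.println(apply(received)); // prints "updateDocument(id:42)"
    }
}
```

Because the numeric value stays a plain Integer until it reaches the backend, the receiving side would presumably be the one to build the appropriate Lucene field, which is what would sidestep the NumericField serialization problem.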

 - DirectoryProvider
  -- make a "DirectoryManager" instead, which is able to provide
factories for both IndexReaders and IndexWriters
  -- add utility methods like "getName()"; I wished I had that in some
cases to provide better error messages. This leads me to think that
instead of trying to foresee all needed methods, the extension point
should not be the DirectoryManager interface directly; rather, people
should plug in its different aspects.
 -- this is needed both to support Instantiated indexes and to make
good use of all the new so-called "Near-Real-Time" Lucene improvements.

 - ReaderProvider
 -- (assuming such a thing still exists): I think it would be
very nice if the responsibility of such a provider were to provide
the IndexReader for a single index. Currently it has to provide a
"multiReader" across the different indexes, making some implementations
very tricky. It seems I got it right in SharingBufferReaderProvider, but
I recently had some other interesting ideas which turned out to be quite
daunting after a first draft: taking responsibility for FieldCache expiry
directly, so that different cache implementations can be plugged in;
since we control the lifecycle, we can be much smarter.

 - backends and workers
  -- I'd like to make it possible to configure different backends per
index. Currently a backend is global, while in some (extreme) cases it
would have been handy to configure even single shards with different
backends. So really a backend should be coupled to the
"DirectoryManager" mentioned before. The question is at what level
sharding is going to work; it could work as a multiplexing
DirectoryManager.
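A minimal sketch of per-index and per-shard backend resolution, assuming hypothetical property keys: the `worker.backend` keys scoped by index name and shard id, the `BackendResolver` class and the `"lucene"` fallback value are illustrative only, since in Hibernate Search 3.x the backend is a single global setting. The most specific setting wins, so a multiplexing (sharding) DirectoryManager could ask for the backend of each shard individually.

```java
import java.util.Properties;

// Hypothetical per-index / per-shard backend lookup; the property keys are
// illustrative and not an existing Hibernate Search configuration contract.
public class BackendResolver {

    private static final String PREFIX = "hibernate.search.";
    private static final String KEY = "worker.backend";

    private final Properties props;

    BackendResolver(Properties props) {
        this.props = props;
    }

    // Most specific setting wins: shard, then index, then the global default.
    String backendFor(String indexName, Integer shardId) {
        if (shardId != null) {
            String shardLevel = props.getProperty(PREFIX + indexName + "." + shardId + "." + KEY);
            if (shardLevel != null) {
                return shardLevel;
            }
        }
        String indexLevel = props.getProperty(PREFIX + indexName + "." + KEY);
        if (indexLevel != null) {
            return indexLevel;
        }
        // Assumed fallback when nothing is configured at any level.
        return props.getProperty(PREFIX + KEY, "lucene");
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.Animals.worker.backend", "jms");
        p.setProperty("hibernate.search.Animals.1.worker.backend", "lucene");
        BackendResolver r = new BackendResolver(p);
        System.out.println(r.backendFor("Animals", 0));  // jms (index-level setting)
        System.out.println(r.backendFor("Animals", 1));  // lucene (shard override)
        System.out.println(r.backendFor("Books", null)); // lucene (global fallback)
    }
}
```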

-- defaults to change:
 - remove the notion of transactional / batch IndexWriter settings;
it has been deprecated for long enough.
 - make the FullTextIndexEventListener final (people still extend and replace
it to better control when an entity is to be indexed, but I hope we
can solve that as well)
 - default to NumericField for numeric properties
 - set exclusive_index_use=true by default: the benefits are far too
great, and some optimizations I was thinking of are impossible if this
is disabled.

-- bridges
 - Many times we couldn't do X or optimize Y because "a user
bridge might read/write any field"; I think we should stop exposing
the o.a.lucene.Document (especially since we are changing the format of
messages to the backend) and expose something just as good and
flexible instead. This needs some thought: we can't expose Document,
but we want to make sure people never miss the advanced features for
which such a bridge was a nice "advanced API". Alternatively, we split
the concepts: a less powerful API, plus a more advanced one whose
implementations are named and operate on the Document itself, but
inside the backend rather than in the DocumentBuilder. The name could
then be used in the message to the backend to point to a transformer
applying the final touches; it could be a customization of the
implementation which converts a message in our own format into an
o.a.lucene.Document.

 - At some point we'll also need to track which entity properties are
being "read" by a custom ClassBridge/DynamicBoost, to better check for
index dirtiness. This might be done by proxying the entity, or just by
having the implementation declare which properties affect it: in the
latter case an API change is needed, but this can possibly be postponed.
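A minimal sketch of the "declare which properties affect it" option, assuming a hypothetical contract (`DeclaresDependencies`, `affectedBy()` and `requiresReindex` are illustrative names, not existing Hibernate Search API). A bridge that declares nothing is treated conservatively, since it "might read/write any field".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical contract, not an existing Hibernate Search API.
public class DirtyCheck {

    // A custom ClassBridge/DynamicBoost declares which entity properties it reads.
    // Returning null means "unknown": the bridge might read anything.
    interface DeclaresDependencies {
        Set<String> affectedBy();
    }

    // Returns true if any changed property can influence the indexed document.
    static boolean requiresReindex(Set<String> dirtyProperties,
                                   Set<String> directlyIndexed,
                                   List<DeclaresDependencies> customBridges) {
        for (DeclaresDependencies bridge : customBridges) {
            Set<String> reads = bridge.affectedBy();
            if (reads == null) {
                // Conservative: assume an undeclared bridge is affected by any change.
                return !dirtyProperties.isEmpty();
            }
            if (!Collections.disjoint(reads, dirtyProperties)) {
                return true;
            }
        }
        return !Collections.disjoint(directlyIndexed, dirtyProperties);
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(Arrays.asList("title"));
        DeclaresDependencies fullName = () -> new HashSet<>(Arrays.asList("firstName", "lastName"));
        List<DeclaresDependencies> bridges = Collections.singletonList(fullName);
        System.out.println(requiresReindex(new HashSet<>(Arrays.asList("notes")), indexed, bridges));    // false
        System.out.println(requiresReindex(new HashSet<>(Arrays.asList("lastName")), indexed, bridges)); // true
    }
}
```

The null case keeps existing bridges working unchanged; only bridges that opt in to the declaration would benefit from the finer dirty check.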


This is just off the top of my head; I'm sure I forgot to break some
interface ;)
I'll give you some time to think about it, then I'll put the
proposals which survive into the wiki & JIRA.
(needless to say, no objections to your proposals)

Cheers,
Sanne


2011/4/20 Emmanuel Bernard <emmanuel at hibernate.org>:
> Hi,
>
> We have had in our road map an Hibernate Search 3.5 before Hibernate 4. Hibernate 4 is the release where the following should happen:
>  - split packages into API, SPI and private packages
>  - use JBoss Logging
>  - be compliant with Core 4
>  - break whatever contract we need to break to open up the future
>  - split dependency between the core of Hibernate Search and Hibernate Core
>
> Do you see more tasks for 4?
>
> Since Hibernate Core 4 seems to be doing alright and the time pressure will be strong to get Hibernate Search aligned, I propose to skip 3.5 entirely and focus on 4. We did not have that many new features planned for 3.5 anyway; it was more of a consolidation release.
>
> Even with skipping 3.5, the 4 release will be a lot of work. We should start early. Any objection or comment?
>
> Changing contracts
> We have had a few contracts that we wanted to change to make way for future improvements:
>  - should a bridge know about the field it changes (make the optimization more efficient)
>  - rework the backend to let IndexReader and IndexWriter communicate
>  - rework the backend to support instantiated IndexReaders
>
> Can you help collect the list of changes you would like to see happening?
>
> I would like to get this work started asap; this is really the unknown quantity and we tend to be slow to converge on these things.
>
> Split packages in API/SPI/private packages
> Hibernate 4 is the ideal time to properly split stuff into API, SPI, private. Moving classes to private packages is the least impacting move for users as these should not be used. The API / SPI split is sometimes difficult to do so if you have a doubt in an area, ask on the ML or on IRC and we can discuss it together. If you need an example, check out the query engine. It is relatively clean now.
>
> We might have to break a few user APIs which is fine but I don't expect too many will be necessary:
>  - make sure to discuss it when you plan to do one
>  - list them in the migration guide
>
> I'd say that the package splitting should be done when you have a change and when you work in a specific area. It's more a background task.
>
> Be compliant with Core 4
> We can do this one a bit later in the cycle to give time for core to mature.
>
> Split dependency between Hibernate Search and Hibernate Core
> I think in practice we are not too far. This work should be done in parallel to the package splitting. If you look at the query engine, we do have specific hibernate packages. We also have a HibernateHelper class for all low-level Hibernate contracts like unproxying, initializing etc. We should use that class everywhere instead of relying on the direct Hibernate Core contracts. That will help us turn this class into an implementable contract.
> The next step potentially is to actually move Hibernate Core specific code into a separate package.
>
> I don't have much of an opinion on this but we should definitely discuss it.
>
> Use JBoss Logging
> I tend to think we should do this migration late in the game. WDYT?
>
> New features
> Do you want any new feature per se? I think this would be a great time to get the community involved to back new features and fix bugs while we do the grunt work for 4. So if you know some motivated but shy people, or if you are one of them, stand up :)
>
> Note: I have created a rough copy of this email in http://community.jboss.org/wiki/PlansforHibernateSearch4
> We can discuss via email but be sure to add the feedback or list of todos in the wiki as well for posterity.
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>



