[hibernate-dev] difficulties in upgrading to Lucene 3

Sanne Grinovero sanne.grinovero at gmail.com
Sun Jan 10 14:09:42 EST 2010


Hello all,
I've been thinking about the strategy for upgrading to Lucene 3.
Ignoring new features for the moment, the main migration issues are:

1- Store.COMPRESS is no longer supported
2- Some Analyzers and the QueryParser require an additional
constructor parameter
3- The DocIdSet interface (used in filters) changed, and it already
changed in Lucene 2.9.x, which makes a step-by-step migration harder

While point 2 is not a great problem (I have a patch ready),
points 1 and 3 are connected: DocIdSet must be solved as soon as we
move to 2.9, while 2.9 is a requirement for implementing COMPRESS in a
different way, if we choose to do so.
It appears we can't maintain binary index compatibility, but
supporting the feature is still an option.
Lucene 3 will transparently decompress an old-style compressed field
when reading it and it will even decompress all fields during
optimization, effectively transforming the index to the new format.

If we still want to support the contract of
org.hibernate.search.annotations.Store.COMPRESS we will have to
compress the field ourselves, possibly using a pluggable strategy.
Assuming org.apache.lucene.document.CompressionTools as the default
implementation, we can provide a backwards-compatible API, but
the resulting index is going to have a different format.

A future improvement could be to allow any external
compression/decompression function (a user-provided implementation);
any idea where to plug it in? Maybe replace the Store enum with an
interface?
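
To make the "pluggable strategy" idea concrete, here is a minimal
sketch. FieldCompressor and DeflateFieldCompressor are hypothetical
names, not existing Hibernate Search or Lucene types. The default
implementation uses java.util.zip directly as a stand-in so that the
sketch runs without Lucene on the classpath; a real default would
presumably delegate to CompressionTools instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical strategy interface: not part of Hibernate Search or Lucene.
interface FieldCompressor {
    byte[] compress(String value);
    String decompress(byte[] stored);
}

// Assumed default strategy, built on java.util.zip as a stand-in
// for org.apache.lucene.document.CompressionTools.
final class DeflateFieldCompressor implements FieldCompressor {

    public byte[] compress(String value) {
        byte[] input = value.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public String decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            ByteArrayOutputStream out = new ByteArrayOutputStream(stored.length * 2);
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or invalid compressed field");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            // Surface corrupt stored values as an unchecked exception.
            throw new IllegalStateException("cannot decompress stored field", e);
        } finally {
            inflater.end();
        }
    }
}
```

With something like this, the compressed bytes would be stored as an
ordinary binary field and decompressed when the value is read back, so
the choice of algorithm stays on our side rather than in the index
format.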

What should I do to solve HSEARCH-425?
The options I've considered so far:

A) Deprecate the Store.COMPRESS, without providing an alternative

B) Change implementation to make use of Lucene's CompressionTools

CompressionTools has only existed since Lucene 2.9, so an upgrade is
mandatory, but other features are going to break, such as filters
(org.hibernate.search.filter.AndDocIdSet needs to implement an updated
interface).
So basically I'll need a branch, or to break some tests temporarily,
or to provide a single huge patch refactoring some features and tests
at the same time :-/
An alternative to branching would be to solve the COMPRESS issue
later and focus on the build-breaking changes first; in practice this
would break the compression feature until it's fixed, but this
shouldn't be a great problem as it is going to change anyway...

WDYT?

I'm working on the new DocIdSet; even that will be a considerable change.
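
For reference, the kind of logic an AND-ing DocIdSet needs under the
2.9-style iterator contract (nextDoc()/advance(target), with a
sentinel for exhaustion) can be sketched as below. This is not the
actual AndDocIdSet code: DocIterator and SortedArrayIterator are
hypothetical stand-ins that only mirror the shape of Lucene's
DocIdSetIterator, so that the sketch runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

final class AndDocIdSketch {
    // Same sentinel value Lucene's DocIdSetIterator uses for "exhausted".
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical minimal iterator, mirroring the nextDoc()/advance() style.
    interface DocIterator {
        int nextDoc();            // move to the next doc id
        int advance(int target);  // move to the first doc id >= target
    }

    // Iterates over a strictly increasing array of doc ids.
    static final class SortedArrayIterator implements DocIterator {
        private final int[] docs;
        private int pos = -1;

        SortedArrayIterator(int... docs) { this.docs = docs; }

        public int nextDoc() {
            if (pos < docs.length) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            int doc;
            do { doc = nextDoc(); } while (doc < target);
            return doc;
        }
    }

    // Returns the doc ids present in every iterator (logical AND).
    static List<Integer> intersect(DocIterator... iterators) {
        List<Integer> result = new ArrayList<Integer>();
        int n = iterators.length;
        if (n == 0) return result;
        int[] current = new int[n];
        for (int i = 0; i < n; i++) current[i] = iterators[i].nextDoc();
        while (true) {
            int max = current[0];
            for (int i = 1; i < n; i++) max = Math.max(max, current[i]);
            if (max == NO_MORE_DOCS) return result; // some iterator is exhausted
            boolean aligned = true;
            for (int i = 0; i < n; i++) {
                if (current[i] < max) {             // lagging: leap forward
                    current[i] = iterators[i].advance(max);
                    aligned = false;
                }
            }
            if (aligned) {                          // all agree on this doc id
                result.add(max);
                for (int i = 0; i < n; i++) current[i] = iterators[i].nextDoc();
            }
        }
    }
}
```

The point of advance(target) is that lagging iterators can skip
straight to the current candidate instead of stepping through every
document one at a time.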

Cheers,
Sanne


