Hello again,
it got better: I have a simple patch ready to migrate to 2.9.1, with all
tests green; fixing the new DocIdSet was easier than I initially thought.
Shall I apply it?
We can then think about the compression issue and after that move to 3.0.x.
I've moved HSEARCH-424 "Update to Lucene 3.0" from beta2 to beta3,
as beta3 already contains things like HSEARCH-415 "Consider moving
to Lucene 2.9"
(it didn't make sense the other way around).
Cheers,
Sanne
2010/1/10 Sanne Grinovero <sanne.grinovero(a)gmail.com>:
Hello all,
I've been thinking about the strategy to upgrade to Lucene 3.
Ignoring new features for the moment, the main migration issues are:
1- Store.COMPRESS not supported anymore
2- Some Analyzers and the QueryParser require an additional
constructor parameter
3- DocIdSet interface (used in filters) changed - changed even in
Lucene 2.9.x, making step-by-step migration harder
Point 2 is not a big problem (I have a patch ready), but
points 1 and 3 are connected: the DocIdSet change must be dealt with as
soon as we move to 2.9, while 2.9 is a requirement for implementing
COMPRESS in a different way, if we choose to do so.
It appears we can't maintain binary index compatibility, but
supporting the feature is an option.
Lucene 3 will transparently decompress an old-style compressed field
when reading it and it will even decompress all fields during
optimization, effectively transforming the index to the new format.
If we still want to support the contract of
org.hibernate.search.annotations.Store.COMPRESS we will have to
compress the field ourselves, possibly using a pluggable strategy;
assuming org.apache.lucene.document.CompressionTools as the
default implementation, we can provide a backwards-compatible API, but
the resulting index is going to have a different format.
A future improvement could be to use any external
compression/decompression function (user provided implementation), any
idea where? Maybe replace the Store enum with an interface?
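
To make the pluggable strategy idea a bit more concrete, here is a
minimal sketch. The names FieldCompressor and DeflateFieldCompressor are
hypothetical, not existing Hibernate Search API, and java.util.zip is
used only as a stand-in for CompressionTools so that the example runs
without Lucene on the classpath:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical extension point: users could plug in their own implementation. */
interface FieldCompressor {
    byte[] compress(String value);
    String decompress(byte[] stored);
}

/** Assumed default strategy, based on the JDK's deflate implementation. */
final class DeflateFieldCompressor implements FieldCompressor {

    @Override
    public byte[] compress(String value) {
        byte[] input = value.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    @Override
    public String decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not deflate-compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

With something like this, Store.COMPRESS could keep its current meaning
and delegate to whatever FieldCompressor is configured; the compressed
bytes would then be stored as a binary field.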
What should I do to solve HSEARCH-425?
The options I've considered so far:
A) Deprecate Store.COMPRESS, without providing an alternative
B) Change the implementation to make use of Lucene's CompressionTools
CompressionTools has only existed since Lucene 2.9, so an upgrade is
mandatory, but other features are going to break, like filters
(org.hibernate.search.filter.AndDocIdSet needs to implement an updated
interface)
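
For reference, a rough sketch of what an AND over several doc id
iterators looks like with a nextDoc()/advance() style contract, which is
how I understand the 2.9 iterator to work. DocIterator,
SortedArrayIterator and AndIterator are local stand-ins written for this
mail, not Lucene or Hibernate Search classes, and this is not the actual
AndDocIdSet code:

```java
/** Local stand-in for a nextDoc()/advance() style iterator; not a Lucene type. */
abstract class DocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    abstract int docID();              // -1 before the first call to nextDoc()
    abstract int nextDoc();            // moves to the next doc, or NO_MORE_DOCS
    abstract int advance(int target);  // moves to the first doc >= target
}

/** Iterates over a sorted array of doc ids (ascending, no duplicates). */
final class SortedArrayIterator extends DocIterator {
    private final int[] docs;
    private int pos = -1;

    SortedArrayIterator(int... docs) { this.docs = docs; }

    @Override int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    @Override int nextDoc() {
        if (pos < docs.length) pos++;
        return docID();
    }

    @Override int advance(int target) {
        int doc;
        do { doc = nextDoc(); } while (doc < target);
        return doc;
    }
}

/** Intersection of several iterators; assumes at least one is given. */
final class AndIterator extends DocIterator {
    private final DocIterator[] parts;
    private int current = -1;

    AndIterator(DocIterator... parts) { this.parts = parts; }

    @Override int docID() { return current; }

    @Override int nextDoc() {
        current = align(parts[0].nextDoc());
        return current;
    }

    @Override int advance(int target) {
        current = align(parts[0].advance(target));
        return current;
    }

    /** Pushes every iterator up to the candidate until they all agree on it. */
    private int align(int candidate) {
        boolean agreed = false;
        while (!agreed && candidate != NO_MORE_DOCS) {
            agreed = true;
            for (DocIterator it : parts) {
                int doc = it.docID();
                if (doc < candidate) doc = it.advance(candidate);
                if (doc > candidate) {   // overshoot: retry with the larger doc id
                    candidate = doc;
                    agreed = false;
                    break;
                }
            }
        }
        return candidate;
    }
}
```

Calling nextDoc() in a loop until it returns NO_MORE_DOCS yields only
the doc ids present in every underlying iterator.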
So basically I'll need to create a branch, break some tests temporarily,
or provide a single huge patch refactoring some features and tests at
the same time :-/
An alternative to branching would be to solve the COMPRESS issue
later, and focus on the build-breaking changes first; in practice this
would break the compression feature until it's fixed, but this
shouldn't be a big problem as it is going to change anyway...
WDYT?
I'm working on the new DocIdSet; that too will be a considerable change.
Cheers,
Sanne