[hibernate-dev] difficulties in upgrading to Lucene 3
Emmanuel Bernard
emmanuel at hibernate.org
Mon Jan 11 11:49:06 EST 2010
Right, go forward and temporarily break COMPRESS.
Option B (i.e. use CompressionTools) seems better than getting rid of it altogether.
I think we should offer a pluggable compression method, but I would not mind it being global and thus not requiring a change to the enum.
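
To make the "global pluggable compression" idea concrete, here is a minimal sketch. Every name in it (CompressionSketch, FieldCompressor, DeflateCompressor) is hypothetical and not part of Hibernate Search or Lucene. To stay self-contained it uses java.util.zip as a stand-in for the default strategy rather than calling Lucene's CompressionTools.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: none of these types exist in Hibernate Search. */
public class CompressionSketch {

    /** Hypothetical pluggable strategy; a single instance would be configured globally. */
    public interface FieldCompressor {
        byte[] compress(byte[] raw);
        byte[] decompress(byte[] compressed);
    }

    /** Assumed default strategy, built on java.util.zip only. */
    public static final class DeflateCompressor implements FieldCompressor {

        @Override
        public byte[] compress(byte[] raw) {
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
            try {
                deflater.setInput(raw);
                deflater.finish();
                ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
                byte[] buffer = new byte[1024];
                // Drain the deflater until it reports that all input is consumed.
                while (!deflater.finished()) {
                    int n = deflater.deflate(buffer);
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            } finally {
                deflater.end(); // release native resources
            }
        }

        @Override
        public byte[] decompress(byte[] compressed) {
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(compressed);
                ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 2);
                byte[] buffer = new byte[1024];
                while (!inflater.finished()) {
                    int n = inflater.inflate(buffer);
                    // No progress and no more input means the data is truncated.
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalArgumentException("truncated or unsupported compressed data");
                    }
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("not valid deflate data", e);
            } finally {
                inflater.end();
            }
        }
    }

    public static void main(String[] args) {
        FieldCompressor compressor = new DeflateCompressor();
        byte[] raw = "some stored field value, some stored field value".getBytes(StandardCharsets.UTF_8);
        byte[] restored = compressor.decompress(compressor.compress(raw));
        System.out.println("round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

With something of this shape, Store.COMPRESS could keep its meaning in the enum while the actual algorithm is chosen once, globally, by swapping the FieldCompressor implementation.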
On 11 Jan 2010, at 00:39, Sanne Grinovero wrote:
> Hello again,
> it got better: I have a simple patch ready to migrate to 2.9.1, with all tests green;
> it was easier than I initially thought to fix the new DocIdSet.
> Shall I apply it?
> We can then think about the compression issue and after that move to 3.0.x.
>
> I've moved HSEARCH-424 "Update to Lucene 3.0" from beta2 to beta3,
> as beta3 already contains issues like HSEARCH-415 "Consider moving
> to Lucene 2.9"
> (it didn't make sense the other way around).
>
> Cheers,
> Sanne
>
> 2010/1/10 Sanne Grinovero <sanne.grinovero at gmail.com>:
>> Hello all,
>> I've been thinking about the strategy to upgrade to Lucene 3;
>> Ignoring new features at the moment, the main issues in migration:
>>
>> 1- Store.COMPRESS not supported anymore
>> 2- Some Analyzers and the QueryParser require an additional
>> constructor parameter
>> 3- DocIdSet interface (used in filters) changed - changed even in
>> Lucene 2.9.x, making step-by-step migration harder
>>
>> While point 2 is not a great problem (I have a patch ready),
>> points 1 and 3 are connected: DocIdSet must be solved as soon as we
>> move to 2.9, while 2.9 is a requirement to implement COMPRESS in a
>> different way, if we choose to.
>> It appears we can't maintain binary index compatibility, but
>> supporting the feature is an option.
>> Lucene 3 will transparently decompress an old-style compressed field
>> when reading it and it will even decompress all fields during
>> optimization, effectively transforming the index to the new format.
>>
>> If we still want to support the contract of
>> org.hibernate.search.annotations.Store.COMPRESS we will have to
>> compress the field ourselves, possibly using a pluggable strategy;
>> assuming the use of org.apache.lucene.document.CompressionTools as the
>> default implementation, we can provide a backwards-compatible API, but
>> the resulting index is going to have a different format.
>>
>> A future improvement could be to use any external
>> compression/decompression function (user provided implementation), any
>> idea where? Maybe replace the Store enum with an interface?
>>
>> What should I do to solve HSEARCH-425 ?
>> The options I've considered so far:
>>
>> A) Deprecate the Store.COMPRESS, without providing an alternative
>>
>> B) Change implementation to make use of Lucene's CompressionTools
>>
>> CompressionTools has only existed since Lucene 2.9, so an upgrade is
>> mandatory, but other features are going to break, like filters
>> (org.hibernate.search.filter.AndDocIdSet needs to implement an updated
>> interface).
>> So basically I'll need a branch, break some tests temporarily, or
>> provide a single huge patch refactoring some features and tests at
>> the same time :-/
>> An alternative to branching would be to solve the COMPRESS issue
>> later, and focus on the build-breaking changes first; in practice this
>> would break the compression feature until it's fixed, but this
>> shouldn't be a great problem as it is going to change anyway...
>>
>> WDYT?
>>
>> I'm working on the new DocIdSet; even that will be a considerable change.
>>
>> Cheers,
>> Sanne
>>