[hibernate-dev] moving Hibernate Search to Lucene 2.9
Sanne Grinovero
sanne.grinovero at gmail.com
Mon Dec 14 07:59:08 EST 2009
inline answers:
2009/12/14 Emmanuel Bernard <emmanuel at hibernate.org>:
>
> On 14 Dec 2009, at 11:15, Sanne Grinovero wrote:
>
>> Even though I'm ok with breaking index compatibility,
>> it looks like option "2" will not break compatibility per se
>> (other features might), as they state the same compression algorithm
>> is being used; they just don't apply it automatically anymore (and
>> Store.Compress isn't defined in 3.0)
>
> So today (2.4) they use a binary format to store the compressed data?
Exactly.
>
>> Actually option "1" (dropping support for Compression) would break it,
>> if you have compressed fields.
>
> Right, unless we simulate exactly what they did before.
> What's the rationale for dropping the support on their side?
>
It was dropped because Lucene was doing something behind the scenes
that users can do themselves.
Letting the user control the compression strategy explicitly makes it
possible to pick a specialized algorithm best suited to the kind of
data. Previously java.util.zip.Deflater was always used, which is not
always the optimal choice. Also, the IndexWriter added the same field
multiple times (once indexed, once stored compressed), which surprised
users who were not aware of the implementation.
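
To make "the user could do it himself" concrete, here is a minimal
sketch of application-side compression using only java.util.zip. The
class name FieldCompression and its methods are made up for this mail,
they are not Lucene or Hibernate Search API; the idea is just that the
resulting byte[] would be stored as a plain binary stored field, and
that the Deflater could be swapped for any other algorithm.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only (not part of Lucene or
// Hibernate Search): compresses a field value before it is stored.
public final class FieldCompression {

    private FieldCompression() {}

    // Compress with the given Deflater level, e.g. Deflater.BEST_COMPRESSION.
    public static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Inverse of compress(); rejects truncated or corrupt input.
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

If I read the 2.9 javadocs correctly, Lucene now ships a similar
utility (CompressionTools) for people who want the old behaviour, so
we would not necessarily need our own helper.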
>>
>> The number/date indexing and the new range queries will force us to
>> drop backwards compatibility,
>> unless we implement those later on, keeping the old broken strategy.
>
> You lost me, what is that?
You'll find a better explanation in the papers in your dropbox. In
short: a date is split into multiple fields, like year-month-day (it
actually does better than that, I'm keeping it simple), so the number
of terms is reduced. That keeps a RangeQuery from blowing up with a
TooManyClauses exception, and it also appears to perform better. In
this case one field is actually indexed as several fields. The new
RangeQuery is smart enough to rewrite itself into a correct boolean
query over the right fields.
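
To illustrate the simplified year-month-day picture, here is a small
sketch. It is not Lucene's implementation (the real one works on
numeric prefixes, not calendar units), and the class DateTerms and the
"y:", "m:", "d:" term prefixes are invented for this mail. Each date is
indexed at three precisions, and a range is covered with the coarsest
terms that fit entirely inside it.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a calendar-based stand-in for multi-precision
// indexing. Names and term formats are made up for this example.
public final class DateTerms {

    private DateTerms() {}

    // Index side: one date produces one term per precision level.
    public static List<String> termsFor(LocalDate date) {
        List<String> terms = new ArrayList<>();
        terms.add("y:" + date.getYear());
        terms.add("m:" + YearMonth.from(date));
        terms.add("d:" + date);
        return terms;
    }

    // Query side: cover the inclusive range [from, to] greedily, using a
    // whole year or whole month whenever it lies fully inside the range.
    public static List<String> cover(LocalDate from, LocalDate to) {
        List<String> terms = new ArrayList<>();
        LocalDate cur = from;
        while (!cur.isAfter(to)) {
            if (cur.getDayOfYear() == 1
                    && !cur.plusYears(1).minusDays(1).isAfter(to)) {
                terms.add("y:" + cur.getYear());
                cur = cur.plusYears(1);
            } else if (cur.getDayOfMonth() == 1
                    && !cur.plusMonths(1).minusDays(1).isAfter(to)) {
                terms.add("m:" + YearMonth.from(cur));
                cur = cur.plusMonths(1);
            } else {
                terms.add("d:" + cur);
                cur = cur.plusDays(1);
            }
        }
        return terms;
    }

    // A document matches when any of its terms is one of the covering terms,
    // which is what an OR-only boolean query over those terms would do.
    public static boolean matches(LocalDate date, List<String> cover) {
        for (String term : termsFor(date)) {
            if (cover.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

With this scheme the range 2009-12-30 to 2011-02-02 needs 6 clauses
(two days, the year 2010, January 2011, two days) where one term per
day would need 400, which is the kind of reduction that keeps the
boolean query under the clause limit.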
As there are already many changes in 3.2, I'm fine with postponing
these improvements, even if solving the RangeQuery problem is very
cool. That's why I was referring to "the old broken strategy": it's
not really broken, but the new one spares some trouble, as you can
forget about the TooManyClauses issue (not sure if that is a good
thing).
WDYT?