[hibernate-dev] [Hibernate Search] DocValues and Sorting API -> new mapping annotations ?

Sanne Grinovero sanne at hibernate.org
Fri Aug 7 08:14:27 EDT 2015


A quick update on some more exploration on this:
it turns out that sorting on a NumericField which also uses an
"indexNullAs" token makes the UninvertingReader approach throw an
exception.
My two conclusions:
 - we need to move away from supporting those tokens in NumericField,
especially as a stricter schema is coming in the next generation of Lucene
 - yet another reason to clearly separate fields meant for searching vs sorting
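
To make the second point concrete, here is a toy model in plain Java of
what uninverting a numeric field into a per-document column amounts to.
It is not Lucene's UninvertingReader, and it assumes the failure comes
from the null marker being a non-numeric term stored in the same field,
which is only a guess at the cause. The class, the method and the
"_null_" marker are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT Lucene's UninvertingReader; the failure cause is an assumption.
final class NullTokenUninvertSketch {

    static final String NULL_TOKEN = "_null_"; // hypothetical indexNullAs marker

    /**
     * "Uninverts" one term per document into a numeric column.
     * A null entry means the document has no term for this field.
     */
    static long[] uninvertAsLong(List<String> termPerDoc, long missingValue) {
        long[] column = new long[termPerDoc.size()];
        for (int doc = 0; doc < column.length; doc++) {
            String term = termPerDoc.get(doc);
            // A marker token is not a number, so decoding it throws NumberFormatException.
            column[doc] = (term == null) ? missingValue : Long.parseLong(term);
        }
        return column;
    }

    public static void main(String[] args) {
        // Dedicated sort field: a document without a value carries no term at all.
        List<String> sortOnlyField = Arrays.asList("42", null, "7");
        System.out.println(Arrays.toString(uninvertAsLong(sortOnlyField, Long.MIN_VALUE)));

        // Search field reused for sorting: the null marker sits among the numbers.
        List<String> searchField = Arrays.asList("42", NULL_TOKEN, "7");
        try {
            uninvertAsLong(searchField, Long.MIN_VALUE);
        } catch (NumberFormatException e) {
            System.out.println("cannot uninvert: " + e.getMessage());
        }
    }
}
```

With a dedicated sort field, a document without a value simply has no
term, so a "missing value" default can be applied without any marker
token.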

-- Sanne

On 5 August 2015 at 10:41, Gunnar Morling <gunnar at hibernate.org> wrote:
> Hi,
>
> I think a great solution for that would be to leverage "annotation
> composition" as done e.g. in CDI and Bean Validation.
>
> There would be an annotation @DocValuesField which will cause the creation
> of a DocValues and which would expose attributes required for its
> configuration:
>
>     @DocValuesField(name="foo", ... )
>     private String bar;
>
> Besides using @DocValuesField itself directly, it would be usable as a
> meta-annotation on "doc value annotation types":
>
>     @DocValuesField
>     public @interface SortField {
>
>         @OverridesAttribute(name="name")
>         String name() default "";
>     }
>
> And then its usage:
>
>     @SortField
>     private String bar;
>
>
> Of course such doc value annotation types (I think @SortField, @Facet and
> @DocumentId could be modelled as that) would only expose those degrees of
> freedom needed for a specific use case.
>
> Most users would only use these more abstract, easy-to-use specific purpose
> annotations. But others could use @DocValuesField directly to create custom
> doc values or they could even create their own, domain-specific doc value
> annotation type.
>
> Cheers,
>
> --Gunnar
>
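
A minimal sketch of how such composition could be resolved by
reflection, in plain Java. Both @DocValuesField and @SortField are
hypothetical here (neither exists in Hibernate Search), as are the Book
class and the resolver method, and copying the same-named attribute is
only a stand-in for what @OverridesAttribute would do.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical sketch: none of these annotations exist in Hibernate Search.
final class DocValuesCompositionSketch {

    // Low-level annotation, usable directly or as a meta-annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.ANNOTATION_TYPE})
    @interface DocValuesField {
        String name() default "";
    }

    // Purpose-specific annotation composed from @DocValuesField.
    @DocValuesField
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SortField {
        String name() default "";
    }

    static class Book {
        @SortField(name = "title_sort")
        private String title;

        @DocValuesField
        private String isbn;

        private String summary; // no doc values requested
    }

    /** Returns the doc values field name for a property, or null if it has none. */
    static String resolveDocValuesName(Field property) {
        DocValuesField direct = property.getAnnotation(DocValuesField.class);
        if (direct != null) {
            return orPropertyName(direct.name(), property);
        }
        for (Annotation candidate : property.getAnnotations()) {
            Class<? extends Annotation> type = candidate.annotationType();
            if (type.isAnnotationPresent(DocValuesField.class)) {
                try {
                    // Stand-in for @OverridesAttribute: read the same-named attribute.
                    Object name = type.getMethod("name").invoke(candidate);
                    return orPropertyName((String) name, property);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot read name() on " + type, e);
                }
            }
        }
        return null;
    }

    // An empty name falls back to the Java property name.
    private static String orPropertyName(String name, Field property) {
        return name.isEmpty() ? property.getName() : name;
    }

    public static void main(String[] args) {
        for (Field property : Book.class.getDeclaredFields()) {
            System.out.println(property.getName() + " -> " + resolveDocValuesName(property));
        }
    }
}
```

The resolver only needs to know @DocValuesField; user-defined composed
annotations would be picked up without any extra registration.
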
>
>
>
> 2015-08-04 18:00 GMT+02:00 Sanne Grinovero <sanne at hibernate.org>:
>>
>> Hi Guillaume,
>> thanks! great input. Some comments inline:
>>
>> On 4 August 2015 at 15:11, Guillaume Smet <guillaume.smet at gmail.com>
>> wrote:
>> > Hi Sanne,
>> >
>> > On Wed, Jul 29, 2015 at 1:26 PM, Sanne Grinovero <sanne at hibernate.org>
>> > wrote:
>> >>
>> >> I'm not sure if this should be extending the @Field annotation as
>> >> there are special restrictions implied in terms of analysis: are we
>> >> going to enforce a specific type of tokenizer, or simply take the
>> >> analysis option away?
>> >
>> >
>> > You can't take the analysis option away: it's often used to normalize
>> > sorting on strings (lowercase, remove accents, remove special characters
>> > and so on).
>>
>> Right, we made this same example in a recent meeting we had on this
>> subject.
>> So that's what makes it tricky: we want to allow Analysis, but while
>> Lucene needs a strong guarantee that the analyzed field yields a single
>> token per document, we can't really verify that unless we take away the
>> liberty to use any analyzer.
>> An alternative would be to wrap the Analyzer to monitor and verify it
>> to be "well-behaved", but I'm not sure if that's doable, or if the
>> performance cost would be negligible. I guess we'll just leave it in the
>> user's hands to make a sensible choice... not that we've done better so
>> far on this aspect.
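
A rough sketch of what such a "well-behaved" check could look like, in
plain Java. The "analyzer" here is just a function from text to tokens,
not Lucene's Analyzer API, and all names are hypothetical; it only
illustrates that the check has to run on every indexed value, which is
where the performance question comes from.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Toy model: an "analyzer" is a plain function from text to tokens, NOT Lucene's Analyzer.
final class SingleTokenCheckSketch {

    /** Wraps an analysis function and fails fast unless it emits exactly one token. */
    static Function<String, String> sortSafe(Function<String, List<String>> analyzer) {
        return text -> {
            List<String> tokens = analyzer.apply(text); // runs for every indexed value
            if (tokens.size() != 1) {
                throw new IllegalStateException(
                        "Sort field needs exactly one token but got " + tokens.size() + " for: " + text);
            }
            return tokens.get(0);
        };
    }

    // Normalizes without splitting: suitable for a sort field.
    static List<String> lowercaseKeyword(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }

    // Splits on whitespace: fine for search, unsuitable for sorting.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(sortSafe(SingleTokenCheckSketch::lowercaseKeyword).apply("Hibernate Search"));
        try {
            sortSafe(SingleTokenCheckSketch::whitespaceTokens).apply("Hibernate Search");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```
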
>>
>> > FWIW, we use specific fields for sorting each time we need to sort on a
>> > string as we don't want to tokenize the string (but not for numerics and
>> > dates). Maybe @SortFields/@SortField annotations would be in order (I
>> > don't like Sortable as I don't think it's a good idea to use these fields
>> > for search).
>>
>> I like that name proposal, and +1 to not encouraging people to try to
>> reuse the same field for sorting and indexing.
>>
>> The next action for us is to verify what the performance impact is of
>> the current approach, which is based on the UninvertingReader from
>> lucene-misc. Gunnar pointed out that uninverting and loading into a
>> FieldCache is not very different from what Lucene has been doing so
>> far, so that might be a good strategy to allow migrating to Lucene 5
>> incrementally, and to provide an incremental improvement in this area
>> rather than requiring the new mapping.
>>
>> I'll soon merge this approach. As usual I lack real-world
>> applications to benchmark, so if you're interested in helping with that
>> it would be awesome; we just need to know that the new code won't be
>> significantly slower than the Lucene 4 based strategies for sorted
>> queries.
>>
>> Thanks,
>> Sanne
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>
>

