[hibernate-dev] [Hibernate Search] DocValues and Sorting API -> new mapping annotations ?

Wed Aug 5 05:41:30 EDT 2015

Hi,

I think a great solution for that would be to leverage "annotation
composition" as done e.g. in CDI and Bean Validation.

There would be an annotation @DocValuesField which causes the creation
of a doc values field and exposes the attributes required for its
configuration:

    @DocValuesField(name="foo", ... )
    private String bar;

Besides using @DocValuesField itself directly, it would be usable as a
meta-annotation on "doc value annotation types":

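    @DocValuesField
    public @interface SortField {

        // Overrides the "name" attribute of the composed @DocValuesField;
        // the default lets @SortField be used without arguments.
        @OverridesAttribute(name="name")
        String name() default "";
    }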

And then its usage:

    @SortField
    private String bar;

Of course such doc value annotation types (I think @SortField, @Facet and
@DocumentId could be modelled that way) would only expose those degrees of
freedom needed for a specific use case.

Most users would only use these more abstract, easy-to-use specific purpose
annotations. But others could use @DocValuesField directly to create custom
doc values or they could even create their own, domain-specific doc value
annotation type.
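
To make the composition idea a bit more concrete, here is a minimal sketch
of how an engine could resolve such annotations via plain reflection. All
of it is hypothetical: neither @DocValuesField nor @SortField exists in
Hibernate Search today, the Book class and the resolver method names are
made up for illustration, and the "name" override is done by a simple
naming convention rather than a real @OverridesAttribute mechanism.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical sketch: none of these types are part of Hibernate Search.
class DocValuesCompositionSketch {

    // Base annotation; usable directly on a property or as a meta-annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.FIELD, ElementType.ANNOTATION_TYPE })
    @interface DocValuesField {
        String name() default "";
    }

    // Special-purpose annotation composed from @DocValuesField.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @DocValuesField
    @interface SortField {
        String name() default "";
    }

    // Example entity, made up for this sketch.
    static class Book {
        @DocValuesField(name = "foo")
        String direct;

        @SortField
        String composed;

        @SortField(name = "title_sort")
        String title;

        String unmapped;
    }

    // Returns the doc values field name for a property, or null if unmapped.
    static String resolveDocValuesName(Field field) throws ReflectiveOperationException {
        DocValuesField direct = field.getAnnotation(DocValuesField.class);
        if (direct != null) {
            return direct.name().isEmpty() ? field.getName() : direct.name();
        }
        for (Annotation candidate : field.getAnnotations()) {
            Class<? extends Annotation> type = candidate.annotationType();
            DocValuesField meta = type.getAnnotation(DocValuesField.class);
            if (meta == null) {
                continue; // not a composed doc values annotation
            }
            String name = meta.name();
            // Convention used here: a String attribute called "name" on the
            // composed annotation overrides the meta-annotation's "name".
            for (Method attribute : type.getDeclaredMethods()) {
                if (attribute.getName().equals("name")
                        && attribute.getReturnType() == String.class) {
                    String override = (String) attribute.invoke(candidate);
                    if (!override.isEmpty()) {
                        name = override;
                    }
                }
            }
            return name.isEmpty() ? field.getName() : name;
        }
        return null;
    }

    // Convenience wrapper that hides the checked reflection exceptions.
    static String docValuesNameOf(Class<?> owner, String property) {
        try {
            return resolveDocValuesName(owner.getDeclaredField(property));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String property : new String[] { "direct", "composed", "title", "unmapped" }) {
            System.out.println(property + " -> " + docValuesNameOf(Book.class, property));
        }
    }
}
```

The point of the sketch is only that the engine needs to understand a
single base annotation; every special-purpose annotation is discovered
through it, so users could add their own without changes to the engine.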

Cheers,

--Gunnar

2015-08-04 18:00 GMT+02:00 Sanne Grinovero <sanne at hibernate.org>:

> Hi Guillaume,
> thanks! great input. Some comments inline:
>
> On 4 August 2015 at 15:11, Guillaume Smet <guillaume.smet at gmail.com>
> wrote:
> > Hi Sanne,
> >
> > On Wed, Jul 29, 2015 at 1:26 PM, Sanne Grinovero <sanne at hibernate.org>
> > wrote:
> >>
> >> I'm not sure if this should be extending the @Field annotation as
> >> there are special restrictions implied in terms of analysis: are we
> >> going to enforce a specific type of tokenizer, or simply take the
> >> analysis option away?
> >
> >
> > You can't take the analysis option away: it's often used to normalize
> > sorting on strings (lowercase, remove accents, remove special characters
> > and so on).
>
> Right, we made this same example in a recent meeting we had on this same
> subject.
> So that's what makes it tricky: we want to allow Analysis, but while
> Lucene needs a strong guarantee that the analyzed value will be a single,
> unique token, we can't really verify that unless we take away the
> liberty to use any analyzer.
> An alternative would be to wrap the Analyzer to monitor and verify it
> to be "well-behaved" but I'm not sure if that's doable, or if the
> performance impact would be negligible. I guess we'll just leave it in
> the user's hands to make a sensible choice... not that we've done better
> so far on this aspect.
>
> > FWIW, we use specific fields for sorting each time we need to sort on a
> > string as we don't want to tokenize the string (but not for numerics and
> > dates). Maybe @SortFields/@SortField annotations would be in order (I
> > don't like Sortable as I don't think it's a good idea to use these fields
> > for search).
>
> I like that name proposal, and +1 to not encouraging people to try to
> reuse the same field for sorting and indexing.
>
> The next action for us is to verify the performance impact of the
> current approach, which is based on the UninvertingReader from
> lucene-misc. Gunnar pointed out that uninverting and loading into a
> FieldCache is not very different from what Lucene has been doing so
> far, so that might be a good strategy to allow migrating to Lucene 5
> incrementally, and provide an incremental improvement in this area
> rather than requiring the new mapping.
>
> I'll soon merge this approach, and as usual I'm lacking real-world
> applications to benchmark, so if you're interested in helping with that,
> it would be awesome; we just need to know that the new code won't be
> significantly slower than the Lucene 4 based strategies for sorted
> queries.
>
> Thanks,
> Sanne
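
As an illustration of the normalization use case quoted above (lowercase,
remove accents, remove special characters), here is a small JDK-only
sketch which turns a string into a single sort key. It is not Lucene
code; the class and method names are made up, and the exact rules
(Unicode decomposition, Locale.ROOT lowercasing, collapsing everything
that is not a letter or digit into one space) are assumptions chosen for
the example. In Lucene terms this roughly corresponds to an analyzer
which keeps the whole input as one token and only applies filters to it.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class SortKeyNormalizationSketch {

    // Produces exactly one normalized value per input string.
    static String sortKey(String value) {
        // Decompose accented characters, then drop the combining marks.
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        String withoutAccents = decomposed.replaceAll("\\p{M}+", "");
        String lowerCased = withoutAccents.toLowerCase(Locale.ROOT);
        // Collapse runs of non-letter, non-digit characters into one space.
        return lowerCased.replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    // Sorts a copy of the values by their normalized key.
    static List<String> sortByKey(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(SortKeyNormalizationSketch::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        // \u00E9 is e-acute, \u00C9 is E-acute.
        List<String> titles = Arrays.asList("Zola", "\u00E9tude", "\u00C9clair", "apple!");
        for (String title : sortByKey(titles)) {
            System.out.println(sortKey(title) + " <- " + title);
        }
    }
}
```

Since the key is always one value per input, it keeps the "single value
per document" property that sorting needs, which a tokenizing analyzer
would not guarantee.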