A quick update after some more exploration on this:
it turns out that sorting on a NumericField which also uses an
"indexNullAs" token causes the UninvertingReader approach to throw an
exception.
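The message above does not say why the exception is thrown. One plausible reading, stated here purely as an assumption, is that the null marker ends up as a non-numeric term in a field whose terms are all expected to decode as numbers. The sketch below is a plain-Java analogy of that situation, not Lucene code: `NullTokenSketch`, `uninvert` and the `_null_` marker are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class NullTokenSketch {
    // Hypothetical marker, standing in for whatever "indexNullAs" token is configured.
    static final String NULL_TOKEN = "_null_";

    // Rebuilds one numeric value per document from the stored terms, which is
    // roughly what a sort needs. Every term is assumed to be a number.
    static long[] uninvert(List<String> termPerDocument) {
        long[] values = new long[termPerDocument.size()];
        for (int doc = 0; doc < values.length; doc++) {
            // Throws NumberFormatException when the term is the null marker.
            values[doc] = Long.parseLong(termPerDocument.get(doc));
        }
        return values;
    }

    public static void main(String[] args) {
        // The second document had a null property, so the marker was stored instead.
        List<String> terms = Arrays.asList("42", NULL_TOKEN, "7");
        try {
            uninvert(terms);
        } catch (NumberFormatException e) {
            System.out.println("cannot build sort values: " + e.getMessage());
        }
    }
}
```

If this reading is right, the marker only works as long as nothing tries to decode the whole field as numbers, which is exactly what sorting does.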
My two conclusions:
- we need to move away from supporting those tokens in NumericField,
especially as a stricter schema is coming in the next generation of Lucene
- yet another reason to clearly separate fields meant for searching vs sorting
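To make the second conclusion concrete, here is a minimal plain-Java sketch of how one property value could feed two differently processed fields: several tokens for searching, and exactly one normalized key for sorting (lowercased, accents and special characters removed, as discussed further down the thread). It uses only the JDK; `SearchVsSortSketch`, `searchTokens` and `sortKey` are hypothetical names, and this is not how Hibernate Search or a Lucene Analyzer is implemented.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SearchVsSortSketch {
    // Search side: lowercase the value and split it into several tokens.
    static List<String> searchTokens(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Sort side: exactly one key per value, never split into tokens.
    static String sortKey(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}+", "")          // drop combining accent marks
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N} ]", "") // drop special characters
                .trim();
    }

    public static void main(String[] args) {
        String title = "\u00C9mile Zola: Germinal";
        System.out.println(searchTokens(title)); // three tokens
        System.out.println(sortKey(title));      // emile zola germinal
    }
}
```

The two outputs differ in both shape and content, which is why reusing a single field for both purposes tends to compromise one of them.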
-- Sanne
On 5 August 2015 at 10:41, Gunnar Morling <gunnar(a)hibernate.org> wrote:
Hi,
I think a great solution for that would be to leverage "annotation
composition" as done e.g. in CDI and Bean Validation.
There would be an annotation @DocValuesField which would cause the creation
of a doc values field and which would expose the attributes required for its
configuration:
@DocValuesField(name="foo", ... )
private String bar;
Besides using @DocValuesField itself directly, it would be usable as a
meta-annotation on "doc value annotation types":
@DocValuesField
public @interface SortField {
    @OverridesAttribute(name="name")
    String name();
}
And then its usage:
@SortField
private String bar;
Of course such doc value annotation types (I think @SortField, @Facet and
@DocumentId could be modelled as that) would only expose those degrees of
freedom needed for a specific use case.
Most users would only use these more abstract, easy-to-use, special-purpose
annotations. But others could use @DocValuesField directly to create custom
doc values, or they could even create their own domain-specific doc value
annotation types.
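As a rough illustration of how such composition could be resolved at bootstrap, here is a self-contained sketch using only JDK reflection. Everything in it is hypothetical: the two annotations are redeclared locally, `resolveName` is an invented helper, and reading the `name` attribute reflectively is merely a stand-in for what @OverridesAttribute would express. It is not the proposed Hibernate Search API.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class AnnotationCompositionSketch {
    // Hypothetical base annotation: usable on fields and as a meta-annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.ANNOTATION_TYPE})
    @interface DocValuesField {
        String name() default "";
    }

    // Hypothetical composed annotation exposing only what sorting needs.
    @DocValuesField
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SortField {
        String name() default "";
    }

    static class Book {
        @SortField(name = "title_sort")
        String title;

        @DocValuesField
        String isbn;

        String summary; // not mapped to doc values
    }

    // Returns the doc values name for a field, or null if the field is unmapped.
    static String resolveName(Field field) {
        DocValuesField direct = field.getAnnotation(DocValuesField.class);
        if (direct != null) {
            return direct.name().isEmpty() ? field.getName() : direct.name();
        }
        for (Annotation candidate : field.getAnnotations()) {
            Class<? extends Annotation> type = candidate.annotationType();
            if (type.isAnnotationPresent(DocValuesField.class)) {
                try {
                    // Stand-in for @OverridesAttribute(name="name").
                    String name = (String) type.getMethod("name").invoke(candidate);
                    return name.isEmpty() ? field.getName() : name;
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("composed annotation lacks a usable name()", e);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (Field field : Book.class.getDeclaredFields()) {
            System.out.println(field.getName() + " -> " + resolveName(field));
        }
    }
}
```

The point of the design is that the resolver only knows about @DocValuesField; a new special-purpose annotation needs no change in the resolver as long as it carries the meta-annotation.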
Cheers,
--Gunnar
2015-08-04 18:00 GMT+02:00 Sanne Grinovero <sanne(a)hibernate.org>:
>
> Hi Guillaume,
> thanks! great input. Some comments inline:
>
> On 4 August 2015 at 15:11, Guillaume Smet <guillaume.smet(a)gmail.com>
> wrote:
> > Hi Sanne,
> >
> > On Wed, Jul 29, 2015 at 1:26 PM, Sanne Grinovero <sanne(a)hibernate.org>
> > wrote:
> >>
> >> I'm not sure if this should be extending the @Field annotation as
> >> there are special restrictions implied in terms of analysis: are we
> >> going to enforce a specific type of tokenizer, or simply take the
> >> analysis option away?
> >
> >
> > You can't take the analysis option away: it's often used to normalize
> > sorting on strings (lowercase, remove accents, remove special characters
> > and so on).
>
> Right, we made this same example in a recent meeting we had on this
> subject.
> So that's what makes it tricky: we want to allow Analysis, but while
> Lucene needs a strong guarantee that it will be unique, we can't
> really verify that unless we take away the liberty to use any
> analyzer.
> An alternative would be to wrap the Analyzer to monitor and verify it
> to be "well-behaved", but I'm not sure if that's doable, or if the
> performance impact would be negligible. I guess we'll just leave it in
> the user's hands to make a sensible choice... not that we've done better
> so far on this aspect.
>
> > FWIW, we use specific fields for sorting each time we need to sort on a
> > string as we don't want to tokenize the string (but not for numerics and
> > dates). Maybe @SortFields/@SortField annotations would be in order (I
> > don't like Sortable as I don't think it's a good idea to use these fields
> > for search).
>
> I like that name proposal, and +1 to not encouraging people to try to
> reuse the same field for sorting and indexing.
>
> The next action for us is to verify what the performance impact is of
> the current approach, which is based on the UninvertingReader from
> lucene-misc. Gunnar pointed out that uninverting and loading into a
> FieldCache is not very different from what Lucene has been doing so
> far, so that might be a good strategy to allow migrating to Lucene 5
> incrementally, and provide an incremental improvement in this area
> rather than requiring the new mapping.
>
> I'll soon merge this approach, and as usual I'm lacking real-world
> applications to benchmark against, so if you're interested in helping
> with that, it would be awesome; we just need to know that the new code
> won't be significantly slower than the Lucene 4 based strategies for
> sorted queries.
>
> Thanks,
> Sanne
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev