[infinispan-dev] Lucene 5 is coming: pitfalls to consider

Tue Jul 28 08:09:50 EDT 2015

On 28/07/2015 13:13, Sanne Grinovero wrote:
> # Sorting
> Sorting on a field will require wrapping the cached IndexReaders in
> an UninvertingReader, and the uninverting process is very inefficient.
> On top of that, the result of the uninverting process is not
> cacheable, so it will have to be repeated on each index for every
> query executed.
> In short, I expect performance of sorted queries to be quite degraded
> in our first milestone using Lucene 5, and we'll have to discuss how
> to fix this.
> Needless to say, fixing this is a blocking requirement before we can
> consider the migration complete.
>
> Sorting will not need an UninvertingReader if the target field has
> been indexed as DocValues, but that implies:
>   - we'll need an explicit, upfront (indexing-time) flag to be set
>   - we'll need to detect whether the matching indexing options are
> compatible with the runtime query, so the uninverting process can be skipped
>
> This is mostly a job for Hibernate Search, but in terms of user
> experience it means you have to mark fields for "sortability"
> explicitly; will we need to extend the protobuf schema?
>
> Please make sure we'll only have to hook into existing metadata; we
> can't fix this after the API freeze.
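To make the cost difference concrete, here is a toy model in plain Java. It is not Lucene code: the in-memory postings map, the `uninvert` and `sortByColumn` helpers and the sample data are all hypothetical, and it assumes a single-valued field. Uninverting has to walk every term and every posting of the field to rebuild a per-document value column, whereas a DocValues-style column already exists because it was written at indexing time. Both paths then feed the same sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UninvertSketch {

    // Toy inverted index for one field: term -> ids of the documents containing it.
    // Rebuilds a docId -> value column by visiting every posting of the field.
    // Assumes a single-valued field; with several terms per doc the last one wins.
    static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc]; // docs without the field stay null
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                valueByDoc[doc] = e.getKey();
            }
        }
        return valueByDoc;
    }

    // Sorts matching doc ids by a per-document column, however that column was built.
    // Docs with no value sort last; List.sort is stable, so ties keep doc id order.
    static List<Integer> sortByColumn(List<Integer> hits, String[] column) {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Integer doc) -> column[doc],
                Comparator.nullsLast(Comparator.naturalOrder())));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        postings.put("carol", Arrays.asList(0, 3));
        postings.put("alice", Arrays.asList(2));
        postings.put("bob", Arrays.asList(1));

        // Path 1: rebuild the column at query time (the uninverting cost).
        String[] uninverted = uninvert(postings, 4);
        // Path 2: a DocValues-style column, written once at indexing time.
        String[] docValues = {"carol", "bob", "alice", "carol"};

        List<Integer> hits = Arrays.asList(0, 1, 2, 3);
        System.out.println(sortByColumn(hits, uninverted)); // [2, 1, 0, 3]
        System.out.println(sortByColumn(hits, docValues));  // [2, 1, 0, 3]
    }
}
```

In the scenario described above, the `uninvert` step is the part that would be repeated per index and per query; with DocValues only the final sort remains.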

This is very important to get right. As a user, I'd honestly expect an 
indexed field to also be usable for sorting, so we should probably have a 
per-index flag (off by default) which implicitly enables DocValues for 
all indexed fields. Does uninverting offer any performance advantage 
over the sorting we already do? Sorting wouldn't help much in the 
clustered query scenario anyway, where you'd have to merge the results 
from multiple nodes (I guess there is some win in pre-sorting on each 
node and then splicing the sorted sub-resultsets).
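The pre-sort-and-splice idea can be sketched as a k-way merge. This is a minimal sketch, assuming each node returns its hits already sorted ascending by the sort key; the keys are plain ints here, and `mergeTopN` is a hypothetical helper, not an existing Infinispan or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class SortedMergeSketch {

    // Merges per-node result lists, each already sorted ascending, into the
    // global first n results without re-sorting the combined set.
    static List<Integer> mergeTopN(List<List<Integer>> perNode, int n) {
        // Heap entries are {key, nodeIndex, positionInThatNodesList}, ordered by key.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int node = 0; node < perNode.size(); node++) {
            if (!perNode.get(node).isEmpty()) {
                heap.add(new int[] {perNode.get(node).get(0), node, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < n) {
            int[] head = heap.poll();            // smallest key across all nodes
            merged.add(head[0]);
            List<Integer> source = perNode.get(head[1]);
            int next = head[2] + 1;
            if (next < source.size()) {          // refill from the same node
                heap.add(new int[] {source.get(next), head[1], next});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> perNode = Arrays.asList(
                Arrays.asList(1, 4, 9),
                Arrays.asList(2, 3, 10),
                Arrays.asList(5));
        System.out.println(mergeTopN(perNode, 5)); // [1, 2, 3, 4, 5]
    }
}
```

Only one candidate per node sits in the heap at a time, so each emitted hit costs O(log k) for k nodes, and the merge can stop as soon as the requested page is filled.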

> # Index encoding
> As usual, the index encoding evolves, and the easy solution is to
> rebuild the index. Lucene 5 no longer ships with backward-compatible
> decoders, but these are available as separate dependencies. If you
> feel the need to be able to read existing indexes, we should include
> these.

Possibly make them optional. I guess our recommendation is to 
mass-reindex anyway.

Tristan

-- 
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat