On 28/07/2015 13:13, Sanne Grinovero wrote:
# Sorting
Sorting on a field will require an UninvertingReader to wrap the
cached IndexReaders, and the uninverting process is very inefficient.
On top of that, the result of the uninverting process is not
cacheable, so it will need to be repeated on each index, for every
query that is executed.
In short, I expect performance of sorted queries to be quite degraded
in our first milestone using Lucene 5, and we'll have to discuss how
to fix this.
Needless to say, fixing this is a blocking requirement before we can
consider the migration complete.
Sorting will not need an UninvertingReader if the target field has
been indexed as DocValues, but that implies:
- we'll need an explicit, upfront (indexing time) flag to be set
- we'll need to detect if the matching indexing options are
compatible with the runtime query to skip the uninverting process
This is mostly a job for Hibernate Search, but in terms of user
experience it means you have to mark fields for "sortability"
explicitly; will we need to extend the protobuf schema?
Please make sure we'll just have to hook into existing metadata; we
can't fix this after API freeze.
This is very important to get right. As a user, I'd honestly expect an
indexed field to also be usable for sorting, so we should probably have
a per-index flag (off by default) which implicitly enables DocValues for
all indexed fields. Does uninverting offer any performance advantage
over the sorting we already do? Sorting wouldn't help in the
clustered query scenario, where you'd have to merge the results from
multiple nodes anyway (I guess there is some win in pre-sorting on each
node and then splicing the sorted sub-resultsets).
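
To make the "pre-sort on each node, then splice" idea concrete: it amounts
to a k-way merge of already-sorted lists. Below is a minimal sketch in plain
Java, assuming every node returns its hits sorted by the same comparator.
The class and method names are hypothetical and are not part of any
Infinispan, Hibernate Search or Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: splices per-node result lists that are each
// already sorted according to the same ordering.
public class SortedResultMerger {

    // Tracks the read position within one node's sorted result list.
    private static final class Cursor<T> {
        final List<T> source;
        int index;

        Cursor(List<T> source) {
            this.source = source;
        }

        T current() {
            return source.get(index);
        }

        // Moves to the next element; returns false when the list is exhausted.
        boolean advance() {
            return ++index < source.size();
        }
    }

    public static <T> List<T> merge(List<List<T>> perNodeResults,
                                    Comparator<? super T> order) {
        // The heap is ordered by the element each cursor currently points at.
        Comparator<Cursor<T>> byHead =
                (a, b) -> order.compare(a.current(), b.current());
        PriorityQueue<Cursor<T>> heads = new PriorityQueue<>(byHead);

        int total = 0;
        for (List<T> results : perNodeResults) {
            total += results.size();
            if (!results.isEmpty()) {
                heads.add(new Cursor<>(results));
            }
        }

        List<T> merged = new ArrayList<>(total);
        while (!heads.isEmpty()) {
            // Take the smallest head, then re-queue its cursor if more remain.
            Cursor<T> next = heads.poll();
            merged.add(next.current());
            if (next.advance()) {
                heads.add(next);
            }
        }
        return merged;
    }
}
```

With k nodes and n hits in total this costs O(n log k) comparisons, whereas
re-sorting the concatenated results would cost O(n log n), which is where
the win from per-node pre-sorting would come from.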
# Index encoding
As usual the index encoding evolves and the easy solution is to
rebuild it. Lucene 5 no longer ships with backwards-compatible
decoders, but these are available as separate dependencies. If you
feel the need to be able to read existing indexes, we should include
these.
Possibly we could make them optional. I guess our recommendation is to
mass-reindex anyway.
Tristan
--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat