| Currently, we implement projections by adding collectors next to the TopDocsCollector. The problem with this strategy is that these collectors are then applied to all documents in the index, not just the competitive ones (those whose score is higher than that of the lowest document in the priority queue when they are visited). As we can see in MultiCollector, joining multiple collectors together disables the score-based optimizations that would otherwise allow skipping some documents along the way. As a result, the distance collector, for example, needs to store in memory as many results as the total number of documents in the index, which is ridiculous. We should switch to a two-phase approach:
- First phase: search.search() call that inspects all documents to collect the top docs and their score (TopDocsCollector), and if necessary the aggregations (FacetsCollector).
- Second phase: explicit collection that inspects only top docs to extract data from docvalues (DistanceCollector) or from storage (reader.document(...) using the StoredFieldVisitor: even if the javadoc advises against it, in this case it would be fine). Maybe we can use collectors, but a different abstraction would be fine, since we do not need to perform a search, but rather to inspect a pre-determined set of documents.
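The two phases above can be sketched in plain Java, with no Lucene dependency. This is only an illustration of the idea, not the proposed implementation: `ScoredDoc`, `DocValueSource`, `collectTopDocs` and `projectTopDocs` are hypothetical names, the `float[]` of scores stands in for the search itself, and the lambda stands in for a docvalues or stored-field lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TwoPhaseProjectionSketch {

    /** Hypothetical holder for what phase 1 retains: a document id and its score. */
    public static final class ScoredDoc {
        public final int docId;
        public final float score;

        public ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a per-document lookup (docvalues or stored fields). */
    public interface DocValueSource {
        double valueFor(int docId);
    }

    /** Phase 1: visit every document, but keep only the k best (docId, score) pairs. */
    public static List<ScoredDoc> collectTopDocs(float[] scores, int k) {
        // Orders the least competitive hit first: lowest score, then highest doc id.
        Comparator<ScoredDoc> worstFirst = (a, b) -> {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.docId, a.docId);
        };
        PriorityQueue<ScoredDoc> queue = new PriorityQueue<>(worstFirst);
        for (int docId = 0; docId < scores.length; docId++) {
            queue.add(new ScoredDoc(docId, scores[docId]));
            if (queue.size() > k) {
                queue.poll(); // evict the least competitive hit, memory stays O(k)
            }
        }
        List<ScoredDoc> top = new ArrayList<>(queue);
        top.sort(worstFirst.reversed()); // best hit first
        return top;
    }

    /** Phase 2: extract projected values for the top docs only. */
    public static List<Double> projectTopDocs(List<ScoredDoc> topDocs, DocValueSource source) {
        List<Double> projected = new ArrayList<>(topDocs.size());
        for (ScoredDoc doc : topDocs) {
            projected.add(source.valueFor(doc.docId));
        }
        return projected;
    }

    public static void main(String[] args) {
        float[] scores = {0.1f, 0.9f, 0.4f, 0.7f, 0.2f};      // one score per document
        double[] distances = {12.0, 2.5, 8.0, 4.0, 30.0};     // made-up projected values
        int[] lookups = {0};                                  // counts phase 2 accesses

        List<ScoredDoc> top = collectTopDocs(scores, 2);
        List<Double> projected = projectTopDocs(top, docId -> {
            lookups[0]++;
            return distances[docId];
        });

        for (int i = 0; i < top.size(); i++) {
            System.out.println("doc " + top.get(i).docId + " distance " + projected.get(i));
        }
        System.out.println("lookups: " + lookups[0]);
    }
}
```

In this sketch the number of lookups in the second phase depends only on the number of top docs requested, not on the size of the index, which is the property the ticket is after.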
Note that solving this ticket should fix HSEARCH-3786. |