| Currently we have something like this in LuceneSearcherImpl:
BooleanQuery booleanQuery = LuceneNestedQueries.findChildQuery( nestedDocumentPaths, requestContext.getLuceneQuery() );
try {
    ArrayList<Collector> luceneCollectors = new ArrayList<>();
    LuceneChildrenCollector childrenCollector = new LuceneChildrenCollector();
    luceneCollectors.add( childrenCollector );
    luceneCollectors.addAll( collectorsForChildren );
    indexSearcher.search( booleanQuery, MultiCollector.wrap( luceneCollectors ) );
    return childrenCollector.getChildren();
}
catch (IOException e) {
    throw log.errorFetchingNestedDocuments( booleanQuery, e );
}
And findChildQuery does this:
public static BooleanQuery findChildQuery(Set<String> nestedDocumentPaths, Query originalParentQuery) {
    QueryBitSetProducer parentsFilter = new QueryBitSetProducer( LuceneQueries.mainDocumentQuery() );
    ToChildBlockJoinQuery parentQuery = new ToChildBlockJoinQuery( originalParentQuery, parentsFilter );
    return new BooleanQuery.Builder()
            .add( parentQuery, BooleanClause.Occur.MUST )
            .add( createNestedDocumentPathSubQuery( nestedDocumentPaths ), BooleanClause.Occur.FILTER )
            .add( LuceneQueries.childDocumentQuery(), BooleanClause.Occur.FILTER )
            .build();
}
This code doesn't take the top docs into account, so we always retrieve the children of every matching document, even if the query was limited to the top 20 documents. This adds up and can lead to poor performance on large indexes, where we would retrieve the children of millions of documents. The current version of findChildQuery is fine for sorts and aggregations, where we need to inspect all documents, but we should definitely add another version that only returns the IDs of a specific subset of documents, for use in |
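
To illustrate the idea of restricting the child lookup to the top docs, here is a minimal plain-Java sketch. It is not Lucene or Hibernate Search code: the class `TopDocsChildren` and the method `childrenOfTopDocs` are hypothetical names, and a `java.util.BitSet` stands in for the parents bit set. It relies only on the block-join layout, where the children of a parent are indexed immediately before it, and it visits only the blocks of the parents that were actually returned.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene or Hibernate Search.
final class TopDocsChildren {

    private TopDocsChildren() {
    }

    /**
     * @param parents one bit set per parent (main) document, indexed by doc ID
     * @param topParentDocIds doc IDs of the parents that made it into the top docs
     * @return for each top parent, the doc IDs of its children, in request order
     */
    static Map<Integer, List<Integer>> childrenOfTopDocs(BitSet parents, int[] topParentDocIds) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for ( int parentDocId : topParentDocIds ) {
            if ( !parents.get( parentDocId ) ) {
                throw new IllegalArgumentException( "Not a parent document: " + parentDocId );
            }
            // The block of this parent starts right after the previous parent
            // (previousSetBit returns -1 when there is none) and ends at the parent itself.
            int previousParent = parents.previousSetBit( parentDocId - 1 );
            List<Integer> children = new ArrayList<>();
            for ( int docId = previousParent + 1; docId < parentDocId; docId++ ) {
                children.add( docId );
            }
            result.put( parentDocId, children );
        }
        return result;
    }

    public static void main(String[] args) {
        // Layout: docs 0-1 are children of 2, doc 3 has no children, docs 4-6 are children of 7.
        BitSet parents = new BitSet();
        parents.set( 2 );
        parents.set( 3 );
        parents.set( 7 );
        // Only parents 7 and 3 are in the top docs; the block of parent 2 is never visited.
        System.out.println( childrenOfTopDocs( parents, new int[] { 7, 3 } ) );
    }
}
```

The cost here is proportional to the size of the blocks of the requested parents rather than to the number of documents matched by the original query. A real implementation would still have to apply the nested document path filter and map doc IDs per index segment.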