Issue Type:	Improvement
Assignee:	Yoann Rodière
Components:	backend-lucene
Created:	10/Dec/2019 02:18 AM
Fix Versions:	6.0.0.Beta-backlog-high-priority
Priority:	Major
Reporter:	Yoann Rodière

Currently we have something like this in LuceneSearcherImpl:

 
		BooleanQuery booleanQuery = LuceneNestedQueries.findChildQuery( nestedDocumentPaths, requestContext.getLuceneQuery() );

		try {
			ArrayList<Collector> luceneCollectors = new ArrayList<>();
			LuceneChildrenCollector childrenCollector = new LuceneChildrenCollector();
			luceneCollectors.add( childrenCollector );
			luceneCollectors.addAll( collectorsForChildren );

			indexSearcher.search( booleanQuery, MultiCollector.wrap( luceneCollectors ) );
			return childrenCollector.getChildren();
		}
		catch (IOException e) {
			throw log.errorFetchingNestedDocuments( booleanQuery, e );
		}

And findChildQuery does this:

 
	public static BooleanQuery findChildQuery(Set<String> nestedDocumentPaths, Query originalParentQuery) {
		QueryBitSetProducer parentsFilter = new QueryBitSetProducer( LuceneQueries.mainDocumentQuery() );
		ToChildBlockJoinQuery parentQuery = new ToChildBlockJoinQuery( originalParentQuery, parentsFilter );

		return new BooleanQuery.Builder()
				.add( parentQuery, BooleanClause.Occur.MUST )
				.add( createNestedDocumentPathSubQuery( nestedDocumentPaths ), BooleanClause.Occur.FILTER )
				.add( LuceneQueries.childDocumentQuery(), BooleanClause.Occur.FILTER )
				.build();
	}

This code doesn't take the top docs into account, so it always retrieves the children of every document matching the query, even when the search was limited to the top 20 hits.
On large indexes this adds up and can lead to poor performance, since we may end up retrieving the children of millions of documents.
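
To illustrate what a lookup restricted to the top docs could look like, here is a minimal sketch in plain Java, with no Lucene dependency. It assumes the usual block-join layout, where the children of a document are indexed immediately before it in the same segment, and it models the parents filter as a java.util.BitSet. The names TopDocsChildrenSketch and childrenOf are hypothetical; this is not the existing Hibernate Search code, and a real implementation would also have to map top-doc IDs to per-segment doc IDs and filter children by nested path.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a single segment indexed in blocks:
// each parent document is stored right after all of its children.
final class TopDocsChildrenSketch {

	private TopDocsChildrenSketch() {
	}

	// Returns the child doc IDs of each requested parent, and only of those.
	// 'parents' has one bit set per parent (main) document in the segment.
	static Map<Integer, List<Integer>> childrenOf(BitSet parents, int[] topParentDocIds) {
		Map<Integer, List<Integer>> result = new LinkedHashMap<>();
		for ( int parentDocId : topParentDocIds ) {
			if ( !parents.get( parentDocId ) ) {
				throw new IllegalArgumentException( "Not a parent document: " + parentDocId );
			}
			// The block of this parent starts right after the previous parent.
			// previousSetBit returns -1 when there is no previous parent,
			// in which case the block starts at doc ID 0.
			int previousParent = parents.previousSetBit( parentDocId - 1 );
			List<Integer> children = new ArrayList<>();
			for ( int docId = previousParent + 1; docId < parentDocId; docId++ ) {
				children.add( docId );
			}
			result.put( parentDocId, children );
		}
		return result;
	}
}
```

With parents at doc IDs 2, 3 and 7, calling childrenOf for parents 7 and 3 visits only doc IDs 4 to 6 and never touches the block of parent 2, so the work is proportional to the size of the requested blocks rather than to the number of matching documents.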

The current version of findChildQuery is fine for sorts and aggregations, where we need to inspect all documents, but we should definitely add another version that only returns the IDs of a specific subset of documents, for use in
