[hibernate-issues] [JIRA] (HSEARCH-3323) Search 6 groundwork - Restore support for scrolling

Friday, 17 July 2020

Yoann Rodière (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
) *updated* an issue

Hibernate Search (
https://hibernate.atlassian.net/browse/HSEARCH?atlOrigin=eyJpIjoiN2M0MWVm...
) / Task (
https://hibernate.atlassian.net/browse/HSEARCH-3323?atlOrigin=eyJpIjoiN2M...
) HSEARCH-3323 (
https://hibernate.atlassian.net/browse/HSEARCH-3323?atlOrigin=eyJpIjoiN2M...
) Search 6 groundwork - Restore support for scrolling (
https://hibernate.atlassian.net/browse/HSEARCH-3323?atlOrigin=eyJpIjoiN2M...
)

Change By: Yoann Rodière (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
)

h3. Goal

Restore the scroll feature exposed in Search 5 through
{{org.hibernate.search.query.hibernate.impl.FullTextQueryImpl#scroll()}}.

h3. API

All located in the {{org.hibernate.search.engine.search.query}} package.

{code}
public interface SearchFetchable<H> {

    // ... there's already some code here...

    // Add this (+ javadoc):
    // Throws IllegalArgumentException if passed 0 or less (see the class Contracts).
    SearchScroll<H> scroll(Integer pageSize);

    // Add this (+ javadoc):
    // Throws IllegalArgumentException if passed 0 or less for pageSize (see the class Contracts).
    // Throws IllegalArgumentException if passed less than 0 for offset (see the class Contracts).
    // TODO maybe it's not possible to implement this efficiently for Elasticsearch (not sure it
    // accepts an offset when scrolling is enabled). In that case, remove this method.
    SearchScroll<H> scroll(Integer offset, Integer pageSize);

}
{code}
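
A minimal sketch of the argument checks described in the comments above. {{ScrollArguments}}, {{checkPageSize}} and {{checkOffset}} are hypothetical names standing in for the relevant {{Contracts}} methods, and rejecting {{null}} with a {{NullPointerException}} is an assumption the comments do not settle.

{code:java}
import java.util.Objects;

// Hypothetical helper; the real checks would delegate to the Contracts class.
final class ScrollArguments {

    private ScrollArguments() {
    }

    // pageSize: 0 or less is rejected with an IllegalArgumentException.
    static int checkPageSize(Integer pageSize) {
        Objects.requireNonNull(pageSize, "pageSize"); // Assumption: null is not accepted.
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be strictly positive, got " + pageSize);
        }
        return pageSize;
    }

    // offset: less than 0 is rejected with an IllegalArgumentException; 0 is allowed.
    static int checkOffset(Integer offset) {
        Objects.requireNonNull(offset, "offset"); // Assumption: null is not accepted.
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be positive or zero, got " + offset);
        }
        return offset;
    }
}
{code}

With these, {{scroll(offset, pageSize)}} would run both checks before doing any work, so invalid arguments fail on the call to {{scroll()}} itself rather than on the first {{next()}}.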

{code}
// This will be used like this:
// try (SearchScroll<H> scroll = query.scroll(20)) {
//   for (SearchScrollResult<H> page = scroll.next(); page.hasHits(); page = scroll.next()) {
//     List<H> hits = page.getHits();
//     // ... do something with the page ...
//   }
// }
public interface SearchScroll<H> extends AutoCloseable {

    @Override
    void close();

    // TODO: javadoc
    // Returns the next page, with at most "pageSize" hits ("pageSize" being defined in the call
    // to query.scroll()).
    // May return a result with fewer than "pageSize" hits if only that many hits are left.
    // This should *not* rely on pre-fetching: fetching should happen when this method is called,
    // not before. This is necessary if we want to make it easy for users to clear the ORM session
    // between two pages.
    // Note there is no "hasNext" method precisely because we do not pre-fetch.
    SearchScrollResult<H> next();

}
{code}
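
To make the "no pre-fetching" contract of {{next()}} concrete, here is a self-contained sketch with simplified copies of the proposed interfaces (hits only, no total hit count or aggregations) and a hypothetical in-memory implementation, {{InMemorySearchScroll}}. It is not a backend implementation; it only shows that each page is fetched inside {{next()}} and that the loop from the comment above terminates.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified copies of the proposed interfaces (hits only).
interface SearchScrollResult<H> {
    boolean hasHits();

    List<H> getHits();
}

interface SearchScroll<H> extends AutoCloseable {
    @Override
    void close();

    SearchScrollResult<H> next();
}

// Hypothetical in-memory implementation: a page is "fetched" only when next() is called.
final class InMemorySearchScroll<H> implements SearchScroll<H> {
    private final List<H> allHits;
    private final int pageSize;
    private int position = 0;
    private int fetchCount = 0;
    private boolean closed = false;

    InMemorySearchScroll(List<H> allHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be strictly positive, got " + pageSize);
        }
        this.allHits = allHits;
        this.pageSize = pageSize;
    }

    @Override
    public SearchScrollResult<H> next() {
        if (closed) {
            throw new IllegalStateException("This scroll is closed");
        }
        fetchCount++; // The fetch happens here, on demand, never ahead of time.
        int end = (int) Math.min((long) position + pageSize, allHits.size());
        List<H> page = Collections.unmodifiableList(new ArrayList<>(allHits.subList(position, end)));
        position = end;
        boolean pageHasHits = !page.isEmpty();
        return new SearchScrollResult<H>() {
            @Override
            public boolean hasHits() {
                return pageHasHits;
            }

            @Override
            public List<H> getHits() {
                return page;
            }
        };
    }

    // Number of fetches performed so far; exposed for demonstration purposes.
    int fetchCount() {
        return fetchCount;
    }

    @Override
    public void close() {
        closed = true;
    }
}
{code}

Run through the loop from the comment above on 5 hits with a page size of 2, this returns pages of 2, 2 and 1 hits, then an empty page that ends the loop: 4 fetches in total, none before the first {{next()}}. In this toy every matched hit is "loaded", so {{hasHits()}} always equals {{!getHits().isEmpty()}}; with entity loading that is not guaranteed.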

{code}
import java.time.Duration;
import java.util.List;

import org.hibernate.search.engine.search.aggregation.AggregationKey;

public interface SearchScrollResult<H> {

    // TODO: javadoc
    // Returns true if there are still hits, false otherwise.
    // Note that hasHits() == true && getHits().isEmpty() *is possible*, in particular if
    // matching entities could not be found in the database.
    // This method is mainly useful as a stop condition in loops.
    boolean hasHits();

    // TODO: javadoc
    List<H> getHits();

    // TO BE CHECKED: these may not be implementable efficiently.
    // First, let's check whether Elasticsearch returns the total hit count/aggregations in
    // response to the first search API call when scrolling is enabled.
    // If it does, let's check the performance impact... Getting this information might require
    // executing the search query twice, in which case I'd rather not expose this information
    // here, and instead require users to execute the search query twice, explicitly.
    // Note that *if* we end up implementing these methods, they will return the same data for
    // every single page.
    long getTotalHitCount();
    <A> A getAggregation(AggregationKey<A> key);

    // TO BE DISCUSSED: if we add these, it will probably be better to wrap this information
    // into a SearchExecutionMetadata object, and implement getLastExecutionMetadata() here.
    // As a first step, I would not implement this and would just create a ticket about it.
    Duration getTook();
    boolean isTimedOut();

}
{code}
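
The note on {{hasHits()}} matters for loops: a page can have hits while {{getHits()}} is empty, so {{getHits().isEmpty()}} is not a valid stop condition. A minimal sketch, assuming a hypothetical {{LoadedPage}} that pairs the document ids matched by the backend with entities loaded from a {{Map}} standing in for the database:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical page: the backend matched some ids, but loading may not find every entity.
final class LoadedPage<H> {
    private final List<Long> matchedIds;
    private final List<H> loadedHits;

    LoadedPage(List<Long> matchedIds, Map<Long, H> database) {
        this.matchedIds = matchedIds;
        List<H> loaded = new ArrayList<>();
        for (Long id : matchedIds) {
            H entity = database.get(id);
            if (entity != null) { // Missing entity, e.g. deleted after it was indexed: skipped.
                loaded.add(entity);
            }
        }
        this.loadedHits = loaded;
    }

    // Based on what the backend matched, not on what could be loaded.
    boolean hasHits() {
        return !matchedIds.isEmpty();
    }

    List<H> getHits() {
        return loadedHits;
    }
}

final class PageLoop {
    private PageLoop() {
    }

    // Counts loaded hits, stopping only at the first page without any matched hit.
    static <H> int countLoadedHits(List<LoadedPage<H>> pages) {
        int total = 0;
        for (LoadedPage<H> page : pages) {
            if (!page.hasHits()) {
                break;
            }
            total += page.getHits().size();
        }
        return total;
    }
}
{code}

With ids 1 and 3 present in the database, pages matching [1], [2], [3] and then nothing yield 2 loaded hits. A loop stopping on {{getHits().isEmpty()}} would have stopped at the second page and missed the entity with id 3.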

h3. To-do list

In order:

# Add APIs, with stub implementations (throw {{new UnsupportedOperationException( "Not yet implemented" )}}).
## Ignore getTotalHitCount/getAggregation/getTook/isTimedOut for now.
# Copy-paste {{org.hibernate.search.integrationtest.backend.tck.search.query.SearchQueryFetchIT}} to {{SearchQueryScrollIT}} and adapt it to test scrolling.
## Don't forget to test edge cases: not fetching any result (should work fine), fetching some results but not all of them (should work fine), trying to fetch more than the total hit count (should throw an exception).
## Don't forget to check that {{hasHits()}} returns the correct information.
# Add tests for timeouts (failAfter/truncateAfter) when scrolling.
# Implement scrolling for the stub backend.
# Add tests to the ORM mapper. This will probably require copy/pasting {{org.hibernate.search.integrationtest.mapper.orm.search.loading.SearchQueryEntityLoadingBaseIT}} and adapting it to test loading when calling {{scroll()}} instead of {{fetch()}}.
# Implement scrolling for Elasticsearch.
## This should be easy enough: the first call to {{next()}} will execute a search work with the {{scroll}} parameter set, and the following calls will execute a scroll work (already implemented, see {{org.hibernate.search.elasticsearch.work.impl.factory.ElasticsearchWorkFactory#scroll}}).
## On close, we will execute a clearScroll work (already implemented, see {{org.hibernate.search.elasticsearch.work.impl.factory.ElasticsearchWorkFactory#clearScroll}}).
# Implement scrolling for Lucene.
## Search 5 code will not be very useful in that regard, as it addresses a lot of problems that are no longer relevant in Search 6.
## In the SearchScroll implementation, we will need to keep around some of the context that we currently store as local variables in {{LuceneSearcherImpl#search}}: in particular the {{IndexSearcher}} and the {{LuceneCollectors}} instance.
## When calling {{next()}}:
### First, if the topDocs do not include the next page, we will need to update them.
#### See {{org.hibernate.search.query.engine.impl.QueryHits#scoreDoc}} for how to decide how many topDocs to retrieve.
#### See phase 1 in {{org.hibernate.search.backend.lucene.search.extraction.impl.LuceneCollectors#collect}}, but *only phase 1*.
### Then we will need to collect information for the next page; see the call to {{extractTopDocs}} and phase 2 in {{org.hibernate.search.backend.lucene.search.extraction.impl.LuceneCollectors#collect}}.
## This may prove difficult; maybe we should organize a pair-programming session for it.
# Add Lucene-specific extensions to scrolling.
## This is mainly necessary for Infinispan.
## Expose a way to force Lucene to extract TopDocs up to a specific index and retrieve them: {{LuceneSearchScroll#preloadTopDocsUpTo()}}, returning {{TopDocs}}.
## Expose a way to load a specific document specified by its index: {{LuceneSearchScroll#loadHitByIndex()}}, returning {{H}}.
## Maybe we can improve on that later; ideally Infinispan should load multiple hits in one call ({{LuceneSearchScroll#loadHitsByIndex(int ...)}}, returning {{List<H>}}), otherwise the cost of creating collectors for each retrieved hit will be a bit too much.
# Implement {{scroll()}} and {{scroll(ScrollMode)}} in {{HibernateOrmSearchQueryAdapter}}, relying on {{SearchQuery#scroll(int)}} under the hood.
## Only {{ScrollMode.FORWARD_ONLY}} will be supported.
## We will need to decide on a page size. Let's use the same size as the loading fetch size, which should be accessible from {{org.hibernate.search.mapper.orm.search.loading.impl.MutableEntityLoadingOptions#getFetchSize}}.
## Some internal windowing will probably be necessary. Just copy/paste the {{org.hibernate.search.elasticsearch.util.impl.Window}} class from Search 5 and adapt it. Do not forget to also copy the unit test, {{org.hibernate.search.elasticsearch.test.WindowTest}}.
## See {{org.hibernate.search.query.hibernate.impl.ScrollableResultsImpl}} for an example of how it was done in Search 5 (may or may not be helpful).
# Add tests for {{scroll()}} and {{scroll(ScrollMode)}} in {{org.hibernate.search.integrationtest.mapper.orm.hibernateormapis.ToHibernateOrmIT}}:
## Nominal case (create scroll, fetch some hits until all hits have been consumed, close).
## Edge cases: not fetching any result (should work fine), fetching some results but not all of them (should work fine), trying to fetch more than the total hit count (should throw an exception).
## Error cases: trying to scroll back, trying to call the {{get*(int)}} methods, etc.
## Check that using any scroll mode other than {{ScrollMode.FORWARD_ONLY}} fails.
## Test {{query.stream()}} too (it's based on {{scroll()}}).
# Add tests for {{getResultStream()}} in {{org.hibernate.search.integrationtest.mapper.orm.hibernateormapis.ToJpaIT}}.
# Allow backends to extend the SearchScroll interfaces, like they currently do with {{SearchQuery}} ({{ElasticsearchSearchQuery}}, {{LuceneSearchQuery}}):
## Add a generic parameter {{S extends SearchScroll<H>}} to {{ExtendedSearchFetchable}} and override its {{scroll}} methods to return that type.
## Adapt the interfaces that extend {{ExtendedSearchFetchable}} as necessary.
## Create a new {{ExtendedSearchScroll<H>}} interface using the same principle.
## Create specific interfaces for Elasticsearch and Lucene: {{ElasticsearchSearchScroll}} and {{LuceneSearchScroll}}.
## Implement these interfaces where appropriate.
## Test extensions for Lucene and Elasticsearch. Mainly, check that the scroll has the correct type. See how it's done for SearchResult in {{org.hibernate.search.integrationtest.backend.elasticsearch.ElasticsearchExtensionIT#query}}.
# Add getTotalHitCount/getAggregation to APIs if relevant and implement them.
# Add getTook/isTimedOut to APIs if relevant and implement them.
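
The Elasticsearch step above boils down to a small state machine: open the scroll on the first {{next()}}, continue it on the following calls, clear it on {{close()}}. The sketch below models that flow only. {{ScrollWorks}}, {{ScrollResponse}} and their method signatures are hypothetical stand-ins for the search, scroll and clearScroll works of {{ElasticsearchWorkFactory}}, whose actual signatures are not reproduced here; {{"1m"}} is just an example keep-alive value for the {{scroll}} parameter.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical response of a search or scroll work: the scroll id and the hits of one page.
final class ScrollResponse {
    final String scrollId;
    final List<String> hits;

    ScrollResponse(String scrollId, List<String> hits) {
        this.scrollId = scrollId;
        this.hits = hits;
    }
}

// Hypothetical stand-ins for the search, scroll and clearScroll works.
interface ScrollWorks {
    ScrollResponse search(String query, int pageSize, String keepAlive); // "scroll" parameter set

    ScrollResponse scroll(String scrollId, String keepAlive);

    void clearScroll(String scrollId);
}

final class ElasticsearchScrollSketch implements AutoCloseable {
    private final ScrollWorks works;
    private final String query;
    private final int pageSize;
    private final String keepAlive;
    private String scrollId; // null until the first page has been fetched

    ElasticsearchScrollSketch(ScrollWorks works, String query, int pageSize, String keepAlive) {
        this.works = works;
        this.query = query;
        this.pageSize = pageSize;
        this.keepAlive = keepAlive;
    }

    List<String> next() {
        ScrollResponse response = scrollId == null
                ? works.search(query, pageSize, keepAlive) // First call: open the scroll.
                : works.scroll(scrollId, keepAlive); // Following calls: continue it.
        scrollId = response.scrollId; // Always keep the most recent scroll id.
        return response.hits;
    }

    @Override
    public void close() {
        if (scrollId != null) {
            works.clearScroll(scrollId); // Release the server-side scroll context.
            scrollId = null;
        }
    }
}

// Fake works serving predefined pages and recording each call, for demonstration only.
final class RecordingScrollWorks implements ScrollWorks {
    final List<String> calls = new ArrayList<>();
    private final List<List<String>> pages;
    private int nextPage = 0;

    RecordingScrollWorks(List<List<String>> pages) {
        this.pages = pages;
    }

    private ScrollResponse nextResponse() {
        List<String> hits = nextPage < pages.size() ? pages.get(nextPage++) : new ArrayList<String>();
        return new ScrollResponse("scroll-1", hits);
    }

    @Override
    public ScrollResponse search(String query, int pageSize, String keepAlive) {
        calls.add("search");
        return nextResponse();
    }

    @Override
    public ScrollResponse scroll(String scrollId, String keepAlive) {
        calls.add("scroll");
        return nextResponse();
    }

    @Override
    public void clearScroll(String scrollId) {
        calls.add("clearScroll");
    }
}
{code}

Scrolling over pages [a, b] and [c] records search, scroll, scroll (the last one returns an empty page, which ends the loop), then clearScroll on close. Updating {{scrollId}} after every response is deliberate: Elasticsearch does not guarantee that the scroll id stays the same between calls.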
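
For the Lucene step, the question "do the current topDocs include the next page, and if not, how many should we retrieve?" can be isolated in a pure function. The sketch below assumes a doubling policy, a common way to keep the number of re-collections logarithmic in the number of hits scrolled through; it is not a transcription of {{QueryHits#scoreDoc}}, which remains the reference. {{TopDocsPolicy}} and {{topDocsToCollect}} are hypothetical names.

{code:java}
// Hypothetical helper deciding how many top docs to hold before extracting the next page.
final class TopDocsPolicy {

    private TopDocsPolicy() {
    }

    /**
     * @param collected number of top docs currently held (0 before the first collection)
     * @param offset index of the first hit of the next page
     * @param pageSize page size passed to scroll()
     * @return the number of top docs to hold; equal to {@code collected} when no re-collection is needed
     */
    static int topDocsToCollect(int collected, int offset, int pageSize) {
        long needed = (long) offset + pageSize; // The next page ends just before this index.
        if (needed <= collected) {
            return collected; // The current topDocs already cover the next page.
        }
        // Assumed policy: at least double, so scrolling through N hits re-collects O(log N) times.
        long grown = Math.max(needed, 2L * collected);
        return (int) Math.min(Integer.MAX_VALUE, grown);
    }
}
{code}

With a page size of 20, the first {{next()}} collects 20 top docs, the second 40, the third 80, and the fourth reuses those 80 without running the collectors again. The resulting count would then drive phase 1 of {{LuceneCollectors#collect}}, as described in the Lucene step.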
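
For the ORM adapter, the core of the windowing is turning page-at-a-time fetching into the row-at-a-time, forward-only access that {{ScrollableResults}} exposes. This is not the Search 5 {{Window}} class, whose API is not reproduced here: {{ForwardOnlyCursor}} is a hypothetical, simplified cursor that keeps only the current page. It assumes the page supplier returns {{null}} once {{hasHits()}} is false, and a possibly empty list otherwise, so pages whose entities could not be loaded are skipped instead of ending the scroll.

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical forward-only cursor on top of page-at-a-time fetching.
final class ForwardOnlyCursor<H> {
    // Returns the hits of the next page, or null once there are no more hits
    // (e.g. a wrapper around SearchScroll.next() checking hasHits()).
    private final Supplier<List<H>> nextPage;
    private List<H> page = Collections.emptyList();
    private int indexInPage = -1;
    private int rowNumber = -1;
    private boolean exhausted = false;

    ForwardOnlyCursor(Supplier<List<H>> nextPage) {
        this.nextPage = nextPage;
    }

    // Moves to the next row; returns false once all hits have been consumed.
    boolean next() {
        if (exhausted) {
            return false;
        }
        indexInPage++;
        while (indexInPage >= page.size()) {
            List<H> fetched = nextPage.get();
            if (fetched == null) {
                exhausted = true;
                return false;
            }
            page = fetched; // May be empty if no entity of that page could be loaded.
            indexInPage = 0;
        }
        rowNumber++;
        return true;
    }

    // Returns the current row; fails before the first next() and after the last one.
    H get() {
        if (exhausted || indexInPage < 0 || indexInPage >= page.size()) {
            throw new NoSuchElementException("No current row");
        }
        return page.get(indexInPage);
    }

    // Zero-based number of the current row.
    int getRowNumber() {
        return rowNumber;
    }

    // Mirrors the ScrollMode.FORWARD_ONLY restriction: scrolling back is an error.
    boolean previous() {
        throw new UnsupportedOperationException("Only forward scrolling is supported");
    }
}
{code}

Fed with pages [a, b], [] and [c], the cursor yields a, b and c, ends on row number 2, and rejects {{previous()}}. In the adapter, the page size behind the supplier would be the loading fetch size mentioned above.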
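
The generic-parameter approach for backend extensions can be checked in isolation. In this sketch, simplified {{SearchScroll}} and {{SearchFetchable}} stand in for the real interfaces, and {{ExtendedSearchFetchable}} overrides {{scroll}} with the covariant return type {{S}}, so a Lucene-specific query returns a {{LuceneSearchScroll}} without any cast. The empty body of {{ExtendedSearchScroll}}, the {{int index}} parameter of {{loadHitByIndex}} and the toy {{ListLuceneQuery}} are assumptions for illustration.

{code:java}
import java.util.List;

// Simplified stand-ins for the engine interfaces (only what this sketch needs).
interface SearchScroll<H> extends AutoCloseable {
    @Override
    void close();
}

interface SearchFetchable<H> {
    SearchScroll<H> scroll(Integer pageSize);
}

// S is the concrete scroll type a backend returns.
interface ExtendedSearchFetchable<H, S extends SearchScroll<H>> extends SearchFetchable<H> {
    @Override
    S scroll(Integer pageSize); // Covariant override: callers get S without casting.
}

// Assumed shape: the ticket only says it follows the same principle.
interface ExtendedSearchScroll<H> extends SearchScroll<H> {
}

// Lucene-specific scroll exposing one of the extension methods proposed above.
interface LuceneSearchScroll<H> extends ExtendedSearchScroll<H> {
    H loadHitByIndex(int index); // Assumed signature.
}

interface LuceneSearchQuery<H> extends ExtendedSearchFetchable<H, LuceneSearchScroll<H>> {
}

// Toy implementation over an in-memory list, for demonstration only.
final class ListLuceneQuery<H> implements LuceneSearchQuery<H> {
    private final List<H> hits;

    ListLuceneQuery(List<H> hits) {
        this.hits = hits;
    }

    @Override
    public LuceneSearchScroll<H> scroll(Integer pageSize) {
        if (pageSize == null || pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be strictly positive, got " + pageSize);
        }
        return new LuceneSearchScroll<H>() {
            @Override
            public H loadHitByIndex(int index) {
                return hits.get(index);
            }

            @Override
            public void close() {
                // Nothing to release in this toy version.
            }
        };
    }
}
{code}

Code typed against {{LuceneSearchQuery}} gets a {{LuceneSearchScroll}} directly, while code typed against the generic {{SearchFetchable}} still receives the same object; the extension ITs would mainly assert that static type.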
