See:
* https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#configuration-reader-strategy
* https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_near_real_time
* https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_custom_2
* https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#configuration-worker
* Some of the options mentioned in https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#lucene-indexing-performance
See also occurrences of this JIRA ticket's key in the source code.
This is a complex issue and these configuration options are not completely independent, so I'm keeping all this in a single ticket.
IMPORTANT: let's check with Sanne before working on this.
h3. Reader strategy
{{not-shared}} is what we have currently implemented in Search 6, but it probably doesn't make sense anymore. Let's replace it with {{shared}} as the default implementation and introduce an option to enable asynchronous readers (default will remain synchronous).
h3. Async worker and commit policy
In Search 5, it was possible to configure each index's "worker" as async. This implied two things:
# The "synchronicity" of automatic indexing: the user wouldn't wait for indexing works to be applied to the index writers upon transaction commit, since everything would happen in a background thread.
# The commit policy: the index writer would not commit after each workset, but at a regular time interval set by {{index_flush_interval}} (by default 1000ms).
In Search 6, #1 is configured differently, through the automatic indexing synchronization strategy, so that part is not relevant anymore. However, #2 (the commit policy) is still missing. In Search 6, works are committed immediately after being applied if the mapper requested it (DocumentCommitStrategy.FORCE), or at the end of a batch (i.e. every ~500 worksets or as soon as the work queue is empty, whichever happens first).
We could imagine introducing configuration options regarding how often commits should be executed:
* every X worksets ("maximum batch size")
* every X milliseconds ("refresh interval")
I'm not sure how we should present this to the user, however.
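To make the two candidate options concrete, here is a minimal sketch of a policy that requests a commit when either threshold is reached. Everything in it is an assumption for discussion purposes: the class name, the method names and the idea of combining both thresholds in one object are hypothetical and do not exist in Search 5 or Search 6. Time is passed in explicitly so the behavior is easy to reason about.

```java
/**
 * Hypothetical sketch only: not part of Hibernate Search.
 * A commit is due once maxBatchSize worksets are pending,
 * or once intervalMs has elapsed since the last commit.
 */
final class CommitPolicySketch {
    private final int maxBatchSize;   // "maximum batch size" option (assumed name)
    private final long intervalMs;    // time-based option, in milliseconds (assumed name)
    private int pendingWorksets = 0;  // worksets applied since the last commit
    private long lastCommitMs;        // timestamp of the last commit

    CommitPolicySketch(int maxBatchSize, long intervalMs, long nowMs) {
        this.maxBatchSize = maxBatchSize;
        this.intervalMs = intervalMs;
        this.lastCommitMs = nowMs;
    }

    /** Records one applied workset and tells whether a commit is now due. */
    boolean worksetApplied(long nowMs) {
        pendingWorksets++;
        return commitDue(nowMs);
    }

    /** True if there is something to commit and one of the thresholds was reached. */
    boolean commitDue(long nowMs) {
        if (pendingWorksets == 0) {
            return false; // nothing to commit, whatever the elapsed time
        }
        return pendingWorksets >= maxBatchSize || nowMs - lastCommitMs >= intervalMs;
    }

    /** To be called by the (hypothetical) caller after it actually committed. */
    void committed(long nowMs) {
        pendingWorksets = 0;
        lastCommitMs = nowMs;
    }
}
```

With a batch size of 3 and an interval of 1000ms, the third pending workset triggers a commit regardless of elapsed time, and a single pending workset triggers one only after the interval has passed. A background thread would still be needed to call {{commitDue}} when no new workset arrives; that part is left out here.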
h3. Writers and near-real-time
Near-real-time is something we need for Infinispan. However, I think using near-real-time for writes affects:
* how readers behave (since they have access to the data from the not-yet-committed writer)
* the commit policy (apparently it's forced to periodic commits?)
We should be careful to structure the configuration in a way that does not even offer incompatible options. One question is: shouldn't near-real-time be enabled as soon as the user asks for commits to be performed periodically instead of immediately?
h3. Custom index manager
I don't think there's any clear use case for this. Let's drop it and introduce proper SPIs later if we discover a clear use case.
h3. Exclusive index use
{{hibernate.search.[default|<indexname>].exclusive_index_use}}
This essentially means we commit, close the index writer and release the locks regularly (~after each workset).
Does this even make sense anymore? Are there legitimate use cases for non-exclusive index use?
h2. Decisions (from last discussion)
h3. Readers/writers
* Make NRT writer and async reader the default
* Expose only three settings:
** {{readwrite.strategy : nrt/debug}}
*** Debug is the old “default” writer strategy + the “not shared” reader strategy. Not tested very thoroughly, just for debug, basically you’re on your own if it doesn’t work.
** {{readwrite.refresh_interval}} 0, 1, 10, … (in millis)
*** Default to 0 or 1?
** {{readwrite.max_flush_interval}} 0, 1, 10, … (in millis)
*** We force a write to disk after the interval
*** IMPORTANT: a commit might not be enough in the case of the NRT writer. Might need an extra call to something else?
*** Default to 0
** Names to be determined
* IMPORTANT: some utilities related to shared readers ended up in Lucene; let’s try to use these instead of Search 5 code
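For illustration, the three settings could be read along these lines. This is only a sketch under assumptions: the class is hypothetical, the keys are the provisional names listed above without any prefix, and the default of 0 for {{readwrite.refresh_interval}} is a placeholder since the 0-or-1 question is still open.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch only: names and defaults are not final. */
final class ReadWriteSettingsSketch {
    enum Strategy { NRT, DEBUG }

    final Strategy strategy;
    final long refreshIntervalMs;
    final long maxFlushIntervalMs;

    ReadWriteSettingsSketch(Map<String, String> props) {
        // "nrt" is the default strategy, per the decision above.
        String rawStrategy = props.getOrDefault("readwrite.strategy", "nrt");
        this.strategy = Strategy.valueOf(rawStrategy.trim().toUpperCase(Locale.ROOT));
        // Placeholder default: the ticket has not settled on 0 or 1.
        this.refreshIntervalMs = millis(props, "readwrite.refresh_interval", 0L);
        // Default 0, per the decision above.
        this.maxFlushIntervalMs = millis(props, "readwrite.max_flush_interval", 0L);
    }

    /** Parses a non-negative duration in milliseconds, or returns the default if absent. */
    private static long millis(Map<String, String> props, String key, long defaultValue) {
        String raw = props.get(key);
        long value = raw == null ? defaultValue : Long.parseLong(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be >= 0, got " + value);
        }
        return value;
    }
}
```

Keeping the strategy as a closed enum means the configuration cannot express combinations other than the two listed strategies.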
h3. Automatic indexing synchronization strategy:
* Commit is ambiguous: people will assume it means changes are visible, but they are not. Let’s find another name? “Sync” is not great...
* “Searchable” will no longer imply a commit
* Introduce “committed_searchable”
h3. Custom index manager
This was not discussed. Let's drop support for this.
h3. Exclusive index use
This was not discussed. Let's create a separate ticket targeting 6.x.