These sizes are currently hardcoded:
- The maximum bulk size in the Elasticsearch backend
- The maximum number of worksets per batch in the serial orchestrator of the Elasticsearch backend
- The maximum number of worksets per batch in the parallel orchestrator of the Elasticsearch backend
- The maximum capacity of the workset queue in the serial orchestrator of the Elasticsearch backend
- The maximum capacity of the workset queue in the parallel orchestrator of the Elasticsearch backend
- The maximum number of worksets per batch in the write orchestrator of the Lucene indexes
- The maximum capacity of the workset queue in the write orchestrator of the Lucene indexes
We should address two problems:
- The hardcoded sizes may not be very good. For example, we allow 5000 worksets to queue up for execution in the parallel orchestrator of the Elasticsearch backend, and each workset might contain several works. A tad too much, maybe?
- Even if we make the default values better, they'll never fit every use case. Users should be able to change them through configuration properties.
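As a rough sketch of what such configuration could look like (the property names below are purely hypothetical, chosen for illustration, and do not correspond to an existing Hibernate Search API):

```
# Hypothetical property names, for illustration only
hibernate.search.backend.elasticsearch.max_bulk_size = 100
hibernate.search.backend.elasticsearch.orchestrator.max_worksets_per_batch = 50
hibernate.search.backend.elasticsearch.orchestrator.queue_capacity = 1000
hibernate.search.backend.lucene.write_orchestrator.queue_capacity = 1000
```

Each property would fall back to the current hardcoded value when unset, so existing setups keep working unchanged.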
A few notes on workset queues:
- The capacity should be at least (estimated number of user threads) * (estimated number of worksets created by each thread); anything smaller will make user threads block routinely.
- We might want to allow setting the capacity to "unlimited" for people who'd rather get an OOM error than block because the queue is full. For infinite capacity, a linked list as implemented in org.hibernate.search.backend.impl.lucene.MultiWriteDrainableLinkedList might help.
- Note there was a configuration option for the maximum work queue length of the async executor in Search 5: see org.hibernate.search.indexes.impl.PropertiesParseHelper#extractMaxQueueSize. The sync executor, however, had a queue of unlimited capacity (a linked list).
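The two points above (a sizing floor, plus an "unlimited" escape hatch) could be sketched with standard java.util.concurrent queues; the class and method names here are illustrative only, not the actual Hibernate Search implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper, for illustration only.
public final class WorksetQueues {

    private WorksetQueues() {
    }

    /**
     * Picks a queue implementation from a configured capacity.
     * A non-positive value means "unlimited": the queue is backed by a
     * linked list (similar in spirit to MultiWriteDrainableLinkedList),
     * so producers never block on a full queue -- they fail with an OOM
     * error instead if memory runs out.
     */
    public static <T> BlockingQueue<T> createQueue(int configuredCapacity) {
        if (configuredCapacity <= 0) {
            return new LinkedBlockingQueue<>();
        }
        return new ArrayBlockingQueue<>(configuredCapacity);
    }

    /**
     * Rule of thumb from this issue: the capacity should be at least
     * (estimated user threads) * (estimated worksets created per thread).
     */
    public static int minimumCapacity(int estimatedUserThreads,
            int estimatedWorksetsPerThread) {
        return estimatedUserThreads * estimatedWorksetsPerThread;
    }
}
```

For example, 10 user threads each producing around 50 worksets would call for a capacity of at least 500; a bounded ArrayBlockingQueue makes the back-pressure explicit, while the LinkedBlockingQueue branch trades that safety for never blocking.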