To summarize the situation, we need to:
- Document the current limitations of HQL/JPQL: parallelism is disabled (only a single partition is used); there is no query validation; the query order is not guaranteed; and the checkpoint is ignored, so a restarted job starts from the very beginning instead of from the checkpoint.
- Keep the current implementation for HQL, which uses a single partition.
- Find a way to make HQL work with checkpoints: the current implementation does not, and I don't know how to achieve it. I suppose the only solution is to intercept the HQL and add the checkpoint to the WHERE clause of the query.
- Ensure that the job can be restarted correctly under HQL/JPQL. The current implementation creates duplicate Lucene documents on restart, because indexed documents are not purged and the checkpoint does not work. My proposal is to use UpdateLuceneWork instead of AddLuceneWork as a workaround.
- Clarify the questionable parameter maxResults:
  - Explain that it is NOT used only for customized index scopes (HQL/Criteria); it applies to all index scopes (Full/HQL/Criteria).
  - Explain that there is no limit by default.
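
To make the checkpoint idea above more concrete, here is a minimal sketch of what "intercepting the HQL" could look like. It is only an illustration under strong assumptions, not the job's actual code: `CheckpointHql`, `withCheckpoint`, the `:checkpointId` parameter name and the `Company` entity are all hypothetical, and the rewriting is naive string manipulation that only handles simple single-entity queries (no subqueries, no GROUP BY, no string literal containing "where" or "order by"). It also replaces any existing ORDER BY with an ordering on the ID, since a checkpoint on the ID is only meaningful if the results are read in ID order.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Hibernate Search. */
final class CheckpointHql {

    // Matches the first "order by" and everything after it (naive: ignores subqueries and literals).
    private static final Pattern ORDER_BY =
            Pattern.compile("\\s+order\\s+by\\s+.*$", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches the first "where" keyword surrounded by whitespace.
    private static final Pattern WHERE =
            Pattern.compile("\\s+where\\s+", Pattern.CASE_INSENSITIVE);

    private CheckpointHql() {
    }

    /**
     * Restricts the query to entities whose ID is greater than the checkpoint
     * and forces the results to be ordered by ID.
     *
     * @param hql    the user-provided HQL/JPQL (assumed to be a simple query)
     * @param idPath the path of the ID property, e.g. "c.id" (assumed to be known by the job)
     */
    static String withCheckpoint(String hql, String idPath) {
        // Drop the user's ordering: the checkpoint requires ordering by ID.
        String base = ORDER_BY.matcher(hql.trim()).replaceFirst("");

        Matcher where = WHERE.matcher(base);
        String restricted;
        if (where.find()) {
            // Wrap the existing condition in parentheses so that an "or" in it
            // cannot bypass the checkpoint restriction.
            String beforeWhere = base.substring(0, where.start());
            String condition = base.substring(where.end());
            restricted = beforeWhere + " where (" + condition + ") and " + idPath + " > :checkpointId";
        } else {
            restricted = base + " where " + idPath + " > :checkpointId";
        }
        return restricted + " order by " + idPath;
    }
}
```

With this sketch, `withCheckpoint("from Company c", "c.id")` yields `from Company c where c.id > :checkpointId order by c.id`, and the job would bind the last checkpointed ID with `query.setParameter("checkpointId", lastId)` before reading. A real solution would probably need to work on the parsed query rather than on the string, which is part of why I'm unsure this is achievable.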