https://github.com/mincong-h/gsoc-hsearch/issues/146
Remaining issues:
# ~~Shouldn't we expect the HQL/Criteria to produce an ordered list of IDs instead of what's done currently? In particular, it would allow partitioning, and it doesn't seem much harder to use.~~ => Actually no, that's a bad idea, since we're not sure *all* IDs within the resulting bounds would be relevant. On the other hand, if we passed an offset and a limit to partitions, instead of a first and last ID, we would solve lots of practical issues we're having (partitioning for HQL/Criteria, support for embedded IDs, ...). But maybe there was a particular reason to do it that way?
# Can this work with checkpoints? We should fix HSEARCH-2616, add checkpoints for the HQL and Criteria cases, and test it.
# The "maxResults" parameter is questionable:
  * Why would we only use it when HQL/Criteria is used? Couldn't we simply have something similar to {{org.hibernate.search.MassIndexer.limitIndexedObjectsTo(long)}}?
  * Why is the limit arbitrarily set to 1 million by default? This could come as a surprise to users.
# The following limitations should be documented:
  * There's no query validation before the job starts. If the query is invalid, the job will fail to process the partition plan in the 2nd step, {{produceLuceneDoc}}.
  * Partitioning is disabled in the HQL approach. Why? Because parallel execution requires a selection ordered by entity ID (through Criteria or HQL), so that {{PartitionMapper}} can scroll this selection projected on the ID and split it into multiple sub-selections, each delimited by a lower bound and an upper bound. In the HQL approach, however, the query is given by the user, so there's no guarantee of any order.
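
To make the offset/limit idea from the first item more concrete, here is a minimal sketch of how a partition plan could be computed from a row count instead of from ID bounds. It is not the current implementation: {{OffsetLimitPartitions}}, {{Partition}}, {{plan}} and {{rowsPerPartition}} are hypothetical names, and the sketch assumes the total row count is known up front and that the query returns rows in the same order on every execution (which, as noted in the last limitation, is not guaranteed for a user-supplied HQL query).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the existing code base.
final class OffsetLimitPartitions {

    /** Describes rows [offset, offset + limit) of the query result. */
    static final class Partition {
        final long offset;
        final long limit;

        Partition(long offset, long limit) {
            this.offset = offset;
            this.limit = limit;
        }
    }

    /**
     * Splits the first min(totalRows, maxResults) rows into consecutive
     * partitions of at most rowsPerPartition rows each.
     */
    static List<Partition> plan(long totalRows, long rowsPerPartition, long maxResults) {
        if (rowsPerPartition <= 0) {
            throw new IllegalArgumentException("rowsPerPartition must be positive");
        }
        // Assumption: maxResults caps the whole job, not each partition.
        long effectiveRows = Math.max(0, Math.min(totalRows, maxResults));
        List<Partition> partitions = new ArrayList<>();
        for (long offset = 0; offset < effectiveRows; offset += rowsPerPartition) {
            long limit = Math.min(rowsPerPartition, effectiveRows - offset);
            partitions.add(new Partition(offset, limit));
        }
        return partitions;
    }
}
```

Each partition could then apply its pair to the query through the standard {{setFirstResult(int)}} / {{setMaxResults(int)}} methods, which would need the values to fit in an {{int}}. Since no ID comparison is involved, this would also sidestep the embedded ID problem, at the cost of requiring a deterministic order.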