I like the idea of trying to detect whether this is the first run or a re-run. An UpdateLuceneWork has to perform a delete before the write. That these are two operations rather than one is only part of the problem; the main issue is that a delete operation is far more expensive than a write operation. On top of that, we need to issue an individual delete operation for each id, because the id is what we have to target to select the to-be-deleted data. Historically we have refrained from adding any "hidden fields" to the index, but this might be a strong case for adding one. If the batch indexer added a field holding a "partition id", the delete operation could target that field and perform the necessary cleanup in a single shot. I suspect some people might not like us adding more hidden fields (we already have the classname), so maybe this could be an opt-in strategy: if the user is willing to let us add such a field, we'll be able to perform faster recovery.
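
To make the difference concrete, here is a rough sketch that uses a toy in-memory list in place of a real index. Nothing in it is Lucene or Hibernate Search API: the `__partition_id` field name, the class, and all of its methods are made up for illustration. It only shows that cleanup by id needs one delete operation per entity, while cleanup by a partition marker needs a single delete operation in total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for an index; NOT the Lucene or Hibernate Search API. */
class PartitionCleanupSketch {

    /** Hypothetical hidden field name, chosen only for this example. */
    static final String PARTITION_FIELD = "__partition_id";

    private final List<Map<String, String>> docs = new ArrayList<>();
    private int deleteOperations = 0;

    /** Adds a document carrying its entity id and the (assumed) partition marker. */
    void add(String id, String partitionId) {
        docs.add(Map.of("id", id, PARTITION_FIELD, partitionId));
    }

    /** One delete operation: removes every document whose field equals the value. */
    void deleteByTerm(String field, String value) {
        deleteOperations++;
        docs.removeIf(doc -> value.equals(doc.get(field)));
    }

    /** Cleanup as described for the current approach: one targeted delete per id. */
    void cleanupById(List<String> ids) {
        for (String id : ids) {
            deleteByTerm("id", id);
        }
    }

    /** Opt-in idea: a single delete that targets the partition marker. */
    void cleanupByPartition(String partitionId) {
        deleteByTerm(PARTITION_FIELD, partitionId);
    }

    int size() {
        return docs.size();
    }

    int deleteOperations() {
        return deleteOperations;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1", "2", "3");
        PartitionCleanupSketch byId = new PartitionCleanupSketch();
        PartitionCleanupSketch byPartition = new PartitionCleanupSketch();
        for (String id : ids) {
            byId.add(id, "p0");
            byPartition.add(id, "p0");
        }

        byId.cleanupById(ids);
        byPartition.cleanupByPartition("p0");

        System.out.println("per-id delete operations: " + byId.deleteOperations());
        System.out.println("partition delete operations: " + byPartition.deleteOperations());
    }
}
```

Against a real index, the single-shot cleanup would presumably map to one `IndexWriter.deleteDocuments(Term...)` call on the hidden field, issued only when the user has opted in to having that field written.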