[
https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin....
]
Sanne Grinovero commented on ISPN-4650:
---------------------------------------
The finding is certainly correct; I've also witnessed significant slowdowns caused
by delete operations.
But having the MassIndexer ignore put operations which could be triggered "in
parallel" has some dangers: you'd need to test this carefully and check whether you
need to queue up update operations which are triggered by normal transactional
operations while the MassIndexer is running.
I suspect you'll need to make sure that any update operation triggered by cache write
operations is applied after the MassIndexer is done.
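To illustrate the queueing idea, here is a minimal sketch in plain Java. The class DeferringIndexer and all of its methods are hypothetical and are not part of the Infinispan or Hibernate Search API; the "index" is just a list of operation names. Updates from normal cache writes are held back while the mass indexer runs and are replayed once it finishes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not Infinispan or Hibernate Search API.
class DeferringIndexer {
    // Stand-in for the index backend: records operations in the order applied.
    private final List<String> applied = new ArrayList<>();
    // Updates from normal cache writes, held back while mass indexing runs.
    private final List<String> deferred = new ArrayList<>();
    private boolean massIndexing = false;

    synchronized void startMassIndexing() {
        massIndexing = true;
    }

    // Called for each entry the mass indexer scans: a plain add, no delete.
    synchronized void massIndexAdd(String key) {
        applied.add("add:" + key);
    }

    // Called by normal transactional cache writes.
    synchronized void onCacheWrite(String key) {
        if (massIndexing) {
            deferred.add(key);            // queue it up for later
        } else {
            applied.add("update:" + key); // apply immediately
        }
    }

    // Replays the deferred updates once the mass indexer is done.
    synchronized void finishMassIndexing() {
        massIndexing = false;
        for (String key : deferred) {
            applied.add("update:" + key);
        }
        deferred.clear();
    }

    synchronized List<String> appliedOperations() {
        return new ArrayList<>(applied);
    }
}
```

Note that this sketch keeps the queue in memory with no size limit, so it does not address where to store the queue or how to bound it.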
For the sake of simplicity, you could make this an option which is disabled by default and
whose documentation properly explains the drawback of enabling it: it would be perfectly
fine for a batch upload if the user knows, because of how the application is designed,
that no concurrent updates will be applied to the cache while the indexer is running.
Alternatively, rather than using a queue - which poses questions on where to store it and
how to keep it from growing without bound - you could go with the optimistic approach and
automatically switch to the safer approach if any other indexing event is generated.
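A rough sketch of the optimistic approach, again in plain Java with hypothetical names (OptimisticWriteMode, onOtherIndexingEvent and nextOperation are made up for illustration): the mass indexer uses the cheap add operation until some other indexing event is observed, and from then on falls back to update.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch only: not Infinispan or Hibernate Search API.
class OptimisticWriteMode {
    enum Op { ADD, UPDATE }

    // Flipped to true the first time another indexing event is observed.
    private final AtomicBoolean otherEventSeen = new AtomicBoolean(false);

    // To be called when a normal cache write produces an indexing event.
    void onOtherIndexingEvent() {
        otherEventSeen.set(true);
    }

    // Operation the mass indexer should use for its next document:
    // the cheap add while undisturbed, the safer update afterwards.
    Op nextOperation() {
        return otherEventSeen.get() ? Op.UPDATE : Op.ADD;
    }
}
```

The flag never flips back, so the switch is one-way for the duration of the run. The sketch ignores the window between a concurrent event being generated and the flag being read; a real implementation would have to test that case carefully.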
MassIndexer should not use UpdateDocument when adding to Lucene
---------------------------------------------------------------
Key: ISPN-4650
URL: https://issues.jboss.org/browse/ISPN-4650
Project: Infinispan
Issue Type: Enhancement
Security Level: Public(Everyone can see)
Components: Embedded Querying
Affects Versions: 7.0.0.Beta1
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
Fix For: 7.0.0.Beta2
The MassIndexer currently issues an Update operation to the Hibernate Search backend, which
in turn becomes a delete plus an add in the index.
Lucene buffers those delete queries and, during merges, tries to 'apply' them,
wasting a massive amount of time on unnecessary seeks and queries.
Since the mass indexer wipes the index at the beginning, it should simply issue an add
operation. Performance-wise this makes a huge difference:
* with 50k documents, indexing time drops from 195s to 33s
* with 200k documents, indexing time drops from 600s to 55s
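
The reasoning above can be illustrated with a toy in-memory index in plain Java. ToyIndex is a hypothetical stand-in, not Lucene code: in Lucene the two operations correspond to IndexWriter.addDocument and IndexWriter.updateDocument, and deletes are buffered rather than applied eagerly as they are here. The sketch only shows that, on a freshly wiped index with unique ids, add and update give the same contents while update pays for one delete per document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an index; not Lucene code.
class ToyIndex {
    // Each document is stored as {id, content}.
    private final List<String[]> docs = new ArrayList<>();
    // Counts how many delete-by-id passes were performed.
    int deleteLookups = 0;

    // Wipes the index, as the mass indexer does at the beginning.
    void purgeAll() {
        docs.clear();
    }

    // Plain add: appends without checking for an existing document.
    void add(String id, String content) {
        docs.add(new String[] {id, content});
    }

    // Update: a delete by id followed by an add.
    void update(String id, String content) {
        deleteLookups++;
        docs.removeIf(d -> d[0].equals(id));
        add(id, content);
    }

    int size() {
        return docs.size();
    }
}
```

Used after purgeAll() with unique ids, both methods leave the same number of documents, but only update increments deleteLookups.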