[
https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin....
]
Gustavo Fernandes updated ISPN-4650:
------------------------------------
Description:
The MassIndexer currently issues a Delete plus an Add operation to the Hibernate Search
backend.
Lucene buffers those delete queries and tries to 'apply' them during merges,
wasting a massive amount of time on unnecessary seeks and queries.
Since the mass indexer wipes the index at the beginning, it should simply issue an Add
operation. Performance-wise, this makes a huge difference:
* for 50k documents, indexing time drops from 195s to 33s
* for 200k documents, indexing time drops from 600s to 55s
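To illustrate why the delete half of each operation is pure overhead on a freshly wiped index, here is a minimal toy model in plain Java. It is not Lucene or Hibernate Search code: the class BufferedDeleteSketch, its methods, and the flush threshold are all hypothetical, and the cost model (one probe per buffered delete per segment at merge time) is a simplifying assumption rather than a description of Lucene internals. In Lucene terms the two paths roughly correspond to IndexWriter.updateDocument and IndexWriter.addDocument.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene code. It counts one probe per
// (buffered delete, segment) pair that a merge would have to resolve.
final class BufferedDeleteSketch {
    private final List<List<String>> segments = new ArrayList<>(); // flushed segments of document ids
    private final List<String> bufferedDeletes = new ArrayList<>(); // delete-by-id requests not yet applied
    private List<String> current = new ArrayList<>();               // in-memory segment being filled
    private final int flushEvery;                                   // hypothetical flush threshold
    long lookups = 0;                                               // probes counted so far

    BufferedDeleteSketch(int flushEvery) {
        this.flushEvery = flushEvery;
    }

    // Add-only path: the document goes straight into the in-memory segment.
    void add(String id) {
        current.add(id);
        if (current.size() >= flushEvery) {
            flush();
        }
    }

    // Update path: buffer a delete-by-id first, then add the document.
    void update(String id) {
        bufferedDeletes.add(id);
        add(id);
    }

    private void flush() {
        segments.add(current);
        current = new ArrayList<>();
    }

    // Merge: charge one probe per buffered delete per segment, then combine all segments.
    void merge() {
        if (!current.isEmpty()) {
            flush();
        }
        lookups += (long) bufferedDeletes.size() * segments.size();
        bufferedDeletes.clear();
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        segments.add(merged);
    }

    int documentCount() {
        int count = current.size();
        for (List<String> segment : segments) {
            count += segment.size();
        }
        return count;
    }

    public static void main(String[] args) {
        BufferedDeleteSketch viaUpdate = new BufferedDeleteSketch(100);
        BufferedDeleteSketch viaAdd = new BufferedDeleteSketch(100);
        for (int i = 0; i < 1000; i++) {
            viaUpdate.update("doc" + i); // delete + add
            viaAdd.add("doc" + i);       // add only
        }
        viaUpdate.merge();
        viaAdd.merge();
        System.out.println("update path lookups:   " + viaUpdate.lookups);
        System.out.println("add-only path lookups: " + viaAdd.lookups);
    }
}
```

With 1000 documents and a flush every 100, the update path is charged 10000 probes and the add-only path none, while both end up holding the same 1000 documents. The real timings above depend on Lucene's actual merge behaviour, which this sketch does not attempt to reproduce.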
was:
The MassIndexer currently issues an Update operation to the Hibernate Search backend, which in
turn becomes a delete plus an add in the index.
Lucene buffers those delete queries and tries to 'apply' them during merges,
wasting a massive amount of time on unnecessary seeks and queries.
Since the mass indexer wipes the index at the beginning, it should simply issue an Add
operation. Performance-wise, this makes a huge difference:
* for 50k documents, indexing time drops from 195s to 33s
* for 200k documents, indexing time drops from 600s to 55s
MassIndexer should not use UpdateDocument when adding to Lucene
---------------------------------------------------------------
Key: ISPN-4650
URL: https://issues.jboss.org/browse/ISPN-4650
Project: Infinispan
Issue Type: Enhancement
Security Level: Public (Everyone can see)
Components: Embedded Querying
Affects Versions: 7.0.0.Beta1
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
Fix For: 7.0.0.Beta2