[infinispan-issues] [JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene
Gustavo Fernandes (JIRA)
issues at jboss.org
Wed Aug 20 13:55:30 EDT 2014
[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gustavo Fernandes updated ISPN-4650:
------------------------------------
Description:
The MassIndexer currently issues a Delete plus and Add operation to hibernate search backend.
Lucene buffers those deletes queries and during merge it tries to 'apply' those deletes wasting a massive amount of time doing seeks and queries unnecessarily.
Since the mass indexer wipes the index at the beginning, it should simply issue an add operation (or at least rely on Lucene atomic IndexWriter.updateDocument). Performance wise this make a huge difference:
* indexing 50k documents brings down the indexing time from 195s to 33s
* indexing 200k documents brings down the indexing time from 600s to 55s
was:
The MassIndexer currently issues a Delete plus and Add operation to hibernate search backend.
Lucene buffers those deletes queries and during merge it tries to 'apply' those deletes wasting a massive amount of time doing seeks and queries unnecessarily.
Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance wise this make a huge difference:
* indexing 50k documents brings down the indexing time from 195s to 33s
* indexing 200k documents brings down the indexing time from 600s to 55s
> MassIndexer should not use UpdateDocument when adding to Lucene
> ---------------------------------------------------------------
>
> Key: ISPN-4650
> URL: https://issues.jboss.org/browse/ISPN-4650
> Project: Infinispan
> Issue Type: Enhancement
> Security Level: Public(Everyone can see)
> Components: Embedded Querying
> Affects Versions: 7.0.0.Beta1
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Fix For: 7.0.0.Beta2
>
>
> The MassIndexer currently issues a Delete plus and Add operation to hibernate search backend.
> Lucene buffers those deletes queries and during merge it tries to 'apply' those deletes wasting a massive amount of time doing seeks and queries unnecessarily.
> Since the mass indexer wipes the index at the beginning, it should simply issue an add operation (or at least rely on Lucene atomic IndexWriter.updateDocument). Performance wise this make a huge difference:
> * indexing 50k documents brings down the indexing time from 195s to 33s
> * indexing 200k documents brings down the indexing time from 600s to 55s
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
More information about the infinispan-issues
mailing list