[infinispan-issues] [JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene

Gustavo Fernandes (JIRA) issues at jboss.org
Tue Aug 19 07:25:31 EDT 2014


     [ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gustavo Fernandes updated ISPN-4650:
------------------------------------

    Description: 
The MassIndexer currently issues an Update operation to the Hibernate Search backend, which in turn becomes a delete plus an add in the index.
Lucene buffers those delete queries and, during merge, tries to apply them, wasting a massive amount of time on unnecessary seeks and queries.
Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance-wise, this makes a huge difference:

* for 50k documents, indexing time drops from 195s to 33s
* for 200k documents, indexing time drops from 600s to 55s
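
The bookkeeping described above can be sketched with a toy model. This is not Lucene or Hibernate Search code: ToyIndex and all of its methods are hypothetical names invented for illustration, and the lookup counter is only a stand-in for the seeks and queries Lucene performs when it applies buffered deletes. In Lucene itself the two operations roughly correspond to IndexWriter.addDocument and IndexWriter.updateDocument.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Lucene: it only mimics the bookkeeping described above.
class ToyIndex {
    private final Map<String, String> committed = new HashMap<>();   // documents already merged
    private final Map<String, String> pendingAdds = new HashMap<>(); // documents waiting for the next merge
    private final List<String> bufferedDeletes = new ArrayList<>();  // delete requests waiting for the next merge
    private long deleteLookups = 0;                                   // lookups spent applying buffered deletes

    // What the mass indexer does first: wipe the whole index.
    void purgeAll() {
        committed.clear();
        pendingAdds.clear();
        bufferedDeletes.clear();
    }

    // Plain add: no delete is buffered.
    void add(String id, String body) {
        pendingAdds.put(id, body);
    }

    // Update: a buffered delete for the id, followed by an add.
    void update(String id, String body) {
        bufferedDeletes.add(id);
        add(id, body);
    }

    // Merge: apply every buffered delete, then make the pending adds visible.
    void merge() {
        for (String id : bufferedDeletes) {
            deleteLookups++;      // one lookup per delete, even when nothing matches
            committed.remove(id);
        }
        bufferedDeletes.clear();
        committed.putAll(pendingAdds);
        pendingAdds.clear();
    }

    int size() { return committed.size(); }

    long deleteLookups() { return deleteLookups; }
}
```

In this model, calling purgeAll() and then update() for 1,000 fresh ids costs 1,000 delete lookups at merge() that can never match anything, while the same run with add() costs none. Both runs end with the same 1,000 documents in the index.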

  was:
The MassIndexer currently issues an Update operation to the Hibernate Search backend, which in turn becomes a delete plus an add in the index.
Lucene buffers those delete queries and, during merge, tries to apply them, wasting a massive amount of time on unnecessary seeks and queries.
Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance-wise, this makes a huge difference:

* for 50k documents, the final merge time drops from 195s to 33s
* for 200k documents, the final merge time drops from 600s to 55s

> MassIndexer should not use UpdateDocument when adding to Lucene
> ---------------------------------------------------------------
>
>                 Key: ISPN-4650
>                 URL: https://issues.jboss.org/browse/ISPN-4650
>             Project: Infinispan
>          Issue Type: Feature Request
>      Security Level: Public(Everyone can see) 
>          Components: Embedded Querying
>    Affects Versions: 7.0.0.Beta1
>            Reporter: Gustavo Fernandes
>            Assignee: Gustavo Fernandes
>             Fix For: 7.0.0.Beta2
>
>
> The MassIndexer currently issues an Update operation to the Hibernate Search backend, which in turn becomes a delete plus an add in the index.
> Lucene buffers those delete queries and, during merge, tries to apply them, wasting a massive amount of time on unnecessary seeks and queries.
> Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance-wise, this makes a huge difference:
> * for 50k documents, indexing time drops from 195s to 33s
> * for 200k documents, indexing time drops from 600s to 55s


