[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene

Wednesday, 20 August 2014

     [ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ]

Gustavo Fernandes updated ISPN-4650:
------------------------------------

    Description: 
The MassIndexer currently issues a Delete plus an Add operation to the Hibernate Search
backend.
Lucene buffers those delete queries and, during merge, tries to 'apply' them,
wasting a massive amount of time on unnecessary seeks and queries.
Since the MassIndexer wipes the index at the beginning, it should simply issue an Add
operation (or at least rely on Lucene's atomic IndexWriter.updateDocument). Performance-wise
this makes a huge difference:

* indexing 50k documents: indexing time drops from 195s to 33s
* indexing 200k documents: indexing time drops from 600s to 55s

  was:
The MassIndexer currently issues a Delete plus an Add operation to the Hibernate Search
backend.
Lucene buffers those delete queries and, during merge, tries to 'apply' them,
wasting a massive amount of time on unnecessary seeks and queries.
Since the MassIndexer wipes the index at the beginning, it should simply issue an Add
operation. Performance-wise this makes a huge difference:

* indexing 50k documents: indexing time drops from 195s to 33s
* indexing 200k documents: indexing time drops from 600s to 55s

...
 MassIndexer should not use UpdateDocument when adding to Lucene
 ---------------------------------------------------------------

                 Key: ISPN-4650
                 URL: https://issues.jboss.org/browse/ISPN-4650
             Project: Infinispan
          Issue Type: Enhancement
      Security Level: Public(Everyone can see) 
          Components: Embedded Querying
    Affects Versions: 7.0.0.Beta1
            Reporter: Gustavo Fernandes
            Assignee: Gustavo Fernandes
             Fix For: 7.0.0.Beta2

 The MassIndexer currently issues a Delete plus an Add operation to the Hibernate Search
backend.
 Lucene buffers those delete queries and, during merge, tries to 'apply' them,
wasting a massive amount of time on unnecessary seeks and queries.
 Since the MassIndexer wipes the index at the beginning, it should simply issue an Add
operation (or at least rely on Lucene's atomic IndexWriter.updateDocument). Performance-wise
this makes a huge difference:
 * indexing 50k documents: indexing time drops from 195s to 33s
 * indexing 200k documents: indexing time drops from 600s to 55s
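
 To illustrate why the extra Delete is costly, here is a minimal toy model in
plain Java. It is NOT Lucene and not the MassIndexer code: the class ToyIndex, its
methods, and the flush threshold are all hypothetical, and real Lucene delete
handling and merging are far more involved. The sketch only counts how often a
buffered delete has to be checked against an already written segment, assuming
every document id is unique (as it is after the index has been wiped).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only (hypothetical, not Lucene): counts probes caused by buffered deletes. */
final class ToyIndex {
    private final List<Set<String>> segments = new ArrayList<>(); // flushed segments of doc ids
    private Set<String> ramBuffer = new HashSet<>();              // docs not flushed yet
    private final Set<String> bufferedDeletes = new HashSet<>();  // delete-by-id requests awaiting application
    private final int flushEvery;
    long lookups = 0; // number of (buffered delete, segment) probes performed

    ToyIndex(int flushEvery) {
        this.flushEvery = flushEvery;
    }

    /** Plain add: no delete is buffered. */
    void add(String id) {
        ramBuffer.add(id);
        if (ramBuffer.size() >= flushEvery) {
            flush();
        }
    }

    /** Delete plus Add: the delete is buffered and applied at the next flush. */
    void deleteThenAdd(String id) {
        bufferedDeletes.add(id);
        add(id);
    }

    /** Applies every buffered delete to every older segment, then writes a new segment. */
    private void flush() {
        for (String id : bufferedDeletes) {
            for (Set<String> segment : segments) {
                lookups++;          // one probe per delete per existing segment
                segment.remove(id); // never matches here, since ids are unique
            }
        }
        bufferedDeletes.clear();
        segments.add(ramBuffer);
        ramBuffer = new HashSet<>();
    }

    /** Number of documents currently held, flushed or not. */
    int liveDocs() {
        int total = ramBuffer.size();
        for (Set<String> segment : segments) {
            total += segment.size();
        }
        return total;
    }

    /** Indexes ids doc0..doc(n-1), with or without a preceding delete per document. */
    static ToyIndex build(int docs, int flushEvery, boolean deleteFirst) {
        ToyIndex index = new ToyIndex(flushEvery);
        for (int i = 0; i < docs; i++) {
            String id = "doc" + i;
            if (deleteFirst) {
                index.deleteThenAdd(id);
            } else {
                index.add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        ToyIndex withDeletes = build(1000, 100, true);
        ToyIndex addOnly = build(1000, 100, false);
        System.out.println("delete+add probes: " + withDeletes.lookups);
        System.out.println("add-only probes:   " + addOnly.lookups);
    }
}
```

 In this model both runs end with the same set of documents, but only the
delete-plus-add run pays for probes, and that cost grows with the number of
segments already written. Every one of those probes is wasted work when the index
starts out empty.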

--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
