[infinispan-issues] [JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene

Sanne Grinovero (JIRA) issues at jboss.org
Fri Aug 29 10:34:00 EDT 2014


    [ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996993#comment-12996993 ] 

Sanne Grinovero commented on ISPN-4650:
---------------------------------------

{quote} I'd like to explore a third alternative besides using auto switch and queues: relying on UpdateExtWorkDelegate. Some local tests showed this delegate to be closer, performance-wise, to the AddWorkDelegate. I'm aware of the implications of its usage (it is recommended only if keys are unique in an index), and I think Infinispan ticks the boxes, doesn't it?{quote}

It's not uncommon for people to share the same index between different types. We would need to make sure that such configurations are not possible; maybe that's logic which could be enforced by the automatic configuration approach you've been working on?
But if the user is allowed to change all the configuration options, as they can today, this is unfortunately not a safe assumption.
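A hypothetical in-memory sketch of why the unique-key assumption matters: if an update deletes by the identifier term alone, and two entity types sharing one index happen to use the same identifier value, the update also removes the other type's document. This is not Lucene or Hibernate Search code; `Doc`, `updateByIdTerm` and `updateByTypeAndId` are illustrative names only, and the list stands in for an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for an index shared by two entity types.
// It only models which documents a delete would match; it is NOT Lucene.
public class SharedIndexDemo {

    // A minimal "document": entity type, identifier field, and some payload.
    record Doc(String type, String id, String payload) {}

    // Removes every document matching the predicate and returns how many were removed.
    static int delete(List<Doc> index, Predicate<Doc> matcher) {
        int before = index.size();
        index.removeIf(matcher);
        return before - index.size();
    }

    // Update keyed by the id term alone: safe only if ids are unique across the whole index.
    static int updateByIdTerm(List<Doc> index, Doc newDoc) {
        int removed = delete(index, d -> d.id().equals(newDoc.id()));
        index.add(newDoc);
        return removed;
    }

    // Update keyed by (type, id): still correct when several types share one index.
    static int updateByTypeAndId(List<Doc> index, Doc newDoc) {
        int removed = delete(index,
                d -> d.type().equals(newDoc.type()) && d.id().equals(newDoc.id()));
        index.add(newDoc);
        return removed;
    }

    // Two documents of different types that happen to share the identifier "1".
    static List<Doc> sampleIndex() {
        List<Doc> index = new ArrayList<>();
        index.add(new Doc("Book", "1", "v1"));
        index.add(new Doc("Author", "1", "v1"));
        return index;
    }

    public static void main(String[] args) {
        List<Doc> a = sampleIndex();
        updateByIdTerm(a, new Doc("Book", "1", "v2"));
        // The Author document with id "1" has been silently deleted.
        System.out.println("id-term update: " + a);

        List<Doc> b = sampleIndex();
        updateByTypeAndId(b, new Doc("Book", "1", "v2"));
        // Both types are still present.
        System.out.println("type+id update: " + b);
    }
}
```

Under this model, the id-only update is only equivalent to the type-aware one when no two types in the index can produce the same identifier, which is exactly the configuration guarantee discussed above.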

An alternative could be to always perform both the ADD and the REMOVE operations, but allow IndexWriter commits and IndexReader refreshes to skip flushing the delete operations: when a Query returns a List of matches, it is supposed to ignore the matches it can't load.
This doesn't work when projections are used (but we can detect when they are) and might break some pagination operations: in Search (ORM) pagination is compensated for by adding some entries from the next page, but the exact positions of the page boundaries might be inaccurate.
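A hypothetical model of the trade-off above, assuming the index still returns a hit for an entry whose delete has not been made visible yet. Loading skips hits that are no longer in the cache and, to compensate, keeps reading hits that belong to the next page; projections read straight from the hits and so expose the stale entry. `StaleHitsDemo`, `loadPage` and `project` are illustrative names, not Hibernate Search API, and the real compensation logic may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of lazily applied deletes: the index still returns hits for
// entries already removed from the cache because the delete was not flushed yet.
public class StaleHitsDemo {

    // Loads one page starting at a raw-hit offset: skips hits whose key is no longer
    // in the cache, and keeps reading further raw hits (from what would be the next
    // page) until the page is full or the hits run out.
    static List<String> loadPage(List<String> rawHits, Map<String, String> cache,
                                 int rawOffset, int pageSize) {
        List<String> page = new ArrayList<>();
        for (int i = rawOffset; i < rawHits.size() && page.size() < pageSize; i++) {
            String value = cache.get(rawHits.get(i));
            if (value != null) {
                page.add(value); // loadable hit
            } // otherwise: stale hit, silently ignored
        }
        return page;
    }

    // A projection reads values straight from the index hits, so stale hits leak through.
    static List<String> project(List<String> rawHits, int rawOffset, int pageSize) {
        int end = Math.min(rawHits.size(), rawOffset + pageSize);
        return rawOffset >= end ? List.of() : List.copyOf(rawHits.subList(rawOffset, end));
    }

    public static void main(String[] args) {
        // "k2" was removed from the cache, but its index document is still visible.
        List<String> rawHits = List.of("k1", "k2", "k3", "k4", "k5");
        Map<String, String> cache = Map.of("k1", "v1", "k3", "v3", "k4", "v4", "k5", "v5");

        System.out.println(loadPage(rawHits, cache, 0, 3)); // [v1, v3, v4]
        System.out.println(loadPage(rawHits, cache, 3, 3)); // [v4, v5]: v4 appears on both pages
        System.out.println(project(rawHits, 0, 3));         // [k1, k2, k3]: includes stale k2
    }
}
```

In this model each page is individually full, but because the second page still starts at the raw offset, "v4" shows up twice: the page boundaries drift, as described above.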


> MassIndexer should not use UpdateDocument when adding to Lucene
> ---------------------------------------------------------------
>
>                 Key: ISPN-4650
>                 URL: https://issues.jboss.org/browse/ISPN-4650
>             Project: Infinispan
>          Issue Type: Enhancement
>      Security Level: Public(Everyone can see) 
>          Components: Embedded Querying
>    Affects Versions: 7.0.0.Beta1
>            Reporter: Gustavo Fernandes
>            Assignee: Gustavo Fernandes
>             Fix For: 7.0.0.Beta2
>
>
> The MassIndexer currently issues a Delete plus an Add operation to the Hibernate Search backend.
> Lucene buffers those delete queries and, during merges, tries to 'apply' them, wasting a massive amount of time on unnecessary seeks and queries.
> Since the MassIndexer wipes the index at the beginning, it should simply issue an add operation (or at least rely on Lucene's atomic IndexWriter.updateDocument). Performance-wise this makes a huge difference: 
> * indexing 50k documents brings down the indexing time from 195s to 33s
> * indexing 200k documents brings down the indexing time from 600s to 55s
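For reference, the two reported timings can be turned into throughput and speedup figures. This is only arithmetic on the numbers above, not a new measurement; `MassIndexerTimings` is an illustrative name. For context, Lucene's `IndexWriter.updateDocument(Term, Document)` still buffers a delete-by-term for every document, whereas `addDocument` does not.

```java
// Worked arithmetic on the timings reported in ISPN-4650 (no new measurements).
public class MassIndexerTimings {

    // Documents indexed per second.
    static double throughput(long docs, double seconds) {
        return docs / seconds;
    }

    // How many times faster the add-only run was than the delete+add run.
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        long[] docs = {50_000, 200_000};
        double[] before = {195, 600}; // seconds with delete + add
        double[] after = {33, 55};    // seconds with add only
        for (int i = 0; i < docs.length; i++) {
            System.out.printf("%d docs: %.0f -> %.0f docs/s, speedup %.1fx%n",
                    docs[i], throughput(docs[i], before[i]), throughput(docs[i], after[i]),
                    speedup(before[i], after[i]));
        }
    }
}
```

The speedup grows from about 5.9x at 50k documents to about 10.9x at 200k documents, which is consistent with (though does not by itself prove) the cost of applying buffered deletes growing with index size.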



--
This message was sent by Atlassian JIRA
(v6.3.1#6329)

