[infinispan-issues] [JBoss JIRA] (ISPN-5103) Inefficient index updates cause high cost merges and increase overall latency

Sanne Grinovero (JIRA) issues at jboss.org
Wed Jan 7 11:53:29 EST 2015


    [ https://issues.jboss.org/browse/ISPN-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030615#comment-13030615 ] 

Sanne Grinovero commented on ISPN-5103:
---------------------------------------

Right, but it uses the {{Stored}} flag, so it's considerably more expensive (in both time and space) than most other fields; I'd keep it in mind for future polishing.

> Inefficient index updates cause high cost merges and increase overall latency
> -----------------------------------------------------------------------------
>
>                 Key: ISPN-5103
>                 URL: https://issues.jboss.org/browse/ISPN-5103
>             Project: Infinispan
>          Issue Type: Enhancement
>          Components: Embedded Querying
>    Affects Versions: 7.0.2.Final, 7.1.0.Alpha1
>            Reporter: Gustavo Fernandes
>            Assignee: Gustavo Fernandes
>
> Currently, every change to the index is performed at the Lucene level by combining two operations:
> * Delete by query, using a boolean query on the id plus the entity class
> * Add 
>  
> Under high load, especially during merges, these numerous deletes cause very long delays, resulting in high latency.
> We should instead use a simple Lucene Update to add/change documents, since internally it translates to a Delete by term plus an Add operation, and deletes by term are extremely efficient in Lucene.
> Some local tests showed the average latency of index updates dropping by a factor of 4 with this strategy, for both the SYNC and ASYNC backends.
> With regard to sharing the index between entities, which was the original motivation for the Delete by query plus Add strategy, we have two scenarios:
> * Same cache with multiple entity types: that's a non-issue, since obviously there's no id collision in this case
> * Different caches with the same index: this happens when different caches share the same index, for example:
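> {code}
> // Both entity types are written to the same index, "common"
> @Indexed(index = "common")
> public class Country { ... }
> @Indexed(index = "common")
> public class Currency { ... }
> // Two different caches store entries under the same key, 1
> cm.getCache("currencies").put(1, new Currency(...));
> cm.getCache("countries").put(1, new Country(...));
> {code}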
> This would require a delete by query in order to persist both a Country and a Currency with id=1.
> It would also require setting "default.exclusive_index_use" to "false", with the associated cost of having to reopen the IndexWriter on every operation.
> Given that the performance gain of a simple Update is considerable, we should either support this corner case through extra configuration or, alternatively, generate a unique @ProvidedId that includes the entity class or the cache name and works for all the cases described above.
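
To make the id collision concrete, here is a minimal sketch in plain Java. It is a toy in-memory stand-in for a shared index, not the Lucene or Hibernate Search API: the class SharedIndexSketch, its updateByTerm method and the compositeId format are all hypothetical names chosen for illustration. It only models the "delete by term, then add" behaviour described above, and shows how an id that includes the cache name and entity class would keep a Country and a Currency with key 1 apart.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of an index shared by several caches. NOT the Lucene API:
// every name in this class is hypothetical and exists only for illustration.
class SharedIndexSketch {

    // Empty placeholder entity types, mirroring the example in the issue.
    static class Country { }
    static class Currency { }

    // Maps the identifying term of a document to a description of that document.
    private final Map<String, String> docsByIdTerm = new LinkedHashMap<>();

    // Models an update keyed on one term: delete whatever carries the term, then add.
    void updateByTerm(String idTerm, String document) {
        docsByIdTerm.remove(idTerm);
        docsByIdTerm.put(idTerm, document);
    }

    int size() {
        return docsByIdTerm.size();
    }

    String get(String idTerm) {
        return docsByIdTerm.get(idTerm);
    }

    // Assumed id format: cache name, entity class and original key joined by ':'.
    static String compositeId(String cacheName, Class<?> entityType, Object key) {
        return cacheName + ":" + entityType.getSimpleName() + ":" + key;
    }

    public static void main(String[] args) {
        // Plain key as the id term: the second update replaces the first document.
        SharedIndexSketch plain = new SharedIndexSketch();
        plain.updateByTerm("1", "Currency");
        plain.updateByTerm("1", "Country");
        System.out.println("plain ids, documents kept: " + plain.size());

        // Composite id as the id term: both documents survive in the shared index.
        SharedIndexSketch composite = new SharedIndexSketch();
        composite.updateByTerm(compositeId("currencies", Currency.class, 1), "Currency");
        composite.updateByTerm(compositeId("countries", Country.class, 1), "Country");
        System.out.println("composite ids, documents kept: " + composite.size());
    }
}
```

In this model the plain-key variant silently loses the Currency entry, which is the case that otherwise needs a delete by query on id plus entity class. The composite id is one possible shape for the unique @ProvidedId mentioned above; the real format would be for the implementation to decide.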


