[infinispan-issues] [JBoss JIRA] (ISPN-5103) Inefficient index updates cause high cost merges and increase overall latency

Wed Jan 7 12:43:30 EST 2015

    [ https://issues.jboss.org/browse/ISPN-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030588#comment-13030588 ] 

Sanne Grinovero edited comment on ISPN-5103 at 1/7/15 12:43 PM:
----------------------------------------------------------------

As a reminder of our IRC chat:
it seems this would be safe as Infinispan Query handles the identifiers in a slightly different way than Hibernate Search / ORM: the {{org.infinispan.query.Transformer}} guarantees a unique relation between each id and its encoded form, and makes it possible to reconstruct from the encoded form not only the ID instance but also the {{Transformer}} implementation to be used for the transformation itself.
A similar optimisation wouldn't always be safe in the ORM world, as the {{org.hibernate.search.bridge.TwoWayFieldBridge}} needs to be known, or we'd need to be sure that different {{FieldBridge}} implementations encode values in some unique way.
For example, we'd have a problem when mapping Integer(5) and Long(5) using keyword encoding, as they would both map to "5", which could be back-mapped to either the Integer or the Long version, hence deleting the other entry from the index as well.

was (Author: sannegrinovero):
As a reminder of our IRC chat:
it seems this would be safe as Infinispan Query handles the identifiers in a slightly different way than Hibernate Search / ORM: the {{org.infinispan.query.Transformer}} guarantees a unique relation between each id and its encoded form, and makes it possible to reconstruct from the encoded form not only the ID instance but also the {{Transformer}} implementation to be used for the transformation itself.
A similar optimisation wouldn't always be safe in the ORM world, as the {{org.hibernate.search.bridge.TwoWayFieldBridge}} needs to be known, or we'd need to be sure that different {{FieldBridge}} implementations encode values in some unique way.
For example, we'd have a problem when mapping Integer(5) and Long(5) using keyword encoding, as they would both map to "5".
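
The Integer(5) / Long(5) collision mentioned in the comment can be illustrated with a small standalone sketch. It does not use the Hibernate Search or Infinispan APIs: {{keywordEncode}}, {{taggedEncode}} and {{taggedDecode}} are hypothetical helpers written for this illustration only, and the type-tagged format is an assumption, not the encoding Infinispan actually uses.

```java
import java.util.HashMap;
import java.util.Map;

public class IdEncodingDemo {

    // Hypothetical "keyword" style encoding: only the string form of the value survives.
    public static String keywordEncode(Object id) {
        return String.valueOf(id);
    }

    // Hypothetical type-tagged encoding: the class name is kept next to the value,
    // so the original type can be recovered when decoding.
    public static String taggedEncode(Object id) {
        return id.getClass().getName() + ":" + id;
    }

    // Reverses taggedEncode; only Integer and Long are handled in this sketch.
    public static Object taggedDecode(String encoded) {
        int sep = encoded.indexOf(':');
        String type = encoded.substring(0, sep);
        String value = encoded.substring(sep + 1);
        if (type.equals(Integer.class.getName())) {
            return Integer.valueOf(value);
        }
        if (type.equals(Long.class.getName())) {
            return Long.valueOf(value);
        }
        throw new IllegalArgumentException("Unsupported id type: " + type);
    }

    public static void main(String[] args) {
        Object intId = Integer.valueOf(5);
        Object longId = Long.valueOf(5L);

        // Keyword encoding: both ids produce the same term, so the second put
        // overwrites the first, just as a delete by term would remove both entries.
        Map<String, String> keywordIndex = new HashMap<>();
        keywordIndex.put(keywordEncode(intId), "entry keyed by Integer(5)");
        keywordIndex.put(keywordEncode(longId), "entry keyed by Long(5)");
        System.out.println("keyword terms: " + keywordIndex.size());

        // Type-tagged encoding: the two ids stay distinct and can be decoded back.
        Map<String, String> taggedIndex = new HashMap<>();
        taggedIndex.put(taggedEncode(intId), "entry keyed by Integer(5)");
        taggedIndex.put(taggedEncode(longId), "entry keyed by Long(5)");
        System.out.println("tagged terms: " + taggedIndex.size());
        System.out.println("decoded type: "
                + taggedDecode(taggedEncode(longId)).getClass().getSimpleName());
    }
}
```

With the keyword encoding the map ends up holding a single term for both ids, while the tagged encoding keeps two distinct terms that decode back to their original types.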

> Inefficient index updates cause high cost merges and increase overall latency
> -----------------------------------------------------------------------------
>
>                 Key: ISPN-5103
>                 URL: https://issues.jboss.org/browse/ISPN-5103
>             Project: Infinispan
>          Issue Type: Enhancement
>          Components: Embedded Querying
>    Affects Versions: 7.0.2.Final, 7.1.0.Alpha1
>            Reporter: Gustavo Fernandes
>            Assignee: Gustavo Fernandes
>
> Currently every change to the index is performed in Lucene by combining two operations:
> * Delete by query, using a boolean query on the id plus the entity class
> * Add 
>  
> Under high load, especially during merges, those numerous deletes provoke very long delays, causing high latency.
> We should instead use a simple Lucene Update to add/change documents, since internally it translates to a Delete by term plus an Add operation, and deletes by term are extremely efficient in Lucene.
> Some local tests showed the average latency of index updates dropping by a factor of 4 with this strategy, for both the SYNC and ASYNC backends.
> With regard to sharing the index between entities, which was the original motivation for the Delete by query plus Add strategy, we have two scenarios:
> * Same cache with multiple entity types: that's a non-issue, since obviously there's no id collision in this case
> * Different caches with the same index: this scenario happens when different caches share the same index, for example:
> {code}
> @Indexed(index = "common")
> public class Country { ... }
> @Indexed(index = "common")
> public class Currency { ... }
> // The same key (1) is used in two different caches that share the "common" index
> cm.getCache("currencies").put(1, new Currency(...));
> cm.getCache("countries").put(1, new Country(...));
> {code}
> This would require a delete by query in order to persist both a Country and a Currency with id=1.
> It would also require setting "default.exclusive_index_use" to "false", with the associated cost of having to reopen the IndexWriter on every operation.
> Given that the performance gain of a simple Update is considerable, we should either support this corner case through extra configuration or, alternatively, generate a unique @ProvidedId that includes the entity class or the cache name, which would work for all the cases described above.
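
The trade-off described in the issue can be modelled without Lucene. The sketch below is an in-memory stand-in for a shared index, not the Infinispan or Lucene implementation: {{SharedIndexModel}}, {{deleteByQueryThenAdd}}, {{updateByTerm}} and {{compositeId}} are hypothetical names, and the "cacheName:key" format is only one assumed way of building a unique id. In Lucene itself the two strategies correspond to {{IndexWriter.deleteDocuments(Query...)}} followed by {{addDocument}}, versus {{IndexWriter.updateDocument(Term, ...)}}.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedIndexModel {

    // One indexed document: the id term, the entity type and some payload.
    static final class Doc {
        final String id;
        final String type;
        final String body;

        Doc(String id, String type, String body) {
            this.id = id;
            this.type = type;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Current strategy: delete by query on (id AND entity type), then add.
    public void deleteByQueryThenAdd(String id, String type, String body) {
        docs.removeIf(d -> d.id.equals(id) && d.type.equals(type));
        docs.add(new Doc(id, type, body));
    }

    // Proposed strategy: update keyed on the id term alone (delete by term, then add).
    public void updateByTerm(String id, String type, String body) {
        docs.removeIf(d -> d.id.equals(id));
        docs.add(new Doc(id, type, body));
    }

    public int size() {
        return docs.size();
    }

    // Hypothetical unique id that includes the cache name. A real implementation
    // would need to escape the separator if cache names may contain ':'.
    public static String compositeId(String cacheName, Object key) {
        return cacheName + ":" + key;
    }

    public static void main(String[] args) {
        // Plain ids with update by term: the Country replaces the Currency.
        SharedIndexModel plain = new SharedIndexModel();
        plain.updateByTerm("1", "Currency", "EUR");
        plain.updateByTerm("1", "Country", "Portugal");
        System.out.println("plain ids, update by term: " + plain.size());

        // Composite ids with update by term: both documents are kept.
        SharedIndexModel composite = new SharedIndexModel();
        composite.updateByTerm(compositeId("currencies", 1), "Currency", "EUR");
        composite.updateByTerm(compositeId("countries", 1), "Country", "Portugal");
        System.out.println("composite ids, update by term: " + composite.size());
    }
}
```

In this model a term-based update with plain ids leaves a single document in the shared index, whereas the same update with ids that include the cache name keeps both, without falling back to a delete by query.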
