[
https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin....
]
Gustavo Fernandes edited comment on ISPN-4650 at 8/20/14 1:27 PM:
------------------------------------------------------------------
The auto switch ended up being trickier than it appears. IndexMappers are sent over the
wire to all nodes, each node has its own query interceptor, indexing operations can be
async depending on the stack used, and the mass indexer can be started from any node (the
same applies to regular writes). That leaves plenty of opportunities for race conditions
to produce duplicate documents when the MassIndexer runs at the same time as normal
operations.
I'd like to explore a 3rd alternative besides using auto switch and queues, which is
relying on UpdateExtWorkDelegate. Some local tests showed this kind of delegate to be
very close performance-wise to the AddWorkDelegate. I'm aware of the implications of
its usage (it is recommended if keys are unique in an index), and I think Infinispan
ticks the boxes, doesn't it?
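To make the duplicate-document concern concrete, here is a minimal sketch in plain Java. It is a toy model, not the Hibernate Search or Lucene API: the class DuplicateSketch, its methods, and the list-backed "index" are all hypothetical names introduced for illustration. It assumes a regular write and the mass indexer both end up indexing the same key, and compares a plain add with a delete-then-add update.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names exist in Hibernate Search or Lucene. */
final class DuplicateSketch {

    /** Plain add: appends a {key, value} document without checking for an existing one. */
    static void add(List<String[]> index, String key, String value) {
        index.add(new String[] {key, value});
    }

    /** Update: removes every document with the same key, then adds the new one. */
    static void update(List<String[]> index, String key, String value) {
        index.removeIf(doc -> doc[0].equals(key));
        add(index, key, value);
    }

    /** Counts the documents in the index that carry the given key. */
    static long count(List<String[]> index, String key) {
        return index.stream().filter(doc -> doc[0].equals(key)).count();
    }

    /**
     * Simulates the overlap: a regular write indexes "k1", then the mass
     * indexer reaches the same entry and indexes it again.
     * Returns how many documents exist for "k1" afterwards.
     */
    static long indexTwice(boolean useUpdate) {
        List<String[]> index = new ArrayList<>();
        for (String writer : new String[] {"regular write", "mass indexer"}) {
            if (useUpdate) {
                update(index, "k1", "value from " + writer);
            } else {
                add(index, "k1", "value from " + writer);
            }
        }
        return count(index, "k1");
    }

    public static void main(String[] args) {
        System.out.println("add-only:        " + indexTwice(false) + " document(s) for k1");
        System.out.println("delete-then-add: " + indexTwice(true) + " document(s) for k1");
    }
}
```

In this model the add-only path leaves two documents for the same key, while the update path leaves one. The sketch runs sequentially; it only shows the end state that an unlucky interleaving could produce, not the interleaving itself.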
MassIndexer should not use UpdateDocument when adding to Lucene
---------------------------------------------------------------
Key: ISPN-4650
URL: https://issues.jboss.org/browse/ISPN-4650
Project: Infinispan
Issue Type: Enhancement
Security Level: Public(Everyone can see)
Components: Embedded Querying
Affects Versions: 7.0.0.Beta1
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
Fix For: 7.0.0.Beta2
The MassIndexer currently issues an Update operation to the Hibernate Search backend,
which in turn becomes a delete plus an add in the index.
Lucene buffers those delete queries and tries to 'apply' them during merge, wasting a
massive amount of time on unnecessary seeks and queries.
Since the mass indexer wipes the index at the beginning, it should simply issue an add
operation. Performance-wise this makes a huge difference:
* indexing 50k documents: indexing time drops from 195s to 33s
* indexing 200k documents: indexing time drops from 600s to 55s
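The reasoning above can be sketched with a toy index in plain Java. This is not Lucene or Hibernate Search code: ToyIndex and all of its methods are hypothetical, and deletes are applied eagerly here rather than buffered until merge. The sketch only shows that, after a wipe, add-only and update produce the same documents, while update issues one delete request per document that can never match anything.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy append-only index; NOT the Lucene or Hibernate Search API. */
final class ToyIndex {
    private final List<String[]> docs = new ArrayList<>(); // each entry: {key, value}
    private int deleteRequests = 0; // delete-by-key requests issued so far

    /** Wipes the index, as the mass indexer does before it starts. */
    void purgeAll() {
        docs.clear();
    }

    /** Plain add: appends without looking at existing documents. */
    void add(String key, String value) {
        docs.add(new String[] {key, value});
    }

    /** Update: a delete-by-key (applied eagerly in this toy) followed by an add. */
    void update(String key, String value) {
        deleteRequests++;
        docs.removeIf(doc -> doc[0].equals(key));
        add(key, value);
    }

    int size() {
        return docs.size();
    }

    int deleteRequests() {
        return deleteRequests;
    }

    /** Reindexes n entries with unique keys into a freshly purged index. */
    static ToyIndex reindex(int n, boolean useUpdate) {
        ToyIndex index = new ToyIndex();
        index.purgeAll();
        for (int i = 0; i < n; i++) {
            String key = "key-" + i;
            if (useUpdate) {
                index.update(key, "value-" + i);
            } else {
                index.add(key, "value-" + i);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        ToyIndex viaUpdate = reindex(1000, true);
        ToyIndex viaAdd = reindex(1000, false);
        System.out.println("update: docs=" + viaUpdate.size()
                + " deleteRequests=" + viaUpdate.deleteRequests());
        System.out.println("add:    docs=" + viaAdd.size()
                + " deleteRequests=" + viaAdd.deleteRequests());
    }
}
```

Both runs end with the same number of documents; the only difference is the count of delete requests, which is the work the add-only path avoids.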
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)