[hibernate-dev] [Search] making updates to the indexes concurrently

Hardy Ferentschik hibernate at ferentschik.de
Fri Nov 21 07:28:31 EST 2008


On Thu, 20 Nov 2008 21:14:16 +0100, Sanne Grinovero  
<sanne.grinovero at gmail.com> wrote:

> because of HSEARCH-268 (optimize indexes in parallel), but also for
> other purposes, I need to define a new ThreadPool in Hibernate
> Search's Lucene backend.
> The net effect will be that all changes to indexes are
> performed in parallel (on different indexes).
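As a rough illustration only (not the actual Hibernate Search implementation), the "one task per index" idea described above could be sketched with a plain ExecutorService; the class and method names below are made up:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: apply per-index work in parallel, one task per index.
class ParallelIndexUpdater {
    private final ExecutorService pool;

    ParallelIndexUpdater(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    // Submits one Runnable per index and waits for all of them to finish.
    void applyAll(List<Runnable> perIndexWork) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(perIndexWork.size());
        for (Runnable work : perIndexWork) {
            pool.submit(() -> {
                try {
                    work.run();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With a pool smaller than the number of indexes, some updates queue up; with a pool at least as large, all indexes are modified concurrently.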

So really, we should rename the Jira task to "apply index modifications
in parallel", unless we are first tackling the optimize problem only.


> about the size
> =========
> I've considered some options:
> 1) "steal" the configuration setting from BatchedQueueingProcessor,
> making that implementation single-threaded,
> and reusing the parameter internally in the Lucene backend only (JMS
> doesn't need it AFAIK).
> I'm afraid this could break configuration parsing in custom-made backends.
>
> 2) add a new parameter to the environment

Will this change the way work execution is configured? For example,
if you set hibernate.search.worker.execution=async, does this mean that
the work queue itself first gets processed by
hibernate.search.worker.thread_pool.size threads,
and that there will then be another thread (from another pool? the same
pool?) per directory provider applying the actual changes?
If so, I believe we need two settings, since the default sizes of these
two thread pools will differ, right?
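To make the question concrete, the two settings might look like the fragment below; hibernate.search.worker.execution and hibernate.search.worker.thread_pool.size are the properties mentioned above, while the backend pool size property is purely an illustrative name, not an actual Hibernate Search setting:

```properties
# existing settings discussed above: async execution, threads processing the work queue
hibernate.search.worker.execution=async
hibernate.search.worker.thread_pool.size=2

# hypothetical second setting (name is illustrative only):
# threads applying the actual changes, one per directory provider
hibernate.search.worker.backend.thread_pool.size=5
```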

> about transactions
> ============
> As you know Search is not using a two phase commit between DB and
> Index, but Emmanuel has a very cool vision about that: we could add
> that later.
> The problem is: what do we do if a Lucene index update fails (e.g.
> index A is corrupted)?
> Should we cancel the tasks about to make changes to the other indexes,
> B and C?
> That would be possible, but I don't think you would like that: after
> all, the database changes are already committed, so I should actually
> make a "best effort" to update all indexes which are still working
> correctly.

I think a best-effort approach is best for now. I assume we still throw
an exception if an index operation fails, right? In that case the user
will have to reindex one way or another anyway.
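The "best effort" policy discussed above could be sketched as follows; this is an assumption-laden illustration (class, method, and the per-index Runnable shape are all made up), not the real backend code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "best effort" policy: every index update is
// attempted; a failure on one index (e.g. a corrupted index A) does not
// cancel the updates to the remaining indexes B and C.
class BestEffortIndexWriter {

    // Returns the names of the indexes whose update failed, so the caller
    // can still throw/log an exception for them afterwards.
    static List<String> applyBestEffort(Map<String, Runnable> workPerIndex) {
        List<String> failedIndexes = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : workPerIndex.entrySet()) {
            try {
                entry.getValue().run(); // apply the queued changes to one index
            } catch (RuntimeException e) {
                failedIndexes.add(entry.getKey());
            }
        }
        return failedIndexes;
    }
}
```

Collecting the failures instead of aborting matches the point made above: the database changes are already committed, so the healthy indexes should still be brought up to date.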

Will we be able to get this change in for the GA release?

--Hardy
