On Thu, 20 Nov 2008 21:14:16 +0100, Sanne Grinovero
<sanne.grinovero(a)gmail.com> wrote:
Because of HSEARCH-268 (optimize indexes in parallel), but also for
other purposes, I need to define a new ThreadPool in Hibernate
Search's Lucene backend.
The final effect will actually be that all changes to indexes are
going to be performed in parallel (on different indexes).
So really, we should rename the Jira task to "apply index modifications
in parallel", unless we are first tackling the optimize problem only.
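To make the "one task per index, all running in parallel" idea concrete, here is a minimal sketch using plain java.util.concurrent. The class name ParallelIndexWork, the method applyInParallel and the map of index name to Runnable are all hypothetical names for illustration, not existing Hibernate Search code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hibernate Search.
public class ParallelIndexWork {

    /**
     * Runs one task per index on a fixed-size pool and returns only
     * after every task has finished. The first failure is rethrown
     * (wrapped in an ExecutionException).
     */
    public static void applyInParallel(Map<String, Runnable> workPerIndex, int poolSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Object>> tasks = new ArrayList<Callable<Object>>();
            for (Runnable work : workPerIndex.values()) {
                tasks.add(Executors.callable(work));
            }
            // invokeAll blocks until all tasks are complete.
            List<Future<Object>> results = pool.invokeAll(tasks);
            for (Future<Object> result : results) {
                result.get(); // surfaces any exception thrown by the task
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Each index gets exactly one task here, so two changes to the same index never run concurrently; only different indexes are processed in parallel.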
about the size
==============
I've considered some options:
1) "steal" the configuration setting from BatchedQueueingProcessor,
making that implementation single-threaded,
and reuse the parameter internally in the Lucene backend only (JMS
doesn't need it AFAIK).
I'm afraid this could break the configuration parsing of custom-made
backends.
2) add a new parameter to the environment
Will this change affect the way the work execution is configured? For
example, if you set hibernate.search.worker.execution=async, does this
mean that the work queue itself first gets processed by
hibernate.search.worker.thread_pool.size threads, and that another
thread (from another pool? the same pool?) per directory provider then
applies the actual changes?
If so, I believe we need two settings, since the default sizes of these
two thread pools will be different, right?
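To illustrate what I mean by two settings, here is a rough sketch of reading two independent pool sizes, each with its own default. Only hibernate.search.worker.thread_pool.size is an existing property; the second property name, the class PoolSizeSettings and the default values used in the usage note are assumptions made up for this example.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hibernate Search.
public class PoolSizeSettings {

    // Existing setting, as discussed in this thread.
    public static final String WORKER_POOL_SIZE =
            "hibernate.search.worker.thread_pool.size";

    // ASSUMPTION: invented name for the new Lucene backend pool setting.
    public static final String BACKEND_POOL_SIZE =
            "hibernate.search.backend.thread_pool.size";

    /** Returns the configured size, or defaultValue when the key is absent or blank. */
    public static int readSize(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().length() == 0) {
            return defaultValue;
        }
        int size = Integer.parseInt(raw.trim());
        if (size < 1) {
            throw new IllegalArgumentException(key + " must be >= 1, but was " + size);
        }
        return size;
    }
}
```

A caller would then do something like readSize(props, WORKER_POOL_SIZE, 1) for the queue processing and readSize(props, BACKEND_POOL_SIZE, 2) for the backend, so each pool keeps its own default when the user configures only one of them.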
about transactions
==================
As you know, Search does not use a two-phase commit between the DB and
the index, but Emmanuel has a very cool vision about that: we could add
that later.
The problem is: what to do if a Lucene index update fails (e.g. index
A is corrupted)?
Should we cancel the tasks that are going to make changes to the other
indexes, B and C?
That would be possible, but I don't think you would like that: after
all, the database changes are already committed, so I should actually
make a "best effort" to update all indexes which are still working
correctly.
I think a best-effort approach is the best for now. I assume we still
throw an Exception if the index operation fails, right? In this case the
user will have to reindex one way or the other anyway.
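Roughly what I have in mind for "best effort plus exception" is sketched below: every index task is allowed to finish, failures are collected per index, and a single exception is thrown at the end. This is only a sketch under my own assumptions; BestEffortIndexUpdate, IndexUpdateException and applyAll are hypothetical names, not existing Hibernate Search classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hibernate Search.
public class BestEffortIndexUpdate {

    /** Reports which indexes failed, after all others were still updated. */
    public static class IndexUpdateException extends RuntimeException {
        private final Map<String, Throwable> failures;

        public IndexUpdateException(Map<String, Throwable> failures) {
            super("Index update failed for: " + failures.keySet());
            this.failures = failures;
        }

        public Map<String, Throwable> getFailures() {
            return failures;
        }
    }

    public static void applyAll(Map<String, Runnable> workPerIndex, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Map<String, Future<?>> pending = new LinkedHashMap<String, Future<?>>();
        Map<String, Throwable> failures = new LinkedHashMap<String, Throwable>();
        try {
            // Submit everything first; a failure in A never cancels B or C.
            for (Map.Entry<String, Runnable> e : workPerIndex.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }
            for (Map.Entry<String, Future<?>> e : pending.entrySet()) {
                try {
                    e.getValue().get();
                } catch (ExecutionException ex) {
                    failures.put(e.getKey(), ex.getCause());
                } catch (InterruptedException ex) {
                    // Record the interrupted wait and keep the interrupt flag set.
                    Thread.currentThread().interrupt();
                    failures.put(e.getKey(), ex);
                }
            }
        } finally {
            pool.shutdown();
        }
        if (!failures.isEmpty()) {
            throw new IndexUpdateException(failures);
        }
    }
}
```

The caller still gets an exception, so the user knows a reindex is needed, but the exception names only the indexes that actually failed.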
Will we be able to get this change in for the GA release?
--Hardy