inline answers:
2008/11/21 Hardy Ferentschik <hibernate(a)ferentschik.de>:
On Thu, 20 Nov 2008 21:14:16 +0100, Sanne Grinovero
<sanne.grinovero(a)gmail.com> wrote:
> Because of HSEARCH-268 (optimize indexes in parallel), but also for
> other purposes, I need to define a new ThreadPool in Hibernate
> Search's Lucene backend.
> The final effect will actually be that all changes to indexes are
> going to be performed in parallel (on different indexes).
So really, we should rename the Jira task to "apply index modifications
in parallel", unless we are first tackling the optimize problem only.
Yes, I should rename it; it would be more difficult to isolate the
optimization alone than to actually do all the work in parallel.
> about the size
> =========
> I've considered some options:
> 1) "steal" the configuration setting from BatchedQueueingProcessor,
> turning that implementation into a single-threaded one,
> and reusing the parameter internally in the Lucene backend only (JMS
> doesn't need it AFAIK).
> I'm afraid this could break configuration parsing in custom-made backends.
> 2) add a new parameter to the environment
Will this change the way the work execution is configured? For example,
if you set hibernate.search.worker.execution=async, does this mean that
first of all the work queue itself gets processed by
hibernate.search.worker.thread_pool.size threads, and that there will
then be another thread (from another pool? the same pool?) per directory
provider applying the actual changes?
If so, I believe we need two settings, since the default sizes of these
two thread pools will differ, right?
Yes, you got the point; this is why I'm writing here.
Actually the size of the second pool, as mentioned in the other email,
should be fixed, so we don't need to add a new parameter. This is good
for the book ;-)
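
To make the two-pool idea concrete, here is a minimal sketch of how I
picture them wired together. The class name and the work representation
are made up for illustration only; this is not the actual backend code.
The property name is the one mentioned above:

import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelBackendSketch {

    // First pool: drains the work queue; its size stays user-configurable.
    private final ExecutorService queueProcessor;

    // Second "pool": one single-threaded executor per index, so work on a
    // given index stays serialized while different indexes run in parallel.
    // Its size is fixed by the number of directory providers, which is why
    // no new configuration parameter is needed.
    private final Map<String, ExecutorService> perIndexExecutors =
            new ConcurrentHashMap<String, ExecutorService>();

    public ParallelBackendSketch(Properties props) {
        int queuePoolSize = Integer.parseInt(
                props.getProperty("hibernate.search.worker.thread_pool.size", "1"));
        queueProcessor = Executors.newFixedThreadPool(queuePoolSize);
    }

    public void processQueue(final Map<String, Runnable> workByIndex) {
        // The first pool consumes the queue...
        queueProcessor.execute(new Runnable() {
            public void run() {
                // ...and fans the work out per index: serialized within an
                // index, parallel across indexes.
                for (Map.Entry<String, Runnable> entry : workByIndex.entrySet()) {
                    executorFor(entry.getKey()).execute(entry.getValue());
                }
            }
        });
    }

    private ExecutorService executorFor(String indexName) {
        ExecutorService executor = perIndexExecutors.get(indexName);
        if (executor == null) {
            ExecutorService fresh = Executors.newSingleThreadExecutor();
            executor = perIndexExecutors.putIfAbsent(indexName, fresh);
            if (executor == null) {
                executor = fresh;
            }
        }
        return executor;
    }
}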
> about transactions
> ============
> As you know, Search is not using a two-phase commit between the DB and
> the index, but Emmanuel has a very cool vision about that: we could
> add it later.
> The problem is: what should we do if a Lucene index update fails (e.g.
> index A is corrupted)? Should we cancel the tasks that are going to
> change the other indexes, B and C?
> That would be possible, but I don't think you would like that: after
> all, the database changes are committed already, so I should actually
> make a "best effort" to update all indexes which are still working
> correctly.
I think a best-effort approach is the best option for now. I assume we
still throw an exception if the index operation fails, right? In this
case the user will have to reindex one way or the other anyway.
Well, an indexing-time failure would kill a thread in the pool and get
logged, of course, but it will not "kill" the committing transaction.
These are the same semantics as the current async behaviour, so my guess
is that it could stay that way, at least for this release.
Actually, in sync mode the behaviour would be different: should I catch
the exception and pass it back to the committing thread?
This means the exception handling currently differs between sync and
async too; probably nobody has bothered with this detail?
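
If we did want to propagate it in sync mode, the simplest shape I can
think of would be to collect a Future per index task and rethrow from
the committing thread. A rough sketch, with invented names, just to show
the mechanics:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class SyncErrorPropagationSketch {

    // In sync mode the committing thread blocks on the futures, so an index
    // failure surfaces in the transaction that triggered it. Best effort is
    // preserved: we wait for every task before rethrowing the first failure,
    // so the healthy indexes still get their updates applied.
    static void applyAndWait(ExecutorService pool, List<Runnable> indexTasks) {
        List<Future<?>> futures = new ArrayList<Future<?>>();
        for (Runnable task : indexTasks) {
            futures.add(pool.submit(task));
        }
        RuntimeException firstFailure = null;
        for (Future<?> future : futures) {
            try {
                future.get(); // blocks; rethrows the task's failure if any
            } catch (ExecutionException e) {
                if (firstFailure == null) {
                    firstFailure = new RuntimeException(
                            "Index update failed", e.getCause());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(
                        "Interrupted while waiting for index updates", e);
            }
        }
        if (firstFailure != null) {
            throw firstFailure;
        }
    }
}

In async mode we would skip the get() calls and just log from within the
task, which matches the current behaviour.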
Will we be able to get this change in for the GA release?
Yes, the proof-of-concept code is working here; I need to do some more
tests and, of course, get a "go" from this list.
> --Hardy
Thanks for helping,
Sanne