[hibernate-dev] Hibernate Search and massive indexing

Thu May 7 11:50:55 EDT 2009

I'd like to stress that the second mode (adaptative) is not constrained
by the BackendQueueProcessorFactory interface, as
it is not created by the usual BatchedQueueingProcessor but by a new
kind of Worker.
So we can apply some more pipelining optimizations there.

About 5): the API itself is not going to be thread-safe (it is not meant
to be called from several threads at once), but it will use multiple
threads internally.
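
To illustrate what "not thread-safe, but multi-threaded" could look like,
here is a minimal sketch. The class name MassIndexerSketch, its
constructor arguments and indexedCount() are hypothetical and only serve
the example; only start and waitTillDone() come from the summary quoted
below. The "indexing" is a counter standing in for loading an entity and
pushing its index work to the backend.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch, not the Hibernate Search API. */
class MassIndexerSketch {
    private final int threads;
    private final long totalEntities;
    private final AtomicLong indexed = new AtomicLong();
    // Only the single controlling thread touches this field,
    // so the API itself needs no synchronization.
    private ExecutorService pool;

    MassIndexerSketch(int threads, long totalEntities) {
        this.threads = threads;
        this.totalEntities = totalEntities;
    }

    /** Not thread-safe: call from one controlling thread only. */
    void start() {
        if (pool != null) {
            throw new IllegalStateException("already started");
        }
        pool = Executors.newFixedThreadPool(threads);
        // Split the id range into one contiguous chunk per worker thread.
        long chunk = (totalEntities + threads - 1) / threads;
        for (int i = 0; i < threads; i++) {
            final long from = i * chunk;
            final long to = Math.min(totalEntities, from + chunk);
            pool.execute(() -> {
                for (long id = from; id < to; id++) {
                    // Stand-in for: load entity `id`, build its index
                    // work, push it to the backend.
                    indexed.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; submitted ones still run
    }

    /** Blocks the controlling thread until all workers have finished. */
    void waitTillDone() {
        if (pool == null) {
            throw new IllegalStateException("not started");
        }
        try {
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    long indexedCount() {
        return indexed.get();
    }
}
```

The caller drives it from a single thread: new MassIndexerSketch(4, 1000),
then start(), then waitTillDone(). All concurrency stays inside the
worker pool.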

It's not clear to me at the moment when the new "Commit" LuceneWork
should be used.

Sanne

2009/5/7 Emmanuel Bernard <emmanuel at hibernate.org>:
> I have synced with Sanne on his work on massive reindexing and here is the
> outcome of the discussion.
>
> 1. An exclusive batch mode is a mode where a node has exclusive access to
> the index and can optimize writes (not flushing, not committing at
> specific times etc).
>
> 2. The node able to activate the exclusive batch mode has to be the master
> in a cluster (ie not one of the slaves).
>
> 3. The master will have two modes, a transactional mode (as today, ie commit
> at tx boundaries - potentially async) and an exclusive batch mode called
> the adaptative mode.
> In this mode the BackendQueueProcessor can take some freedom in when and how
> it flushes changes to the Lucene index and when and how it commits.
> One approach would be to be transactional (ie one queue of changes = one
> commit) for low thresholds and batch exclusive for higher thresholds (apply
> several queues of changes before flushing or committing).
> This back end would somehow communicate with the master copy process to only
> copy changes at the right time. (I think it should work well already but
> needs to be verified).
> A slave / client could force a commit by sending a Commit LuceneWork if
> needed.
>
> 4. There should be a way to switch at runtime from the tx mode to the
> adaptative mode. When switching, the tx queue is forked, new elements are
> queued, old elements are processed.
> When the old queue is emptied, the adaptative mode kicks in.
>
> 5. On top of that, the massive indexer API reads data from the database as
> fast as possible and pushes index works to the adaptative engine. This API
> will be single-server but multi-threaded for now. Sanne can describe that in
> more detail.
> This API would have a start and a waitTillDone() method that start the
> adaptative engine and stop it.
>
> That's it for the first step.
>
> Second steps (no particular order)
> 6. Make the massive indexer API work in a cluster.
> Slaves would read the DB and push index works to the queue.
>
> 7. Find a way to apply analysis before the actual IndexWriter usage.
> That would allow increasing indexing parallelism by allowing some
> pipelining.
> Or even better, analyze on the slaves and free CPU time for the master
> (would work nicely with 6).
>
> Sanne, please add anything I have missed or misinterpreted.
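
To make point 3 of the quoted summary more concrete, here is a minimal
sketch of one possible commit policy for the adaptative mode. It is only
an illustration under assumptions: the class AdaptativeCommitSketch, the
two thresholds and the pendingQueues argument are hypothetical, and the
index write is replaced by simple bookkeeping. With a small backlog every
queue of changes gets its own commit (transactional behaviour); with a
large backlog several queues are applied before one commit; forceCommit()
plays the role of an explicit Commit LuceneWork.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the Hibernate Search backend. */
class AdaptativeCommitSketch {
    private final int backlogThreshold;   // below this backlog: one queue = one commit
    private final int maxQueuesPerCommit; // upper bound on queues batched per commit
    private int uncommittedQueues = 0;

    /** For each commit performed, the number of queues it covered. */
    final List<Integer> commits = new ArrayList<>();

    AdaptativeCommitSketch(int backlogThreshold, int maxQueuesPerCommit) {
        this.backlogThreshold = backlogThreshold;
        this.maxQueuesPerCommit = maxQueuesPerCommit;
    }

    /**
     * Applies one queue of changes.
     * @param pendingQueues how many further queues are still waiting
     */
    void apply(List<String> queueOfChanges, int pendingQueues) {
        // Stand-in for writing the changes through the IndexWriter.
        uncommittedQueues++;
        boolean lowLoad = pendingQueues < backlogThreshold;
        if (lowLoad || uncommittedQueues >= maxQueuesPerCommit) {
            commit();
        }
    }

    /** Explicit commit request, e.g. triggered by a slave or client. */
    void forceCommit() {
        if (uncommittedQueues > 0) {
            commit();
        }
    }

    private void commit() {
        commits.add(uncommittedQueues);
        uncommittedQueues = 0;
    }
}
```

The thresholds here are arbitrary knobs; a real backend would also have
to coordinate with the master copy process, which this sketch leaves out.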