[hibernate-dev] Hibernate Search and massive indexing
Emmanuel Bernard
emmanuel at hibernate.org
Thu May 7 12:00:30 EDT 2009
On May 7, 2009, at 17:50, Sanne Grinovero wrote:
> I'd like to stress that the second mode (adaptative) is not constrained
> by the BackendQueueProcessorFactory interface, as
> it is not created by the usual BatchedQueueingProcessor but by a new
> kind of Worker.
> So we can apply some more pipelining optimizations.
Well, I'm not sure. Check the MDB listening to changes in the asymmetric
cluster mode, method getWorker():
http://anonsvn.jboss.org/repos/hibernate/search/trunk/src/main/java/org/hibernate/search/backend/impl/jms/AbstractJMSHibernateSearchController.java
There we reach the getBackendQueueProcessorFactory and skip the Worker.
>
>
> About 5): the API itself is not going to be thread-safe, but it will
> use multiple threads.
>
> It's not clear to me at the moment when the new "Commit" LuceneWork
> should be used.
Correct, we can delay that and see who needs that.
>
>
> Sanne
>
> 2009/5/7 Emmanuel Bernard <emmanuel at hibernate.org>:
>> I have synced with Sanne on his work on massive reindexing and here is the
>> outcome of the discussion.
>>
>> 1. An exclusive batch mode is a mode where a node has exclusive access to
>> the index and can optimize writes (not flushing, not committing at
>> specific times, etc.).
>>
>> 2. The node able to activate the exclusive batch mode has to be the master
>> in a cluster (i.e. not one of the slaves).
>>
>> 3. The master will have two modes: a transactional mode (as today, i.e.
>> commit at tx boundaries, potentially asynchronously) and an exclusive batch
>> mode called the adaptative mode.
>> In this mode the BackendQueueProcessor can take some freedom in when and how
>> it flushes changes to the Lucene index and when and how it commits.
>> One approach would be to be transactional (i.e. one queue of changes = one
>> commit) for low thresholds and batch exclusive for higher thresholds (apply
>> several queues of changes before flushing or committing).
>> This back end would somehow communicate with the master copy process to only
>> copy changes at the right time. (I think it should work well already, but
>> this needs to be verified.)
>> A slave / client could force a commit by sending a Commit LuceneWork if
>> needed.
>>
>> 4. There should be a way to switch at runtime from the tx mode to the
>> adaptative mode. When switching, the tx queue is forked: new elements are
>> queued while old elements are processed.
>> When the old queue is emptied, the adaptative mode kicks in.
>>
>> 5. On top of that, the massive indexer API reads data from the database as
>> fast as possible and pushes index works to the adaptative engine. This API
>> will be single-server but multi-threaded for now. Sanne can describe that in
>> more detail.
>> This API would have a start and a waitTillDone() method that start the
>> adaptative engine and stop it.
>>
>> That's it for the first step.
>>
>> Second steps (in no particular order):
>> 6. Make the massive indexer API work in a cluster.
>> Slaves would read the DB and push index works to the queue.
>>
>> 7. Find a way to apply analysis before the actual IndexWriter usage.
>> That would increase indexing parallelism by allowing some pipelining.
>> Or, even better, analyze on the slaves and free CPU time for the master
>> (this would work nicely with 6).
>>
>> Sanne, please add anything I have missed or misinterpreted.
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>
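
To make point 3 of the quoted summary more concrete, here is a minimal sketch of what an "adaptative" commit policy could look like. It is not Hibernate Search code: CountingIndex, AdaptiveBackend, batchThreshold and maxUncommitted are all hypothetical names, the index is a stand-in that only counts commits, and measuring load as "number of queues still waiting" is an assumption, since the thread does not say what the threshold is measured on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the index: records works and counts commits, no Lucene involved. */
class CountingIndex {
    final List<String> applied = new ArrayList<>();
    int commits = 0;
    int uncommitted = 0;

    void apply(List<String> queue) {
        applied.addAll(queue);
        uncommitted += queue.size();
    }

    void commit() {
        if (uncommitted > 0) {   // nothing to do when there are no pending changes
            commits++;
            uncommitted = 0;
        }
    }
}

/** Sketch of the adaptive policy (assumed design, not an existing API). */
class AdaptiveBackend {
    private final CountingIndex index;
    private final int batchThreshold;   // assumed: waiting-queue count at which per-queue commits stop
    private final int maxUncommitted;   // assumed: upper bound on works applied without a commit

    AdaptiveBackend(CountingIndex index, int batchThreshold, int maxUncommitted) {
        this.index = index;
        this.batchThreshold = batchThreshold;
        this.maxUncommitted = maxUncommitted;
    }

    /** Applies one queue of changes; pendingQueues is how many further queues are waiting. */
    void process(List<String> queue, int pendingQueues) {
        index.apply(queue);
        boolean lowLoad = pendingQueues < batchThreshold;
        // Low load: behave transactionally (one queue = one commit).
        // High load: defer the commit until enough works have accumulated.
        if (lowLoad || index.uncommitted >= maxUncommitted) {
            index.commit();
        }
    }

    /** Rough equivalent of a slave / client sending a Commit work. */
    void forceCommit() {
        index.commit();
    }
}

class AdaptiveCommitSketch {
    public static void main(String[] args) {
        CountingIndex index = new CountingIndex();
        AdaptiveBackend backend = new AdaptiveBackend(index, 2, 100);
        backend.process(List.of("add #1"), 0);   // low load: committed at once
        backend.process(List.of("add #2"), 5);   // high load: commit deferred
        backend.process(List.of("add #3"), 4);   // still deferred
        backend.forceCommit();                   // explicit commit request
        System.out.println(index.commits + " commits for " + index.applied.size() + " works");
    }
}
```

The design choice illustrated is only that the decision to commit moves from the caller (one commit per queue) into the back end, with a forced commit left available as an escape hatch. How this would interact with the master copy process is left out.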