[hibernate-dev] Hibernate Search and massive indexing

Thu May 7 12:00:30 EDT 2009

On May 7, 2009, at 17:50, Sanne Grinovero wrote:

> I'd like to stress that the second mode (adaptative) is not constrained
> by the BackendQueueProcessorFactory interface, as it is not created by
> the usual BatchedQueueingProcessor but by a new kind of Worker.
> So we can apply some more pipelining optimizations.

Well, I'm not sure. Check the MDB listening to changes in the asymmetric
cluster mode:
http://anonsvn.jboss.org/repos/hibernate/search/trunk/src/main/java/org/hibernate/search/backend/impl/jms/AbstractJMSHibernateSearchController.java
(method getWorker()).

There we reach getBackendQueueProcessorFactory() and skip the Worker.

>
>
> About 5): the API is not going to be thread-safe (it should not be
> called from multiple threads), but it will use multiple threads internally.
>
> It's not clear to me at the moment when the new "Commit" LuceneWork
> should be used.

Correct, we can postpone that and see who needs it.

>
>
> Sanne
>
> 2009/5/7 Emmanuel Bernard <emmanuel at hibernate.org>:
>> I have synced with Sanne on his work on massive reindexing and here is
>> the outcome of the discussion.
>>
>> 1. An exclusive batch mode is a mode where a node has exclusive access
>> to the index and can optimize writes (not flushing, not committing at
>> specific times, etc.).
>>
>> 2. The node able to activate the exclusive batch mode has to be the
>> master in a cluster (i.e. not the slaves).
>>
>> 3. The master will have two modes: a transactional mode (as today, i.e.
>> commit at tx boundaries, potentially asynchronously) and an exclusive
>> batch mode called the adaptative mode.
>> In this mode the BackendQueueProcessor can take some freedom in when
>> and how it flushes changes to the Lucene index and when and how it
>> commits. One approach would be to be transactional (i.e. one queue of
>> changes = one commit) for low thresholds and batch exclusive for higher
>> thresholds (apply several queues of changes before flushing or
>> committing).
>> This back end would somehow communicate with the master copy process
>> so that it only copies changes at the right time (I think it should
>> work well already, but this needs to be verified).
>> A slave / client could force a commit by sending a Commit LuceneWork
>> if needed.
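A minimal Java sketch of the threshold policy in point 3. It assumes the "threshold" is the backlog of queues still waiting to be processed. AdaptiveBackend, process() and commit() are hypothetical names and are not Hibernate Search API. The list of applied works stands in for IndexWriter calls.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical adaptative commit policy: while the backlog is small, each
 * queue of changes is committed on its own (transactional behaviour); once
 * the backlog reaches the threshold, several queues are applied before one
 * commit (batch behaviour). Not Hibernate Search API.
 */
class AdaptiveBackend {
    private final int batchThreshold;                       // backlog size that triggers batching
    private final List<String> uncommitted = new ArrayList<>(); // works applied since last commit
    private int commits = 0;

    AdaptiveBackend(int batchThreshold) {
        this.batchThreshold = batchThreshold;
    }

    /** Applies one queue of changes; backlog = queues still waiting behind it. */
    void process(List<String> queue, int backlog) {
        uncommitted.addAll(queue);                          // stands in for IndexWriter add/delete calls
        if (backlog < batchThreshold) {
            commit();                                       // low load: one queue = one commit
        }
        // high load: keep writing, a later call commits everything at once
    }

    /** Commits pending changes; a Commit work sent by a slave could call this. */
    void commit() {
        if (!uncommitted.isEmpty()) {
            commits++;                                      // stands in for an actual index commit
            uncommitted.clear();
        }
    }

    int commitCount() { return commits; }

    int uncommittedCount() { return uncommitted.size(); }
}
```

With this policy, a burst of queues shares a single commit. A quiet system still commits each queue on its own.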
>>
>> 4. There should be a way to switch at runtime from the tx mode to the
>> adaptative mode. When switching, the tx queue is forked: new elements
>> are queued while old elements are processed.
>> When the old queue is emptied, the adaptative mode kicks in.
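The switch in point 4 could look roughly like the sketch below. It assumes a single in-memory queue per node. SwitchableQueue and its Mode values are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical runtime switch from tx mode to adaptative mode: on switch the
 * queue is forked, existing works keep draining, new works are parked, and
 * the parked works go to the adaptative mode once the old queue is empty.
 */
class SwitchableQueue {
    enum Mode { TRANSACTIONAL, SWITCHING, ADAPTATIVE }

    private Mode mode = Mode.TRANSACTIONAL;
    private final Deque<String> current = new ArrayDeque<>(); // works of the active mode
    private final Deque<String> parked = new ArrayDeque<>();  // works arriving mid-switch
    private final List<String> processed = new ArrayList<>();

    synchronized void enqueue(String work) {
        if (mode == Mode.SWITCHING) {
            parked.addLast(work);                   // new elements are queued
        } else {
            current.addLast(work);
        }
    }

    /** Forks the queue; old elements continue to be processed. */
    synchronized void requestSwitch() {
        if (mode == Mode.TRANSACTIONAL) {
            mode = Mode.SWITCHING;
        }
    }

    /** Processes one work and completes the switch once the old queue is empty. */
    synchronized void processNext() {
        String work = current.pollFirst();
        if (work != null) {
            processed.add(work);                    // stands in for applying the work
        }
        if (mode == Mode.SWITCHING && current.isEmpty()) {
            mode = Mode.ADAPTATIVE;                 // adaptative mode kicks in
            current.addAll(parked);
            parked.clear();
        }
    }

    synchronized Mode mode() { return mode; }

    synchronized List<String> processed() { return new ArrayList<>(processed); }
}
```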
>>
>> 5. On top of that, the massive indexer API reads data from the database
>> as fast as possible and pushes index works to the adaptative engine.
>> This API will be single-server but multithreaded for now. Sanne can
>> describe that in more detail.
>> This API would have start() and waitTillDone() methods that start the
>> adaptative engine and stop it.
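A rough sketch of the start()/waitTillDone() shape described in point 5. It makes three assumptions:

* database reads can be modelled as a list of ids;
* the adaptative engine can be modelled as a BlockingQueue;
* ids can be partitioned statically across threads, which is just one simple choice.

MassIndexerSketch and its methods are hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical massive indexer: loader threads "read" entity ids and push index
 * works to the adaptative engine, modelled as a BlockingQueue. Meant to be
 * driven by one caller; only the loaders run concurrently.
 */
class MassIndexerSketch {
    private final List<Long> ids;                    // stands in for rows read from the DB
    private final int threads;
    private final BlockingQueue<String> engine = new LinkedBlockingQueue<>();
    private ExecutorService loaders;

    MassIndexerSketch(List<Long> ids, int threads) {
        this.ids = ids;
        this.threads = threads;
    }

    void start() {
        // per point 5, start() would also start the adaptative engine here
        loaders = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int slice = t;
            loaders.submit(() -> {
                // each loader takes every threads-th id (simple static partitioning)
                for (int i = slice; i < ids.size(); i += threads) {
                    engine.add("index:" + ids.get(i)); // stands in for building a LuceneWork
                }
            });
        }
        loaders.shutdown();                          // accept no new tasks; running ones finish
    }

    void waitTillDone() {
        try {
            if (!loaders.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("loaders did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // per point 5, waitTillDone() would stop the adaptative engine here
    }

    int indexedCount() { return engine.size(); }

    boolean wasIndexed(long id) { return engine.contains("index:" + id); }
}
```

The object is driven by a single caller. Only its internal loader threads run concurrently, which matches the "not thread-safe but uses multiple threads" remark earlier in the thread.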
>>
>> That's it for the first step.
>>
>> Second step (in no particular order):
>> 6. Make the massive indexer API work in a cluster.
>> Slaves would read the DB and push index works to the queue.
>>
>> 7. Find a way to apply analysis before the actual IndexWriter usage.
>> That would increase indexing parallelism by allowing some pipelining.
>> Or, even better, analyze on the slaves and free CPU time for the
>> master (this would work nicely with 6).
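Point 7 amounts to a producer/consumer pipeline. The sketch below assumes analysis can run ahead of the write on a thread pool, with a single writer stage consuming already-analyzed tokens. AnalysisPipeline is a hypothetical name. The lowercase/whitespace split is only a stand-in for a real Lucene Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical two-stage pipeline: documents are analyzed on a thread pool,
 * and a single writer stage (standing in for the IndexWriter) only consumes
 * already-analyzed tokens, so writing overlaps with analysis.
 */
class AnalysisPipeline {

    static List<List<String>> run(List<String> documents, int analyzerThreads) {
        BlockingQueue<List<String>> analyzed = new LinkedBlockingQueue<>();
        ExecutorService analyzers = Executors.newFixedThreadPool(analyzerThreads);
        for (String doc : documents) {
            analyzers.submit(() -> {
                // stand-in for a Lucene Analyzer: lowercase + whitespace split
                analyzed.add(Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+")));
            });
        }
        analyzers.shutdown();                        // submitted tasks still run to completion

        // writer stage: one thread, consuming results as soon as they are ready
        List<List<String>> written = new ArrayList<>();
        try {
            for (int i = 0; i < documents.size(); i++) {
                List<String> tokens = analyzed.poll(1, TimeUnit.MINUTES);
                if (tokens == null) {
                    throw new IllegalStateException("analysis stage timed out");
                }
                written.add(tokens);                 // stands in for writing a pre-analyzed document
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while writing", e);
        }
        return written;
    }
}
```

Results reach the writer in completion order rather than input order. This should be acceptable when each document is indexed independently.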
>>
>> Sanne, please add anything I have missed or misinterpreted.
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>