[hibernate-dev] Batch indexing API

Sun Jul 5 05:12:03 EDT 2009

>> cacheMode //when would you need something different than Ignore?
>> Also, I'd rather have CacheMode be a Search class to keep the
>> independence wrt Hibernate Core
> Depending on the model, using the cache might be much faster when the
> indexed entity has a @ManyToOne + @IndexedEmbedded relation to some
> entity with a high probability of having been indexed already.
> Like book->nation of publishing: you might have millions of books but
> just some hundreds of nations; if these nations need to be reloaded
> over and over lazily with a second query, a cache helps.
> I'll wrap it in a Search-specific enum like I've seen in Annotations?

Did you try? It seems that the first-level cache would load the nation
object once per iteration. Given that cacheMode is unfortunately a
global setting for all entities, I'm wondering what's more efficient
in the end.
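
To make the trade-off concrete, here is a rough simulation of the book->nation case. It only counts simulated loads and uses no Hibernate classes. The class name, the batch-clearing behaviour, and the two maps standing in for the first-level and second-level caches are all assumptions made for illustration, so it says nothing about real lookup costs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation: counts how often a "nation" has to be loaded
// while indexing books in batches. No Hibernate classes are involved.
class NationLoadSimulation {

    /**
     * @param books       number of books to index
     * @param nations     number of distinct nations (book i maps to nation i % nations)
     * @param batchSize   books per iteration; the session-level map is cleared after each
     * @param sharedCache whether a cache that survives across iterations is consulted
     * @return number of simulated nation loads from the database
     */
    static long countNationLoads(int books, int nations, int batchSize, boolean sharedCache) {
        Map<Integer, String> firstLevel = new HashMap<>(); // stands in for the session
        Map<Integer, String> shared = new HashMap<>();     // stands in for a second-level cache
        long loads = 0;
        for (int book = 0; book < books; book++) {
            int nationId = book % nations;
            if (!firstLevel.containsKey(nationId)) {
                String nation = sharedCache ? shared.get(nationId) : null;
                if (nation == null) {
                    nation = "nation-" + nationId; // simulated database hit
                    loads++;
                    if (sharedCache) {
                        shared.put(nationId, nation);
                    }
                }
                firstLevel.put(nationId, nation);
            }
            if ((book + 1) % batchSize == 0) {
                firstLevel.clear(); // end of iteration: session cleared
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        System.out.println("session only: " + countNationLoads(1_000_000, 200, 1_000, false));
        System.out.println("shared cache: " + countNationLoads(1_000_000, 200, 1_000, true));
    }
}
```

With a million books, 200 nations and iterations of 1000 books, the session-only variant loads each nation once per iteration (200,000 loads), while the shared-cache variant loads each nation once overall (200 loads). Whether that saving outweighs caching every other entity as well is exactly the open question, since cacheMode applies globally.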

>
>> optimizeAtEnd => optimizeOnFinish
>> optimizeAfterPurge
>> purgeAllAtStart => purgeBeforeIndexing ?, purgeAllOnStart, purgeOnStart
> I vote for "purgeAllOnStart", I like "purgeAll" to be consistent.

That's reasonable. My idea was to drop "All" so that the implementation
could evolve down the road, should Lucene provide a more efficient way
to purge and create a new object, but that's a far-off bet.

>
>> limitObjects => indexObjectsUpTo, indexFirstObjectsUpTo,
>> limitIndexedObjectsTo  //what's the use case?
> Mostly testing and development. I didn't have this in the first design,
> but having to test often, it turned out to be quite useful to try the
> effect of a new Analyzer without having to reindex millions of records.
> Also, during changes to the entities you might want to see the effect
> of adding a new field / search option without having to wait for hours.
> I could have deleted data from the dev database, but I consider this
> option a bit more flexible.
> Actually I can foresee some feature requests to be able to restrict the
> data, but we can think about that later. By the same reasoning we could
> leave this out for the moment, but it has been very useful for me.

OK, let's mark this one as experimental; you seem to want more from
this API.
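
For reference, a minimal sketch of how the options discussed so far could look as a fluent API. This is not the actual Hibernate Search interface: the class name, the default values and the `selectForIndexing` helper are hypothetical, and `limitIndexedObjectsTo` is just one of the candidate names from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not the Hibernate Search API.
class IndexerOptionsSketch {
    private boolean purgeAllOnStart = true;      // assumed default
    private boolean optimizeOnFinish = true;     // assumed default
    private long objectLimit = Long.MAX_VALUE;   // no limit unless requested

    IndexerOptionsSketch purgeAllOnStart(boolean purge) {
        this.purgeAllOnStart = purge;
        return this;
    }

    IndexerOptionsSketch optimizeOnFinish(boolean optimize) {
        this.optimizeOnFinish = optimize;
        return this;
    }

    /** Experimental: index at most {@code maximum} objects. */
    IndexerOptionsSketch limitIndexedObjectsTo(long maximum) {
        if (maximum < 0) {
            throw new IllegalArgumentException("maximum must not be negative");
        }
        this.objectLimit = maximum;
        return this;
    }

    boolean isPurgeAllOnStart() {
        return purgeAllOnStart;
    }

    boolean isOptimizeOnFinish() {
        return optimizeOnFinish;
    }

    /** Returns the ids that would be indexed, honouring the limit. */
    List<Long> selectForIndexing(List<Long> allIds) {
        List<Long> selected = new ArrayList<>();
        for (Long id : allIds) {
            if (selected.size() >= objectLimit) {
                break;
            }
            selected.add(id);
        }
        return selected;
    }
}
```

A development run trying out a new Analyzer could then call something like `new IndexerOptionsSketch().limitIndexedObjectsTo(1000)` instead of deleting rows from the dev database.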

>
>> start => Future: should get() actually return some stats? We can
>> delay that, but I don't like the JavaDoc claiming that we will always
>> return null
> I took that from the recommendations in the Future javadoc itself, but
> I agree with you it doesn't feel very good.
> I could return (like you suggest) a reference to the
> IndexerProgressMonitor in use (see
> http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/hibernate/search/batchindexing/IndexerProgressMonitor.java?r=16587
> ).
> The API of the IndexerProgressMonitor will be a topic to discuss later
> (it is HSEARCH-370); for now there is one default impl which logs
> progress and some performance stats. That's why it is missing methods
> to retrieve the stats.

Let's think about that and return null for now. I just want to relax
the strong statement in the javadoc.

>
>
>> startAndWait => execute ?
> I preferred to stress the little difference with "start"; don't you
> think that having both a "start" method and an "execute" method
> would make it unclear which one to call?
>

I know, that's why I put a ? :)
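
To illustrate the start / startAndWait distinction and the "returns null for now" behaviour, here is a minimal sketch built on plain java.util.concurrent. It is not the Hibernate Search implementation: the class name, the counter and the single-thread executor are assumptions. The only fact relied on is that a Future obtained from `ExecutorService.submit(Runnable)` yields null from `get()` on successful completion.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only: not the Hibernate Search implementation.
class BatchIndexerSketch {
    private final AtomicLong indexed = new AtomicLong();
    private final long total;

    BatchIndexerSketch(long total) {
        this.total = total;
    }

    /** Starts indexing in the background and returns immediately. */
    Future<?> start() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Runnable task = this::indexAll;
        Future<?> result = executor.submit(task); // get() yields null on success
        executor.shutdown(); // already-submitted work still runs to completion
        return result;
    }

    /** Starts indexing and blocks the caller until it has finished. */
    void startAndWait() throws InterruptedException {
        try {
            start().get();
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing failed", e.getCause());
        }
    }

    long indexedCount() {
        return indexed.get();
    }

    // Stands in for the real work; only counts the "indexed" objects.
    private void indexAll() {
        for (long i = 0; i < total; i++) {
            indexed.incrementAndGet();
        }
    }
}
```

Since the return type is `Future<?>`, the javadoc can stay vague about the value today and a later version could hand back stats or a progress monitor without breaking callers that ignore the result.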