[hibernate-dev] Batch indexing API

Sanne Grinovero sanne.grinovero at gmail.com
Sat Jul 4 14:59:51 EDT 2009

thanks, answering inline:

2009/7/3 Emmanuel Bernard <emmanuel at hibernate.org>:
> Overall I think it's good.
> some proposals (when followed by a ? my feeling is that the original name is
> as good or better)
> Generally, I've inverted names to put the most important word first.
> Indexer => MassIndexer

> objectLoadingThreads => threadsToLoadObjects
> objectLoadingBatchSize => batchSizeToLoadObjects
> documentBuilderThreads => threadsForSubsequentFetching, threadsForFetching
> indexWriterThreads => threadsIndexingToLucene
agreed on all above
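
To make the "fluent" style concrete, here is a minimal sketch of how the renamed options could chain. The class name MassIndexerSketch, the default values, the validation and the describe() helper are all assumptions made for illustration; this is not the code in trunk.

```java
/** Hypothetical sketch of a fluent option builder; not the actual Hibernate Search code. */
final class MassIndexerSketch {
    // Defaults are arbitrary placeholders chosen for this sketch.
    private int threadsToLoadObjects = 1;
    private int batchSizeToLoadObjects = 10;
    private int threadsForSubsequentFetching = 1;
    private int threadsIndexingToLucene = 1;

    MassIndexerSketch threadsToLoadObjects(int threads) {
        this.threadsToLoadObjects = requirePositive(threads);
        return this; // returning this is what makes the calls chainable
    }

    MassIndexerSketch batchSizeToLoadObjects(int batchSize) {
        this.batchSizeToLoadObjects = requirePositive(batchSize);
        return this;
    }

    MassIndexerSketch threadsForSubsequentFetching(int threads) {
        this.threadsForSubsequentFetching = requirePositive(threads);
        return this;
    }

    MassIndexerSketch threadsIndexingToLucene(int threads) {
        this.threadsIndexingToLucene = requirePositive(threads);
        return this;
    }

    // Hypothetical validation: every option must be at least 1.
    private static int requirePositive(int value) {
        if (value < 1) {
            throw new IllegalArgumentException("value must be >= 1, got " + value);
        }
        return value;
    }

    /** Hypothetical helper that summarizes the configured options. */
    String describe() {
        return "load=" + threadsToLoadObjects + "x" + batchSizeToLoadObjects
                + ", fetch=" + threadsForSubsequentFetching
                + ", write=" + threadsIndexingToLucene;
    }

    public static void main(String[] args) {
        System.out.println(new MassIndexerSketch()
                .threadsToLoadObjects(4)
                .batchSizeToLoadObjects(25)
                .describe());
    }
}
```

Putting the most important word first keeps related options next to each other in IDE completion, which is part of what makes the chained form readable.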

> cacheMode //when would you need something different than Ignore? Also, I'd
> rather have CacheMode be a Search class to keep the independence wrt
> Hibernate Core
Depending on the model, using the cache might be much faster when the indexed
entity has a @ManyToOne + @IndexedEmbedded relation to some entity that has
very likely been indexed already.
Take book -> nation of publishing: you might have millions of books but just
a few hundred nations. If these nations have to be reloaded lazily over and
over with a second query, a cache helps.
Shall I wrap it in a Search-specific enum, like I've seen in Annotations?
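
The book -> nation argument can be illustrated with a small simulation. This is only a sketch: NationCacheSketch and countNationLoads are hypothetical names, and the "load" is a counter standing in for the second lazy query rather than real Hibernate code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical simulation of the book -> nation case; no Hibernate involved. */
final class NationCacheSketch {

    /**
     * Counts how often a nation would have to be loaded while indexing books.
     * nationIdPerBook[i] is the nation id referenced by book i.
     */
    static int countNationLoads(int[] nationIdPerBook, boolean useCache) {
        Set<Integer> cachedNationIds = new HashSet<>();
        int loads = 0;
        for (int nationId : nationIdPerBook) {
            // With the cache on, add() returns false for a nation seen before,
            // so the simulated query is skipped.
            if (useCache && !cachedNationIds.add(nationId)) {
                continue;
            }
            loads++; // stands in for the extra query that loads the nation
        }
        return loads;
    }

    public static void main(String[] args) {
        int[] nationIdPerBook = new int[1000];
        for (int i = 0; i < nationIdPerBook.length; i++) {
            nationIdPerBook[i] = i % 5; // 1000 books spread over 5 nations
        }
        System.out.println("without cache: " + countNationLoads(nationIdPerBook, false));
        System.out.println("with cache:    " + countNationLoads(nationIdPerBook, true));
    }
}
```

Without the cache the number of loads grows with the number of books; with it, the number is bounded by the number of distinct nations.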

> optimizeAtEnd => optimizeOnFinish
> optimizeAfterPurge
> purgeAllAtStart => purgeBeforeIndexing ?, purgeAllOnStart, purgeOnStart
I vote for "purgeAllOnStart"; I like keeping "purgeAll" for consistency.

> limitObjects => indexObjectsUpTo, indexFirstObjectsUpTo,
> limitIndexedObjectsTo  //what's the use case?
Mostly testing and development. I didn't have this in the first design, but
having to test often, I found it quite useful to try out the effect of a new
Analyzer without having to reindex millions of records. Also, while changing
the entities you might want to see the effect of adding a new field or search
option without having to wait for hours.
I could have deleted data from the dev database, but I consider this option a
bit more flexible.
Actually, I can foresee feature requests to be able to restrict the data, but
we can think about that later. By the same reasoning we could leave this out
for the moment, but it has been very useful for me.
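
As a rough illustration of the limit option, the sketch below caps a stream of entity ids before any indexing work would start. LimitSketch, idsToIndex and the "zero or negative means no limit" convention are assumptions for this example, not the behaviour of the current code.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

/** Hypothetical sketch of capping how many objects get indexed. */
final class LimitSketch {

    /**
     * Returns the ids that would be indexed.
     * Assumption for this sketch: a limit of zero or less means "no limit".
     */
    static List<Long> idsToIndex(LongStream allIds, long limit) {
        LongStream selected = limit > 0 ? allIds.limit(limit) : allIds;
        return selected.boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Only the first three of a million ids are materialized.
        System.out.println(idsToIndex(LongStream.rangeClosed(1, 1_000_000), 3));
    }
}
```

Because the stream is lazy, the cap is applied before the remaining ids are ever produced, which is the point when trying out a new Analyzer on a large dev database.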

> start => should the Future actually return some stats? We can delay that but
> I don't like the JavaDoc claiming that we will always return null
I took that from the recommendations in the Future javadoc itself, but
I agree with you that it doesn't feel very good.
I could return (like you suggest) a reference to the IndexerProgressMonitor in use
(see http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/hibernate/search/batchindexing/IndexerProgressMonitor.java?r=16587).
The API of the IndexerProgressMonitor will be a topic to discuss later
(it is HSEARCH-370); for now there is one default implementation,
which logs progress and some performance stats. That's why it is
missing methods to retrieve the stats.

> startAndWait => execute ?
I preferred to stress the small difference from "start". Don't you
think that having both a "start" method and an "execute" method
would make it unclear which one to call?
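
The difference between the two methods can be sketched as follows: start() hands back a Future immediately, while startAndWait() blocks until the work is done. Everything here is an assumption for illustration (the StartSketch class, the counting loop standing in for indexing, and a Future<Integer> carrying the number of indexed entities as a simple stand-in for stats); the real return type is still open.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of start() versus startAndWait(). */
final class StartSketch {
    private final int entityCount;

    StartSketch(int entityCount) {
        this.entityCount = entityCount;
    }

    // Stand-in for the real indexing work; returns how many entities were "indexed".
    private Integer indexAll() {
        int indexed = 0;
        for (int i = 0; i < entityCount; i++) {
            indexed++;
        }
        return indexed;
    }

    /** Non-blocking: returns at once, the result arrives through the Future. */
    Future<Integer> start() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Integer> job = this::indexAll;
        Future<Integer> result = executor.submit(job);
        executor.shutdown(); // the already submitted job still runs to completion
        return result;
    }

    /** Blocking: same work as start(), but only returns once it has finished. */
    int startAndWait() {
        try {
            return start().get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println("indexed " + new StartSketch(5).startAndWait() + " entities");
    }
}
```

Since startAndWait() is just start() plus a blocking get(), the name states the relationship directly, which "execute" would not.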


> WTY?
> On  Jun 30, 2009, at 16:18, Sanne Grinovero wrote:
>> Hello,
>> I need some comments about the batch indexing API, so that I can
>> stabilize it and write the documentation;
>> I might even blog about it :-)
>> Here is the current sketch:
>> http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/hibernate/search/Indexer.java?r=16967
>> Emmanuel I remember you wanted to give me directions to make this
>> "fluent", and you didn't like the method names.
>> ideas?
>> Sanne
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
