Re: [hibernate-dev] Batch indexing API

Sunday, 5 July 2009

...
> cacheMode //when would you need something different than Ignore? 

> Also, I'd
> rather have CacheMode be a Search class to keep the independence wrt
> Hibernate Core
 Depending on the model, using the cache might be much faster when the
 indexed entity has a @ManyToOne + @IndexedEmbedded relation to some
 entity that has a high probability of having been indexed already.
 Take book -> nation of publishing: you might have millions of books but
 just a few hundred nations. If these nations have to be lazily reloaded
 over and over with a second query, a cache helps.
 I'll wrap it in a Search-specific enum, like I've seen in Annotations?
Did you try? It seems that the first level cache would load the nation
object once per iteration. Given that cacheMode is unfortunately a
global setting for all entities, I'm wondering what's more efficient
in the end.
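
To make the trade-off above concrete, here is a minimal sketch that only counts simulated loads; it is not Hibernate code. The class `NationCacheSketch`, the method `countNationLoads` and the plain `HashMap` standing in for a cache are all hypothetical, and the model deliberately ignores the per-iteration clearing of the first level cache raised above, so it shows the best case for caching rather than a measurement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation, not Hibernate code: counts how often a "nation"
// has to be loaded while iterating over many books.
class NationCacheSketch {

    /** Returns the number of simulated nation loads (secondary queries). */
    static long countNationLoads(int books, int nations, boolean useCache) {
        Map<Integer, String> cache = new HashMap<>();
        long loads = 0;
        for (int book = 0; book < books; book++) {
            int nationId = book % nations;      // each book points to one nation
            if (useCache && cache.containsKey(nationId)) {
                continue;                       // cache hit: no query needed
            }
            loads++;                            // cache miss: simulate a lazy load
            if (useCache) {
                cache.put(nationId, "nation-" + nationId);
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        // Assumed sizes: one million books, two hundred nations.
        System.out.println("without cache: " + countNationLoads(1_000_000, 200, false));
        System.out.println("with cache:    " + countNationLoads(1_000_000, 200, true));
    }
}
```

With these assumed sizes the uncached run performs one load per book, while the cached run performs one load per nation. Whether a real cache gets close to that depends on how often the session is cleared, which is exactly the open question here.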

...

> optimizeAtEnd => optimizeOnFinish
> optimizeAfterPurge
> purgeAllAtStart => purgeBeforeIndexing ?, purgeAllOnStart,  
> purgeOnStart
 I vote for "purgeAllOnStart"; I like "purgeAll" to be consistent.

That's reasonable. My idea was to remove "All" so that the
implementation could evolve down the road, should Lucene provide a more
efficient way to purge and create a new object, but that's a far-off
bet.

...

> limitObjects => indexObjectsUpTo, indexFirstObjectsUpTo,
> limitIndexedObjectsTo  //what's the use case?
 Mostly testing and development. I didn't have this in the first design,
 but having to test it often, it turned out to be quite useful to try
 the effect of some new Analyzer without having to reindex millions of
 records. Also, during changes to the entities you might want to see the
 effect of adding some new field / search option without having to wait
 for hours.
 I could have deleted data from the dev database, but I consider this
 option a bit more flexible.
 Actually, I can foresee feature requests to be able to restrict the
 data, but we can think about that later. By the same reasoning we could
 leave this out for the moment, but it has been very useful to me.
OK, let's mark this one as experimental; you seem to want more of the
API.

...

> start => should the Future actually return some stats? We can delay
> that but
> I don't like the JavaDoc claiming that we will always return null
 I took that from the recommendations in the Future javadoc itself, but
 I agree with you that it doesn't feel very good.
 I could return (as you suggest) a reference to the
 IndexerProgressMonitor in use
 (see
http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/...
 )
 The API of the IndexerProgressMonitor will be a topic to discuss later
 (it is HSEARCH-370); for now there is one default impl which logs
 progress and some performance stats. That's why it is missing methods
 to retrieve the stats.
Let's think about that and return null for now; I just want to relax
the strong statement in the javadoc.
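
For readers following along, a minimal sketch of the behaviour under discussion, using only `java.util.concurrent`. The method names `start` and `startAndWait` come from this thread; the class `IndexerFutureSketch`, its counter and the simulated work are hypothetical and are not the Hibernate Search implementation. It relies on the documented behaviour of `ExecutorService.submit(Runnable)`, whose `Future` yields null from `get()` on successful completion.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the indexer discussed in this thread.
class IndexerFutureSketch {
    final AtomicInteger indexed = new AtomicInteger();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Starts the simulated indexing work and returns immediately. */
    Future<?> start(int entities) {
        // submit(Runnable) gives a Future whose get() returns null when done.
        return executor.submit(() -> {
            for (int i = 0; i < entities; i++) {
                indexed.incrementAndGet();      // pretend to index one entity
            }
        });
    }

    /** Same work, but blocks the caller until it has finished. */
    void startAndWait(int entities) throws Exception {
        start(entities).get();
    }

    /** Releases the worker thread so the JVM can exit. */
    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        IndexerFutureSketch indexer = new IndexerFutureSketch();
        Object result = indexer.start(1000).get();
        System.out.println("result: " + result + ", indexed: " + indexer.indexed.get());
        indexer.shutdown();
    }
}
```

Declaring the return type as `Future<?>` leaves room to hand back something more useful than null later, such as a progress monitor, without callers having relied on a specific result type.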

...

> startAndWait => execute ?
 I preferred to stress the small difference from "start"; don't you
 think that having both a "start" method and an "execute" method
 makes it unclear which one I should call?

I know, that's why I put a ? :)
