Re: [hibernate-dev] Batch indexing API

Tuesday, 7 July 2009

OK, go ahead, whatever that means :).

On  Jul 7, 2009, at 10:49, Sanne Grinovero wrote:

...
 inline:

 2009/7/5 Emmanuel Bernard <emmanuel(a)hibernate.org>:
>
>>> cacheMode //when would you need something different than Ignore?
>>> Also, I'd rather have CacheMode be a Search class to keep the
>>> independence wrt Hibernate Core
>>
>> Depending on the model, using the cache can be much faster when the
>> indexed entity has a @ManyToOne+@IndexedEmbedded relation to some
>> entity with a high probability of having been indexed already.
>> Like book->nation of publishing: you might have millions of books
>> but just some hundreds of nations; if these nations need to be
>> reloaded over and over lazily with a second query, a cache helps.
>> I'll wrap it in a Search-specific enum like I've seen in Annotations?
>
> Did you try? It seems that the first level cache would load the nation
> object once per iteration. Given that cacheMode is unfortunately a
> global setting for all entities, I'm wondering what's more efficient
> in the end.
>

 Well, I'm sure that it's not the best setting for most cases, but yes,
 I have tried it, and there are some situations in which it gives a
 major performance boost, especially on complex models having many
 relations of this type.
 Also, in this case the "first level cache" is very short-lived, and
 every thread has its own; being short-lived, there's not a big chance
 of a "first level cache hit". By contrast, the second level cache
 makes sure all "lookup tables" are loaded once for the whole process.
 Also, it only makes sense when using a real cache, properly
 configured, not the Hashtable one.
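
To make the argument above concrete, here is a minimal simulation. It is
not Hibernate code: the class and method names are made up, and it assumes
books reference nations in round-robin order, which is the worst case for
a short-lived per-batch cache. It only counts how many times a "nation"
row would have to be loaded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation, NOT Hibernate code: counts how often a "nation"
// lookup row has to be loaded while indexing books.
class LookupCacheSketch {

    /** Each batch starts with an empty map, like a short-lived first level cache. */
    static int loadsWithPerBatchCache(int books, int nations, int batchSize) {
        int loads = 0;
        Map<Integer, String> sessionCache = new HashMap<>();
        for (int book = 0; book < books; book++) {
            if (book % batchSize == 0) {
                sessionCache = new HashMap<>(); // new batch: the cache is empty again
            }
            int nationId = book % nations; // assumption: round-robin references
            if (!sessionCache.containsKey(nationId)) {
                sessionCache.put(nationId, "nation-" + nationId);
                loads++; // simulated second query against the database
            }
        }
        return loads;
    }

    /** One map shared by the whole run, like a process-wide second level cache. */
    static int loadsWithSharedCache(int books, int nations) {
        AtomicInteger loads = new AtomicInteger();
        Map<Integer, String> sharedCache = new ConcurrentHashMap<>();
        for (int book = 0; book < books; book++) {
            int nationId = book % nations;
            sharedCache.computeIfAbsent(nationId, id -> {
                loads.incrementAndGet(); // loaded once, then served from the cache
                return "nation-" + id;
            });
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println(loadsWithPerBatchCache(100_000, 200, 50));
        System.out.println(loadsWithSharedCache(100_000, 200));
    }
}
```

With 100,000 books, 200 nations and batches of 50, the per-batch variant
loads a nation 100,000 times and the shared variant 200 times. Real
numbers depend on the data distribution and on the cache configuration.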

>>
>>> optimizeAtEnd => optimizeOnFinish
>>> optimizeAfterPurge
>>> purgeAllAtStart => purgeBeforeIndexing ?, purgeAllOnStart,  
>>> purgeOnStart
>>
>> I vote for "purgeAllOnStart", I like "purgeAll" to be consistent.
>
> That's reasonable. My idea was to remove "All" to allow the
> implementation to evolve down the road, should Lucene provide a more
> efficient solution to purge and create a new object, but that's a
> far-off bet.
>

 ah well I suppose this is not the only API you'll have to change,
 should that happen :-)

>>
>>> limitObjects => indexObjectsUpTo, indexFirstObjectsUpTo,
>>> limitIndexedObjectsTo  //what's the use case?
>>
>> Mostly testing and developing. I didn't have this in the first design,
>> but having to test it often, it turned out to be quite useful to try
>> the effect of some new Analyzer without having to reindex millions of
>> records. Also, during changes to the entities you might want to see
>> the effect of adding some new field / search option without having to
>> wait for hours.
>> I could have deleted data from the dev database, but I consider having
>> this option a bit more flexible.
>> Actually, I can foresee some feature request to be able to restrict
>> the data, but we can think about that later. By the same reasoning we
>> could leave this out for the moment, but it has been very useful for
>> me.
>
> OK, let's mark this one as experimental, you seem to want more of  
> the API.
>
>>
>>> start => Future should actually return some stats? We can delay that,
>>> but I don't like the JavaDoc claiming that we will always return null
>>
>> I took that from the recommendations in the Future javadoc itself, but
>> I agree with you, it doesn't feel very good.
>> I could return (like you suggest) a reference to the
>> IndexerProgressMonitor in use (see
>> http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/...
>> )
>> The API of the IndexerProgressMonitor will be a topic to discuss later
>> (it is HSEARCH-370); for now there is one default impl which will log
>> progress and some performance stats; that's why it is missing methods
>> to retrieve the stats.
>
> Let's think about that and return null for now, I just want to  
> relax the
> strong statement in the javadoc
>
 I'll remove the return javadoc, leaving it undocumented for now.

>>
>>
>>> startAndWait => execute ?
>>
>> I preferred to stress the little difference with "start"; don't you
>> think that having both a "start" method and an "execute" method
>> would make it unclear which one I should call?
>>
>
> I know, that's why I put a ? :)
>

 thanks for the insight,
 Sanne 
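
For readers following the naming discussion, a rough sketch of how the
options and the two start methods could fit together. This is not the
class under review: BatchIndexerSketch, its constructor argument and the
steps() helper are invented for illustration, and the option names are
just the candidates mentioned in this thread. The non-blocking start()
returns a Future that yields null, and startAndWait() simply blocks on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, NOT the Hibernate Search class under review: it only
// illustrates the option names and the two start methods from this thread.
class BatchIndexerSketch {
    private final long totalObjects;       // stand-in for the entities in the database
    private boolean purgeAllOnStart = false;
    private boolean optimizeOnFinish = false;
    private long limit = Long.MAX_VALUE;   // no limit unless requested
    private final List<String> steps = new ArrayList<>();

    BatchIndexerSketch(long totalObjects) { this.totalObjects = totalObjects; }

    BatchIndexerSketch purgeAllOnStart(boolean purge) { this.purgeAllOnStart = purge; return this; }
    BatchIndexerSketch optimizeOnFinish(boolean optimize) { this.optimizeOnFinish = optimize; return this; }
    BatchIndexerSketch limitIndexedObjectsTo(long maximum) { this.limit = maximum; return this; }

    /** Non-blocking: returns at once; the Future yields null once indexing is done. */
    Future<?> start() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Runnable task = this::runIndexing;
        Future<?> result = executor.submit(task);
        executor.shutdown(); // the submitted task still runs; the thread is released afterwards
        return result;
    }

    /** Blocking variant: same work as start(), but returns only when it has finished. */
    void startAndWait() throws InterruptedException {
        try {
            start().get();
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing failed", e.getCause());
        }
    }

    /** Steps performed so far; only call this after the run has completed. */
    List<String> steps() { return new ArrayList<>(steps); }

    private void runIndexing() {
        if (purgeAllOnStart) steps.add("purgeAll");
        steps.add("index " + Math.min(totalObjects, limit)); // simulated indexing work
        if (optimizeOnFinish) steps.add("optimize");
    }
}
```

A caller that wants to keep working would use start() and check the
Future later; a test or a command-line tool would more likely call
startAndWait(), for example
new BatchIndexerSketch(1000).purgeAllOnStart(true).limitIndexedObjectsTo(100).startAndWait().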
