Re: [hibernate-dev] Batch indexing API

Tuesday, 7 July 2009

inline:

2009/7/5 Emmanuel Bernard <emmanuel(a)hibernate.org>:
...

>> cacheMode //when would you need something different than Ignore? Also,
>> I'd rather have CacheMode be a Search class to keep the independence wrt
>> Hibernate Core
>
> Depending on the model, it might be much faster to use the cache when the
> indexed entity has a @ManyToOne+@IndexedEmbedded relation to some entity
> that has a high probability of having been indexed already.
> Like book->nation of publishing: you might have millions of books, but
> just some hundreds of nations; if these nations need to be reloaded over
> and over lazily with a second query, a cache helps.
> I'll wrap it in a Search-specific enum like I've seen in Annotations?

 Did you try? It seems that the first level cache would load the nation
 object once per iteration. Given that cacheMode is unfortunately a global
 setting for all entities, I'm wondering what's more efficient in the end.

Well, I'm sure that it's not the best setting for most cases, but yes, I
have tried it, and there are some situations in which it gives a major
performance boost, especially on complex models having many relations of
this type.
Also, in this case the "first level cache" is very short lived, and every
thread has its own; being short lived, there's not a big chance of a
"first level cache hit". By contrast, the second level cache makes sure
all "lookup tables" are loaded once for the whole process.
Also, it only makes sense when using a real cache, properly configured,
not the Hashtable one.
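
To make the argument above concrete, here is a toy sketch that only counts
simulated loads of the "nation" lookup rows. It does not use Hibernate at all:
the class name, the method names and the numbers (1000 books, 10 nations,
batches of 50) are made up for illustration, and a plain HashMap stands in for
both kinds of cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical): counts simulated loads of "nation" lookup rows. */
class LookupCacheSketch {

    /** The cache is thrown away at every batch, like a short-lived session. */
    static int loadsWithSessionCacheOnly(int books, int nations, int batchSize) {
        return countLoads(books, nations, batchSize, true);
    }

    /** One cache outlives all batches, like a process-wide second level cache. */
    static int loadsWithSharedCache(int books, int nations, int batchSize) {
        return countLoads(books, nations, batchSize, false);
    }

    private static int countLoads(int books, int nations, int batchSize,
                                  boolean resetPerBatch) {
        int loads = 0;
        Map<Integer, String> cache = new HashMap<>();
        for (int book = 0; book < books; book++) {
            if (resetPerBatch && book % batchSize == 0) {
                cache = new HashMap<>(); // new batch: the cache starts empty again
            }
            int nationId = book % nations; // assumed even spread of nations
            if (!cache.containsKey(nationId)) {
                cache.put(nationId, "nation-" + nationId);
                loads++; // stands for one extra query against the database
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        System.out.println(loadsWithSessionCacheOnly(1000, 10, 50)); // prints 200
        System.out.println(loadsWithSharedCache(1000, 10, 50));      // prints 10
    }
}
```

With these made-up numbers the short-lived cache reloads every nation in each
of the 20 batches, while the shared one loads each nation once. Real ratios
depend on the model and on how the cache is configured.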

...
>
>> optimizeAtEnd => optimizeOnFinish
>> optimizeAfterPurge
>> purgeAllAtStart => purgeBeforeIndexing ?, purgeAllOnStart, purgeOnStart
>
> I vote for "purgeAllOnStart"; I like "purgeAll" to be consistent.

 That's reasonable, my idea was to remove All to allow the implementation to
 evolve down the road should Lucene provide a more efficient solution to
 purge and create a new object but that's a far off bet.

ah well I suppose this is not the only API you'll have to change,
should that happen :-)

...
>
>> limitObjects => indexObjectsUpTo, indexFirstObjectsUpTo,
>> limitIndexedObjectsTo  //what's the use case?
>
> Mostly testing and developing; I didn't have this in the first design,
> but having to test it often, it turned out to be quite useful to try
> the effect of some new Analyzer without having to reindex millions of
> records. Also, during changes to the entities you might want to see the
> effect of adding some new field / search option without having to wait
> for hours.
> I could have deleted data from the dev database, but I consider having
> this option a bit more flexible.
> Actually I can foresee some feature requests to be able to restrict the
> data, but we can think about that later. By the same reasoning we could
> leave this out for the moment, but it has been very useful for me.

 OK, let's mark this one as experimental, you seem to want more of the API.

>
>> start => Future should actually return some stats? We can delay that,
>> but I don't like the JavaDoc claiming that we will always return null
>
> I took that from the recommendations on the Future javadoc itself, but
> I agree with you it doesn't feel very good.
> I could return (like you suggest) a reference to the used
> IndexerProgressMonitor
> (see http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/...)
> The API of the IndexerProgressMonitor will be a topic to discuss later
> (it is HSEARCH-370), for now there is one default impl
> which will log progress and some performance stats; that's why it is
> missing methods to retrieve the stats.

 Let's think about that and return null for now, I just want to relax the
 strong statement in the javadoc.
 I'll remove the return javadoc, leaving it undocumented for now.

...
>
>
>> startAndWait => execute ?
>
> I preferred to stress the little difference with "start"; don't you
> think that having a "start" method and an "execute" method
> makes it unclear which one I should call?
>

 I know, that's why I put a ? :)
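
For readers following the naming discussion, here is a minimal sketch of the
difference between "start" and "startAndWait", with "limitObjects" thrown in.
Only those three method names come from this thread; the class
BatchIndexerSketch, its constructor, indexedCount() and the counting "work"
are hypothetical stand-ins, not the actual Search API. The Future completes
with null, as agreed above for now.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the indexer under discussion; not the real API. */
class BatchIndexerSketch {
    private final int totalObjects;
    private int limit = Integer.MAX_VALUE;
    private final AtomicInteger indexed = new AtomicInteger();

    BatchIndexerSketch(int totalObjects) {
        this.totalObjects = totalObjects;
    }

    /** Experimental option: index at most this many objects. */
    BatchIndexerSketch limitObjects(int maximum) {
        this.limit = maximum;
        return this;
    }

    /** Returns immediately; the work runs in the background. */
    Future<?> start() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Runnable task = this::indexAll;
        Future<?> result = executor.submit(task); // get() will yield null
        executor.shutdown(); // the submitted task still runs to completion
        return result;
    }

    /** Same work as start(), but blocks the caller until it is finished. */
    void startAndWait() throws InterruptedException, ExecutionException {
        start().get();
    }

    int indexedCount() {
        return indexed.get();
    }

    // Pretend "indexing": just counts objects, up to the configured limit.
    private void indexAll() {
        int count = Math.min(totalObjects, limit);
        for (int i = 0; i < count; i++) {
            indexed.incrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        BatchIndexerSketch indexer = new BatchIndexerSketch(1000).limitObjects(5);
        indexer.startAndWait();
        System.out.println(indexer.indexedCount()); // prints 5

        Future<?> pending = new BatchIndexerSketch(3).start();
        System.out.println(pending.get()); // prints null
    }
}
```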

thanks for the insight,
Sanne
