[hibernate-dev] Re: Search: backend refactoring
Emmanuel Bernard
emmanuel at hibernate.org
Mon Sep 8 10:26:26 EDT 2008
On Sep 7, 2008, at 05:41, Sanne Grinovero wrote:
> The short question:
> may I add some methods to the implementations of LuceneWork?
> I'm refactoring the backends and it would help, but there
> is a warning in the javadoc about not changing it freely.
>
> Sanne
The short answer is no, I don't think it should be needed. LuceneWork
should be the minimal contract needed when sending info across the
wire. What additional info do you need to forward?
>
>
> The same question, a bit more verbose:
>
> Hi,
> I've been puzzling over several optimizations in Search that I would
> like to implement, but I need to do some refactoring in the
> org.hibernate.search.backend package.
> (mostly done already, actually, but I need your ideas)
>
> Most changes affect the "lucene" implementation, but the code would be
> greatly simplified, more readable (and perform better too, IMHO) if I'm
> permitted to change the current implementations of LuceneWork; however,
> there's a big warning there about a requirement to stay backwards
> compatible with the serialized form.
> (btw OptimizeLuceneWork is missing the "magic serialization number")
optimize does not cross the wire
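If by "magic serialization number" you mean serialVersionUID, adding one
is a one-liner anyway; the value is arbitrary, it just has to stay stable
once a release has published serialized forms:

    private static final long serialVersionUID = 1L;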
>
>
> I would like to add some methods to them, and a single field which
> could actually be transient, so I could attempt to maintain the
> compatibility.
> Additionally I've been thinking that if you want to keep LuceneWork as
> a very simple transport and prefer not to add methods, it would be
> nicer to have just one class and let AddLuceneWork/DeleteLuceneWork/...
> be differentiated by a field (using org.hibernate.search.backend.WorkType?)
I am open to this approach. I initially created subclasses because the
necessary data was different between works.
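To make sure we picture the same thing, a rough sketch of the
single-class variant (field and type names are only illustrative, not a
commitment to an API):

    // illustrative only: one work class, discriminated by a WorkType field
    public class LuceneWork implements java.io.Serializable {
        private static final long serialVersionUID = 1L;

        private final WorkType workType;      // ADD, DELETE, OPTIMIZE, maybe UPDATE
        private final java.io.Serializable id;
        private final String idInString;
        private final Class entityClass;
        private final org.apache.lucene.document.Document document; // null when the type carries none

        public LuceneWork(WorkType workType, java.io.Serializable id, String idInString,
                          Class entityClass, org.apache.lucene.document.Document document) {
            this.workType = workType;
            this.id = id;
            this.idInString = idInString;
            this.entityClass = entityClass;
            this.document = document;
        }

        public WorkType getWorkType() { return workType; }
        // plus getters for the other fields
    }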
>
> to mark the different types of work; then I could add the methods I
> need to the enum.
> Also, I could see some use in having an UpdateLuceneWork too, so that
> it becomes the backend implementation's business to decide whether to
> split it into a delete+insert or do something more clever:
> the order in which messages are received would be less critical, and
> the backend could apply some clever optimizations by reordering the
> received Works or repackaging several queues into one.
Why would the order of messages be less critical? Not sure what you
mean by critical, as it's contained in a given workload.
>
>
> What I've done already:
> a) early division into different queues, based on the affected
> DirectoryProviders
>
> b) refactoring/simplification of Workspace: it no longer needs to keep
> track of state for different DPs, as there is only one in its context.
>
> c) shorter lock times: no thread ever needs more than one lock;
> work is sorted by DP, and each lock is released before the next one
> is acquired.
> (deadlockFreeQueue is removed as it is not needed anymore)
> Before, if we needed locks on DPs A, B and C, the acquisition times
> looked like:
> Alock *********
> Block ******
> CLock ***
> now it is more like
> Alock ***
> Block ***
> Clock ***
> And my goal is to make this possible, in separate threads when async:
> Alock ***
> Block ***
> Clock ***
> (not implemented yet: will need a new backend, but I'm preparing the
> common
> stuff to make this possible)
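Just to check I read a) and c) correctly, I picture the processing loop
roughly like this (simplified; providerFor/lockFor/applyWorks are
invented helper names):

    // split the queue per DirectoryProvider, then lock, apply and unlock
    // one provider at a time: never more than one lock held per thread
    Map<DirectoryProvider, List<LuceneWork>> byProvider =
            new LinkedHashMap<DirectoryProvider, List<LuceneWork>>();
    for (LuceneWork work : queue) {
        DirectoryProvider dp = providerFor(work);   // invented: resolve the targeted DP
        List<LuceneWork> dpQueue = byProvider.get(dp);
        if (dpQueue == null) {
            dpQueue = new ArrayList<LuceneWork>();
            byProvider.put(dp, dpQueue);
        }
        dpQueue.add(work);
    }
    for (Map.Entry<DirectoryProvider, List<LuceneWork>> entry : byProvider.entrySet()) {
        Lock lock = lockFor(entry.getKey());        // invented: the per-DP lock
        lock.lock();
        try {
            applyWorks(entry.getValue());           // invented: run the per-DP sub-queue
        } finally {
            lock.unlock();
        }
    }

Is that the idea?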
>
> d) The QueueProcessor can ask each Work whether it needs an IndexWriter
> or an IndexReader, or whether it merely has a preference for one of
> them, for when there is a possibility to make a choice (when we open
> both a reader and a writer anyway because of the strict requirements
> of other Works in the same queue).
I partly follow you (a delete can be done by a writer in some
situations) but I don't quite understand why the work should describe
that. What do you gain?
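Do you mean something along these lines (made-up names, just to check I
understand the proposal)?

    // made-up sketch: each work declares a strict need or a mere preference
    public enum IndexInteraction {
        NEEDS_WRITER, NEEDS_READER, PREFERS_WRITER, PREFERS_READER
    }

    // method the QueueProcessor would call on a LuceneWork (or on WorkType)
    IndexInteraction getIndexInteraction();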
>
>
> e) building on d), DeleteLuceneWork is able to run either on a reader
> or on a writer (when it's possible to do so, i.e. when (the number of
> different classes using the same DP) == 1); in that case the work is
> able to say it "prefers" to be executed on an IndexWriter, but it will
> be able to do its task with an IndexReader too (or the opposite?)
when would you need to still use the IR approach in that case?
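Just so we look at the same thing, the two options Lucene gives us are
(assuming the document id field is literally named "id" here):

    Term idTerm = new Term("id", idInString);

    writer.deleteDocuments(idTerm);                 // IndexWriter path: delete is buffered
    // ...or on the reader path:
    int deleted = reader.deleteDocuments(idTerm);   // IndexReader path: returns how many docs matched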
>
>
> f)"batch mode" is currently set on all DP if only one Work is of
> type batch,
> the division of Workspace per DP does not need this any more and batch
> mode can be set independently.
good to have the flexibility but I am not sure we will ever need that.
This case should not happen unless you merge queues from different
transactions.
>
>
> Another goal I have with this design is the possibility to aggregate
> different committed queues into one, with the possibility to
> optimize away work (insert then delete => noop) while respecting the
> original order,
Hum, total ordering is hard (across multiple VMs) and this case (insert
then delete) is probably very uncommon (though it could happen if you
execute the work of a whole day at once; but then you face memory
issues ordering the queues).
> but also to call the optimization strategy again
> to reorder the newly created work for best efficiency.
> The final effect would be to obtain the same behavior as
> my custom batch indexer, but optimizing not only indexing from scratch
> but any type of load.
> I hope not to scare you: the resulting code is quite simple and I
> think there are actually fewer LOC than the current trunk has;
> I haven't prepared any special-case tests, I just ran all the existing
> ones.
Let's try and chat on IM about that.
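In the meantime, here is how I read the compaction part, so we start the
chat from the same picture (very rough, single VM only;
EntityKey/keyOf/isAdd/isDelete are invented helpers and every work is
assumed to target a single entity):

    // an entity whose first work in the window is an ADD was not in the
    // index before; if its last work is a DELETE the whole sequence is a noop
    Map<EntityKey, LuceneWork> firstWork = new HashMap<EntityKey, LuceneWork>();
    Map<EntityKey, LuceneWork> lastWork = new HashMap<EntityKey, LuceneWork>();
    for (LuceneWork work : aggregatedQueue) {        // original commit order
        EntityKey key = keyOf(work);                 // invented: entity class + id
        if (!firstWork.containsKey(key)) {
            firstWork.put(key, work);
        }
        lastWork.put(key, work);
    }
    List<LuceneWork> compacted = new ArrayList<LuceneWork>();
    for (LuceneWork work : aggregatedQueue) {
        EntityKey key = keyOf(work);
        if (isAdd(firstWork.get(key)) && isDelete(lastWork.get(key))) {
            continue;                                // insert ... delete => noop, drop it
        }
        compacted.add(work);
    }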
>
>
> kind regards,
> Sanne