[hibernate-dev] [Search] Lazy initialization vs HSearch - Round 4

Sanne Grinovero sanne at hibernate.org
Mon Dec 17 06:25:24 EST 2012


There are at least 2 different approaches:

#1
What Emmanuel describes is to write into a structure which allows for
queries as well, so as to be able to include uncommitted changes in
queries.
It should be possible to pick such a structure in a way which could
 - make use of temporary storage out of memory for the larger transactions
 - avoid repeating Analysis of the same input text
 - be efficiently applied to the reference index, for example maybe
using org.apache.lucene.index.IndexWriter.addIndexes(...)

That would indeed be great, as it would provide a new isolation level:
probably not as efficient as what we do today, but more in line with
expectations.
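As a rough illustration of approach #1, here is a minimal sketch in plain Java. All names here (`TransactionalIndexSketch` and its methods) are made up for illustration; a real implementation would stage documents in a per-transaction Lucene index and merge it into the reference index at commit, for example via org.apache.lucene.index.IndexWriter.addIndexes(...):

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch: each transaction writes into its own queryable
// staging structure; queries span the committed index plus the staging
// area, so uncommitted changes are visible to the transaction's own
// queries. On commit the staging segment is merged wholesale into the
// reference index.
class TransactionalIndexSketch {

    // The committed "index": entity id -> indexed text
    private final Map<Long, String> committedIndex = new HashMap<>();
    // Per-transaction staging segment, kept separate until commit;
    // analysis of the input text would happen once, on write
    private final Map<Long, String> stagingSegment = new LinkedHashMap<>();

    void indexInTransaction(long id, String text) {
        stagingSegment.put(id, text);
    }

    // A query sees committed documents plus this transaction's own writes
    List<Long> search(String term) {
        Map<Long, String> view = new HashMap<>(committedIndex);
        view.putAll(stagingSegment); // uncommitted changes shadow committed ones
        return view.entrySet().stream()
                .filter(e -> e.getValue().contains(term))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // On commit, merge the staging segment into the reference index in
    // one pass (the addIndexes(...) analogue), then clear it
    void commit() {
        committedIndex.putAll(stagingSegment);
        stagingSegment.clear();
    }

    public static void main(String[] args) {
        TransactionalIndexSketch idx = new TransactionalIndexSketch();
        idx.indexInTransaction(1L, "hibernate search");
        System.out.println(idx.search("search")); // prints [1]: uncommitted write is visible
        idx.commit();
        System.out.println(idx.search("search")); // prints [1]: still visible after commit
    }
}
```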

#2
I didn't see Guillaume asking about queries, so I assume he would just
need a large write buffer in which to keep the current Documents; this
is a simpler patch, as we already build such a plan during any index
flush / commit.
Have a look into
  org.hibernate.search.backend.impl.BatchedQueueingProcessor

A simple way would be to have the method _performWorks_ "shelve" the
WorkQueue in a list rather than committing it; a more sophisticated
patch (not sure if it's worth it) might want to reuse the same
WorkQueue.

Reusing the same queue would give us the chance to produce more
compact output: if you update the same entity multiple times, the queue
would contain a single operation for it. But I'm guessing that case
would be very unlikely since we're talking about large datasets here,
so it's possibly not worth the additional complexity.
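To make approach #2 concrete, here is a minimal self-contained sketch. The names (`ShelvingProcessor`, `Work`) are invented for illustration; the real hook would live in org.hibernate.search.backend.impl.BatchedQueueingProcessor's performWorks, and the shelf is keyed by entity id to show the "compact output" variant where repeated updates to the same entity collapse into one operation:

```java
import java.util.*;

// Hypothetical sketch: instead of committing each WorkQueue, "shelve"
// the compiled work and apply everything once at the final commit.
class ShelvingProcessor {

    record Work(long entityId, String operation) {}

    // Shelved works, applied only when the surrounding transaction
    // commits. Keyed by entity id so that repeated updates to the same
    // entity keep only the most recent operation.
    private final Map<Long, Work> shelf = new LinkedHashMap<>();

    void performWorks(List<Work> workQueue) {
        for (Work w : workQueue) {
            shelf.put(w.entityId(), w); // last write wins; no index commit here
        }
    }

    // On transaction commit, flush the shelved works in one pass; a real
    // backend would turn these into index operations
    List<Work> commit() {
        List<Work> toApply = new ArrayList<>(shelf.values());
        shelf.clear();
        return toApply;
    }

    public static void main(String[] args) {
        ShelvingProcessor p = new ShelvingProcessor();
        p.performWorks(List.of(new Work(1L, "UPDATE"), new Work(2L, "ADD")));
        p.performWorks(List.of(new Work(1L, "UPDATE")));
        System.out.println(p.commit().size()); // prints 2: duplicate update collapsed
    }
}
```

The simpler variant from the text (shelving whole WorkQueues in a list) would just replace the map with a `List<List<Work>>` and skip the deduplication.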

Sanne


On 17 December 2012 09:10, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> My initial reaction is that if you really need transactional consistency, you should run a transaction commit every n operations. But that may not be OK for your application.
> We could imagine more or less what you are describing. This is an idea we want to explore to offer consistency between the index and the database even within a non-committed transaction: basically, flushing changes to an in-memory index, plus some smart hand-waving via index filters to hide deleted elements from the committed indexes.
>
> I don't think this is really going to be an easy task but if you have some spare time, that would be awesome to start digging in this area.
>
> On 10 Dec 2012, at 21:45, Guillaume Smet <guillaume.smet at gmail.com> wrote:
>
>> On Mon, Dec 10, 2012 at 9:07 PM, Guillaume Smet
>> <guillaume.smet at gmail.com> wrote:
>>> On Mon, Dec 10, 2012 at 8:27 PM, Guillaume Smet
>>> <guillaume.smet at gmail.com> wrote:
>>>> Is it transactionally safe?
>>>
>>> From what I read, it's not.
>>>
>>> Do you see any way to get this type of pattern working in a
>>> transactionally safe way?
>>
>> Hmmm, just handwaving at the moment, but couldn't we imagine that,
>> instead of flushing to the indexes, we could flush to a "buffer" of
>> Lucene documents? I think it could reduce the memory footprint
>> compared to keeping all the entities in memory, while allowing us to
>> keep the operation transactional. And it might allow us to get the
>> above case working.
>>
>> Perhaps flushing to a queue of Lucene documents isn't the right idea,
>> but we could prepare/"compile" the LuceneWork into whatever is
>> convenient for this particular operation (Lucene document, query for
>> deletion...), get rid of the entity and all the entity-related stuff,
>> and apply the works at the commit of the transaction, as is already
>> done.
>>
>> I'm pretty sure it could be an interesting compromise - if doable.
>>
>> Any thoughts?
>>
>> --
>> Guillaume
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev


