[infinispan-dev] asynchronous change manager for high load Lucene

Sanne Grinovero sanne.grinovero at gmail.com
Tue Mar 30 07:14:47 EDT 2010


I've cut large parts of text and quoting from different answers, hope it helps:

Bela Ban wrote:
>>I'm commenting on this without being a Lucene Hibernate Search expert,
so bear with me...

thanks for all the insight, it's very appreciated.

Manik Surtani wrote:
>> This is because each time an update is made, all parts of the index need to be updated?  Can this not be done in a finer-grained manner where nodes only lock and update the chunks they need to update?  Caveat: I have no idea of Lucene index formats, so this may make no sense whatsoever.  :)

The index is structured as a series of large files; one of them is
small and is updated by every change, while the others are only ever
added or deleted, never modified in place - depending on the number of
segments: when certain thresholds are crossed, more updates may be
needed.
The most frequent operation is to add a new segment and edit the small
file to add a pointer to the new segment.
In some cases a group of smaller segments is deleted and a new one is
added containing the merged data - the new segment gets a new
name/key.
It's also quite possible that all segments are deleted and new ones
are written containing an optimized copy of the same information -
again under different filenames (the filename is our key).
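
To illustrate that write pattern with a toy model (a plain local map
standing in for the cache, all filenames invented):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Toy model of the write pattern: the cache is keyed by filename,
    // segment files are only ever added or deleted, and the small
    // "segments" file is rewritten to point at the current set.
    public class SegmentWritePattern {
        public static void main(String[] args) {
            ConcurrentMap<String, byte[]> cache =
                    new ConcurrentHashMap<String, byte[]>();

            // most frequent operation: add a new segment...
            cache.put("_3.cfs", new byte[0] /* segment data */);
            // ...and rewrite the small file to point at it
            cache.put("segments_4", new byte[0] /* updated pointer list */);

            // merge: smaller segments replaced by one, under a new key
            cache.put("_4.cfs", new byte[0] /* merged data */);
            cache.remove("_1.cfs");
            cache.remove("_2.cfs");
            cache.remove("_3.cfs");
        }
    }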

Manik Surtani wrote:
>> Not nice.  Are there plans to support retries in Lucene in future?

I don't think retries are planned, but if we wanted to do that we
could ignore the typical semantics of the Lucene lock in our
implementation and block for longer than the application asks for.
Still, that means all nodes would contend heavily on the global index
lock - in theory it's still more efficient to send all changes to a
single node and let it handle them, as it can buffer and aggregate
changes.
The IndexWriter was designed to be highly efficient and multithreaded;
we just need to make sure there's a single instance. A longer-term
idea I'd like to explore is to override the IndexWriter entirely, but
that's going to be complex and every change in Lucene would require an
update.
It would be nice to override the IndexReader too and use an ad-hoc
index structure - I'll play with it, but I doubt I could deliver
something stable in the near future as I'm working on this in my
spare time.
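
Keeping that single shared instance per index on the master could look
something like this minimal sketch (the IndexWriterConfig-based
constructor shown is from Lucene versions newer than this thread):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;

    public final class SharedWriterHolder {
        private final Directory directory;
        private volatile IndexWriter writer;

        public SharedWriterHolder(Directory directory) {
            this.directory = directory;
        }

        // IndexWriter is thread-safe: all changes routed to the master
        // funnel through this single lazily-created instance.
        public IndexWriter get() throws IOException {
            IndexWriter w = writer;
            if (w == null) {
                synchronized (this) {
                    if (writer == null) {
                        writer = new IndexWriter(directory,
                                new IndexWriterConfig(new StandardAnalyzer()));
                    }
                    w = writer;
                }
            }
            return w;
        }
    }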

>> Sanne Grinovero wrote:
>> The current locking solution is implemented by atomically adding a
>> marker value in the cache. I can't use transactions as the lock could
>> span several transactions and must be visible.

Bela Ban wrote:
>> The issue with this is that you (ab)use the cache as a messaging /
>> signalling system ! This is not recommended. A better approach would be
>> to use a separate channel for communication, or to even reuse the
>> channel Infinispan uses. The latter is something I've been discussing
>> with Brian.

I know, I clearly remember your recommendations about this, please
forgive me :) This was done so that the lock state would be coupled to
the index, and so that this Lucene Directory could be used in other
scenarios - kept simple enough to work as a drop-in replacement in
existing applications which don't want to implement the enhancements
discussed in this thread.
In most cases Lucene is almost read-only: it's not updated very often
since updates are very costly, so the cost of such a lock is quite
negligible compared to everything that happens in the middle:
potentially fetching, processing and distributing many gigabytes of
index.
It's possible to plug in a custom locking implementation; the marker
value stored in the cache is just the default, as a safety net.
In fact I'd like to plug in a no-op Lock implementation and just make
sure the election stuff works - but that needs strong guarantees.
To make another analogy to "traditional Lucene" locks: a well designed
application could disable those too, but by default a file-level lock
is written to protect against the unexpected.
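
For reference, this is how plain Lucene lets such a well designed
application opt out of locking (shown with a modern Lucene API, much
newer than this thread, and only as an analogy - the Infinispan
Directory would expose its own hook):

    import java.nio.file.Paths;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NoLockFactory;

    public class NoLockExample {
        public static void main(String[] args) throws Exception {
            // plug the no-op LockFactory: any IndexWriter opened on
            // 'dir' will skip write-lock acquisition entirely
            FSDirectory dir = FSDirectory.open(Paths.get("/tmp/index"),
                    NoLockFactory.INSTANCE);
        }
    }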

Manik Surtani wrote:
>> This is because each time an update is made, all parts of the index need to be updated?  Can this not be done in a finer-grained manner where nodes only lock and update the chunks they need to update?  Caveat: I have no idea of Lucene index formats, so this may make no sense whatsoever.  :)

Lucene chooses which parts it needs to update, as mentioned above I'd
like to investigate this path and take control of the index structure,
but that's far-future planning :)

Emmanuel Bernard wrote:
> Bela Ban wrote:
>> You could use the cluster view, which is the same across all nodes, and
>> pick the first element in the list. Or you could run an agreement
>> protocol, which deterministically elects a master.
>
> Looks simple, deterministic and elegant.

Agreed, that looks like a clean solution, but it would scale better if
we could elect a different master for each index, spreading the load
instead of always picking the same node for all the work - ideas on
that?
I could hash the index name (the identifier) modulo the cluster view
list size, WDYT?
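
A minimal sketch of that election (recent JGroups API; every node sees
the same view, so each picks the same member deterministically with no
extra messaging):

    import java.util.List;
    import org.jgroups.Address;
    import org.jgroups.View;

    final class PerIndexMasterElection {
        static Address masterFor(String indexName, View view) {
            List<Address> members = view.getMembers();
            // mask the sign bit: hashCode() may be negative
            int slot = (indexName.hashCode() & Integer.MAX_VALUE)
                    % members.size();
            return members.get(slot);
        }
    }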

Emmanuel Bernard wrote:
> So basically, following Bela's advice, you would:
> - use the cluster view to elect the master
> - send changes to the master using JGroups
>
> We still need one guarantee: make sure queued changes not yet applied on the master are going to be processed by the new master (that's why Sanne was thinking of using Infinispan as a messaging system, I guess).

Yes, that's what I thought: if I store changes-to-be-done in
Infinispan I expect that to be as reliable as people choose to
configure it. You're right about networking inefficiency; ideally I'd
need something able to select the main storing node, and have
secondary (third...) "buddies" elected by hash.

Total ordering is not needed, but we must make sure that different
work items applying to the same entity instance are applied in the
same order - and it's also fine if they happen to cancel each other
out. By writing changes into a map I could atomically remove work
which is done, or atomically replace outdated work with a newer
version when the writer lags behind work production. This lag could
accumulate over a whole day and still have updates reflected in the
index as fast as possible (in current design terms), with the
guarantee that at night all work will be processed - duplicates having
been removed as a side effect of key collisions during storing.

When using a global queue I'd have to guarantee that all received work
is processed in the same order as it was produced - the order has to
match the database transactions Hibernate Search was listening to.
When using a map approach I'm confident that MVCC will help in
"seeing" only the most recently put work, which is always the correct
ordering.

The short used by JGroups to identify ordering contexts is way too
limited for this problem; here the context should be defined by a
unique type+id+operation - which also makes a nice key for my Map.
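
A local sketch of that Map idea, with a plain ConcurrentMap standing
in for the Infinispan cache and all class names hypothetical:

    import java.io.Serializable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // One key per type + id + operation: a newer change to the same
    // entity atomically replaces an outdated one, and completed work
    // is removed without touching anything else.
    final class WorkKey implements Serializable {
        final String entityType;
        final Serializable entityId;
        final String operation; // e.g. "ADD", "UPDATE", "DELETE"

        WorkKey(String entityType, Serializable entityId, String operation) {
            this.entityType = entityType;
            this.entityId = entityId;
            this.operation = operation;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof WorkKey)) return false;
            WorkKey k = (WorkKey) o;
            return entityType.equals(k.entityType)
                    && entityId.equals(k.entityId)
                    && operation.equals(k.operation);
        }

        @Override public int hashCode() {
            return 31 * (31 * entityType.hashCode() + entityId.hashCode())
                    + operation.hashCode();
        }
    }

    class LuceneWork implements Serializable { /* the index change */ }

    final class PendingWorkMap {
        // in the real design this would be an Infinispan Cache
        private final ConcurrentMap<WorkKey, LuceneWork> pending =
                new ConcurrentHashMap<WorkKey, LuceneWork>();

        void enqueue(WorkKey key, LuceneWork work) {
            pending.put(key, work); // newer work replaces stale work
        }

        void markDone(WorkKey key, LuceneWork processed) {
            // no-op if newer work arrived for this key in the meantime
            pending.remove(key, processed);
        }
    }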

I don't want to push for the "cache bus" approach, but with the other
options I would definitely need some help on the issues above, while
the Map sounds easy to implement now that I also see an easy way to
know whether I'm the master or not.
I was puzzled by this SEQUENCER and just read the docs; I'll ask for
more details in an appropriate thread.

Manik, do you have an example of how I could send a message to a
selected member by integrating with Infinispan? I think I can figure
out how to send it with some code reading, but I'm mostly lost on how
to receive it. Or would you suggest I share the JGroupsTransport with
totally Infinispan-independent code?
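
For the "send" half, what I had in mind with a raw, dedicated JGroups
channel would be roughly this (3.x/4.x-style JGroups API, not
integrated with Infinispan - the receive side is what I'm unsure
about):

    import org.jgroups.Address;
    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class BackendChannel extends ReceiverAdapter {
        private JChannel channel;

        public void start() throws Exception {
            channel = new JChannel();   // default protocol stack
            channel.setReceiver(this);  // a master node receives work here
            channel.connect("index-backend");
        }

        // unicast a change set to the elected master for this index
        public void sendToMaster(Address master, java.io.Serializable work)
                throws Exception {
            channel.send(new Message(master, work));
        }

        @Override
        public void receive(Message msg) {
            Object work = msg.getObject();
            // hypothetical: deserialize and apply via the shared IndexWriter
        }
    }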

thanks all for the excellent feedback,
Sanne



