[infinispan-dev] asynchronous change manager for high load Lucene

Sanne Grinovero sanne.grinovero at gmail.com
Fri Mar 26 10:55:24 EDT 2010


Hello,
as I anticipated in some chats, my current feeling about the Lucene
Directory is that I got a marvelous new device but can't fully use
it: like my home town, which installed a new pair of train tracks
for high-speed trains to connect to the city, but can't afford the
high-speed trains.
Or, more fitting, like getting a 64-way server for a service which
can't use more than one thread, because of some old dependency. You
would definitely want to solve this limitation :D

The Lucene Directory is doing well in read-mostly situations, and is
a nice-to-have for huge indexes, to easily keep them in sync across
dynamic-topology clusters.
But when the many participant nodes all potentially apply changes, a
global index lock needs to be acquired; this is safely achievable by
using the LockFactory implementations provided in the lucene-directory
package, but of course it doesn't scale.
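
For context, a minimal sketch of what each node does today; I'm
going from memory here, so the exact constructor signature and the
cache name are illustrative, not authoritative:

   // Every node opens the same logical index over a shared cache; the
   // write lock handed to Lucene is then a single cluster-wide lock.
   Cache cache = cacheManager.getCache("lucene-index");
   InfinispanDirectory directory = new InfinispanDirectory(cache, "my-index");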

In practice it's worse than a performance problem: Lucene expects
full ownership of all resources, so the IndexWriter implementation
doesn't expect the Lock to time out, and there is no notion of
fairness in the wait process. If your architecture does something
like "each node can ask for the lock and apply changes" without some
external coordination, this only works while contention on this lock
stays low enough; if waiters pile up, it will blow up the application
with exceptions on indexing.
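
To make the failure mode concrete, a sketch against the Lucene 3.x
API (the method name is mine):

   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.store.LockObtainFailedException;
   import org.apache.lucene.util.Version;

   void tryIndexing(Directory directory) throws Exception {
      IndexWriter writer;
      try {
         // The constructor acquires the index write lock, waiting at
         // most the default 1000 ms; waiters are not queued fairly.
         writer = new IndexWriter(directory,
               new StandardAnalyzer(Version.LUCENE_30),
               IndexWriter.MaxFieldLength.UNLIMITED);
      }
      catch (LockObtainFailedException e) {
         // Under sustained contention this is what blows up: the lock
         // never becomes free within the timeout.
         return;
      }
      // ... add/delete Documents ...
      writer.close(); // releases the lock
   }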

The best solution I've seen so far is what Emmanuel implemented years
ago for Hibernate Search: using JMS to send all changes to a single
master node. Currently I think the state-of-the-art installation
should combine such a queue-like solution, delegating all changes to
a single node, with having that node apply the changes to an
Infinispan Directory - so making the changes visible to all other
nodes through efficient Infinispan distribution/replication.
Replication used to be done with an rsync-like file copy, so the new
benefit would be an easier setup of Directory replication, but you
still need a dedicated master node and some work to set it up.

Now I would like to use Infinispan to replace the JMS approach,
especially as in cloud environments it's useful for the different
participants which make up the service to be configured identically:
having a Master Writer node is fine as long as it's auto-elected.
(not mandatory, just very nice to have)

The problems to solve:

A) Locking
The current locking solution is implemented by atomically adding a
marker value to the cache. I can't use transactions, as the lock
could span several transactions and must be visible to everyone.
It's Lucene's responsibility to clear this lock properly, and I can
trust it to do that as far as the code goes. But what happens if the
node dies? Other nodes taking over the writer role should be able to
detect the situation and remove the stale lock.

Proposals:

 - A1) We don't lock, but find a way to elect one and only one writer
node. This would be very cool, but I have no idea how to implement
it.

 - A2) We could store the node's Address as the value of the marker
entry; if that address is no longer part of the members view, the
lock is cleared. (can I trust the members view? the index will
corrupt if changed by two nodes) See the sketch after this list.
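
A minimal sketch of A2, assuming the members view can be trusted and
using a dedicated locks cache; all names are illustrative, this is
not the actual lucene-directory implementation:

   import org.infinispan.Cache;
   import org.infinispan.manager.CacheManager;
   import org.infinispan.remoting.transport.Address;

   boolean acquireWriteLock(CacheManager cm, Cache<String, Address> locks) {
      Address self = cm.getAddress();
      // atomic putIfAbsent: the marker's value is the owner's Address
      Address owner = locks.putIfAbsent("write.lock", self);
      if (owner == null || owner.equals(self))
         return true; // we hold the lock
      if (!cm.getMembers().contains(owner)) {
         // owner is gone from the view: clear the stale lock, but
         // only if it is still held by the dead node, then retry
         locks.remove("write.lock", owner);
         return acquireWriteLock(cm, locks);
      }
      return false;
   }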

B) Sending changes to the writer
By Lucene's design the IndexWriter is threadsafe and is a heavy
object to build, so it should be reused as much as possible to insert
many Documents at once. So when a node manages to acquire the Lock it
should keep the IndexWriter open for a relatively long time, and
possibly receive changes from other nodes to be applied to the index.
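
Roughly this shape (Lucene 3.x; the election check and the change
source are placeholders for whichever mechanism we pick below):

   IndexWriter writer = new IndexWriter(directory,
         new StandardAnalyzer(Version.LUCENE_30),
         IndexWriter.MaxFieldLength.UNLIMITED);
   try {
      while (weAreTheWriterNode()) {              // illustrative
         for (Change c : drainPendingChanges()) { // illustrative
            if (c.isDelete())
               writer.deleteDocuments(c.getTerm());
            else
               writer.updateDocument(c.getTerm(), c.getDocument());
         }
         writer.commit(); // make the batch visible to readers
      }
   } finally {
      writer.close(); // releases the index lock
   }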

Proposal:

 - B1) Use JMS or JGroups directly (Hibernate Search is currently
capable of using both); but again I face the problem of electing the
IndexWriter node, and of having the messages sent to the correct
node.
   In both cases I would like to receive enough information from
Infinispan to know where to send the messages from the queue.
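
One possible source for that information, assuming the JGroups view
coordinator is an acceptable writer election criterion (Infinispan
exposes it on the cache manager):

   // Sketch: use the cluster coordinator as the auto-elected writer.
   Address master = cacheManager.getCoordinator();
   if (cacheManager.isCoordinator()) {
      // we are the writer: open the IndexWriter here
   } else {
      // forward change messages to 'master' over JMS / JGroups
   }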

 - B2) Exploit the ConcurrentMap nature of Infinispan: I don't need
strict ordering of change requests if we can make an assumption about
the Lucene Documents: that each Document is identifiable.
   This is usually the case, and always is in Hibernate Search, where
each Document entry is identified by (typeOfEntity, PrimaryKey).
   Assuming we find out how to start the IndexWriting process on a
single node, we could have a Cache storing change requests for the
index: for each changed entity I would insert a key made of
(documentId, typeOfOperation), and a value containing the new
Document to be written (if any) and some timestamp/counter.
typeOfOperation could be "delete" or "add", which are the only
operations Lucene supports. I believe I could read the timestamp from
the entry?
The timestamp would be needed to decide what to do when both a delete
and an add operation exist for the same entity (add, delete -> noop;
delete, add -> update).
The node running the Lucene IndexWriting task could then periodically
iterate over all entries of this cache and apply the latest needed
operations, using the atomic removeIfUnchanged. Overwritten entries
would be fine, even better: I would then write only the latest
version of a Document when it changes faster than we can write it
out.
The drawback of this approach is that, while it will buffer temporary
spikes of load, it assumes the writer is on average fast enough to
keep up with all changes, and that changes to different entities are
applied in an unpredictable order. Still, I like this solution the
most, as it looks like it can best use Infinispan for maximum
performance.
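
A minimal sketch of the drain loop; all names (Change, the "id"
field) are illustrations of mine, and I simplified to one entry per
documentId, letting the latest operation overwrite the previous one,
which also sidesteps the add/delete ordering question. Infinispan's
Cache implements ConcurrentMap, so remove(key, value) provides the
atomic removeIfUnchanged (note: in distribution mode entrySet() might
expose only locally held entries, so a replicated cache fits this
sketch better):

   import java.io.IOException;
   import java.io.Serializable;
   import java.util.Map;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.Term;
   import org.infinispan.Cache;

   // The latest pending operation for a documentId. equals() compares
   // the counter, so removeIfUnchanged won't drop an entry which was
   // overwritten by a newer change in the meantime.
   class Change implements Serializable {
      final boolean delete;      // typeOfOperation: "add" or "delete"
      final Document document;   // null when delete == true
      final long counter;        // timestamp/counter set by the producer
      Change(boolean delete, Document document, long counter) {
         this.delete = delete;
         this.document = document;
         this.counter = counter;
      }
      @Override public boolean equals(Object o) {
         return o instanceof Change && ((Change) o).counter == counter;
      }
      @Override public int hashCode() {
         return (int) counter;
      }
   }

   // Periodically run by the single writer node.
   void drainChanges(Cache<String, Change> changes, IndexWriter writer)
         throws IOException {
      for (Map.Entry<String, Change> e : changes.entrySet()) {
         String documentId = e.getKey();
         Change change = e.getValue();
         Term idTerm = new Term("id", documentId);
         if (change.delete)
            writer.deleteDocuments(idTerm);
         else
            writer.updateDocument(idTerm, change.document);
         // removeIfUnchanged: if a newer Change replaced this entry
         // in the meantime, keep it for the next pass.
         changes.remove(documentId, change);
      }
      writer.commit();
   }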

Thoughts? Problems?
I'd also appreciate an example of a job that needs to run on a single
node only, if you have one; and I would love to avoid depending on
anything more than Infinispan.

Cheers,
Sanne


