[infinispan-dev] asynchronous change manager for high load Lucene

Manik Surtani manik at jboss.org
Mon Mar 29 08:04:35 EDT 2010


Hi Sanne.

Wow, talk about a detailed analysis email!  My comments inline.

On 26 Mar 2010, at 14:55, Sanne Grinovero wrote:

> Hello,
> as I had anticipated in some chats, my current feeling with the Lucene
> Directory is that I got a marvelous new device but I can't completely
> use it,
> like in my home town, where they installed a new pair of train
> tracks for high-speed trains to connect to the city but they can't
> afford the high-speed trains.
> Or, more fitting, like getting a 64-way server for a service that
> can't use more than one thread, because of some old dependency. You
> would definitely want to solve this limitation :D

An interesting analogy... :)

> The Lucene Directory is doing well in read-mostly situations, and is a
> nice-to-have for huge indexes, to easily sync them in dynamic-topology
> clusters.
> But when many participating nodes all potentially apply changes, a
> global index lock needs to be acquired; this is safely achievable by
> using the LockFactory implementations provided in the lucene-directory
> package, but of course it doesn't scale.

This is because each time an update is made, all parts of the index need to be updated?  Can this not be done in a finer-grained manner, where nodes only lock and update the chunks they need to update?  Caveat: I have no idea about Lucene index formats, so this may make no sense whatsoever.  :)

> In practice, it's worse than a performance problem: Lucene expects
> full ownership of all resources, so the IndexWriter implementation
> doesn't expect the Lock to time out, and has no notion of fairness in
> the wait process. If your architecture does something like "each
> node can ask for the lock and apply changes" without some external
> coordination, this only works if contention on this lock is
> low enough; if it piles up, it will blow up the application with
> exceptions on indexing.

Not nice.  Are there plans to support retries in Lucene in the future?
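
For reference, this is roughly the retry dance applications are forced to hand-roll today.  A rough sketch against the Lucene 3.x API - openWithRetry and the back-off policy are made up, not anything Lucene provides:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.LockObtainFailedException;
    import org.apache.lucene.util.Version;

    // Opening an IndexWriter acquires the single index-wide write lock.
    // If another node holds it past the timeout, Lucene throws rather
    // than queueing the request, so callers must retry themselves.
    IndexWriter openWithRetry(Directory dir, int maxAttempts) throws Exception {
       for (int attempt = 1; ; attempt++) {
          try {
             return new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                   IndexWriter.MaxFieldLength.UNLIMITED);
          } catch (LockObtainFailedException e) {
             if (attempt == maxAttempts) throw e; // no fairness: a busy cluster can starve this node forever
             Thread.sleep(1000L * attempt);       // crude back-off; retry order remains unfair
          }
       }
    }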

> The best solution I've seen so far is what Emmanuel implemented years
> ago for Hibernate Search: using JMS to send changes to a single master
> node. Currently I think the state-of-the-art installation should
> combine such a queue-like solution, delegating all changes to a single
> node, with that single node applying the changes to an Infinispan
> Directory - thus making changes visible to all other nodes through
> efficient Infinispan distribution/replication.
> Replication was done in the past by using an rsync-like file copy, so
> the new benefit would be easing the setup of Directory replication,
> but you still need a dedicated master node and the work of setting
> this up.

That would still not scale, though - still a bottleneck on the single master updater node?  Probably better than your current impl, though, so I see the motivation here.  :)

> Now I would like to use Infinispan to replace the JMS approach,
> especially as in cloud environments it's useful for the different
> participants that make up the service to be all equally configured:
> having a Master Writer node is fine as long as it's auto-elected.
> (not mandatory, just very nice to have)
> 
> The problems to solve:
> 
> A) Locking
> The current locking solution is implemented by atomically adding a
> marker value to the cache. I can't use transactions, as the lock could
> span several transactions and must be visible.
> It's Lucene's responsibility to clear this lock properly, and I can
> trust it for that as far as the code goes. But what happens if the
> node dies? Other nodes taking over the writer role should be able to
> detect the situation and remove the lock.
> 
> Proposals:
> 
> - A1) We don't lock, but find a way to elect one and only one writer
> node. This would be very cool, but I have no idea about how to
> implement it.

Infinispan exposes JGroups coordinator information.  You can peg this task to the coordinator, and you can register for view change events to be notified of potential changes in the coordinator.
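
Something along these lines - an untested sketch, with the cache manager and listener API names from memory, and startIndexWriterTask/stopIndexWriterTask as placeholders for your writer logic:

    import org.infinispan.manager.EmbeddedCacheManager;
    import org.infinispan.notifications.Listener;
    import org.infinispan.notifications.cachemanagerlistener.annotation.ViewChanged;
    import org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent;

    @Listener
    public class WriterElector {
       private final EmbeddedCacheManager manager;

       public WriterElector(EmbeddedCacheManager manager) {
          this.manager = manager;
          manager.addListener(this);
       }

       @ViewChanged
       public void onViewChange(ViewChangedEvent e) {
          // The coordinator is stable between view changes, so pegging the
          // writer role to it yields exactly one writer without any lock.
          if (manager.isCoordinator())
             startIndexWriterTask();  // placeholder: open IndexWriter, start draining changes
          else
             stopIndexWriterTask();   // placeholder: close IndexWriter if we held the role
       }

       private void startIndexWriterTask() { /* ... */ }
       private void stopIndexWriterTask()  { /* ... */ }
    }

You'd also want to check isCoordinator() once at startup, since no view change event fires for the first member.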

> - A2) We could store the node address as the value in the marker
> object; if the address isn't part of the members, the lock is cleared.
> (Can I trust the members view? The index will be corrupted if changed
> by two nodes.)

Not necessary, see above.

> B) Sending changes to writer
> By Lucene's design the IndexWriter is threadsafe and is a heavy
> object to build, so it should be reused as much as possible to insert
> many Documents at once. So when a node manages to acquire the Lock it
> should keep the IndexWriter open for a relatively long time, and
> possibly receive changes from other nodes to be applied to the index.
> 
> Proposal:
> 
> - B1) Use JMS or JGroups directly (as Hibernate Search is currently
> capable of using both); again I face the problem of IndexWriter node
> election, and of having the messages sent to the correct node.
>   In both cases I would like to receive enough information from
> Infinispan to know where to send messages from the queue.

I agree that using JGroups may be the best way to do this.  No additional dependencies, since you already have JGroups to run Infinispan.  One comment re: Bela's SEQUENCER proposal: make sure you set up a separate channel for this communication.  Putting SEQUENCER in Infinispan's own JGroups channel would add unnecessary total-ordering overhead to all cache traffic, not just your index updates.
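
E.g., something like this - untested, and sequencer-stack.xml is a made-up stack file name (your usual transport protocols plus SEQUENCER on top):

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    // A dedicated channel for index updates: SEQUENCER's total-ordering
    // cost is then paid only by this traffic, not by Infinispan's own.
    JChannel indexChannel = new JChannel("sequencer-stack.xml");
    indexChannel.setReceiver(new ReceiverAdapter() {
       @Override
       public void receive(Message msg) {
          // Only the elected writer node applies the change to its
          // IndexWriter; all other nodes simply ignore the message.
       }
    });
    indexChannel.connect("lucene-index-updates");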

> - B2) Exploit the ConcurrentMap nature of Infinispan: I don't need
> strict ordering of change requests if we can make an assumption about
> the Lucene Documents: that each Document is identifiable.
>   This is usually the case, and always is in Hibernate Search, where
> each Document entry is identified by (typeOfEntity, PrimaryKey).
>   Assuming we find out how to start the IndexWriting process on a
> single node, we could have a Cache to store change requests on the
> index: for each changed entity I would insert a key made of
> (documentID, typeOfOperation) and a value containing, when needed, the
> new Document to be written and some timestamp/counter. typeOfOperation
> could be "delete" or "add", which are the only operations supported by
> Lucene. I believe I could read the timestamp from the entry?
> The timestamp would be needed to decide what to do when both a
> delete and an add operation exist for the same entity (add,delete ->
> noop; delete,add -> update).
> The node running the Lucene IndexWriting task could then periodically
> iterate over all entries of this cache and apply the latest needed
> operations, using an atomic removeIfUnchanged. Overwritten entries
> would be fine, even better, as I would write only the latest version
> of a Document if it's being changed faster than we can write it out.
> The drawback of this approach is that, while it will buffer temporary
> spikes of load, it assumes writing out all changes is on average
> faster than they arrive, and changes to different entities are
> applied in unpredictable order. Still, I like this solution the
> most, as it looks like it can best use Infinispan for maximum
> performance.

B2 is effectively using Infinispan for message passing.  I think that will hamper scalability even further - what are your options?  REPL?  Then everyone will get these "tasks", even though (all - 1) nodes will ignore them.  DIST?  How can you guarantee that the coordinator node is always in the owner group?  Otherwise the coordinator will still need to fetch these objects from a remote source.
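
Just so we're talking about the same thing, here's how I read B2 - a rough sketch where ChangeKey, ChangeValue and the "id" field are made up, and ConcurrentMap's remove(key, value) plays the part of your removeIfUnchanged:

    import java.io.IOException;
    import java.util.Map;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.infinispan.Cache;

    enum Op { ADD, DELETE }

    // Key = (documentId, operation); value = newest Document plus a counter
    // to order an add against a delete on the same document.
    class ChangeKey {
       final String documentId;
       final Op op;
       ChangeKey(String documentId, Op op) { this.documentId = documentId; this.op = op; }
       // equals()/hashCode() on both fields omitted for brevity
    }

    class ChangeValue {
       final Document document;  // null for deletes
       final long counter;
       ChangeValue(Document document, long counter) { this.document = document; this.counter = counter; }
    }

    class ChangeDrainer {
       // Run periodically on the single writer node.
       void drain(Cache<ChangeKey, ChangeValue> changes, IndexWriter writer) throws IOException {
          for (Map.Entry<ChangeKey, ChangeValue> e : changes.entrySet()) {
             ChangeKey k = e.getKey();
             ChangeValue v = e.getValue();
             Term id = new Term("id", k.documentId);
             if (k.op == Op.DELETE)
                writer.deleteDocuments(id);
             else
                writer.updateDocument(id, v.document);  // delete-then-add in one call
             // Cache extends ConcurrentMap, so this removes the entry only if
             // it was not overwritten meanwhile; a newer value stays in the
             // cache and gets applied on the next pass.
             changes.remove(k, v);
          }
       }
    }

My REPL/DIST concern above still applies: whichever node runs drain() needs cheap, local reads of this cache.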

> 
> Thoughts? Problems?
> I'd also appreciate an example of a job that needs to run on
> a single node only, if you have one, and would love to avoid depending
> on anything more than Infinispan.
> 
> Cheers,
> Sanne
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org