[infinispan-dev] asynchronous change manager for high load Lucene

Emmanuel Bernard emmanuel at hibernate.org
Mon Mar 29 08:20:36 EDT 2010


See inline

> Sanne Grinovero wrote:
>> The Lucene Directory does well in read-mostly situations, and is a nice
>> option for huge indexes, since it makes them easy to sync in
>> dynamic-topology clusters.
>> But when many participating nodes all potentially apply changes, a
>> global index lock needs to be acquired; this is safely achievable by
>> using the LockFactory implementations provided in the lucene-directory
>> package, but of course it doesn't scale.
>> 
>> In practice, it's worse than a performance problem: Lucene expects
>> full ownership of all resources, so the IndexWriter implementation
>> doesn't expect the Lock to time out and has no notion of fairness in
>> the wait process. If your architecture does something like "each
>> node can ask for the lock and apply changes" without some external
>> coordination, this only works as long as contention on this lock is
>> low enough; if it piles up, the application blows up with indexing
>> exceptions.
>> 
>> The best solution I've seen so far is what Emmanuel implemented years
>> ago for Hibernate Search: using JMS to send changes to a single master
>> node. Currently I think the state-of-the-art installation should
>> combine such a queue-like solution, delegating all changes to a single
>> node, with that node applying the changes to an Infinispan
>> Directory - thus making the changes visible to all other nodes through
>> efficient Infinispan distribution/replication.
> 
> By sending the changes to a single master, who then applies them, you 
> essentially pick a total order concept: all changes are applied in the 
> same order across the cluster. Total order eliminates the need to 
> acquire a lock, because the changes from different nodes will always be 
> seen in the same order.
> 
> This could also be done by using the SEQUENCER protocol in JGroups: say 
> A, B and C need to update an index. If B wanted to make an update, it 
> would have to acquire the cluster-wide lock, update the index, replicate 
> across the cluster, and then release the lock. The lock is needed to make 
> sure that B's changes don't overlap with C's or A's changes.
> 
> However, if we guaranteed that everybody would receive B's modification, 
> followed by A's modification, followed by C's modification, we would not 
> need a lock at all !
> 
> In other words, if we have to apply a change across a cluster, with 1 
> message, total order can replace a two-phase lock-unlock protocol.

In Lucene's terms, that would mean that an index copy is present on all nodes (not necessarily stored in Infinispan) and all changes are passed to and executed on all nodes (delivered in total order via JGroups).

Intuitively, I don't think it's the right approach:
 - that would mean CPU and I/O consumption on all nodes: indexing / analyzing is potentially heavy work
 - I am not sure Lucene is 100% deterministic in how it indexes
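(For reference, the total-order approach under discussion would look roughly like the sketch below. This is only an illustration against the JGroups 2.x API; "index-ops.xml" is a made-up stack file assumed to contain the SEQUENCER protocol.)

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;

// Sketch: totally ordered index updates via a JGroups stack containing SEQUENCER.
public class TotalOrderIndexUpdater extends ReceiverAdapter {

    private final JChannel channel;

    public TotalOrderIndexUpdater() throws Exception {
        channel = new JChannel("index-ops.xml"); // hypothetical stack with SEQUENCER
        channel.setReceiver(this);
        channel.connect("lucene-index");
    }

    // Multicast a change; SEQUENCER delivers it in the same order on every node.
    public void publish(byte[] serializedIndexOperation) throws Exception {
        channel.send(new Message(null, null, serializedIndexOperation));
    }

    @Override
    public void receive(Message msg) {
        byte[] op = msg.getBuffer();
        // apply the operation to the local IndexWriter here; no cluster-wide
        // lock is needed because delivery order is identical everywhere
    }
}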

> 
>> Now I would like to use Infinispan to replace the JMS approach,
>> especially as in cloud environments it's useful that the different
>> participants which make up the service are all equally configured:
>> having a Master Writer node is fine as long as it's auto-elected.
>> (not mandatory, just very nice to have)
>> 
>> The problems to solve:
>> 
>> A) Locking
>> The current locking solution is implemented by atomically adding a
>> marker value in the cache. I can't use transactions as the lock could
>> span several transactions and must be visible.
> 
> The issue with this is that you (ab)use the cache as a messaging / 
> signalling system ! This is not recommended. A better approach would be 
> to use a separate channel for communication, or to even reuse the 
> channel Infinispan uses. The latter is something I've been discussing 
> with Brian.
> 
>> It's Lucene's responsibility to clear this lock properly, and I can
>> trust it to do so as long as the code gets to run. But what happens if
>> the node dies? Other nodes taking over the writer role should be able
>> to detect the situation and remove the lock.
> 
> You get a new view and release all locks held by nodes which are not 
> part of the new view. This is how it's implemented in JBossCache and 
> Infinispan today.
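Putting Sanne's marker-based lock and this view-based cleanup together, a rough sketch could look like the following. The key name and value shape are invented for illustration, and the class is assumed to be registered as a receiver on a JGroups channel (ideally the one Infinispan already uses) so that it gets view callbacks:

import org.infinispan.Cache;
import org.jgroups.Address;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Sketch only: a marker entry in the cache holding the owner's address, plus a
// view listener that clears the marker when its owner is no longer a member.
public class CacheIndexLock extends ReceiverAdapter {

    private static final String LOCK_KEY = "write.lock"; // hypothetical key name

    private final Cache<String, Address> cache;
    private final Address self;

    public CacheIndexLock(Cache<String, Address> cache, Address self) {
        this.cache = cache;
        this.self = self;
    }

    public boolean tryAcquire() {
        // atomic: only one node can install the marker
        return cache.putIfAbsent(LOCK_KEY, self) == null;
    }

    public void release() {
        cache.remove(LOCK_KEY, self); // removes only if we still own it
    }

    @Override
    public void viewAccepted(View newView) {
        Address owner = cache.get(LOCK_KEY);
        if (owner != null && !newView.getMembers().contains(owner)) {
            cache.remove(LOCK_KEY, owner); // owner left or died: clear the stale lock
        }
    }
}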
> 
>> Proposals:
>> 
>> - A1) We don't lock, but find a way to elect one and only one writer
>> node. This would be very cool, but I have no idea how to
>> implement it.
> 
> You could use the cluster view, which is the same across all nodes, and 
> pick the first element in the list. Or you could run an agreement 
> protocol, which deterministically elects a master.

Looks simple, deterministic and elegant.
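A minimal sketch of such an election, assuming access to the JGroups channel (class and method names are just illustrative):

import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.View;

// Sketch: deterministic master election by picking the first member of the
// current view; every node computes the same answer from the same view.
public class MasterElector {

    private final JChannel channel;

    public MasterElector(JChannel channel) {
        this.channel = channel;
    }

    public Address currentMaster() {
        View view = channel.getView();
        return view.getMembers().get(0); // the coordinator is first in the view
    }

    public boolean isLocalNodeMaster() {
        return currentMaster().equals(channel.getAddress());
    }
}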

> 
>> - A2) We could store the node address as the value of the marker entry;
>> if that address isn't part of the members, the lock is cleared. (Can I
>> trust the members view? The index will corrupt if it's changed by two
>> nodes.)
> 
> Not so good, don't abuse the cache as a communication bus
> 
>> Proposal:
>> 
>> - B1) Use JMS or JGroups directly (as Hibernate Search is currently
>> capable of using both); again I face the problem of electing the
>> IndexWriter node and of getting the messages sent to the correct node.
>> In both cases I would like to receive enough information from
>> Infinispan to know where to send messages from the queue.
> 
> I'd use JGroups (of course I'm biased :-)), but JMS really has no notion 
> of cluster views with which to elect a master. Plus, you could use 
> SEQUENCER to establish a global order between messages, so maybe the need 
> to acquire a lock could be dropped altogether.


So basically, following Bela's advice, you would:
- use the cluster view to elect the master
- send changes to the master using JGroups

We still need one guarantee: make sure queued changes not yet applied on the master get processed by the new master (that's why Sanne was thinking of using Infinispan as a messaging system, I guess).
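As a rough sketch of that flow (building on the MasterElector sketch above, with a made-up serialized-operation payload):

import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.Message;

// Sketch: every node forwards index change operations to the current master;
// only the master applies them to the shared (Infinispan-backed) index.
public class IndexChangeForwarder {

    private final JChannel channel;
    private final MasterElector elector; // from the earlier sketch

    public IndexChangeForwarder(JChannel channel, MasterElector elector) {
        this.channel = channel;
        this.elector = elector;
    }

    public void submit(byte[] serializedIndexOperation) throws Exception {
        Address master = elector.currentMaster();
        // unicast the change to whichever node is currently the master
        channel.send(new Message(master, null, serializedIndexOperation));
    }
}

Note that this alone does not give the guarantee above: operations already sent to a master that dies before applying them would have to be kept in something replicated or persistent and re-submitted to the new master.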


