I'm commenting on this without being a Lucene / Hibernate Search expert,
so bear with me...
Sanne Grinovero wrote:
The Lucene Directory does well in read-mostly situations, and is a
nice-to-have for huge indexes, making them easy to sync in
dynamic-topology clusters.
But when many participating nodes can all apply changes, a global
index lock needs to be acquired; this is safely achievable by using
the LockFactory implementations provided in the lucene-directory
package, but of course it doesn't scale.
In practice, it's worse than a performance problem: Lucene expects
full ownership of all resources, so the IndexWriter implementation
doesn't expect the Lock to time out and has no notion of fairness in
the wait process. If your architecture does something like "each
node can ask for the lock and apply changes" without some external
coordination, this only works when contention on this lock is low
enough; if it piles up, the application blows up with indexing
exceptions.
The best solution I've seen so far is what Emmanuel implemented years
ago for Hibernate Search: using JMS to send changes to a single master
node. Currently I think a state-of-the-art installation should combine
such a queue-like solution, delegating all changes to a single node,
with that node applying the changes to an Infinispan Directory - thus
making the changes visible to all other nodes through efficient
Infinispan distribution/replication.
By sending the changes to a single master, which then applies them, you
essentially pick a total order: all changes are applied in the same
order across the cluster. Total order eliminates the need to acquire a
lock, because the changes from different nodes will always be seen in
the same order.
This could also be done by using the SEQUENCER protocol in JGroups: say
A, B and C need to update an index. If B wanted to make an update, it
would have to acquire the cluster-wide lock, update the index, replicate
across the cluster, and then release the lock. The lock is needed to
make sure that B's changes don't overlap with A's or C's changes.
However, if we guaranteed that everybody would receive B's modification,
followed by A's modification, followed by C's modification, we would not
need a lock at all!
In other words, if we have to apply a change across a cluster with one
message, total order can replace a two-phase lock-unlock protocol.
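To make the idea concrete, here is a minimal sketch (hypothetical code, not the JGroups SEQUENCER API itself): a single sequencer node assigns one global order to the updates submitted by A, B and C, and every replica applies that same sequence, so all copies of the index converge without any cluster-wide lock.

```java
import java.util.ArrayList;
import java.util.List;

public class SequencerSketch {

    // The "coordinator": appends each incoming update to one global sequence.
    // In a real deployment this role would be played by the SEQUENCER
    // coordinator, which re-broadcasts messages in the order it received them.
    static class Sequencer {
        private final List<String> order = new ArrayList<>();

        synchronized List<String> submit(String update) {
            order.add(update);
            return List.copyOf(order); // the order every node will see
        }
    }

    // A replica simply applies updates in the order the sequencer decided;
    // here "applying" is modeled as joining the updates into a string.
    static String apply(List<String> orderedUpdates) {
        return String.join(",", orderedUpdates);
    }

    public static void main(String[] args) {
        Sequencer seq = new Sequencer();
        // B, A and C all submit concurrently; the sequencer serializes them.
        seq.submit("B:add-doc");
        seq.submit("A:del-doc");
        List<String> finalOrder = seq.submit("C:add-doc");
        // Every replica sees the same sequence, so their indexes converge.
        String replica1 = apply(finalOrder);
        String replica2 = apply(finalOrder);
        System.out.println(replica1.equals(replica2)); // prints "true"
    }
}
```

The point is that agreement on *order* substitutes for mutual exclusion: no node ever needs to lock out the others, because no two nodes ever disagree on which change comes first.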
Now I would like to use Infinispan to replace the JMS approach,
especially since in cloud environments it's useful that the different
participants which make up the service are all configured identically:
having a Master Writer node is fine as long as it's auto-elected.
(Not mandatory, just very nice to have.)
The problems to solve:
A) Locking
The current locking solution is implemented by atomically adding a
marker value to the cache. I can't use transactions, as the lock could
span several transactions and must be visible.
The issue with this is that you (ab)use the cache as a messaging /
signalling system! This is not recommended. A better approach would be
to use a separate channel for communication, or even to reuse the
channel Infinispan itself uses. The latter is something I've been
discussing with Brian.
It's Lucene's responsibility to clear this lock properly, and I can
trust it to do so as far as the code allows. But what happens if the
node dies? Other nodes taking over the writer role should be able to
detect the situation and remove the lock.
You get a new view and release all locks held by nodes which are not
part of the new view. This is how it's implemented in JBossCache and
Infinispan today.
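A rough sketch of that cleanup (hypothetical code; the real JBossCache/Infinispan implementations differ): when a new view is installed, every lock whose owner is no longer a member is purged, so a crashed writer can't leave a stale index lock behind.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ViewChangeLockCleanup {

    // lock name -> owning node address. Simulated with a plain map here;
    // in practice this state would live in a replicated cache.
    static final Map<String, String> locks = new ConcurrentHashMap<>();

    // Called when a new cluster view is installed: drop every lock whose
    // owner is not part of the new membership.
    static void viewAccepted(List<String> newMembers) {
        locks.values().removeIf(owner -> !newMembers.contains(owner));
    }

    public static void main(String[] args) {
        locks.put("index.write.lock", "nodeB");
        locks.put("other.lock", "nodeA");
        // nodeB crashes: the new view no longer contains it.
        viewAccepted(List.of("nodeA", "nodeC"));
        System.out.println(locks.containsKey("index.write.lock")); // prints "false"
        System.out.println(locks.containsKey("other.lock"));       // prints "true"
    }
}
```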
Proposals:
- A1) We don't lock, but find a way to elect one and only one writer
node. This would be very cool, but I have no idea how to implement it.
You could use the cluster view, which is the same across all nodes, and
pick the first element in the list. Or you could run an agreement
protocol, which deterministically elects a master.
- A2) We could store the node address as the value in the marker object;
if the address isn't part of the membership, the lock is cleared. (Can I
trust the members view? The index will corrupt if changed by two
nodes.)
Not so good - don't abuse the cache as a communication bus.
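The view-based election mentioned above is trivial precisely because the view is identical on every node. A minimal sketch (assumed node addresses; real code would use the Address list from the installed view):

```java
import java.util.List;

public class MasterElection {

    // Elect the master writer deterministically: every node holds the same
    // view list, so picking the first member needs no extra coordination.
    static String electMaster(List<String> view) {
        if (view.isEmpty()) {
            throw new IllegalStateException("empty view");
        }
        return view.get(0);
    }

    // Each node can check locally whether it is the master.
    static boolean isMaster(String self, List<String> view) {
        return self.equals(electMaster(view));
    }

    public static void main(String[] args) {
        List<String> view = List.of("A", "B", "C"); // same list on every node
        System.out.println(electMaster(view));      // prints "A"
        System.out.println(isMaster("B", view));    // prints "false"
        // If A leaves, the next view is [B, C] and B takes over as master.
        System.out.println(electMaster(List.of("B", "C"))); // prints "B"
    }
}
```

Because every node evaluates the same function on the same input, they all agree on the master without exchanging a single extra message; failover falls out for free when the view changes.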
Proposal:
- B1) Use JMS or JGroups directly (as Hibernate Search is currently
capable of using both); again I face the problem of electing the
IndexWriter node and having the messages sent to the correct node.
In both cases I would like to receive enough information from
Infinispan to know where to send messages from the queue.
I'd use JGroups (of course I'm biased :-)), but JMS really has no notion
of cluster views with which to elect a master. Plus, you could use
SEQUENCER as a means to establish a global order between messages, so
maybe the need to acquire a lock could be dropped altogether.
--
Bela Ban
Lead JGroups / Clustering Team
JBoss