[jbosscache-dev] JBoss Cache Lucene Directory

Andrew Duckworth Andrew.Duckworth at sandstone.com.au
Wed Apr 29 18:43:54 EDT 2009


Manik,

> It depends.  Are the work objects attributes in /queue/slaveN?
> Remember that the granularity for all locks is the node itself so if
> all slaves write to a single node, they will all compete for the same
> lock.

Each work object is a new child node of the /queue/slaveN node, so based on what you have said, I think all the slave queues can be merged without altering the locking behaviour.
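
For reference, this is roughly the shape of the slave-side write (a minimal sketch against the JBC 3.x API; WorkQueueWriter and the "work"/"timestamp" keys are illustrative names of mine, not the actual code):

  import java.util.UUID;

  import org.jboss.cache.Cache;
  import org.jboss.cache.Fqn;

  // Sketch: one child node per unit of work under /queue/slaveN.
  public class WorkQueueWriter {
      private final Cache<String, Object> cache;
      private final Fqn slaveQueue;

      public WorkQueueWriter(Cache<String, Object> cache, String slaveId) {
          this.cache = cache;
          this.slaveQueue = Fqn.fromString("/queue/" + slaveId);
      }

      public void enqueue(Object workObject) {
          // A fresh child node per work object; with
          // lockParentForChildInsertRemove="false" the insert does not
          // lock /queue/slaveN itself, so writers on the same slave
          // do not serialise on the queue node.
          Fqn workNode = Fqn.fromRelativeElements(slaveQueue,
                  UUID.randomUUID().toString());
          cache.put(workNode, "work", workObject);
          cache.put(workNode, "timestamp", System.currentTimeMillis());
      }
  }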

> It could well be the lock striping at work.  As of JBoss Cache 3.1.0
> you can disable lock striping and have one lock per node.  While this
> can be expensive - a lot of nodes means a lot of locks - if you have
> a small, fixed number of nodes this may help you a lot.

Great, I'll definitely check out this feature.
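
For the archives, I'm guessing this will look something like the following in the 3.1.0 locking configuration (the useLockStriping attribute name is my assumption from the release notes, so check the schema):

  <locking isolationLevel="READ_COMMITTED"
           nodeLockingScheme="mvcc"
           lockParentForChildInsertRemove="false"
           lockAcquisitionTimeout="20000"
           useLockStriping="false"
           concurrencyLevel="500"/>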

> Not quite sure I understand your design - so this distributes the work
> objects and each cluster member maintains indexes locally?  If so, you
> need to know when all members have processed the work objects before
> removing these.

The master node processes all work objects written by the slaves and then updates the JBCDirectory held in the cache to distribute the index back to all the slave nodes (a sketch of the master's merge pass follows the list below). I think this works well for Hibernate Search for a few reasons:

- On each slave, HB Search will continue to use the shared index reader until the master publishes the next version of the index, so fewer index updates translate into faster searching at the expense of returning slightly out-of-date data. There is an obvious trade-off here between search performance and freshness of the index; the current JMS HB Search solution, which is based on file copying, works best when the index can be quite out of date without impacting the application. For our application we'd prefer to have the index up to date within a few seconds of the entity being modified, and it's not a requirement that the index be updated as part of the transaction.

- The index updates are batched, which means each individual update takes less time due to some of the mechanisms at work inside HB Search and Lucene (e.g. amortising the cost of opening and committing the IndexWriter across many changes).

- No distributed locking, so slaves are never blocked, which provides some limited insurance against one node impacting every other node.
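
To make the master/slave flow concrete, here is a rough sketch of the merge pass (illustrative only; MasterMerger, LuceneWorkApplier and the attribute keys are my names, not the real code):

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.Comparator;
  import java.util.List;

  import org.jboss.cache.Cache;
  import org.jboss.cache.Fqn;
  import org.jboss.cache.Node;

  // All Lucene updates happen on the master, so only local VM locking
  // is involved; slaves never block on the merge.
  public class MasterMerger {

      // Stand-in for whatever applies a unit of work through a single
      // shared IndexWriter.
      public interface LuceneWorkApplier {
          void apply(Object work);
      }

      public void mergeAndApply(Cache<String, Object> cache,
                                LuceneWorkApplier applier) {
          Node<String, Object> queue =
                  cache.getRoot().getChild(Fqn.fromString("/queue"));
          if (queue == null) return;

          List<Node<String, Object>> work =
                  new ArrayList<Node<String, Object>>();
          for (Node<String, Object> slave : queue.getChildren()) {
              work.addAll(slave.getChildren());
          }

          // Order by the timestamp the slaves wrote; this is what makes
          // cluster-wide time synchronisation a requirement.
          Collections.sort(work, new Comparator<Node<String, Object>>() {
              public int compare(Node<String, Object> a,
                                 Node<String, Object> b) {
                  return ((Long) a.get("timestamp"))
                          .compareTo((Long) b.get("timestamp"));
              }
          });

          for (Node<String, Object> w : work) {
              applier.apply(w.get("work")); // one IndexWriter per batch
              cache.removeNode(w.getFqn()); // master cleans up afterwards
          }
      }
  }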

> Thanks for offering to contribute.  :-)  One other thing that may be
> of interest is that I just launched Infinispan [1] [2] - a new data
> grid product.  You could implement a directory provider on Infinispan

Great, I'll definitely take a look.
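
From your description of per-key/value locking, I imagine the whole queue structure could collapse to a single flat cache of work objects. A purely hypothetical sketch (class names are my guess from a quick look at the announcement, so treat this as approximate):

  import org.infinispan.Cache;
  import org.infinispan.manager.DefaultCacheManager;

  // With one lock per key, every slave can write into the same cache
  // without per-slave queue nodes: entries on different keys never
  // contend with each other.
  public class InfinispanWorkQueue {
      public static void main(String[] args) {
          DefaultCacheManager manager = new DefaultCacheManager();
          Cache<String, Object> cache = manager.getCache();

          String key = "slave1:" + System.nanoTime();
          cache.put(key, "some unit of index work");
      }
  }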


One more question: does the JDBCCacheLoader integrate with the cache transaction, i.e. does it apply the equivalent DB updates in the same transaction as the updates being applied to the cache? For the current JBCDirectory structure it is important that the parent node is updated in the same transaction as the node holding the file chunk, to avoid the index being left in a corrupt state. Does it matter if the loader is async vs sync?
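
For reference, the shape of loader configuration I have in mind is below (based on my reading of the 3.x schema; the attribute and property names are approximate):

  <loaders passivation="false" shared="true">
    <loader class="org.jboss.cache.loader.JDBCCacheLoader"
            async="false" fetchPersistentState="true"
            ignoreModifications="false" purgeOnStartup="false">
      <properties>
        cache.jdbc.driver=com.mysql.jdbc.Driver
        cache.jdbc.url=jdbc:mysql://localhost/jbc
        cache.jdbc.user=user
        cache.jdbc.password=pass
        cache.jdbc.table.create=true
      </properties>
    </loader>
  </loaders>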

Cheers,

Andrew
________________________________________
From: Manik Surtani [manik at jboss.org]
Sent: Thursday, April 30, 2009 3:54 AM
To: Andrew Duckworth
Cc: jbosscache-dev at lists.jboss.org
Subject: Re: [jbosscache-dev] JBoss Cache Lucene Directory

On 27 Apr 2009, at 05:18, Andrew Duckworth wrote:

> Hello,
>
> I have been working on a Lucene Directory provider based on JBoss
> Cache; my starting point was an implementation Manik had already
> written, which pretty much worked with a few minor tweaks. Our use
> case was to cluster a Lucene index being used with Hibernate Search
> in our application, with the requirements that searching needed to
> be fast, that there was no shared file system, and that the index
> had to be consistent across the cluster within a relatively short
> time frame.
>
> Manik's code used a token node in the cache to implement the
> distributed lock. During my testing I set up multiple cache copies
> with multiple threads reading/writing to each cache copy. I was
> finding that a lot of transactions to acquire or release this lock
> were timing out; not understanding JBC well, I modified the
> distributed lock to use the JGroups DistributedLockManager. This
> worked quite well, however the time taken to acquire/release the
> lock (~100 ms for both) dwarfed the time to process the index
> update, lowering throughput. Even using Hibernate Search with an
> async worker thread, there was still a lot of contention for the
> single lock, which seemed to limit the scalability of the solution.
> I think part of the problem was that our use of HB Search generates
> a lot of small units of work (remove index entry, add index entry),
> and each of these UOWs acquires a new IndexWriter and a new write
> lock on the underlying Lucene Directory implementation.
>
>
> Out of curiosity, I created an alternative implementation based on
> the Hibernate Search JMS clustering strategy. Inside JBoss Cache I
> created a queue node and each slave node in the cluster creates a
> separate queue underneath where indexing work is written:
>
>   /queue/slave1/[work0, work1, work2, ...]
>         /slave2
>         /slave3
>
> etc.
>
> In each cluster member a background thread runs continuously; when
> it wakes up, it decides whether it is the master node or not
> (currently it checks if it is the view coordinator, but I'm
> considering changing it to use a longer-lived distributed lock). If
> it is the master, it merges the tasks from each slave queue and
> updates the JBCDirectory in one go; it can safely do this with only
> local VM locking. This approach means that all the slave nodes can
> write to their queues without needing a global lock that any other
> slave or the master would be using. On the master, it can perform
> multiple updates in the context of a single Lucene index writer.
> With a cache loader configured, work that is written into the slave
> queue is persistent, so it can survive the master node crashing,
> with automatic fail-over to a new master meaning that eventually
> all updates should be applied to the index. Each work element in
> the queue is time-stamped so the master can process them in order
> (this requires time synchronisation across the cluster). For our
> workload the master/slave pattern seems to improve the throughput
> of the system.
>
>
> Currently I'm refining the code and I have a few JBoss Cache
> questions which I hope you can help me with:
>
> 1) I have noticed that under high load I get LockTimeoutExceptions
> writing to /queue/slave0 when the lock owner is a transaction
> working on /queue/slave1, i.e. the same lock seems to be used for
> two unrelated nodes in the cache. I'm assuming this is a result of
> the lock striping algorithm; if you could give me some insight into
> how this works that would be very helpful. Bumping up the cache
> concurrency level from 500 to 2000 seemed to reduce this problem,
> however I'm not sure if it just reduces the probability of a random
> collision or if there is some level that will be sufficient to
> eliminate the issue.

It could well be the lock striping at work.  As of JBoss Cache 3.1.0
you can disable lock striping and have one lock per node.  While this
can be expensive - a lot of nodes means a lot of locks - if you have
a small, fixed number of nodes this may help you a lot.

> 2) Is there a reason to use separate nodes for each slave queue?
> Will it help with locking, or can each slave safely insert to the
> same parent node in separate transactions without interfering or
> blocking each other? If I can reduce it to a single queue I think
> that would be a more elegant solution. I am setting
> lockParentForChildInsertRemove to false for the queue nodes.

It depends.  Are the work objects attributes in /queue/slaveN?
Remember that the granularity for all locks is the node itself so if
all slaves write to a single node, they will all compete for the same
lock.

> 3) Similarly, is there any reason why the master should/shouldn't
> take responsibility for removing work nodes that have been processed?

Not quite sure I understand your design - so this distributes the work
objects and each cluster member maintains indexes locally?  If so, you
need to know when all members have processed the work objects before
removing these.

> Thanks in advance for your help. I hope to make this solution
> general-purpose enough to be able to contribute it back to the
> Hibernate Search and JBC teams.

Thanks for offering to contribute.  :-)  One other thing that may be
of interest is that I just launched Infinispan [1] [2] - a new data
grid product.  You could implement a directory provider on Infinispan
too - it is a lot more efficient than JBC at many things, including
concurrency.  Also, Infinispan's lock granularity is per-key/value
pair.  So a single distributed cache would be all you need for work
objects.  Another thing that could help is the eager locking we have
on the roadmap [3], which may make a more traditional approach of
locking + writing indexes to the cache more feasible.  I'd encourage
you to check it out.

[1] http://www.infinispan.org
[2] http://infinispan.blogspot.com/2009/04/infinispan-start-of-new-era-in-open.html
[3] https://jira.jboss.org/jira/browse/ISPN-48
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org