[hibernate-dev] Re: [jbosscache-dev] JBoss Cache Lucene Directory

Mon May 25 05:53:35 EDT 2009

Hello,
I'm forwarding this email to Emmanuel and Hibernate Search dev, as I
believe we should join the discussion.
Could we keep both dev-lists (jbosscache-dev at lists.jboss.org,
hibernate-dev at lists.jboss.org ) on CC ?

Sanne

2009/4/29 Manik Surtani <manik at jboss.org>:
>
> On 27 Apr 2009, at 05:18, Andrew Duckworth wrote:
>
>> Hello,
>>
>> I have been working on a Lucene Directory provider based on JBoss Cache.
>> My starting point was an implementation Manik had already written, which
>> pretty much worked with a few minor tweaks. Our use case was to cluster a
>> Lucene index being used with Hibernate Search in our application, with the
>> requirements that searching needed to be fast, there was no shared file
>> system, and the index had to become consistent across the cluster
>> in a relatively short time frame.
>>
>> Manik's code used a token node in the cache to implement the distributed
>> lock. During my testing I set up multiple cache copies with multiple threads
>> reading/writing to each cache copy. I was finding that a lot of transactions
>> to acquire or release this lock were timing out. Not understanding JBC well,
>> I modified the distributed lock to use the JGroups DistributedLockManager.
>> This worked quite well; however, the time taken to acquire/release the lock
>> (~100 ms for both) dwarfed the time to process the index update, lowering
>> throughput. Even using Hibernate Search with an async worker thread, there
>> was still a lot of contention for the single lock, which seemed to limit the
>> scalability of the solution. I think part of the problem was that our use
>> of Hibernate Search generates a lot of small units of work (remove index
>> entry, add index entry), and each of these units acquires a new IndexWriter
>> and a new write lock on the underlying Lucene Directory implementation.
>>
>>
>> Out of curiosity, I created an alternative implementation based on the
>> Hibernate Search JMS clustering strategy. Inside JBoss Cache I created a
>> queue node and each slave node in the cluster creates a separate queue
>> underneath where indexing work is written:
>>
>>  /queue/slave1/[work0, work1, work2 ....]
>>            /slave2
>>            /slave3
>>
>> etc
>>
>> In each cluster member a background thread runs continuously. When it wakes
>> up, it decides whether it is the master node (currently it checks whether it
>> is the view coordinator, but I'm considering changing it to use a
>> longer-lived distributed lock). If it is the master, it merges the tasks
>> from each slave queue and updates the JBCDirectory in one go; it can safely
>> do this with only local VM locking. This approach means that the slave nodes
>> can write to their queues without needing a global lock that any other slave
>> or the master would be using. On the master, it can perform multiple updates
>> in the context of a single Lucene index writer. With a cache loader
>> configured, work that is written into the slave queue is persistent, so it
>> can survive the master node crashing, with automatic failover to a new
>> master, meaning that eventually all updates should be applied to the index.
>> Each work element in the queue is timestamped to allow the master to process
>> them in order (this requires time synchronisation across the cluster). For
>> our workload the master/slave pattern seems to improve the throughput of the
>> system.
>>
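For readers following the thread, here is a minimal Java sketch of the merge step described above: the master gathers the work items from every slave queue and orders them by timestamp before applying them. It is only an illustration. The Work class, its fields and the mergeInOrder method are hypothetical names, plain collections stand in for the /queue/slaveN cache nodes, and none of this is taken from Andrew's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueueMergeSketch {

    /** Hypothetical work item; the real queue entries are not shown in this thread. */
    public static final class Work {
        public final long timestamp; // assigned by the slave that queued the work
        public final String slave;   // name of the originating queue, e.g. "slave1"
        public final String action;  // placeholder for the actual index operation

        public Work(long timestamp, String slave, String action) {
            this.timestamp = timestamp;
            this.slave = slave;
            this.action = action;
        }
    }

    /** Collects every slave queue into one list ordered by timestamp. */
    public static List<Work> mergeInOrder(Map<String, List<Work>> queues) {
        List<Work> merged = new ArrayList<>();
        for (List<Work> queue : queues.values()) {
            merged.addAll(queue);
        }
        // Ties on timestamp are broken by slave name so the order is deterministic.
        merged.sort(Comparator.comparingLong((Work w) -> w.timestamp)
                .thenComparing(w -> w.slave));
        return merged;
    }

    public static void main(String[] args) {
        // Plain maps and lists stand in for the per-slave queue nodes.
        Map<String, List<Work>> queues = new HashMap<>();
        queues.put("slave1", Arrays.asList(
                new Work(10L, "slave1", "remove index entry"),
                new Work(30L, "slave1", "add index entry")));
        queues.put("slave2", Arrays.asList(
                new Work(20L, "slave2", "add index entry")));

        for (Work w : mergeInOrder(queues)) {
            System.out.println(w.timestamp + " " + w.slave + " " + w.action);
        }
    }
}
```

In the design described in the mail, the master would then apply the merged list through a single index writer; that part is left out here. The ordering is only as good as the clock synchronisation between the nodes.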
>>
>> Currently I'm refining the code and I have a few JBoss Cache questions
>> which I hope you can help me with:
>>
>> 1) I have noticed that under high load I get LockTimeoutExceptions writing
>> to /queue/slave0 when the lock owner is a transaction working on
>> /queue/slave1, i.e. the same lock seems to be used for 2 unrelated nodes in
>> the cache. I'm assuming this is a result of the lock striping algorithm; if
>> you could give me some insight into how this works, that would be very
>> helpful. Bumping up the cache concurrency level from 500 to 2000 seemed to
>> reduce this problem; however, I'm not sure if it just reduces the probability
>> of a random event or if there is some level that will be sufficient to
>> eliminate the issue.
>
> It could well be the lock striping at work.  As of JBoss Cache 3.1.0 you can
> disable lock striping and have one lock per node.  While this is expensive
> in that if you have a lot of nodes, you end up with a lot of locks, if you
> have a finite number of nodes this may help you a lot.
>
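To illustrate why two unrelated nodes can end up contending for the same lock, here is a generic Java sketch of lock striping: a fixed pool of locks, with each node path hashed onto one of them. This is not JBoss Cache's implementation. The class and method names are hypothetical, and the hash function JBC really uses may differ.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class StripedLockSketch {

    private final ReentrantLock[] stripes;

    /** concurrencyLevel here simply means "number of shared locks" (an assumption). */
    public StripedLockSketch(int concurrencyLevel) {
        stripes = new ReentrantLock[concurrencyLevel];
        for (int i = 0; i < stripes.length; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Index of the shared lock that guards the given node path. */
    public int stripeFor(String nodePath) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(nodePath.hashCode(), stripes.length);
    }

    public ReentrantLock lockFor(String nodePath) {
        return stripes[stripeFor(nodePath)];
    }

    public static void main(String[] args) {
        // Two stripes and three nodes: at least two nodes must share a lock.
        StripedLockSketch locks = new StripedLockSketch(2);
        String[] paths = {"/queue/slave0", "/queue/slave1", "/queue/slave2"};
        Set<Integer> used = new HashSet<>();
        for (String path : paths) {
            int stripe = locks.stripeFor(path);
            used.add(stripe);
            System.out.println(path + " -> stripe " + stripe);
        }
        System.out.println(paths.length + " nodes share " + used.size() + " lock(s)");
    }
}
```

With a scheme like this, raising the number of stripes makes a collision between two given nodes less likely but does not rule it out, because the mapping depends only on the hash. One lock per node, as suggested above, removes the sharing entirely.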
>> 2) Is there a reason to use separate nodes for each slave queue? Will it
>> help with locking, or can each slave safely insert into the same parent node
>> in separate transactions without interfering with or blocking each other? If
>> I can reduce it to a single queue, I think that would be a more elegant
>> solution. I am setting lockParentForChildInsertRemove to false for the
>> queue nodes.
>
> It depends.  Are the work objects attributes in /queue/slaveN ?  Remember
> that the granularity for all locks is the node itself so if all slaves write
> to a single node, they will all compete for the same lock.
>
>> 3) Similarly, is there any reason why the master should/shouldn't take
>> responsibility for removing work nodes that have been processed ?
>
> Not quite sure I understand your design - so this distributes the work
> objects and each cluster member maintains indexes locally?  If so, you need
> to know when all members have processed the work objects before removing
> these.
>
>> Thanks in advance for help, I hope to make this solution general purpose
>> enough to be able to contribute back to Hibernate Search and JBC teams.
>
> Thanks for offering to contribute.  :-)  One other thing that may be of
> interest is that I just launched Infinispan [1] [2] - a new data grid
> product.  You could implement a directory provider on Infinispan too - it is
> a lot more efficient than JBC at many things, including concurrency.  Also,
> Infinispan's lock granularity is per-key/value pair.  So a single
> distributed cache would be all you need for work objects.  Also, another
> thing that could help is the eager locking we have on the roadmap [3] which
> may make a more traditional approach of locking + writing indexes to the
> cache more feasible.  I'd encourage you to check it out.
>
> [1] http://www.infinispan.org
> [2]
> http://infinispan.blogspot.com/2009/04/infinispan-start-of-new-era-in-open.html
> [3] https://jira.jboss.org/jira/browse/ISPN-48
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org