[hibernate-dev] Re: [jbosscache-dev] JBoss Cache Lucene Directory

Manik Surtani manik at jboss.org
Tue May 26 10:37:24 EDT 2009


Sanne,

Agreed.  Could all involved please make sure we post to both hibernate-
dev and infinispan-dev (rather than jbosscache-dev) when discussing
anything to do with this integration work, as there are parallel
efforts which could be brought together.

Cheers
Manik

On 25 May 2009, at 10:53, Sanne Grinovero wrote:

> Hello,
> I'm forwarding this email to Emmanuel and Hibernate Search dev, as I
> believe we should join the discussion.
> Could we keep both dev lists (jbosscache-dev at lists.jboss.org,
> hibernate-dev at lists.jboss.org) on CC?
>
> Sanne
>
> 2009/4/29 Manik Surtani <manik at jboss.org>:
>>
>> On 27 Apr 2009, at 05:18, Andrew Duckworth wrote:
>>
>>> Hello,
>>>
>>> I have been working on a Lucene Directory provider based on JBoss Cache.
>>> My starting point was an implementation Manik had already written, which
>>> pretty much worked with a few minor tweaks. Our use case was to cluster a
>>> Lucene index being used with Hibernate Search in our application, with the
>>> requirements that searching needed to be fast, there was no shared file
>>> system, and the index had to become consistent across the cluster within a
>>> relatively short time frame.
>>>
>>> Manik's code used a token node in the cache to implement the distributed
>>> lock. During my testing I set up multiple cache copies with multiple
>>> threads reading/writing to each cache copy. I was finding that a lot of
>>> transactions trying to acquire or release this lock were timing out; not
>>> understanding JBC well, I modified the distributed lock to use the JGroups
>>> DistributedLockManager. This worked quite well, however the time taken to
>>> acquire/release the lock (~100 ms for both) dwarfed the time to process
>>> the index update, lowering throughput. Even using Hibernate Search with an
>>> async worker thread, there was still a lot of contention for the single
>>> lock, which seemed to limit the scalability of the solution. I think part
>>> of the problem is that our use of HB Search generates a lot of small units
>>> of work (remove index entry, add index entry), and each of these UOWs
>>> acquires a new IndexWriter and a new write lock on the underlying Lucene
>>> Directory implementation.
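>>>
>>> To show the shape of that cost, each unit of work effectively does
>>> something like the following (just a sketch, not Hibernate Search's
>>> actual code; the parameters stand in for whatever HB Search passes in):
>>>
>>>   import org.apache.lucene.analysis.Analyzer;
>>>   import org.apache.lucene.document.Document;
>>>   import org.apache.lucene.index.IndexWriter;
>>>   import org.apache.lucene.index.Term;
>>>   import org.apache.lucene.store.Directory;
>>>
>>>   // A fresh IndexWriter, and therefore a fresh Lucene write lock on the
>>>   // Directory, is taken and released for every small add/remove.
>>>   void applyOneUnitOfWork(Directory dir, Analyzer analyzer,
>>>                           String entityId, Document doc) throws Exception {
>>>       IndexWriter writer = new IndexWriter(dir, analyzer,
>>>               IndexWriter.MaxFieldLength.UNLIMITED);
>>>       writer.deleteDocuments(new Term("id", entityId)); // remove index entry
>>>       writer.addDocument(doc);                          // add index entry
>>>       writer.close();                                   // releases the write lock
>>>   }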
>>>
>>>
>>> Out of curiosity, I created an alternative implementation based on the
>>> Hibernate Search JMS clustering strategy. Inside JBoss Cache I created a
>>> queue node, and each slave in the cluster creates a separate queue
>>> underneath it where indexing work is written:
>>>
>>>   /queue/slave1/[work0, work1, work2 ....]
>>>         /slave2
>>>         /slave3
>>>
>>> etc.
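>>>
>>> The slave-side append is roughly this (a sketch only, assuming a JBoss
>>> Cache 3.x Cache<String, Object>; the names are illustrative):
>>>
>>>   import org.jboss.cache.Cache;
>>>   import org.jboss.cache.Fqn;
>>>
>>>   // Each slave writes work items as attributes of its own queue node,
>>>   // so slaves never touch (or lock) each other's nodes.
>>>   void enqueue(Cache<String, Object> cache, String slaveId,
>>>                String workId, Object work) {
>>>       Fqn slaveQueue = Fqn.fromString("/queue/" + slaveId);
>>>       // workId is a timestamp-based, fixed-width id so it sorts in
>>>       // submission order when the master merges the queues
>>>       cache.put(slaveQueue, workId, work);
>>>   }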
>>>
>>> In each cluster member a background thread runs continuously. When it
>>> wakes up, it decides whether it is the master node or not (currently it
>>> checks whether it is the view coordinator, but I'm considering changing
>>> it to use a longer-lived distributed lock). If it is the master, it merges
>>> the tasks from each slave queue and updates the JBCDirectory in one go; it
>>> can safely do this with only local VM locking. This approach means that
>>> all the slave nodes can write to their own queues without needing a global
>>> lock shared with any other slave or with the master. On the master,
>>> multiple updates can be performed in the context of a single Lucene index
>>> writer. With a cache loader configured, work written into the slave queues
>>> is persistent, so it can survive the master node crashing, with automatic
>>> failover to a new master, meaning that eventually all updates should be
>>> applied to the index. Each work element in the queue is timestamped so the
>>> master can process them in order (this requires time synchronisation
>>> across the cluster). For our workload the master/slave pattern seems to
>>> improve the throughput of the system.
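>>>
>>> A single master pass looks roughly like this (a sketch only: no error
>>> handling, it ignores work added mid-pass, and applyToIndex() is just a
>>> placeholder for the Lucene add/delete calls):
>>>
>>>   import java.util.Map;
>>>   import java.util.TreeMap;
>>>   import org.apache.lucene.index.IndexWriter;
>>>   import org.jboss.cache.Cache;
>>>   import org.jboss.cache.Fqn;
>>>   import org.jboss.cache.Node;
>>>
>>>   void masterPass(Cache<String, Object> cache, IndexWriter writer)
>>>           throws Exception {
>>>       // crude master check: the first member of the view is the coordinator
>>>       if (!cache.getLocalAddress().equals(cache.getMembers().get(0))) {
>>>           return;
>>>       }
>>>       Node<String, Object> queue = cache.getNode(Fqn.fromString("/queue"));
>>>       if (queue == null) {
>>>           return;
>>>       }
>>>       // merge every slave's work, ordered by the timestamp-based keys
>>>       Map<String, Object> merged = new TreeMap<String, Object>();
>>>       for (Node<String, Object> slave : queue.getChildren()) {
>>>           merged.putAll(slave.getData());
>>>       }
>>>       // apply everything under one IndexWriter / one Lucene write lock
>>>       for (Object work : merged.values()) {
>>>           applyToIndex(writer, work);
>>>       }
>>>       writer.commit();
>>>       // remove only the keys we actually processed from each slave queue
>>>       for (Node<String, Object> slave : queue.getChildren()) {
>>>           for (String key : merged.keySet()) {
>>>               slave.remove(key);
>>>           }
>>>       }
>>>   }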
>>>
>>>
>>> Currently I'm refining the code and I have a few JBoss Cache  
>>> questions
>>> which I hope you can help me with:
>>>
>>> 1) I have noticed that under high load I get LockTimeoutExceptions when
>>> writing to /queue/slave0 while the lock owner is a transaction working on
>>> /queue/slave1, i.e. the same lock seems to be used for two unrelated nodes
>>> in the cache. I'm assuming this is a result of the lock striping algorithm;
>>> if you could give me some insight into how this works, that would be very
>>> helpful. Bumping up the cache concurrency level from 500 to 2000 seemed to
>>> reduce this problem, however I'm not sure whether it just reduces the
>>> probability of a collision or whether there is some level that will be
>>> sufficient to eliminate the issue.
>>
>> It could well be the lock striping at work.  As of JBoss Cache 3.1.0 you
>> can disable lock striping and have one lock per node.  This is expensive
>> if you have a lot of nodes (you end up with a lot of locks), but if you
>> have a bounded number of nodes it may help you a lot.
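>>
>> Roughly, in the programmatic config (from memory, so please double-check
>> the names against the 3.1.0 configuration reference; the lock striping
>> setter in particular is assumed here):
>>
>>   import org.jboss.cache.config.Configuration;
>>
>>   Configuration cfg = new Configuration();
>>   cfg.setNodeLockingScheme(Configuration.NodeLockingScheme.MVCC);
>>   cfg.setConcurrencyLevel(2000);   // number of shared lock stripes
>>   // assumed setter for the 3.1.0 option to switch striping off entirely:
>>   cfg.setUseLockStriping(false);   // one lock per node instead of stripes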
>>
>>> 2) Is there a reason to use separate nodes for each slave queue? Will it
>>> help with locking, or can each slave safely insert into the same parent
>>> node in separate transactions without interfering with or blocking each
>>> other? If I can reduce it to a single queue I think that would be a more
>>> elegant solution. I am setting lockParentForChildInsertRemove to false for
>>> the queue nodes.
>>
>> It depends.  Are the work objects attributes in /queue/slaveN?  Remember
>> that the granularity for all locks is the node itself, so if all slaves
>> write to a single node, they will all compete for the same lock.
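>>
>> If you do want a single queue, one option is to make each work item its
>> own child node under /queue, so an insert only ever locks the new child
>> (a sketch, assuming lockParentForChildInsertRemove=false on /queue and
>> the same illustrative names as before):
>>
>>   import org.jboss.cache.Cache;
>>   import org.jboss.cache.Fqn;
>>
>>   // One child node per work item under a single /queue parent; slaves
>>   // never compete for the lock of one shared data node.
>>   void enqueue(Cache<String, Object> cache, String slaveId,
>>                String workId, Object work) {
>>       Fqn item = Fqn.fromString("/queue/" + slaveId + "-" + workId);
>>       cache.put(item, "work", work);   // creates the node if it is absent
>>   }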
>>
>>> 3) Similarly, is there any reason why the master should or shouldn't take
>>> responsibility for removing work nodes that have been processed?
>>
>> Not quite sure I understand your design - so this distributes the work
>> objects and each cluster member maintains its indexes locally?  If so, you
>> need to know when all members have processed the work objects before
>> removing them.
>>
>>> Thanks in advance for the help. I hope to make this solution general
>>> purpose enough to be able to contribute back to the Hibernate Search and
>>> JBC teams.
>>
>> Thanks for offering to contribute.  :-)  One other thing that may be of
>> interest is that I just launched Infinispan [1] [2] - a new data grid
>> product.  You could implement a directory provider on Infinispan too - it
>> is a lot more efficient than JBC at many things, including concurrency.
>> Infinispan's lock granularity is per key/value pair, so a single
>> distributed cache would be all you need for the work objects (rough
>> sketch below the links).  Another thing that could help is the eager
>> locking we have on the roadmap [3], which may make a more traditional
>> approach of locking + writing indexes to the cache more feasible.  I'd
>> encourage you to check it out.
>>
>> [1] http://www.infinispan.org
>> [2]
>> http://infinispan.blogspot.com/2009/04/infinispan-start-of-new-era-in-open.html
>> [3] https://jira.jboss.org/jira/browse/ISPN-48
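>>
>> For example, all the slaves could write work objects straight into one
>> distributed cache, since each put() only locks its own key (a rough
>> sketch against the new API; cache and variable names are illustrative):
>>
>>   import org.infinispan.Cache;
>>   import org.infinispan.manager.DefaultCacheManager;
>>
>>   public class WorkQueueSketch {
>>       // One distributed cache of work objects; per-key locking means the
>>       // slaves never block each other or the master.
>>       private final Cache<String, Object> workQueue =
>>               new DefaultCacheManager().getCache("luceneWork");
>>
>>       public void enqueue(String slaveId, String workId, Object work) {
>>           workQueue.put(slaveId + "-" + workId, work);
>>       }
>>   }
>>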
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>>
>>
>>
>>
>> _______________________________________________
>> jbosscache-dev mailing list
>> jbosscache-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
>>

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org