[infinispan-dev] Feedback on Infinispan patch

Manik Surtani manik at jboss.org
Thu Sep 24 12:41:43 EDT 2009


On 24 Sep 2009, at 17:36, Jeff Ramsdale wrote:

> Another alternative would be to see if the Compass Project would be
> interested in hosting it: http://www.compass-project.org/

Good idea.  Anybody have traction with the compass folk to propose this?

> Even if Infinispan ends up hosting it there might be value in doing
> some cross-pollination with the Compass folks since this aligns
> directly with what they are working on.
>
> -jeff
>
> 2009/9/24 Manik Surtani <manik at jboss.org>:
>> Minorly off topic, but rather than working with patches, do we want this
>> Directory impl in source control somewhere?
>> Since it depends on LGPL code, it won't be accepted into Lucene's contribs.
>> If it doesn't depend on any Hibernate Search code, I could host it in
>> Infinispan's SVN repo...
>>
>> On 23 Sep 2009, at 13:58, Łukasz Moreń wrote:
>>
>> I agree that the Infinispan case is not much different from RAMDirectory.
>> The major difference is that in RD (and also FileDirectory) changes are
>> not batched like in ID. If I do not wrap changes in InfinispanDirectory
>> (simply remove tx.begin() from the obtain() method and tx.commit() from
>> release() in InfinispanLock), and immediately commit every change made by
>> the IW, it works well. However, it makes indexing much slower, because of
>> frequent replication to other nodes.
>> Sanne, it's a good remark that an IW commit is a kind of flush.
>>
>> I've attached a patch with InfinispanDirectory; the failing test is
>> testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class. It
>> fails randomly. I think the problem is that the Infinispan commit on
>> lockRelease() in org.apache.lucene.index.IndexWriter (line 1658) happens
>> after the IW commit() (line 1654).
>>
>>> Is it because the IndexWriter only cleans files if no IndexReaders are
>>> reading them (how would that be detected)?
>>
>> It can happen if the IndexWriter cleans a file and an IndexReader tries to
>> access that cleaned file.
>>
>> 2009/9/23 Sanne Grinovero <sanne.grinovero at gmail.com>
>>>
>>> I agree it should work the same way. The IndexWriter cleans files
>>> whenever it likes to; it doesn't try to detect readers, and this
>>> shouldn't have any effect on the working of readers.
>>> The IndexReader opens the "SegmentsInfo" first, and immediately
>>> after** gets a reference to the segments listed in this SegmentsInfo.
>>> No IndexWriter will ever change an existing segment; it only adds new
>>> files or possibly deletes old ones (segment merge, optimize).
>>> The deletion of segments is the interesting subject: when using files
>>> it uses "delete at last close", which works because the IR needing it
>>> has it opened already**; when using the RAMDirectory the readers hold a
>>> reference preventing garbage collection.
>>>
>>> (the two "**" assume the same event occurred correctly;
>>> otherwise an exception is thrown at opening)
>>>
>>> When using Infinispan it shouldn't be much different from the
>>> RAMDirectory? So even if the needed segment is deleted, the IR holds a
>>> reference to the Java object locally since it was opened.
>>>
>>> Łukasz, do you have a failing test?
>>>
>>> Sanne
>>>
>>> 2009/9/23 Emmanuel Bernard <emmanuel at hibernate.org>:
>>>> Conceptually I don't understand why it works in a pure file system
>>>> directory (i.e. the IndexReader can go and process queries while the
>>>> IndexWriter goes about its business) and not when using Infinispan.
>>>> Is it because the IndexWriter only cleans files if no IndexReaders are
>>>> reading them (how would that be detected)?
>>>> On 22 Sep 09, at 20:46, Łukasz Moreń wrote:
>>>>
>>>> I need to provide the same lifecycle for the IndexWriter as for the
>>>> Infinispan tx: when the IW is created, the tx is started; when the IW is
>>>> committed, the tx is committed. This ensures that the IndexReader doesn't
>>>> read old data from the directory.
>>>> The Infinispan transaction can be started when the IW acquires the lock,
>>>> but committing it on IW lock release, as is done so far, causes a problem:
>>>>
>>>> index writer close {
>>>>   index writer commit(); // changes are visible to IndexReaders
>>>>
>>>>   // IndexReader starts reading here, i.e. tries to access file "A"
>>>>
>>>>   index writer lockRelease(); // changes in the Infinispan directory are
>>>>                               // committed; file "A" was removed, so the
>>>>                               // IndexReader cannot find it and crashes
>>>> }
>>>>
>>>> I think the Infinispan tx has to be committed just before the IW commit,
>>>> and the problem is where to put that in the code.
>>>>
>>>> On 22 September 2009 18:24, Emmanuel Bernard
>>>> <emmanuel at hibernate.org> wrote:
>>>>>
>>>>> Can you explain in more detail what is going on?
>>>>> Aside from that, Workspace has been Sanne's baby lately, so he will be
>>>>> the best to see what design will work in HSearch. That being said, I
>>>>> don't like the idea of subclassing / overriding very much. In my
>>>>> experience, it has led to more bad and unmaintainable code than
>>>>> anything else.
>>>>> On 22 Sep 09, at 02:16, Łukasz Moreń wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for the explanation.
>>>>> Maybe it is better if I concentrate on the first release and postpone
>>>>> distributed writing.
>>>>>
>>>>> There is already a LockStrategy that uses Infinispan. Using it, I was
>>>>> wrapping changes made by the IndexWriter in an Infinispan transaction
>>>>> for performance reasons: on lock obtain the transaction was started, and
>>>>> on lock release the transaction was committed. However, an Ispn
>>>>> transaction commit on lock release is not a good idea, since the
>>>>> IndexWriter calls the index commit before the lock is released (and the
>>>>> Ispn transaction is committed).
>>>>> I was thinking of overriding the Workspace class and its getIndexWriter
>>>>> (start Infinispan tx) and commitIndexWriter (commit tx) methods to wrap
>>>>> the IndexWriter lifecycle, but this needs a few other changes. Any other
>>>>> ideas?
>>>>>
>>>>> Cheers,
>>>>> Lukasz
>>>>>
>>>>> 2009/9/21 Sanne Grinovero <sanne.grinovero at gmail.com>
>>>>>>
>>>>>> Hi Łukasz,
>>>>>> your concerns are justified, because the way the IndexWriter tries to
>>>>>> acquire the lock will bring some trouble. As far as I remember, we
>>>>>> decided to avoid multiple writer nodes in this first release for this
>>>>>> reason (is that written in your docs?)
>>>>>>
>>>>>> Actually it shouldn't be very hard to do, as the LockStrategy is
>>>>>> pluggable (see changes from HSEARCH-345), and you could implement one
>>>>>> delegating to an Infinispan eager lock on some key, like the default
>>>>>> LockStrategy takes a file lock in the index directory.
>>>>>>
>>>>>> Maybe it's simpler to support this distributed writing instead of
>>>>>> sending the queue to some single (elected) node? It would be cool, as
>>>>>> the Document Analysis effort would be distributed, but I have no idea
>>>>>> whether this would be more or less efficient than a single node
>>>>>> writing; it could bring some huge data transfers along the wire during
>>>>>> segment merging (basically fetching the whole index data at each node
>>>>>> performing a segment merge). Maybe you'll need to play with the
>>>>>> IndexWriter settings
>>>>>> (http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#lucene-indexing-performance);
>>>>>> you'll probably need to find the sweet spot for "merge_factor".
>>>>>> I just saw that MergePolicy is now re-implementable, but I hope
>>>>>> that won't be needed.
>>>>>>
>>>>>> Sanne
>>>>>>
>>>>>> 2009/9/21 Łukasz Moreń <lukasz.moren at gmail.com>:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm wondering whether it is reasonable to have multiple threads/nodes
>>>>>>> that modify indexes in a Lucene Directory based on Infinispan? Let's
>>>>>>> assume that two nodes try to update the index at the same time. The
>>>>>>> first one creates an IndexWriter and obtains the write lock. There is a
>>>>>>> high probability that the second node throws LockObtainFailedException
>>>>>>> (as only one IndexWriter is allowed on a single index) and the index is
>>>>>>> not modified. How should that be handled? Should there always be only
>>>>>>> one node that makes changes in the index?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Lukasz
>>>>>>>
>>>>>>> On 15 September 2009 01:39, Łukasz Moreń
>>>>>>> <lukasz.moren at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Using JMeter, I wanted to check that the Infinispan directory does
>>>>>>>> not crash under heavy load in "real" use, and to check its
>>>>>>>> performance compared with none/other directories.
>>>>>>>> However, a problem appeared when multiple IndexWriters try to modify
>>>>>>>> the index (test InfinispanDirectoryTest): random deadlocks and Lucene
>>>>>>>> exceptions. The IndexWriter tries to access files in the index that
>>>>>>>> were removed before. I'm looking into it, but don't have a good idea
>>>>>>>> yet.
>>>>>>>>
>>>>>>>> Concerning the last part, I think a similar thing is done in
>>>>>>>> InfinispanDirectoryProviderTest. Many threads are making changes and
>>>>>>>> searching (not checking whether the db is in sync with the index).
>>>>>>>> When the threads finish their work, I check with a Lucene query
>>>>>>>> whether the index contains as many results as expected. Maybe you
>>>>>>>> meant something else?
>>>>>>>> It would be good to run each node in a different VM.
>>>>>>>>
>>>>>>>>> Great! Looking forward to it. What state are things in at the moment
>>>>>>>>> if I want to play around with it?
>>>>>>>>
>>>>>>>> It should work with one master (updates the index) and one or many
>>>>>>>> slave nodes (send changes to the master). I tried with one master and
>>>>>>>> one slave (with both the JMS and JGroups backends) and it worked OK.
>>>>>>>> It still fails if multiple nodes want to modify the index.
>>>>>>>>
>>>>>>>> I've attached a patch with the current version.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Łukasz
>>>>>>>>
>>>>>>>> 2009/9/13 Michael Neale <michael.neale at gmail.com>
>>>>>>>>>
>>>>>>>>> Great! Looking forward to it. What state are things in at the moment
>>>>>>>>> if I want to play around with it?
>>>>>>>>>
>>>>>>>>> Sent from my phone.
>>>>>>>>>
>>>>>>>>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
>>>>>>>>> <sanne.grinovero at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> 2009/9/12 Michael Neale <michael.neale at gmail.com>:
>>>>>>>>>>> That does sound pretty cool. It would be nice if the Lucene indexes
>>>>>>>>>>> could scale along with how people will want to use Infinispan.
>>>>>>>>>>> Probably worth playing with.
>>>>>>>>>>
>>>>>>>>>> Sure, this is the goal of Łukasz's work; we know Compass has
>>>>>>>>>> some good Directories, but we're building our own, as one based
>>>>>>>>>> on Infinispan is not yet available.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sent from my phone.
>>>>>>>>>>>
>>>>>>>>>>> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
>>>>>>>>>>> <jeff.ramsdale at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm afraid I haven't followed the Infinispan-Lucene implementation
>>>>>>>>>>>> closely, but have you looked at the Compass Project?
>>>>>>>>>>>> (http://www.compass-project.org/overview.html) It provides a
>>>>>>>>>>>> simplified interface to Lucene (optional) as well as Directory
>>>>>>>>>>>> implementations built on Terracotta, Gigaspaces and Coherence. The
>>>>>>>>>>>> latter, in particular, might be a useful guide for the Infinispan
>>>>>>>>>>>> implementation. I believe it's mature enough to have solved many of
>>>>>>>>>>>> the most difficult problems of implementing Directory on a
>>>>>>>>>>>> distributed Map.
>>>>>>>>>>>>
>>>>>>>>>>>> If someone has any experience with Compass (particularly its
>>>>>>>>>>>> Directory implementations) I'd be interested in hearing about it...
>>>>>>>>>>>> It's Apache 2.0 licensed, btw.
>>>>>>>>>>>>
>>>>>>>>>>>> -jeff
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> infinispan-dev mailing list
>>>>>>>>>>>> infinispan-dev at lists.jboss.org
>>>>>>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
>> <InfinispanDirectoryProvider_22_09_2009.patch>
>>
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>>
>>
>>
>>
>>
>
