[infinispan-dev] Feedback on Infinispan patch
Manik Surtani
manik at jboss.org
Thu Sep 24 12:41:43 EDT 2009
On 24 Sep 2009, at 17:36, Jeff Ramsdale wrote:
> Another alternative would be to see if the Compass Project would be
> interested in hosting it: http://www.compass-project.org/
Good idea. Does anybody have traction with the Compass folk to propose this?
> Even if Infinispan ends up hosting it there might be value in doing
> some cross-pollination with the Compass folks since this aligns
> directly with what they are working on.
> -jeff
> 2009/9/24 Manik Surtani <manik at jboss.org>:
>> Slightly off topic, but rather than working with patches, do we want this
>> Directory impl in source control somewhere?
>> Being dependent on LGPL, it won't be accepted into Lucene's contribs. If it
>> doesn't depend on any Hibernate Search code, I could host it in Infinispan's
>> SVN repo...
>> On 23 Sep 2009, at 13:58, Łukasz Moreń wrote:
>> I agree that the Infinispan case is not much different from RAMDirectory. The
>> major difference is that in RAMDirectory (and also in the file-based
>> Directory) changes are not batched as they are in InfinispanDirectory. If I
>> do not wrap the changes in InfinispanDirectory (simply remove tx.begin() from
>> the obtain() method and tx.commit() from release() in InfinispanLock) and
>> immediately commit every change made by the IndexWriter (IW), it works well.
>> However, it makes indexing much slower because of the frequent replication
>> to other nodes.
>> Sanne, it's a good remark that an IW commit is a kind of flush.
>> I've attached a patch with InfinispanDirectory; the failing test is
>> testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class. It
>> fails randomly. I think the problem is that the Infinispan commit on
>> lockRelease() in org.apache.lucene.index.IndexWriter (line 1658) comes after
>> the IW commit() (line 1654).
>>> Is it because the IndexWriter only cleans files if no IndexReaders are
>>> reading them (how would that be detected)?
>> It can happen if the IndexWriter cleans a file and an IndexReader then tries
>> to access that cleaned file.
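The batching trade-off described above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyReplicatedStore and ReplicationCountDemo are hypothetical names, no Infinispan API is used, and "replication" is reduced to a counter, with the simplification that a committed batch is sent to the other nodes once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: counts "replication" events; not an Infinispan class.
class ToyReplicatedStore {
    private final Map<String, byte[]> local = new HashMap<>();
    private final List<Runnable> pending = new ArrayList<>();
    private boolean inBatch = false;
    private int replications = 0;

    void begin() { inBatch = true; }

    void put(String name, byte[] data) {
        Runnable change = () -> local.put(name, data);
        if (inBatch) {
            pending.add(change);   // buffered until commit()
        } else {
            change.run();
            replications++;        // every single change is replicated at once
        }
    }

    void commit() {
        for (Runnable change : pending) {
            change.run();
        }
        if (!pending.isEmpty()) {
            replications++;        // assumption: the whole batch is sent once
        }
        pending.clear();
        inBatch = false;
    }

    int replications() { return replications; }
}

class ReplicationCountDemo {
    // Performs the given number of writes and returns the replication count.
    static int run(boolean batched, int writes) {
        ToyReplicatedStore store = new ToyReplicatedStore();
        if (batched) store.begin();
        for (int i = 0; i < writes; i++) {
            store.put("chunk-" + i, new byte[] {(byte) i});
        }
        if (batched) store.commit();
        return store.replications();
    }

    public static void main(String[] args) {
        System.out.println("unbatched: " + run(false, 100));
        System.out.println("batched:   " + run(true, 100));
    }
}
```

In this model 100 unbatched writes cause 100 replications and the batched run causes one, which is the performance gap the message refers to; real replication costs depend on the cache configuration.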
>> 2009/9/23 Sanne Grinovero <sanne.grinovero at gmail.com>
>>> I agree it should work the same way. The IndexWriter cleans files
>>> whenever it likes to; it doesn't try to detect readers, and this
>>> shouldn't have any effect on the working of readers.
>>> The IndexReader opens the "SegmentsInfo" first, and immediately
>>> after** gets a reference to the segments listed in this SegmentsInfo.
>>> No IndexWriter will ever change an existing segment; it only adds new
>>> files or eventually deletes old ones (segments merge, optimize).
>>> The deletion of segments is the interesting subject: when using Files
>>> it uses "delete at last close", which works because the IRs needing it
>>> have it opened already**; when using the RAMDirectory they have a
>>> reference preventing garbage collection.
>>> (the two "**" assume that the event occurred correctly;
>>> otherwise an exception is thrown at opening)
>>> When using Infinispan, shouldn't it be much the same as with the
>>> RAMDirectory? Even if the needed segment is deleted, the IR holds a
>>> reference to the Java object locally since it was opened.
>>> Łukasz, do you have a failing test?
>>> Sanne
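The "reader keeps a reference" argument above can be shown with a small sketch. It assumes a plain ConcurrentHashMap as a stand-in for the RAMDirectory or the cache, and HeldReferenceDemo and openEagerly are hypothetical names, not Lucene or Infinispan API. Whether the same holds for a replicated cache depends on the reader keeping the object rather than fetching it again by key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeldReferenceDemo {
    // Resolves the segment once, at open time, and returns the reference.
    // Throws if the segment is already gone, like an exception at opening.
    static byte[] openEagerly(Map<String, byte[]> dir, String name) {
        byte[] segment = dir.get(name);
        if (segment == null) {
            throw new IllegalStateException("missing at open: " + name);
        }
        return segment;
    }

    public static void main(String[] args) {
        Map<String, byte[]> dir = new ConcurrentHashMap<>();
        dir.put("A", new byte[] {1, 2, 3});

        byte[] held = openEagerly(dir, "A"); // reader opens segment "A"
        dir.remove("A");                     // writer deletes it afterwards

        // The held reference is still usable; a fresh lookup by name is not.
        System.out.println("held reference: " + held.length + " bytes");
        System.out.println("lookup by name: " + dir.get("A"));
    }
}
```

A reader that looks the segment up by name only after the delete gets null, which matches the failure reported in this thread.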
>>> 2009/9/23 Emmanuel Bernard <emmanuel at hibernate.org>:
>>>> Conceptually I don't understand why it works in a pure file system
>>>> directory (i.e. the IndexReader can go and process queries while the
>>>> IndexWriter goes about its business) and not when using Infinispan.
>>>> Is it because the IndexWriter only cleans files if no IndexReaders are
>>>> reading them (how would that be detected)?
>>>> On 22 sept. 09, at 20:46, Łukasz Moreń wrote:
>>>> I need to provide the same lifecycle for the IndexWriter as for the
>>>> Infinispan tx: when the IW is created, the tx is started; when the IW is
>>>> committed, the tx is committed. This ensures that the IndexReader doesn't
>>>> read old data from the directory.
>>>> The Infinispan transaction can be started when the IW acquires the lock,
>>>> but committing it on IW lock release, as is done so far, causes a problem:
>>>> index writer close {
>>>>   index writer commit();      // changes are visible to IndexReaders
>>>>   // an IndexReader starts reading here, i.e. tries to access file "A"
>>>>   index writer lockRelease(); // changes in the Infinispan directory are
>>>>                               // committed, file "A" is removed, and the
>>>>                               // IndexReader cannot find it and crashes
>>>> }
>>>> I think the Infinispan tx has to be committed just before the IW commit;
>>>> the problem is where to put that in the code.
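The ordering proposed above (tx started with the writer, tx committed just before the writer's own commit) can be sketched as a small wrapper. ToyTx, ToyWriter and TxBoundWriter are hypothetical stand-ins introduced only for illustration; they are not Infinispan, Lucene or Hibernate Search types, and the sketch only records the order of calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; neither is an Infinispan or Lucene type.
interface ToyTx {
    void begin();
    void commit();
}

interface ToyWriter {
    void commit();
    void releaseLock();
}

class TxBoundWriter {
    private final ToyTx tx;
    private final ToyWriter writer;

    TxBoundWriter(ToyTx tx, ToyWriter writer) {
        this.tx = tx;
        this.writer = writer;
        tx.begin();            // the tx starts when the writer is created
    }

    void close() {
        tx.commit();           // directory changes are committed first
        writer.commit();       // then the index commit is published
        writer.releaseLock();  // lock release no longer commits anything
    }
}

class TxBoundWriterDemo {
    // Runs one writer lifecycle and returns the recorded call order.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        ToyTx tx = new ToyTx() {
            public void begin() { events.add("tx.begin"); }
            public void commit() { events.add("tx.commit"); }
        };
        ToyWriter writer = new ToyWriter() {
            public void commit() { events.add("writer.commit"); }
            public void releaseLock() { events.add("writer.releaseLock"); }
        };
        new TxBoundWriter(tx, writer).close();
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The open question from the message remains where such a hook would live in the real code; the sketch does not answer that.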
>>>> On 22 September 2009 at 18:24, Emmanuel Bernard
>>>> <emmanuel at hibernate.org> wrote:
>>>>> Can you explain in more detail what is going on?
>>>>> Aside from that, Workspace has been Sanne's baby lately, so he will be the
>>>>> best person to see what design will work in HSearch. That being said, I
>>>>> don't like the idea of subclassing / overriding very much. In my
>>>>> experience, it has led to more bad and unmaintainable code than anything
>>>>> else.
>>>>> On 22 sept. 09, at 02:16, Łukasz Moreń wrote:
>>>>> Hi,
>>>>> Thanks for the explanation.
>>>>> Maybe it is better that I concentrate on the first release and postpone
>>>>> distributed writing.
>>>>> There is already a LockStrategy that uses Infinispan. With it, I was
>>>>> wrapping the changes made by the IndexWriter in an Infinispan transaction
>>>>> for performance reasons: on obtaining the lock the transaction was
>>>>> started, and on lock release the transaction was committed. However,
>>>>> committing the Ispn transaction on lock release is not a good idea, since
>>>>> the IndexWriter calls the index commit before the lock is released (and
>>>>> so before the Ispn transaction is committed).
>>>>> I was thinking of overriding the Workspace class and its getIndexWriter
>>>>> (start Infinispan tx) and commitIndexWriter (commit tx) methods to wrap
>>>>> the IndexWriter lifecycle, but this needs a few other changes. Any other
>>>>> ideas?
>>>>> Cheers,
>>>>> Lukasz
>>>>> 2009/9/21 Sanne Grinovero <sanne.grinovero at gmail.com>
>>>>>> Hi Łukasz,
>>>>>> you have rightful concerns: the way the IndexWriter tries to acquire
>>>>>> the lock will bring some trouble. As far as I remember, we decided to
>>>>>> avoid multiple writer nodes in this first release for this reason
>>>>>> (is that written in your docs?).
>>>>>> Actually it shouldn't be very hard to do, as the LockStrategy is
>>>>>> pluggable (see changes from HSEARCH-345), and you could implement one
>>>>>> delegating to an Infinispan eager lock on some key, just as the default
>>>>>> LockStrategy takes a file lock in the index directory.
>>>>>> Maybe it's simpler to support this distributed writing instead of
>>>>>> sending the queue to some single (elected) node? It would be cool, as
>>>>>> the Document Analysis effort would be distributed, but I have no idea
>>>>>> whether this would be more or less efficient than a single node
>>>>>> writing; it could bring some huge data transfers along the wire during
>>>>>> segment merging (basically fetching the whole index data at each node
>>>>>> performing a segment merge). Maybe you'll need to play with the
>>>>>> IndexWriter settings (
>>>>>> http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#lucene-indexing-performance
>>>>>> ); you will probably need to find the sweet spot for "merge_factor".
>>>>>> I just saw that MergePolicy is now re-implementable, but I hope
>>>>>> that won't be needed.
>>>>>> Sanne
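The idea of a lock held on some key, mentioned above, can be sketched as follows. This is a toy under stated assumptions: a ConcurrentMap stands in for the distributed cache, KeyedWriteLock is a hypothetical name, and nothing here is the Lucene Lock, Hibernate Search LockStrategy or Infinispan locking API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy write lock keyed on the index name. The map stands in for the cache.
class KeyedWriteLock {
    private final ConcurrentMap<String, String> cache;
    private final String key;
    private final String owner;

    KeyedWriteLock(ConcurrentMap<String, String> cache, String indexName, String owner) {
        this.cache = cache;
        this.key = indexName + "/write.lock";
        this.owner = owner;
    }

    // Succeeds only for the first caller; putIfAbsent is atomic.
    boolean obtain() {
        return cache.putIfAbsent(key, owner) == null;
    }

    // Removes the entry only if this owner still holds it.
    void release() {
        cache.remove(key, owner);
    }
}

class KeyedWriteLockDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        KeyedWriteLock nodeA = new KeyedWriteLock(cache, "books", "node-A");
        KeyedWriteLock nodeB = new KeyedWriteLock(cache, "books", "node-B");

        System.out.println(nodeA.obtain()); // true: first writer gets the lock
        System.out.println(nodeB.obtain()); // false: second writer is refused
        nodeA.release();
        System.out.println(nodeB.obtain()); // true: lock is free again
    }
}
```

The refused second obtain() corresponds to the case where a second node would see LockObtainFailedException; a real implementation would also have to deal with node failure while the lock is held.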
>>>>>> 2009/9/21 Łukasz Moreń <lukasz.moren at gmail.com>:
>>>>>>> Hi,
>>>>>>> I'm wondering whether it is reasonable to have multiple threads/nodes
>>>>>>> that modify indexes in a Lucene Directory based on Infinispan. Let's
>>>>>>> assume that two nodes try to update the index at the same time. The
>>>>>>> first one creates an IndexWriter and obtains the write lock. There is
>>>>>>> a high probability that the second node throws
>>>>>>> LockObtainFailedException (as only one IndexWriter is allowed on a
>>>>>>> single index) and the index is not modified. How should that be
>>>>>>> handled? Should there always be only one node that makes changes in
>>>>>>> the index?
>>>>>>> Cheers,
>>>>>>> Lukasz
>>>>>>> On 15 September 2009 at 01:39, Łukasz Moreń
>>>>>>> <lukasz.moren at gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> Using JMeter, I wanted to check that the Infinispan dir does not
>>>>>>>> crash under heavy load in "real" use, and to check its performance
>>>>>>>> in comparison with none/other directories.
>>>>>>>> However, a problem appeared when multiple IndexWriters try to modify
>>>>>>>> the index (test InfinispanDirectoryTest): random deadlocks and Lucene
>>>>>>>> exceptions. An IndexWriter tries to access files in the index that
>>>>>>>> were removed before. I'm looking into it, but don't have a good idea
>>>>>>>> yet.
>>>>>>>> Concerning the last part, I think a similar thing is done in
>>>>>>>> InfinispanDirectoryProviderTest. Many threads are making changes and
>>>>>>>> searching (not checking whether the db is in sync with the index).
>>>>>>>> When the threads finish their work, I check with a Lucene query that
>>>>>>>> the index contains as many results as expected. Maybe you meant
>>>>>>>> something else?
>>>>>>>> It would be good to run each node in a different VM.
>>>>>>>>> Great ! Looking forward to it. What state are things in at the moment
>>>>>>>>> if I want to play around with it ?
>>>>>>>> It should work with one master (updates the index) and one or many
>>>>>>>> slave nodes (send changes to the master). I tried with one master and
>>>>>>>> one slave (with both the jms and jgroups backends) and it worked OK.
>>>>>>>> It still fails if multiple nodes want to modify the index.
>>>>>>>> I've attached a patch with the current version.
>>>>>>>> Cheers,
>>>>>>>> Łukasz
>>>>>>>> 2009/9/13 Michael Neale <michael.neale at gmail.com>
>>>>>>>>> Great ! Looking forward to it. What state are things in at the moment
>>>>>>>>> if I want to play around with it ?
>>>>>>>>> Sent from my phone.
>>>>>>>>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
>>>>>>>>> <sanne.grinovero at gmail.com> wrote:
>>>>>>>>>> 2009/9/12 Michael Neale <michael.neale at gmail.com>:
>>>>>>>>>>> That does sound pretty cool. Would be nice if the Lucene indexes
>>>>>>>>>>> could scale along with how people will want to use Infinispan.
>>>>>>>>>>> Probably worth playing with.
>>>>>>>>>> Sure, this is the goal of Łukasz's work. We know Compass has
>>>>>>>>>> some good Directories, but we're building our own, as one based
>>>>>>>>>> on Infinispan is not yet available.
>>>>>>>>>>> Sent from my phone.
>>>>>>>>>>> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
>>>>>>>>>>> <jeff.ramsdale at gmail.com> wrote:
>>>>>>>>>>>> I'm afraid I haven't followed the Infinispan-Lucene implementation
>>>>>>>>>>>> closely, but have you looked at the Compass Project
>>>>>>>>>>>> (http://www.compass-project.org/overview.html)? It provides a
>>>>>>>>>>>> simplified interface to Lucene (optional) as well as Directory
>>>>>>>>>>>> implementations built on Terracotta, Gigaspaces and Coherence. The
>>>>>>>>>>>> latter, in particular, might be a useful guide for the Infinispan
>>>>>>>>>>>> implementation. I believe it's mature enough to have solved many of
>>>>>>>>>>>> the most difficult problems of implementing Directory on a
>>>>>>>>>>>> distributed Map.
>>>>>>>>>>>> If someone has any experience with Compass (particularly its
>>>>>>>>>>>> Directory implementations) I'd be interested in hearing about it...
>>>>>>>>>>>> It's Apache 2.0 licensed, btw.
>>>>>>>>>>>> -jeff
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> infinispan-dev mailing list
>>>>>>>>>>>> infinispan-dev at lists.jboss.org
>>>>>>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> <InfinispanDirectoryProvider_22_09_2009.patch>
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache