[infinispan-dev] Feedback on Infinispan patch
Emmanuel Bernard
emmanuel at hibernate.org
Thu Sep 24 11:52:02 EDT 2009
+1
On 24 sept. 09, at 17:49, Manik Surtani wrote:
> Slightly off topic, but rather than working with patches, do we want
> this Directory impl in source control somewhere?
>
> Being dependent on LGPL, it won't be accepted into Lucene's
> contribs. If it doesn't depend on any Hibernate Search code, I
> could host it in Infinispan's SVN repo...
>
>
> On 23 Sep 2009, at 13:58, Łukasz Moreń wrote:
>
>> I agree that the Infinispan case is not much different from
>> RAMDirectory. The major difference is that in RD (and also
>> FileDirectory) changes are not batched as they are in ID. If I do
>> not wrap changes in InfinispanDirectory (simply remove tx.begin()
>> from the obtain() method and tx.commit() from release() in
>> InfinispanLock) and immediately commit every change made by the IW,
>> it works well. However, it makes indexing much slower because of the
>> frequent replication to other nodes.
>> Sanne, it's a good remark that an IW commit is a kind of flush.
>>
>> I've attached a patch with InfinispanDirectory; the failing test is
>> testDirectoryWithMultipleThreads in the InfinispanDirectoryTest
>> class. It fails randomly. I think the problem is that the Infinispan
>> commit on lockRelease() in org.apache.lucene.index.IndexWriter
>> (line 1658) happens after the IW commit() (line 1654).
>>
>> Is it because the IndexWriter only cleans files if no IndexReaders
>> are reading them (how would that be detected)?
>> It can happen if the IndexWriter cleans a file and an IndexReader
>> then tries to access that cleaned file.
>>
>> 2009/9/23 Sanne Grinovero <sanne.grinovero at gmail.com>
>> I agree it should work the same way. The IndexWriter cleans files
>> whenever it likes; it doesn't try to detect readers, and this
>> shouldn't have any effect on how readers work.
>> The IndexReader opens the "SegmentsInfo" first, and immediately
>> after** gets a reference to the segments listed in this SegmentsInfo.
>> No IndexWriter will ever change an existing segment; it only adds new
>> files or eventually deletes old ones (segment merge, optimize).
>> The deletion of segments is the interesting subject: when using Files
>> it relies on "delete at last close", which works because the IRs
>> needing a segment have it open already**; when using the
>> RAMDirectory they hold a reference preventing garbage collection.
>>
>> ( the two "**" are assuming the same event occurred correctly,
>> otherwise an exception is thrown at opening)
>>
>> When using Infinispan, shouldn't it be much the same as with the
>> RAMDirectory? Even if the needed segment is deleted, the IR holds a
>> reference to the Java object locally since it was opened.
>>
>> Łukasz, do you have a failing test?
>>
>> Sanne
>>
>> 2009/9/23 Emmanuel Bernard <emmanuel at hibernate.org>:
>> > Conceptually I don't understand why it works in a pure file system
>> > directory (i.e. the IndexReader can go and process queries while the
>> > IndexWriter goes about its business) and not when using Infinispan.
>> > Is it because the IndexWriter only cleans files if no IndexReaders
>> > are reading them (how would that be detected)?
>> > On 22 sept. 09, at 20:46, Łukasz Moreń wrote:
>> >
>> > I need to provide the same lifecycle for the IndexWriter as for the
>> > Infinispan tx: when the IW is created the tx is started, and when the
>> > IW is committed the tx is committed. This ensures that the
>> > IndexReader doesn't read old data from the directory.
>> > The Infinispan transaction can be started when the IW acquires the
>> > lock, but committing it on IW lock release, as is done so far, causes
>> > a problem:
>> >
>> > index writer close {
>> >     index writer commit();      // changes are visible to IndexReaders
>> >
>> >     // An IndexReader starts reading here, i.e. tries to access file "A"
>> >
>> >     index writer lockRelease(); // changes in the Infinispan directory are
>> >                                 // committed, file "A" is removed, and the
>> >                                 // IndexReader cannot find it and crashes
>> > }
>> >
>> > I think the Infinispan tx has to be committed just before the IW
>> > commit, and the problem is where to put that in the code.
>> >
>> > On 22 September 2009 at 18:24, Emmanuel Bernard
>> > <emmanuel at hibernate.org> wrote:
>> >>
>> >> Can you explain in more detail what is going on?
>> >> Aside from that, Workspace has been Sanne's baby lately, so he will
>> >> be the best person to see what design will work in HSearch. That
>> >> being said, I don't like the idea of subclassing / overriding very
>> >> much. In my experience, it has led to more bad and unmaintainable
>> >> code than anything else.
>> >> On 22 sept. 09, at 02:16, Łukasz Moreń wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks for the explanation.
>> >> Maybe it's better if I concentrate on the first release and postpone
>> >> distributed writing.
>> >>
>> >> There is already a LockStrategy that uses Infinispan. Using it, I was
>> >> wrapping the changes made by the IndexWriter in an Infinispan
>> >> transaction for performance reasons: on obtaining the lock the
>> >> transaction was started, and on lock release the transaction was
>> >> committed. However, committing the Ispn transaction on lock release
>> >> is not a good idea, since the IndexWriter calls index commit before
>> >> the lock is released (and the Ispn transaction is committed).
>> >> I was thinking of overriding the Workspace class and its
>> >> getIndexWriter (start Infinispan tx) and commitIndexWriter (commit
>> >> tx) methods to wrap the IndexWriter lifecycle, but this needs a few
>> >> other changes. Any other ideas?
>> >>
>> >> Cheers,
>> >> Lukasz
>> >>
>> >> 2009/9/21 Sanne Grinovero <sanne.grinovero at gmail.com>
>> >>>
>> >>> Hi Łukasz,
>> >>> your concerns are justified: the way the IndexWriter tries to
>> >>> acquire the lock will bring some trouble. As far as I remember, we
>> >>> decided to avoid multiple writer nodes in this first release for
>> >>> this reason (is that written in your docs?).
>> >>>
>> >>> Actually it shouldn't be very hard to do, as the LockStrategy is
>> >>> pluggable (see the changes from HSEARCH-345), and you could
>> >>> implement one delegating to an Infinispan eager lock on some key,
>> >>> just as the default LockStrategy takes a file lock in the index
>> >>> directory.
>> >>>
>> >>> Maybe it's simpler to support this distributed writing instead of
>> >>> sending the queue to some single (elected) node? It would be cool,
>> >>> as the Document Analysis effort would be distributed, but I have no
>> >>> idea whether this would be more or less efficient than a single
>> >>> node writing; it could cause some huge data transfers over the wire
>> >>> during segment merging (basically fetching the whole index data at
>> >>> each node performing a segment merge). Maybe you'll need to play
>> >>> with the IndexWriter settings
>> >>> (http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#lucene-indexing-performance);
>> >>> you will probably need to find the sweet spot for "merge_factor".
>> >>> I just saw that MergePolicy is now re-implementable, but I hope
>> >>> that won't be needed.
>> >>>
>> >>> Sanne
>> >>>
>> >>> 2009/9/21 Łukasz Moreń <lukasz.moren at gmail.com>:
>> >>> > Hi,
>> >>> >
>> >>> > I'm wondering whether it is reasonable to have multiple
>> >>> > threads/nodes that modify indexes in a Lucene Directory based on
>> >>> > Infinispan. Let's assume that two nodes try to update the index at
>> >>> > the same time. The first one creates an IndexWriter and obtains
>> >>> > the write lock. There is a high probability that the second node
>> >>> > throws LockObtainFailedException (as only one IndexWriter is
>> >>> > allowed on a single index) and the index is not modified. How
>> >>> > should that be handled? Should there always be only one node that
>> >>> > makes changes to the index?
>> >>> >
>> >>> > Cheers,
>> >>> > Lukasz
>> >>> >
>> >>> > On 15 September 2009 at 01:39, Łukasz Moreń
>> >>> > <lukasz.moren at gmail.com> wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> Using JMeter, I wanted to check that the Infinispan dir does not
>> >>> >> crash under heavy load in "real" use, and to compare its
>> >>> >> performance with none/other directories.
>> >>> >> However, a problem appeared when multiple IndexWriters try to
>> >>> >> modify the index (test InfinispanDirectoryTest): random
>> >>> >> deadlocks and Lucene exceptions. The IndexWriter tries to access
>> >>> >> files in the index that were removed earlier. I'm looking into
>> >>> >> it, but don't have a good idea yet.
>> >>> >>
>> >>> >> Concerning the last part, I think a similar thing is done in
>> >>> >> InfinispanDirectoryProviderTest. Many threads are making changes
>> >>> >> and searching (not checking whether the db is in sync with the
>> >>> >> index). Once the threads finish their work, I check with a
>> >>> >> Lucene query whether the index contains as many results as
>> >>> >> expected. Maybe you meant something else?
>> >>> >> It would be good to run each node in a different VM.
>> >>> >>
>> >>> >>> Great! Looking forward to it. What state are things in at the
>> >>> >>> moment if I want to play around with it?
>> >>> >>
>> >>> >> It should work with one master (which updates the index) and one
>> >>> >> or many slave nodes (which send changes to the master). I tried
>> >>> >> with one master and one slave (with both the JMS and JGroups
>> >>> >> backends) and it worked OK. It still fails if multiple nodes want
>> >>> >> to modify the index.
>> >>> >>
>> >>> >> I've attached a patch with the current version.
>> >>> >>
>> >>> >> Cheers,
>> >>> >> Łukasz
>> >>> >>
>> >>> >> 2009/9/13 Michael Neale <michael.neale at gmail.com>
>> >>> >>>
>> >>> >>> Great! Looking forward to it. What state are things in at the
>> >>> >>> moment if I want to play around with it?
>> >>> >>>
>> >>> >>> Sent from my phone.
>> >>> >>>
>> >>> >>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
>> >>> >>> <sanne.grinovero at gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>> > 2009/9/12 Michael Neale <michael.neale at gmail.com>:
>> >>> >>> >> That does sound pretty cool. It would be nice if the Lucene
>> >>> >>> >> indexes could scale along with how people will want to use
>> >>> >>> >> Infinispan. Probably worth playing with.
>> >>> >>> >
>> >>> >>> > Sure, this is the goal of Łukasz's work. We know Compass has
>> >>> >>> > some good Directories, but we're building our own as one based
>> >>> >>> > on Infinispan is not yet available.
>> >>> >>> >
>> >>> >>> >>
>> >>> >>> >> Sent from my phone.
>> >>> >>> >>
>> >>> >>> >> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
>> >>> >>> >> <jeff.ramsdale at gmail.com> wrote:
>> >>> >>> >>
>> >>> >>> >>> I'm afraid I haven't followed the Infinispan-Lucene
>> >>> >>> >>> implementation closely, but have you looked at the Compass
>> >>> >>> >>> Project (http://www.compass-project.org/overview.html)? It
>> >>> >>> >>> provides a simplified interface to Lucene (optional) as
>> >>> >>> >>> well as Directory implementations built on Terracotta,
>> >>> >>> >>> Gigaspaces and Coherence. The latter, in particular, might
>> >>> >>> >>> be a useful guide for the Infinispan implementation. I
>> >>> >>> >>> believe it's mature enough to have solved many of the most
>> >>> >>> >>> difficult problems of implementing Directory on a
>> >>> >>> >>> distributed Map.
>> >>> >>> >>>
>> >>> >>> >>> If someone has any experience with Compass (particularly its
>> >>> >>> >>> Directory implementations) I'd be interested in hearing
>> >>> >>> >>> about it... It's Apache 2.0 licensed, btw.
>> >>> >>> >>>
>> >>> >>> >>> -jeff
>> >>> >>> >>> _______________________________________________
>> >>> >>> >>> infinispan-dev mailing list
>> >>> >>> >>> infinispan-dev at lists.jboss.org
>> >>> >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> >>> >>> >>
>> >>> >>> >
>> >>> >>>
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>> >
>> >
>>
>> <InfinispanDirectoryProvider_22_09_2009.patch>
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
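For readers following the quoted thread, here is a minimal sketch of the interleaving Łukasz describes (IndexWriter commit, then a reader starts, then the batched Infinispan transaction commits on lock release). It is a toy model only: BatchedStore, CommitOrderingSketch and readerOpens are hypothetical names invented for illustration, a plain HashMap stands in for the replicated cache, and no Lucene or Infinispan API is used.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a directory whose changes are batched in one transaction. */
class BatchedStore {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> pendingWrites = new HashMap<>();
    private final Set<String> pendingDeletes = new HashSet<>();

    /** Buffers a write; readers do not see it until commit(). */
    void write(String name, String content) {
        pendingDeletes.remove(name);
        pendingWrites.put(name, content);
    }

    /** Buffers a delete; readers still see the file until commit(). */
    void delete(String name) {
        pendingWrites.remove(name);
        pendingDeletes.add(name);
    }

    /** Applies all buffered changes at once (models the tx commit on lock release). */
    void commit() {
        committed.putAll(pendingWrites);
        committed.keySet().removeAll(pendingDeletes);
        pendingWrites.clear();
        pendingDeletes.clear();
    }

    /** Returns the committed content, or null if the file is not there. */
    String read(String name) {
        return committed.get(name);
    }
}

class CommitOrderingSketch {
    /**
     * Models a reader that opens in two steps: it first reads the segment
     * list, then fetches the segment it names. writerStep runs in between.
     * A null result models "file not found".
     */
    static String readerOpens(BatchedStore store, Runnable writerStep) {
        String segment = store.read("segments");
        writerStep.run();
        return store.read(segment);
    }

    public static void main(String[] args) {
        BatchedStore store = new BatchedStore();
        store.write("segments", "A");
        store.write("A", "data-A");
        store.commit();

        // The writer batches a merge: new segment B replaces A.
        store.write("B", "data-B");
        store.write("segments", "B");
        store.delete("A");

        // The tx commits between the reader's two steps: file "A" is gone.
        System.out.println("racing reader: " + readerOpens(store, store::commit));
        // A reader that starts after the tx commit sees a consistent view.
        System.out.println("later reader:  " + readerOpens(store, () -> { }));
    }
}
```

In this model the racing reader loses file "A" because it keeps only the file name, not a local reference to the content, which is the difference from RAMDirectory that Sanne points out in the thread. Whether the real failure has exactly this cause is still an open question in the discussion.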