[infinispan-dev] Feedback on Infinispan patch
Sanne Grinovero
sanne.grinovero at gmail.com
Sun Sep 27 08:42:14 EDT 2009
Hi Łukasz,
I'm unable to reproduce the problem. You said it happens randomly, but
I've tried several times and I'm not getting any errors. Is there
something I could do to trigger it? Could you share a stack trace?
Anyway, if you are confident that the cause is segments being deleted
while they are still being read, you could introduce a per-segment
usage counter. It starts at 1 to mark the segment as "most current",
gets +1 each time a reader opens the segment, -1 each time a reader
closes it, and -1 when the segment is deleted.
Each decrement should check whether the value has reached 0 and only
then really delete the segment. This counting would be easy to add
inside the Directory.
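
A minimal sketch of such a counter, assuming a single JVM and plain
in-memory maps. The class and method names (SegmentUsageCounters,
create, acquire, release, exists) are hypothetical and are not part of
Lucene or Infinispan; in a cluster the counter updates would also have
to be atomic across nodes, which this sketch does not attempt.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: tracks how many users each segment still has. */
class SegmentUsageCounters {
    private final ConcurrentMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, byte[]> segments = new ConcurrentHashMap<>();

    /** Registers a new segment; the counter starts at 1 ("most current"). */
    void create(String name, byte[] data) {
        segments.put(name, data);
        counters.put(name, new AtomicInteger(1));
    }

    /** +1 for a reader opening the segment; false if it is already gone. */
    boolean acquire(String name) {
        AtomicInteger c = counters.get(name);
        if (c == null) {
            return false;
        }
        while (true) {
            int v = c.get();
            if (v <= 0) {
                return false; // already released down to 0, must not be revived
            }
            if (c.compareAndSet(v, v + 1)) {
                return true;
            }
        }
    }

    /** -1 for a reader closing or the writer deleting; removes the data at 0. */
    void release(String name) {
        AtomicInteger c = counters.get(name);
        if (c != null && c.decrementAndGet() == 0) {
            counters.remove(name);
            segments.remove(name);
        }
    }

    boolean exists(String name) {
        return segments.containsKey(name);
    }
}
```

With this shape, a delete by the writer while a reader still holds the
segment only drops the counter from 2 to 1; the data goes away when the
reader closes.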
When opening a new IndexReader, you
1) get the SegmentsInfo
2) increment all the counters (with an eager lock), verifying that each
is still > 0; otherwise undo the increments already made, get a new
SegmentsInfo and retry from 1)
3) get the needed segments
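
The steps above could look roughly like this. It is only a sketch under
assumptions: the counters live in a local map rather than in the
Infinispan cache, no eager lock is taken, and the names
(ReaderOpenProtocol, tryIncrement, release, pinCurrentSegments,
maxAttempts) are hypothetical, not existing Lucene or Infinispan API.
Removing a counter entry stands in for deleting the segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of the reader-opening steps 1) to 3). */
class ReaderOpenProtocol {

    /** +1 on one counter, but only while it is still above 0. */
    static boolean tryIncrement(AtomicInteger counter) {
        while (true) {
            int v = counter.get();
            if (v <= 0) {
                return false;
            }
            if (counter.compareAndSet(v, v + 1)) {
                return true;
            }
        }
    }

    /** -1 on one counter; reaching 0 removes the entry (the "real" delete). */
    static void release(Map<String, AtomicInteger> counters, String name) {
        AtomicInteger c = counters.get(name);
        if (c != null && c.decrementAndGet() == 0) {
            counters.remove(name);
        }
    }

    /**
     * Steps 1) and 2): read the segment list and pin every listed segment.
     * If any segment is already gone, undo the pins made so far and retry
     * with a fresh list. The caller then fetches the data (step 3).
     */
    static List<String> pinCurrentSegments(Supplier<List<String>> segmentsInfo,
                                           Map<String, AtomicInteger> counters,
                                           int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> wanted = segmentsInfo.get();
            List<String> pinned = new ArrayList<>();
            boolean allPinned = true;
            for (String name : wanted) {
                AtomicInteger c = counters.get(name);
                if (c != null && tryIncrement(c)) {
                    pinned.add(name);
                } else {
                    allPinned = false;
                    break;
                }
            }
            if (allPinned) {
                return pinned;
            }
            for (String name : pinned) {
                release(counters, name); // set changed counters back
            }
        }
        throw new IllegalStateException("could not pin a consistent set of segments");
    }
}
```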
Getting a counter should be much faster than getting a segment when
the data has to be downloaded from another node, so the counter can be
stored under a different key that still relates to the segment.
Sanne
2009/9/23 Łukasz Moreń <lukasz.moren at gmail.com>:
> I agree that the Infinispan case is not much different from RAMDirectory. The
> major difference is that in RD (and also FileDirectory) changes are not batched
> as they are in ID. If I do not wrap changes in InfinispanDirectory (simply remove
> tx.begin() from the obtain() method and tx.commit() from release() in
> InfinispanLock) and immediately commit every change made by the IW, it works
> well. However, it makes indexing much slower because of the frequent
> replication to other nodes.
> Sanne, it's a good remark that the IW commit is a kind of flush.
>
> I've attached a patch with InfinispanDirectory; the failing test is
> testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class. It fails
> randomly. I think the problem is that the Infinispan commit on lockRelease() in
> org.apache.lucene.index.IndexWriter (line 1658) comes after the IW commit() (line
> 1654).
>
>> Is it because the IndexWriter only cleans files if no IndexReaders are
>> reading them (how would that be detected)?
>
> It can happen if the IndexWriter cleans a file and an IndexReader then tries
> to access that cleaned file.
>
> 2009/9/23 Sanne Grinovero <sanne.grinovero at gmail.com>
>>
>> I agree it should work the same way. The IndexWriter cleans files
>> whenever it likes; it doesn't try to detect readers, and this
>> shouldn't have any effect on how readers work.
>> The IndexReader opens the "SegmentsInfo" first, and immediately
>> after** gets a reference to the segments listed in this SegmentsInfo.
>> No IndexWriter will ever change an existing segment; it only adds new
>> files or eventually deletes old ones (segment merges, optimize).
>> The deletion of segments is the interesting subject: when using files
>> it relies on "delete at last close", which works because the IRs that
>> need a segment already have it open**; when using the RAMDirectory they
>> hold a reference that prevents garbage collection.
>>
>> ( the two "**" are assuming the same event occurred correctly,
>> otherwise an exception is thrown at opening)
>>
>> When using Infinispan it shouldn't be much different from the
>> RAMDirectory: even if the needed segment is deleted, the IR holds a
>> reference to the Java object locally since it was opened.
>>
>> Łukasz, do you have a failing test?
>>
>> Sanne
>>
>> 2009/9/23 Emmanuel Bernard <emmanuel at hibernate.org>:
>> > Conceptually I don't understand why it works with a pure file system
>> > directory (i.e. the IndexReader can go and process queries while the
>> > IndexWriter goes about its business) and not when using Infinispan.
>> > Is it because the IndexWriter only cleans files if no IndexReaders are
>> > reading them (how would that be detected)?
>> > On 22 sept. 09, at 20:46, Łukasz Moreń wrote:
>> >
>> > I need to provide the same lifecycle for the IndexWriter as for the
>> > Infinispan tx: when the IW is created the tx is started, and when the IW
>> > is committed the tx is committed. This ensures that the IndexReader
>> > doesn't read old data from the directory.
>> > The Infinispan transaction can be started when the IW acquires the lock,
>> > but committing it on IW lock release, as is done so far, causes a problem:
>> >
>> > index writer close {
>> > index writer commit(); //changes are visible to IndexReaders
>> >
>> > //Index reader starts reading here, i.e. tries to access file "A"
>> >
>> > index writer lockRelease(); //changes in the Infinispan directory are
>> > //committed, file "A" is removed, the IndexReader cannot find it and crashes
>> > }
>> >
>> > I think the Infinispan tx has to be committed just before the IW commit,
>> > and the problem is where to put that in the code.
>> >
>> > On 22 September 2009 at 18:24, Emmanuel Bernard
>> > <emmanuel at hibernate.org> wrote:
>> >>
>> >> Can you explain in more detail what is going on?
>> >> Aside from that, Workspace has been Sanne's baby lately, so he will be
>> >> the best person to see what design will work in HSearch. That being said,
>> >> I don't like the idea of subclassing / overriding very much. In my
>> >> experience, it has led to more bad and unmaintainable code than anything
>> >> else.
>> >> On 22 sept. 09, at 02:16, Łukasz Moreń wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks for the explanation.
>> >> Maybe I had better concentrate on the first release and postpone
>> >> distributed writing.
>> >>
>> >> There is already a LockStrategy that uses Infinispan. Using it, I was
>> >> wrapping the changes made by the IndexWriter in an Infinispan transaction
>> >> for performance reasons: the transaction was started when the lock was
>> >> obtained and committed when the lock was released. However, committing
>> >> the Ispn transaction on lock release is not a good idea, since the
>> >> IndexWriter calls index commit before the lock is released (and the Ispn
>> >> transaction is committed).
>> >> I was thinking of overriding the Workspace class and its getIndexWriter
>> >> (start Infinispan tx) and commitIndexWriter (commit tx) methods to wrap
>> >> the IndexWriter lifecycle, but this needs a few other changes. Any other
>> >> ideas?
>> >>
>> >> Cheers,
>> >> Lukasz
>> >>
>> >> 2009/9/21 Sanne Grinovero <sanne.grinovero at gmail.com>
>> >>>
>> >>> Hi Łukasz,
>> >>> your concerns are justified, because the way the IndexWriter tries to
>> >>> acquire the lock will bring some trouble. As far as I remember, we
>> >>> decided to avoid multiple writer nodes in this first release for this
>> >>> reason (is that written in your docs?)
>> >>>
>> >>> Actually it shouldn't be very hard to do, as the LockStrategy is
>> >>> pluggable (see changes from HSEARCH-345)
>> >>> and you could implement one delegating to an Infinispan eager lock on
>> >>> some key,
>> >>> like the default LockStrategy takes a file lock in the index
>> >>> directory.
>> >>>
>> >>> Maybe it's simpler to support this distributed writing instead of
>> >>> sending the queue to some single
>> >>> (elected) node? Would be cool, as the Document Analysis effort would
>> >>> be distributed,
>> >>> but I have no idea if this would be more or less efficient than a
>> >>> single node writing; it could
>> >>> bring some huge data transfers along the wire during segments merging
>> >>> (basically fetching
>> >>> the whole index data at each node performing a segment merge); maybe
>> >>> you'll need to
>> >>> play with IndexWriter settings
>> >>> (http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#lucene-indexing-performance);
>> >>> you probably need to find the sweet spot for "merge_factor".
>> >>> I just saw now that MergePolicy is now re-implementable, but I hope
>> >>> that won't be needed.
>> >>>
>> >>> Sanne
>> >>>
>> >>> 2009/9/21 Łukasz Moreń <lukasz.moren at gmail.com>:
>> >>> > Hi,
>> >>> >
>> >>> > I'm wondering whether it is reasonable to have multiple threads/nodes
>> >>> > that modify indexes in a Lucene Directory based on Infinispan. Let's
>> >>> > assume that two nodes try to update the index at the same time. The
>> >>> > first one creates an IndexWriter and obtains the write lock. There is
>> >>> > a high probability that the second node throws
>> >>> > LockObtainFailedException (as only one IndexWriter is allowed on a
>> >>> > single index) and the index is not modified. How should that be
>> >>> > handled? Should there always be only one node that makes changes in
>> >>> > the index?
>> >>> >
>> >>> > Cheers,
>> >>> > Lukasz
>> >>> >
>> >>> > On 15 September 2009 at 01:39, Łukasz Moreń
>> >>> > <lukasz.moren at gmail.com> wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> Using JMeter, I wanted to check that the Infinispan dir does not
>> >>> >> crash under heavy load in "real" use, and to compare its performance
>> >>> >> with none/other directories.
>> >>> >> However, a problem appeared when multiple IndexWriters try to modify
>> >>> >> the index (test InfinispanDirectoryTest): random deadlocks and Lucene
>> >>> >> exceptions. The IndexWriter tries to access files in the index that
>> >>> >> were removed before. I'm looking into it, but I don't have a good
>> >>> >> idea yet.
>> >>> >>
>> >>> >> Concerning the last part, I think a similar thing is done in
>> >>> >> InfinispanDirectoryProviderTest. Many threads are making changes
>> >>> >> and searching (without checking whether the db is in sync with the
>> >>> >> index). When the threads finish their work, I check with a Lucene
>> >>> >> query whether the index contains as many results as expected. Maybe
>> >>> >> you meant something else?
>> >>> >> It would be good to run each node in a different VM.
>> >>> >>
>> >>> >>> Great ! Looking forward to it. What state are things in at the
>> >>> >>> moment
>> >>> >>> if I want to play around with it ?
>> >>> >>
>> >>> >> It should work with one master (updates the index) and one or many
>> >>> >> slave nodes (send changes to the master). I tried with one master and
>> >>> >> one slave (with both the jms and jgroups backends) and it worked ok.
>> >>> >> It still fails if multiple nodes want to modify the index.
>> >>> >>
>> >>> >> I've attached a patch with the current version.
>> >>> >>
>> >>> >> Cheers,
>> >>> >> Łukasz
>> >>> >>
>> >>> >> 2009/9/13 Michael Neale <michael.neale at gmail.com>
>> >>> >>>
>> >>> >>> Great ! Looking forward to it. What state are things in at the
>> >>> >>> moment
>> >>> >>> if I want to play around with it ?
>> >>> >>>
>> >>> >>> Sent from my phone.
>> >>> >>>
>> >>> >>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
>> >>> >>> <sanne.grinovero at gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>> > 2009/9/12 Michael Neale <michael.neale at gmail.com>:
>> >>> >>> >> That does sound pretty cool. It would be nice if the Lucene
>> >>> >>> >> indexes could scale along with how people will want to use
>> >>> >>> >> Infinispan. Probably worth playing with.
>> >>> >>> >
>> >>> >>> > Sure, this is the goal of Łukasz's work; We know compass has
>> >>> >>> > some good Directories, but we're building our own as one based
>> >>> >>> > on Infinispan is not yet available.
>> >>> >>> >
>> >>> >>> >>
>> >>> >>> >> Sent from my phone.
>> >>> >>> >>
>> >>> >>> >> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
>> >>> >>> >> <jeff.ramsdale at gmail.com>
>> >>> >>> >> wrote:
>> >>> >>> >>
>> >>> >>> >>> I'm afraid I haven't followed the Infinispan-Lucene
>> >>> >>> >>> implementation
>> >>> >>> >>> closely, but have you looked at the Compass Project?
>> >>> >>> >>> (http://www.compass-project.org/overview.html) It provides a
>> >>> >>> >>> simplified interface to Lucene (optional) as well as Directory
>> >>> >>> >>> implementations built on Terracotta, Gigaspaces and Coherence.
>> >>> >>> >>> The
>> >>> >>> >>> latter, in particular, might be a useful guide for the
>> >>> >>> >>> Infinispan
>> >>> >>> >>> implementation. I believe it's mature enough to have solved
>> >>> >>> >>> many
>> >>> >>> >>> of
>> >>> >>> >>> the most difficult problems of implementing Directory on a
>> >>> >>> >>> distributed
>> >>> >>> >>> Map.
>> >>> >>> >>>
>> >>> >>> >>> If someone has any experience with Compass (particularly its
>> >>> >>> >>> Directory implementations) I'd be interested in hearing about
>> >>> >>> >>> it...
>> >>> >>> >>> It's Apache 2.0 licensed, btw.
>> >>> >>> >>>
>> >>> >>> >>> -jeff
>> >>> >>> >>> _______________________________________________
>> >>> >>> >>> infinispan-dev mailing list
>> >>> >>> >>> infinispan-dev at lists.jboss.org
>> >>> >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev