[infinispan-dev] Feedback on Infinispan patch

Łukasz Moreń lukasz.moren at gmail.com
Sun Sep 27 16:00:11 EDT 2009


You can try increasing the TURNS_NUM (I've tried 1000) and THREADS_NUM
(200) fields in InfinispanDirectoryTest to make the failure more probable.
The same problem also appears in InfinispanDirectoryProviderTest.
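A minimal sketch of the kind of stress loop these two knobs control, under stated assumptions: THREADS_NUM workers each run TURNS_NUM read/write turns, all released at once so they overlap as much as possible, and every exception is collected. StressHarness and Turn are hypothetical names; this is not the actual InfinispanDirectoryTest code. Raising either number increases the overlap between writers cleaning files and readers opening them, which is why the failure becomes more probable.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical harness, not the real InfinispanDirectoryTest.
class StressHarness {
    /** One read or write step performed by a worker thread. */
    interface Turn {
        void run(int thread, int turn) throws Exception;
    }

    /** Runs threadsNum workers x turnsNum turns and returns every failure seen. */
    static List<Throwable> run(int threadsNum, int turnsNum, Turn body) {
        List<Throwable> errors = new CopyOnWriteArrayList<>();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threadsNum);
        for (int t = 0; t < threadsNum; t++) {
            final int thread = t;
            pool.execute(() -> {
                try {
                    start.await(); // release all workers together to maximize overlap
                    for (int turn = 0; turn < turnsNum; turn++) {
                        try {
                            body.run(thread, turn);
                        } catch (Exception e) {
                            errors.add(e); // record the failure and keep going
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return errors;
    }
}
```

With the values mentioned above this would be called as StressHarness.run(200, 1000, (thread, turn) -> ...), with the lambda doing one index read or write.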

An example stacktrace is:

21:22:44,441 ERROR InfinispanDirectoryTest:142 - Error
java.io.IOException: File [ segments_nl ] for index [ indexName ] was not found
    at org.hibernate.search.store.infinispan.InfinispanIndexIO$InfinispanIndexInput.<init>(InfinispanIndexIO.java:79)
    at org.hibernate.search.store.infinispan.InfinispanDirectory.openInput(InfinispanDirectory.java:201)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:214)
    at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:95)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
    at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:227)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:55)
    at org.hibernate.search.test.directoryProvider.infinispan.CacheTestSupport.doReadOperation(CacheTestSupport.java:106)
    at org.hibernate.search.test.directoryProvider.infinispan.InfinispanDirectoryTest$InfinispanDirectoryThread.run(InfinispanDirectoryTest.java:130)

Cheers,
Lukasz

2009/9/27 Sanne Grinovero <sanne.grinovero at gmail.com>

> Hi Łukasz,
> I'm unable to reproduce the problem. You said it happens randomly; I've
> tried several times and I'm not getting errors. Is there something I
> could do to make it happen?
> Could you share a stack trace?
>
> Anyway, if you are confident the problem is segments getting deleted
> while they are still being read, you could introduce a per-segment usage
> counter: it starts at 1 to mark the segment as "most current", gets +1
> each time a reader opens it, -1 when that reader closes it, and -1 when
> the segment is deleted.
> Each decrement should check whether the value has reached 0, and only
> then really delete the segment; this counting would be easy to add
> inside the Directory.
> When opening a new IndexReader, you
> 1) get the SegmentsInfo
> 2) increment all counters (eager lock, verify each is > 0, or else retry:
>    set the changed counters back and get a new SegmentsInfo --> 1)
> 3) get the needed segments
>
> Getting a counter should be much faster than getting a segment when the
> data is downloaded from another node, so we can store the counter under a
> different key while still relating it to the segment.
>
> Sanne
>
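A minimal sketch of the counting scheme described above, assuming a single JVM: the class and method names are hypothetical, and two ConcurrentHashMaps stand in for the Infinispan cache, with the counters kept under their own keys as suggested. An atomic compare-and-set replaces the eager lock in step 2; a real version would need the distributed equivalent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; the maps stand in for the Infinispan cache.
class SegmentRefCounter {
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final Map<String, byte[]> segments = new ConcurrentHashMap<>();

    /** A writer creates a segment: the counter starts at 1 ("most current"). */
    void create(String name, byte[] data) {
        segments.put(name, data);
        counters.put(name, new AtomicInteger(1));
    }

    /** +1 for a reader, but only while the segment is still alive (value > 0). */
    boolean tryAcquire(String name) {
        AtomicInteger c = counters.get(name);
        if (c == null) {
            return false;
        }
        while (true) {
            int v = c.get();
            if (v <= 0) {
                return false;
            }
            if (c.compareAndSet(v, v + 1)) {
                return true;
            }
        }
    }

    /** -1 on reader close or writer delete; whoever reaches 0 really deletes. */
    void release(String name) {
        AtomicInteger c = counters.get(name);
        if (c != null && c.decrementAndGet() == 0) {
            segments.remove(name);
            counters.remove(name, c);
        }
    }

    /** Steps 1-3: read the segment list, pin every segment, else undo and retry. */
    List<byte[]> openReader(Supplier<List<String>> segmentsInfo) {
        while (true) {
            List<String> names = segmentsInfo.get();   // 1) get the SegmentsInfo
            List<String> pinned = new ArrayList<>();
            for (String n : names) {                    // 2) increment all counters
                if (!tryAcquire(n)) {
                    break;
                }
                pinned.add(n);
            }
            if (pinned.size() < names.size()) {
                pinned.forEach(this::release);          //    set changed counters back
                continue;                               //    and get a new SegmentsInfo
            }
            List<byte[]> data = new ArrayList<>();
            for (String n : names) {
                data.add(segments.get(n));              // 3) get the needed segments
            }
            return data;
        }
    }

    boolean exists(String name) {
        return segments.containsKey(name);
    }
}
```

Closing a reader would call release() once per segment it pinned. The retry loop assumes the writer eventually publishes a SegmentsInfo whose segments are all alive.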
> 2009/9/23 Łukasz Moreń <lukasz.moren at gmail.com>:
> > I agree that the Infinispan case is not much different from RAMDirectory.
> > The major difference is that in RD (and also FileDirectory) changes are
> > not batched as they are in ID. If I do not wrap changes in
> > InfinispanDirectory (simply removing tx.begin() from the obtain() method
> > and tx.commit() from release() in InfinispanLock) and immediately commit
> > every change made by the IW, it works well. However, it makes indexing
> > much slower, because of frequent replication to other nodes.
> > Sanne, it's a good remark that the IW commit is a kind of flush.
> >
> > I've attached a patch with InfinispanDirectory; the failing test is
> > testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class. It
> > fails randomly. I think the problem is that the Infinispan commit on
> > lockRelease() in org.apache.lucene.index.IndexWriter (line 1658) happens
> > after the IW commit() (line 1654).
> >
> >> Is it because the IndexWriter only cleans files if no IndexReaders are
> >> reading them (how would that be detected)?
> >
> > It can happen if the IndexWriter cleans a file and an IndexReader then
> > tries to access that cleaned file.
> >
> > 2009/9/23 Sanne Grinovero <sanne.grinovero at gmail.com>
> >>
> >> I agree it should work the same way. The IndexWriter cleans files
> >> whenever it likes to; it doesn't try to detect readers, and this
> >> shouldn't have any effect on how readers work.
> >> The IndexReader opens the "SegmentsInfo" first, and immediately
> >> after** gets a reference to the segments listed in this SegmentsInfo.
> >> No IndexWriter will ever change an existing segment; it only adds new
> >> files or eventually deletes old ones (segment merge, optimize).
> >> The deletion of segments is the interesting subject: when using files,
> >> it relies on "delete at last close", which works because the IRs
> >> needing a segment have already opened it**; when using the
> >> RAMDirectory, the readers hold a reference preventing garbage
> >> collection.
> >>
> >> (the two "**" assume the same event occurred correctly; otherwise an
> >> exception is thrown at opening)
> >>
> >> When using Infinispan it shouldn't be much different from the
> >> RAMDirectory, should it? Even if the needed segment is deleted, the IR
> >> holds a local reference to the Java object since it was opened.
> >>
> >> Łukasz, do you have a failing test?
> >>
> >> Sanne
> >>
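The difference described above can be shown with a small, hypothetical model (not Infinispan or Lucene code), in which a ConcurrentHashMap plays the shared file store. An input that captures the byte[] when it is opened keeps working after the file is deleted, as a RAMDirectory reader does; an input that resolves the name again on every read fails with a file-not-found error instead. All class and method names here are illustrative assumptions.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model; the map stands in for the shared (replicated) file store.
class ReferenceVsLookup {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    void write(String name, byte[] data) { store.put(name, data); }

    void delete(String name) { store.remove(name); }

    private byte[] fetch(String name) {
        byte[] data = store.get(name);
        if (data == null) {
            throw new UncheckedIOException(
                new FileNotFoundException("File [ " + name + " ] was not found"));
        }
        return data;
    }

    /** RAMDirectory-like: keeps the object obtained at open time. */
    EagerInput openEager(String name) { return new EagerInput(fetch(name)); }

    /** Resolves the name on every read, so a delete in between breaks it. */
    LazyInput openLazy(String name) {
        fetch(name); // the file must exist when opened
        return new LazyInput(name);
    }

    static final class EagerInput {
        private final byte[] data;
        EagerInput(byte[] data) { this.data = data; }
        byte readByte(int pos) { return data[pos]; }
    }

    final class LazyInput {
        private final String name;
        LazyInput(String name) { this.name = name; }
        byte readByte(int pos) { return fetch(name)[pos]; }
    }
}
```

In this model, deleting "A" after both inputs are open leaves the eager input readable, while the lazy input's next readByte throws.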
> >> 2009/9/23 Emmanuel Bernard <emmanuel at hibernate.org>:
> >> > Conceptually I don't understand why it works in a pure file system
> >> > directory (i.e. the IndexReader can go and process queries while the
> >> > IndexWriter goes about its business) and not when using Infinispan.
> >> > Is it because the IndexWriter only cleans files if no IndexReaders are
> >> > reading them (how would that be detected)?
> >> > On 22 Sep 2009, at 20:46, Łukasz Moreń wrote:
> >> >
> >> > I need to provide the same lifecycle for the IndexWriter as for the
> >> > Infinispan tx: IW is created, tx is started; IW is committed, tx is
> >> > committed. This ensures that the IndexReader doesn't read old data
> >> > from the directory.
> >> > The Infinispan transaction can be started when the IW acquires the
> >> > lock, but committing it on IW lock release, as is done so far, causes
> >> > a problem:
> >> >
> >> > index writer close {
> >> >   index writer commit(); //changes are visible for IndexReaders
> >> >
> >> >   //IndexReader starts reading here, i.e. tries to access file "A"
> >> >
> >> >   index writer lockRelease(); //changes in Infinispan directory are
> >> >                               //committed, file "A" was removed,
> >> >                               //IndexReader cannot find it and crashes
> >> > }
> >> >
> >> > I think the Infinispan tx has to be committed just before the IW
> >> > commit, and the problem is where to put that in the code.
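A minimal sketch of the ordering proposed here, using hypothetical stand-in interfaces (Tx, IndexCommitter) rather than the real Infinispan or Lucene APIs: the transaction starts when the writer wrapper is created and is committed before the IndexWriter commit, so the lock release no longer carries the transaction commit. It only records the order of calls; where exactly to hook this into Hibernate Search is the open question in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Tx and IndexCommitter are stand-ins, not real APIs.
class TxBoundWriter {
    interface Tx { void begin(); void commit(); }
    interface IndexCommitter { void commit(); void releaseLock(); }

    private final Tx tx;
    private final IndexCommitter writer;

    TxBoundWriter(Tx tx, IndexCommitter writer) {
        this.tx = tx;
        this.writer = writer;
        tx.begin();            // IW created: tx started
    }

    void close() {
        tx.commit();           // commit the buffered directory changes first
        writer.commit();       // then let the IW commit publish the new segments
        writer.releaseLock();  // lock release no longer commits the tx
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Tx tx = new Tx() {
            public void begin() { log.add("tx.begin"); }
            public void commit() { log.add("tx.commit"); }
        };
        IndexCommitter iw = new IndexCommitter() {
            public void commit() { log.add("iw.commit"); }
            public void releaseLock() { log.add("iw.releaseLock"); }
        };
        new TxBoundWriter(tx, iw).close();
        System.out.println(log); // [tx.begin, tx.commit, iw.commit, iw.releaseLock]
    }
}
```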
> >> >
> >> > On 22 September 2009 at 18:24, Emmanuel Bernard
> >> > <emmanuel at hibernate.org> wrote:
> >> >>
> >> >> Can you explain in more detail what is going on?
> >> >> Aside from that, Workspace has been Sanne's baby lately, so he will be
> >> >> the best person to see what design will work in HSearch. That being
> >> >> said, I don't like the idea of subclassing / overriding very much. In
> >> >> my experience, it has led to more bad and unmaintainable code than
> >> >> anything else.
> >> >> On 22 Sep 2009, at 02:16, Łukasz Moreń wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Thanks for the explanation.
> >> >> Maybe I'd better concentrate on the first release and postpone
> >> >> distributed writing.
> >> >>
> >> >> There is already a LockStrategy that uses Infinispan. Using it, I was
> >> >> wrapping the changes made by the IndexWriter in an Infinispan
> >> >> transaction for performance reasons: the transaction was started on
> >> >> lock obtain and committed on lock release. However, committing the
> >> >> Ispn transaction on lock release is not a good idea, since the
> >> >> IndexWriter calls the index commit before the lock is released (and
> >> >> the Ispn transaction is committed).
> >> >> I was thinking of overriding the Workspace class and its
> >> >> getIndexWriter (start the Infinispan tx) and commitIndexWriter (commit
> >> >> the tx) methods to wrap the IndexWriter lifecycle, but this needs a
> >> >> few other changes. Any other ideas?
> >> >>
> >> >> Cheers,
> >> >> Lukasz
> >> >>
> >> >> 2009/9/21 Sanne Grinovero <sanne.grinovero at gmail.com>
> >> >>>
> >> >>> Hi Łukasz,
> >> >>> you have rightful concerns, because the way the IndexWriter tries to
> >> >>> acquire the lock will bring some trouble. As far as I remember, we
> >> >>> decided for this first release to avoid multiple writer nodes for
> >> >>> this reason (is that written in your docs?)
> >> >>>
> >> >>> Actually it shouldn't be very hard to do, as the LockStrategy is
> >> >>> pluggable (see the changes from HSEARCH-345), and you could
> >> >>> implement one delegating to an Infinispan eager lock on some key,
> >> >>> just as the default LockStrategy takes a file lock in the index
> >> >>> directory.
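A rough sketch of such a strategy, assuming nothing about the real LockStrategy or Infinispan APIs: the write lock lives under one key in a shared map, and a ConcurrentMap's atomic putIfAbsent and remove(key, value) stand in for the Infinispan eager lock. The class name and key format are hypothetical; the method names only mirror those of Lucene's Lock (obtain, release, isLocked).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; ConcurrentMap stands in for the shared Infinispan cache.
class SharedKeyLock {
    private final ConcurrentMap<String, String> cache;
    private final String lockKey;
    private final String owner;

    SharedKeyLock(ConcurrentMap<String, String> cache, String indexName, String owner) {
        this.cache = cache;
        this.lockKey = indexName + "|write.lock"; // one lock key per index
        this.owner = owner;                       // e.g. the node address
    }

    /** Single non-blocking attempt; false if another owner holds the lock. */
    boolean obtain() {
        return cache.putIfAbsent(lockKey, owner) == null;
    }

    /** Removes the key only if this owner still holds it. */
    void release() {
        cache.remove(lockKey, owner);
    }

    boolean isLocked() {
        return cache.containsKey(lockKey);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> shared = new ConcurrentHashMap<>();
        SharedKeyLock nodeA = new SharedKeyLock(shared, "indexName", "nodeA");
        SharedKeyLock nodeB = new SharedKeyLock(shared, "indexName", "nodeB");
        System.out.println(nodeA.obtain()); // true
        System.out.println(nodeB.obtain()); // false: only one writer per index
        nodeA.release();
        System.out.println(nodeB.obtain()); // true
    }
}
```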
> >> >>>
> >> >>> Maybe it's simpler to support this distributed writing instead of
> >> >>> sending the queue to some single (elected) node? It would be cool,
> >> >>> as the document analysis effort would be distributed, but I have no
> >> >>> idea whether this would be more or less efficient than a single node
> >> >>> writing; it could bring some huge data transfers over the wire
> >> >>> during segment merging (basically fetching the whole index data at
> >> >>> each node performing a segment merge). You'll probably need to play
> >> >>> with the IndexWriter settings (
> >> >>> http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#lucene-indexing-performance
> >> >>> ) to find the sweet spot for "merge_factor".
> >> >>> I just saw that MergePolicy is now re-implementable, but I hope
> >> >>> that won't be needed.
> >> >>>
> >> >>> Sanne
> >> >>>
> >> >>> 2009/9/21 Łukasz Moreń <lukasz.moren at gmail.com>:
> >> >>> > Hi,
> >> >>> >
> >> >>> > I'm wondering whether it is reasonable to have multiple
> >> >>> > threads/nodes modifying indexes in a Lucene Directory based on
> >> >>> > Infinispan. Let's assume that two nodes try to update the index at
> >> >>> > the same time. The first one creates an IndexWriter and obtains the
> >> >>> > write lock. There is a high probability that the second node throws
> >> >>> > LockObtainFailedException (as only one IndexWriter is allowed on a
> >> >>> > single index) and the index is not modified. How should this work?
> >> >>> > Should there always be only one node that makes changes in the
> >> >>> > index?
> >> >>> >
> >> >>> > Cheers,
> >> >>> > Lukasz
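One common way to soften that LockObtainFailedException is to retry for a while before giving up, which is roughly what Lucene's Lock.obtain(long lockWaitTimeout) does by polling. The sketch below is a hypothetical helper, not Lucene code: tryObtain can be any single non-blocking attempt, and LockTimeoutException is an illustrative stand-in for LockObtainFailedException. Retrying only serializes the writers; it does not lift the one-writer-per-index limit.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; LockTimeoutException stands in for LockObtainFailedException.
class PollingLock {
    static final class LockTimeoutException extends RuntimeException {
        LockTimeoutException(String msg) { super(msg); }
    }

    /** Calls tryObtain every pollMs until it succeeds or timeoutMs has elapsed. */
    static void obtain(BooleanSupplier tryObtain, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!tryObtain.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                throw new LockTimeoutException("Lock obtain timed out after " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new LockTimeoutException("Interrupted while waiting for the lock");
            }
        }
    }
}
```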
> >> >>> >
> >> >>> > On 15 September 2009 at 01:39, Łukasz Moreń
> >> >>> > <lukasz.moren at gmail.com> wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> Using JMeter, I wanted to check that the Infinispan directory does
> >> >>> >> not crash under heavy load in "real" use, and to compare its
> >> >>> >> performance with other directories (or none).
> >> >>> >> However, a problem appeared when multiple IndexWriters try to
> >> >>> >> modify the index (test InfinispanDirectoryTest): random deadlocks
> >> >>> >> and Lucene exceptions. The IndexWriter tries to access files in the
> >> >>> >> index that were removed before. I'm looking into it, but don't
> >> >>> >> have a good idea yet.
> >> >>> >>
> >> >>> >> Concerning the last part, I think a similar thing is done in
> >> >>> >> InfinispanDirectoryProviderTest. Many threads are making changes
> >> >>> >> and searching (not checking whether the db is in sync with the
> >> >>> >> index). When the threads finish their work, I check with a Lucene
> >> >>> >> query whether the index contains as many results as expected.
> >> >>> >> Maybe you meant something else?
> >> >>> >> It would be good to run each node in a different VM.
> >> >>> >>
> >> >>> >>> Great ! Looking forward to it. What state are things in at the
> >> >>> >>> moment
> >> >>> >>> if I want to play around with it ?
> >> >>> >>
> >> >>> >> It should work with one master (which updates the index) and one
> >> >>> >> or many slave nodes (which send changes to the master). I tried
> >> >>> >> with one master and one slave (with both the JMS and JGroups
> >> >>> >> backends) and it worked OK. It still fails if multiple nodes want
> >> >>> >> to modify the index.
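A minimal in-process sketch of that layout, assuming a LinkedBlockingQueue in place of the JMS or JGroups backend and a plain list in place of the index: slaves only enqueue changes, and a single master thread applies them, so only one IndexWriter would ever need the lock. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; the queue stands in for the JMS/JGroups backend.
class MasterSlaveSketch {
    private static final String STOP = "__stop__";          // illustrative poison pill
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> index = new ArrayList<>();    // written only by the master thread
    private Thread master;

    /** Slave side: never touches the index, only sends the change. */
    void send(String change) {
        queue.add(change);
    }

    /** Master side: the single writer drains the queue in order. */
    void startMaster() {
        master = new Thread(() -> {
            try {
                for (String c = queue.take(); !c.equals(STOP); c = queue.take()) {
                    index.add(c);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        master.start();
    }

    /** Enqueues the stop marker and waits until everything before it is applied. */
    void shutdownAndWait() {
        queue.add(STOP);
        try {
            master.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Safe after shutdownAndWait(): join() makes the master's writes visible. */
    int indexedCount() {
        return index.size();
    }
}
```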
> >> >>> >>
> >> >>> >> I've attached patch with current version.
> >> >>> >>
> >> >>> >> Cheers,
> >> >>> >> Łukasz
> >> >>> >>
> >> >>> >> 2009/9/13 Michael Neale <michael.neale at gmail.com>
> >> >>> >>>
> >> >>> >>> Great ! Looking forward to it. What state are things in at the
> >> >>> >>> moment
> >> >>> >>> if I want to play around with it ?
> >> >>> >>>
> >> >>> >>> Sent from my phone.
> >> >>> >>>
> >> >>> >>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
> >> >>> >>> <sanne.grinovero at gmail.com>
> >> >>> >>> wrote:
> >> >>> >>>
> >> >>> >>> > 2009/9/12 Michael Neale <michael.neale at gmail.com>:
> >> >>> >>> >> That does sound pretty cool. Would be nice if the Lucene
> >> >>> >>> >> indexes could scale along with how people will want to use
> >> >>> >>> >> Infinispan. Probably worth playing with.
> >> >>> >>> >
> >> >>> >>> > Sure, this is the goal of Łukasz's work. We know Compass has
> >> >>> >>> > some good Directories, but we're building our own since one based
> >> >>> >>> > on Infinispan is not yet available.
> >> >>> >>> >
> >> >>> >>> >>
> >> >>> >>> >> Sent from my phone.
> >> >>> >>> >>
> >> >>> >>> >> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
> >> >>> >>> >> <jeff.ramsdale at gmail.com>
> >> >>> >>> >> wrote:
> >> >>> >>> >>
> >> >>> >>> >>> I'm afraid I haven't followed the Infinispan-Lucene
> >> >>> >>> >>> implementation closely, but have you looked at the Compass
> >> >>> >>> >>> Project? (http://www.compass-project.org/overview.html) It
> >> >>> >>> >>> provides a simplified interface to Lucene (optional) as well as
> >> >>> >>> >>> Directory implementations built on Terracotta, GigaSpaces and
> >> >>> >>> >>> Coherence. The latter, in particular, might be a useful guide
> >> >>> >>> >>> for the Infinispan implementation. I believe it's mature enough
> >> >>> >>> >>> to have solved many of the most difficult problems of
> >> >>> >>> >>> implementing a Directory on a distributed Map.
> >> >>> >>> >>>
> >> >>> >>> >>> If someone has any experience with Compass (particularly its
> >> >>> >>> >>> Directory implementations), I'd be interested in hearing about
> >> >>> >>> >>> it... It's Apache 2.0 licensed, btw.
> >> >>> >>> >>>
> >> >>> >>> >>> -jeff
> >> >>> >>> >>> _______________________________________________
> >> >>> >>> >>> infinispan-dev mailing list
> >> >>> >>> >>> infinispan-dev at lists.jboss.org
> >> >>> >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >> >>> >>> >>
> >> >>> >>> >
> >> >>> >>>
> >> >>> >
> >> >>> >
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >
> >
>

