[infinispan-dev] [hibernate-dev] Feedback on Infinispan patch

johng.sst at gmail.com johng.sst at gmail.com
Sun Sep 27 16:12:49 EDT 2009


Sanne,

That error looks suspiciously similar to an old Lucene error they had.
Could they have regressed?

John Griffin

On Sep 27, 2009 2:00pm, Łukasz Moreń <lukasz.moren at gmail.com> wrote:
> You can try to increase the TURNS_NUM (I've tried with 1000) and THREADS_NUM (200) fields in InfinispanDirectoryTest to make it more probable. The same problem also appears in InfinispanDirectoryProviderTest.

> An example stacktrace is:


> 21:22:44,441 ERROR InfinispanDirectoryTest:142 - Error
> java.io.IOException: File [ segments_nl ] for index [ indexName ] was not found
>     at org.hibernate.search.store.infinispan.InfinispanIndexIO$InfinispanIndexInput.<init>(InfinispanIndexIO.java:79)
>     at org.hibernate.search.store.infinispan.InfinispanDirectory.openInput(InfinispanDirectory.java:201)
>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:214)
>     at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:95)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
>     at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:227)
>     at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:55)
>     at org.hibernate.search.test.directoryProvider.infinispan.CacheTestSupport.doReadOperation(CacheTestSupport.java:106)
>     at org.hibernate.search.test.directoryProvider.infinispan.InfinispanDirectoryTest$InfinispanDirectoryThread.run(InfinispanDirectoryTest.java:130)

> Cheers,
> Lukasz

> 2009/9/27 Sanne Grinovero <sanne.grinovero at gmail.com>:

> Hi Łukasz,

> I'm unable to reproduce the problem. You said it happens randomly, but I've tried several times and I'm not getting errors. Do you know something I could do to make it happen?
> Could you share a stack trace?



> Anyway, if you are confident it's about segments getting lost while they are still being read, you could introduce a per-segment usage counter. It starts at value 1 to mark the segment as "most current", gets +1 each time a reader opens it, -1 when that reader closes it, and -1 when the segment is deleted.
> Each decrement method should check whether the value has reached 0, and only then really delete the segment. This counting would be easy to add inside the Directory.

> When opening a new IndexReader, you:
> 1) get the SegmentsInfo;
> 2) increment all counters (eager-lock, verify > 0, or else retry: set the changed counters back and get a new SegmentsInfo --> 1);
> 3) get the needed segments.
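A minimal Java sketch of the counting scheme and the three reader-side steps above follows. Everything in it is hypothetical:

- `SegmentRefCounter` is not part of Hibernate Search or Infinispan.
- A `ConcurrentHashMap` stands in for the Infinispan cache.
- An atomic `computeIfPresent` replaces the eager lock mentioned in step 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical per-file usage counter: a file starts at 1 ("most current"),
 * +1 for each reader that opens it, -1 for each reader that closes it and
 * -1 when the writer deletes it. The data is removed only when the count hits 0.
 */
final class SegmentRefCounter {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();
    private final Map<String, byte[]> files; // stand-in for the file data in the cache

    SegmentRefCounter(Map<String, byte[]> files) {
        this.files = files;
    }

    /** The writer created a file: it starts as "most current" with count 1. */
    void created(String name) {
        counts.put(name, 1);
    }

    /** A reader opens a file: +1, unless the file is already gone (returns false). */
    boolean tryAcquire(String name) {
        // entries are removed when they reach 0, so "present" means "count > 0"
        return counts.computeIfPresent(name, (k, v) -> v + 1) != null;
    }

    /** A reader closes or the writer deletes: -1; the data is really deleted at 0. */
    void decrement(String name) {
        counts.computeIfPresent(name, (k, v) -> {
            if (v > 1) {
                return v - 1;
            }
            files.remove(k); // last user gone: drop the data...
            return null;     // ...and the counter entry itself
        });
    }

    /**
     * Reader side: 1) read the current segments list, 2) increment every counter,
     * undoing the increments and starting again from 1) if any file has vanished,
     * 3) return the names, whose data can now be fetched safely.
     */
    List<String> acquireSnapshot(Supplier<List<String>> readSegmentsInfo) {
        while (true) {
            List<String> names = readSegmentsInfo.get();
            List<String> acquired = new ArrayList<>();
            for (String name : names) {
                if (!tryAcquire(name)) {
                    break;
                }
                acquired.add(name);
            }
            if (acquired.size() == names.size()) {
                return names;
            }
            acquired.forEach(this::decrement); // set the changed counters back
        }
    }
}
```

The retry loop assumes that a writer eventually publishes a segments list whose files all still exist. A real implementation would probably also want a bound on the number of retries.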



> Getting a counter should be much faster than getting a segment when the data is downloaded from another node, so we can store the counter under a different key that still relates to the segment.
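The separate-key idea can be sketched with two hypothetical key types; they are not the ones used in the patch. The counter key carries only the index name and file name, so it stays small and cheap to fetch from another node. Every chunk key of the same file can still be mapped back to its counter key.

```java
/** Hypothetical key for one chunk of a file's (potentially large) data. */
record ChunkKey(String indexName, String fileName, int chunkId) {}

/** Hypothetical key for the small usage counter of the same file. */
record RefCountKey(String indexName, String fileName) {

    /** The counter key related to a given data chunk. */
    static RefCountKey of(ChunkKey chunk) {
        return new RefCountKey(chunk.indexName(), chunk.fileName());
    }

    /** Key of the n-th data chunk belonging to this file. */
    ChunkKey chunk(int chunkId) {
        return new ChunkKey(indexName, fileName, chunkId);
    }
}
```

Records supply `equals`/`hashCode` from their components, which any cache key needs.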



> Sanne



> 2009/9/23 Łukasz Moreń <lukasz.moren at gmail.com>:


> > I agree that the Infinispan case is not much different from RAMDirectory. The major difference is that in RD (and also FileDirectory) changes are not batched as they are in ID. If I do not wrap the changes in InfinispanDirectory (simply removing tx.begin() from the obtain() method and tx.commit() from release() in InfinispanLock), and instead immediately commit every change made by the IW, it works well. However, that makes indexing much slower because of frequent replication to other nodes.
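The trade-off just described can be shown with a minimal model. It uses hypothetical names and makes no real Infinispan or JTA calls. It has two parts:

- A store in which each commit counts as one replication round.
- A lock whose `obtain()` opens a batch and whose `release()` commits it, mirroring the tx.begin()/tx.commit() placement in InfinispanLock.

Ten autocommitted writes cost ten rounds. The same writes batched under the lock cost one round, but they stay invisible to readers until `release()`. The model is not thread-safe; it assumes a single writer, as the write lock implies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of a replicated store: every commit costs one replication round. */
final class ReplicatedStoreModel {
    private final Map<String, byte[]> committed = new ConcurrentHashMap<>();
    private Map<String, byte[]> pending; // non-null while a batch (transaction) is open
    private int replicationRounds;

    void begin() {
        pending = new HashMap<>();
    }

    void put(String key, byte[] value) {
        if (pending != null) {
            pending.put(key, value);   // batched: kept local until commit
        } else {
            committed.put(key, value); // autocommit: visible and replicated at once
            replicationRounds++;
        }
    }

    void commit() {
        committed.putAll(pending);     // all batched changes become visible together
        pending = null;
        replicationRounds++;           // one round for the whole batch
    }

    /** What readers (possibly on other nodes) can see. */
    byte[] readCommitted(String key) {
        return committed.get(key);
    }

    int replicationRounds() {
        return replicationRounds;
    }
}

/** Hypothetical lock: obtain() opens the batch, release() commits it. */
final class BatchingWriteLock {
    private final ReplicatedStoreModel store;

    BatchingWriteLock(ReplicatedStoreModel store) {
        this.store = store;
    }

    boolean obtain() {
        store.begin();
        return true;
    }

    void release() {
        store.commit();
    }
}
```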

> > Sanne, it's a good remark that the IW commit is a kind of flush.

> >

> > I've attached a patch with InfinispanDirectory. The failing test is testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class, and it fails randomly. I think the problem is that the Infinispan commit on lockRelease() in org.apache.lucene.index.IndexWriter (line 1658) happens after the IW commit() (line 1654).

> >

> >> Is it because the IndexWriter only cleans files if no IndexReaders are reading them (how would that be detected)?

> >

> > It can happen if the IndexWriter cleans a file and an IndexReader then tries to access that cleaned file.

> >

> > 2009/9/23 Sanne Grinovero <sanne.grinovero at gmail.com>:

> >>

> >> I agree it should work the same way. The IndexWriter cleans files whenever it likes; it doesn't try to detect readers, and this shouldn't have any effect on how readers work.
> >> The IndexReader opens the "SegmentsInfo" first, and immediately after** gets a reference to the segments listed in this SegmentsInfo.
> >> No IndexWriter will ever change an existing segment; it only adds new files or eventually deletes old ones (segment merge, optimize).
> >> The deletion of segments is the interesting subject. When using files, it uses "delete at last close", which works because the IRs needing a file have it open already**. When using the RAMDirectory, the readers hold a reference that prevents garbage collection.

> >>

> >> (The two "**" assume the same event occurred correctly; otherwise an exception is thrown on opening.)

> >>

> >> When using Infinispan, shouldn't it be much the same as with the RAMDirectory? Even if the needed segment is deleted, the IR holds a local reference to the Java object, since it was opened.
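The distinction above can be shown with a small Java sketch. It uses a hypothetical `InMemoryDirectory`, not Lucene's `RAMDirectory`.

- A reader that holds the data object keeps working after the file is deleted.
- A reader that only remembers the file name and looks it up on every read fails with the kind of "was not found" error seen in the stack trace in this thread.

Whether the Infinispan-backed input behaves like the second case (for example, by fetching chunks lazily by key) is an assumption to check against the patch.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory directory, used only to contrast two ways of reading. */
final class InMemoryDirectory {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    void write(String name, byte[] data) {
        files.put(name, data);
    }

    void delete(String name) {
        files.remove(name);
    }

    /** RAMDirectory-like: once opened, the reader keeps the data object itself. */
    byte[] openByReference(String name) throws IOException {
        byte[] data = files.get(name);
        if (data == null) {
            throw new IOException("File [ " + name + " ] was not found");
        }
        return data; // survives a later delete: the map entry goes, the array stays reachable
    }

    /** Lazy: the reader keeps only the name and looks the data up on every read. */
    byte readByName(String name, int position) throws IOException {
        byte[] data = files.get(name);
        if (data == null) {
            throw new IOException("File [ " + name + " ] was not found");
        }
        return data[position];
    }
}
```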

> >>

> >> Łukasz, do you have a failing test?

> >>

> >> Sanne

> >>

> >> 2009/9/23 Emmanuel Bernard <emmanuel at hibernate.org>:

> >> > Conceptually, I don't understand why it does work in a pure file-system directory (i.e. an IndexReader can go and process queries while the IndexWriter goes about its business) and not when using Infinispan.
> >> > Is it because the IndexWriter only cleans files if no IndexReaders are reading them (how would that be detected)?

> >> > On 22 Sep 2009, at 20:46, Łukasz Moreń wrote:

> >> >

> >> > I need to give the IndexWriter the same lifecycle as the Infinispan tx: when the IW is created, the tx is started; when the IW is committed, the tx is committed. This ensures that an IndexReader doesn't read old data from the directory.
> >> > The Infinispan transaction can be started when the IW acquires the lock, but committing it on IW lock release, as is done so far, causes a problem:

> >> >

> >> > indexWriter.close() {
> >> >     indexWriter.commit(); // changes are visible to IndexReaders
> >> >
> >> >     // an IndexReader starts reading here, i.e. tries to access file "A"
> >> >
> >> >     indexWriter.lockRelease(); // changes in the Infinispan directory are committed,
> >> >                                // file "A" was removed, the IndexReader cannot find it and crashes
> >> > }

> >> >

> >> > I think the Infinispan tx has to be committed just before the IW commit; the problem is where to put that in the code.

> >> >

> >> > On 22 September 2009 18:24, Emmanuel Bernard <emmanuel at hibernate.org> wrote:

> >> >>

> >> >> Can you explain in more detail what is going on?
> >> >> Aside from that, Workspace has been Sanne's baby lately, so he will be the best person to see what design will work in HSearch. That being said, I don't like the idea of subclassing / overriding very much. In my experience, it has led to more bad and unmaintainable code than anything else.

> >> >> On 22 Sep 2009, at 02:16, Łukasz Moreń wrote:

> >> >>

> >> >> Hi,

> >> >>

> >> >> Thanks for the explanation.
> >> >> Maybe I'd better concentrate on the first release and postpone distributed writing.
> >> >>
> >> >> There is already a LockStrategy that uses Infinispan. Using it, I was wrapping the changes made by the IndexWriter in an Infinispan transaction for performance reasons: the transaction was started when the lock was obtained and committed when the lock was released. However, committing the Ispn transaction on lock release is not a good idea, since the IndexWriter calls the index commit before the lock is released (and the Ispn transaction is committed).
> >> >> I was thinking of overriding the Workspace class and its getIndexWriter (start the Infinispan tx) and commitIndexWriter (commit the tx) methods to wrap the IndexWriter lifecycle, but this needs a few other changes. Any other ideas?

> >> >>

> >> >> Cheers,

> >> >> Lukasz

> >> >>

> >> >> 2009/9/21 Sanne Grinovero <sanne.grinovero at gmail.com>:

> >> >>>

> >> >>> Hi Łukasz,
> >> >>> your concerns are justified, because the way the IndexWriter tries to acquire the lock will bring some trouble. As far as I remember, we decided to avoid multiple writer nodes in this first release for these reasons (is that written in your docs?).

> >> >>>

> >> >>> Actually it shouldn't be very hard to do, as the LockStrategy is pluggable (see the changes from HSEARCH-345). You could implement one that delegates to an Infinispan eager lock on some key, just as the default LockStrategy takes a file lock in the index directory.
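The pluggable-lock idea can be sketched under some assumptions. Instead of an eager lock, the sketch uses an atomic `putIfAbsent` on a shared `ConcurrentMap`. Infinispan's `Cache` implements `ConcurrentMap`, but here a plain `ConcurrentHashMap` stands in for it. The class name and the key format are hypothetical. The `obtain()` / `release()` / `isLocked()` methods mirror the shape of Lucene's `Lock` without extending it.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical write lock for one index, held as an entry in a shared map. */
final class MapBackedLock {
    private final ConcurrentMap<String, String> cache;
    private final String lockKey;
    private final String ownerId = UUID.randomUUID().toString(); // identifies this lock holder

    MapBackedLock(ConcurrentMap<String, String> cache, String indexName) {
        this.cache = cache;
        this.lockKey = indexName + "|write.lock";
    }

    /** Atomically claims the key; fails if anyone (including this instance) already holds it. */
    boolean obtain() {
        return cache.putIfAbsent(lockKey, ownerId) == null;
    }

    /** Removes the entry only if this instance owns it, so a stray release is harmless. */
    void release() {
        cache.remove(lockKey, ownerId);
    }

    boolean isLocked() {
        return cache.containsKey(lockKey);
    }
}
```

If a node crashed while holding the lock, the entry would stay behind. A real LockStrategy would need some expiry or cleanup for that case.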

> >> >>>

> >> >>> Maybe it's simpler to support this distributed writing instead of sending the queue to a single (elected) node? That would be cool, as the document analysis effort would be distributed. But I have no idea whether this would be more or less efficient than a single node writing: it could bring some huge data transfers over the wire during segment merging (basically fetching the whole index data at each node performing a segment merge). Maybe you'll need to play with the IndexWriter settings (
> >> >>> http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#lucene-indexing-performance
> >> >>> ); you'll probably need to find the sweet spot for "merge_factor".
> >> >>> I just noticed that MergePolicy is now re-implementable, but I hope that won't be needed.
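A toy model makes the merge-traffic concern concrete. It is not Lucene's actual LogMergePolicy, which works on segment sizes and has further limits. In the model:

- Every `bufferedDocs` documents are flushed as one segment.
- Whenever `mergeFactor` segments pile up on one level, they are merged into a single segment on the next level.

The model counts how many documents merging copies, which is roughly what a merging node would have to fetch. Each document ends up copied about log_mergeFactor(totalDocs / bufferedDocs) times, so a larger merge factor means fewer copies.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeSimulation {
    /** Counts document copies made by merging in an idealized, level-based merge model. */
    static long docsCopiedByMerges(long totalDocs, long bufferedDocs, int mergeFactor) {
        List<Integer> segmentsPerLevel = new ArrayList<>();
        long copied = 0;
        for (long flushed = 0; flushed + bufferedDocs <= totalDocs; flushed += bufferedDocs) {
            int level = 0;
            long segmentSize = bufferedDocs; // docs per segment on the current level
            addOne(segmentsPerLevel, 0);     // a flush creates a level-0 segment
            while (segmentsPerLevel.get(level) == mergeFactor) {
                segmentsPerLevel.set(level, 0);
                copied += mergeFactor * segmentSize; // every doc of the merged segments is rewritten
                segmentSize *= mergeFactor;
                level++;
                addOne(segmentsPerLevel, level);     // the merge result lands one level up
            }
        }
        return copied;
    }

    private static void addOne(List<Integer> levels, int level) {
        if (levels.size() == level) {
            levels.add(0);
        }
        levels.set(level, levels.get(level) + 1);
    }
}
```

Take 1000 documents flushed 10 at a time. With a merge factor of 10, merging copies 2000 documents (two copies per document). With a merge factor of 100, it copies 1000.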

> >> >>>

> >> >>> Sanne

> >> >>>

> >> >>> 2009/9/21 Łukasz Moreń <lukasz.moren at gmail.com>:

> >> >>> > Hi,

> >> >>> >

> >> >>> > I'm wondering whether it is reasonable to have multiple threads/nodes that modify indexes in a Lucene Directory based on Infinispan. Let's assume that two nodes try to update the index at the same time. The first one creates an IndexWriter and obtains the write lock. There is a high probability that the second node throws a LockObtainFailedException (as only one IndexWriter is allowed on a single index), and the index is not modified. How should that be handled? Should there always be only one node that makes changes to the index?
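The second node could wait instead of failing at once. A hypothetical polling helper could wrap a non-blocking attempt such as a `putIfAbsent`-based lock. It follows the same idea as Lucene's `Lock.obtain(long lockWaitTimeout)`, which polls before giving up with a LockObtainFailedException. The helper's names are assumptions. The standard `TimeoutException` replaces Lucene's exception to keep the sketch self-contained.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retry a non-blocking lock attempt until it succeeds or time runs out. */
final class LockPolling {
    static void obtainWithin(BooleanSupplier tryObtain, long timeoutMillis, long pollMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!tryObtain.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) { // overflow-safe deadline check
                throw new TimeoutException("lock not obtained within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis); // back off before the next attempt
        }
    }
}
```

With this helper, a second writer waits for the first to release the lock, up to the timeout. Whether waiting is acceptable, or whether updates should go to a single master node, is the design question raised in this message.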

> >> >>> >

> >> >>> > Cheers,

> >> >>> > Lukasz

> >> >>> >

> >> >>> > On 15 September 2009 01:39, Łukasz Moreń <lukasz.moren at gmail.com> wrote:

> >> >>> >>

> >> >>> >> Hi,

> >> >>> >>

> >> >>> >> Using JMeter, I wanted to check that the Infinispan directory does not crash under heavy load in "real" use, and to compare its performance with none/other directories.
> >> >>> >> However, a problem appeared when multiple IndexWriters try to modify the index (test InfinispanDirectoryTest): random deadlocks and Lucene exceptions. The IndexWriter tries to access files in the index that were removed earlier. I'm looking into it, but I don't have a good idea yet.

> >> >>> >>

> >> >>> >> Concerning the last part, I think something similar is done in InfinispanDirectoryProviderTest. Many threads are making changes and searching (without checking whether the DB is in sync with the index). When the threads finish their work, I use a Lucene query to check whether the index contains as many results as expected. Maybe you meant something else?
> >> >>> >> It would be good to run each node in a different VM.

> >> >>> >>

> >> >>> >>> Great! Looking forward to it. What state are things in at the moment if I want to play around with it?

> >> >>> >>

> >> >>> >> It should work with one master (which updates the index) and one or many slave nodes (which send changes to the master). I tried with one master and one slave (with both the JMS and JGroups backends) and it worked OK. It still fails if multiple nodes want to modify the index.

> >> >>> >>

> >> >>> >> I've attached a patch with the current version.

> >> >>> >>

> >> >>> >> Cheers,

> >> >>> >> Łukasz

> >> >>> >>

> >> >>> >> 2009/9/13 Michael Neale <michael.neale at gmail.com>:

> >> >>> >>>

> >> >>> >>> Great! Looking forward to it. What state are things in at the moment if I want to play around with it?

> >> >>> >>>

> >> >>> >>> Sent from my phone.

> >> >>> >>>

> >> >>> >>> On 13/09/2009, at 7:26 PM, Sanne Grinovero <sanne.grinovero at gmail.com> wrote:

> >> >>> >>>

> >> >>> >>> > 2009/9/12 Michael Neale <michael.neale at gmail.com>:

> >> >>> >>> >> That does sound pretty cool. It would be nice if the Lucene indexes could scale along with how people will want to use Infinispan. Probably worth playing with.

> >> >>> >>> >

> >> >>> >>> > Sure, this is the goal of Łukasz's work. We know Compass has some good Directories, but we're building our own, as one based on Infinispan is not yet available.

> >> >>> >>> >

> >> >>> >>> >>

> >> >>> >>> >> Sent from my phone.

> >> >>> >>> >>

> >> >>> >>> >> On 13/09/2009, at 8:37 AM, Jeff Ramsdale <jeff.ramsdale at gmail.com> wrote:

> >> >>> >>> >>

> >> >>> >>> >>> I'm afraid I haven't followed the Infinispan-Lucene implementation closely, but have you looked at the Compass Project? (http://www.compass-project.org/overview.html) It provides a simplified interface to Lucene (optional) as well as Directory implementations built on Terracotta, GigaSpaces and Coherence. The latter, in particular, might be a useful guide for the Infinispan implementation. I believe it's mature enough to have solved many of the most difficult problems of implementing a Directory on a distributed Map.

> >> >>> >>> >>>

> >> >>> >>> >>> If anyone has experience with Compass (particularly its Directory implementations), I'd be interested in hearing about it...
> >> >>> >>> >>> It's Apache 2.0 licensed, btw.

> >> >>> >>> >>>

> >> >>> >>> >>> -jeff

> >> >>> >>> >>> _______________________________________________

> >> >>> >>> >>> infinispan-dev mailing list

> >> >>> >>> >>> infinispan-dev at lists.jboss.org

> >> >>> >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev


> >> >>> >

> >> >>> >

> >> >>

> >> >>

> >> >

> >> >

> >> >

> >

> >






-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20090927/06121281/attachment-0002.html 

