Sanne,
That error looks suspiciously like an old Lucene error they had.
Could they have regressed?
John Griffin
On Sep 27, 2009 2:00pm, Łukasz Moreń <lukasz.moren(a)gmail.com> wrote:
You can try to increase TURNS_NUM (I've tried with 1000) and THREADS_NUM
(200) fields in InfinispanDirectoryTest to make it more probable. The same
problem also appears in InfinispanDirectoryProviderTest.
An example stacktrace is:
21:22:44,441 ERROR InfinispanDirectoryTest:142 - Error
java.io.IOException: File [ segments_nl ] for index [ indexName ] was not found
at org.hibernate.search.store.infinispan.InfinispanIndexIO$InfinispanIndexInput.<init>(InfinispanIndexIO.java:79)
at org.hibernate.search.store.infinispan.InfinispanDirectory.openInput(InfinispanDirectory.java:201)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:214)
at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:95)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:227)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:55)
at org.hibernate.search.test.directoryProvider.infinispan.CacheTestSupport.doReadOperation(CacheTestSupport.java:106)
at org.hibernate.search.test.directoryProvider.infinispan.InfinispanDirectoryTest$InfinispanDirectoryThread.run(InfinispanDirectoryTest.java:130)
Cheers,
Lukasz
2009/9/27 Sanne Grinovero <sanne.grinovero(a)gmail.com>
Hi Łukasz,
I'm unable to reproduce the problem; you said it happens randomly:
I've tried several times and I'm not getting errors. Do you know
something I could do to make it happen?
Could you share a stacktrace?
Anyway, if you are confident it's about the segments getting lost while
they are still being read, you could introduce a per-segment usage
counter: it starts at value 1 to mark the segment as "most current",
gets a +1 vote from each reader opening it, -1 on closing, and -1 on
deleting. Each decrement method should check for the value reaching 0 to
really delete the segment, and this counting would be easy to add inside
the Directory.
When opening a new IndexReader, you:
1) get the SegmentsInfo
2) increment all counters (eager-lock, verify > 0 or retry: set the
changed counters back and get a new SegmentsInfo --> 1)
3) get the needed segments
Getting a counter should be much faster than getting a segment in case
the data is downloaded from another node, so we can use a different key
while still relating to the segment.
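A minimal sketch of that counter scheme. All names here (SegmentRefCounter, register/acquire/release) are invented for illustration, not existing Hibernate Search or Infinispan API, and a local ConcurrentHashMap stands in for the cache; a real version would keep each counter under its own cache key, separate from the segment bytes.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-segment usage counter, as proposed in the message above.
public class SegmentRefCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // A freshly written segment starts at 1: the "most current" vote.
    public void register(String segmentName) {
        counts.put(segmentName, 1);
    }

    // Reader opening: +1, but only while the counter is still > 0.
    // Returns false when the segment is already gone, so the caller can
    // fetch a fresh SegmentsInfo and retry from step 1.
    public boolean acquire(String segmentName) {
        while (true) {
            Integer current = counts.get(segmentName);
            if (current == null || current <= 0) {
                return false;
            }
            if (counts.replace(segmentName, current, current + 1)) {
                return true; // CAS succeeded
            }
            // lost a race with another thread/node: re-read and retry
        }
    }

    // Reader closing or writer deleting: -1. Returns true exactly once,
    // when the count reaches 0, meaning the caller should now really
    // delete the segment data.
    public boolean release(String segmentName) {
        while (true) {
            Integer current = counts.get(segmentName);
            if (current == null) {
                return false;
            }
            int next = current - 1;
            if (counts.replace(segmentName, current, next)) {
                if (next == 0) {
                    counts.remove(segmentName, 0);
                    return true;
                }
                return false;
            }
        }
    }
}
```

The writer's delete and each reader's close both go through release(); whichever decrement reaches 0 performs the actual deletion, so a reader that acquired the segment before the writer deleted it keeps it alive until close.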
Sanne
2009/9/23 Łukasz Moreń <lukasz.moren(a)gmail.com>:
> I agree that the Infinispan case is not much different from RAMDirectory. The
> major difference is that in RD (also FSDirectory) changes are not batched
> like in ID. If I do not wrap changes in InfinispanDirectory (simply remove
> tx.begin() from the obtain() method and tx.commit() from release() in
> InfinispanLock), and immediately commit every change made by the IW, it works
> well. However, it makes indexing really slow, because of frequent
> replication to other nodes.
> Sanne, it's a good remark that the IW commit is a kind of flush.
>
> I've attached a patch with InfinispanDirectory; the failing test is
> testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class. It
> fails randomly. I think the problem is that the Infinispan commit on
> lockRelease() in org.apache.lucene.index.IndexWriter (line 1658) happens
> after the IW commit() (line 1654).
>
>> Is it because the IndexWriter only cleans files if no IndexReaders are
>> reading them (how would that be detected)?
>
> It can happen if the IndexWriter cleans a file and an IndexReader tries to
> access that cleaned file.
>
> 2009/9/23 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>>
>> I agree it should work the same way; the IndexWriter cleans files
>> whenever it likes to, it doesn't try to detect readers, and this
>> shouldn't have any effect on the working of readers.
>> The IndexReader opens the "SegmentsInfo" first, and immediately
>> after** gets a reference to the segments listed in this SegmentsInfo.
>> No IndexWriter will ever change an existing segment, only add new
>> files or eventually delete old ones (segment merge/optimize).
>> The deletion of segments is the interesting subject: when using files
>> it uses "delete at last close", which works because the IRs needing them
>> have them opened already**; when using the RAMDirectory they have a
>> reference preventing garbage collection.
>>
>> (the two "**" are assuming the same event occurred correctly,
>> otherwise an exception is thrown at opening)
>>
>> When using Infinispan it shouldn't be much different from the
>> RAMDirectory: even if the needed segment is deleted, the IR holds a
>> reference to the Java object locally since it was opened.
>>
>> Łukasz, do you have some failing test?
>>
>> Sanne
>>
>> 2009/9/23 Emmanuel Bernard <emmanuel(a)hibernate.org>:
>> > Conceptually I don't understand why it does work in a pure file system
>> > directory (i.e. an IndexReader can go and process queries while the
>> > IndexWriter goes about its business) and not when using Infinispan.
>> > Is it because the IndexWriter only cleans files if no IndexReaders are
>> > reading them (how would that be detected)?
>> > On 22 sept. 09, at 20:46, Łukasz Moreń wrote:
>> >
>> > I need to provide the same lifecycle for the IndexWriter as for the
>> > Infinispan tx: the tx is started when the IW is created and committed
>> > when the IW is committed. This assures that the IndexReader doesn't
>> > read old data from the directory.
>> > The Infinispan transaction can be started when the IW acquires the
>> > lock, but committing it on IW lock release, as is done so far, causes
>> > a problem:
>> >
>> > index writer close {
>> >   index writer commit();  // changes are visible to IndexReaders
>> >
>> >   // an IndexReader starts reading here, i.e. tries to access file "A"
>> >
>> >   index writer lockRelease();  // changes in the Infinispan directory
>> >   // are committed; file "A" was removed, the IndexReader cannot find
>> >   // it and crashes
>> > }
>> >
>> > I think the Infinispan tx has to be committed just before the IW
>> > commit; the problem is where to put that in the code.
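The race in that pseudo-code can be shown with a tiny self-contained model. CommitOrderingModel and all its members are invented stand-ins for the Infinispan transaction and the published Lucene commit point, not real Lucene or Infinispan API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Models the race: the IW commit (publish) currently happens before the
// Infinispan tx commit (visibility), so a reader opening in between sees
// a published segments file whose data is not there yet.
public class CommitOrderingModel {
    public final Map<String, byte[]> committed = new HashMap<>(); // what the directory can serve
    public final Map<String, byte[]> pendingTx = new HashMap<>(); // uncommitted tx writes
    public final Set<String> published = new HashSet<>();         // commit points readers may open

    public void write(String file) {
        pendingTx.put(file, new byte[0]);
    }

    // Infinispan transaction commit: writes become visible to the cluster.
    public void txCommit() {
        committed.putAll(pendingTx);
        pendingTx.clear();
    }

    // IndexWriter commit: the new segments file is announced to readers.
    public void iwCommit(String segmentsFile) {
        published.add(segmentsFile);
    }

    // A reader fails only when a file is published but not yet committed.
    public boolean readerCanOpen(String file) {
        return !published.contains(file) || committed.containsKey(file);
    }
}
```

With the current order (iwCommit before txCommit) readerCanOpen is false in the window between the two calls, matching the "File [...] was not found" exception; committing the tx first closes the window.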
>> >
>> > On 22 September 2009 18:24, Emmanuel Bernard
>> > <emmanuel(a)hibernate.org> wrote:
>>
>> >> Can you explain in more detail what is going on?
>> >> Aside from that, Workspace has been Sanne's baby lately, so he will
>> >> be the best to see what design will work in HSearch. That being said,
>> >> I don't like the idea of subclassing / overriding very much. In my
>> >> experience, it has led to more bad and unmaintainable code than
>> >> anything else.
>> >> On 22 sept. 09, at 02:16, Łukasz Moreń wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks for the explanation.
>> >> Maybe it's better if I concentrate on the first release and postpone
>> >> distributed writing.
>> >>
>> >> There is already a LockStrategy that uses Infinispan. Using it, I was
>> >> wrapping the changes made by the IndexWriter in an Infinispan
>> >> transaction for performance reasons:
>> >> the transaction was started on lock obtain and committed on lock
>> >> release. However, committing the Ispn transaction on lock release is
>> >> not a good idea, since the IndexWriter calls the index commit before
>> >> the lock is released (and the ispn transaction is committed).
>> >> I was thinking of overriding the Workspace class and its
>> >> getIndexWriter (start infinispan tx) and commitIndexWriter (commit tx)
>> >> methods to wrap the IndexWriter lifecycle, but this needs a few other
>> >> changes. Any other ideas?
>> >>
>> >> Cheers,
>> >> Lukasz
>> >>
>> >> 2009/9/21 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>> >>>
>> >>> Hi Łukasz,
>> >>> you have rightful concerns, because the way the IndexWriter tries
>> >>> to achieve the lock will bring some trouble. As far as I remember,
>> >>> we decided in this first release to avoid multiple writer nodes for
>> >>> these reasons (that's written in your docs?)
>> >>>
>> >>> Actually it shouldn't be very hard to do, as the LockStrategy is
>> >>> pluggable (see changes from HSEARCH-345),
>> >>> and you could implement one delegating to an Infinispan eager lock
>> >>> on some key, like the default LockStrategy takes a file lock in the
>> >>> index directory.
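A sketch of such a lock over a shared map. MapBasedIndexLock, LOCK_KEY and the owner parameter are invented for illustration; a real implementation would extend org.apache.lucene.store.Lock and claim the key in the Infinispan Cache with eager locking, but the ConcurrentMap stand-in shows the single-writer semantics.

```java
import java.util.concurrent.ConcurrentMap;

// Hypothetical index write lock over a shared map: only one node at a
// time can claim the well-known key, mirroring Lucene's Lock contract.
public class MapBasedIndexLock {
    static final String LOCK_KEY = "indexName#write.lock"; // invented key name

    private final ConcurrentMap<String, String> cache; // stand-in for the Infinispan cache
    private final String owner;                        // e.g. a node identifier

    public MapBasedIndexLock(ConcurrentMap<String, String> cache, String owner) {
        this.cache = cache;
        this.owner = owner;
    }

    // Like Lock#obtain(): atomically claim the key; false means another
    // node already holds the write lock (-> LockObtainFailedException
    // upstream in the IndexWriter).
    public boolean obtain() {
        return cache.putIfAbsent(LOCK_KEY, owner) == null;
    }

    // Conditional remove: a node cannot drop a lock held by someone else.
    public void release() {
        cache.remove(LOCK_KEY, owner);
    }

    public boolean isLocked() {
        return cache.containsKey(LOCK_KEY);
    }
}
```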
>> >>>
>> >>> Maybe it's simpler to support this distributed writing instead of
>> >>> sending the queue to some single (elected) node? It would be cool,
>> >>> as the Document Analysis effort would be distributed,
>> >>> but I have no idea if this would be more or less efficient than a
>> >>> single node writing; it could bring some huge data transfers along
>> >>> the wire during segment merging (basically fetching the whole index
>> >>> data at each node performing a segment merge); maybe you'll need to
>> >>> play with the IndexWriter settings (
>> >>>
>> >>> )
>> >>> and probably need to find the sweet spot for "merge_factor".
>> >>> I just saw that MergePolicy is now re-implementable, but I hope
>> >>> that won't be needed.
>> >>>
>> >>> Sanne
>> >>>
>> >>> 2009/9/21 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>> >>> > Hi,
>> >>> >
>> >>> > I'm wondering if it is reasonable to have multiple threads/nodes
>> >>> > that modify indexes in a Lucene Directory based on Infinispan?
>> >>> > Let's assume that two nodes try to update the index at the same
>> >>> > time. The first one creates an IndexWriter and obtains the write
>> >>> > lock. There is a high probability that the second node throws a
>> >>> > LockObtainFailedException (as only one IndexWriter is allowed on a
>> >>> > single index) and the index is not modified. How is that handled?
>> >>> > Should there always be only one node that makes changes in the
>> >>> > index?
>> >>> >
>> >>> > Cheers,
>> >>> > Lukasz
>> >>> >
>> >>> > On 15 September 2009 01:39, Łukasz Moreń
>> >>> > <lukasz.moren(a)gmail.com> wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> Using JMeter I wanted to check that the Infinispan dir does not
>> >>> >> crash under heavy load in "real" use, and to compare its
>> >>> >> performance with no/other directories.
>> >>> >> However, a problem appeared when multiple IndexWriters try to
>> >>> >> modify the index (test InfinispanDirectoryTest): random deadlocks
>> >>> >> and Lucene exceptions.
>> >>> >> The IndexWriter tries to access files in the index that were
>> >>> >> removed before. I'm looking into it, but I don't have a good idea
>> >>> >> yet.
>> >>> >>
>> >>> >> Concerning the last part, I think a similar thing is done in
>> >>> >> InfinispanDirectoryProviderTest. Many threads are making changes
>> >>> >> and searching (not checking if the db is in sync with the index).
>> >>> >> When the threads finish their work, I check with a Lucene query
>> >>> >> whether the index contains as many results as expected. Maybe you
>> >>> >> meant something else?
>> >>> >> It would be good to run each node in a different VM.
>> >>> >>
>> >>> >>> Great! Looking forward to it. What state are things in at the
>> >>> >>> moment if I want to play around with it?
>> >>> >>
>> >>> >> It should work with one master (updates the index) and many
>> >>> >> slave nodes (sending changes to the master). I tried with one
>> >>> >> master and one slave (both with the jms and jgroups backends) and
>> >>> >> it worked ok. It still fails if multiple nodes want to modify the
>> >>> >> index.
>> >>> >>
>> >>> >> I've attached a patch with the current version.
>> >>> >>
>> >>> >> Cheers,
>> >>> >> Łukasz
>> >>> >>
>> >>> >> 2009/9/13 Michael Neale <michael.neale(a)gmail.com>
>> >>> >>>
>> >>> >>> Great! Looking forward to it. What state are things in at the
>> >>> >>> moment if I want to play around with it?
>> >>> >>>
>> >>> >>> Sent from my phone.
>> >>> >>>
>> >>> >>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
>> >>> >>> <sanne.grinovero(a)gmail.com> wrote:
>> >>> >>>
>> >>> >>> > 2009/9/12 Michael Neale <michael.neale(a)gmail.com>:
>> >>> >>> >> That does sound pretty cool. Would be nice if the Lucene
>> >>> >>> >> indexes could scale along with how people will want to use
>> >>> >>> >> Infinispan. Probably worth playing with.
>> >>> >>> >
>> >>> >>> > Sure, this is the goal of Łukasz's work; we know Compass has
>> >>> >>> > some good Directories, but we're building our own, as one
>> >>> >>> > based on Infinispan is not yet available.
>> >>> >>> >
>> >>> >>> >>
>> >>> >>> >> Sent from my phone.
>> >>> >>> >>
>> >>> >>> >> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
>> >>> >>> >> <jeff.ramsdale(a)gmail.com> wrote:
>> >>> >>> >>
>> >>> >>> >>> I'm afraid I haven't followed the Infinispan-Lucene
>> >>> >>> >>> implementation closely, but have you looked at the Compass
>> >>> >>> >>> Project? It provides a simplified interface to Lucene
>> >>> >>> >>> (optional) as well as Directory implementations built on
>> >>> >>> >>> Terracotta, Gigaspaces and Coherence. The latter, in
>> >>> >>> >>> particular, might be a useful guide for the Infinispan
>> >>> >>> >>> implementation. I believe it's mature enough to have solved
>> >>> >>> >>> many of the most difficult problems of implementing a
>> >>> >>> >>> Directory on a distributed Map.
>> >>> >>> >>>
>> >>> >>> >>> If someone has any experience with Compass (particularly its
>> >>> >>> >>> Directory implementations) I'd be interested in hearing
>> >>> >>> >>> about it...
>> >>> >>> >>> It's Apache 2.0 licensed, btw.
>> >>> >>> >>>
>> >>> >>> >>> -jeff
>> >>> >>> >>>
>> >>> >>> >>> _______________________________________________
>> >>> >>> >>> infinispan-dev mailing list
>> >>> >>> >>> infinispan-dev(a)lists.jboss.org