Good idea. Does anybody have traction with the Compass folks to propose this?
Even if Infinispan ends up hosting it, there might be value in doing
some cross-pollination with the Compass folks, since this aligns
directly with what they are working on.
-jeff
2009/9/24 Manik Surtani <manik(a)jboss.org>:
> Minorly off topic, but rather than working with patches, do we want this
> Directory impl in source control somewhere?
> Being dependent on LGPL, it won't be accepted into Lucene's contribs. If it
> doesn't depend on any Hibernate Search code, I could host it in Infinispan's
> SVN repo...
>
> On 23 Sep 2009, at 13:58, Łukasz Moreń wrote:
>
> I agree that the Infinispan case is not much different from RamDirectory.
> The major difference is that in RD (and also FileDirectory) changes are not
> batched as they are in ID (InfinispanDirectory). If I do not wrap changes
> in InfinispanDirectory (simply remove tx.begin() from the obtain() method
> and tx.commit() from release() in InfinispanLock) and immediately commit
> every change made by the IW, it works well. However, it makes indexing much
> slower because of the frequent replication to other nodes.
> Sanne, it's a good remark that the IW commit is a kind of flush.
>
> I've attached a patch with InfinispanDirectory; the failing test is
> testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class. It
> fails randomly. I think the problem is that the Infinispan commit on
> lockRelease() in org.apache.lucene.index.IndexWriter (line 1658) comes
> after the IW commit() (line 1654).
>
>> Is it because the IndexWriter only cleans files if no IndexReaders are
>> reading them (how would that be detected)?
>
> It can happen if the IndexWriter cleans a file and an IndexReader then
> tries to access that cleaned file.
>
> 2009/9/23 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>>
>> I agree it should work the same way. The IndexWriter cleans files
>> whenever it likes to; it doesn't try to detect readers, and this
>> shouldn't have any effect on how readers work.
>> The IndexReader opens the "SegmentsInfo" first, and immediately
>> after** gets a reference to the segments listed in this SegmentsInfo.
>> No IndexWriter will ever change an existing segment; it only adds new
>> files or eventually deletes old ones (segment merge, optimize).
>> The deletion of segments is the interesting subject: when using files
>> it uses "delete at last close", which works because the IRs needing them
>> have them opened already**; when using the RAMDirectory they have a
>> reference preventing garbage collection.
>>
>> (the two "**" assume the same event occurred correctly;
>> otherwise an exception is thrown at opening)
>>
>> When using Infinispan it shouldn't be much different from the
>> RAMDirectory, should it? So even if the needed segment is deleted, the IR
>> holds a reference to the Java object locally since it was opened.
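To make the reference argument above concrete, here is a minimal sketch. It assumes a hypothetical map-backed directory (MapDirectorySketch and its method names are invented for illustration; this is not the Lucene or Infinispan API). It only shows that removing a map entry does not invalidate an object a reader already holds. Whether that carries over to a replicated cache depends on when the reader actually fetches the data, which the sketch does not model.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical map-backed directory; not the Lucene or Infinispan API.
final class MapDirectorySketch {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    void writeFile(String name, byte[] content) {
        files.put(name, content.clone());
    }

    // Deleting only removes the map entry; objects already handed out stay alive.
    void deleteFile(String name) {
        files.remove(name);
    }

    boolean exists(String name) {
        return files.containsKey(name);
    }

    // "Opening" a file hands the caller a reference to the stored object.
    byte[] openFile(String name) {
        byte[] content = files.get(name);
        if (content == null) {
            throw new IllegalStateException("no such file: " + name);
        }
        return content;
    }

    public static void main(String[] args) {
        MapDirectorySketch dir = new MapDirectorySketch();
        dir.writeFile("segment_1", new byte[] {1, 2, 3});
        byte[] held = dir.openFile("segment_1"); // reader opens the segment
        dir.deleteFile("segment_1");             // writer deletes it afterwards
        System.out.println("still listed: " + dir.exists("segment_1"));
        System.out.println("reader still holds " + held.length + " bytes");
    }
}
```

A reader that calls openFile() only after the delete gets the exception instead, which is the failure mode discussed elsewhere in this thread.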
>>
>> Łukasz, do you have a failing test?
>>
>> Sanne
>>
>> 2009/9/23 Emmanuel Bernard <emmanuel(a)hibernate.org>:
>>> Conceptually, I don't understand why it works in a pure file system
>>> directory (i.e. the IndexReader can go and process queries while the
>>> IndexWriter goes about its business) and not when using Infinispan.
>>> Is it because the IndexWriter only cleans files if no IndexReaders are
>>> reading them (how would that be detected)?
>>> On 22 sept. 09, at 20:46, Łukasz Moreń wrote:
>>>
>>> I need to provide the same lifecycle for the IndexWriter as for the
>>> Infinispan tx: when the IW is created, the tx is started; when the IW is
>>> committed, the tx is committed. This ensures that an IndexReader doesn't
>>> read old data from the directory.
>>> The Infinispan transaction can be started when the IW acquires the lock,
>>> but committing it on IW lock release, as is done so far, causes a problem:
>>>
>>> index writer close {
>>>   index writer commit(); // changes are visible to IndexReaders
>>>
>>>   // Index reader starts reading here, i.e. tries to access file "A"
>>>
>>>   index writer lockRelease(); // changes in the Infinispan directory are
>>>                               // committed, file "A" was removed, the
>>>                               // IndexReader cannot find it and crashes
>>> }
>>>
>>> I think the Infinispan tx has to be committed just before the IW commit,
>>> and the problem is where to put that in the code.
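The ordering problem described above can be reproduced in a small single-threaded simulation. This is only a sketch under stated assumptions: BatchingStore is a hypothetical stand-in for a transactional cache that buffers changes until commit(), and the "reader" simply lists the files it sees when it opens and fetches them later. Real IndexReader and Infinispan transaction behaviour is more involved than this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TxOrderingSketch {

    // Hypothetical store: puts and removals stay pending until commit().
    static final class BatchingStore {
        private final Map<String, String> committed = new HashMap<>();
        private final Map<String, String> pendingPuts = new HashMap<>();
        private final List<String> pendingRemovals = new ArrayList<>();

        void put(String name, String content) { pendingPuts.put(name, content); }

        void remove(String name) { pendingRemovals.add(name); }

        void commit() {
            committed.putAll(pendingPuts);
            for (String name : pendingRemovals) {
                committed.remove(name);
            }
            pendingPuts.clear();
            pendingRemovals.clear();
        }

        // Other parties only ever see committed state.
        Set<String> list() { return new TreeSet<>(committed.keySet()); }

        String read(String name) { return committed.get(name); }
    }

    // Returns true if the reader can still read every file it listed on open.
    static boolean readerSurvives(boolean commitTxBeforeWriterCommit) {
        BatchingStore store = new BatchingStore();
        store.put("A", "old segment");
        store.commit();                      // initial index contains file "A"

        store.put("B", "merged segment");    // writer work inside the tx
        store.remove("A");

        if (commitTxBeforeWriterCommit) {
            store.commit();                  // proposed order: tx commit first
        }
        // the IndexWriter commit would happen at this point

        List<String> needed = new ArrayList<>(store.list()); // reader opens

        if (!commitTxBeforeWriterCommit) {
            store.commit();                  // current order: commit on lock release
        }

        for (String name : needed) {         // reader fetches its files lazily
            if (store.read(name) == null) {
                return false;                // file vanished underneath the reader
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("commit on lock release: " + readerSurvives(false));
        System.out.println("commit before IW commit: " + readerSurvives(true));
    }
}
```

In this toy model the reader only fails when the store commit happens after the reader has opened, which matches the sequence in the pseudocode above.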
>>>
>>> On 22 September 2009 at 18:24, Emmanuel Bernard
>>> <emmanuel(a)hibernate.org> wrote:
>>>>
>>>> Can you explain in more detail what is going on?
>>>> Aside from that, Workspace has been Sanne's baby lately, so he will be
>>>> the best person to see what design will work in HSearch. That being
>>>> said, I don't like the idea of subclassing / overriding very much. In my
>>>> experience, it has led to more bad and unmaintainable code than
>>>> anything else.
>>>> On 22 sept. 09, at 02:16, Łukasz Moreń wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for the explanation.
>>>> Maybe it's better if I concentrate on the first release and postpone
>>>> distributed writing.
>>>>
>>>> There is already a LockStrategy that uses Infinispan. Using it, I was
>>>> wrapping the changes made by the IndexWriter in an Infinispan
>>>> transaction for performance reasons: on lock obtain the transaction was
>>>> started, and on lock release the transaction was committed. However,
>>>> committing the Ispn transaction on lock release is not a good idea,
>>>> since the IndexWriter calls index commit before the lock is released
>>>> (and before the Ispn transaction is committed).
>>>> I was thinking of overriding the Workspace class and its getIndexWriter
>>>> (start Infinispan tx) and commitIndexWriter (commit tx) methods to wrap
>>>> the IndexWriter lifecycle, but this needs a few other changes. Any other
>>>> ideas?
>>>>
>>>> Cheers,
>>>> Lukasz
>>>>
>>>> 2009/9/21 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>>>>>
>>>>> Hi Łukasz,
>>>>> your concerns are justified: the way the IndexWriter tries to
>>>>> acquire the lock will bring some trouble. As far as I remember, we
>>>>> decided to avoid multiple writer nodes in this first release for this
>>>>> reason (is that written in your docs?)
>>>>>
>>>>> Actually it shouldn't be very hard to do, as the LockStrategy is
>>>>> pluggable (see changes from HSEARCH-345),
>>>>> and you could implement one delegating to an Infinispan eager lock on
>>>>> some key, just as the default LockStrategy takes a file lock in the
>>>>> index directory.
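As a rough illustration of a lock held on a single key, here is a minimal sketch. Note the assumptions: it uses putIfAbsent on a plain ConcurrentMap as a stand-in for the cache rather than the Infinispan eager lock mentioned above, and KeyLockSketch, its method names and the "write.lock" key are all invented for this example. It is not the Hibernate Search LockStrategy or Lucene Lock API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical lock that marks ownership by storing an entry under one key.
final class KeyLockSketch {
    static final String LOCK_KEY = "write.lock"; // assumed key name

    private final ConcurrentMap<String, String> cache; // stand-in for a shared cache
    private final String owner;

    KeyLockSketch(ConcurrentMap<String, String> cache, String owner) {
        this.cache = cache;
        this.owner = owner;
    }

    // Succeeds only if nobody currently holds the key (not reentrant).
    boolean obtain() {
        return cache.putIfAbsent(LOCK_KEY, owner) == null;
    }

    // Removes the entry only if this owner holds it.
    void release() {
        cache.remove(LOCK_KEY, owner);
    }

    boolean isLocked() {
        return cache.containsKey(LOCK_KEY);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> shared = new ConcurrentHashMap<>();
        KeyLockSketch nodeA = new KeyLockSketch(shared, "node-A");
        KeyLockSketch nodeB = new KeyLockSketch(shared, "node-B");

        System.out.println("A obtains: " + nodeA.obtain());
        System.out.println("B obtains while A holds: " + nodeB.obtain());
        nodeA.release();
        System.out.println("B obtains after release: " + nodeB.obtain());
    }
}
```

A second node that fails to obtain the key is in the same position as a second IndexWriter that cannot get the write lock, so the caller still has to decide whether to retry or give up.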
>>>>>
>>>>> Maybe it's simpler to support this distributed writing instead of
>>>>> sending the queue to some single (elected) node? It would be cool, as
>>>>> the Document Analysis effort would be distributed, but I have no idea
>>>>> whether this would be more or less efficient than a single node
>>>>> writing; it could cause some huge data transfers over the wire during
>>>>> segment merging (basically fetching the whole index data at each node
>>>>> performing a segment merge); maybe you'll need to play with the
>>>>> IndexWriter settings
>>>>> (http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#l...);
>>>>> you'll probably need to find the sweet spot for "merge_factor".
>>>>> I just saw that MergePolicy is now re-implementable, but I hope
>>>>> that won't be needed.
>>>>>
>>>>> Sanne
>>>>>
>>>>> 2009/9/21 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>>>>>> Hi,
>>>>>>
>>>>>> I'm wondering whether it is reasonable to have multiple
>>>>>> threads/nodes that modify indexes in a Lucene Directory based on
>>>>>> Infinispan. Let's assume that two nodes try to update the index at
>>>>>> the same time. The first one creates an IndexWriter and obtains the
>>>>>> write lock. There is a high probability that the second node throws
>>>>>> LockObtainFailedException (as only one IndexWriter is allowed on a
>>>>>> single index) and the index is not modified. How should that work?
>>>>>> Should there always be only one node that makes changes to the index?
>>>>>>
>>>>>> Cheers,
>>>>>> Lukasz
>>>>>>
>>>>>> On 15 September 2009 at 01:39, Łukasz Moreń
>>>>>> <lukasz.moren(a)gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Using JMeter, I wanted to check that the Infinispan dir does not
>>>>>>> crash under heavy load in "real" use, and to compare its
>>>>>>> performance with none/other directories.
>>>>>>> However, a problem appeared when multiple IndexWriters try to
>>>>>>> modify the index (test InfinispanDirectoryTest): random deadlocks
>>>>>>> and Lucene exceptions.
>>>>>>> The IndexWriter tries to access files in the index that were
>>>>>>> removed before. I'm looking into it, but don't have a good idea yet.
>>>>>>>
>>>>>>> Concerning the last part, I think a similar thing is done in
>>>>>>> InfinispanDirectoryProviderTest. Many threads are making changes
>>>>>>> and searching (not checking whether the db is in sync with the
>>>>>>> index).
>>>>>>> Once the threads finish their work, I check with a Lucene query
>>>>>>> whether the index contains as many results as expected. Maybe you
>>>>>>> meant something else?
>>>>>>> It would be good to run each node in a different VM.
>>>>>>>
>>>>>>>> Great ! Looking forward to it. What state are things in at the
>>>>>>>> moment if I want to play around with it ?
>>>>>>>
>>>>>>> It should work with one master (updates the index) and one or many
>>>>>>> slave nodes (send changes to the master). I tried with one master
>>>>>>> and one slave (both with the JMS and JGroups backends) and it
>>>>>>> worked OK. It still fails if multiple nodes want to modify the
>>>>>>> index.
>>>>>>>
>>>>>>> I've attached a patch with the current version.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Łukasz
>>>>>>>
>>>>>>> 2009/9/13 Michael Neale <michael.neale(a)gmail.com>
>>>>>>>>
>>>>>>>> Great ! Looking forward to it. What state are things in at the
>>>>>>>> moment if I want to play around with it ?
>>>>>>>>
>>>>>>>> Sent from my phone.
>>>>>>>>
>>>>>>>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
>>>>>>>> <sanne.grinovero(a)gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> 2009/9/12 Michael Neale <michael.neale(a)gmail.com>:
>>>>>>>>>> That does sound pretty cool. Would be nice if the Lucene
>>>>>>>>>> indexes could scale along with how people will want to use
>>>>>>>>>> Infinispan.
>>>>>>>>>> Probably worth playing with.
>>>>>>>>>
>>>>>>>>> Sure, this is the goal of Łukasz's work. We know Compass has
>>>>>>>>> some good Directories, but we're building our own, as one
>>>>>>>>> based on Infinispan is not yet available.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sent from my phone.
>>>>>>>>>>
>>>>>>>>>> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
>>>>>>>>>> <jeff.ramsdale(a)gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm afraid I haven't followed the Infinispan-Lucene
>>>>>>>>>>> implementation closely, but have you looked at the Compass
>>>>>>>>>>> Project? (http://www.compass-project.org/overview.html) It
>>>>>>>>>>> provides a simplified interface to Lucene (optional) as well
>>>>>>>>>>> as Directory implementations built on Terracotta, Gigaspaces
>>>>>>>>>>> and Coherence. The latter, in particular, might be a useful
>>>>>>>>>>> guide for the Infinispan implementation. I believe it's
>>>>>>>>>>> mature enough to have solved many of the most difficult
>>>>>>>>>>> problems of implementing Directory on a distributed Map.
>>>>>>>>>>>
>>>>>>>>>>> If someone has any experience with Compass (particularly
>>>>>>>>>>> its Directory implementations) I'd be interested in hearing
>>>>>>>>>>> about it...
>>>>>>>>>>> It's Apache 2.0 licensed, btw.
>>>>>>>>>>>
>>>>>>>>>>> -jeff
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> infinispan-dev mailing list
>>>>>>>>>>> infinispan-dev(a)lists.jboss.org
>>>>>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>>
>>>
>
> <InfinispanDirectoryProvider_22_09_2009.patch>
>
> --
> Manik Surtani
> manik(a)jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org