[hibernate-dev] Search: dropping support for changes through IndexReader

Wed Dec 17 11:53:04 EST 2008

There's a very good article by Michael McCandless, available for free here:
http://manning.com/free/green_HotBackupsLucene.html

The API appears to be agnostic of the Directory implementation, but I
haven't tried it on implementations other than FS
(would it make sense?). In any case it doesn't rely on a specific
filesystem feature like ZFS snapshotting;
basically it refrains from deleting files that are no longer needed and
returns a consistent list of files that together make up one index,
which you then copy the usual way. When you "release" the snapshot it
deletes all unused segments.
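
To make the idea concrete, here is a minimal sketch in plain Java. It is a toy model of the behaviour described above, not Lucene's actual API (as far as I understand, the real feature lives in Lucene's SnapshotDeletionPolicy); the class ToyIndex and all of its methods are made up for illustration, and "files" are just names in a set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: NOT Lucene's API. All names here are hypothetical. */
class ToyIndex {
    private final Set<String> liveFiles = new TreeSet<>(); // files of the latest commit
    private final Set<String> onDisk = new TreeSet<>();    // every file not yet deleted
    private Set<String> snapshotFiles = null;               // files pinned by a snapshot

    /** A commit replaces the live file set; obsolete files go away unless pinned. */
    synchronized void commit(Set<String> newFiles) {
        liveFiles.clear();
        liveFiles.addAll(newFiles);
        onDisk.addAll(newFiles);
        deleteUnused();
    }

    /** Pins the current commit and returns the consistent list of files to copy. */
    synchronized List<String> snapshot() {
        if (snapshotFiles != null) {
            throw new IllegalStateException("snapshot already held");
        }
        snapshotFiles = new TreeSet<>(liveFiles);
        return new ArrayList<>(snapshotFiles);
    }

    /** Unpins the snapshot; files it was keeping alive are deleted now. */
    synchronized void release() {
        snapshotFiles = null;
        deleteUnused();
    }

    /** Returns a copy of the set of files currently present. */
    synchronized Set<String> filesOnDisk() {
        return new TreeSet<>(onDisk);
    }

    // Delete everything that is neither live nor pinned by the snapshot.
    private void deleteUnused() {
        onDisk.removeIf(f -> !liveFiles.contains(f)
                && (snapshotFiles == null || !snapshotFiles.contains(f)));
    }
}
```

The point is that commit() can keep running while the copy is in progress: the files returned by snapshot() stay around until release(), so the copy task doesn't need to exclude the writer.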

The article doesn't say what happens in case of an optimization,
and I still need to figure that out;
we may need to keep the lock for this kind of case.

2008/12/17 Emmanuel Bernard <emmanuel at hibernate.org>:
> I don't know the stability of snapshot but if it works well we certainly
> should move there as it releases us from the mostly useless lock code. Is
> snapshot Directory implementation agnostic? Or does it rely on FS?
>
> On  Dec 16, 2008, at 11:18, Sanne Grinovero wrote:
>
>> 2008/12/16 Hardy Ferentschik <hibernate at ferentschik.de>:
>>>
>>> Hi Sanne,
>>>
>>> On Mon, 15 Dec 2008 20:15:52 +0100, Sanne Grinovero
>>> <sanne.grinovero at gmail.com> wrote:
>>>
>>>> some more steps towards mass indexing
>>>
>>> What exactly is actually your definition of "mass indexing"? The use of
>>> FullTextSession.index()
>>> to (re-)build your index?
>>
>> I have split the problem in two:
>>  the first part is just to speed up the whole process by changing
>> the backend only, so people will get the new benefits using the
>> recommended procedure
>> described in the book and in the reference docs: so yes, I mean using
>> "FullTextSession.index()".
>>
>>  The second part is to add a new API which wraps the complexity of
>> the object loading and uses
>> FullTextSession.index() under the covers (I hope). It is not really
>> clear at the moment
>> which arguments this new API will need, as people may want to fine-tune
>> the process in several ways,
>> but it will look much like the recommended approach, only using
>> several threads.
>>
>>>
>>>> do you agree I'll drop the capability to use an IndexReader to make
>>>> changes to the index?
>>>> This implies I'll simplify the backend by removing all methods working
>>>> on an IndexReader (they are not needed anymore),
>>>> and is required to reuse the IndexWriter as next improvement.
>>>
>>> I assume you are talking about removing performWork(LuceneWork work,
>>> IndexReader reader)
>>> from the LuceneWorkDelegate interface. I think that makes sense. I think
>>> we still agree
>>> that it is a good idea to only use the IndexWriter to apply changes.
>>
>> Nice, I'll start with this as it really helps minimize the complexity
>> of the backend.
>>
>>>
>>>> Also we'll need to change the FSMasterDirectoryProvider to use
>>>> Lucene's snapshotting feature instead of the DirectoryProvider lock
>>>> (this is HSEARCH-152).
>>>> Another solution would be to replace the lock-holding with a signal
>>>> like "please release it to me for a moment"
>>>> or even delegate the task to the thread owning the IndexWriter.
>>>
>>> Not sure about this one. I haven't looked into this snapshot feature yet.
>>
>> This feature makes it possible
>> to make consistent copies of the index even when an IndexWriter is
>> being used to make changes.
>> I would need this to make sure the index can still be hot copied even
>> when we hold a lock
>> for longer periods (to avoid closing the IndexWriter).
>>
>>>
>>>> BTW this will mean the DirectoryProvider won't be used by anybody.
>>>
>>> Really?
>>
>> Sorry, of course not, I meant to say the "DirectoryProvider-specific
>> LOCK";
>> It is currently used only to achieve mutual exclusion between the
>> IndexWriter activity and the
>> FSMasterDirectoryProvider's copy task, so if we implement the snapshotting
>> it won't be useful anymore.
>>
>>>
>>> --Hardy
>>>
>>
>> Sanne
>
>