[hibernate-dev] [infinispan-dev] Infinispan tx, config and multithreading
Łukasz Moreń
lukasz.moren at gmail.com
Fri Aug 14 13:35:30 EDT 2009
Yes, using SerialMergeScheduler helped. Thanks! Should the restriction on the
merge scheduler be set in the IndexWriterSetting enum, similarly to e.g. the
max_buffered_delete_terms or max_buffered_docs properties?
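
A minimal sketch of why the serial scheduler helps, assuming (as described
further down in this thread) that the transaction is bound to the calling
thread. In the Lucene versions of that time the scheduler is set with
IndexWriter.setMergeScheduler(new SerialMergeScheduler()). The class and
method names below are hypothetical and are not the Infinispan or Lucene API;
the toy store only mimics "uncommitted writes are visible to the owning
thread only". A merge running on the caller's thread sees the new segment,
while a merge on a freshly started thread does not see it until commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy store (hypothetical): uncommitted writes are visible only to the writing thread. */
public class ThreadBoundTxDemo {
    private final Map<String, String> committed = new ConcurrentHashMap<>();
    // Pending writes of the transaction bound to the current thread (null = no tx).
    private final ThreadLocal<Map<String, String>> pending = new ThreadLocal<>();

    public void begin() {
        pending.set(new HashMap<>());
    }

    public void put(String key, String value) {
        Map<String, String> tx = pending.get();
        if (tx != null) {
            tx.put(key, value);        // buffered until commit
        } else {
            committed.put(key, value); // no tx on this thread: write through
        }
    }

    public String get(String key) {
        Map<String, String> tx = pending.get();
        if (tx != null && tx.containsKey(key)) {
            return tx.get(key);        // the owning thread sees its own writes
        }
        return committed.get(key);
    }

    public void commit() {
        Map<String, String> tx = pending.get();
        if (tx != null) {
            committed.putAll(tx);
        }
        pending.remove();
    }

    /** "Serial" merge: runs on the caller's thread, inside its transaction. */
    public boolean serialMergeSees(String segment) {
        return get(segment) != null;
    }

    /** "Concurrent" merge: runs on a new thread that has no transaction. */
    public boolean concurrentMergeSees(String segment) {
        boolean[] seen = new boolean[1];
        Thread merger = new Thread(() -> {
            seen[0] = get(segment) != null;
        });
        merger.start();
        try {
            merger.join(); // join() makes the merger's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        ThreadBoundTxDemo store = new ThreadBoundTxDemo();
        store.begin();
        store.put("_1.cfs", "segment data");
        System.out.println("serial merge sees segment: " + store.serialMergeSees("_1.cfs"));         // true
        System.out.println("concurrent merge sees segment: " + store.concurrentMergeSees("_1.cfs")); // false
        store.commit();
        System.out.println("after commit: " + store.concurrentMergeSees("_1.cfs"));                  // true
    }
}
```

The trade-off is the one Emmanuel mentions below: merges no longer run in
the background, so indexing gets slower.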
2009/8/14 Emmanuel Bernard <emmanuel at hibernate.org>
> OK, so let's try something. Lukasz, can you try the SerialMergeScheduler
> policy on the index writer and see what happens?
> It will make indexing slower for Infinispan, but it seems we can't do much
> better in the short term.
>
> If that works, then we will put a restriction in place when reading the
> config: the index writer for this given index will be forced to use the
> serial strategy.
>
> On 14 Aug 09, at 06:23, Łukasz Moreń wrote:
>
> The expensive part is replication to all Infinispan nodes. The IndexWriter
> creates segment files, merges them into one compound segment, and deletes
> descriptor files that are no longer needed - many files that must be
> replicated. Some of them don't even need to be replicated, because they are
> inserted into the directory at the beginning of the index commit process and
> removed at the end. Batching helps with performance here. Yes, I think the
> IndexWriter works the way you described.
>
> 2009/8/14 Manik Surtani <manik at jboss.org>
>
>>
>> On 14 Aug 2009, at 10:17, Łukasz Moreń wrote:
>>
>> Yes, but e.g. FSDirectory flushes changes whenever a file descriptor is
>> created or updated - there can be many of those in one IndexWriter lifetime.
>> In the Infinispan implementation, I want to commit changes only when the
>> IndexWriter is closing, i.e. batch all modifications.
>> If I switch to one transaction per descriptor modification, similar to how
>> it's done in FSDirectory, it works well but is not efficient.
>>
>>
>> So what's expensive here? Writing to Infinispan, or the indexing itself?
>> Correct me if I am wrong, I assume that the IndexWriter creates multiple
>> threads, and each thread does: {
>> // some indexing work
>> // write these indexes to Infinispan
>> }
>>
>> Is that correct?
>>
>>
>> 2009/8/14 Sanne Grinovero <sanne.grinovero at gmail.com>
>>
>>> I am not an expert on this part of Lucene, but it looks to me like
>>> the IndexWriter is the "driver/coordinator", and its decisions
>>> are affected by a pluggable MergeScheduler; they do stuff on the
>>> internal buffers of the IndexWriter (dequeue the pending segments to
>>> be written to the index), but it shouldn't matter what exactly they do,
>>> as the internal state of these classes is unaffected by our
>>> transactions.
>>> They make decisions about writing segments to the Directory and
>>> committing changes ("sync()"): as you implement this Directory, you
>>> should only have to take care of this class. I don't think the
>>> MergeScheduler(s) are relevant: it just happens that the thread that
>>> applies changes to the index might be a different one than the one
>>> pushing changes to the IndexWriter.
>>>
>>> In the Directory implementation you should use transactions to push
>>> state changes to the "underlying storage": as FSDirectory is playing
>>> with file descriptors and flushes, you do the same with Infinispan
>>> transactions.
>>>
>>> 2009/8/14 Łukasz Moreń <lukasz.moren at gmail.com>:
>>> > Yes, right, MergeSchedulers.
>>> >
>>> > 2009/8/14 Sanne Grinovero <sanne.grinovero at gmail.com>
>>> >>
>>> >> what are these "other" threads? Are you speaking about the
>>> >> MergeSchedulers?
>>> >>
>>> >> 2009/8/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>>> >> > IndexWriter processes the index update, delegates some work to other
>>> >> > threads, and waits until they finish. These "other" threads work on
>>> >> > data modified in the IndexWriter transaction. So I think if I use one
>>> >> > transaction per thread, the "others" would not see data modified by
>>> >> > the IndexWriter until commit.
>>> >> >
>>> >> > 2009/8/13, Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >> >> Ah, I thought it was using multiple threads because of your mass
>>> >> >> indexing. I did not know some threads were spawned specifically for
>>> >> >> the Infinispan directory.
>>> >> >>
>>> >> >> On 13 Aug 09, at 17:34, Sanne Grinovero wrote:
>>> >> >>
>>> >> >>> Hi Łukasz,
>>> >> >>> what is your usage of these threads? Did you consider using one
>>> >> >>> transaction per thread?
>>> >> >>>
>>> >> >>> Sanne
>>> >> >>>
>>> >> >>> 2009/8/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>>> >> >>>> Newly created threads were not associated with any transaction,
>>> >> >>>> so I suppose that was the problem. Sharing a transaction between
>>> >> >>>> threads seems to be a good solution.
>>> >> >>>> Thanks for the help!
>>> >> >>>>
>>> >> >>>> 2009/8/13, Jason T. Greene <jason.greene at redhat.com>:
>>> >> >>>>> Correct. There could also be read races, so if you are going to
>>> >> >>>>> share a tx between threads, I would use some shared lock to
>>> >> >>>>> guarantee that only one thread can use it at a time. BTW this
>>> >> >>>>> means you have to properly suspend/resume the TX via the TM API
>>> >> >>>>> as well.
>>> >> >>>>>
>>> >> >>>>> Emmanuel Bernard wrote:
>>> >> >>>>>> Modifying a transaction means applying mutations (like SQL
>>> >> >>>>>> INSERT / UPDATE / DELETE) to the transactional resource?
>>> >> >>>>>>
>>> >> >>>>>> On 13 Aug 09, at 15:07, Jason T. Greene wrote:
>>> >> >>>>>>
>>> >> >>>>>>> When using transactions, the context is bound to the
>>> >> >>>>>>> transaction, and you can move a transaction between threads.
>>> >> >>>>>>> However, you should only be modifying a transaction with one
>>> >> >>>>>>> thread at a time.
>>> >> >>>>>>>
>>> >> >>>>>>> Emmanuel Bernard wrote:
>>> >> >>>>>>>> Could it be that you are not using the same transaction
>>> >> >>>>>>>> between different threads (i.e. you physically start different
>>> >> >>>>>>>> ones, or different "Infinispan contexts")?
>>> >> >>>>>>>> Infinispan guys, do you support transactional operations
>>> >> >>>>>>>> spanning several concurrent threads?
>>> >> >>>>>>>> On 13 Aug 09, at 14:04, Łukasz Moreń wrote:
>>> >> >>>>>>>>> I've tried with the JBoss AS transaction manager and
>>> >> >>>>>>>>> JBossStandaloneTM.
>>> >> >>>>>>>>> The result is the same in all cases - error during merge.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> 2009/8/12, Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >> >>>>>>>>>> OK, I understand better now.
>>> >> >>>>>>>>>> Do your tests in JBoss AS with its decent transaction
>>> >> >>>>>>>>>> manager (Infinispan should have a config for it).
>>> >> >>>>>>>>>> For unit testing, force the indexing process in Hibernate
>>> >> >>>>>>>>>> to use a single thread (I think it's possible; ask Sanne if
>>> >> >>>>>>>>>> you don't know how).
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Exposing some Infinispan configuration makes sense. Can you
>>> >> >>>>>>>>>> start a thread explaining what is configurable and which
>>> >> >>>>>>>>>> options you think we should expose to hsearch users? Ideally
>>> >> >>>>>>>>>> I would like to offer one or two default config scenarios
>>> >> >>>>>>>>>> and allow falling back to a custom config.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Emmanuel
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> On 12 Aug 2009, at 11:58, Łukasz Moreń
>>> >> >>>>>>>>>> <lukasz.moren at gmail.com> wrote:
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>> Sorry, but my wifi does not work well today. I will try to
>>> >> >>>>>>>>>>> explain it more clearly.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> I'm using the DummyTransactionManager available for
>>> >> >>>>>>>>>>> Infinispan. It associates a transaction with the calling
>>> >> >>>>>>>>>>> thread.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Steps to update the index:
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 1. The index writer acquires the lock - the transaction
>>> >> >>>>>>>>>>> begins.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 2. If necessary, the index writer delegates merge work to
>>> >> >>>>>>>>>>> new threads.
>>> >> >>>>>>>>>>> Those merge threads do not see the changes made since the
>>> >> >>>>>>>>>>> beginning of the transaction, and look for segments which
>>> >> >>>>>>>>>>> are not yet in the index.
>>> >> >>>>>>>>>>> The changes become visible only when step 3 is completed.
>>> >> >>>>>>>>>>> As a test, I tried committing the transaction when the
>>> >> >>>>>>>>>>> merge starts, and then everything worked well. But then I
>>> >> >>>>>>>>>>> need to start it again.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 3. The index writer releases the lock - the transaction is
>>> >> >>>>>>>>>>> committed, and all changes made in it become visible to
>>> >> >>>>>>>>>>> other threads.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Maybe using some other transaction manager could help?
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> What about the Infinispan cache configuration? Should some
>>> >> >>>>>>>>>>> configuration mechanism be exposed to the user, or is
>>> >> >>>>>>>>>>> hardcoding one in InfinispanDirectoryProvider enough?
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 2009/8/12 Emmanuel Bernard <emmanuel at hibernate.org>
>>> >> >>>>>>>>>>> why?
>>> >> >>>>>>>>>>> you there?
>>> >> >>>>>>>>>>> OK, please describe in detail what is going on. From what
>>> >> >>>>>>>>>>> you are describing, the tx cannot see all segments, which
>>> >> >>>>>>>>>>> looks like an Infinispan bug to me.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> As a backup, you can try without a transaction and see if
>>> >> >>>>>>>>>>> that works;
>>> >> >>>>>>>>>>> technically the Lucene index should cope with that,
>>> >> >>>>>>>>>>> but I like this approach less.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Let's try to chat by email if I'm not online; I need to
>>> >> >>>>>>>>>>> run some errands today.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>> _______________________________________________
>>> >> >>>>>>>> infinispan-dev mailing list
>>> >> >>>>>>>> infinispan-dev at lists.jboss.org
>>> >> >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> Jason T. Greene
>>> >> >>>>>>> JBoss, a division of Red Hat
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> --
>>> >> >>>>> Jason T. Greene
>>> >> >>>>> JBoss, a division of Red Hat
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>> _______________________________________________
>>> >> >>> hibernate-dev mailing list
>>> >> >>> hibernate-dev at lists.jboss.org
>>> >> >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>> >> >>
>>> >> >>
>>> >> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>>
>>
>>
>>
>>