[hibernate-dev] [infinispan-dev] Infinispan tx, config and multithreading
Emmanuel Bernard
emmanuel at hibernate.org
Fri Aug 14 13:42:59 EDT 2009
I think so.
But ideally I would like:
- to add a property that sets the MergeScheduler, and expose it
- to force that setting automatically if the IndexWriter is used by
  Infinispan (not sure how easy that is).
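
To illustrate why the serial scheduler helps, here is a toy sketch in plain Java. It is not Lucene or Infinispan code: ThreadBoundTxStore, MergeVisibilityDemo and the file name are made-up names, and the store only assumes what was described in this thread, namely that uncommitted changes are bound to the calling thread. Under that assumption a merge run in the calling thread sees the pending segment, while a merge run on a new thread does not.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store: uncommitted writes are visible only to the thread that made them. */
class ThreadBoundTxStore {
    private final Set<String> committed = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<Set<String>> pending = ThreadLocal.withInitial(HashSet::new);

    /** Buffers a write in the calling thread's "transaction". */
    void write(String file) {
        pending.get().add(file);
    }

    /** A file exists if it is committed or pending in the calling thread. */
    boolean exists(String file) {
        return committed.contains(file) || pending.get().contains(file);
    }

    /** Publishes the calling thread's pending writes to everyone. */
    void commit() {
        committed.addAll(pending.get());
        pending.get().clear();
    }
}

class MergeVisibilityDemo {
    /** Returns whether the merge step could see a segment written earlier in the same transaction. */
    static boolean mergeSeesSegment(boolean serial) {
        ThreadBoundTxStore store = new ThreadBoundTxStore();
        store.write("_1.cfs"); // the writer thread creates a segment, not yet committed
        boolean[] seen = new boolean[1];
        Runnable merge = () -> seen[0] = store.exists("_1.cfs");
        if (serial) {
            merge.run(); // merge in the calling thread, as a serial scheduler would
        } else {
            Thread t = new Thread(merge); // merge on a separate thread
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        store.commit();
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("serial merge sees segment: " + mergeSeesSegment(true));
        System.out.println("threaded merge sees segment: " + mergeSeesSegment(false));
    }
}
```

The price is the one already mentioned: merges no longer overlap with indexing, so indexing gets slower.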
On 14 Aug 09, at 13:35, Łukasz Moreń wrote:
> Yes, using SerialMergeScheduler helped. Thanks! Should the restriction
> on the merge scheduler be set in the IndexWriterSetting enum, similarly
> to e.g. the max_buffered_delete_terms or max_buffered_docs properties?
>
> 2009/8/14 Emmanuel Bernard <emmanuel at hibernate.org>
> OK so let's try something.
> Lukasz, can you try using the SerialMergeScheduler policy on the
> index writer and see what happens?
> It will make indexing slower for Infinispan, but it seems we can't do
> much better in the short term.
>
> If that works, we will put a restriction in place when reading the
> config: the index writer for this given index will be forced to use
> the serial strategy.
>
> On 14 Aug 09, at 06:23, Łukasz Moreń wrote:
>
>> What is expensive is the replication to all Infinispan nodes.
>> IndexWriter creates segment files, merges them into one compound
>> segment, and deletes descriptor files that are no longer needed:
>> that is many files that must be replicated. Some of them don't even
>> need to be replicated, because they are inserted into the directory
>> at the beginning of the index commit process and removed at the end.
>> Batching helps with performance here.
>> Yes, I think IndexWriter works like you wrote.
>>
>> 2009/8/14 Manik Surtani <manik at jboss.org>
>>
>> On 14 Aug 2009, at 10:17, Łukasz Moreń wrote:
>>
>>> Yes, but e.g. FSDirectory flushes changes whenever any file
>>> descriptor is created/updated, and there can be many in one
>>> IndexWriter's lifetime.
>>> In the Infinispan implementation, I want to commit changes only
>>> when the IndexWriter is closing, i.e. batch all modifications.
>>> If I switch to one transaction per descriptor modification, similar
>>> to how it's done in FSDirectory, it works well but is not efficient.
>>
>> So what's expensive here? Writing to Infinispan, or the indexing
>> itself? Correct me if I am wrong, I assume that the IndexWriter
>> creates multiple threads, and each thread does: {
>> // some indexing work
>> // write these indexes to Infinispan
>> }
>>
>> Is that correct?
>>
>>>
>>> 2009/8/14 Sanne Grinovero <sanne.grinovero at gmail.com>
>>> I am not an expert on this part of Lucene, but it looks to me like
>>> the IndexWriter is the "driver/coordinator", and its decisions
>>> are affected by a pluggable MergeScheduler; they do stuff on the
>>> internal buffers of the IndexWriter (dequeue the pending segments to
>>> be written to the index), but it shouldn't matter what exactly they
>>> do, as the internal state of these classes is unaffected by our
>>> transactions.
>>> They take decisions about writing segments to the Directory and
>>> committing changes ("sync()"): as you implement this Directory you
>>> should only have to take care of this class. I don't think the
>>> MergeScheduler(s) are relevant: it just happens that the thread
>>> applying changes to the index might be a different one than the one
>>> pushing changes to the IndexWriter.
>>>
>>> In the Directory implementation you should use transactions to push
>>> state changes to the "underlying storage": as FSDirectory is playing
>>> with file descriptors and flushes, you do the same with Infinispan
>>> transactions.
>>>
>>> 2009/8/14 Łukasz Moreń <lukasz.moren at gmail.com>:
>>> > Yes, right, MergeSchedulers.
>>> >
>>> > 2009/8/14 Sanne Grinovero <sanne.grinovero at gmail.com>
>>> >>
>>> >> what are these "other" threads? Are you speaking about the
>>> >> MergeSchedulers?
>>> >>
>>> >> 2009/8/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>>> >> > IndexWriter processes an index update, delegates some work to
>>> >> > other threads, and waits until they finish. These "other"
>>> >> > threads work on data modified in the IndexWriter transaction.
>>> >> > So I think if I use a transaction per thread, the "others"
>>> >> > would not see data modified by IndexWriter until commit.
>>> >> >
>>> >> > 2009/8/13, Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >> >> Ah I thought it was using multiple threads because of your
>>> >> >> mass indexing. I did not know some threads were spawned
>>> >> >> specifically for the Infinispan directory.
>>> >> >>
>>> >> >> On 13 Aug 09, at 17:34, Sanne Grinovero wrote:
>>> >> >>
>>> >> >>> Hi Łukasz,
>>> >> >>> what is your usage of these threads? Did you consider using
>>> >> >>> one transaction per thread?
>>> >> >>>
>>> >> >>> Sanne
>>> >> >>>
>>> >> >>> 2009/8/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>>> >> >>>> Newly created threads were not associated with any
>>> >> >>>> transaction, so I suppose that was the problem. Sharing a
>>> >> >>>> transaction between threads seems to be a good solution.
>>> >> >>>> Thanks for the help!
>>> >> >>>>
>>> >> >>>> 2009/8/13, Jason T. Greene <jason.greene at redhat.com>:
>>> >> >>>>> Correct. Also there could be read races as well, so if you
>>> >> >>>>> are going to share a tx between threads, I would use some
>>> >> >>>>> shared lock to guarantee that only one thread can use it at
>>> >> >>>>> a time. BTW this means you have to properly suspend/resume
>>> >> >>>>> the TX via the TM API as well.
>>> >> >>>>>
>>> >> >>>>> Emmanuel Bernard wrote:
>>> >> >>>>>> Modifying a transaction means applying mutations (like SQL
>>> >> >>>>>> INSERT / UPDATE / DELETE) to the transactional resource?
>>> >> >>>>>>
>>> >> >>>>>> On 13 Aug 09, at 15:07, Jason T. Greene wrote:
>>> >> >>>>>>
>>> >> >>>>>>> When using transactions, the context is bound to the
>>> >> >>>>>>> transaction, and you can move a transaction between
>>> >> >>>>>>> threads. However, you should only be modifying a
>>> >> >>>>>>> transaction with one thread at a time.
>>> >> >>>>>>>
>>> >> >>>>>>> Emmanuel Bernard wrote:
>>> >> >>>>>>>> Could it be that you are not using the same transaction
>>> >> >>>>>>>> between different threads (i.e. you physically start
>>> >> >>>>>>>> different ones or different "Infinispan contexts")?
>>> >> >>>>>>>> Infinispan guys, do you support transactional operations
>>> >> >>>>>>>> spanning several concurrent threads?
>>> >> >>>>>>>> On 13 Aug 09, at 14:04, Łukasz Moreń wrote:
>>> >> >>>>>>>>> I've tried with the JBoss AS transaction manager and
>>> >> >>>>>>>>> JBossStandaloneTM.
>>> >> >>>>>>>>> The result is the same in all cases: an error during
>>> >> >>>>>>>>> merge.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> 2009/8/12, Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >> >>>>>>>>>> Ok I understand better now.
>>> >> >>>>>>>>>> Do your tests in JBoss AS with its decent transaction
>>> >> >>>>>>>>>> manager (Infinispan should have a config for it).
>>> >> >>>>>>>>>> For unit testing, force the indexing process in
>>> >> >>>>>>>>>> Hibernate to use a single thread (I think it's
>>> >> >>>>>>>>>> possible; ask Sanne if you don't know how).
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Exposing some configuration to Infinispan makes sense.
>>> >> >>>>>>>>>> Can you start a thread explaining what is configurable
>>> >> >>>>>>>>>> and which options you think we should expose to
>>> >> >>>>>>>>>> hsearch users? Ideally I would like to offer one or
>>> >> >>>>>>>>>> two default config scenarios and allow falling back to
>>> >> >>>>>>>>>> a custom config.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Emmanuel
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> On 12 Aug 2009, at 11:58, Łukasz Moreń
>>> >> >>>>>>>>>> <lukasz.moren at gmail.com>
>>> >> >>>>>>>>>> wrote:
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>> Sorry, but my wifi does not work well today. I will
>>> >> >>>>>>>>>>> try to explain it more clearly.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> I'm using the DummyTransactionManager available for
>>> >> >>>>>>>>>>> Infinispan. It associates a transaction with the
>>> >> >>>>>>>>>>> calling thread.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Steps to update index:
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 1. The index writer acquires the lock: the transaction
>>> >> >>>>>>>>>>> begins.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 2. If necessary, the index writer delegates merge work
>>> >> >>>>>>>>>>> to new threads.
>>> >> >>>>>>>>>>> Those merge threads do not see the changes made since
>>> >> >>>>>>>>>>> the beginning of the transaction, and are looking for
>>> >> >>>>>>>>>>> segments which are not yet in the index.
>>> >> >>>>>>>>>>> The changes will be visible when step 3 is completed.
>>> >> >>>>>>>>>>> For tests I tried to commit the transaction when the
>>> >> >>>>>>>>>>> merge starts, and then everything worked well. But
>>> >> >>>>>>>>>>> then I need to start it again.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 3. The index writer releases the lock: the transaction
>>> >> >>>>>>>>>>> is committed, and all changes made in this transaction
>>> >> >>>>>>>>>>> are visible to other threads.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Maybe using some other transaction manager could help?
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> What about the Infinispan cache configuration? Should
>>> >> >>>>>>>>>>> some configuration mechanism be exposed to the user,
>>> >> >>>>>>>>>>> or is hardcoding one in InfinispanDirectoryProvider
>>> >> >>>>>>>>>>> enough?
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> 2009/8/12 Emmanuel Bernard <emmanuel at hibernate.org>
>>> >> >>>>>>>>>>> why?
>>> >> >>>>>>>>>>> you there?
>>> >> >>>>>>>>>>> Ok, please describe in detail what is going on. From
>>> >> >>>>>>>>>>> what you are describing, the tx cannot see all
>>> >> >>>>>>>>>>> segments, which looks like an Infinispan bug to me.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> As a backup you can try without a transaction and see
>>> >> >>>>>>>>>>> if that works.
>>> >> >>>>>>>>>>> Technically the Lucene index should cope with that,
>>> >> >>>>>>>>>>> but I like this approach less.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Let's try and chat by email if I'm not online; I need
>>> >> >>>>>>>>>>> to run some errands today.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>> _______________________________________________
>>> >> >>>>>>>> infinispan-dev mailing list
>>> >> >>>>>>>> infinispan-dev at lists.jboss.org
>>> >> >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> Jason T. Greene
>>> >> >>>>>>> JBoss, a division of Red Hat
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>> _______________________________________________
>>> >> >>> hibernate-dev mailing list
>>> >> >>> hibernate-dev at lists.jboss.org
>>> >> >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>> >> >>
>>> >> >>
>>> >> >
>>> >
>>> >
>>>
>>
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>>
>>
>>
>>
>>
>
>