[infinispan-dev] [hibernate-dev] Infinispan tx, config and multithreading

Manik Surtani manik at jboss.org
Fri Aug 14 06:32:06 EDT 2009


On 14 Aug 2009, at 11:23, Łukasz Moreń wrote:

> The expensive part is replication to all Infinispan nodes. IndexWriter
> creates segment files, merges them into one compound segment, and deletes
> descriptor files that are no longer needed - many files that must be
> replicated. Some of them don't even need to be replicated, because
> they are inserted into the directory at the beginning of the index commit
> process and removed at the end. Batching helps with performance here.
> Yes, I think IndexWriter works the way you described.

Then perhaps some of the cost can be mitigated by designing the
IndexWriter threads to do: {
	// some indexing work
	// write index fragment to a (concurrent) queue
}

Then you have a single thread that polls the queue and writes the
fragments to Infinispan. This single thread can then use either a batch
or a transaction to scope the changes (but if you use a batch or
transaction, the writer thread would need to know when to end it).
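
A minimal Java sketch of the single-writer idea above, under stated assumptions. The names Fragment, MapStore, END, drain and run are hypothetical and are not Lucene or Infinispan types. MapStore is a plain in-memory map standing in for the cache, and the comments in writeBatch mark where a real batch or transaction would begin and end. One possible answer to "when to end the batch" is shown: the writer ends a batch whenever the queue runs dry, and a sentinel tells it when to stop altogether.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueWriterSketch {

    /** Hypothetical unit of work produced by an indexing thread. */
    static final class Fragment {
        final String name;
        final byte[] bytes;
        Fragment(String name, byte[] bytes) { this.name = name; this.bytes = bytes; }
    }

    /** Sentinel that tells the writer no more fragments will arrive. */
    static final Fragment END = new Fragment("__end__", new byte[0]);

    /** Hypothetical stand-in for the cache; not an Infinispan type. */
    static final class MapStore {
        final Map<String, byte[]> data = new ConcurrentHashMap<>();
        int batches = 0;

        void writeBatch(List<Fragment> batch) {
            // A real store would start a batch or transaction here ...
            for (Fragment f : batch) {
                data.put(f.name, f.bytes);
            }
            // ... and end or commit it here, so one drain could be scoped as one unit.
            batches++;
        }
    }

    /** Single writer loop: block for one fragment, then drain whatever else is queued. */
    static void drain(BlockingQueue<Fragment> queue, MapStore store) throws InterruptedException {
        List<Fragment> batch = new ArrayList<>();
        boolean done = false;
        while (!done) {
            batch.add(queue.take());
            queue.drainTo(batch);
            done = batch.remove(END); // identity match: Fragment does not override equals
            if (!batch.isEmpty()) {
                store.writeBatch(batch);
            }
            batch.clear();
        }
    }

    /** Starts the writer and the producers, then signals END once all producers are done. */
    static MapStore run(int producers, int fragmentsPerProducer) throws InterruptedException {
        BlockingQueue<Fragment> queue = new LinkedBlockingQueue<>();
        MapStore store = new MapStore();
        Thread writer = new Thread(() -> {
            try {
                drain(queue, store);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        List<Thread> workers = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < fragmentsPerProducer; i++) {
                    // "some indexing work" would happen here
                    queue.add(new Fragment("seg-" + id + "-" + i, new byte[] {(byte) i}));
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
        queue.add(END); // enqueued only after every producer has finished
        writer.join();
        return store;
    }

    public static void main(String[] args) throws InterruptedException {
        MapStore store = run(3, 5);
        System.out.println(store.data.size() + " fragments in " + store.batches + " batch(es)");
    }
}
```

The number of batches depends on thread timing, so only the fragment count is deterministic. Because a single thread touches the store, any batch or transaction stays bound to that one thread.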



>
> 2009/8/14 Manik Surtani <manik at jboss.org>
>
> On 14 Aug 2009, at 10:17, Łukasz Moreń wrote:
>
>> Yes, but e.g. FSDirectory flushes changes whenever a file descriptor is
>> created/updated - there can be many in one IndexWriter lifetime.
>> In the Infinispan implementation, I want to commit changes only
>> when the IndexWriter is closing - batching all modifications.
>> If I switch to a transaction per descriptor modification - similar to
>> how it's done in FSDirectory - it works well, but it is not efficient.
>
> So what's expensive here?  Writing to Infinispan, or the indexing  
> itself?  Correct me if I am wrong, I assume that the IndexWriter  
> creates multiple threads, and each thread does: {
> 	// some indexing work
> 	// write these indexes to Infinispan
> }
>
> Is that correct?
>
>>
>> 2009/8/14 Sanne Grinovero <sanne.grinovero at gmail.com>
>> I am not an expert on this part of Lucene, but it looks to me like
>> the IndexWriter is the "driver/coordinator", and its decisions
>> are affected by a pluggable MergeScheduler; they do stuff on the
>> internal buffers of the IndexWriter (dequeue the pending segments to
>> be written to the index), but it shouldn't matter what exactly they do,
>> as the internal state of these classes is unaffected by our
>> transactions.
>> They make some decisions about writing segments to the Directory and
>> committing changes ("sync()"): as you implement this Directory you
>> should only have to take care of this class. I don't think the
>> MergeScheduler(s) are relevant: it just happens that the thread
>> applying changes to the index might be a different one than the one
>> pushing changes to the IndexWriter.
>>
>> In the Directory implementation you should use transactions to push
>> state changes to the "underlying storage": as FSDirectory is playing
>> with file descriptors and flushes, you do the same with Infinispan
>> transactions.
>>
>> 2009/8/14 Łukasz Moreń <lukasz.moren at gmail.com>:
>> > Yes, right, MergeSchedulers.
>> >
>> > 2009/8/14 Sanne Grinovero <sanne.grinovero at gmail.com>
>> >>
>> >> what are these "other" threads? Are you speaking about the
>> >> MergeSchedulers?
>> >>
>> >> 2009/8/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>> >> > IndexWriter processes the index update, delegates some work to
>> >> > other threads, and waits until they finish. These "other" threads
>> >> > work on data modified in the IndexWriter transaction. So I think
>> >> > if I use a transaction per thread, the "others" would not see data
>> >> > modified by IndexWriter until commit.
>> >> >
>> >> > 2009/8/13, Emmanuel Bernard <emmanuel at hibernate.org>:
>> >> >> Ah I thought it was using multiple threads because of your mass
>> >> >> indexing. I did not know some threads were spawned specifically
>> >> >> for the Infinispan directory.
>> >> >>
>> >> >> On 13 août 09, at 17:34, Sanne Grinovero wrote:
>> >> >>
>> >> >>> Hi Łukasz,
>> >> >>> what is your usage of these threads? Did you consider using one
>> >> >>> transaction per thread?
>> >> >>>
>> >> >>> Sanne
>> >> >>>
>> >> >>> 2009/8/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>> >> >>>> Newly created threads were not associated with any transaction,
>> >> >>>> so I suppose that was the problem. Sharing the transaction between
>> >> >>>> threads seems to be a good solution.
>> >> >>>> Thanks for the help!
>> >> >>>>
>> >> >>>> 2009/8/13, Jason T. Greene <jason.greene at redhat.com>:
>> >> >>>>> Correct. Also there could be read races as well, so if you are
>> >> >>>>> going to share a tx between threads, I would use some shared lock
>> >> >>>>> to guarantee that only one thread can use it at a time. BTW this
>> >> >>>>> means you have to properly suspend/resume the TX via the TM API
>> >> >>>>> as well.
>> >> >>>>>
>> >> >>>>> Emmanuel Bernard wrote:
>> >> >>>>>> Modifying a transaction means applying mutations (like SQL
>> >> >>>>>> INSERT / UPDATE / DELETE) to the transactional resource?
>> >> >>>>>>
>> >> >>>>>> On 13 août 09, at 15:07, Jason T. Greene wrote:
>> >> >>>>>>
>> >> >>>>>>> When using transactions, the context is bound to the
>> >> >>>>>>> transaction, and you can move a transaction between threads.
>> >> >>>>>>> However, you should only be modifying a transaction with one
>> >> >>>>>>> thread at a time.
>> >> >>>>>>>
>> >> >>>>>>> Emmanuel Bernard wrote:
>> >> >>>>>>>> Could it be that you are not using the same transaction
>> >> >>>>>>>> between different threads (i.e. you physically start different
>> >> >>>>>>>> ones or different "Infinispan contexts")?
>> >> >>>>>>>> Infini guys, do you support transactional operations spanning
>> >> >>>>>>>> several concurrent threads?
>> >> >>>>>>>> On 13 août 09, at 14:04, Łukasz Moreń wrote:
>> >> >>>>>>>>> I've tried with the JBoss AS transaction manager and
>> >> >>>>>>>>> JBossStandaloneTM.
>> >> >>>>>>>>> The result is the same in all cases - an error during merge.
>> >> >>>>>>>>>
>> >> >>>>>>>>> 2009/8/12, Emmanuel Bernard <emmanuel at hibernate.org>:
>> >> >>>>>>>>>> Ok I understand better now.
>> >> >>>>>>>>>> Do your tests in JBoss AS with its decent transaction manager
>> >> >>>>>>>>>> (infinispan should have a config for it).
>> >> >>>>>>>>>> For unit testing, force the indexing process in hibernate to
>> >> >>>>>>>>>> use a single thread (I think it's possible; ask Sanne if you
>> >> >>>>>>>>>> don't know how).
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Exposing some configuration to infinispan makes sense. Can you
>> >> >>>>>>>>>> start a thread explaining what is configurable and which
>> >> >>>>>>>>>> options you think we should expose to hsearch users? Ideally I
>> >> >>>>>>>>>> would like to offer one or two default config scenarios and
>> >> >>>>>>>>>> allow falling back to a custom config.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Emmanuel
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On 12 août 2009, at 11:58, Łukasz Moreń
>> >> >>>>>>>>>> <lukasz.moren at gmail.com>
>> >> >>>>>>>>>> wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Sorry, but my wifi does not work well today. I will try to
>> >> >>>>>>>>>>> explain it more clearly.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> I'm using the DummyTransactionManager available for Infinispan.
>> >> >>>>>>>>>>> It associates a transaction with the calling thread.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Steps to update index:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 1. index writer acquires lock - beginning of transaction
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 2. if necessary, index writer delegates merge work to new
>> >> >>>>>>>>>>> threads.
>> >> >>>>>>>>>>> Those merge threads do not see the changes made since the
>> >> >>>>>>>>>>> beginning of the transaction, and are looking for segments
>> >> >>>>>>>>>>> which are not yet in the index.
>> >> >>>>>>>>>>> Changes will be visible when step 3 is completed.
>> >> >>>>>>>>>>> For testing, I tried committing the transaction when the merge
>> >> >>>>>>>>>>> starts, and then everything worked well. But then I need to
>> >> >>>>>>>>>>> start it again.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 3. index writer releases lock - transaction is committed, all
>> >> >>>>>>>>>>> changes made in this transaction are visible to other threads.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Maybe using some other transaction manager could help?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> What about Infinispan cache configuration? Should some
>> >> >>>>>>>>>>> configuration mechanism be exposed to the user, or is
>> >> >>>>>>>>>>> hardcoding one in InfinispanDirectoryProvider enough?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 2009/8/12 Emmanuel Bernard <emmanuel at hibernate.org>
>> >> >>>>>>>>>>> why?
>> >> >>>>>>>>>>> Emmanuel Bernard
>> >> >>>>>>>>>>> Pending
>> >> >>>>>>>>>>> you there?
>> >> >>>>>>>>>>> Emmanuel Bernard
>> >> >>>>>>>>>>> Pending
>> >> >>>>>>>>>>> Ok please describe in detail what is going on. From what
>> >> >>>>>>>>>>> you are describing the tx cannot see all segments, which
>> >> >>>>>>>>>>> looks like an infinispan bug to me.
>> >> >>>>>>>>>>> Pending
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> As a backup you can try without a transaction and see if
>> >> >>>>>>>>>>> that works
>> >> >>>>>>>>>>> Emmanuel Bernard
>> >> >>>>>>>>>>> Pending
>> >> >>>>>>>>>>> technically the lucene index should cope with that
>> >> >>>>>>>>>>> Emmanuel Bernard
>> >> >>>>>>>>>>> 11:16
>> >> >>>>>>>>>>> but I like this approach less
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Let's try and chat by email if I'm not online, I need to run
>> >> >>>>>>>>>>> some errands today.
>> >> >>>>>>>>>>>
>> >> >>>>>>>> _______________________________________________
>> >> >>>>>>>> infinispan-dev mailing list
>> >> >>>>>>>> infinispan-dev at lists.jboss.org
>> >> >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> Jason T. Greene
>> >> >>>>>>> JBoss, a division of Red Hat
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Jason T. Greene
>> >> >>>>> JBoss, a division of Red Hat
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>> _______________________________________________
>> >> >>> hibernate-dev mailing list
>> >> >>> hibernate-dev at lists.jboss.org
>> >> >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>>
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
>

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org



