<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">I think so.<div>But ideally I would like:</div><div> - to add a property that sets the MergeScheduler and expose it</div><div> - to automatically force the setting if the IW is used by Infinispan (not sure how easy that is).</div><div><br></div><div><div><div>On 14 Aug 09, at 13:35, Łukasz Moreń wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Yes, using <span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; ">SerialMergeScheduler helped. Thanks! Should the restriction on the merge scheduler be set in the IndexWriterSetting enum, similarly to e.g. the max_buffered_delete_terms or max_buffered_docs properties?</span><br> <br><div class="gmail_quote">2009/8/14 Emmanuel Bernard <span dir="ltr"><<a href="mailto:emmanuel@hibernate.org">emmanuel@hibernate.org</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> <div style="word-wrap:break-word">OK so let's try something.<div>Lukasz, can you try using the SerialMergeScheduler on the index writer and see what happens?</div><div>It will make indexing slower for Infinispan, but it seems we can't do much better in the short term.</div> <div><br></div><div>If that works, we will put a restriction in place when reading the config: the index writer for the given index will be forced to use the serial strategy.</div><div><div></div><div class="h5"><div> <br><div><div>On 14 Aug 09, at 06:23, Łukasz Moreń wrote:</div><br><blockquote type="cite">The expensive part is replication to all Infinispan nodes. IndexWriter creates segment files, merges them into one compound segment, and deletes descriptor files that are no longer needed: that is many files that must be replicated. 
Some of them don't even need to be replicated, because they are inserted into the directory at the beginning of the index commit process and removed at the end. Batching helps with performance here. <div> Yes, I think IndexWriter works as you described.<br><br><div class="gmail_quote">2009/8/14 Manik Surtani <span dir="ltr"><<a href="mailto:manik@jboss.org" target="_blank">manik@jboss.org</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div style="word-wrap:break-word"><br><div><div><div>On 14 Aug 2009, at 10:17, Łukasz Moreń wrote:</div><br><blockquote type="cite"><div><span style="font-family:arial, sans-serif;font-size:13px;border-collapse:collapse">Yes, but e.g. <span style="background-repeat:initial;background-color:yellow">FSDirectory</span> flushes changes whenever a file descriptor is created or updated, which can happen many times in the lifetime of one <span style="background-repeat:initial;background-color:yellow">IndexWriter</span>.</span></div> <span style="font-family:arial, sans-serif;font-size:13px;border-collapse:collapse"><div>In the <span style="background-repeat:initial;background-color:yellow">Infinispan</span> implementation, I want to commit changes only when the <span style="background-repeat:initial;background-color:yellow">IndexWriter</span> is closing, batching all modifications.</div> <div>If I switch to one transaction per descriptor modification, similar to how it's done in FSDirectory, it works well but is not efficient.</div></span></blockquote><div><br></div></div><div>So what's expensive here? Writing to Infinispan, or the indexing itself? 
Correct me if I am wrong: I assume that the IndexWriter creates multiple threads, and each thread does: {</div> <div><span style="white-space:pre">        </span>// some indexing work</div><div><span style="white-space:pre">        </span>// write these indexes to Infinispan</div><div>}</div><div><br></div><div>Is that correct?</div><div><div></div> <div><br><blockquote type="cite"><br><div class="gmail_quote">2009/8/14 Sanne Grinovero <span dir="ltr"><<a href="mailto:sanne.grinovero@gmail.com" target="_blank">sanne.grinovero@gmail.com</a>></span><br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I am not an expert on this part of Lucene, but it looks to me<br> that the IndexWriter is the "driver/coordinator", and its decisions<br> are affected by a pluggable MergeScheduler; they do stuff on the<br> internal buffers of the IndexWriter (dequeue the pending segments to<br> be written to the index), but it shouldn't matter what exactly they do<br> as the internal state of these classes is unaffected by our<br> transactions.<br> They make decisions about writing segments to the Directory and<br> committing changes ("sync()"): as you implement this Directory you<br> should only have to take care of this class; I don't think the<br> MergeScheduler(s) are relevant: it just happens that the thread going<br> to apply changes to the index might be a different one than the one<br> pushing changes to the IndexWriter.<br> <br> In the Directory implementation you should use transactions to push<br> state changes to the "underlying storage": as FSDirectory is playing<br> with file descriptors and flushes, you do the same with Infinispan<br> transactions.<br> <br> 2009/8/14 Łukasz Moreń <<a href="mailto:lukasz.moren@gmail.com" target="_blank">lukasz.moren@gmail.com</a>>:<br> <div><div></div><div>> Yes, right, MergeSchedulers.<br> ><br> > 2009/8/14 Sanne Grinovero <<a href="mailto:sanne.grinovero@gmail.com" 
target="_blank">sanne.grinovero@gmail.com</a>><br> >><br> >> What are these "other" threads? Are you speaking about the<br> >> MergeSchedulers?<br> >><br> >> 2009/8/13 Łukasz Moreń <<a href="mailto:lukasz.moren@gmail.com" target="_blank">lukasz.moren@gmail.com</a>>:<br> >> > IndexWriter processes an index update, delegates some work to other<br> >> > threads, and waits until they finish. These "other" threads work on<br> >> > data modified in the IndexWriter transaction. So I think if I use one<br> >> > transaction per thread, the "others" would not see data modified by<br> >> > IndexWriter until commit.<br> >> ><br> >> > 2009/8/13, Emmanuel Bernard <<a href="mailto:emmanuel@hibernate.org" target="_blank">emmanuel@hibernate.org</a>>:<br> >> >> Ah, I thought it was using multiple threads because of your mass<br> >> >> indexing. I did not know some threads were spawned specifically for the<br> >> >> Infinispan directory.<br> >> >><br> >> >> On 13 Aug 09, at 17:34, Sanne Grinovero wrote:<br> >> >><br> >> >>> Hi Łukasz,<br> >> >>> What is your usage of these threads? Did you consider using one<br> >> >>> transaction per thread?<br> >> >>><br> >> >>> Sanne<br> >> >>><br> >> >>> 2009/8/13 Łukasz Moreń <<a href="mailto:lukasz.moren@gmail.com" target="_blank">lukasz.moren@gmail.com</a>>:<br> >> >>>> Newly created threads were not associated with any transaction, so I<br> >> >>>> suppose that was the problem. Sharing a transaction between threads<br> >> >>>> seems to be a good solution.<br> >> >>>> Thanks for the help!<br> >> >>>><br> >> >>>> 2009/8/13, Jason T. Greene <<a href="mailto:jason.greene@redhat.com" target="_blank">jason.greene@redhat.com</a>>:<br> >> >>>>> Correct. Also, there could be read races, so if you are going to<br> >> >>>>> share a tx between threads, I would use some shared lock to<br> >> >>>>> guarantee<br> >> >>>>> that only one thread can use it at a time. 
BTW this means you have<br> >> >>>>> to<br> >> >>>>> properly suspend/resume the TX via the TM API as well.<br> >> >>>>><br> >> >>>>> Emmanuel Bernard wrote:<br> >> >>>>>> Modifying a transaction means applying mutations (like SQL INSERT /<br> >> >>>>>> UPDATE / DELETE) to the transactional resource?<br> >> >>>>>><br> >> >>>>>> On 13 Aug 09, at 15:07, Jason T. Greene wrote:<br> >> >>>>>><br> >> >>>>>>> When using transactions, the context is bound to the<br> >> >>>>>>> transaction, and<br> >> >>>>>>> you can move a transaction between threads. However, you should<br> >> >>>>>>> only<br> >> >>>>>>> be modifying a transaction with one thread at a time.<br> >> >>>>>>><br> >> >>>>>>> Emmanuel Bernard wrote:<br> >> >>>>>>>> Could it be that you are not using the same transaction between<br> >> >>>>>>>> different threads (i.e. you physically start different ones or<br> >> >>>>>>>> different "Infinispan contexts")?<br> >> >>>>>>>> Infinispan guys, do you support transactional operations spanning<br> >> >>>>>>>> several<br> >> >>>>>>>> concurrent threads?<br> >> >>>>>>>> On 13 Aug 09, at 14:04, Łukasz Moreń wrote:<br> >> >>>>>>>>> I've tried with the JBoss AS transaction manager and<br> >> >>>>>>>>> JBossStandaloneTM.<br> >> >>>>>>>>> The result is the same in all cases: an error during merge.<br> >> >>>>>>>>><br> >> >>>>>>>>> 2009/8/12, Emmanuel Bernard <<a href="mailto:emmanuel@hibernate.org" target="_blank">emmanuel@hibernate.org</a>>:<br> >> >>>>>>>>>> Ok I understand better now.<br> >> >>>>>>>>>> Do your tests in JBoss AS with its decent transaction manager<br> >> >>>>>>>>>> (Infinispan should have a config for it).<br> >> >>>>>>>>>> For unit testing, force the indexing process in Hibernate to<br> >> >>>>>>>>>> use a<br> >> >>>>>>>>>> single thread (I think it's possible; ask Sanne if you don't<br> >> >>>>>>>>>> know how).<br> >> >>>>>>>>>><br> >> >>>>>>>>>> Exposing some configuration for Infinispan makes sense. 
Can you<br> >> >>>>>>>>>> start a<br> >> >>>>>>>>>> thread explaining what is configurable and which options you think<br> >> >>>>>>>>>> we<br> >> >>>>>>>>>> should expose to hsearch users? Ideally I would like to offer<br> >> >>>>>>>>>> one or<br> >> >>>>>>>>>> two default config scenarios and allow falling back to a custom<br> >> >>>>>>>>>> config.<br> >> >>>>>>>>>><br> >> >>>>>>>>>> Emmanuel<br> >> >>>>>>>>>><br> >> >>>>>>>>>> On 12 Aug 2009, at 11:58, Łukasz Moreń<br> >> >>>>>>>>>> <<a href="mailto:lukasz.moren@gmail.com" target="_blank">lukasz.moren@gmail.com</a>><br> >> >>>>>>>>>> wrote:<br> >> >>>>>>>>>><br> >> >>>>>>>>>>> Sorry, but my wifi does not work well today. I will try to<br> >> >>>>>>>>>>> explain<br> >> >>>>>>>>>>> it more clearly.<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> I'm using the DummyTransactionManager available for Infinispan.<br> >> >>>>>>>>>>> It associates a transaction with the calling thread.<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> Steps to update the index:<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> 1. the index writer acquires the lock - the transaction begins<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> 2. if necessary, the index writer delegates merge work to<br> >> >>>>>>>>>>> new threads.<br> >> >>>>>>>>>>> Those merge threads do not see the changes made since the<br> >> >>>>>>>>>>> beginning of the transaction,<br> >> >>>>>>>>>>> and look for segments which are not yet in the index.<br> >> >>>>>>>>>>> The changes will be visible when step 3 is completed.<br> >> >>>>>>>>>>> For testing, I tried committing the transaction when the merge<br> >> >>>>>>>>>>> starts, and then<br> >> >>>>>>>>>>> everything worked well. But then I need to start it again.<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> 3. 
the index writer releases the lock - the transaction is committed, and all<br> >> >>>>>>>>>>> changes<br> >> >>>>>>>>>>> made in this transaction are visible to other threads.<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> Maybe using some other transaction manager could help?<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> What about the Infinispan cache configuration? Should some configuration<br> >> >>>>>>>>>>> mechanism be exposed to the user,<br> >> >>>>>>>>>>> or is hardcoding one in InfinispanDirectoryProvider<br> >> >>>>>>>>>>> enough?<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>><br> >> >>>>>>>>>>><br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> 2009/8/12 Emmanuel Bernard <<a href="mailto:emmanuel@hibernate.org" target="_blank">emmanuel@hibernate.org</a>><br> >> >>>>>>>>>>> why?<br> >> >>>>>>>>>>> Emmanuel Bernard<br> >> >>>>>>>>>>> Pending<br> >> >>>>>>>>>>> you there?<br> >> >>>>>>>>>>> Emmanuel Bernard<br> >> >>>>>>>>>>> Pending<br> >> >>>>>>>>>>> Ok please describe in detail what is going on. From what<br> >> >>>>>>>>>>> you are<br> >> >>>>>>>>>>> describing, the tx cannot see all segments, which looks like an<br> >> >>>>>>>>>>> Infinispan bug to me.<br> >> >>>>>>>>>>> Pending<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> As a backup you can try without a transaction and see if that works<br> >> >>>>>>>>>>> Emmanuel Bernard<br> >> >>>>>>>>>>> Pending<br> >> >>>>>>>>>>> technically the Lucene index should cope with that<br> >> >>>>>>>>>>> Emmanuel Bernard<br> >> >>>>>>>>>>> 11:16<br> >> >>>>>>>>>>> but I like this approach less<br> >> >>>>>>>>>>><br> >> >>>>>>>>>>><br> >> >>>>>>>>>>><br> >> >>>>>>>>>>> Let's try and chat by email if I'm not online, I need to run<br> >> >>>>>>>>>>> some<br> >> >>>>>>>>>>> errands today.<br> >> >>>>>>>>>>><br> >> >>>>>>>> _______________________________________________<br> >> >>>>>>>> infinispan-dev mailing list<br> >> >>>>>>>> <a href="mailto:infinispan-dev@lists.jboss.org" target="_blank">infinispan-dev@lists.jboss.org</a><br> >> >>>>>>>> <a 
href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a><br> >> >>>>>>><br> >> >>>>>>><br> >> >>>>>>> --<br> >> >>>>>>> Jason T. Greene<br> >> >>>>>>> JBoss, a division of Red Hat<br> >> >>>>>><br> >> >>>>><br> >> >>>>><br> >> >>>>> --<br> >> >>>>> Jason T. Greene<br> >> >>>>> JBoss, a division of Red Hat<br> >> >>>>><br> >> >>>><br> >> >>>> _______________________________________________<br> >> >>>> infinispan-dev mailing list<br> >> >>>> <a href="mailto:infinispan-dev@lists.jboss.org" target="_blank">infinispan-dev@lists.jboss.org</a><br> >> >>>> <a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a><br> >> >>><br> >> >>> _______________________________________________<br> >> >>> hibernate-dev mailing list<br> >> >>> <a href="mailto:hibernate-dev@lists.jboss.org" target="_blank">hibernate-dev@lists.jboss.org</a><br> >> >>> <a href="https://lists.jboss.org/mailman/listinfo/hibernate-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/hibernate-dev</a><br> >> >><br> >> >><br> >> ><br> ><br> ><br> </div></div></blockquote></div><br> _______________________________________________<br>infinispan-dev mailing list<br><a href="mailto:infinispan-dev@lists.jboss.org" target="_blank">infinispan-dev@lists.jboss.org</a><br> <a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a></blockquote></div></div></div><br><div> <span style="border-collapse:separate;color:rgb(0, 0, 0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div style="word-wrap:break-word"> <span style="border-collapse:separate;color:rgb(0, 0, 
0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div style="word-wrap:break-word"> <div>--</div><div><div>Manik Surtani</div><div><a href="mailto:manik@jboss.org" target="_blank">manik@jboss.org</a></div><div>Lead, Infinispan</div><div>Lead, JBoss Cache</div><div><a href="http://www.infinispan.org" target="_blank">http://www.infinispan.org</a></div> <div><a href="http://www.jbosscache.org" target="_blank">http://www.jbosscache.org</a></div><div><br></div></div></div></span><br></div></span><br> </div><br></div></blockquote></div><br></div> _______________________________________________<br> infinispan-dev mailing list<br><a href="mailto:infinispan-dev@lists.jboss.org" target="_blank">infinispan-dev@lists.jboss.org</a><br><a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a></blockquote> </div><br></div></div></div></div></blockquote></div><br></blockquote></div><br></div></body></html>