On 14 Aug 2009, at 11:23, Łukasz Moreń wrote:
> What is expensive is replication to all Infinispan nodes. IndexWriter
> creates segment files, merges them into one compound segment, and
> deletes the now-useless descriptor files - many files that must be
> replicated. Some of them don't even need to be replicated, because
> they are inserted into the directory at the beginning of the index
> commit process and removed at the end. Batching helps with
> performance here.
> Yes, I think IndexWriter works like you wrote.
Then perhaps some of the cost can be mitigated by designing the
IndexWriter threads to do : {
// some indexing work
// write index fragment to a (concurrent) queue
}
Then you have a single thread that polls the queue and writes this
stuff to Infinispan. This single thread can then use either a batch
or a transaction to scope the changes (but if you use a batch or
transaction, the writer thread would need to know when to end it).
It won't work very well I think because an IndexWriter needs to read
and write the index. That means we have to wait for the queue to be
empty before doing a read.
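
To make the trade-off above concrete, here is a minimal sketch of the queue idea, including the "wait for the queue to be empty before a read" cost. Everything in it is an assumption for illustration: the class name QueuedIndexStore is hypothetical, and a plain ConcurrentHashMap stands in for the Infinispan cache. It is not how the Infinispan Directory is or should be implemented, and it leaves out the batch/transaction scoping entirely.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: indexing threads enqueue fragments, one thread applies them. */
final class QueuedIndexStore {

    /** A queued operation: either a fragment to store, or a barrier a reader waits on. */
    private static final class Op {
        final String name;
        final byte[] data;
        final CountDownLatch barrier;

        Op(String name, byte[] data, CountDownLatch barrier) {
            this.name = name;
            this.data = data;
            this.barrier = barrier;
        }
    }

    private final BlockingQueue<Op> queue = new LinkedBlockingQueue<>();
    // Stand-in for the Infinispan cache (assumption, not the real API).
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    QueuedIndexStore() {
        Thread writer = new Thread(this::drain, "index-store-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called by indexing threads; returns without waiting for the store. */
    public void write(String name, byte[] data) {
        queue.add(new Op(name, data.clone(), null));
    }

    /** A read must observe earlier writes, so it waits until the writer reaches a barrier. */
    public byte[] read(String name) {
        CountDownLatch barrier = new CountDownLatch(1);
        queue.add(new Op(null, null, barrier));
        try {
            barrier.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for pending writes", e);
        }
        return store.get(name);
    }

    /** The single writer loop: applies queued operations in FIFO order. */
    private void drain() {
        try {
            while (true) {
                Op op = queue.take();
                if (op.barrier != null) {
                    op.barrier.countDown(); // everything queued before this point is stored
                } else {
                    store.put(op.name, op.data); // a real version would group these in a batch/tx
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch every read() blocks until all previously queued writes have been applied, which is exactly the stall described above: the more often IndexWriter reads back what it just wrote, the less the queue buys.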
>
> 2009/8/14 Manik Surtani <manik(a)jboss.org>
>
> On 14 Aug 2009, at 10:17, Łukasz Moreń wrote:
>
>> Yes, but e.g. FSDirectory flushes changes whenever a file descriptor
>> is created or updated - there can be many in one IndexWriter lifetime.
>> In the Infinispan implementation, I want to commit changes only
>> when the IndexWriter is closing - batching all modifications.
>> If I switch to a transaction per descriptor modification, similar
>> to how it's done in FSDirectory, it works well but is not efficient.
>
> So what's expensive here? Writing to Infinispan, or the indexing
> itself? Correct me if I am wrong, I assume that the IndexWriter
> creates multiple threads, and each thread does: {
> // some indexing work
> // write these indexes to Infinispan
> }
>
> Is that correct?
>
>>
>> 2009/8/14 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>> I am not an expert on this part of Lucene, but it looks to me like
>> the IndexWriter is the "driver/coordinator", and its decisions
>> are affected by a pluggable MergeScheduler; they do stuff on the
>> internal buffers of the IndexWriter (dequeue the pending segments to
>> be written to the index), but it shouldn't matter what exactly they
>> do, as the internal state of these classes is unaffected by our
>> transactions.
>> They take some decisions about writing segments to the Directory and
>> committing changes ("sync()"): as you implement this Directory you
>> should only have to take care of this class. I don't think the
>> MergeScheduler(s) are relevant: it just happens that the thread going
>> to apply changes to the index might be a different one than the one
>> pushing changes to the IndexWriter.
>>
>> In the Directory implementation you should use transactions to push
>> state changes to the "underlying storage": as FSDirectory is playing
>> with file descriptors and flushes, you do the same with Infinispan
>> transactions.
>>
>> 2009/8/14 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>> > Yes, right, MergeSchedulers.
>> >
>> > 2009/8/14 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>> >>
>> >> what are these "other" threads? Are you speaking about the
>> >> MergeSchedulers?
>> >>
>> >> 2009/8/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>> >> > IndexWriter processes the index update, delegates some work to
>> >> > other threads and waits until they finish. These "other" threads
>> >> > work on data modified in the IndexWriter transaction. So I think
>> >> > if I use a transaction per thread, the "others" would not see
>> >> > data modified by IndexWriter until commit.
>> >> >
>> >> > 2009/8/13, Emmanuel Bernard <emmanuel(a)hibernate.org>:
>> >> >> Ah I thought it was using multiple threads because of your mass
>> >> >> indexing. I did not know some threads were spawned specifically
>> >> >> for the Infinispan directory.
>> >> >>
>> >> >> On 13 Aug 09, at 17:34, Sanne Grinovero wrote:
>> >> >>
>> >> >>> Hi Łukasz,
>> >> >>> what is your usage of these threads? did you consider using one
>> >> >>> transaction per thread?
>> >> >>>
>> >> >>> Sanne
>> >> >>>
>> >> >>> 2009/8/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>> >> >>>> Newly created threads were not associated with any transaction,
>> >> >>>> so I suppose that was the problem. Sharing a transaction between
>> >> >>>> threads seems to be a good solution.
>> >> >>>> Thanks for the help!
>> >> >>>>
>> >> >>>> 2009/8/13, Jason T. Greene <jason.greene(a)redhat.com>:
>> >> >>>>> Correct. Also there could be read races as well, so if you are
>> >> >>>>> going to share a tx between threads, I would use some shared
>> >> >>>>> lock to guarantee that only one thread can use it at a time.
>> >> >>>>> BTW this means you have to properly suspend/resume the TX via
>> >> >>>>> the TM API as well.
>> >> >>>>>
>> >> >>>>> Emmanuel Bernard wrote:
>> >> >>>>>> Modifying a transaction means applying mutations (like SQL
>> >> >>>>>> INSERT / UPDATE / DELETE) to the transactional resource?
>> >> >>>>>>
>> >> >>>>>> On 13 Aug 09, at 15:07, Jason T. Greene wrote:
>> >> >>>>>>
>> >> >>>>>>> When using transactions, the context is bound to the
>> >> >>>>>>> transaction, and you can move a transaction between threads.
>> >> >>>>>>> However, you should only be modifying a transaction with one
>> >> >>>>>>> thread at a time.
>> >> >>>>>>>
>> >> >>>>>>> Emmanuel Bernard wrote:
>> >> >>>>>>>> Could it be that you are not using the same transaction
>> >> >>>>>>>> between different threads (i.e. you physically start
>> >> >>>>>>>> different ones or different "Infinispan contexts")?
>> >> >>>>>>>> Infini guys, do you support transactional operations
>> >> >>>>>>>> spanning several concurrent threads?
>> >> >>>>>>>> On 13 Aug 09, at 14:04, Łukasz Moreń wrote:
>> >> >>>>>>>>> I've tried with the JBoss AS transaction manager and
>> >> >>>>>>>>> JBossStandaloneTM.
>> >> >>>>>>>>> The result is the same in all cases - error during merge.
>> >> >>>>>>>>>
>> >> >>>>>>>>> 2009/8/12, Emmanuel Bernard <emmanuel(a)hibernate.org>:
>> >> >>>>>>>>>> Ok I understand better now.
>> >> >>>>>>>>>> Do your tests in JBoss AS with its decent transaction
>> >> >>>>>>>>>> manager (Infinispan should have a config for it).
>> >> >>>>>>>>>> For unit testing, force the indexing process in Hibernate
>> >> >>>>>>>>>> to use a single thread (I think it's possible; ask Sanne
>> >> >>>>>>>>>> if you don't know how).
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Exposing some configuration to Infinispan makes sense. Can
>> >> >>>>>>>>>> you start a thread explaining what is configurable and
>> >> >>>>>>>>>> which options you think we should expose to hsearch users?
>> >> >>>>>>>>>> Ideally I would like to offer one or two default config
>> >> >>>>>>>>>> scenarios and allow falling back to a custom config.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Emmanuel
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On 12 Aug 2009, at 11:58, Łukasz Moreń
>> >> >>>>>>>>>> <lukasz.moren(a)gmail.com> wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Sorry, but my wifi does not work well today. I will try
>> >> >>>>>>>>>>> to explain it more clearly.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> I'm using the DummyTransactionManager available for
>> >> >>>>>>>>>>> Infinispan. It associates a transaction with the calling
>> >> >>>>>>>>>>> thread.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Steps to update the index:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 1. The index writer acquires the lock - the transaction
>> >> >>>>>>>>>>> begins.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 2. If necessary, the index writer delegates merge work to
>> >> >>>>>>>>>>> new threads.
>> >> >>>>>>>>>>> Those merge threads do not see the changes made since the
>> >> >>>>>>>>>>> beginning of the transaction, and are looking for segments
>> >> >>>>>>>>>>> which are not yet in the index.
>> >> >>>>>>>>>>> Changes will be visible when step 3 is completed.
>> >> >>>>>>>>>>> For testing I tried committing the transaction when the
>> >> >>>>>>>>>>> merge starts, and then everything worked well. But then I
>> >> >>>>>>>>>>> need to start it again.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 3. The index writer releases the lock - the transaction is
>> >> >>>>>>>>>>> committed, and all changes made in this transaction become
>> >> >>>>>>>>>>> visible to other threads.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Maybe using some other transaction manager could help?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> What about the Infinispan cache configuration? Should some
>> >> >>>>>>>>>>> configuration mechanism be exposed to the user, or is
>> >> >>>>>>>>>>> hardcoding one in InfinispanDirectoryProvider enough?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 2009/8/12 Emmanuel Bernard <emmanuel(a)hibernate.org>
>> >> >>>>>>>>>>> why?
>> >> >>>>>>>>>>> you there?
>> >> >>>>>>>>>>> Ok please describe in detail what is going on. From what
>> >> >>>>>>>>>>> you are describing the tx cannot see all segments, which
>> >> >>>>>>>>>>> looks like an Infinispan bug to me.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> As a backup you can try without transactions and see if
>> >> >>>>>>>>>>> that works;
>> >> >>>>>>>>>>> technically the Lucene index should cope with that,
>> >> >>>>>>>>>>> but I like this approach less.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Let's try and chat by email if I'm not online, I need to
>> >> >>>>>>>>>>> run some errands today.
>> >> >>>>>>>>>>>
>> >> >>>>>>>> _______________________________________________
>> >> >>>>>>>> infinispan-dev mailing list
>> >> >>>>>>>> infinispan-dev(a)lists.jboss.org
>> >> >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> Jason T. Greene
>> >> >>>>>>> JBoss, a division of Red Hat
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>> _______________________________________________
>> >> >>> hibernate-dev mailing list
>> >> >>> hibernate-dev(a)lists.jboss.org
>> >> >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>>
>
> --
> Manik Surtani
> manik(a)jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org