Replication to all Infinispan nodes is what is expensive. IndexWriter
creates segment files, merges them into one compound segment, and
deletes descriptor files that are no longer needed - many files that
must be replicated. Some of them don't even need to be replicated,
because they are inserted into the directory at the beginning of the
index commit process and removed at the end. Batching helps with
performance here.
Yes, I think IndexWriter works the way you described.
Then perhaps some of the cost can be mitigated by designing the
IndexWriter threads to do : {
// some indexing work
// write index fragment to a (concurrent) queue
}
then you have a single thread that polls the queue and writes this
stuff to Infinispan. This single thread can then either use a batch
or transaction to scope the changes (but if you use a batch/
transaction the writer thread would need to know when to end this batch)
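
A minimal sketch of that queue idea, under stated assumptions: the
Fragment type, the END_OF_BATCH sentinel and the class name are
hypothetical, and a plain ConcurrentHashMap stands in for the Infinispan
cache. It only illustrates one way the writer thread could learn when to
end the batch; it is not how Lucene or the Infinispan Directory actually
does it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class QueuedIndexWriterSketch {

    /** Hypothetical unit of index data handed from an indexing thread to the writer. */
    static final class Fragment {
        final String name;
        final byte[] data;

        Fragment(String name, byte[] data) {
            this.name = name;
            this.data = data;
        }
    }

    /** Sentinel that tells the writer thread the batch is complete. */
    static final Fragment END_OF_BATCH = new Fragment("<end-of-batch>", new byte[0]);

    /** Collects fragments until END_OF_BATCH arrives, then writes them together. */
    static void drainAndWrite(BlockingQueue<Fragment> queue, Map<String, byte[]> store)
            throws InterruptedException {
        Map<String, byte[]> pending = new LinkedHashMap<>();
        for (Fragment f = queue.take(); f != END_OF_BATCH; f = queue.take()) {
            pending.put(f.name, f.data);
        }
        // A real Directory would end its batch or commit its transaction here.
        store.putAll(pending);
    }

    /** Runs the given number of indexing threads plus one writer; returns the store. */
    static Map<String, byte[]> runDemo(int indexingThreads) {
        BlockingQueue<Fragment> queue = new LinkedBlockingQueue<>();
        Map<String, byte[]> store = new ConcurrentHashMap<>(); // stand-in for the cache

        // The single writer thread: the only one that touches the store.
        Thread writer = new Thread(() -> {
            try {
                drainAndWrite(queue, store);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Indexing threads only enqueue fragments; they never write to the store.
        List<Thread> indexers = new ArrayList<>();
        for (int i = 0; i < indexingThreads; i++) {
            String name = "segment_" + i;
            Thread t = new Thread(
                    () -> queue.add(new Fragment(name, name.getBytes(StandardCharsets.UTF_8))));
            indexers.add(t);
            t.start();
        }

        try {
            for (Thread t : indexers) {
                t.join();
            }
            queue.add(END_OF_BATCH); // all fragments are queued, so the batch can end
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println("fragments written: " + runDemo(4).size());
    }
}
```

The sentinel is enqueued only after every indexing thread has finished,
which is one possible answer to the "when to end this batch" question;
the coordinator (here runDemo) decides, not the writer thread.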
2009/8/14 Manik Surtani <manik(a)jboss.org>
On 14 Aug 2009, at 10:17, Łukasz Moreń wrote:
> Yes, but e.g. FSDirectory flushes changes whenever a file descriptor
> is created/updated - there can be many in one IndexWriter lifetime.
> In the Infinispan implementation, I want to commit changes only
> when the IndexWriter is closing - batch all modifications.
> If I switch to a transaction per descriptor modification - similar
> to how it's done in FSDirectory - it works well, but is not efficient.
So what's expensive here? Writing to Infinispan, or the indexing
itself? Correct me if I am wrong, I assume that the IndexWriter
creates multiple threads, and each thread does: {
// some indexing work
// write these indexes to Infinispan
}
Is that correct?
>
> 2009/8/14 Sanne Grinovero <sanne.grinovero(a)gmail.com>
> I am not an expert on this part of Lucene, but it looks to me
> like the IndexWriter is the "driver/coordinator", and its decisions
> are affected by a pluggable MergeScheduler; they do stuff on the
> internal buffers of the IndexWriter (dequeue the pending segments to
> be written to the index), but it shouldn't matter what exactly they
> do, as the internal state of these classes is unaffected by our
> transactions.
> They take some decisions about writing segments to the Directory and
> committing changes ("sync()"): as you implement this Directory you
> should only have to take care of this class. I don't think the
> MergeScheduler(s) are relevant: it just happens that the thread going
> to apply changes to the index might be a different one than the one
> pushing changes to the IndexWriter.
>
> In the Directory implementation you should use transactions to push
> state changes to the "underlying storage": as FSDirectory is playing
> with file descriptors and flushes, you do the same with Infinispan
> transactions.
>
> 2009/8/14 Łukasz Moreń <lukasz.moren(a)gmail.com>:
> > Yes, right, MergeSchedulers.
> >
> > 2009/8/14 Sanne Grinovero <sanne.grinovero(a)gmail.com>
> >>
> >> what are these "other" threads? Are you speaking about the
> >> MergeSchedulers?
> >>
> >> 2009/8/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
> >> > IndexWriter processes the index update, delegates some of the work
> >> > to other threads and waits until they finish. These "other" threads
> >> > work on data modified in the IndexWriter transaction. So I think if
> >> > I use a transaction per thread, the "others" would not see data
> >> > modified by IndexWriter until commit.
> >> >
> >> > 2009/8/13, Emmanuel Bernard <emmanuel(a)hibernate.org>:
> >> >> Ah I thought it was using multiple threads because of your mass
> >> >> indexing. I did not know some threads were spawned specifically
> >> >> for the Infinispan directory.
> >> >>
> >> >> On 13 août 09, at 17:34, Sanne Grinovero wrote:
> >> >>
> >> >>> Hi Łukasz,
> >> >>> what is your usage of these threads? Did you consider using one
> >> >>> transaction per thread?
> >> >>>
> >> >>> Sanne
> >> >>>
> >> >>> 2009/8/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
> >> >>>> Newly created threads were not associated with any transaction,
> >> >>>> so I suppose that was the problem. Sharing a transaction between
> >> >>>> threads seems to be a good solution.
> >> >>>> Thanks for help!
> >> >>>>
> >> >>>> 2009/8/13, Jason T. Greene <jason.greene(a)redhat.com>:
> >> >>>>> Correct. Also there could be read races as well, so if you are
> >> >>>>> going to share a tx between threads, I would use some shared
> >> >>>>> lock to guarantee that only one thread can use it at a time.
> >> >>>>> BTW this means you have to properly suspend/resume the TX via
> >> >>>>> the TM API as well.
> >> >>>>>
> >> >>>>> Emmanuel Bernard wrote:
> >> >>>>>> Modifying a transaction means applying mutations (like SQL
> >> >>>>>> INSERT / UPDATE / DELETE) to the transactional resource?
> >> >>>>>>
> >> >>>>>> On 13 août 09, at 15:07, Jason T. Greene wrote:
> >> >>>>>>
> >> >>>>>>> When using transactions, the context is bound to the
> >> >>>>>>> transaction, and you can move a transaction between threads.
> >> >>>>>>> However, you should only be modifying a transaction with one
> >> >>>>>>> thread at a time.
> >> >>>>>>>
> >> >>>>>>> Emmanuel Bernard wrote:
> >> >>>>>>>> Could it be that you are not using the same transaction
> >> >>>>>>>> between different threads (i.e. you physically start
> >> >>>>>>>> different ones or different "Infinispan contexts")?
> >> >>>>>>>> Infini guys, do you support transactional operations
> >> >>>>>>>> spanning several concurrent threads?
> >> >>>>>>>> On 13 août 09, at 14:04, Łukasz Moreń wrote:
> >> >>>>>>>>> I've tried with the JBoss AS transaction manager and
> >> >>>>>>>>> JBossStandaloneTM.
> >> >>>>>>>>> The result is the same in all cases - error during merge.
> >> >>>>>>>>>
> >> >>>>>>>>> 2009/8/12, Emmanuel Bernard <emmanuel(a)hibernate.org>:
> >> >>>>>>>>>> Ok I understand better now.
> >> >>>>>>>>>> Do your tests in JBoss AS with its decent transaction
> >> >>>>>>>>>> manager (infinispan should have a config for it).
> >> >>>>>>>>>> For unit testing, force the indexing process in hibernate
> >> >>>>>>>>>> to use a single thread (I think it's possible, ask Sanne if
> >> >>>>>>>>>> you don't know how).
> >> >>>>>>>>>>
> >> >>>>>>>>>> Exposing some configuration to infinispan makes sense. Can
> >> >>>>>>>>>> you start a thread explaining what is configurable and
> >> >>>>>>>>>> which options you think we should expose to hsearch users?
> >> >>>>>>>>>> Ideally I would like to offer one or two default config
> >> >>>>>>>>>> scenarios and allow falling back to a custom config.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Emmanuel
> >> >>>>>>>>>>
> >> >>>>>>>>>> On 12 août 2009, at 11:58, Łukasz Moreń
> >> >>>>>>>>>> <lukasz.moren(a)gmail.com> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Sorry, but my wifi does not work well today. I will try
> >> >>>>>>>>>>> to explain it more clearly.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I'm using the DummyTransactionManager available for
> >> >>>>>>>>>>> Infinispan. It associates a transaction with the calling
> >> >>>>>>>>>>> thread.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Steps to update index:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> 1. index writer acquires lock - begin of transaction
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> 2. if it is necessary, index writer delegates merge work
> >> >>>>>>>>>>> to new threads.
> >> >>>>>>>>>>> Those merge threads do not see changes made so far since
> >> >>>>>>>>>>> the beginning of the transaction, and are looking for
> >> >>>>>>>>>>> segments which are not yet in the index.
> >> >>>>>>>>>>> Changes will be visible when step 3 is completed.
> >> >>>>>>>>>>> For tests I tried to commit the transaction when the merge
> >> >>>>>>>>>>> starts and then everything worked well. But then I need
> >> >>>>>>>>>>> to start it again.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> 3. index writer releases lock - transaction is committed,
> >> >>>>>>>>>>> all changes made in this transaction are visible to other
> >> >>>>>>>>>>> threads.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Maybe using some other transaction manager could help?
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> What about the Infinispan cache configuration? Should some
> >> >>>>>>>>>>> configuration mechanism be exposed to the user, or is
> >> >>>>>>>>>>> hardcoding one in InfinispanDirectoryProvider enough?
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> 2009/8/12 Emmanuel Bernard <emmanuel(a)hibernate.org>
> >> >>>>>>>>>>> why?
> >> >>>>>>>>>>> you there?
> >> >>>>>>>>>>> Ok please describe in detail what is going on. From what
> >> >>>>>>>>>>> you are describing the tx cannot see all segments, which
> >> >>>>>>>>>>> looks like an infinispan bug to me.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> As a backup you can try without a transaction and see if
> >> >>>>>>>>>>> that works
> >> >>>>>>>>>>> technically the lucene index should cope with that
> >> >>>>>>>>>>> but I like this approach less
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Let's try and chat by email if I'm not online, I need to
> >> >>>>>>>>>>> run some errands today.
> >> >>>>>>>>>>>
> >> >>>>>>>> _______________________________________________
> >> >>>>>>>> infinispan-dev mailing list
> >> >>>>>>>> infinispan-dev(a)lists.jboss.org
> >> >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> Jason T. Greene
> >> >>>>>>> JBoss, a division of Red Hat
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>> Jason T. Greene
> >> >>>>> JBoss, a division of Red Hat
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>> _______________________________________________
> >> >>> hibernate-dev mailing list
> >> >>> hibernate-dev(a)lists.jboss.org
> >> >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
> >> >>
> >> >>
> >> >
> >
> >
>
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org