Yes, let's keep it simple for now, even if I think Manik's suggestion
should work: if the IndexWriter will do reads it can use a
"Future-like" pattern to make the other thread return values
(maintaining the order of transformations applied to the index).
The IndexWriter must be able to keep its state coherent under its own
internal concurrency, otherwise it wouldn't work on filesystems; so if
we trust that and guarantee to process the tasks it assigns in the
order it asked for them, it should be OK.
I used org.hibernate.search.batchindexing.ProducerConsumerQueue to
"pipeline" changes in batch mode; it could be useful for this too when
you want to remove the SerialMergeScheduler limitation.
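
To make the "Future-like" idea a bit more concrete, here is a minimal sketch. It assumes a single worker thread is acceptable, and it uses plain java.util.concurrent rather than ProducerConsumerQueue; OrderedIndexTasks and its methods are hypothetical names, not part of Lucene, Infinispan or Hibernate Search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: every task is handed to one worker thread, so tasks
// are applied in exactly the order in which they were submitted.
final class OrderedIndexTasks {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Stand-in for the index state; only the worker thread touches it.
    private final List<String> appliedChanges = new ArrayList<>();

    // Queue a write. The caller may ignore the Future or wait on it.
    Future<?> write(String change) {
        return worker.submit(() -> { appliedChanges.add(change); });
    }

    // Queue a read. It observes every write submitted before it.
    Future<List<String>> read() {
        Callable<List<String>> snapshot = () -> new ArrayList<>(appliedChanges);
        return worker.submit(snapshot);
    }

    // Submits two writes and a read, then returns what the read observed.
    static List<String> demo() {
        OrderedIndexTasks tasks = new OrderedIndexTasks();
        try {
            tasks.write("add segment _1");
            tasks.write("delete segment _0");
            return tasks.read().get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            tasks.worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The read comes back through a Future, so the submitting thread gets its value without ever touching the state itself, and the order of the transformations is preserved by the single worker.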
2009/8/14 Emmanuel Bernard <emmanuel(a)hibernate.org>:
OK so let's try something.
Lukasz, can you try and use the SerialMergeScheduler policy on the index
writer and see what is going on?
It will make indexing slower for Infinispan but it seems we can't do much
better in the short term.
If that works then we will put some restriction in place when reading the
config. The index writer for this given index will be forced to use the
serial strategy.
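
A small sketch of why merging on the calling thread might sidestep the problem, assuming uncommitted changes are only visible to the thread that made them (as described further down for the DummyTransactionManager). ThreadBoundStore is a hypothetical stand-in built on a ThreadLocal; it is not an Infinispan or Lucene API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical store whose uncommitted changes are bound to the thread that
// made them, mimicking a transaction associated with the calling thread.
final class ThreadBoundStore {
    private final Set<String> committed = new HashSet<>();
    private final ThreadLocal<Set<String>> pending = ThreadLocal.withInitial(HashSet::new);

    // Record an uncommitted change for the current thread only.
    void put(String file) {
        pending.get().add(file);
    }

    // A file is visible if it is committed or pending on the current thread.
    synchronized boolean isVisible(String file) {
        return committed.contains(file) || pending.get().contains(file);
    }

    // Publish the current thread's pending changes to everyone.
    synchronized void commit() {
        committed.addAll(pending.get());
        pending.get().clear();
    }

    // The writer adds a segment, then "merges" inline or on a new thread.
    static boolean mergeSeesSegment(boolean serial) {
        ThreadBoundStore store = new ThreadBoundStore();
        store.put("_1.cfs");
        boolean[] seen = new boolean[1];
        Runnable merge = () -> seen[0] = store.isVisible("_1.cfs");
        if (serial) {
            merge.run(); // same thread: the pending change is visible
        } else {
            Thread t = new Thread(merge); // new thread: no pending changes
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("serial merge sees segment: " + mergeSeesSegment(true));
        System.out.println("threaded merge sees segment: " + mergeSeesSegment(false));
    }
}
```

In this model the inline merge sees the new segment and the merge on a separate thread does not, which matches the symptom reported for the merge threads.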
On 14 Aug 09, at 06:23, Łukasz Moreń wrote:
The expensive part is replication to all Infinispan nodes. IndexWriter
creates segment files, merges them into one compound segment, and deletes
descriptor files that are no longer needed - many files that must be
replicated. Some of them don't even need to be replicated because they are
inserted into the directory at the beginning of the index commit process and
removed at the end. Batching helps with performance here.
Yes, I think IndexWriter works like you wrote.
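
The batching argument above can be sketched roughly as follows, assuming all changes are buffered locally and pushed once at commit. BatchingFileStore and its methods are hypothetical names for illustration only, not the actual directory implementation or any Lucene/Infinispan API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: buffer file changes and replicate them once at commit.
final class BatchingFileStore {
    // Stand-in for the replicated cache: file name -> content.
    private final Map<String, byte[]> replicated = new LinkedHashMap<>();
    // Changes buffered since the batch started; a null value marks a delete.
    private final Map<String, byte[]> batch = new LinkedHashMap<>();
    private int replicatedWrites = 0;

    void createFile(String name, byte[] content) {
        batch.put(name, content);
    }

    void deleteFile(String name) {
        if (replicated.containsKey(name)) {
            batch.put(name, null); // the removal itself must be replicated
        } else {
            batch.remove(name);    // created and deleted in the same batch
        }
    }

    // Called once, e.g. when the writer is closing.
    void commit() {
        for (Map.Entry<String, byte[]> e : batch.entrySet()) {
            if (e.getValue() == null) {
                replicated.remove(e.getKey());
            } else {
                replicated.put(e.getKey(), e.getValue());
            }
            replicatedWrites++;
        }
        batch.clear();
    }

    boolean exists(String name) {
        return replicated.containsKey(name);
    }

    int replicatedWrites() {
        return replicatedWrites;
    }

    // Two files are created, one is removed again before commit.
    static int demo() {
        BatchingFileStore store = new BatchingFileStore();
        store.createFile("_1.fdt", new byte[] {1});
        store.createFile("_1.cfs", new byte[] {2});
        store.deleteFile("_1.fdt"); // temporary file, never replicated
        store.commit();
        return store.replicatedWrites();
    }

    public static void main(String[] args) {
        System.out.println("replicated writes: " + demo());
    }
}
```

Files that are created and removed within the same batch never reach the replicated store, which is where the saving would come from.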
2009/8/14 Manik Surtani <manik(a)jboss.org>
>
> On 14 Aug 2009, at 10:17, Łukasz Moreń wrote:
>
> Yes, but e.g. FSDirectory flushes changes whenever a file descriptor is
> created/updated - there can be many in one IndexWriter's lifetime.
> In the Infinispan implementation, I want to commit changes only
> when the IndexWriter is closing - batching all modifications.
> If I switch to a transaction per descriptor modification - similar to how
> it's done in FSDirectory - it works well, but it is not efficient.
>
> So what's expensive here? Writing to Infinispan, or the indexing itself?
> Correct me if I am wrong, I assume that the IndexWriter creates multiple
> threads, and each thread does: {
> // some indexing work
> // write these indexes to Infinispan
> }
> Is that correct?
>
> 2009/8/14 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>>
>> I am not an expert on this part of Lucene, but it looks to me like
>> the IndexWriter is the "driver/coordinator", and its decisions
>> are affected by a pluggable MergeScheduler; they do stuff on the
>> internal buffers of the IndexWriter (dequeue the pending segments to
>> be written to the index), but it shouldn't matter what exactly they do
>> as the internal state of these classes is unaffected by our
>> transactions.
>> They take some decision about writing segments to the Directory and
>> committing changes ("sync()") : as you implement this Directory you
>> should only have to take care of this class, I don't think the
>> MergeScheduler(s) are relevant: it just happens that the thread going
>> to apply changes to the index might be a different one than the one
>> pushing changes to the IndexWriter.
>>
>> In the Directory implementation you should use transactions to push
>> state changes to the "underlying storage": as FSDirectory is playing
>> with file descriptors and flushes, you do the same with Infinispan
>> transactions.
>>
>> 2009/8/14 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>> > Yes, right, MergeSchedulers.
>> >
>> > 2009/8/14 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>> >>
>> >> what are these "other" threads? Are you speaking about the
>> >> MergeSchedulers?
>> >>
>> >> 2009/8/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>> >> > IndexWriter processes the index update, delegates some work to other
>> >> > threads and waits until they finish. These "other" threads work on
>> >> > data modified in the IndexWriter transaction. So I think if I use a
>> >> > transaction per thread, the "others" would not see data modified by
>> >> > the IndexWriter until commit.
>> >> >
>> >> > 2009/8/13, Emmanuel Bernard <emmanuel(a)hibernate.org>:
>> >> >> Ah, I thought it was using multiple threads because of your mass
>> >> >> indexing. I did not know some threads were spawned specifically for
>> >> >> the Infinispan directory.
>> >> >>
>> >> >> On 13 Aug 09, at 17:34, Sanne Grinovero wrote:
>> >> >>
>> >> >>> Hi Łukasz,
>> >> >>> what is your usage of these threads? Did you consider using one
>> >> >>> transaction per thread?
>> >> >>>
>> >> >>> Sanne
>> >> >>>
>> >> >>> 2009/8/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>> >> >>>> Newly created threads were not associated with any transaction,
>> >> >>>> so I suppose that was the problem. Sharing the transaction between
>> >> >>>> threads seems to be a good solution.
>> >> >>>> Thanks for the help!
>> >> >>>>
>> >> >>>> 2009/8/13, Jason T. Greene <jason.greene(a)redhat.com>:
>> >> >>>>> Correct. Also there could be read races as well, so if you are
>> >> >>>>> going to share a tx between threads, I would use some shared lock
>> >> >>>>> to guarantee that only one thread can use it at a time. BTW this
>> >> >>>>> means you have to properly suspend/resume the TX via the TM API
>> >> >>>>> as well.
>> >> >>>>>
>> >> >>>>> Emmanuel Bernard wrote:
>> >> >>>>>> Modifying a transaction means applying mutations (like SQL
>> >> >>>>>> INSERT / UPDATE / DELETE) to the transactional resource?
>> >> >>>>>>
>> >> >>>>>> On 13 Aug 09, at 15:07, Jason T. Greene wrote:
>> >> >>>>>>
>> >> >>>>>>> When using transactions, the context is bound to the
>> >> >>>>>>> transaction, and you can move a transaction between threads.
>> >> >>>>>>> However, you should only be modifying a transaction with one
>> >> >>>>>>> thread at a time.
>> >> >>>>>>>
>> >> >>>>>>> Emmanuel Bernard wrote:
>> >> >>>>>>>> Could it be that you are not using the same transaction
>> >> >>>>>>>> between different threads (i.e. you physically start different
>> >> >>>>>>>> ones, or different "Infinispan contexts")?
>> >> >>>>>>>> Infini guys, do you support transactional operations spanning
>> >> >>>>>>>> several concurrent threads?
>> >> >>>>>>>> On 13 Aug 09, at 14:04, Łukasz Moreń wrote:
>> >> >>>>>>>>> I've tried with the JBoss AS transaction manager and
>> >> >>>>>>>>> JBossStandaloneTM.
>> >> >>>>>>>>> The result is the same in all cases - error during merge.
>> >> >>>>>>>>>
>> >> >>>>>>>>> 2009/8/12, Emmanuel Bernard <emmanuel(a)hibernate.org>:
>> >> >>>>>>>>>> Ok, I understand better now.
>> >> >>>>>>>>>> Do your tests in JBoss AS with its decent transaction
>> >> >>>>>>>>>> manager (Infinispan should have a config for it).
>> >> >>>>>>>>>> For unit testing, force the indexing process in Hibernate
>> >> >>>>>>>>>> to use a single thread (I think it's possible, ask Sanne if
>> >> >>>>>>>>>> you don't know how).
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Exposing some configuration to Infinispan makes sense. Can
>> >> >>>>>>>>>> you start a thread explaining what is configurable and which
>> >> >>>>>>>>>> options you think we should expose to hsearch users? Ideally
>> >> >>>>>>>>>> I would like to offer one or two default config scenarios and
>> >> >>>>>>>>>> allow falling back to a custom config.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Emmanuel
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On 12 Aug 2009, at 11:58, Łukasz Moreń
>> >> >>>>>>>>>> <lukasz.moren(a)gmail.com> wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Sorry, but my wifi does not work well today. I will try to
>> >> >>>>>>>>>>> explain it more clearly.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> I'm using the DummyTransactionManager available for
>> >> >>>>>>>>>>> Infinispan.
>> >> >>>>>>>>>>> It associates the transaction with the calling thread.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Steps to update the index:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 1. The index writer acquires the lock - begin of transaction.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 2. If necessary, the index writer delegates merge work to new
>> >> >>>>>>>>>>> threads.
>> >> >>>>>>>>>>> Those merge threads do not see the changes made since the
>> >> >>>>>>>>>>> beginning of the transaction, and are looking for segments
>> >> >>>>>>>>>>> which are not yet in the index.
>> >> >>>>>>>>>>> The changes become visible when step 3 is completed.
>> >> >>>>>>>>>>> For tests I tried committing the transaction when the merge
>> >> >>>>>>>>>>> starts, and then everything worked well. But then I need to
>> >> >>>>>>>>>>> start it again.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 3. The index writer releases the lock - the transaction is
>> >> >>>>>>>>>>> committed, and all changes made in this transaction become
>> >> >>>>>>>>>>> visible to other threads.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Maybe using some other transaction manager could help?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> What about the Infinispan cache configuration? Should some
>> >> >>>>>>>>>>> configuration mechanism be exposed to the user, or is a
>> >> >>>>>>>>>>> hardcoded one in InfinispanDirectoryProvider enough?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> 2009/8/12 Emmanuel Bernard <emmanuel(a)hibernate.org>
>> >> >>>>>>>>>>> why?
>> >> >>>>>>>>>>> you there?
>> >> >>>>>>>>>>> Ok, please describe in detail what is going on. From what
>> >> >>>>>>>>>>> you are describing, the tx cannot see all segments, which
>> >> >>>>>>>>>>> looks like an Infinispan bug to me.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> As a backup you can try without a transaction and see if
>> >> >>>>>>>>>>> that works;
>> >> >>>>>>>>>>> technically the Lucene index should cope with that,
>> >> >>>>>>>>>>> but I like this approach less.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Let's try and chat by email if I'm not online, I need to run
>> >> >>>>>>>>>>> some errands today.
>> >> >>>>>>>>>>>
>> >> >>>>>>>> _______________________________________________
>> >> >>>>>>>> infinispan-dev mailing list
>> >> >>>>>>>> infinispan-dev(a)lists.jboss.org
>> >> >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> Jason T. Greene
>> >> >>>>>>> JBoss, a division of Red Hat
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Jason T. Greene
>> >> >>>>> JBoss, a division of Red Hat
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>> _______________________________________________
>> >> >>> hibernate-dev mailing list
>> >> >>> hibernate-dev(a)lists.jboss.org
>> >> >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>
> --
> Manik Surtani
> manik(a)jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev