Yesterday Galder and I came up with a nice implementation which now
appears to solve my issues, behaves safely around a "clear" operation,
and, in case the store lags behind, will "coalesce changes" from
different transactions as we had planned for ISPN-620 (moving that
back to 4.1).
The downside is that some operations are performed by a single thread,
called the "AsyncStoreCoordinator", but all operations are still async
from the cache's point of view, and the bulk of the work on the store
is async as well.
The obvious hotspot is that all cache operations push their
Modifications to an unbounded LinkedBlockingDeque. The coordinator
drains this deque and hands the Modifications to AsyncProcessors, which
perform them in parallel, unless it's a clear operation, which can't be
parallelized with any other work, or a key which another AsyncProcessor
is already working on, or one for which the processor couldn't acquire
a lock from the container; in those cases the Modification is postponed
to the next harvest of changes and possibly merged with other
Modifications on the same key.
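The coalescing step could be sketched along these lines (a rough illustration only; `Mod` and `coalesce` are made-up stand-ins, not the actual Infinispan classes):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Made-up stand-in for a queued store Modification: a key, plus whether it is a clear().
record Mod(String key, boolean isClear) {}

class CoalesceSketch {
    // Coalesce one harvested batch: a later Modification on the same key
    // supersedes an earlier one, and a clear() discards everything queued
    // before it. Remaining entries keep their insertion order.
    static Map<String, Mod> coalesce(Iterable<Mod> batch) {
        Map<String, Mod> state = new LinkedHashMap<>();
        for (Mod m : batch) {
            if (m.isClear()) {
                state.clear();     // work queued before the clear is now irrelevant
            }
            state.put(m.key(), m); // last write per key wins
        }
        return state;
    }

    public static void main(String[] args) {
        List<Mod> batch = List.of(
            new Mod("k1", false),
            new Mod("k1", false),  // supersedes the first k1 modification
            new Mod("k2", false));
        System.out.println(coalesce(batch).size()); // prints 2: one Mod per key
    }
}
```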
A little optimization is lost: the previous implementation avoided a
"remove" operation after a clear; this one doesn't, it will still
perform the remove.
For the future, I think I already know where to introduce a little
delay to implement Galder's Write-Behind Caching, which is likely a
much-needed improvement to prevent too many workers from spinning:
just introduce some delay in the loop of the AsyncStoreCoordinator.
Another improvement that should be done - in future - is to limit the
number of modifications assigned to each AsyncProcessor; this should
result in better-balanced work across threads, as the current risk is
that one thread has a long list of operations to perform while others
are idling.
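A sketch of what that coordinator loop could look like, combining the write-behind delay with a bounded batch size (both constants are hypothetical knobs, not existing configuration options):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

class CoordinatorSketch {
    // Hypothetical knobs, not existing configuration options.
    static final int MAX_BATCH_PER_PROCESSOR = 100;
    static final long WRITE_BEHIND_DELAY_MS = 50;

    final LinkedBlockingDeque<Object> queue = new LinkedBlockingDeque<>();

    // One turn of the coordinator loop: wait briefly (the write-behind delay),
    // then drain a bounded batch, so no single AsyncProcessor is handed a
    // disproportionately long list of work while the others idle.
    List<Object> nextBatch() throws InterruptedException {
        List<Object> batch = new ArrayList<>();
        Object first = queue.poll(WRITE_BEHIND_DELAY_MS, TimeUnit.MILLISECONDS);
        if (first == null) return batch;  // nothing arrived within the delay
        batch.add(first);
        queue.drainTo(batch, MAX_BATCH_PER_PROCESSOR - 1);  // non-blocking, bounded
        return batch;
    }
}
```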
To finish this, the stop() operation needs a timeout: when stopping, it
will wait for all pending changes to be written, so to be safe the
default should be quite long (1 hour?).
The previous implementation would stop the backing store, but then wait
just 60 seconds before shutting down the decorated store.
What should I do here - introduce a configurable timeout? I'd need a
new configuration option for that; alternatively, if the 1-minute
timeout was considered fine before, that rule of thumb could still
apply (but data gets lost if storing takes longer!).
Special thanks to Markus for helping me identify the bug and shaping
the road to this design, and to Galder for carefully reviewing it.
Cheers,
Sanne
2010/8/30 Galder Zamarreño <galder(a)redhat.com>:
Taking into account how complex this is getting, we should consider
what I mentioned in
https://jira.jboss.org/browse/ISPN-329.
The use of multiple worker threads to consume async messages and send them to the store
is primarily there to speed up delivery to the store. Trying to use coalescing here is
not that useful, because the aim is to send the modifications to the store as quickly as
possible, so the chances of having multiple mods for the same key are small.
On the other hand, if you used a scheduled async store, where you send modifications
periodically, using a single worker thread makes more sense, and coalescing there is key
to avoiding sending unnecessary modifications. Now, if we only use coalescing when
there's a single worker thread, we'd avoid a lot of the headaches below, since there
would be no reordering issues between different worker threads.
So, I think we should revert the current async store back to a multi-worker,
non-coalescing async store, or what I call the Unscheduled Write-Behind strategy. When we
look at the scheduled async store, we'd put coalescing back.
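A minimal sketch of such a scheduled write-behind (a single flusher thread plus a coalescing map; the class and parameter names are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

class ScheduledWriteBehind<K, V> {
    private final Map<K, V> pending = new ConcurrentHashMap<>();  // last write per key wins
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    // The single worker wakes up periodically and pushes a snapshot of the
    // coalesced modifications to the underlying store.
    ScheduledWriteBehind(long periodMs, BiConsumer<K, V> storeWriter) {
        flusher.scheduleWithFixedDelay(() -> {
            for (K key : pending.keySet()) {
                V value = pending.remove(key);  // claim the latest value for this key
                if (value != null) storeWriter.accept(key, value);
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    // Callers never block on the store: they only update the coalescing map.
    void write(K key, V value) { pending.put(key, value); }

    void stop() { flusher.shutdown(); }  // no draining/timeout handling in this sketch
}
```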
For reference, please find attached an email I sent to Manik where I discussed the
differences between scheduled and unscheduled async stores and the wiki
http://community.jboss.org/wiki/Write-ThroughAndWrite-BehindCaching:
On Aug 27, 2010, at 8:37 PM, Sanne Grinovero wrote:
> Some more news about this:
> Mircea attached a patch he was working on to JIRA, but he didn't have
> time to fix all the tests, so I'm continuing.
>
> There are three more issues:
>
> 1) Keeping an ordered list of transactions works fine for transactions
> only, but modifications done in a transaction are wrongly reordered
> with respect to operations not done in a transaction.
>
> 2) Even in the currently committed code, I see that a clear() operation
> is sent async together with all other operations. Then, when the state
> changes are performed, the clear() causes the previously found
> operations to be cancelled, but "previous" here refers to the iterator
> over the state map, not to the order of application of those state
> Modifications, so the clear() is removing random Modifications.
>
> 3) Also, my attempt to reuse some code from AbstractCacheStore through
> inheritance is not going to make it, as that implementation does too
> much for the needs of AsyncStore: it takes care of async purge, and
> all the methods I wanted to reuse are slightly different because of that.
>
> Proposal:
> 1- Out-of-transaction modifications are pushed into a BlockingDeque
> instead of a Map.
>
> 2- A prepare(tx, list) sets the list aside in the "transactions" Map.
>
> 3- A commit(tx) takes the list of operations from the "transactions"
> map and pushes it onto the same BlockingDeque as all other operations,
> guaranteeing the same order of application.
>
> 4- A clear operation clears the deque and pushes a Clear() operation onto it.
>
> 5- A purge operation is sent through the same deque: the only risk I
> see is that it gets removed by a clear, but I bet there won't be much
> to purge after that :) Also we don't care if the purge is going to be
> handled async in the other store; that's a store implementation detail,
> from what I see in AbstractCacheStore.
>
> 6- A single processor of AsyncStore drains blocks of a fixed,
> configurable size, puts them in a local map, applies the coalescing
> algorithm, and puts the results in a new state map.
> It seems to me that this worker needs to be exactly one, or reordering
> problems arise again in the order of the blocks taken from the deque.
>
> 6/a - When receiving a clear() - exceptional case - we clear the map
> of operations and issue the clear() synchronously: not adding anything
> to the map, and acquiring a "clearop" write lock to avoid race
> conditions with workers which might already be busy but happen to
> finish after the clear().
>
> 6/b - When receiving a purge() this bypasses other logic and is
> directly passed to the decorated Store.
>
> 7- Now here it becomes tricky: we removed the transactions problem,
> the clear operation and the purge from the table. But there is still a
> map of Modifications to be applied, by N threads, and to which another
> thread might write. This looks like the original problem in simplified
> form, but reorderings are still possible, as worker A might get some
> work on key K1, then worker B kicks in, gets new work on K1 and
> happens to perform it before A gets priority again.
> Clearly this must be performed either by one thread or using very
> fine-grained locking: as we're only interested in a per-key lock, we
> don't care about the order of operations on other keys. We could use a
> second map to store the keys which "are being processed": when a
> worker takes some work, it uses atomic operations to check if somebody
> is working on the same key; if so, it can either wait or put the work
> back (if it wasn't changed, otherwise discard it) and pick another
> piece of work to do.
>
> Comments?
>
> AFAIR the issue I mention in point 7 is a known limitation of Async,
> so could I implement that in future as an improvement? I would be fine
> with having just one worker.
>
> I wish I had time to do it all soon :|
> Sanne
>
>
> 2010/8/26 Sanne Grinovero <sanne.grinovero(a)gmail.com>:
>> 2010/8/26 Manik Surtani <manik(a)jboss.org>:
>>>
>>> On 26 Aug 2010, at 12:22, Sanne Grinovero wrote:
>>>
>>>> Hi manik, thanks for reading through my long mail.
>>>>
>>>> though I didn't understand what you think I should do now :)
>>>> Regarding "Like I said we should still support commit on a separate
>>>> thread.", you mean my approach is not correct?
>>>> But then you say "Definitely" in response to my question of whether
>>>> this is an improvement.
>>>
>>> I'm not sure how your approach prevents a commit on a separate thread. :)
>>
>> Because the caller of AsyncStore.prepare(..) - T1 - will have the
>> modifications list added to the "transactions" juc.ConcurrentHashMap,
>> using the TX as the key, and these same values will be pushed to "state"
>> by AsyncStore.commit(..); nothing else is done. The writing to the
>> decorated store is done by T2 using the current logic, which takes
>> modifications from "state", totally independent of and unaware of the
>> existing transaction used by T1.
>>
>>>
>>>> BTW why should you support "commit" on a separate thread, if all
>>>> the commit does is move a single value from one concurrent map to
>>>> another? The heavy work of sending the data down to the real store
>>>> is performed in the separate thread.
>>>
>>> Ah sorry - I think I have misunderstood you. I thought what you meant
>>> was that T1 may call prepare() and T2 may call commit(), imposing
>>> restrictions on how your impl may make use of, say, a ThreadLocal.
>>
>> that's what is currently possible, as the Executor in AsyncStore might
>> have >1 threads, each in turn taking some of the operations, so some
>> prepare()s could go to thread X and some commit()s to thread Y; it
>> seems to me that thread X might perform the prepare for TX1 and
>> thread Y receive the commit for the same TX1 (they might even try to
>> perform them in inverted order).
>>
>>>
>>> You are absolutely correct in that the work done in commit() does not
>>> need to be taken offline/performed asynchronously.
>>
>> Ok
>>
>>>
>>>> If you think this is not going in a good direction, I'll leave this
>>>> alone, otherwise I could try cleaning
>>>> up my current experiments and propose a patch.
>>>
>>> No, I think this is definitely a good direction. :)
>>
>> Ok, I'll try cleaning up and send you a patch. I assume this should go
>> into 4.1.0.Final, so we should handle it with care. Please keep
>> supporting me, as this code is all new to me; I'm just making
>> assumptions about some parts of it.
>>
>> Sanne
>>
>>>>
>>>> 2010/8/26 Manik Surtani <manik(a)jboss.org>:
>>>>> Hi Sanne
>>>>>
>>>>> Thanks for the detailed analysis!
>>>>>
>>>>> On 26 Aug 2010, at 09:05, Sanne Grinovero wrote:
>>>>>
>>>>>> After (finally!) being able to isolate the issue with a core-only unit
>>>>>> test, I changed the name of the issue to
>>>>>> "AsyncStore fails to save all values when batching"
>>>>>> As:
>>>>>> * it's not related to the JDBC stores but the Async wrapper
>>>>>> * I can only reproduce it while batching
>>>>>>
>>>>>> I hacked a fix for myself, which appears to work fine with the unit
>>>>>> tests and fixes the staging issues in my project, but this changed quite
>>>>>> a bit, so we should talk about it:
>>>>>>
>>>>>> I see the current AsyncStore implementation is "coalescing" all
>>>>>> changes belonging to the same transaction during the prepare-commit
>>>>>> phase, and then async-delegating the same prepare-commit to the
>>>>>> decorated store. After that, it might also async-delegate a commit or
>>>>>> a rollback.
>>>>>> So if I understood it well enough, this means that:
>>>>>>
>>>>>> a) it's not merging changes from different transactions
>>>>>>
>>>>>> b) a commit (or rollback) could be received by the decorated
>>>>>> store implementation with a noticeable delay (a full cycle of
>>>>>> collected changes), and most importantly it seems to me that it might
>>>>>> receive a commit on a different thread than the prepare (is that
>>>>>> legal?).
>>>>>
>>>>> I believe some TMs do allow prepare-and-commit on different threads.
>>>>> Definitely prepare-and-rollback.
>>>>>
>>>>>> So I copied the transaction-managing code from AbstractCacheStore (as
>>>>>> AsyncStore doesn't inherit from it) and replaced the prepare, commit
>>>>>> and rollback methods of AsyncStore with the same logic as the
>>>>>> AbstractCacheStore implementations:
>>>>>> this way uncommitted changes are never inserted in the "state" map
>>>>>> (nor are commit and rollback operations), and it seems that I end up
>>>>>> sending only committed changes to the delegate store;
>>>>>> the best part is that I'm aggregating in a single "coalesced" state
>>>>>> all changes from all transactions (only committed ones); this seems
>>>>>> to me a good improvement?
>>>>>
>>>>> Definitely.
>>>>>
>>>>>> The code I have is not that clean, as it copies several lines of
>>>>>> code from AbstractCacheStore; to clean it up I think that the
>>>>>> AbstractDelegatingStore should inherit from AbstractCacheStore, and
>>>>>> avoid delegating the methods relating to transactions (in effect
>>>>>> implementing what I did for the AsyncStore for all decorated stores).
>>>>>
>>>>> Yes, much cleaner that way.
>>>>>
>>>>>> Also some unit tests will need fixing, as there are explicit tests
>>>>>> asserting that the commit is done in another thread: could I delete
>>>>>> these?
>>>>>
>>>>> Like I said we should still support commit on a separate thread.
>>>>>
>>>>>> Finally, as the delegated store is not being used transactionally
>>>>>> anymore (it would never receive a prepare or commit), wouldn't it make
>>>>>> sense to wrap each block of changes in a single transaction? Rollback
>>>>>> after a failure in the async store is, as far as I understood, not
>>>>>> supported even with the current code.
>>>>>
>>>>> Yes, true.
>>>>>
>>>>>> I'm eager to hear your opinions, especially about what I would break
>>>>>> with this design.
>>>>>>
>>>>>> Cheers,
>>>>>> Sanne
>>>>>>
>>>>>>
>>>>>> 2010/8/25 Sanne Grinovero <sanne.grinovero(a)gmail.com>:
>>>>>>> 2010/8/25 Manik Surtani <manik(a)jboss.org>:
>>>>>>>>
>>>>>>>> On 25 Aug 2010, at 14:48, Sanne Grinovero wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> I'm testing CR3; I've gotten far enough that I'm testing the
>>>>>>>>> stop-restart application cycle now. I'm using the
>>>>>>>>> JdbcStringBasedCacheStore in async mode,
>>>>>>>>> and every time I stop the application - which issues a clean stop()
>>>>>>>>> on the CacheManager - it appears that some values are lost.
>>>>>>>>> This is fixed if I remove the async.
>>>>>>>>>
>>>>>>>>> Is this expected? I hope not, as this way I can't shut down the
>>>>>>>>> cluster when using async.
>>>>>>>>> This time I can reproduce it with a UT.
>>>>>>>>
>>>>>>>> Cool, a unit test would help. Is this only with the JDBC store?
>>>>>>>
>>>>>>> I understand this is not expected, and have created ISPN-618; you
>>>>>>> can find my failing unit test attached to the issue.
>>>>>>>
>>>>>>> I will try to reproduce the same issue depending only on core, to
>>>>>>> understand whether it's related to JDBC.
>>>>>>>
>>>>>>> Sanne
>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Manik Surtani
>>>>>>>> manik(a)jboss.org
>>>>>>>> Lead, Infinispan
>>>>>>>> Lead, JBoss Cache
>>>>>>>> http://www.infinispan.org
>>>>>>>> http://www.jbosscache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> infinispan-dev mailing list
>>>>>>>> infinispan-dev(a)lists.jboss.org
>>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache