Release 2.0 packaging refactoring and more
by Ben Wang
Just to give you a heads-up on the release 2.0 packaging. Here is what I have done so far:
1. As discussed, the distribution now has three JBC zips: jboss-cache-core, jboss-cache-pojo, and jboss-cache-all. Please take a look at the contents when you have time.
2. In terms of documentation, I have split faq-pojo and tutorial-pojo from faq and tutorial, respectively. I have also updated the tutorial for the core component, but not the faq yet, since I figure Manik will have much more to say there. The same goes for the "TreeCache" doc.
If there is any problem, please let me or Manik know.
Cheers,
-Ben
RE: 1.4.0.SP2 WAS: Customer wants a patch with JBCACHE-776 + JBCACHE-777. Working on it today, could you confirm fixes for them?
by Galder Zamarreno
Apart from a customer asking for 776 and 777, which are really non-functional fixes, and Siemens asking for 761, I haven't seen any other customer asking for 1.4.0.SP2.
Galder Zamarreño
Support Engineer
JBoss, a division of Red Hat
-----Original Message-----
From: Manik Surtani [mailto:manik@jboss.org]
Sent: 27 September 2006 12:38
To: Galder Zamarreno
Cc: Brian Stansberry; Vladimir Blagojevic; jbosscache-dev(a)lists.jboss.org
Subject: Re: 1.4.0.SP2 WAS: Customer wants a patch with JBCACHE-776 + JBCACHE-777. Working on it today, could you confirm fixes for them?
This is what we have scheduled for 1.4.0.SP2:
Bug
* [ JBCACHE-755 ] Potential bug when using (async) replication queue and region based marshalling
* [ JBCACHE-760 ] TreeCacheListener in PojoCache gets nodeModify events for invalid objects
* [ JBCACHE-765 ] implementation of equals() in collections is incorrect
* [ JBCACHE-769 ] JDBCCacheLoader should not directly serialize a map passed to put
* [ JBCACHE-776 ] EvictionPolicyProvider WARN message is noisy and misleading
* [ JBCACHE-777 ] Creating a custom cache loader which delegates to a standard cache loader can generate misleading WARN messages
* [ JBCACHE-785 ] InvocationContext and suspended transactions
Task
* [ JBCACHE-761 ] Make connecting the channel and state transfer an atomic part of startup
Patch
* [ JBCACHE-766 ] Don't return unnecessary values from _replicate, avoiding need for marshalling
* [ JBCACHE-767 ] Fail more silently when setting node versions
Of these:
* JBCACHE-785 and JBCACHE-761 are still open (but will be completed
before making such a release)
* JBCACHE-766, JBCACHE-767, JBCACHE-776 and JBCACHE-777 are just
"optimisations"
Do the rest, which are valid bug fixes, warrant a release yet? What
do people think?
In terms of timescales, I could have the release tagged and ready for
QA by next week.
Cheers,
--
Manik Surtani
Lead, JBoss Cache
JBoss, a division of Red Hat
Email: manik(a)jboss.org
Telephone: +44 7786 702 706
MSN: manik(a)surtani.org
Yahoo/AIM/Skype: maniksurtani
On 26 Sep 2006, at 14:00, Galder Zamarreno wrote:
> OK. I gave the customer an estimate of 2/3 weeks, which might be mistaken
> then, oops! :-$. Assuming that the customer would go down this route, how
> long would it take to release it?
>
> I guess it might also depend on Siemens and the atomic startup work
> that is going into 1.4.x.
>
> Galder Zamarreño
> Support Engineer
> JBoss, a division of Red Hat
>
>
> -----Original Message-----
> From: Manik Surtani [mailto:manik@jboss.org]
> Sent: 26 September 2006 13:47
> To: Galder Zamarreno
> Cc: Brian Stansberry; Vladimir Blagojevic; jbosscache-
> dev(a)lists.jboss.org
> Subject: 1.4.0.SP2 WAS: Customer wants a patch with JBCACHE-776 +
> JBCACHE-777. Working on it today, could you confirm fixes for them?
>
> No fixed release date on 1.4.0.SP2 ATM - don't see a pressing need.
> What do people think?
> --
> Manik Surtani
>
> Lead, JBoss Cache
> JBoss, a division of Red Hat
>
> Email: manik(a)jboss.org
> Telephone: +44 7786 702 706
> MSN: manik(a)surtani.org
> Yahoo/AIM/Skype: maniksurtani
>
>
> On 26 Sep 2006, at 10:57, Galder Zamarreno wrote:
>
>> Rel date for 1.4.0.SP2 is outdated:
>> http://jira.jboss.com/jira/browse/JBCACHE?report=com.atlassian.jira.plugin.system.project:roadmap-panel
>>
>> Shall I tell him in the next 2/3 weeks?
>>
>> Galder Zamarreño
>> Support Engineer
>> JBoss, a division of Red Hat
>>
>
> <SNIP>
RE: [jbosscache-dev] DummyInMemoryCacheLoader
by Galder Zamarreno
Very useful indeed! :-D. I was previously using the DummyCacheLoader, and it was hard to debug specific put/get operations with it.
Thanks!
Galder Zamarreño
Support Engineer
JBoss, a division of Red Hat
-----Original Message-----
From: jbosscache-dev-bounces(a)lists.jboss.org [mailto:jbosscache-dev-bounces@lists.jboss.org] On Behalf Of Manik Surtani
Sent: 19 September 2006 16:02
To: jbosscache-dev(a)lists.jboss.org
Subject: [jbosscache-dev] DummyInMemoryCacheLoader
Hi guys
I've created a new DummyInMemoryCacheLoader, an easy way to test
cache loader functionality in conjunction with other areas of
functionality (such as the new move() API). It's a lot quicker than a
file-based CL and easier to debug.
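The gist is just an in-memory map keyed by the node's Fqn. Something along these lines (a sketch of the idea only -- the method names and signatures here are illustrative, not the real CacheLoader contract; see the class in tests/functional for the actual code):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryStoreSketch {
    // node path (Fqn rendered as a String) -> attribute map for that node
    private final Map<String, Map<Object, Object>> store =
            new ConcurrentHashMap<String, Map<Object, Object>>();

    public Map<Object, Object> get(String fqn) {
        return store.get(fqn);
    }

    public void put(String fqn, Object key, Object value) {
        Map<Object, Object> attrs = store.get(fqn);
        if (attrs == null) {
            attrs = new HashMap<Object, Object>();
            store.put(fqn, attrs);
        }
        // trivial to breakpoint or log here, unlike a file- or JDBC-based loader
        attrs.put(key, value);
    }

    public void remove(String fqn) {
        store.remove(fqn);
    }

    public boolean exists(String fqn) {
        return store.containsKey(fqn);
    }
}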
It is in the tests/functional dir to prevent it from ever being used
for anything other than unit tests. Hope you guys find it useful.
PS: It is not transactional at the moment (as I haven't had a need
for it to be, so far), but this can be added if there is a need.
Cheers,
--
Manik Surtani
Lead, JBoss Cache
JBoss, a division of Red Hat
Email: manik(a)jboss.org
Telephone: +44 7786 702 706
MSN: manik(a)surtani.org
Yahoo/AIM/Skype: maniksurtani
RE: [jbosscache-dev] Issues with FLUSH and JBC
by Ben Wang
Is solution D out of the picture already? I mean, if we really can't find a good enough solution to solve it, why not just accept it? State transfer, IMO, should not happen that often anyway, if it is just a new node joining. I think the important thing is to keep the state valid/consistent.
If it is because of network instability, then we will see lots of tx problems anyway. Are we adding more fuel to the fire?
-Ben
-----Original Message-----
From: jbosscache-dev-bounces(a)lists.jboss.org [mailto:jbosscache-dev-bounces@lists.jboss.org] On Behalf Of Bela Ban
Sent: Wednesday, September 27, 2006 4:37 PM
To: Brian Stansberry
Cc: jbosscache-dev(a)lists.jboss.org
Subject: Re: [jbosscache-dev] Issues with FLUSH and JBC
What's the consensus as to how we should proceed?
1. Solution A, with not blocking unicasts during a flush, or
2. Solution B, where we block later (on FLUSH_COMPLETED) rather than on START_FLUSH?
Vladimir is for #2.
How about adding the unblock() callback in a separate listener interface? I'd rather add this sooner than later. We would, however, also have to make sure that we actually do call this method.
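Something like the following is what I have in mind (a minimal sketch; the interface and method names here are placeholders, not a final API):

import org.jgroups.MembershipListener;

// Hypothetical "ExtendedXXX" listener: adds an unblock() callback alongside the
// existing block()/viewAccepted()/suspect() callbacks from MembershipListener.
public interface ExtendedMembershipListenerSketch extends MembershipListener {
    // Called once the FLUSH phase has completed and members may resume sending messages.
    void unblock();
}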
Vladimir: let's have a call on this today, so we can see how to proceed...
Brian Stansberry wrote:
> Bela Ban wrote:
>> Okay, my comments will be available in book form at Prentice hall
>> this fall... :-)
>>
>
> LOL. I'll try to reform. At least my overly long messages are on
> e-mail and so don't kill trees. :)
>
>> Just kidding, here are some comments:
>>
>> * I don't want to change the entire implementation of FLUSH this
>> late, 2.4 is overdue for a final release. So option B doesn't look
>> that appealing to me
>> o OTOH: if we can resolve the issue, why not...
>> * A: what if we block only **multicast** messages, but not
>> **unicast** messages? This would solve issue A, but maybe there are
>> use cases that it won't solve... We can assume that unicast messages
>> are always responses to multicasts, so they should be allowed to
>> complete. If this solution flies, then we have a quickfix for our
>> problem and can *really* cleanly fix it in the next release...
>
> We'd need to be sure JBC didn't make any unicast calls (besides RPC
> responses) during the state transfer. Possible unicast calls I can
> think of are:
>
> 1) Request for partial state transfer (with the current RPC-based
> mechanism). E.g. 3 node cluster, node B redeploys a webapp and asks
> for partial state transfer while node C is doing an initial state transfer.
> This would be an odd case though; typically you disable initial state
> transfer if you're going to use the activate/inactivateRegion API.
>
> 2) Calls related to buddy group assignments. Need to think about this
> a bit. But if they are using BR they won't be using initial state
> transfer, so probably not an issue.
>
>> * B: okay, but if my proposed solution above works, we can do this in
>> 2.5...
>> * C: this is essentially implementing the flush protocol at the
>> application level, which is not a bad idea because the app always has
>> more information than JGroups. However, it is probably a bit too
>> redundant, and also requires quite a number of changes, which is also
>> too late for JBC 1.4 (SP?)...
>
> Yeah, it is a lot for 1.4. IMHO definitely moves it beyond the realm
> of an SP2, into 1.4.1.
>
>> * I might have to add an additional callback blockCompleted() or
>> unblock() to JGroups, to notify members that the FLUSH phase has
>> completed and everybody can resume sending messages.
>> I'm currently
>> investigating this... Downside: an API change, so possibly a new
>> ExtendedXXX interface which would get merged in JGroups 3.0
>>
>
> This would be needed with B if our current algorithm for JBC is going
> to work.
>
>>> A downside of this idea is it changes the semantics of flush and
>>> requires JGroups changes. We'd definitely like input from Bela on
>>> this. Also, since we initially rejected it, we haven't fully
>>> thought it through. (As I'm editing this to send out I see there is
>>> no way to tell JBC after it returns from block() to not let any
>>> "new" activity through -- big hole. I'm back to rejecting this
>>> approach.)
>> Here, we might have to introduce additional callbacks, e.g.
>> - block(): stop sending messages. FLUSH doesn't block yet though, so
>> if an app ignores the convention and keeps sending messages it will
>> succeed
>> - No callback when FLUSH actually does block sending of messages
>> - unblock(): called when the app can resume sending messages.
>> FLUSH does not block sending of messages anymore
>>
>
> Yep. Our current algorithm does the following during the block() call:
>
> 1) Create a latch or something that prevents new transactions
> acquiring locks or existing transactions proceeding into the 2PC (i.e.
> prevent
> prepare() call.)
> 2) Give transactions already in the 2PC time to complete. If they
> don't, eventually roll them back.
> 3) Release the latch.
> 4) Immediately return from block(). (Vladimir -- problem here; there's
> a race condition between threads released in #3 and the return from
> block(). We need to figure out how to deal with that.)
>
> We count on FLUSH preventing the threads released in #3 sending any
> prepare() calls until the state transfer is done. Solution B breaks
> this for the period until FLUSH_COMPLETED is sent.
>
> An unblock() callback would help here, as we'd release the latch then.
>
>> I don't think the semantic changes are that big, actually you could
>> argue there are *no* semantic changes as block() is an indication
>> that message sending will block, here we're just saying it will block
>> some time in the (near) future.
>
> +1.
>
--
Bela Ban
Lead JGroups / Manager JBoss Clustering Group
JBoss - a division of Red Hat
RE: 1.4.0.SP2 WAS: Customer wants a patch with JBCACHE-776 + JBCACHE-777. Working on it today, could you confirm fixes for them?
by Galder Zamarreno
OK. I gave the customer an estimate of 2/3 weeks, which might be mistaken then, oops! :-$. Assuming that the customer would go down this route, how long would it take to release it?
I guess it might also depend on Siemens and the atomic startup work that is going into 1.4.x.
Galder Zamarreño
Support Engineer
JBoss, a division of Red Hat
-----Original Message-----
From: Manik Surtani [mailto:manik@jboss.org]
Sent: 26 September 2006 13:47
To: Galder Zamarreno
Cc: Brian Stansberry; Vladimir Blagojevic; jbosscache-dev(a)lists.jboss.org
Subject: 1.4.0.SP2 WAS: Customer wants a patch with JBCACHE-776 + JBCACHE-777. Working on it today, could you confirm fixes for them?
No fixed release date on 1.4.0.SP2 ATM - don't see a pressing need.
What do people think?
--
Manik Surtani
Lead, JBoss Cache
JBoss, a division of Red Hat
Email: manik(a)jboss.org
Telephone: +44 7786 702 706
MSN: manik(a)surtani.org
Yahoo/AIM/Skype: maniksurtani
On 26 Sep 2006, at 10:57, Galder Zamarreno wrote:
> Rel date for 1.4.0.SP2 is outdated:
> http://jira.jboss.com/jira/browse/JBCACHE?report=com.atlassian.jira.plugin.system.project:roadmap-panel
>
> Shall I tell him in the next 2/3 weeks?
>
> Galder Zamarreño
> Support Engineer
> JBoss, a division of Red Hat
>
<SNIP>
RE: [jbosscache-dev] Issues with FLUSH and JBC
by Brian Stansberry
Bela Ban wrote:
> Okay, my comments will be available in book form at Prentice
> hall this fall... :-)
>
LOL. I'll try to reform. At least my overly long messages are on e-mail
and so don't kill trees. :)
> Just kidding, here are some comments:
>
> * I don't want to change the entire implementation of FLUSH this
> late, 2.4 is overdue for a final release. So option B doesn't look
> that appealing to me
> o OTOH: if we can resolve the issue, why not...
> * A: what if we block only **multicast** messages, but not
> **unicast** messages? This would solve issue A, but maybe there
> are use cases that it won't solve... We can assume that unicast
> messages are always responses to multicasts, so they should be
> allowed to complete. If this solution flies, then we have a
> quickfix for our problem and can *really* cleanly fix it in the
> next release...
We'd need to be sure JBC didn't make any unicast calls (besides RPC
responses) during the state transfer. Possible unicast calls I can
think of are:
1) Request for partial state transfer (with the current RPC-based
mechanism). E.g. 3 node cluster, node B redeploys a webapp and asks for
partial state transfer while node C is doing an initial state transfer.
This would be an odd case though; typically you disable initial state
transfer if you're going to use the activate/inactivateRegion API.
2) Calls related to buddy group assignments. Need to think about this a
bit. But if they are using BR they won't be using initial state
transfer, so probably not an issue.
> * B: okay, but if my proposed solution above works, we can do
> this in 2.5...
> * C: this is essentially implementing the flush protocol at the
> application level, which is not a bad idea because the app always
> has more information than JGroups. However, it is probably a bit
> too redundant, and also requires quite a number of changes, which
> is also too late for JBC 1.4 (SP?)...
Yeah, it is a lot for 1.4. IMHO definitely moves it beyond the realm of
an SP2, into 1.4.1.
> * I might have to add an additional callback blockCompleted() or
> unblock() to JGroups, to notify members that the FLUSH phase has
> completed and everybody can resume sending messages.
> I'm currently
> investigating this... Downside: an API change, so possibly a new
> ExtendedXXX interface which would get merged in JGroups 3.0
>
This would be needed with B if our current algorithm for JBC is going to
work.
>> A downside of this idea is it changes the semantics of flush and
>> requires JGroups changes. We'd definitely like input from Bela on
>> this. Also, since we initially rejected it, we haven't fully
>> thought it through. (As I'm editing this to send out I see there is
>> no way to tell JBC after it returns from block() to not let any
>> "new" activity through -- big hole. I'm back to rejecting this
>> approach.)
>
> Here, we might have to introduce additional callbacks, e.g.
> - block(): stop sending messages. FLUSH doesn't block yet
> though, so if an app ignores the convention and keeps sending
> messages it will succeed
> - No callback when FLUSH actually does block sending of messages
> - unblock(): called when the app can resume sending messages.
> FLUSH does not block sending of messages anymore
>
Yep. Our current algorithm does the following during the block() call:
1) Create a latch or something that prevents new transactions acquiring
locks or existing transactions proceeding into the 2PC (i.e. prevent
prepare() call.)
2) Give transactions already in the 2PC time to complete. If they
don't, eventually roll them back.
3) Release the latch.
4) Immediately return from block(). (Vladimir -- problem here; there's a
race condition between threads released in #3 and the return from
block(). We need to figure out how to deal with that.)
We count on FLUSH preventing the threads released in #3 sending any
prepare() calls until the state transfer is done. Solution B breaks
this for the period until FLUSH_COMPLETED is sent.
An unblock() callback would help here, as we'd release the latch then.
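To make the gating concrete: it is basically a read/write lock used as a gate. A rough sketch (class and method names here are made up, not actual JBC code, and the rollback path for step 2 is left out):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FlushGateSketch {
    // Fair mode so that, once block() is waiting, new transactions queue up behind it.
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock(true);

    // Transaction threads hold the read lock while acquiring locks / running the 2PC.
    public void txEnter() {
        gate.readLock().lock();
    }

    public void txExit() {
        gate.readLock().unlock();
    }

    // Steps 1+2: hold back new transactions and give in-flight 2PCs time to drain.
    // Returns false if they didn't finish in time (caller would roll them back).
    public boolean onBlock(long timeoutMillis) throws InterruptedException {
        return gate.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Step 3: with an unblock() callback this is where the latch would be released,
    // instead of releasing it just before returning from block().
    public void onUnblock() {
        gate.writeLock().unlock();
    }
}

Without an unblock() callback, onUnblock() has to be called right before returning from block(), which is exactly where the race in #4 comes from.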
>
> I don't think the semantic changes are that big, actually you
> could argue there are *no* semantic changes as block() is an
> indication that message sending will block, here we're just
> saying it will block some time in the (near) future.
+1.
RE: [jbosscache-dev] Re: CruiseControl
by Rajesh Rajasekaran
I had to hack the CruiseControl code to get this behavior.
Builds will now report only the modifications between failed builds or
builds with testsuite errors.
-----Original Message-----
From: Vladimir Blagojevic
Sent: Friday, September 22, 2006 11:08 AM
To: Ryan Campbell; Eric Brown; Manik Surtani; QA
Cc: 'jbosscache-dev(a)lists.jboss.org'
Subject: RE: [jbosscache-dev] Re: CruiseControl
Ryan, you are right. I overlooked this fact when I analyzed the remoting
example. It is very hard for a project of JBC's complexity to achieve 100%
test passing with almost 2000 unit tests during the dev stage. It would be
great if this option could be changed from build to build during the dev
stage rather than between two successful builds.
> -----Original Message-----
> From: Ryan Campbell
> Sent: Friday, September 22, 2006 11:54 AM
> To: Vladimir Blagojevic; Eric Brown; Manik Surtani; QA
> Cc: 'jbosscache-dev(a)lists.jboss.org'
> Subject: RE: [jbosscache-dev] Re: CruiseControl
>
> The commits are reported until a successful build.
RE: [jbosscache-dev] Developer release
by Ben Wang
OK, I have just added it to the repository as 2.0.0.DR1. Please note
that there are now two jars, jboss-cache.jar and pojocache.jar (no
-jdk50.jar version).
-Ben
-----Original Message-----
From: Brian Stansberry
Sent: Tuesday, September 26, 2006 10:25 AM
To: Ben Wang; Manik Surtani; 'jbosscache-dev(a)lists.jboss.org'
Subject: RE: [jbosscache-dev] Developer release
Are the jars in repository.jboss.com? (I haven't looked.)
Ben Wang wrote:
> Brian,
>
> Thanks! I have subscribed to that Jira issue.
>
> -Ben
>
> -----Original Message-----
> From: Brian Stansberry
> Sent: Tuesday, September 26, 2006 2:52 AM
> To: Ben Wang; Manik Surtani; jbosscache-dev(a)lists.jboss.org
> Subject: RE: [jbosscache-dev] Developer release
>
> FYI, JIRA for integrating this in the AS and seeing what comes up is
> http://jira.jboss.com/jira/browse/JBAS-3695.
>
> jbosscache-dev-bounces(a)lists.jboss.org wrote:
>> OK, I have just tagged JBossCache head as "JBossCache_2_0_DR1". Since
>> this is only a DR release, it is not going through QA and is not being
>> put out on sf.net. I will also announce it on the forum for those who
>> are interested in the API changes.
>>
>> And as Manik mentioned, a couple of tests are still failing (CC stands
>> at 98.3% today), and the docs and tutorial need to be updated, although
>> I have updated the examples directory for PojoCache.
>> Also, a good place to check for the new API changes is
>> tests/functional/o/j/c/api and tests-50/functional/o/j/c/pojo.
>>
>> Please also note that (they are all in the Jiras):
>>
>> 1. Packaging is different now as previously discussed:
>> jboss-cache-core, jboss-cache-pojo, and jboss-cache-all.
>> 2. There is a separate pojocache.jar in addition to the regular
>> jboss-cache.jar (now only has core Cache APIs).
>> 3. In PojoCache, we previously had jboss-aop.xml. I have now renamed it
>> to pojocache-aop.xml, which contains the configuration for the PojoCache
>> interceptors.
>>
>> -Ben
>>
>> -----Original Message-----
>> From: jbosscache-dev-bounces(a)lists.jboss.org
>> [mailto:jbosscache-dev-bounces@lists.jboss.org] On Behalf Of Manik
>> Surtani Sent: Thursday, September 21, 2006 6:41 PM
>> To: jbosscache-dev(a)lists.jboss.org
>> Subject: [jbosscache-dev] Developer release
>>
>> Guys,
>>
>> Ben and I are thinking of cutting a DR before the alpha.
>>
>> Known bugs include the one about marshalling large Strings (waiting
>> for jboss-serialization-1.0.3 to be released) and
>> StateTransferUnderLoad. Other issues include timing/concurrency
>> issues with BR which show up on the multi-cpu lab servers and
>> notifications around the move() API which I hope to sort out by the
>> weekend, and some of Ben's PojoCache tests.
>>
>> The rest of the functionality is all in place though and I'd like to
>> test this DR with AS HEAD and Hibernate (speaking with Steve E about
>> this).
>>
>> Ben's going to release the DR over the weekend (I'll be off to Berlin
>> on a clustering training) and hopefully we can look at an Alpha when
>> I get back.
>>
>> FYI, refer to unit tests in org.jboss.cache.api for examples of using
>> the new API and SPI. I'll update docs and tutorial with these before
>> cutting the alpha.
>>
>> Cheers,
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
Ph: 510-396-3864
skype: bstansberry
Issues with FLUSH and JBC
by Brian Stansberry
Guys,
Vladimir and I found a problem today with using FLUSH in a JBC cache.
Following is a description of the issue and some proposed solutions.
Comments are welcome.
Please see docs/design/FLUSH.txt in JGroups for background info on how
FLUSH works.
A) We have a problem in that the FLUSH protocol makes the decision to
shut off the ability to pass messages down the channel independently at
each node. The protocol doesn't include anything at the JGroups level
to readily support coordination between nodes as to when to shut off
down messages. But, JBC needs coordination since it needs to make RPC
calls around the cluster (e.g. commit()) as part of how it handles
FLUSH.
Basically, when the FLUSH protocol on a node receives a message telling
it to START_FLUSH, it calls block() on the JBC instance. JBC does what
it needs to do, then returns from block(). Following the return from
block() the FLUSH protocol in that channel then begins blocking any
further down() messages.
The problem is as follows: a 2-node REPL_SYNC cluster, A and B, where A is
just starting up and thus initiates a FLUSH:
1) JBC on B has tx in progress, just starting the 2PC. Sends out the
prepare().
2) A sends out a START_FLUSH message.
3) A gets START_FLUSH, calls block() on JBC.
4) JBC on A is new, doesn't have much going on, very quickly returns
from block(). A will no longer pass *down* any messages below FLUSH.
5) A gets the prepare() (no problem, FLUSH doesn't block up messages,
just down messages.)
6) A executes the prepare(), but can't send the response to B because
FLUSH is blocking the channel.
7) B gets the START_FLUSH, calls block() on JBC.
8) JBC B doesn't immediately return from block() as it is giving the
prepare() some time to complete (avoid unnecessary tx rollback). But
prepare() won't complete because A's channel is blocking the RPC
response!! Eventually JBC B's block() impl will have to roll back the
tx.
Basically you have a race condition between calls to block() and
prepare() calls, and can have different winners on different nodes.
B) A solution we discussed, rejected and then came back to this evening
(please read FLUSH.txt to understand the change we're discussing):
Channel does not block down messages when block() returns. Rather it
just sends out a FLUSH_OK message (see FLUSH.txt). It shouldn't
initiate any new cluster activity (e.g. a prepare()) after sending
FLUSH_OK, but it can respond to RPC calls. When it gets a FLUSH_OK from
all the other members, it then blocks down messages and multicasts a
FLUSH_COMPLETED to the cluster.
Differences from the current FLUSH impl:
1) Node doesn't begin blocking down messages before sending FLUSH_OK.
2) Node begins blocking down messages before sending FLUSH_COMPLETED.
3) Node multicasts FLUSH_COMPLETED, rather than unicasting to the node
that initiated the FLUSH.
4) Nodes regard the FLUSH_COMPLETED as the last message from another
node, rather than the FLUSH_OK.
A downside of this idea is it changes the semantics of flush and
requires JGroups changes. We'd definitely like input from Bela on this.
Also, since we initially rejected it, we haven't fully thought it
through. (As I'm editing this to send out I see there is no way to tell
JBC after it returns from block() to not let any "new" activity through
-- big hole. I'm back to rejecting this approach.)
C) Alternative idea we discussed was to do application level
coordination around the cluster, i.e. add something similar to the
existing FLUSH_OK/FLUSH_COMPLETED, but at the JBC level. Revising the
previous scenario:
1) JBC on B has tx in progress, just starting the 2PC. Sends out the
prepare().
2) A sends out a START_FLUSH message.
3) A gets START_FLUSH, calls block().
4) JBC on A is new, doesn't have much going on, so doesn't do cleanup
work on its own node.
4.1) JBC on A sends out an RPC call with its address as an arg to a new
"flushReady()" method added to TreeCache. (Other name for method is
fine.)
4.2) JBC on A blocks waiting for flushReady() RPC calls from all the
other members. Does not return from block().
5) A gets the prepare() (no problem, FLUSH doesn't block up messages,
just down messages.)
6) A executes the prepare(), can send the response to B because FLUSH
isn't blocking the channel.
7) B gets the START_FLUSH, calls block().
8) JBC B doesn't immediately return from block() as it detects it has a
2PC in progress and is giving the prepare() some time to complete (avoid
unnecessary tx rollback).
9) JBC B receives flushReady() call from A, adds entry to a vector
recording A is ready.
10) B receives the prepare() response from A, sends commit().
11) B sends out an RPC call with its address to the "flushReady()" method.
12) A receives the commit(), commits the tx.
13) A receives the flushReady() call from B. Adds an entry to a vector
recording that B is ready.
14) A sees that all other nodes are ready, returns from block().
15) B sees that all other nodes are ready, returns from block().
The downside to this is complexity, and the requirement to add another
method for the "flushReady()" RPC.
D) A 3rd alternative is to just accept the problem. The problem is a
race condition where A blocks down events but then receives a prepare().
Its response to prepare() cannot be sent. The effect is JBC B's impl of
FLUSH will detect the prepare() isn't progressing and at some point roll
back the tx. This will result in a rollback() message being sent to A.
A can receive it and roll back the tx. IIRC a rollback() is always
async, so A does not need to send a response. A and B end up in a valid
state.
Downside of this is the tx gets rolled back. This could be a frequent
occurrence in high load scenarios because a new node in the cluster
could be expected to very quickly call blockOK(), possibly even before
the START_FLUSH message goes out on the wire.
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat