From gustavonalle at gmail.com Wed Jul 2 07:03:38 2014 From: gustavonalle at gmail.com (Gustavo Fernandes) Date: Wed, 2 Jul 2014 12:03:38 +0100 Subject: [infinispan-dev] Hadoop and ISPN first and next steps In-Reply-To: <53B1787B.50905@redhat.com> References: <5E4FE91D-7A99-4AFA-A0D8-82EFE870A1E0@gmail.com> <429A0DED-0D09-45EE-8A32-06FEA7E7FA97@redhat.com> <53B1787B.50905@redhat.com> Message-ID: On Mon, Jun 30, 2014 at 3:47 PM, Vladimir Blagojevic wrote: > On 2014-06-26, 10:12 AM, Galder Zamarreño wrote: > > On 23 Jun 2014, at 11:04, Gustavo Fernandes > wrote: > > > >> - I read with great interest the Spark paper [9]. Spark provides a DSL > with functional language constructs like map, flatMap and filter to process > distributed data in memory. In this scenario, Map Reduce is just a special > case achieved by chaining functions [10]. As Spark is much more than Map > Reduce, and can run many machine learning algorithms efficiently, I was > wondering if we should shift attention to Spark rather than focusing too > much on Map Reduce. Thoughts? > > I'm not an expert on these topics, but I like the look and the approach > of Spark :). The fact that it's not tied to a single paradigm is > particularly interesting, and secondly, the fact that it tries to make > the most out of functional constructs, which seem to provide more elegant > ways of dealing with data. > > > > > Gustavo, thanks for your email and the references. I like Spark as well! > I read the Spark paper over the weekend, definitely not an easy digest, > and I will continue to read about this topic, but this seems to be the > direction we should steer ourselves towards - a data analytics platform! > > As for the Hadoop implementation, not sure that it makes sense to > implement/support Hadoop v1.x unless it is super easy and low > maintenance. How hard would it be to implement YARN? > From a Map Reduce perspective, v2 is binary compatible with v1, so the same jar containing the job can run on both Map Reduce 1.x and YARN Map Reduce. It should also be straightforward to support the YARN API directly. Gustavo > > Regards, > Vladimir > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140702/2a16c07d/attachment-0001.html
From rory.odonnell at oracle.com Fri Jul 4 05:32:15 2014 From: rory.odonnell at oracle.com (Rory O'Donnell Oracle, Dublin Ireland) Date: Fri, 04 Jul 2014 10:32:15 +0100 Subject: [infinispan-dev] Early Access builds for JDK 9 b21, JDK 8u20 b21 are available on java.net Message-ID: <53B6749F.6090607@oracle.com> Hi Galder, Early Access builds for JDK 9 b21 and JDK 8u20 b21 are available on java.net. As we enter the later phases of development for JDK 8u20, please log any show stoppers as soon as possible. Rgds, Rory -- Rgds, Rory O'Donnell Quality Engineering Manager Oracle EMEA, Dublin, Ireland -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140704/d6fd328a/attachment.html From pierre.sutra at unine.ch Fri Jul 4 10:41:42 2014 From: pierre.sutra at unine.ch (Pierre Sutra) Date: Fri, 4 Jul 2014 16:41:42 +0200 Subject: [infinispan-dev] Clustered Listener In-Reply-To: References: <537219A9.1060301@unine.ch> Message-ID: <53B6BD26.7030806@unine.ch> Hello, > Are you talking about non clustered listeners? It seems unlikely you > would need so many cluster listeners. Cluster listeners should allow > you to only install a small amount of them, usually you would have > only additional ones if you have a Filter applied limiting what > key/values are returned. Our usage of the clustered API is a corner case, but the installation of a listener for a specific key (or key range) could be of general purpose. My point was that installing all filters everywhere is costly as every node should iterate over all filters for every modification. Our tentative code for key-specific filtering is available at github.com/otrack/Leads-infinispan (org.infinispan.notifications.KeySpecificListener and org.infinispan.notifications.cachelistener.CacheNotifierImpl). > Is the KeyFilter or KeyValueFilter not sufficient for this? void > addListener(Object listener, KeyFilter filter); void > addListener(Object listener, KeyValueFilter > filter, Converter converter); Also to note if > you are doing any kind of translation of the value to another value it > is recommended to do that via the supplied Converter. This can give > good performance as the conversion is done on the target node and not > all in 1 node and also you can reduce the payload if the resultant > value has a serialized form that is smaller than the original value. Indeed, this mechanism suffices for many purposes, I was just pointing out that it might be sometimes expensive. >> In such a case, the listener is solely >> installed at the key owners. This greatly helps the scalability of the >> mechanism at the cost of fault-tolerance since, in the current state of >> the implementation, listeners are not forwarded to new data owners. >> Since as a next step [1] it is planned to handle topology change, do you >> plan also to support key (or key range) specific listener ? > These should be covered with the 2 overloads as I mentioned above. > This should be the most performant way as the filter is replicated to > the node upon installation so a 1 time cost. But if a key/value pair > doesn't pass the filter the event is not sent to the node where the > listener is installed. I agree. > >> Besides, >> regarding this last point and the current state of the implementation, I >> would have like to know what is the purpose of the re-installation of >> the cluster listener in case of a view change in the addedListener() >> method of the CacheNotifierImpl class. > This isn't a re-installation. This is used to propgate the > RemoteClusterListener to the other nodes, so that when a new event is > generated it can see that and subsequently send it back to the node > where the listener is installed. There is also a second check in > there in case if a new node joins in the middle. The term re-installation was inappropriate. I was meaning here that in my understanding of the code of CacheNotifierImpl, a second check seems of no need since if a new node joins the cluster afterward it still has to install the pair . Best, Pierre ps: sorry for the late answer. 
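(A quick sketch of the filtered-listener pattern discussed above: one clustered listener whose filter restricts it to a single key, so events for other keys are dropped on the owning node and never sent to the node that registered the listener. The class name, key and generic types are made up, and the signatures follow the addListener overloads quoted above; the exact filter interfaces were still evolving in the 7.0 alphas.)

import org.infinispan.Cache;
import org.infinispan.filter.KeyFilter;
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.CacheEntryCreated;
import org.infinispan.notifications.cachelistener.annotation.CacheEntryModified;
import org.infinispan.notifications.cachelistener.event.CacheEntryEvent;

@Listener(clustered = true)
public class SingleKeyListener {

    @CacheEntryCreated
    @CacheEntryModified
    public void onEvent(CacheEntryEvent<String, String> event) {
        // Invoked on the node that registered the listener, for matching keys only.
        System.out.println("Key " + event.getKey() + " was created or modified");
    }

    // The filter is shipped to the other nodes once, at registration time; events
    // for keys that do not match are discarded at the owner and never cross the wire.
    public static void register(Cache<String, String> cache, final String keyOfInterest) {
        cache.addListener(new SingleKeyListener(), new KeyFilter<String>() {
            @Override
            public boolean accept(String key) {
                return keyOfInterest.equals(key);
            }
        });
    }
}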
From pedro at infinispan.org Fri Jul 4 11:09:57 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Fri, 04 Jul 2014 16:09:57 +0100 Subject: [infinispan-dev] LevelDBStore's expiryEntryQueue Message-ID: <53B6C3C5.6090301@infinispan.org> Hi guys, Is there a way to replace the expiryEntryQueue with a non-blocking structure? Long history: In high throughput systems, this queue gets full very often and it is blocking all the writes (that throws timeout exceptions everywhere). Also, I don't fully understand why this queue exists. It is drained in purgeExpired and it deletes the expired keys, but the non-expired keys are never tested (or I am missing something). Can someone explain? Cheers, (and have a nice weekend) Pedro
From dan.berindei at gmail.com Mon Jul 7 04:06:23 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 7 Jul 2014 11:06:23 +0300 Subject: [infinispan-dev] LevelDBStore's expiryEntryQueue In-Reply-To: <53B6C3C5.6090301@infinispan.org> References: <53B6C3C5.6090301@infinispan.org> Message-ID: On Fri, Jul 4, 2014 at 6:09 PM, Pedro Ruivo wrote: > Hi guys, > > Is there a way to replace the expiryEntryQueue with a non-blocking > structure? > > Did you try configuring levelDBStore.expiryQueueSize = MAX_INT? > Long history: > > In high throughput systems, this queue gets full very often and it is > blocking all the writes (that throws timeout exceptions everywhere). > > Also, I don't fully understand why this queue exists. It is drained in > purgeExpired and it deletes the expired keys, but the non-expired keys > are never tested (or I am missing something). > > Can someone explain? > AFAICT the idea is to store the expiration time of each key in a separate DB, so that purging the expired entries doesn't require reading all the values from the regular DB. The regular DB still contains the expiration time, so the regular expiration check still works. Note that the purge method iterates over all the entries in the expired DB, not just the entries in the expiration queue. The expiration queue is needed so that we don't have 2 sync LevelDB writes for every Infinispan write. But I don't see any reason for it to be tied to the purge() method - instead, another thread should constantly read entries out of the expiration queue and write them to the expiration DB. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140707/2132e476/attachment.html
From bban at redhat.com Mon Jul 7 04:14:24 2014 From: bban at redhat.com (Bela Ban) Date: Mon, 07 Jul 2014 10:14:24 +0200 Subject: [infinispan-dev] A question and an observation Message-ID: <53BA56E0.1080607@redhat.com> 1. Observation: ------------- In my Infinispan perf test (IspnPerfTest), I used cache.getAdvancedCache().withFlags(...).put(key,value) in a tight loop. I've always thought that withFlags() was a fast operation, *but this is not the case* !! Once I changed this and predefined the 2 caches (sync and async) at the start, outside the loop, things got 10x faster ! So please change this if you made the same mistake ! 2. Question: ----------- In Infinispan 6, I defined my custom transport as follows: This is gone in 7. Do I now have to use programmatic configuration ? If so, how would I do this ? 
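(To make observation #1 concrete, this is roughly the before/after being described: build the flag-decorated views once, outside the hot loop, because each withFlags() call allocates a new DecoratedCache. The cache names, flags and types below are only illustrative, not the actual IspnPerfTest code.)

import org.infinispan.AdvancedCache;
import org.infinispan.context.Flag;
import org.infinispan.manager.EmbeddedCacheManager;

public class PerfLoopSketch {

    public static void run(EmbeddedCacheManager cacheManager, int numRequests, byte[] payload) {
        // Decorate once, up front, instead of calling withFlags() on every put().
        AdvancedCache<String, byte[]> syncCache = cacheManager.<String, byte[]>getCache("sync")
                .getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
        AdvancedCache<String, byte[]> asyncCache = cacheManager.<String, byte[]>getCache("async")
                .getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES, Flag.FORCE_ASYNCHRONOUS);

        for (int i = 0; i < numRequests; i++) {
            // Hot loop: no per-invocation DecoratedCache allocation here.
            syncCache.put("key-" + i, payload);
            asyncCache.put("key-" + i, payload);
        }
    }
}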
-- Bela Ban, JGroups lead (http://www.jgroups.org) From sanne at infinispan.org Mon Jul 7 04:58:51 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 7 Jul 2014 09:58:51 +0100 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA56E0.1080607@redhat.com> References: <53BA56E0.1080607@redhat.com> Message-ID: On 7 July 2014 09:14, Bela Ban wrote: > > 1: Observation: > ------------- > In my Infinispan perf test (IspnPerfTest), I used > cache.getAdvancedCache().withFlags(...).put(key,value) in a tight loop. > > I've always thought that withFlags() was a fast operation, *but this is > not the case* !! > > Once I changed this and predefined the 2 caches (sync and async) at the > start, outside the loop, things got 10x faster ! So please change this > if you made the same mistake ! Right that's the better way to use the flags; I'm pretty sure we documented this at some point but I couldn't find it in the docs nor javadocs now...?!! bad bad. Where we do use flags internally (Lucene Directory), we hold on to multiple instances of the Cache, even if the same cache content but to use different flags. The code is quite horrible to read as it seems like you interact with different Caches, but as you noticed it's worth it. 10x faster? That's surprising for a benchmark which is supposed to be network bound isn't it? If you can measure a 10X improvement, it seems like your tests where bound by memory allocation (as that's the resource you starve by using _withFlags_ extensively) ? Might be worth checking with flight recorder if that's still the case, as _withFlags_ isn't sufficient on its own to saturate your memory bandwith, so I'd guess there are other hot consumers which might be easy to take down. > > 2. Question: > ----------- > In Infinispan 6, I defined my custom transport as follows: > > > This is gone in 7. Do I now have to use programmatic configuration ? If > so, how would I do this ? I don't know this one, hopefully others will.. ? Cheers, Sanne > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mgencur at redhat.com Mon Jul 7 05:10:03 2014 From: mgencur at redhat.com (Martin Gencur) Date: Mon, 07 Jul 2014 11:10:03 +0200 Subject: [infinispan-dev] Example server config tests disabled? Message-ID: <53BA63EB.4050103@redhat.com> Hi, looks like all example config tests were marked as "Unstable" and hence disabled. I see a note "See ISPN-4026" in ExampleConfigsIT.java and it leads to https://issues.jboss.org/browse/ISPN-4026. This would be fine if not all tests for all example configs were disabled. As a result, tests for the example config for rolling upgrades did not run and this issue was ignored: https://issues.jboss.org/browse/ISPN-4026 This looks like a critical issue to me. PS: I will always wonder how we can be disabling tests in this way. Martin From bban at redhat.com Mon Jul 7 05:37:14 2014 From: bban at redhat.com (Bela Ban) Date: Mon, 07 Jul 2014 11:37:14 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: References: <53BA56E0.1080607@redhat.com> Message-ID: <53BA6A4A.4040201@redhat.com> On 07/07/14 10:58, Sanne Grinovero wrote: > On 7 July 2014 09:14, Bela Ban wrote: >> >> 1: Observation: >> ------------- >> In my Infinispan perf test (IspnPerfTest), I used >> cache.getAdvancedCache().withFlags(...).put(key,value) in a tight loop. 
>> >> I've always thought that withFlags() was a fast operation, *but this is >> not the case* !! >> >> Once I changed this and predefined the 2 caches (sync and async) at the >> start, outside the loop, things got 10x faster ! So please change this >> if you made the same mistake ! > > Right that's the better way to use the flags; I'm pretty sure we > documented this at some point but I couldn't find it in the docs nor > javadocs now...?!! bad bad. Looking at the code, I see that withFlags() creates a new DecoratedCache, so this can obviously be mitigated by instantiating the caches beforehand. Haven't had time to investigate the cost of this. > Where we do use flags internally (Lucene Directory), we hold on to > multiple instances of the Cache, even if the same cache content but to > use different flags. The code is quite horrible to read as it seems > like you interact with different Caches, but as you noticed it's worth > it. Yes > 10x faster? That's surprising for a benchmark which is supposed to be > network bound isn't it? I ran 2 IspnPerfTest processes on my local box with numOwners=2 (no L1 cache). I guess that roughly half of the calls go to the local node, and there the cost of withFlags() is not amortized by the network round trip. > If you can measure a 10X improvement, it seems > like your tests where bound by memory allocation (as that's the > resource you starve by using _withFlags_ extensively) ? > Might be worth checking with flight recorder if that's still the case, > as _withFlags_ isn't sufficient on its own to saturate your memory > bandwith, so I'd guess there are other hot consumers which might be > easy to take down. > >> >> 2. Question: >> ----------- >> In Infinispan 6, I defined my custom transport as follows: >> >> >> This is gone in 7. Do I now have to use programmatic configuration ? If >> so, how would I do this ? > > I don't know this one, hopefully others will.. ? > > Cheers, > Sanne -- Bela Ban, JGroups lead (http://www.jgroups.org) From sanne at infinispan.org Mon Jul 7 05:37:08 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 7 Jul 2014 10:37:08 +0100 Subject: [infinispan-dev] Example server config tests disabled? In-Reply-To: <53BA63EB.4050103@redhat.com> References: <53BA63EB.4050103@redhat.com> Message-ID: On 7 July 2014 10:10, Martin Gencur wrote: > Hi, > looks like all example config tests were marked as "Unstable" and hence > disabled. I see a note "See ISPN-4026" in ExampleConfigsIT.java and it > leads to https://issues.jboss.org/browse/ISPN-4026. This would be fine > if not all tests for all example configs were disabled. > As a result, tests for the example config for rolling upgrades did not > run and this issue was ignored: > https://issues.jboss.org/browse/ISPN-4026 This looks like a critical > issue to me. > > PS: I will always wonder how we can be disabling tests in this way. +1 I don't understand how we can keep pushing things which break stuff by disabling tests rather than rolling back broken patches. There are situations in which a test is badly designed and it should be disabled, but I've had lots of tests in Query disabled because of stuff not working as it should in core.. stuff which used to work. In such cases one shouldn't disable the test but roll-back the changes which broke it, and think better about the "fixes" rather than move responsibility to others to figure it out eventually. 
Sanne > > Martin > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From pedro at infinispan.org Mon Jul 7 05:50:07 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Mon, 07 Jul 2014 10:50:07 +0100 Subject: [infinispan-dev] Example server config tests disabled? In-Reply-To: <53BA63EB.4050103@redhat.com> References: <53BA63EB.4050103@redhat.com> Message-ID: <53BA6D4F.7080903@infinispan.org> They are running in a different group: http://ci.infinispan.org/viewLog.html?currentGroup=test&scope=%23teamcity%23org.infinispan.server.test.configs%23teamcity%23ExampleConfigsIT&pager.currentPage=1&order=DURATION_DESC&recordsPerPage=20&filterText=ExampleConfig&status=&buildTypeId=bt34&buildId=9465&tab=testsInfo On 07/07/2014 10:10 AM, Martin Gencur wrote: > Hi, > looks like all example config tests were marked as "Unstable" and hence > disabled. I see a note "See ISPN-4026" in ExampleConfigsIT.java and it > leads to https://issues.jboss.org/browse/ISPN-4026. This would be fine > if not all tests for all example configs were disabled. > As a result, tests for the example config for rolling upgrades did not > run and this issue was ignored: > https://issues.jboss.org/browse/ISPN-4026 This looks like a critical > issue to me. > > PS: I will always wonder how we can be disabling tests in this way. > > Martin > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From pedro at infinispan.org Mon Jul 7 05:51:09 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Mon, 07 Jul 2014 10:51:09 +0100 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA56E0.1080607@redhat.com> References: <53BA56E0.1080607@redhat.com> Message-ID: <53BA6D8D.7010604@infinispan.org> On 07/07/2014 09:14 AM, Bela Ban wrote: > This is gone in 7. Do I now have to use programmatic configuration ? If > so, how would I do this ? AFAIK, yes it was removed from configuration file and can only be set by programmatic configuration. Pedro From bban at redhat.com Mon Jul 7 06:04:17 2014 From: bban at redhat.com (Bela Ban) Date: Mon, 07 Jul 2014 12:04:17 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA6D8D.7010604@infinispan.org> References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> Message-ID: <53BA70A1.5040306@redhat.com> How ? I already have an infinispan.xml, create a CacheManager off of it and now only want to change the transport. I need to get another PhD to understand programmatic configuration in Infinispan On 07/07/14 11:51, Pedro Ruivo wrote: > > > On 07/07/2014 09:14 AM, Bela Ban wrote: >> This is gone in 7. Do I now have to use programmatic configuration ? If >> so, how would I do this ? > > AFAIK, yes it was removed from configuration file and can only be set by > programmatic configuration. > > Pedro > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From mgencur at redhat.com Mon Jul 7 06:17:00 2014 From: mgencur at redhat.com (Martin Gencur) Date: Mon, 07 Jul 2014 12:17:00 +0200 Subject: [infinispan-dev] Example server config tests disabled? 
In-Reply-To: <53BA6D4F.7080903@infinispan.org> References: <53BA63EB.4050103@redhat.com> <53BA6D4F.7080903@infinispan.org> Message-ID: <53BA739C.5000602@redhat.com> On 7.7.2014 11:50, Pedro Ruivo wrote: > They are running in a different group: > > http://ci.infinispan.org/viewLog.html?currentGroup=test&scope=%23teamcity%23org.infinispan.server.test.configs%23teamcity%23ExampleConfigsIT&pager.currentPage=1&order=DURATION_DESC&recordsPerPage=20&filterText=ExampleConfig&status=&buildTypeId=bt34&buildId=9465&tab=testsInfo Thanks Pedro. I was told they don't run so apparently they run in a different group. Do the test failures get the same attention as the failures in the main test suite? Martin > > On 07/07/2014 10:10 AM, Martin Gencur wrote: >> Hi, >> looks like all example config tests were marked as "Unstable" and hence >> disabled. I see a note "See ISPN-4026" in ExampleConfigsIT.java and it >> leads to https://issues.jboss.org/browse/ISPN-4026. This would be fine >> if not all tests for all example configs were disabled. >> As a result, tests for the example config for rolling upgrades did not >> run and this issue was ignored: >> https://issues.jboss.org/browse/ISPN-4026 This looks like a critical >> issue to me. >> >> PS: I will always wonder how we can be disabling tests in this way. >> >> Martin >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Mon Jul 7 06:52:21 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 7 Jul 2014 11:52:21 +0100 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA70A1.5040306@redhat.com> References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> <53BA70A1.5040306@redhat.com> Message-ID: On 7 July 2014 11:04, Bela Ban wrote: > How ? I already have an infinispan.xml, create a CacheManager off of it > and now only want to change the transport. I have the same need; in Palma I asked for a CacheManager constructor which would take (String infinispanConfiguration, Transport customTransportInstance). Could we have that please please? I never opened a new specific JIRA as there is the more generally useful ISPN-1414 already. > > I need to get another PhD to understand programmatic configuration in > Infinispan > > On 07/07/14 11:51, Pedro Ruivo wrote: >> >> >> On 07/07/2014 09:14 AM, Bela Ban wrote: >>> This is gone in 7. Do I now have to use programmatic configuration ? If >>> so, how would I do this ? >> >> AFAIK, yes it was removed from configuration file and can only be set by >> programmatic configuration. 
>> >> Pedro >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From pedro at infinispan.org Mon Jul 7 07:00:08 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Mon, 07 Jul 2014 12:00:08 +0100 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA70A1.5040306@redhat.com> References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> <53BA70A1.5040306@redhat.com> Message-ID: <53BA7DB8.2070008@infinispan.org> On 07/07/2014 11:04 AM, Bela Ban wrote: > How ? I already have an infinispan.xml, create a CacheManager off of it > and now only want to change the transport. I don't know if there is an easy way, but this may work: ParserRegistry parser = new ParserRegistry(); ConfigurationBuilderHolder holder = parser.parse(/*filename or input stream*/); holder.getGlobalConfiguration().transport().transport(new CustomTransport()); DefaultCacheManager manager = new DefaultCacheManager(holder, true); > > I need to get another PhD to understand programmatic configuration in > Infinispan lol > > On 07/07/14 11:51, Pedro Ruivo wrote: >> >> >> On 07/07/2014 09:14 AM, Bela Ban wrote: >>> This is gone in 7. Do I now have to use programmatic configuration ? If >>> so, how would I do this ? >> >> AFAIK, yes it was removed from configuration file and can only be set by >> programmatic configuration. >> >> Pedro >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > From mmarkus at redhat.com Mon Jul 7 07:30:42 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 7 Jul 2014 12:30:42 +0100 Subject: [infinispan-dev] A question and an observation In-Reply-To: References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> <53BA70A1.5040306@redhat.com> Message-ID: On Jul 7, 2014, at 11:52, Sanne Grinovero wrote: > On 7 July 2014 11:04, Bela Ban wrote: >> How ? I already have an infinispan.xml, create a CacheManager off of it >> and now only want to change the transport. > > I have the same need; in Palma I asked for a CacheManager constructor > which would take > (String infinispanConfiguration, Transport customTransportInstance). GlobalConfigurationBuilder gc = new GlobalConfigurationBuilder(); gc.transport().transport(new JGroupsTransport()); DefaultCacheManager dcm = new DefaultCacheManager(gc.build()); Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Jul 7 07:36:30 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 7 Jul 2014 12:36:30 +0100 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA6A4A.4040201@redhat.com> References: <53BA56E0.1080607@redhat.com> <53BA6A4A.4040201@redhat.com> Message-ID: <032E3CC3-D121-4C6B-85CD-970568871156@redhat.com> On Jul 7, 2014, at 10:37, Bela Ban wrote: >> 10x faster? That's surprising for a benchmark which is supposed to be >> network bound isn't it? > > I ran 2 IspnPerfTest processes on my local box with numOwners=2 (no L1 > cache). 
I guess that roughly half of the calls go to the local node, and > there the cost of withFlags() is not amortized by the network round trip. All the reads are local as both nodes own all the data in this setup. Writes will involve RPCs, though. Even for local caches, the 10x performance degradation from using the flags is way too much, there must be something else at stake. If you profile it we can take a look. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From bban at redhat.com Mon Jul 7 07:41:50 2014 From: bban at redhat.com (Bela Ban) Date: Mon, 07 Jul 2014 13:41:50 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> <53BA70A1.5040306@redhat.com> Message-ID: <53BA877E.4060709@redhat.com> But where do I pass in my infinispan.xml config file ? I don't want a pure programtic configuration On 07/07/14 13:30, Mircea Markus wrote: > > On Jul 7, 2014, at 11:52, Sanne Grinovero wrote: > >> On 7 July 2014 11:04, Bela Ban wrote: >>> How ? I already have an infinispan.xml, create a CacheManager off of it >>> and now only want to change the transport. >> >> I have the same need; in Palma I asked for a CacheManager constructor >> which would take >> (String infinispanConfiguration, Transport customTransportInstance). > > GlobalConfigurationBuilder gc = new GlobalConfigurationBuilder(); > gc.transport().transport(new JGroupsTransport()); > DefaultCacheManager dcm = new DefaultCacheManager(gc.build()); > > > Cheers, > -- Bela Ban, JGroups lead (http://www.jgroups.org) From mmarkus at redhat.com Mon Jul 7 07:42:19 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 7 Jul 2014 12:42:19 +0100 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA56E0.1080607@redhat.com> References: <53BA56E0.1080607@redhat.com> Message-ID: On Jul 7, 2014, at 9:14, Bela Ban wrote: > 2. Question: > ----------- > In Infinispan 6, I defined my custom transport as follows: > > > This is gone in 7. Do I now have to use programmatic configuration ? If > so, how would I do this ? This has disappeared as part of the configuration revamp undertaken in 7.0, not sure if this was in purpose, Galder? Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From bban at redhat.com Mon Jul 7 07:47:33 2014 From: bban at redhat.com (Bela Ban) Date: Mon, 07 Jul 2014 13:47:33 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: <032E3CC3-D121-4C6B-85CD-970568871156@redhat.com> References: <53BA56E0.1080607@redhat.com> <53BA6A4A.4040201@redhat.com> <032E3CC3-D121-4C6B-85CD-970568871156@redhat.com> Message-ID: <53BA88D5.4060405@redhat.com> Makes sense; I'll try to get a jmc dump On 07/07/14 13:36, Mircea Markus wrote: > > On Jul 7, 2014, at 10:37, Bela Ban wrote: > >>> 10x faster? That's surprising for a benchmark which is supposed to be >>> network bound isn't it? >> >> I ran 2 IspnPerfTest processes on my local box with numOwners=2 (no L1 >> cache). I guess that roughly half of the calls go to the local node, and >> there the cost of withFlags() is not amortized by the network round trip. > > All the reads are local as both nodes own all the data in this setup. Writes will involve RPCs, though. > Even for local caches, the 10x performance degradation from using the flags is way too much, there must be something else at stake. If you profile it we can take a look. 
> > Cheers, > -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Mon Jul 7 08:31:07 2014 From: bban at redhat.com (Bela Ban) Date: Mon, 07 Jul 2014 14:31:07 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> <53BA70A1.5040306@redhat.com> Message-ID: <53BA930B.3010306@redhat.com> The use case is this: #1 I want to configure all caches from a config XML file #2 The I want to *override* one or two small aspects, such as the transport Ideally, I could define #2 in the config file as well, but that changed from 6 to 7. Note that I don't want to switch to programmatic configuration entirely just because of #2 On 07/07/14 13:30, Mircea Markus wrote: > > On Jul 7, 2014, at 11:52, Sanne Grinovero wrote: > >> On 7 July 2014 11:04, Bela Ban wrote: >>> How ? I already have an infinispan.xml, create a CacheManager off of it >>> and now only want to change the transport. >> >> I have the same need; in Palma I asked for a CacheManager constructor >> which would take >> (String infinispanConfiguration, Transport customTransportInstance). > > GlobalConfigurationBuilder gc = new GlobalConfigurationBuilder(); > gc.transport().transport(new JGroupsTransport()); > DefaultCacheManager dcm = new DefaultCacheManager(gc.build()); > > > Cheers, > -- Bela Ban, JGroups lead (http://www.jgroups.org) From mudokonman at gmail.com Mon Jul 7 16:58:40 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 7 Jul 2014 16:58:40 -0400 Subject: [infinispan-dev] Clustered Listener In-Reply-To: <53B6BD26.7030806@unine.ch> References: <537219A9.1060301@unine.ch> <53B6BD26.7030806@unine.ch> Message-ID: On Fri, Jul 4, 2014 at 10:41 AM, Pierre Sutra wrote: > Hello, > >> Are you talking about non clustered listeners? It seems unlikely you >> would need so many cluster listeners. Cluster listeners should allow >> you to only install a small amount of them, usually you would have >> only additional ones if you have a Filter applied limiting what >> key/values are returned. > Our usage of the clustered API is a corner case, but the installation of > a listener for a specific key (or key range) could be of general > purpose. My point was that installing all filters everywhere is costly > as every node should iterate over all filters for every modification. > Our tentative code for key-specific filtering is available at > github.com/otrack/Leads-infinispan > (org.infinispan.notifications.KeySpecificListener and > org.infinispan.notifications.cachelistener.CacheNotifierImpl). In this case it still has to iterate over the listener for modifications that live on the same node, but in this case the chance of having the listener present is smaller. It doesn't look like what you have currently is safe for rehashes though since the owners would change nodes. You would need to move the listener between nodes in this case. Also you removed an edge case when a listener might not be installed if a CH change occurs right when sending to nodes (talked about later). > >> Is the KeyFilter or KeyValueFilter not sufficient for this? void >> addListener(Object listener, KeyFilter filter); void >> addListener(Object listener, KeyValueFilter >> filter, Converter converter); Also to note if >> you are doing any kind of translation of the value to another value it >> is recommended to do that via the supplied Converter. 
This can give >> good performance as the conversion is done on the target node and not >> all in 1 node and also you can reduce the payload if the resultant >> value has a serialized form that is smaller than the original value. > Indeed, this mechanism suffices for many purposes, I was just pointing > out that it might be sometimes expensive. I think I better understand what part you are talking about here. Your issue was around the fact that every node has the listener installed and thus any modification must be checked against the filter. If you have a large amount of filters I agree this could be costly, however cluster listeners was not envisioned to have hundreds installed. I think maybe if I better understood your use case we could add some support for this which would work better. Unfortunately a Filter currently doesn't designate a key (which is the core of the issue from my understanding), however we could look into enhancing it to support something like you have. One thing that I haven't implemented yet that I was hoping to get to was doing a single notification on an event occuring instead of N of matches. Say in the case you have 10 listeners installed as cluster listeners and you have 1 modification, this could cause 10 remote calls to occur, 1 for each listener. I was thinking instead I could batch those events so only a single event is sent. I wonder if you are running into this as well? > >>> In such a case, the listener is solely >>> installed at the key owners. This greatly helps the scalability of the >>> mechanism at the cost of fault-tolerance since, in the current state of >>> the implementation, listeners are not forwarded to new data owners. >>> Since as a next step [1] it is planned to handle topology change, do you >>> plan also to support key (or key range) specific listener ? >> These should be covered with the 2 overloads as I mentioned above. >> This should be the most performant way as the filter is replicated to >> the node upon installation so a 1 time cost. But if a key/value pair >> doesn't pass the filter the event is not sent to the node where the >> listener is installed. > I agree. > >> >>> Besides, >>> regarding this last point and the current state of the implementation, I >>> would have like to know what is the purpose of the re-installation of >>> the cluster listener in case of a view change in the addedListener() >>> method of the CacheNotifierImpl class. >> This isn't a re-installation. This is used to propgate the >> RemoteClusterListener to the other nodes, so that when a new event is >> generated it can see that and subsequently send it back to the node >> where the listener is installed. There is also a second check in >> there in case if a new node joins in the middle. > The term re-installation was inappropriate. I was meaning here that in > my understanding of the code of CacheNotifierImpl, a second check seems > of no need since if a new node joins the cluster afterward it still has > to install the pair . The problem was that there is a overlap when you have a node joining while you are sending the initial requests that it wouldn't install the listener. Cluster -> Node A, B, C 1. User installs listener on Node C 2. Node C is sending listeners to Nodes A + B 3. Node D joins in the time in between and asks for the listener (from coordinator), but it isn't fully installed yet to be retrieved 4. 
Node C finishes installing listeners on Nodes A + B in this case Node D never would have gotten listener, so Node C also sees if anyone else has joined. The difference is that Node D only sends 1 message to coordinator to ask for listeners instead of sending N # of messages to all nodes (which would be required on every JOIN from any node). This should scale better in the long run especially since most cases this shouldn't happen. > > Best, > Pierre > > ps: sorry for the late answer. > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From pierre.sutra at unine.ch Tue Jul 8 05:47:24 2014 From: pierre.sutra at unine.ch (Pierre Sutra) Date: Tue, 8 Jul 2014 11:47:24 +0200 Subject: [infinispan-dev] Infinispan and Gora Message-ID: <53BBBE2C.3020404@unine.ch> Hello, As part of the LEADS project, we are planning to run Apache Nutch on top of Infinispan. To that end, we implemented an infinispan module for Apache Gora. The implementation is accessible via GitHub ( projects https://github.com/otrack/Leads-infinispan.git and https://github.com/vschiavoni/gora). At core, it relies on a preliminary support for Avro in infinispan (impacting the remote-query-client, remote-query-server and hotrod-client modules). This support uses the self-descriptive capabilities of Avro data to avoid declaring types in advance via Google Protocol Buffers ( protobuf). In the current state, our modifications are not fully compatible with the existing protobuf-based remote operations, but if they look of interest we can improve this. Cheers, Pierre From mmarkus at redhat.com Wed Jul 9 11:32:38 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 9 Jul 2014 16:32:38 +0100 Subject: [infinispan-dev] Example server config tests disabled? In-Reply-To: References: <53BA63EB.4050103@redhat.com> Message-ID: <77DA558C-A967-4C8C-B734-EA560452B36E@redhat.com> On Jul 7, 2014, at 10:37, Sanne Grinovero wrote: >> looks like all example config tests were marked as "Unstable" and hence >> disabled. I see a note "See ISPN-4026" in ExampleConfigsIT.java and it >> leads to https://issues.jboss.org/browse/ISPN-4026. This would be fine >> if not all tests for all example configs were disabled. >> As a result, tests for the example config for rolling upgrades did not >> run and this issue was ignored: >> https://issues.jboss.org/browse/ISPN-4026 This looks like a critical >> issue to me. >> >> PS: I will always wonder how we can be disabling tests in this way. > > +1 > I don't understand how we can keep pushing things which break stuff by > disabling tests rather than rolling back broken patches. > > There are situations in which a test is badly designed and it should > be disabled, but I've had lots of tests in Query disabled because of > stuff not working as it should in core.. stuff which used to work. > In such cases one shouldn't disable the test but roll-back the changes > which broke it, and think better about the "fixes" rather than move > responsibility to others to figure it out eventually. Tests should not be disabled in order to be fixed at a future point in time, but to be investigated in background and allow others to integrate PRs into upstream. Just disabling tests and not looking into them is pointless. We'll reenable all the tests before going beta. 
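(For anyone outside the project wondering what "marked as Unstable" means in practice: in the TestNG-based core suite a failing test is typically parked by tagging it into an "unstable" group, which the default run excludes while a dedicated CI job still executes it. A rough sketch, with a made-up class name and JIRA key; the server integration suite may use a different mechanism.)

import org.testng.annotations.Test;

// Temporarily parked test: the default suite is configured to exclude the
// "unstable" group, while a separate CI job runs only that group so the
// failure stays visible until the tracking JIRA is resolved.
@Test(groups = "unstable", description = "Disabled pending ISPN-XXXX")
public class SomeFlakyStateTransferTest {

    public void testStateTransferUnderLoad() {
        // ... original test body stays here, unchanged ...
    }
}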
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Wed Jul 9 11:50:57 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 9 Jul 2014 16:50:57 +0100 Subject: [infinispan-dev] Example server config tests disabled? In-Reply-To: <77DA558C-A967-4C8C-B734-EA560452B36E@redhat.com> References: <53BA63EB.4050103@redhat.com> <77DA558C-A967-4C8C-B734-EA560452B36E@redhat.com> Message-ID: On 9 July 2014 16:32, Mircea Markus wrote: > > On Jul 7, 2014, at 10:37, Sanne Grinovero wrote: > >>> looks like all example config tests were marked as "Unstable" and hence >>> disabled. I see a note "See ISPN-4026" in ExampleConfigsIT.java and it >>> leads to https://issues.jboss.org/browse/ISPN-4026. This would be fine >>> if not all tests for all example configs were disabled. >>> As a result, tests for the example config for rolling upgrades did not >>> run and this issue was ignored: >>> https://issues.jboss.org/browse/ISPN-4026 This looks like a critical >>> issue to me. >>> >>> PS: I will always wonder how we can be disabling tests in this way. >> >> +1 >> I don't understand how we can keep pushing things which break stuff by >> disabling tests rather than rolling back broken patches. >> >> There are situations in which a test is badly designed and it should >> be disabled, but I've had lots of tests in Query disabled because of >> stuff not working as it should in core.. stuff which used to work. >> In such cases one shouldn't disable the test but roll-back the changes >> which broke it, and think better about the "fixes" rather than move >> responsibility to others to figure it out eventually. > > Tests should not be disabled in order to be fixed at a future point in time, but to be investigated in background and allow others to integrate PRs into upstream. Just disabling tests and not looking into them is pointless. We'll reenable all the tests before going beta. Cool, but also it's important to make a distinction between A) tests which are not correct or not deterministic enough to be trusted (we need to get rid of these as it makes the whole process very painful) B) correct tests which suddenly start to fail In case of B you don't want to investigate after several weeks, it's much easier if you're not allowed to merge a PR until at least you know what relation there is between the changes and the failing test. If there is none, then you're in case A, right? But if there is a relation, then you *might* decide that you opt to disable the test and go ahead, but at least you make it a responsible decision and know exactly what is going on, and also makes it easier to estimate further blockers. Tests should never be disabled for the convenience of investigating "tomorrow", only when you can clearly say it's not a deterministic one. Sanne > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Wed Jul 9 11:51:10 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 9 Jul 2014 17:51:10 +0200 Subject: [infinispan-dev] Propagate the schema to the cachestore Message-ID: A remark by Divya made me think of something. 
With Infinispan moving in the direction of ProtoBuf and schemas, cache stores would greatly benefit from receiving that schema in one shape or another, so they can transform a blob into something more structured depending on the underlying capabilities of the datastore. Has anyone explored that angle? Emmanuel
From pedro at infinispan.org Thu Jul 10 06:23:27 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Thu, 10 Jul 2014 11:23:27 +0100 Subject: [infinispan-dev] LevelDB & expirationQueue Message-ID: <53BE699F.60801@infinispan.org> Hi, I found a couple of issues with the expirationQueue in leveldb. AFAIK, the goal of this queue is to avoid 2 writes to leveldb per infinispan write. Correct me if I'm wrong. Also, it is drained when the eviction thread is triggered (every minute by default). #1 The queue is only drained when the eviction thread is triggered. It is difficult to configure a queue-size + wake-up interval for all the possible workloads. A possible solution is to use an internal thread in LevelDBStore to drain this queue. #2 It is possible to write to leveldb asynchronously. So why can't we remove the queue? Do we have some performance numbers that show a degradation without the queue? Thoughts? Cheers, Pedro
From mmarkus at redhat.com Thu Jul 10 09:04:46 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 10 Jul 2014 14:04:46 +0100 Subject: [infinispan-dev] infinispan test suite, reloaded Message-ID: <8D4E3043-CED1-4978-85A3-73F9F1E4CB44@redhat.com> I just had a chat with Dan and we don't think the current process for the test suite works. Not hard to see why, the suite is almost never green. So we will adopt a more classic and simple approach: if a test fails a blocker JIRA is created for it and assigned to a component lead then team member who'll start working on it *immediately*. Dan will be watch dog starting today so please expect blocker JIRAs coming your way and treat them accordingly. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
From sanne at infinispan.org Thu Jul 10 10:03:30 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 10 Jul 2014 15:03:30 +0100 Subject: [infinispan-dev] infinispan test suite, reloaded In-Reply-To: <8D4E3043-CED1-4978-85A3-73F9F1E4CB44@redhat.com> References: <8D4E3043-CED1-4978-85A3-73F9F1E4CB44@redhat.com> Message-ID: The important point for me is that patches don't get merged if they introduce any regression. I hope that rule stays? BTW this matches with the "classic" approach as far as I know it. On 10 July 2014 14:04, Mircea Markus wrote: > I just had a chat with Dan and we don't think the current process for the test suite works. Not hard to see why, the suite is almost never green. So we will adopt a more classic and simple approach: if a test fails a blocker JIRA is created for it and assigned to a component lead then team member who'll start working on it *immediately*. Dan will be watch dog starting today so please expect blocker JIRAs coming your way and treat them accordingly. 
> > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Thu Jul 10 10:13:09 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 10 Jul 2014 15:13:09 +0100 Subject: [infinispan-dev] infinispan test suite, reloaded In-Reply-To: References: <8D4E3043-CED1-4978-85A3-73F9F1E4CB44@redhat.com> Message-ID: <4B6CD478-E8E1-4E7F-BE5B-F9AB04283D87@redhat.com> On Jul 10, 2014, at 15:03, Sanne Grinovero wrote: > The important point for me is that patches don't get merged if they > introduce any regression. I hope that rule stays? > BTW this matches with the "classic" approach as far as I know it. yes. A patch might pass the test when integrated and cause intermittent failures later on, so it's not straight forward to avoid patches introducing regressions, nor to identify which patch has caused it so that we can roll it back. > > On 10 July 2014 14:04, Mircea Markus wrote: >> I just had a chat with Dan and we don't think the current process for the test suite works. Not hard to see why, the suite is almost never green. So we will adopt a more classic and simple approach: if a test fails a blocker JIRA is created for it and assigned to a component lead then team member who'll start working on it *immediately*. Dan will be watch dog starting today so please expect blocker JIRAs coming your way and treat them accordingly. >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Thu Jul 10 11:12:52 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 10 Jul 2014 16:12:52 +0100 Subject: [infinispan-dev] infinispan test suite, reloaded In-Reply-To: <4B6CD478-E8E1-4E7F-BE5B-F9AB04283D87@redhat.com> References: <8D4E3043-CED1-4978-85A3-73F9F1E4CB44@redhat.com> <4B6CD478-E8E1-4E7F-BE5B-F9AB04283D87@redhat.com> Message-ID: On 10 July 2014 15:13, Mircea Markus wrote: > > On Jul 10, 2014, at 15:03, Sanne Grinovero wrote: > >> The important point for me is that patches don't get merged if they >> introduce any regression. I hope that rule stays? >> BTW this matches with the "classic" approach as far as I know it. > > yes. A patch might pass the test when integrated and cause intermittent failures later on, so it's not straight forward to avoid patches introducing regressions, nor to identify which patch has caused it so that we can roll it back. It's self-speaking that it's hard to immediately evaluate if a patch is going to cause intermittent / time-bound failures in the future: we have no crystal ball, so I'm not making absurd demands. There will always be cases in which developers will need to ask forgiveness. But if anything in the test run fails during a review of a patch, and this patch still gets merged because "the cause is likely unrelated", or worse "we'll fix that problem later" that's unacceptable and needs to be investigated further before being merged. 
I always did that, spending a lot of time on things which are not directly related to my goals, out of respect for the other developers in the team, and I expect no less from everyone else: when the testsuite doesn't pass I can't make progress on my own work and necessarily need to shift to firefighting. Or go on holidays. That's what necessarily needs to happen when a bad patch slips in past our guard, even if doing so means you need to reschedule other tasks: because otherwise it's other people, and more and more people, needing to reschedule their own tasks. Worse yet, the other people who will need to look at it will not have any of the context to understand what might be going on. So I hope we're on the same page with this, because ultimately it's about respect for each person contributing or working on it. I'm also puzzled about why exactly the current situation is not sustainable. Sure there is a lot of technical debt to pay, but you know debt comes with interest and it's hard. I've also suggested many ways to improve our testsuite to make our life easier, such as using Byteman, mocking the timers via a TimeService, getting rid of TestNG to move to something more reliable, removing unnecessary dependencies from each module, and using more mocking in areas where we don't actually have an interest in testing JGroups.. Nothing like that was done, so you can't say in all fairness that we tried to do better. My PR which is the first step to get rid of TestNG from the Query modules has been open for 43 days.. so don't expect me to fix any problem quickly; it's not particularly motivating. You can try playing with processes, and I don't disagree with the idea, but when I'm able to find a regression that is what I will do: I'll revert all commits until the first stable point I find. I've suggested this approach to you as well, not sure why you don't apply it more regularly? There is no shame in reverting commits, especially if we do it regularly. It's stable today, so I'll make a note of this commit id, my crystal ball is telling me that it will be useful soon ;-) Sanne > >> >> On 10 July 2014 14:04, Mircea Markus wrote: >>> I just had a chat with Dan and we don't think the current process for the test suite works. Not hard to see why, the suite is almost never green. So we will adopt a more classic and simple approach: if a test fails a blocker JIRA is created for it and assigned to a component lead then team member who'll start working on it *immediately*. Dan will be watch dog starting today so please expect blocker JIRAs coming your way and treat them accordingly. 
>>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mgencur at redhat.com Fri Jul 11 02:58:12 2014 From: mgencur at redhat.com (Martin Gencur) Date: Fri, 11 Jul 2014 08:58:12 +0200 Subject: [infinispan-dev] Issue with JGroups config files in ispn-core In-Reply-To: <539AB196.5070108@redhat.com> References: <53999D6A.9080300@redhat.com> <53999F02.3020008@redhat.com> <386547548.20027436.1402577043992.JavaMail.zimbra@redhat.com> <5399D97E.3000003@redhat.com> <539AB196.5070108@redhat.com> Message-ID: <53BF8B04.2050503@redhat.com> OK, I filed this one: https://issues.jboss.org/browse/ISPN-4499 Martin On 13.6.2014 10:08, Tristan Tarrant wrote: > Let me add: > > specifying a configuration file just by name is NOT enough. We also need > to allow users to specify an InputStream which IMHO is the "only correct > way" :) > > Tristan > > On 13/06/14 01:44, Sanne Grinovero wrote: >> On 12 June 2014 17:46, Dennis Reed wrote: >>> +1 to changing the name/directory. >>> -100 to changing the order of where it's looked for instead. >>> >>> All resource lookups should use the normal rules for finding resources. >>> Don't change standard behavior without a *very* good reason. >> There isn't a "standard behaviour", the problem is exactly that it's >> not defined. >> By actually specifying an order we can decide that a user >> configuration file will take priority over our own file, at least >> assuming a modular classloader is being used. >> So indeed it's not a solution for flat classloaders, but it's at least >> correct in that case. >> >> This is what we do in other frameworks: the order in which you look >> for resources needs to be well defined, but it's not in Infinispan. >> >>> Doing anything special (like META-INF/_internal/jgroups-udp.xml) is >>> completely non-intuitive >>> and will cause support issues down the road. >>> Using config/jgroups-udp.xml is standard, and would be immediately >>> understood by anyone. >> I would agree with you about this being a common expectation, but it's >> not rock-solid; we still can't state for sure that the user won't use >> "config" directory too.. >> >> The only foolproof solution is to store some signature of the >> jgroups-udp.xml hardcoded in a class, or simply hardcode the whole >> file as a constant. >> We could use the maven inject plugin to maintain the xml file as >> normal source code, and "seal" some constant at build time.. but I >> don't think it's worth it as by defining the lookup order the problem >> is solved in JBoss or WildFly as we would know that the >> infinispan-core.jar classloader is different than the user one, unless >> people bundle the Infinispan jars in their app. >> >> Considering JGroups already logs the full configuration file that it's >> applying, maybe it could log some hash signature as well? 
This way you >> could just compare the signature to make sure it's using some "well >> known" configuration file. >> >> Sanne >> >> >>> -Dennis >>> >>> On 06/12/2014 07:44 AM, Alan Field wrote: >>>> Tristan, >>>> >>>> So the server and library configuration parsers will handle something like this? >>>> >>>> >>>> >>>> >>>> >>>> If this is true, then I agree that this is a good solution as well. >>>> >>>> Thanks, >>>> Alan >>>> >>>> ----- Original Message ----- >>>>> From: "Tristan Tarrant" >>>>> To: "infinispan -Dev List" >>>>> Sent: Thursday, June 12, 2014 2:37:22 PM >>>>> Subject: Re: [infinispan-dev] Issue with JGroups config files in ispn-core >>>>> >>>>> I think the "internal" jgroups files should be "moved" to a separate >>>>> directory within the core jar, to be searched after the "root". So the >>>>> user can still provide a jgroups-udp.xml and it won't conflict. >>>>> >>>>> Tristan >>>>> >>>>> On 12/06/14 14:30, Martin Gencur wrote: >>>>>> Hi, >>>>>> let me mention an issue that several people faced in the past, >>>>>> independently of each other: >>>>>> >>>>>> A user app uses a custom JGroups configuration file. However, they >>>>>> choose the same name as the files which we bundle inside >>>>>> infinispan-core.jar. >>>>>> Result? People are wondering why their custom configuration does not >>>>>> take effect. >>>>>> Reason? Infinispan uses the default jgroups file bundled in infinispan-core >>>>>> Who faced the issue? (I suppose it's just a small subset:)) Me, Radim, >>>>>> Alan, Wolf Fink >>>>>> >>>>>> I believe a lot of users run into this issue. >>>>>> >>>>>> We were considering a possible solution and this one seems like it could >>>>>> work (use both 1) and 2)): >>>>>> 1) rename the config files in the distribution e.g. this way: >>>>>> jgroups-ec2.xml -> default-jgroups-ec2.xml >>>>>> jgroups-udp.xml -> default-jgroups-udp.xml >>>>>> jgroups-tcp.xml -> default-jgroups-tcp.xml >>>>>> >>>>>> Any other suggestions? internal-jgroups-udp.xml ? >>>>>> dontEverUseThisFileInYourAppAsTheCustomConfigurationFile-jgroups-udp.xml >>>>>> ? (joke) >>>>>> (simply something that users would automatically like to change once >>>>>> they use it in their app) >>>>>> >>>>>> 2) Throw a warning whenever a user wants to use a custom jgroups >>>>>> configuration file that has the same name as one of the above >>>>>> >>>>>> >>>>>> WDYT? >>>>>> >>>>>> Thanks! 
>>>>>> Martin >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Fri Jul 11 05:26:46 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 11 Jul 2014 12:26:46 +0300 Subject: [infinispan-dev] Issue with JGroups config files in ispn-core In-Reply-To: <2017472995.20619817.1402643387052.JavaMail.zimbra@redhat.com> References: <53999D6A.9080300@redhat.com> <53999F02.3020008@redhat.com> <386547548.20027436.1402577043992.JavaMail.zimbra@redhat.com> <5399D97E.3000003@redhat.com> <5399EE02.3030503@redhat.com> <539A8DCA.5020202@redhat.com> <2017472995.20619817.1402643387052.JavaMail.zimbra@redhat.com> Message-ID: +1 to having a readable (and publicly advertised!) name for our default JGroups configuration. I particularly like the default-jgroups-tcp.xml proposal. We always chastise users for going with their own configuration inherited from their Infinispan 4.0 days instead of starting with our default configuration and only modifying the settings that they really need. It would be much harder for the users to believe our advice if we call our configuration META-INF/we-dont-want-you-to-see-this/jgroups-tcp.xml I vote to just log a warning on startup if we find look up the configuration file name in the classpath and find more than one match. Bela: we also support absolute path names and URLs AFAIK but I don't really see a good use for those options. Cheers Dan On Fri, Jun 13, 2014 at 10:09 AM, Alan Field wrote: > > > ----- Original Message ----- > > From: "Martin Gencur" > > To: "infinispan -Dev List" > > Sent: Friday, June 13, 2014 7:36:10 AM > > Subject: Re: [infinispan-dev] Issue with JGroups config files in > ispn-core > > > > On 12.6.2014 20:14, Tristan Tarrant wrote: > > > On 12/06/14 18:46, Dennis Reed wrote: > > >> +1 to changing the name/directory. > > >> -100 to changing the order of where it's looked for instead. > > >> > > >> All resource lookups should use the normal rules for finding > resources. > > >> Don't change standard behavior without a *very* good reason. > > >> > > >> Doing anything special (like META-INF/_internal/jgroups-udp.xml) is > > >> completely non-intuitive > > >> and will cause support issues down the road. > > >> Using config/jgroups-udp.xml is standard, and would be immediately > > >> understood by anyone. > > >> > > > Users don't even need to know that META-INF/_internal/blah actually > > > exists. It is just an internal detail when using the "default" (i.e. 
> > > just enable clustering without explicitly specifying a configuration > file). > > > > My understanding was that users just take an example config. file (i.e. > > jgroups-udp.xml), copy it into their application and modify. That's how > > users get the same name for their configuration file as the default. So > > in this case, they might find it again, even in > > META-INF/_internal/jgroups.udp.xml :) > > I agree that defining the order that configuration files are loaded is > important and should be defined. I would prefer a more readable path like > "META-INF/example_configurations/jgroups/udp.xml". This also gives us a > good location to provide example cache configuration files as well. I also > think that a message should be logged with the path to the configuration > file being used: > > 2014-06-12 06:45:02,871 [thread-name] INFO [package-name] Using JGroups > configuration file 'jar:META-INF/example_configurations/jgroups/udp.xml' > 2014-06-12 06:45:02,871 [thread-name] INFO [package-name] Using JGroups > configuration file 'file:/app_home/config/jgroups-udp.xml' > > This would make it more obvious to the user which configuration file is in > use. > > Thanks, > Alan > > > > > > Martin > > > > > > > > > > Tristan > > > _______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140711/858cd414/attachment-0001.html From mmarkus at redhat.com Fri Jul 11 09:05:38 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 11 Jul 2014 14:05:38 +0100 Subject: [infinispan-dev] Issue with JGroups config files in ispn-core In-Reply-To: References: <53999D6A.9080300@redhat.com> <53999F02.3020008@redhat.com> <386547548.20027436.1402577043992.JavaMail.zimbra@redhat.com> <5399D97E.3000003@redhat.com> <5399EE02.3030503@redhat.com> <539A8DCA.5020202@redhat.com> <2017472995.20619817.1402643387052.JavaMail.zimbra@redhat.com> Message-ID: <8E8A25BF-ECAB-4CC2-B68F-EC9A1C4264C1@redhat.com> On Jul 11, 2014, at 10:26, Dan Berindei wrote: > +1 to having a readable (and publicly advertised!) name for our default JGroups configuration. I particularly like the default-jgroups-tcp.xml proposal. > > We always chastise users for going with their own configuration inherited from their Infinispan 4.0 days instead of starting with our default configuration and only modifying the settings that they really need. It would be much harder for the users to believe our advice if we call our configuration META-INF/we-dont-want-you-to-see-this/jgroups-tcp.xml > > I vote to just log a warning on startup if we find look up the configuration file name in the classpath and find more than one match. +1 to both suggestions. 
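A minimal sketch of the startup warning being agreed on here, using only plain classloader APIs (the class name and the log line are illustrative, and real code would go through the project's logger rather than System.err):

    import java.net.URL;
    import java.util.Collections;
    import java.util.Enumeration;
    import java.util.List;

    public final class DuplicateConfigCheck {
        // Warns when more than one resource with the given name is visible on the classpath,
        // e.g. the bundled default-jgroups-udp.xml plus a user copy carrying the same name.
        public static void warnIfDuplicated(String resourceName) throws Exception {
            Enumeration<URL> urls =
                Thread.currentThread().getContextClassLoader().getResources(resourceName);
            List<URL> matches = Collections.list(urls);
            if (matches.size() > 1) {
                System.err.println("WARN: found " + matches.size() + " copies of "
                    + resourceName + " on the classpath: " + matches);
            }
        }
    }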
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Fri Jul 11 09:14:56 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 11 Jul 2014 14:14:56 +0100 Subject: [infinispan-dev] Infinispan roadmap Message-ID: For increased visibility I've added a roadmap to the Infinispan site directly: http://infinispan.org/roadmap/ Also contains my view of the Infinispan 8.0 release, based on the feedback I got so far. Any feedback welcomed. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From bban at redhat.com Fri Jul 11 10:51:33 2014 From: bban at redhat.com (Bela Ban) Date: Fri, 11 Jul 2014 16:51:33 +0200 Subject: [infinispan-dev] Multicast routing on Max OS X Message-ID: <53BFF9F5.80402@redhat.com> I added some bits of advice for configuration of IP multicast routes on Mac OS X. This is probably only of concern to those who want to bind to the loopback device (127.0.0.1) and multicast locally, e.g. for running the test suite. It is beyond me why a node cannot bind to 127.0.0.1 and use the default route (0.0.0.0) for multicasting, e.g. if no multicast route has been defined). This works perfectly on other operating systems. If you know, please share the solution; then [1] would not be needed... See [1] for details. [1] https://issues.jboss.org/browse/JGRP-1808 -- Bela Ban, JGroups lead (http://www.jgroups.org) From sanne at infinispan.org Fri Jul 11 11:16:28 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 11 Jul 2014 16:16:28 +0100 Subject: [infinispan-dev] Multicast routing on Max OS X In-Reply-To: <53BFF9F5.80402@redhat.com> References: <53BFF9F5.80402@redhat.com> Message-ID: Thanks Bela! it's indeed very annoying for occasional contributors just checking out the code and firing the testuite. In Hibernate Search we worked around it by simply avoiding need for Multicast in the stacks used for testing, but I suspect Infinispan doesn't have this luxury so it would be nice to find a way to detect and warn for this? I guess we could just abort the build with a meaningful message if you're using OSX, nobody should use it anyway :-P Sanne On 11 July 2014 15:51, Bela Ban wrote: > I added some bits of advice for configuration of IP multicast routes on > Mac OS X. > > This is probably only of concern to those who want to bind to the > loopback device (127.0.0.1) and multicast locally, e.g. for running the > test suite. > > It is beyond me why a node cannot bind to 127.0.0.1 and use the default > route (0.0.0.0) for multicasting, e.g. if no multicast route has been > defined). This works perfectly on other operating systems. If you know, > please share the solution; then [1] would not be needed... > > See [1] for details. > > [1] https://issues.jboss.org/browse/JGRP-1808 > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Fri Jul 11 11:54:54 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 11 Jul 2014 16:54:54 +0100 Subject: [infinispan-dev] Where's the roadmap? 
In-Reply-To: References: <7335F27A-7B85-4341-A8A0-35670F8E827C@redhat.com> Message-ID: On May 12, 2014, at 19:04, Sanne Grinovero wrote: > Hi, > I think you mentioned having created the roadmap page but I can't find > it, and people keep asking about it so I'm probably not the only one > not finding it: > > https://community.jboss.org/message/870798 > > Could we make it more visible on the website? it is now: http://infinispan.org/roadmap/ Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From bban at redhat.com Sat Jul 12 05:25:58 2014 From: bban at redhat.com (Bela Ban) Date: Sat, 12 Jul 2014 11:25:58 +0200 Subject: [infinispan-dev] Multicast routing on Max OS X In-Reply-To: References: <53BFF9F5.80402@redhat.com> Message-ID: <53C0FF26.5060202@redhat.com> I think the alternatives for running the Infinispan test suite wrt multicasting are: - Use of 127.0.0.1 and ip_ttl=1. This ensures that multicast traffic is local to the machine on which the test suite is run. This is a good idea anyway, also for other operating systems. I believe this is already done - Use of a real IP address and ip_ttl. This would allow the test suite to work without any routing table changes, as the default route (0.0.0.0) is used, and it usually points to en0 - Check if there's a multicast route defined before running the test suite (via a script?) and issue a warning if there isn't... ? On 11/07/14 17:16, Sanne Grinovero wrote: > Thanks Bela! > it's indeed very annoying for occasional contributors just checking > out the code and firing the testuite. > In Hibernate Search we worked around it by simply avoiding need for > Multicast in the stacks used for testing, but I suspect Infinispan > doesn't have this luxury so it would be nice to find a way to detect > and warn for this? > I guess we could just abort the build with a meaningful message if > you're using OSX, nobody should use it anyway :-P > > Sanne > > On 11 July 2014 15:51, Bela Ban wrote: >> I added some bits of advice for configuration of IP multicast routes on >> Mac OS X. >> >> This is probably only of concern to those who want to bind to the >> loopback device (127.0.0.1) and multicast locally, e.g. for running the >> test suite. >> >> It is beyond me why a node cannot bind to 127.0.0.1 and use the default >> route (0.0.0.0) for multicasting, e.g. if no multicast route has been >> defined). This works perfectly on other operating systems. If you know, >> please share the solution; then [1] would not be needed... >> >> See [1] for details. >> >> [1] https://issues.jboss.org/browse/JGRP-1808 >> >> -- >> Bela Ban, JGroups lead (http://www.jgroups.org) -- Bela Ban, JGroups lead (http://www.jgroups.org) From pierre.sutra at unine.ch Mon Jul 14 06:16:08 2014 From: pierre.sutra at unine.ch (Pierre Sutra) Date: Mon, 14 Jul 2014 12:16:08 +0200 Subject: [infinispan-dev] Clustered Listener In-Reply-To: References: <537219A9.1060301@unine.ch> <53B6BD26.7030806@unine.ch> Message-ID: <53C3ADE8.1060708@unine.ch> Hello, > It doesn't look like what you have currently is safe for rehashes > though since the owners would change nodes. You would need to move > the listener between nodes in this case. Also you removed an edge > case when a listener might not be installed if a CH change occurs > right when sending to nodes (talked about later). Indeed, this modification is not safe in presence of cluster changes. In fact, I missed that the current implementation was ensuring elasticity. 
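As an aside, the key-specific filtering discussed in this thread is very small on the filter side. A sketch, assuming the listener KeyFilter contract is a single boolean accept(key) callback; the class name is invented:

    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import org.infinispan.filter.KeyFilter;

    // Keeps the keys a node cares about in a concurrent set, so each modification costs
    // one O(1) lookup instead of a scan over every registered listener/filter.
    public class RegisteredKeysFilter implements KeyFilter<Object> {
        private final Set<Object> keys =
            Collections.newSetFromMap(new ConcurrentHashMap<Object, Boolean>());

        public void register(Object key)   { keys.add(key); }
        public void unregister(Object key) { keys.remove(key); }

        @Override
        public boolean accept(Object key) {
            return keys.contains(key);
        }
    }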
> The problem was that there is a overlap when you have a node joining > while you are sending the initial requests that it wouldn't install > the listener. > > Cluster -> Node A, B, C > > 1. User installs listener on Node C > 2. Node C is sending listeners to Nodes A + B > 3. Node D joins in the time in between and asks for the listener (from > coordinator), but it isn't fully installed yet to be retrieved > 4. Node C finishes installing listeners on Nodes A + B > > in this case Node D never would have gotten listener, so Node C also > sees if anyone else has joined. I understood this was necessary for cache.addListener() atomicity, but I though erroneously that elasticity was not implemented (I also needed a quick fix). In my view, and please correct me if I am wrong, the architecture you describe has still an issue because the coordinator can fail. It is in fact necessary to re-execute the installation code until a stable view is obtained (twice the same). Going back to your example, consider a step 5 where the coordinator, say A, fails at the time C is installing the listener on D, and some node E is joining. In case D is the newly elected coordinator, E will never retrieve the listener. What do you think of this scenario ? > > The difference is that Node D only sends 1 message to coordinator to > ask for listeners instead of sending N # of messages to all nodes > (which would be required on every JOIN from any node). This should > scale better in the long run especially since most cases this > shouldn't happen. In fact, D needs retrieving filters from the nodes that where holding the keys it is in charge now to replicate. Indeed this requires at first glance to send a JOIN message to all nodes. If you believe that this key-based (or key range based) listener functionality is a direction of interest, I can try amending my code to ensure elasticity. Pierre From sanne at infinispan.org Mon Jul 14 17:51:29 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 14 Jul 2014 22:51:29 +0100 Subject: [infinispan-dev] Writing a custom CacheStore: MarshalledEntryFactory Message-ID: Hi all, I was toying with a custom CacheStore experiment, and am having some friction with some of the new SPIs. So interface org.infinispan.marshall.core.MarshalledEntryFactory is an helper to use in the CacheStorei implementation, which exposes three methods: MarshalledEntry newMarshalledEntry(ByteBuffer key, ByteBuffer valueBytes, ByteBuffer metadataBytes); MarshalledEntry newMarshalledEntry(Object key, ByteBuffer valueBytes, ByteBuffer metadataBytes); MarshalledEntry newMarshalledEntry(Object key, Object value, InternalMetadata im); In my CacheStore - and I suspect most efficiency minded implementations - I don't care about the value Object but I express a specific physical layout for the metadata, so to run for example an efficient "purge expired" task. So, the key is given, the value Object needs to be serialized, but the InternalMetadata I can map to specific fields. Problem is at read time: I don't have a marshalled version of the Metadata but I need to unmarshall the value.. there is no helper to cover for this case. Wouldn't this interface be more practical if it had: Object unMarshallKey(ByteBuffer); Object unMarshallValue(ByteBuffer); InternalMetadata unMarshallMetadata(ByteBuffer); MarshalledEntry newMarshalledEntry(Object key, Object value, InternalMetadata im); Also, I'd get rid of generics. 
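For illustration, this is roughly the read path those proposed helpers would enable. Note the unMarshall* methods are only the proposal above and do not exist in the current SPI, and StoreRecord, lookup() and rebuildMetadata() are invented placeholders for a store's native layout:

    // Sketch only: relies on the proposed unMarshallValue() helper, which is not in the SPI today.
    public MarshalledEntry load(Object key) {
        StoreRecord rec = lookup(key);                  // placeholder: read the store's native layout
        if (rec == null) {
            return null;
        }
        Object value = factory.unMarshallValue(rec.valueBytes());  // proposed helper
        InternalMetadata meta = rebuildMetadata(rec);               // placeholder: rebuilt from native fields
        return factory.newMarshalledEntry(key, value, meta);        // this overload exists already
    }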
They are not helping at all, I can hardly couple my custom CacheStore implementation to the end user's domain model, right? I was also quite surprised that other existing CacheStore implementations don't have this limitation; peeking in the JDBCCacheStore to see how this is supposed to work, it seems that essentially it duplicates the data by serializazing the InternalMetadata in the BLOB but also stored an Expiry column to query via SQL. I was interested to see how the Purge method could be implemented efficiently, and found a "TODO notify listeners" ;-) All other JDBC based stores serialize buckets in groups, REST store doesn't do purging, LevelDB also does duplication for the metadata, Cassandra is outdated and doesn't do events on expiry. From mmarkus at redhat.com Mon Jul 14 09:43:07 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 14 Jul 2014 14:43:07 +0100 Subject: [infinispan-dev] infinispan test suite, reloaded In-Reply-To: References: <8D4E3043-CED1-4978-85A3-73F9F1E4CB44@redhat.com> <4B6CD478-E8E1-4E7F-BE5B-F9AB04283D87@redhat.com> Message-ID: <389C0123-7B29-453C-8B37-9270CE628E1C@redhat.com> On Jul 10, 2014, at 16:12, Sanne Grinovero wrote: > On 10 July 2014 15:13, Mircea Markus wrote: >> >> On Jul 10, 2014, at 15:03, Sanne Grinovero wrote: >> >>> The important point for me is that patches don't get merged if they >>> introduce any regression. I hope that rule stays? >>> BTW this matches with the "classic" approach as far as I know it. >> >> yes. A patch might pass the test when integrated and cause intermittent failures later on, so it's not straight forward to avoid patches introducing regressions, nor to identify which patch has caused it so that we can roll it back. > > It's self-speaking that it's hard to immediately evaluate if a patch > is going to cause intermittent / time-bound failures in the future: we > have no crystal ball, so I'm not making absurd demands. There will > always be cases in which developers will need to ask forgiveness. > > But if anything in the test run fails during a review of a patch, and > this patch still gets merged because "the cause is likely unrelated", > or worse "we'll fix that problem later" that's unacceptable and needs > to be investigated further before being merged. > I always did that, and spending a lot of time on things which are not > directly related on my goals, for sake of respect of other developers > in the team and I expect no less from everyone else: when the > testsuite doesn't pass I can't make progress on my own work and need > to necessarily shift to firefighthing. Or go on holidays. > That's what necessarily needs to happen when a bad patch slips in past > our guard, even if doing so you need to reschedule other tasks: > because otherwise it's other people, and more and more people needing > to reschedule their own tasks. Worse yet, the other people who will > need to look at it will not have any of the context to understand what > might be going on. So I hope we're on the same page with this, because > ultimately it's about respect for each person contributing or working > on it. I understand your frustration and you are absolutely right. > > I'm also puzzled on why exactly the current situation is not > sustainable. 
Sure there is a lot of technical debt to pay, but you > know debts come with interests and it's hard.I've also suggested many > ways to improve our testsuite to make our life easier, like use > Byteman, mock the timers via a TimeService, get rid of TestNG to move > to something more reliable, remove unnecessary dependencies from each > module, use more mocking in areas where we don't actually have an > interest in testing JGroups.. IMO the problem is that ATM we lack the discipline, as a team, to maintain the suite green. In the last 7 years I've been working on JBossCache and then ISPN it has always been like this - intermittent failures - and always to big and too ugly of a task to fix it and maintain it green, especially with the schedule we had ahead of us. Now that the team is big, this compromise - intermittent failures - really impacts the productivity of a lot of people. The suite should stay green and we should treat any failure that keep us away from that as an blocker - a thing we didn't really do up to now. > Nothing like that was done, so you can't say in all fairness that we > tried to do better. My PR which is the first step to get rid of TestNG > from the Query modules is open since 43 days.. so don't expect me to > fix any problem quickly, it's not particularly motivating. > > You can try playing with processes, and I don't disagree with idea, > but when I'm able to find a regression is is that I will do: I'll > revert all commits until the first stable point I find. I've > suggested this approach to you as well, not sure why you don't apply > it more regularly? Because the last green commit will trigger intermittent failures, try it. > There is no shame in reverting commits, especially > if we do it regularly. > It's stable today, so I'll make a note of this commit id, my crystal > ball is telling that it will be useful soon ;-) > > Sanne > > >> >>> >>> On 10 July 2014 14:04, Mircea Markus wrote: >>>> I just had a chat with Dan and we don't think the current process for the test suite works. Not hard to see why, the suite is almost never green. So we will adopt a more classic and simple approach: if a test fails a blocker JIRA is created for it and assigned to a component lead then team member who'll start working on it *immediately*. Dan will be watch dog starting today so please expect blocker JIRAs coming your way and treat them accordingly. 
>>>> >>>> Cheers, >>>> -- >>>> Mircea Markus >>>> Infinispan lead (www.infinispan.org) >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From galder at redhat.com Mon Jul 14 05:06:47 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 14 Jul 2014 11:06:47 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> <53BA70A1.5040306@redhat.com> Message-ID: <967191B4-5D72-4261-8D01-6ED931461D91@redhat.com> On 07 Jul 2014, at 12:52, Sanne Grinovero wrote: > On 7 July 2014 11:04, Bela Ban wrote: >> How ? I already have an infinispan.xml, create a CacheManager off of it >> and now only want to change the transport. > > I have the same need; in Palma I asked for a CacheManager constructor > which would take > (String infinispanConfiguration, Transport customTransportInstance). > > Could we have that please please? > I never opened a new specific JIRA as there is the more generally > useful ISPN-1414 already. No idea where we?re at with ISPN-1414, AFAIK it has not been looked at for quite some time. I don?t remember the discussion in Palma on this (anyone has minutes of this discussion?), but I suppose the current set up is not enough for your case... Was your request related to app server integration? Cheers, > >> >> I need to get another PhD to understand programmatic configuration in >> Infinispan >> >> On 07/07/14 11:51, Pedro Ruivo wrote: >>> >>> >>> On 07/07/2014 09:14 AM, Bela Ban wrote: >>>> This is gone in 7. Do I now have to use programmatic configuration ? If >>>> so, how would I do this ? >>> >>> AFAIK, yes it was removed from configuration file and can only be set by >>> programmatic configuration. 
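For the record, the programmatic route being referred to looks roughly like this, assuming the transport(...) setter on the global builder's transport() section; MyCustomTransport is a placeholder for whatever Transport implementation is being plugged in:

    import org.infinispan.configuration.global.GlobalConfiguration;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;
    import org.infinispan.manager.EmbeddedCacheManager;

    // Sketch: plug a custom Transport instance in through the global builder instead of XML.
    GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
        .transport()
            .transport(new MyCustomTransport())   // placeholder Transport implementation
            .clusterName("demo-cluster")
        .build();
    EmbeddedCacheManager cacheManager = new DefaultCacheManager(globalConfig);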
>>> >>> Pedro >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> >> -- >> Bela Ban, JGroups lead (http://www.jgroups.org) >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz From galder at redhat.com Mon Jul 14 05:00:05 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 14 Jul 2014 11:00:05 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: <53BA56E0.1080607@redhat.com> References: <53BA56E0.1080607@redhat.com> Message-ID: On 07 Jul 2014, at 10:14, Bela Ban wrote: > > 1: Observation: > ------------- > In my Infinispan perf test (IspnPerfTest), I used > cache.getAdvancedCache().withFlags(...).put(key,value) in a tight loop. > > I've always thought that withFlags() was a fast operation, *but this is > not the case* !! > > Once I changed this and predefined the 2 caches (sync and async) at the > start, outside the loop, things got 10x faster ! So please change this > if you made the same mistake ! > > 2. Question: > ----------- > In Infinispan 6, I defined my custom transport as follows: > > > This is gone in 7. Do I now have to use programmatic configuration ? If > so, how would I do this ? ^ An oversight, it?ll be fixed before Final: https://issues.jboss.org/browse/ISPN-4510 > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz From mudokonman at gmail.com Tue Jul 15 08:01:35 2014 From: mudokonman at gmail.com (William Burns) Date: Tue, 15 Jul 2014 08:01:35 -0400 Subject: [infinispan-dev] Clustered Listener In-Reply-To: <53C3ADE8.1060708@unine.ch> References: <537219A9.1060301@unine.ch> <53B6BD26.7030806@unine.ch> <53C3ADE8.1060708@unine.ch> Message-ID: On Mon, Jul 14, 2014 at 6:16 AM, Pierre Sutra wrote: > Hello, >> It doesn't look like what you have currently is safe for rehashes >> though since the owners would change nodes. You would need to move >> the listener between nodes in this case. Also you removed an edge >> case when a listener might not be installed if a CH change occurs >> right when sending to nodes (talked about later). > Indeed, this modification is not safe in presence of cluster changes. In > fact, I missed that the current implementation was ensuring elasticity. > >> The problem was that there is a overlap when you have a node joining >> while you are sending the initial requests that it wouldn't install >> the listener. >> >> Cluster -> Node A, B, C >> >> 1. User installs listener on Node C >> 2. Node C is sending listeners to Nodes A + B >> 3. Node D joins in the time in between and asks for the listener (from >> coordinator), but it isn't fully installed yet to be retrieved >> 4. Node C finishes installing listeners on Nodes A + B >> >> in this case Node D never would have gotten listener, so Node C also >> sees if anyone else has joined. 
> I understood this was necessary for cache.addListener() atomicity, but I > though erroneously that elasticity was not implemented (I also needed a > quick fix). In my view, and please correct me if I am wrong, the > architecture you describe has still an issue because the coordinator can > fail. It is in fact necessary to re-execute the installation code until > a stable view is obtained (twice the same). Going back to your example, > consider a step 5 where the coordinator, say A, fails at the time C is > installing the listener on D, and some node E is joining. In case D is > the newly elected coordinator, E will never retrieve the listener. What > do you think of this scenario ? Sorry I didn't say the entire implementation, I was just talking about the successful path. It asks each member in it's view one at a time until it passes, it just happens that it always asks the coordinator first. > >> >> The difference is that Node D only sends 1 message to coordinator to >> ask for listeners instead of sending N # of messages to all nodes >> (which would be required on every JOIN from any node). This should >> scale better in the long run especially since most cases this >> shouldn't happen. > In fact, D needs retrieving filters from the nodes that where holding > the keys it is in charge now to replicate. Indeed this requires at first > glance to send a JOIN message to all nodes. If you believe that this > key-based (or key range based) listener functionality is a direction of > interest, I can try amending my code to ensure elasticity. I think it would be probably useful to have some sort of specific key implementation as it would be better performant as you have found out. The changes when a rebalance occur could be a bit difficult to implement though to get all the edge cases properly, but if you want to take a try at it, I think it would be pretty cool. Another way (if the elasticity of the other doesn't work) was that we could have a shared listener on every node that when you register a specific key listener it would register that this node wants to know of notifications for this key with all nodes. This way the filter would only do a constant time lookup in a hash map instead of having to iterate over all listeners on a modification. Then the rebalance is simple as well in that only joiners have to ask for the shared listener info. The only problematic part is making sure when a node unregisters itself, but shouldn't be bad. > > Pierre > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Tue Jul 15 09:15:15 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 15 Jul 2014 16:15:15 +0300 Subject: [infinispan-dev] Writing a custom CacheStore: MarshalledEntryFactory In-Reply-To: References: Message-ID: On Tue, Jul 15, 2014 at 12:51 AM, Sanne Grinovero wrote: > Hi all, > I was toying with a custom CacheStore experiment, and am having some > friction with some of the new SPIs. 
> > So interface org.infinispan.marshall.core.MarshalledEntryFactory > is an helper to use in the CacheStorei implementation, which exposes > three methods: > > MarshalledEntry newMarshalledEntry(ByteBuffer key, ByteBuffer > valueBytes, ByteBuffer metadataBytes); > MarshalledEntry newMarshalledEntry(Object key, ByteBuffer > valueBytes, ByteBuffer metadataBytes); > MarshalledEntry newMarshalledEntry(Object key, Object value, > InternalMetadata im); > > In my CacheStore - and I suspect most efficiency minded > implementations - I don't care about the value Object but I express a > specific physical layout for the metadata, so to run for example an > efficient "purge expired" task. > So, the key is given, the value Object needs to be serialized, but the > InternalMetadata I can map to specific fields. > > Problem is at read time: I don't have a marshalled version of the > Metadata but I need to unmarshall the value.. there is no helper to > cover for this case. > > Wouldn't this interface be more practical if it had: > > Object unMarshallKey(ByteBuffer); > Object unMarshallValue(ByteBuffer); > InternalMetadata unMarshallMetadata(ByteBuffer); > MarshalledEntry newMarshalledEntry(Object key, Object value, > InternalMetadata im); > I guess the idea was that MarshalledEntry unmarshalls the key, value, and metadata lazily. Even a purge listener may only be interested in the key, and in that case case we can avoid unmarshalling the value. I think you can do what you want by stuffing the bytes for everything in the MarshalledEntry, and unmarshalling the data via MarshalledEntry.getMetadata(). > > Also, I'd get rid of generics. They are not helping at all, I can > hardly couple my custom CacheStore implementation to the end user's > domain model, right? > I can see the user writing a custom JDBC store that stores the object's properties into separate columns and thus supporting a single type. But it's a pretty specialized case, and the user can very well do the casts himself. Maybe Paul and Will have more stuff to add here, they been discussing about generics in the cache store SPIs around https://github.com/infinispan/infinispan/pull/2705 > I was also quite surprised that other existing CacheStore > implementations don't have this limitation; peeking in the > JDBCCacheStore to see how this is supposed to work, it seems that > essentially it duplicates the data by serializazing the > InternalMetadata in the BLOB but also stored an Expiry column to query > via SQL. I was interested to see how the Purge method could be > implemented efficiently, and found a "TODO notify listeners" ;-) > I believe the reason why we don't support purge listeners in the JDBC store is that we don't want to fetch the entries from the database at all. We can't ask CacheNotifier whether there are any listeners registered ATM, we need that to avoid the overhead when there are no listeners. > All other JDBC based stores serialize buckets in groups, REST store > doesn't do purging, LevelDB also does duplication for the metadata, > Cassandra is outdated and doesn't do events on expiry. > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140715/8b6527d8/attachment.html From sanne at infinispan.org Tue Jul 15 10:49:23 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 15 Jul 2014 15:49:23 +0100 Subject: [infinispan-dev] Writing a custom CacheStore: MarshalledEntryFactory In-Reply-To: References: Message-ID: On 15 July 2014 14:15, Dan Berindei wrote: > > > > On Tue, Jul 15, 2014 at 12:51 AM, Sanne Grinovero > wrote: >> >> Hi all, >> I was toying with a custom CacheStore experiment, and am having some >> friction with some of the new SPIs. >> >> So interface org.infinispan.marshall.core.MarshalledEntryFactory >> is an helper to use in the CacheStorei implementation, which exposes >> three methods: >> >> MarshalledEntry newMarshalledEntry(ByteBuffer key, ByteBuffer >> valueBytes, ByteBuffer metadataBytes); >> MarshalledEntry newMarshalledEntry(Object key, ByteBuffer >> valueBytes, ByteBuffer metadataBytes); >> MarshalledEntry newMarshalledEntry(Object key, Object value, >> InternalMetadata im); >> >> In my CacheStore - and I suspect most efficiency minded >> implementations - I don't care about the value Object but I express a >> specific physical layout for the metadata, so to run for example an >> efficient "purge expired" task. >> So, the key is given, the value Object needs to be serialized, but the >> InternalMetadata I can map to specific fields. >> >> Problem is at read time: I don't have a marshalled version of the >> Metadata but I need to unmarshall the value.. there is no helper to >> cover for this case. >> >> Wouldn't this interface be more practical if it had: >> >> Object unMarshallKey(ByteBuffer); >> Object unMarshallValue(ByteBuffer); >> InternalMetadata unMarshallMetadata(ByteBuffer); >> MarshalledEntry newMarshalledEntry(Object key, Object value, >> InternalMetadata im); > > > I guess the idea was that MarshalledEntry unmarshalls the key, value, and > metadata lazily. Even a purge listener may only be interested in the key, > and in that case case we can avoid unmarshalling the value. > > I think you can do what you want by stuffing the bytes for everything in the > MarshalledEntry, and unmarshalling the data via > MarshalledEntry.getMetadata(). That's what I did but it's far from optimal, so I'm proposing the improvement. >> Also, I'd get rid of generics. They are not helping at all, I can >> hardly couple my custom CacheStore implementation to the end user's >> domain model, right? > > > I can see the user writing a custom JDBC store that stores the object's > properties into separate columns and thus supporting a single type. But it's > a pretty specialized case, and the user can very well do the casts himself. Exactly > > Maybe Paul and Will have more stuff to add here, they been discussing about > generics in the cache store SPIs around > https://github.com/infinispan/infinispan/pull/2705 > >> >> I was also quite surprised that other existing CacheStore >> implementations don't have this limitation; peeking in the >> JDBCCacheStore to see how this is supposed to work, it seems that >> essentially it duplicates the data by serializazing the >> InternalMetadata in the BLOB but also stored an Expiry column to query >> via SQL. I was interested to see how the Purge method could be >> implemented efficiently, and found a "TODO notify listeners" ;-) > > > I believe the reason why we don't support purge listeners in the JDBC store > is that we don't want to fetch the entries from the database at all. 
We > can't ask CacheNotifier whether there are any listeners registered ATM, we > need that to avoid the overhead when there are no listeners. I understand the reason, still it's wrong isn't it ;-) Having to load each entry for the "maybe there's a listener" case is definitely silly, we should inform the CacheStore instance if notifications are needed or not, and if they need just the key or the whole entry, no need for metadata. And for CacheStore instances which don't respect this we should at least document the limitation, or open a JIRA to get it done. Would you agree if I opened issues for each weirdness I'm finding in the existing CacheStore implementations? I've hit some more in the meantime. Sanne > >> >> All other JDBC based stores serialize buckets in groups, REST store >> doesn't do purging, LevelDB also does duplication for the metadata, >> Cassandra is outdated and doesn't do events on expiry. >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Tue Jul 15 16:35:55 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 15 Jul 2014 23:35:55 +0300 Subject: [infinispan-dev] Writing a custom CacheStore: MarshalledEntryFactory In-Reply-To: References: Message-ID: On Tue, Jul 15, 2014 at 5:49 PM, Sanne Grinovero wrote: > On 15 July 2014 14:15, Dan Berindei wrote: > > > > > > > > On Tue, Jul 15, 2014 at 12:51 AM, Sanne Grinovero > > wrote: > >> > >> Hi all, > >> I was toying with a custom CacheStore experiment, and am having some > >> friction with some of the new SPIs. > >> > >> So interface org.infinispan.marshall.core.MarshalledEntryFactory > >> is an helper to use in the CacheStorei implementation, which exposes > >> three methods: > >> > >> MarshalledEntry newMarshalledEntry(ByteBuffer key, ByteBuffer > >> valueBytes, ByteBuffer metadataBytes); > >> MarshalledEntry newMarshalledEntry(Object key, ByteBuffer > >> valueBytes, ByteBuffer metadataBytes); > >> MarshalledEntry newMarshalledEntry(Object key, Object value, > >> InternalMetadata im); > >> > >> In my CacheStore - and I suspect most efficiency minded > >> implementations - I don't care about the value Object but I express a > >> specific physical layout for the metadata, so to run for example an > >> efficient "purge expired" task. > >> So, the key is given, the value Object needs to be serialized, but the > >> InternalMetadata I can map to specific fields. > >> > >> Problem is at read time: I don't have a marshalled version of the > >> Metadata but I need to unmarshall the value.. there is no helper to > >> cover for this case. > >> > >> Wouldn't this interface be more practical if it had: > >> > >> Object unMarshallKey(ByteBuffer); > >> Object unMarshallValue(ByteBuffer); > >> InternalMetadata unMarshallMetadata(ByteBuffer); > >> MarshalledEntry newMarshalledEntry(Object key, Object value, > >> InternalMetadata im); > > > > > > I guess the idea was that MarshalledEntry unmarshalls the key, value, and > > metadata lazily. Even a purge listener may only be interested in the key, > > and in that case case we can avoid unmarshalling the value. 
> > > > I think you can do what you want by stuffing the bytes for everything in > the > > MarshalledEntry, and unmarshalling the data via > > MarshalledEntry.getMetadata(). > > > That's what I did but it's far from optimal, so I'm proposing the > improvement. > Could you expand a bit on why this is suboptimal? The way I see it, you have to support any custom Metadata implementation anyway, so you have to either read and deserialize the entire metadata, or store a duplicate of the expiration timestamp somewhere else. In LevelDB we store the timestamps in a separate DB, doing 2 LevelDB writes for every store write (with expiration), and I would be very interested in an alternative solution that only wrote to 1 DB. > >> Also, I'd get rid of generics. They are not helping at all, I can > >> hardly couple my custom CacheStore implementation to the end user's > >> domain model, right? > > > > > > I can see the user writing a custom JDBC store that stores the object's > > properties into separate columns and thus supporting a single type. But > it's > > a pretty specialized case, and the user can very well do the casts > himself. > > Exactly > > > > > Maybe Paul and Will have more stuff to add here, they been discussing > about > > generics in the cache store SPIs around > > https://github.com/infinispan/infinispan/pull/2705 > > > >> > >> I was also quite surprised that other existing CacheStore > >> implementations don't have this limitation; peeking in the > >> JDBCCacheStore to see how this is supposed to work, it seems that > >> essentially it duplicates the data by serializazing the > >> InternalMetadata in the BLOB but also stored an Expiry column to query > >> via SQL. I was interested to see how the Purge method could be > >> implemented efficiently, and found a "TODO notify listeners" ;-) > > > > > > I believe the reason why we don't support purge listeners in the JDBC > store > > is that we don't want to fetch the entries from the database at all. We > > can't ask CacheNotifier whether there are any listeners registered ATM, > we > > need that to avoid the overhead when there are no listeners. > > I understand the reason, still it's wrong isn't it ;-) > Having to load each entry for the "maybe there's a listener" case is > definitely silly, we should inform the CacheStore instance if > notifications are needed or not, and if they need just the key or the > whole entry, no need for metadata. > And for CacheStore instances which don't respect this we should at > least document the limitation, or open a JIRA to get it done. Would > you agree if I opened issues for each weirdness I'm finding in the > existing CacheStore implementations? I've hit some more in the > meantime. > > Sure. I'm not too enthusiastic about changing the persistence SPI, but we should at least have a discussion on the pros and cons of the choices we are making there. > > > > >> > >> All other JDBC based stores serialize buckets in groups, REST store > >> doesn't do purging, LevelDB also does duplication for the metadata, > >> Cassandra is outdated and doesn't do events on expiry. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140715/dbda9166/attachment.html From galder at redhat.com Wed Jul 16 09:25:01 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 16 Jul 2014 15:25:01 +0200 Subject: [infinispan-dev] New configuration In-Reply-To: <025944F9-1433-4A4C-B73A-591C4B027375@redhat.com> References: <533D2C05.9020609@redhat.com> <1397065123.5324.2@smtp.gmail.com> <7F64ED41-638F-43EE-A37A-E62B655A6B16@redhat.com> <534D423E.9050001@redhat.com> <1397573555.6281.5@smtp.gmail.com> <025944F9-1433-4A4C-B73A-591C4B027375@redhat.com> Message-ID: <04151057-3664-42BC-AC75-6DE8D169E5D1@redhat.com> On 28 Apr 2014, at 16:42, Galder Zamarre?o wrote: > > On 15 Apr 2014, at 16:52, Dan Berindei wrote: > >> >> >> On Tue, Apr 15, 2014 at 5:29 PM, Radim Vansa wrote: >>> >>> On 04/15/2014 02:31 PM, Galder Zamarre?o wrote: >>> >>> On 09 Apr 2014, at 19:38, Dan Berindei wrote: >>> >>> >>> >>> On Wed, Apr 9, 2014 at 5:37 PM, Galder Zamarre?o wrote: >>> >>> On 03 Apr 2014, at 11:38, Radim Vansa < >>> rvansa at redhat.com >>> >>> wrote: >>> >>> >>> Hi, >>> >>> looking on the new configuration parser, I've noticed that you cannot >>> configure ConsistentHashFactory anymore - is this by purpose? >>> >>> >>> ^ Rather than being something the users should be tweaking, it?s something that?s used internally. So, I applied a bit of if-in-doubt-leave-it-out logic. I don?t think we lose any major functionality with this. >>> >>> >>> For now it's the only way for the user to use the SyncConsistentHashFactory, so it's not used just internally. >>> >>> What?s the use case for that? The javadoc is not very clear on the benefits of using it. >>> >>> >>> >>> One use case I've noticed is having two caches with same keys, and >>> modification listener handler retrieving data from the other cache. In >>> order to execute the listener soon, you don't want to execute remote >>> gets, and therefore, it's useful to have the hashes synchronized. >>> >> >> Erik is using it with distributed tasks. Normally, keys with the same group in multiple caches doesn't guarantee you that the keys are all located on the same nodes, which means we can't guarantee that a distributed task that accesses multiple caches has all the keys it needs locally just with grouping. SyncConsistentHashFactory fixes that. > > Thanks Radim and Dan. Based on a further chat I had with Dan, I?ve sent a PR to update the SyncCHF javadoc to explain why that class exists in the first place:https://github.com/infinispan/infinispan/pull/2528 > > @Dan, have a look and see if you?re happy. > > Btw, I?ve just created https://issues.jboss.org/browse/ISPN-4245 to address this configuration issue. @Radim FYI, PR for ^ https://github.com/infinispan/infinispan/pull/2725 > > Cheers, > >> >> >>> >>> Radim >>> >>> >>> >>> >>> >>> Another my concern is the fact that you enable stuff by parsing the >>> element - for example L1. I expect that omitting the element and setting >>> it with the default value (as presented in XSD) makes no difference, but >>> this is not how current configuration works. >>> >>> >>> L1 is disabled by default. You enable it by configuring the L1 lifespan to be bigger than 0. The attribute definition follows the pattern that Paul did for the server side. >>> >>> >>> My opinion comes probably too late as the PR was already reviewed, >>> discussed and integrated, but at least, please clearly describe the >>> behaviour in the XSD. The fact that l1-lifespan "Defaults to 10 >>> minutes." 
is not correct - it defaults to L1 being disabled. >>> >>> >>> Yeah, I?ll update the XSD and documentation accordingly: >>> >>> >>> https://issues.jboss.org/browse/ISPN-4195 >>> >>> >>> >>> Cheers >>> >>> >>> >>> Thanks >>> >>> Radim >>> >>> -- >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> >>> -- >>> >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz -- Galder Zamarre?o galder at redhat.com twitter.com/galderz From pierre.sutra at unine.ch Thu Jul 17 04:56:43 2014 From: pierre.sutra at unine.ch (Pierre Sutra) Date: Thu, 17 Jul 2014 10:56:43 +0200 Subject: [infinispan-dev] PutAll command Message-ID: <53C78FCB.5040503@unine.ch> Hello, I would like to know if it is possible to execute a putAll(Map M) command in embedded node via IP multicast. More precisely, I wonder if instead of sending the map M to each node iteratively, there is a way to send it to all nodes with IP multicast, each node projecting M on the data it replicates. I thank you in advance for your help. Cheers, Pierre From rvansa at redhat.com Thu Jul 17 05:28:13 2014 From: rvansa at redhat.com (Radim Vansa) Date: Thu, 17 Jul 2014 11:28:13 +0200 Subject: [infinispan-dev] PutAll command In-Reply-To: <53C78FCB.5040503@unine.ch> References: <53C78FCB.5040503@unine.ch> Message-ID: <53C7972D.3000501@redhat.com> In fact, putAll implementation (the PutMapCommand) is quite ineffective, see reasons in [1], IMO it needs a major rework. Radim [1] http://lists.jboss.org/pipermail/infinispan-dev/2013-June/013080.html On 07/17/2014 10:56 AM, Pierre Sutra wrote: > Hello, > > I would like to know if it is possible to execute a putAll(Map M) > command in embedded node via IP multicast. More precisely, I wonder if > instead of sending the map M to each node iteratively, there is a way to > send it to all nodes with IP multicast, each node projecting M on the > data it replicates. > > I thank you in advance for your help. 
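On the "projection" idea: each node can already work out its own slice of such a map from the consistent hash. A rough sketch, with the distribution API names quoted from memory (so treat them as approximate) and cache/bigMap standing in for the real cache and input map:

    import java.util.HashMap;
    import java.util.Map;
    import org.infinispan.distribution.ch.ConsistentHash;
    import org.infinispan.remoting.transport.Address;

    // Sketch: keep only the entries this node owns, dropping the rest of the map.
    ConsistentHash ch = cache.getAdvancedCache().getDistributionManager().getConsistentHash();
    Address self = cache.getCacheManager().getAddress();
    Map<Object, Object> localSlice = new HashMap<Object, Object>();
    for (Map.Entry<Object, Object> e : bigMap.entrySet()) {
        if (ch.locateOwners(e.getKey()).contains(self)) {
            localSlice.put(e.getKey(), e.getValue());
        }
    }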
> > Cheers, > Pierre > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From vblagoje at redhat.com Fri Jul 18 00:44:49 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 18 Jul 2014 00:44:49 -0400 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha5 has been released Message-ID: <53C8A641.3040508@redhat.com> Dear all, I am proud to announce that Infinispan 7.0.0.Alpha5 is out. There are numerous improvements and fixes included in this release. It is best to refer to release notes [1] for details. Regards, Vladimir [1] https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12324947 From ttarrant at redhat.com Fri Jul 18 04:19:47 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 18 Jul 2014 10:19:47 +0200 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha5 has been released In-Reply-To: <53C8A641.3040508@redhat.com> References: <53C8A641.3040508@redhat.com> Message-ID: <53C8D8A3.7050606@redhat.com> Thanks Vladimir. A little note: instead of relying on Jira's illegible release notes, we should do collect a list of notable features / bug fixes before the release in order to get something a bit more attention-grabbing in the announcements Tristan On 18/07/14 06:44, Vladimir Blagojevic wrote: > Dear all, > > I am proud to announce that Infinispan 7.0.0.Alpha5 is out. There are > numerous improvements and fixes included in this release. It is best to > refer to release notes [1] for details. > > Regards, > Vladimir > > [1] > https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12324947 > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > From sanne at infinispan.org Fri Jul 18 04:28:37 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 18 Jul 2014 09:28:37 +0100 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha5 has been released In-Reply-To: <53C8D8A3.7050606@redhat.com> References: <53C8A641.3040508@redhat.com> <53C8D8A3.7050606@redhat.com> Message-ID: +1 There is still time to fix that by writing a blog post ;-) On 18 July 2014 09:19, Tristan Tarrant wrote: > Thanks Vladimir. > > A little note: instead of relying on Jira's illegible release notes, we > should do collect a list of notable features / bug fixes before the > release in order to get something a bit more attention-grabbing in the > announcements > > Tristan > > On 18/07/14 06:44, Vladimir Blagojevic wrote: >> Dear all, >> >> I am proud to announce that Infinispan 7.0.0.Alpha5 is out. There are >> numerous improvements and fixes included in this release. It is best to >> refer to release notes [1] for details. 
>> >> Regards, >> Vladimir >> >> [1] >> https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12324947 >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Fri Jul 18 05:51:29 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 18 Jul 2014 11:51:29 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: References: <53BA56E0.1080607@redhat.com> <53BA6D8D.7010604@infinispan.org> <53BA70A1.5040306@redhat.com> Message-ID: <53C8EE21.4060207@redhat.com> No more constructors. Tristan On 07/07/14 12:52, Sanne Grinovero wrote: > On 7 July 2014 11:04, Bela Ban wrote: >> How ? I already have an infinispan.xml, create a CacheManager off of it >> and now only want to change the transport. > I have the same need; in Palma I asked for a CacheManager constructor > which would take > (String infinispanConfiguration, Transport customTransportInstance). > > Could we have that please please? > I never opened a new specific JIRA as there is the more generally > useful ISPN-1414 already. > >> I need to get another PhD to understand programmatic configuration in >> Infinispan >> >> On 07/07/14 11:51, Pedro Ruivo wrote: >>> >>> On 07/07/2014 09:14 AM, Bela Ban wrote: >>>> This is gone in 7. Do I now have to use programmatic configuration ? If >>>> so, how would I do this ? >>> AFAIK, yes it was removed from configuration file and can only be set by >>> programmatic configuration. >>> >>> Pedro >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> -- >> Bela Ban, JGroups lead (http://www.jgroups.org) >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From ttarrant at redhat.com Fri Jul 18 15:39:57 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 18 Jul 2014 21:39:57 +0200 Subject: [infinispan-dev] Propagate the schema to the cachestore In-Reply-To: References: Message-ID: <53C9780D.8070800@redhat.com> On 09/07/14 17:51, Emmanuel Bernard wrote: > A remark by Divya made me think of something. > With Infinispan moving to the direction of ProtoBuf and schemas, cache store would greatly benefit from receiving in one shape or another that schema to transform a blob into something more structure depending on the underlying capability of the datastore. > Has anyone explored that angle? Yes, this is something I suggested at some point when we met in Farnborough: the metadata provider (JPA, ProtoBuf, etc) should be wired to the persistence SPI so that clever things can be done at that level. 
This can also be applied to the compatibility layer, so that I can turn a ProtoBuf representation into JSON or XML for example Tristan From vblagoje at redhat.com Fri Jul 18 16:41:07 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 18 Jul 2014 16:41:07 -0400 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha5 has been released In-Reply-To: References: <53C8A641.3040508@redhat.com> <53C8D8A3.7050606@redhat.com> Message-ID: <53C98663.6030008@redhat.com> Should be fixed by now! Have a look again. Vladimir On 2014-07-18, 4:28 AM, Sanne Grinovero wrote: > +1 > There is still time to fix that by writing a blog post ;-) > > On 18 July 2014 09:19, Tristan Tarrant wrote: >> Thanks Vladimir. >> >> A little note: instead of relying on Jira's illegible release notes, we >> should do collect a list of notable features / bug fixes before the >> release in order to get something a bit more attention-grabbing in the >> announcements >> >> From sanne at infinispan.org Fri Jul 18 16:50:11 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 18 Jul 2014 21:50:11 +0100 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha5 has been released In-Reply-To: <53C98663.6030008@redhat.com> References: <53C8A641.3040508@redhat.com> <53C8D8A3.7050606@redhat.com> <53C98663.6030008@redhat.com> Message-ID: thanks! Looks great On 18 July 2014 21:41, Vladimir Blagojevic wrote: > Should be fixed by now! Have a look again. > Vladimir > On 2014-07-18, 4:28 AM, Sanne Grinovero wrote: >> +1 >> There is still time to fix that by writing a blog post ;-) >> >> On 18 July 2014 09:19, Tristan Tarrant wrote: >>> Thanks Vladimir. >>> >>> A little note: instead of relying on Jira's illegible release notes, we >>> should do collect a list of notable features / bug fixes before the >>> release in order to get something a bit more attention-grabbing in the >>> announcements >>> >>> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From bban at redhat.com Mon Jul 21 03:25:17 2014 From: bban at redhat.com (Bela Ban) Date: Mon, 21 Jul 2014 09:25:17 +0200 Subject: [infinispan-dev] A question and an observation In-Reply-To: References: <53BA56E0.1080607@redhat.com> Message-ID: <53CCC05D.3010000@redhat.com> Thanks Galder, appreciated ! On 14/07/14 11:00, Galder Zamarre?o wrote: > > On 07 Jul 2014, at 10:14, Bela Ban wrote: > >> >> 1: Observation: >> ------------- >> In my Infinispan perf test (IspnPerfTest), I used >> cache.getAdvancedCache().withFlags(...).put(key,value) in a tight loop. >> >> I've always thought that withFlags() was a fast operation, *but this is >> not the case* !! >> >> Once I changed this and predefined the 2 caches (sync and async) at the >> start, outside the loop, things got 10x faster ! So please change this >> if you made the same mistake ! >> >> 2. Question: >> ----------- >> In Infinispan 6, I defined my custom transport as follows: >> >> >> This is gone in 7. Do I now have to use programmatic configuration ? If >> so, how would I do this ? 
> > ^ An oversight, it?ll be fixed before Final: https://issues.jboss.org/browse/ISPN-4510 > >> >> -- >> Bela Ban, JGroups lead (http://www.jgroups.org) >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From sanne at infinispan.org Mon Jul 21 11:03:00 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 21 Jul 2014 16:03:00 +0100 Subject: [infinispan-dev] JPA & OSGi.. still a long way to go. Message-ID: Hi all, I just noticed that "ISPN-4276 - Make JPA cache store work in Karaf" was resolved. I trust that a single instance might work now, but we need to take into considerations some limitations of running Hibernate in OSGi, in particular the caveats documented here: http://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch17.html#d5e5021 The classloader being overriden by an OSGi deployment, it overrides a static which affects all instances of Hibernate running in a specified classloader. This is meant to be fixed in Hibernate ORM 5.0. Considering that a JPACacheStore can handle a single entity, and consequentially we need to suggest users to use a separate Cache for each type - but you can't give this suggestion when running in Karaf so "work in Karaf" is still needing some love unless we intend to resolve only the single Cache - single type use case. This is in addition of the functional limitations (like no support for relations) that we already discussed in a different context: in OSGi, you can't run more than one JPACacheStore currently. Where should these limitations be tracked? -- Sanne From galder at redhat.com Thu Jul 24 11:52:48 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Thu, 24 Jul 2014 17:52:48 +0200 Subject: [infinispan-dev] JPA & OSGi.. still a long way to go. In-Reply-To: References: Message-ID: On 21 Jul 2014, at 17:03, Sanne Grinovero wrote: > Hi all, > I just noticed that "ISPN-4276 - Make JPA cache store work in Karaf" > was resolved. > > I trust that a single instance might work now, but we need to take > into considerations some limitations of running Hibernate in OSGi, in > particular the caveats documented here: > > http://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch17.html#d5e5021 > > The classloader being overriden by an OSGi deployment, it overrides a > static which affects all instances of Hibernate running in a specified > classloader. This is meant to be fixed in Hibernate ORM 5.0. > > Considering that a JPACacheStore can handle a single entity, and > consequentially we need to suggest users to use a separate Cache for > each type - but you can't give this suggestion when running in Karaf > so "work in Karaf" is still needing some love unless we intend to > resolve only the single Cache - single type use case. > > This is in addition of the functional limitations (like no support for > relations) that we already discussed in a different context: in OSGi, > you can't run more than one JPACacheStore currently. > > Where should these limitations be tracked? In the JIRA itself so that this is noted in documentation? Ion?s assigned to it. 
> > -- Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz From afield at redhat.com Fri Jul 25 09:54:21 2014 From: afield at redhat.com (Alan Field) Date: Fri, 25 Jul 2014 09:54:21 -0400 (EDT) Subject: [infinispan-dev] Cache.size() on distributed caches? In-Reply-To: <1875389269.13059877.1406294774001.JavaMail.zimbra@redhat.com> Message-ID: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> Hey, I have been looking at adding the ability to get the total size of a cache in RadarGun. The first implementation I coded used the distributed iterators in Infinispan 7.[1] I then realized that implementing getTotalSize() method using a distributed executor would allow the code in versions back to Infinispan 5.2. I have the code written, and I have been running some Jenkins jobs with Infinispan 6.0.1 Final to verify that the results are correct.[2] I use the RandomData stage to put data in the cache. Here is what it writes in the log: 04:11:59,573 INFO [org.radargun.stages.cache.RandomDataStage] (main) Received responses from all 4 slaves. Durations [0 = 17.04 minutes, 1 = 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes] 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) -------------------- 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Filled cache with String objects totaling 25% of the Java heap 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 0 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 1 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 2 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 3 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) The cache contains 1917408 values with a total size of 3,834,816 kb 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) 100 words were generated with a maximum length of 20 characters 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) -------------------- These are the outputs from my getTotalSize() code: 04:11:59,591 INFO [org.radargun.service.Infinispan53CacheInfo] (main) org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for cache testCache 04:12:12,094 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.size() = 1917408 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().size() = 1917408 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() = 2 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getCacheManager().getMembers().size() = 4 04:12:41,955 INFO [org.radargun.stages.cache.ClearCacheStage] (main) Cache size = 3834800 The "Cache size =" message is from the 
results of my distributed executor, and the other messages are informational. These outputs show that calling cache size on a distributed cache returns the size of the entire cache including any passivated entries, not just the size of the cache on the local node. This breaks the code of my distributed executor, but mostly makes it unnecessary if I can just call cache.size(). Is this an expected change in behavior? Thanks, Alan [1] https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src/main/java/org/radargun/service/Infinispan70CacheInfo.java#L39 [2] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettotalcachesize-test/6 -- Alan Field Principal Quality Engineer - JBoss Data Grid T: (919) 890-8932 | Ext. 8148932 From rvansa at redhat.com Fri Jul 25 10:40:17 2014 From: rvansa at redhat.com (Radim Vansa) Date: Fri, 25 Jul 2014 16:40:17 +0200 Subject: [infinispan-dev] Cache.size() on distributed caches? In-Reply-To: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> References: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> Message-ID: <53D26C51.5020003@redhat.com> I don't think RadarGun is much related to that, besides the fact that it can be buggy :) The question is whether the cache.size() behaviour has changed in a way that i would report full cache size, not just the size of container/the size of cache store? Radim On 07/25/2014 03:54 PM, Alan Field wrote: > Hey, > > I have been looking at adding the ability to get the total size of a cache in RadarGun. The first implementation I coded used the distributed iterators in Infinispan 7.[1] I then realized that implementing getTotalSize() method using a distributed executor would allow the code in versions back to Infinispan 5.2. I have the code written, and I have been running some Jenkins jobs with Infinispan 6.0.1 Final to verify that the results are correct.[2] I use the RandomData stage to put data in the cache. Here is what it writes in the log: > > 04:11:59,573 INFO [org.radargun.stages.cache.RandomDataStage] (main) Received responses from all 4 slaves. 
Durations [0 = 17.04 minutes, 1 = 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes] > 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) -------------------- > 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Filled cache with String objects totaling 25% of the Java heap > 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 0 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 > 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 1 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 > 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 2 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 > 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 3 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 > 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) The cache contains 1917408 values with a total size of 3,834,816 kb > 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) 100 words were generated with a maximum length of 20 characters > 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) -------------------- > > These are the outputs from my getTotalSize() code: > > 04:11:59,591 INFO [org.radargun.service.Infinispan53CacheInfo] (main) org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for cache testCache > 04:12:12,094 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.size() = 1917408 > 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().size() = 1917408 > 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() = 2 > 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getCacheManager().getMembers().size() = 4 > 04:12:41,955 INFO [org.radargun.stages.cache.ClearCacheStage] (main) Cache size = 3834800 > > The "Cache size =" message is from the results of my distributed executor, and the other messages are informational. These outputs show that calling cache size on a distributed cache returns the size of the entire cache including any passivated entries, not just the size of the cache on the local node. This breaks the code of my distributed executor, but mostly makes it unnecessary if I can just call cache.size(). > > Is this an expected change in behavior? > > Thanks, > Alan > > [1] https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src/main/java/org/radargun/service/Infinispan70CacheInfo.java#L39 > [2] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettotalcachesize-test/6 > -- Radim Vansa JBoss DataGrid QA From mmarkus at redhat.com Fri Jul 25 10:42:50 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 25 Jul 2014 15:42:50 +0100 Subject: [infinispan-dev] Cache.size() on distributed caches? 
In-Reply-To: <53D26C51.5020003@redhat.com> References: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> <53D26C51.5020003@redhat.com> Message-ID: That has changed in ISPN 6: https://issues.jboss.org/browse/ISPN-761 On Jul 25, 2014, at 15:40, Radim Vansa wrote: > I don't think RadarGun is much related to that, besides the fact that it > can be buggy :) > > The question is whether the cache.size() behaviour has changed in a way > that i would report full cache size, not just the size of container/the > size of cache store? > > Radim > > On 07/25/2014 03:54 PM, Alan Field wrote: >> Hey, >> >> I have been looking at adding the ability to get the total size of a cache in RadarGun. The first implementation I coded used the distributed iterators in Infinispan 7.[1] I then realized that implementing getTotalSize() method using a distributed executor would allow the code in versions back to Infinispan 5.2. I have the code written, and I have been running some Jenkins jobs with Infinispan 6.0.1 Final to verify that the results are correct.[2] I use the RandomData stage to put data in the cache. Here is what it writes in the log: >> >> 04:11:59,573 INFO [org.radargun.stages.cache.RandomDataStage] (main) Received responses from all 4 slaves. Durations [0 = 17.04 minutes, 1 = 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes] >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) -------------------- >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Filled cache with String objects totaling 25% of the Java heap >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 0 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 1 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 2 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 3 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) The cache contains 1917408 values with a total size of 3,834,816 kb >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) 100 words were generated with a maximum length of 20 characters >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) -------------------- >> >> These are the outputs from my getTotalSize() code: >> >> 04:11:59,591 INFO [org.radargun.service.Infinispan53CacheInfo] (main) org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for cache testCache >> 04:12:12,094 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.size() = 1917408 >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().size() = 1917408 >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() = 2 >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getCacheManager().getMembers().size() = 4 >> 04:12:41,955 INFO [org.radargun.stages.cache.ClearCacheStage] 
(main) Cache size = 3834800 >> >> The "Cache size =" message is from the results of my distributed executor, and the other messages are informational. These outputs show that calling cache size on a distributed cache returns the size of the entire cache including any passivated entries, not just the size of the cache on the local node. This breaks the code of my distributed executor, but mostly makes it unnecessary if I can just call cache.size(). >> >> Is this an expected change in behavior? >> >> Thanks, >> Alan >> >> [1] https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src/main/java/org/radargun/service/Infinispan70CacheInfo.java#L39 >> [2] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettotalcachesize-test/6 >> > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From afield at redhat.com Fri Jul 25 10:47:39 2014 From: afield at redhat.com (Alan Field) Date: Fri, 25 Jul 2014 10:47:39 -0400 (EDT) Subject: [infinispan-dev] Cache.size() on distributed caches? In-Reply-To: References: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> <53D26C51.5020003@redhat.com> Message-ID: <175233895.13139588.1406299659750.JavaMail.zimbra@redhat.com> OK, thanks Mircea! Alan ----- Original Message ----- > From: "Mircea Markus" > To: "infinispan -Dev List" > Sent: Friday, July 25, 2014 4:42:50 PM > Subject: Re: [infinispan-dev] Cache.size() on distributed caches? > > That has changed in ISPN 6: https://issues.jboss.org/browse/ISPN-761 > > > On Jul 25, 2014, at 15:40, Radim Vansa wrote: > > > I don't think RadarGun is much related to that, besides the fact that it > > can be buggy :) > > > > The question is whether the cache.size() behaviour has changed in a way > > that i would report full cache size, not just the size of container/the > > size of cache store? > > > > Radim > > > > On 07/25/2014 03:54 PM, Alan Field wrote: > >> Hey, > >> > >> I have been looking at adding the ability to get the total size of a cache > >> in RadarGun. The first implementation I coded used the distributed > >> iterators in Infinispan 7.[1] I then realized that implementing > >> getTotalSize() method using a distributed executor would allow the code > >> in versions back to Infinispan 5.2. I have the code written, and I have > >> been running some Jenkins jobs with Infinispan 6.0.1 Final to verify that > >> the results are correct.[2] I use the RandomData stage to put data in the > >> cache. Here is what it writes in the log: > >> > >> 04:11:59,573 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> Received responses from all 4 slaves. 
Durations [0 = 17.04 minutes, 1 = > >> 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes] > >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> -------------------- > >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> Filled cache with String objects totaling 25% of the Java heap > >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> Slave 0 wrote 479352 values to the cache with a total size of 958,704 kb; > >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 > >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> Slave 1 wrote 479352 values to the cache with a total size of 958,704 kb; > >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 > >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> Slave 2 wrote 479352 values to the cache with a total size of 958,704 kb; > >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 > >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> Slave 3 wrote 479352 values to the cache with a total size of 958,704 kb; > >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 > >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) The > >> cache contains 1917408 values with a total size of 3,834,816 kb > >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) 100 > >> words were generated with a maximum length of 20 characters > >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) > >> -------------------- > >> > >> These are the outputs from my getTotalSize() code: > >> > >> 04:11:59,591 INFO [org.radargun.service.Infinispan53CacheInfo] (main) > >> org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for cache > >> testCache > >> 04:12:12,094 INFO [org.radargun.service.Infinispan53CacheInfo] (main) > >> cache.size() = 1917408 > >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) > >> cache.getAdvancedCache().size() = 1917408 > >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) > >> cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() > >> = 2 > >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) > >> cache.getCacheManager().getMembers().size() = 4 > >> 04:12:41,955 INFO [org.radargun.stages.cache.ClearCacheStage] (main) > >> Cache size = 3834800 > >> > >> The "Cache size =" message is from the results of my distributed executor, > >> and the other messages are informational. These outputs show that calling > >> cache size on a distributed cache returns the size of the entire cache > >> including any passivated entries, not just the size of the cache on the > >> local node. This breaks the code of my distributed executor, but mostly > >> makes it unnecessary if I can just call cache.size(). > >> > >> Is this an expected change in behavior? 
> >> > >> Thanks, > >> Alan > >> > >> [1] > >> https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src/main/java/org/radargun/service/Infinispan70CacheInfo.java#L39 > >> [2] > >> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettotalcachesize-test/6 > >> > > > > > > -- > > Radim Vansa > > JBoss DataGrid QA > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From mmarkus at redhat.com Mon Jul 28 11:04:24 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 28 Jul 2014 16:04:24 +0100 Subject: [infinispan-dev] minutes from the monitoring&management meeting Message-ID: <912B2F10-1854-494C-AD55-1C91413D4A51@redhat.com> Hi, Tristan, Sanne, Gustavo and I meetlast week to discuss a) Infinispan usability and b) monitoring and management. Minutes attached. https://docs.google.com/document/d/1dIxH0xTiYBHH6_nkqybc13_zzW9gMIcaF_GX5Y7_PPQ/edit?usp=sharing Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From dan.berindei at gmail.com Mon Jul 28 12:24:46 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 28 Jul 2014 19:24:46 +0300 Subject: [infinispan-dev] minutes from the monitoring&management meeting In-Reply-To: <912B2F10-1854-494C-AD55-1C91413D4A51@redhat.com> References: <912B2F10-1854-494C-AD55-1C91413D4A51@redhat.com> Message-ID: 1. Configuration inheritance: I would go further and allow the definition of "configurations" that cannot be started, only inherited from. 2. Replication queue: I discussed an upgrade to the replication queue with Will, similar to JGroups' TransferQueuBundler [1]. It does raise an interesting point about why we need the same functionality both in JGroups and in Infinispan, though... 3. I think ClusterLoader is still useful in invalidation mode, and also replication mode w/out state transfer ("lazy" state transfer). [1] https://issues.jboss.org/browse/ISPN-4547 Cheers Dan On Mon, Jul 28, 2014 at 6:04 PM, Mircea Markus wrote: > Hi, > > Tristan, Sanne, Gustavo and I meetlast week to discuss a) Infinispan > usability and b) monitoring and management. Minutes attached. > > > https://docs.google.com/document/d/1dIxH0xTiYBHH6_nkqybc13_zzW9gMIcaF_GX5Y7_PPQ/edit?usp=sharing > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140728/15f41458/attachment.html From sanne at infinispan.org Mon Jul 28 12:35:48 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 28 Jul 2014 17:35:48 +0100 Subject: [infinispan-dev] minutes from the monitoring&management meeting In-Reply-To: References: <912B2F10-1854-494C-AD55-1C91413D4A51@redhat.com> Message-ID: On 28 July 2014 17:24, Dan Berindei wrote: > 1. Configuration inheritance: I would go further and allow the definition of > "configurations" that cannot be started, only inherited from. > 2. 
Replication queue: I discussed an upgrade to the replication queue with > Will, similar to JGroups' TransferQueuBundler [1]. It does raise an > interesting point about why we need the same functionality both in JGroups > and in Infinispan, though... > 3. I think ClusterLoader is still useful in invalidation mode, and also > replication mode w/out state transfer ("lazy" state transfer). What's the use case for asking for REPL but not caring for actual REPL? In such use case I expect data to be either very uninportant (other than performance-wise) or persisted to a shared cachestore: I'd use Invalidation, L1 and maybe even disable state transfer. Sanne > > [1] https://issues.jboss.org/browse/ISPN-4547 > > Cheers > Dan > > > > On Mon, Jul 28, 2014 at 6:04 PM, Mircea Markus wrote: >> >> Hi, >> >> Tristan, Sanne, Gustavo and I meetlast week to discuss a) Infinispan >> usability and b) monitoring and management. Minutes attached. >> >> >> https://docs.google.com/document/d/1dIxH0xTiYBHH6_nkqybc13_zzW9gMIcaF_GX5Y7_PPQ/edit?usp=sharing >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Mon Jul 28 12:55:42 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 28 Jul 2014 19:55:42 +0300 Subject: [infinispan-dev] minutes from the monitoring&management meeting In-Reply-To: References: <912B2F10-1854-494C-AD55-1C91413D4A51@redhat.com> Message-ID: On Mon, Jul 28, 2014 at 7:35 PM, Sanne Grinovero wrote: > On 28 July 2014 17:24, Dan Berindei wrote: > > 1. Configuration inheritance: I would go further and allow the > definition of > > "configurations" that cannot be started, only inherited from. > > 2. Replication queue: I discussed an upgrade to the replication queue > with > > Will, similar to JGroups' TransferQueuBundler [1]. It does raise an > > interesting point about why we need the same functionality both in > JGroups > > and in Infinispan, though... > > 3. I think ClusterLoader is still useful in invalidation mode, and also > > replication mode w/out state transfer ("lazy" state transfer). > > What's the use case for asking for REPL but not caring for actual REPL? > In such use case I expect data to be either very uninportant (other > than performance-wise) or persisted to a shared cachestore: I'd use > Invalidation, L1 and maybe even disable state transfer. > Invalidation mode doesn't have L1, ClusterLoader is the closest equivalent. I'm guessing replication - state transfer + ClusterLoader would perform more or less like regular replication + state transfer, but without the delay when bringing up a new node. Invalidation mode has the downside that application writes invalidate the key on all nodes, so the next read for that key will be slow. > Sanne > > > > > [1] https://issues.jboss.org/browse/ISPN-4547 > > > > Cheers > > Dan > > > > > > > > On Mon, Jul 28, 2014 at 6:04 PM, Mircea Markus > wrote: > >> > >> Hi, > >> > >> Tristan, Sanne, Gustavo and I meetlast week to discuss a) Infinispan > >> usability and b) monitoring and management. Minutes attached. 
> >> > >> > >> > https://docs.google.com/document/d/1dIxH0xTiYBHH6_nkqybc13_zzW9gMIcaF_GX5Y7_PPQ/edit?usp=sharing > >> > >> Cheers, > >> -- > >> Mircea Markus > >> Infinispan lead (www.infinispan.org) > >> > >> > >> > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140728/8aa3d098/attachment.html From mudokonman at gmail.com Mon Jul 28 13:51:27 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 28 Jul 2014 13:51:27 -0400 Subject: [infinispan-dev] Cache.size() on distributed caches? In-Reply-To: <175233895.13139588.1406299659750.JavaMail.zimbra@redhat.com> References: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> <53D26C51.5020003@redhat.com> <175233895.13139588.1406299659750.JavaMail.zimbra@redhat.com> Message-ID: I am assuming that you were using a shared cache loader without passivation? In that case the size method will return all the entries in the cache properly (albeit using a large amount of memory) To be honest I can't think of a way to get an accurate count of what is in the cache unless you are using a shared loader without passivation in previous versions. The only way would be if you had no gets/writes(passivation issues) or state transfer while you did the map/reduce task and had it based on if passivation was enabled or not (a bit messy but doable). The distributed iterator should work irrespective of configuration or concurrent operations though. On Fri, Jul 25, 2014 at 10:47 AM, Alan Field wrote: > OK, thanks Mircea! > > Alan > > ----- Original Message ----- >> From: "Mircea Markus" >> To: "infinispan -Dev List" >> Sent: Friday, July 25, 2014 4:42:50 PM >> Subject: Re: [infinispan-dev] Cache.size() on distributed caches? >> >> That has changed in ISPN 6: https://issues.jboss.org/browse/ISPN-761 >> >> >> On Jul 25, 2014, at 15:40, Radim Vansa wrote: >> >> > I don't think RadarGun is much related to that, besides the fact that it >> > can be buggy :) >> > >> > The question is whether the cache.size() behaviour has changed in a way >> > that i would report full cache size, not just the size of container/the >> > size of cache store? >> > >> > Radim >> > >> > On 07/25/2014 03:54 PM, Alan Field wrote: >> >> Hey, >> >> >> >> I have been looking at adding the ability to get the total size of a cache >> >> in RadarGun. The first implementation I coded used the distributed >> >> iterators in Infinispan 7.[1] I then realized that implementing >> >> getTotalSize() method using a distributed executor would allow the code >> >> in versions back to Infinispan 5.2. I have the code written, and I have >> >> been running some Jenkins jobs with Infinispan 6.0.1 Final to verify that >> >> the results are correct.[2] I use the RandomData stage to put data in the >> >> cache. 
Here is what it writes in the log: >> >> >> >> 04:11:59,573 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> Received responses from all 4 slaves. Durations [0 = 17.04 minutes, 1 = >> >> 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes] >> >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> -------------------- >> >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> Filled cache with String objects totaling 25% of the Java heap >> >> 04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> Slave 0 wrote 479352 values to the cache with a total size of 958,704 kb; >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 >> >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> Slave 1 wrote 479352 values to the cache with a total size of 958,704 kb; >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 >> >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> Slave 2 wrote 479352 values to the cache with a total size of 958,704 kb; >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 >> >> 04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> Slave 3 wrote 479352 values to the cache with a total size of 958,704 kb; >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 >> >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) The >> >> cache contains 1917408 values with a total size of 3,834,816 kb >> >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) 100 >> >> words were generated with a maximum length of 20 characters >> >> 04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) >> >> -------------------- >> >> >> >> These are the outputs from my getTotalSize() code: >> >> >> >> 04:11:59,591 INFO [org.radargun.service.Infinispan53CacheInfo] (main) >> >> org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for cache >> >> testCache >> >> 04:12:12,094 INFO [org.radargun.service.Infinispan53CacheInfo] (main) >> >> cache.size() = 1917408 >> >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) >> >> cache.getAdvancedCache().size() = 1917408 >> >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) >> >> cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() >> >> = 2 >> >> 04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) >> >> cache.getCacheManager().getMembers().size() = 4 >> >> 04:12:41,955 INFO [org.radargun.stages.cache.ClearCacheStage] (main) >> >> Cache size = 3834800 >> >> >> >> The "Cache size =" message is from the results of my distributed executor, >> >> and the other messages are informational. These outputs show that calling >> >> cache size on a distributed cache returns the size of the entire cache >> >> including any passivated entries, not just the size of the cache on the >> >> local node. This breaks the code of my distributed executor, but mostly >> >> makes it unnecessary if I can just call cache.size(). >> >> >> >> Is this an expected change in behavior? 
>> >> >> >> Thanks, >> >> Alan >> >> >> >> [1] >> >> https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src/main/java/org/radargun/service/Infinispan70CacheInfo.java#L39 >> >> [2] >> >> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettotalcachesize-test/6 >> >> >> > >> > >> > -- >> > Radim Vansa >> > JBoss DataGrid QA >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Tue Jul 29 03:00:18 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 29 Jul 2014 09:00:18 +0200 Subject: [infinispan-dev] minutes from the monitoring&management meeting In-Reply-To: References: <912B2F10-1854-494C-AD55-1C91413D4A51@redhat.com> Message-ID: <53D74682.9040608@redhat.com> On 07/28/2014 06:24 PM, Dan Berindei wrote: > 1. Configuration inheritance: I would go further and allow the > definition of "configurations" that cannot be started, only inherited > from. +1 - Coherence uses the term "scheme" rather than "configuration" Radim -- Radim Vansa JBoss DataGrid QA From afield at redhat.com Tue Jul 29 03:53:42 2014 From: afield at redhat.com (Alan Field) Date: Tue, 29 Jul 2014 03:53:42 -0400 (EDT) Subject: [infinispan-dev] Cache.size() on distributed caches? In-Reply-To: References: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> <53D26C51.5020003@redhat.com> <175233895.13139588.1406299659750.JavaMail.zimbra@redhat.com> Message-ID: <723755503.15383153.1406620422329.JavaMail.zimbra@redhat.com> Hey Will, ----- Original Message ----- > From: "William Burns" > To: "infinispan -Dev List" > Sent: Monday, July 28, 2014 7:51:27 PM > Subject: Re: [infinispan-dev] Cache.size() on distributed caches? > > I am assuming that you were using a shared cache loader without > passivation? ?In that case the size method will return all the entries > in the cache properly (albeit using a large amount of memory) This is the cache store configuration I had for the cache in question: ?? ? ? ?? ? ? ?? ? ? ? ?? ? ? Shared is disabled by default, but passivation was enabled. However, I am also seeing this behavior with these lines commented out in the cache configuration. ISPN-761 talks about handling passivated entries in a cache store, but also says that size() will only show the number of entries on the local node and won't check with other nodes in the cluster. That is not what I am seeing. 
I have four different machines writing entries to the cache: 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Filled cache with String objects totaling 25% of the Java heap 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 0 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 1 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 2 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 3 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) The cache contains 1917408 values with a total size of 3,834,816 kb And when I call Cache.size(), I am getting 1917408 which is the total size of the cache across all 4 nodes: 04:12:12,094 INFO ?[org.radargun.service.Infinispan53CacheInfo] (main) cache.size() = 1917408 I am not expecting this to happen, but that is what I am seeing with or without a cache store. Thanks, Alan > > To be honest I can't think of a way to get an accurate count of what > is in the cache unless you are using a shared loader without > passivation in previous versions. ?The only way would be if you had no > gets/writes(passivation issues) or state transfer while you did the > map/reduce task and had it based on if passivation was enabled or not > (a bit messy but doable). > > The distributed iterator should work irrespective of configuration or > concurrent operations though. > > On Fri, Jul 25, 2014 at 10:47 AM, Alan Field wrote: > > OK, thanks Mircea! > > > > Alan > > > > ----- Original Message ----- > >> From: "Mircea Markus" > >> To: "infinispan -Dev List" > >> Sent: Friday, July 25, 2014 4:42:50 PM > >> Subject: Re: [infinispan-dev] Cache.size() on distributed caches? > >> > >> That has changed in ISPN 6: https://issues.jboss.org/browse/ISPN-761 > >> > >> > >> On Jul 25, 2014, at 15:40, Radim Vansa wrote: > >> > >> > I don't think RadarGun is much related to that, besides the fact that it > >> > can be buggy :) > >> > > >> > The question is whether the cache.size() behaviour has changed in a way > >> > that i would report full cache size, not just the size of container/the > >> > size of cache store? > >> > > >> > Radim > >> > > >> > On 07/25/2014 03:54 PM, Alan Field wrote: > >> >> Hey, > >> >> > >> >> I have been looking at adding the ability to get the total size of a > >> >> cache > >> >> in RadarGun. The first implementation I coded used the distributed > >> >> iterators in Infinispan 7.[1] I then realized that implementing > >> >> getTotalSize() method using a distributed executor would allow the code > >> >> in versions back to Infinispan 5.2. I have the code written, and I have > >> >> been running some Jenkins jobs with Infinispan 6.0.1 Final to verify > >> >> that > >> >> the results are correct.[2] I use the RandomData stage to put data in > >> >> the > >> >> cache. Here is what it writes in the log: > >> >> > >> >> 04:11:59,573 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> Received responses from all 4 slaves. 
Durations [0 = 17.04 minutes, 1 = > >> >> 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes] > >> >> 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> -------------------- > >> >> 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> Filled cache with String objects totaling 25% of the Java heap > >> >> 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> Slave 0 wrote 479352 values to the cache with a total size of 958,704 > >> >> kb; > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 > >> >> 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> Slave 1 wrote 479352 values to the cache with a total size of 958,704 > >> >> kb; > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 > >> >> 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> Slave 2 wrote 479352 values to the cache with a total size of 958,704 > >> >> kb; > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 > >> >> 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> Slave 3 wrote 479352 values to the cache with a total size of 958,704 > >> >> kb; > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 > >> >> 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> The > >> >> cache contains 1917408 values with a total size of 3,834,816 kb > >> >> 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> 100 > >> >> words were generated with a maximum length of 20 characters > >> >> 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > >> >> -------------------- > >> >> > >> >> These are the outputs from my getTotalSize() code: > >> >> > >> >> 04:11:59,591 INFO ?[org.radargun.service.Infinispan53CacheInfo] (main) > >> >> org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for > >> >> cache > >> >> testCache > >> >> 04:12:12,094 INFO ?[org.radargun.service.Infinispan53CacheInfo] (main) > >> >> cache.size() = 1917408 > >> >> 04:12:26,283 INFO ?[org.radargun.service.Infinispan53CacheInfo] (main) > >> >> cache.getAdvancedCache().size() = 1917408 > >> >> 04:12:26,283 INFO ?[org.radargun.service.Infinispan53CacheInfo] (main) > >> >> cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() > >> >> = 2 > >> >> 04:12:26,283 INFO ?[org.radargun.service.Infinispan53CacheInfo] (main) > >> >> cache.getCacheManager().getMembers().size() = 4 > >> >> 04:12:41,955 INFO ?[org.radargun.stages.cache.ClearCacheStage] (main) > >> >> Cache size = 3834800 > >> >> > >> >> The "Cache size =" message is from the results of my distributed > >> >> executor, > >> >> and the other messages are informational. These outputs show that > >> >> calling > >> >> cache size on a distributed cache returns the size of the entire cache > >> >> including any passivated entries, not just the size of the cache on the > >> >> local node. This breaks the code of my distributed executor, but mostly > >> >> makes it unnecessary if I can just call cache.size(). > >> >> > >> >> Is this an expected change in behavior? 
> >> >> > >> >> Thanks, > >> >> Alan > >> >> > >> >> [1] > >> >> https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src/main/java/org/radargun/service/Infinispan70CacheInfo.java#L39 > >> >> [2] > >> >> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettotalcachesize-test/6 > >> >> > >> > > >> > > >> > -- > >> > Radim Vansa > >> > JBoss DataGrid QA > >> > > >> > _______________________________________________ > >> > infinispan-dev mailing list > >> > infinispan-dev at lists.jboss.org > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> Cheers, > >> -- > >> Mircea Markus > >> Infinispan lead (www.infinispan.org) > >> > >> > >> > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From rory.odonnell at oracle.com Tue Jul 29 05:27:27 2014 From: rory.odonnell at oracle.com (Rory O'Donnell Oracle, Dublin Ireland) Date: Tue, 29 Jul 2014 10:27:27 +0100 Subject: [infinispan-dev] Early Access builds for JDK 9 b24, JDK 8u20 b23 are available on java.net Message-ID: <53D768FF.7080607@oracle.com> Hi Galder, Early Access builds for JDK 9 b24 and JDK 8u20 b23 are available on java.net. As we enter the later phases of development for JDK 8u20 , please log any show stoppers as soon as possible. Rgds, Rory -- Rgds,Rory O'Donnell Quality Engineering Manager Oracle EMEA , Dublin, Ireland -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140729/a4fcb00d/attachment.html From afield at redhat.com Tue Jul 29 08:42:18 2014 From: afield at redhat.com (Alan Field) Date: Tue, 29 Jul 2014 08:42:18 -0400 (EDT) Subject: [infinispan-dev] Cache.size() on distributed caches? In-Reply-To: <723755503.15383153.1406620422329.JavaMail.zimbra@redhat.com> References: <1532946813.13089452.1406296461608.JavaMail.zimbra@redhat.com> <53D26C51.5020003@redhat.com> <175233895.13139588.1406299659750.JavaMail.zimbra@redhat.com> <723755503.15383153.1406620422329.JavaMail.zimbra@redhat.com> Message-ID: <1801569529.15512890.1406637738072.JavaMail.zimbra@redhat.com> Will helped me figure this out. The cache was in replicated mode. I thought the cache was in distributed mode because Cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() was returning 2. I expected it to return the correct number of owners in replicated mode, but that's not the case. Sorry for the noise, and for my confusion. Thanks, Alan ----- Original Message ----- > From: "Alan Field" > To: "infinispan -Dev List" > Sent: Tuesday, July 29, 2014 9:53:42 AM > Subject: Re: [infinispan-dev] Cache.size() on distributed caches? > > Hey Will, > > ----- Original Message ----- > > From: "William Burns" > > To: "infinispan -Dev List" > > Sent: Monday, July 28, 2014 7:51:27 PM > > Subject: Re: [infinispan-dev] Cache.size() on distributed caches? > > > > I am assuming that you were using a shared cache loader without > > passivation? 
?In that case the size method will return all the entries > > in the cache properly (albeit using a large amount of memory) > > This is the cache store configuration I had for the cache in question: > > ?? ? ? > ?? ? ? > ?? ? ? ? ?? ? ? ? ? ? ? ? ? ?implementationType="JAVA" > ?? ? ? ? ? ? ? ? ? ?location="/tmp/ispn-leveldb-jni/data" > ?? ? ? ? ? ? ? ? ? ?expiredLocation="/tmp/ispn-leveldb-jni/expired" > ?? ? ? ? ? ? ? ? ? ?purgeOnStartup="true" > ?? ? ? ? ? ? ? ? ? ?preload="false" > ?? ? ? ? ? ? ? ? ? ?/> > ?? ? ? > > Shared is disabled by default, but passivation was enabled. However, I am > also seeing this behavior with these lines commented out in the cache > configuration. > > ISPN-761 talks about handling passivated entries in a cache store, but also > says that size() will only show the number of entries on the local node and > won't check with other nodes in the cluster. That is not what I am seeing. I > have four different machines writing entries to the cache: > > 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Filled > cache with String objects totaling 25% of the Java heap > 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 0 > wrote 479352 values to the cache with a total size of 958,704 kb; > targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 > 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 1 > wrote 479352 values to the cache with a total size of 958,704 kb; > targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 > 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 2 > wrote 479352 values to the cache with a total size of 958,704 kb; > targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 > 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) Slave 3 > wrote 479352 values to the cache with a total size of 958,704 kb; > targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 > 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) The > cache contains 1917408 values with a total size of 3,834,816 kb > > And when I call Cache.size(), I am getting 1917408 which is the total size of > the cache across all 4 nodes: > > 04:12:12,094 INFO ?[org.radargun.service.Infinispan53CacheInfo] (main) > cache.size() = 1917408 > > I am not expecting this to happen, but that is what I am seeing with or > without a cache store. > > Thanks, > Alan > > > > > To be honest I can't think of a way to get an accurate count of what > > is in the cache unless you are using a shared loader without > > passivation in previous versions. ?The only way would be if you had no > > gets/writes(passivation issues) or state transfer while you did the > > map/reduce task and had it based on if passivation was enabled or not > > (a bit messy but doable). > > > > The distributed iterator should work irrespective of configuration or > > concurrent operations though. > > > > On Fri, Jul 25, 2014 at 10:47 AM, Alan Field wrote: > > > OK, thanks Mircea! > > > > > > Alan > > > > > > ----- Original Message ----- > > >> From: "Mircea Markus" > > >> To: "infinispan -Dev List" > > >> Sent: Friday, July 25, 2014 4:42:50 PM > > >> Subject: Re: [infinispan-dev] Cache.size() on distributed caches? 
> > >> > > >> That has changed in ISPN 6: https://issues.jboss.org/browse/ISPN-761 > > >> > > >> > > >> On Jul 25, 2014, at 15:40, Radim Vansa wrote: > > >> > > >> > I don't think RadarGun is much related to that, besides the fact that > > >> > it > > >> > can be buggy :) > > >> > > > >> > The question is whether the cache.size() behaviour has changed in a > > >> > way > > >> > that i would report full cache size, not just the size of > > >> > container/the > > >> > size of cache store? > > >> > > > >> > Radim > > >> > > > >> > On 07/25/2014 03:54 PM, Alan Field wrote: > > >> >> Hey, > > >> >> > > >> >> I have been looking at adding the ability to get the total size of a > > >> >> cache > > >> >> in RadarGun. The first implementation I coded used the distributed > > >> >> iterators in Infinispan 7.[1] I then realized that implementing > > >> >> getTotalSize() method using a distributed executor would allow the > > >> >> code > > >> >> in versions back to Infinispan 5.2. I have the code written, and I > > >> >> have > > >> >> been running some Jenkins jobs with Infinispan 6.0.1 Final to verify > > >> >> that > > >> >> the results are correct.[2] I use the RandomData stage to put data in > > >> >> the > > >> >> cache. Here is what it writes in the log: > > >> >> > > >> >> 04:11:59,573 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> Received responses from all 4 slaves. Durations [0 = 17.04 minutes, 1 > > >> >> = > > >> >> 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes] > > >> >> 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> -------------------- > > >> >> 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> Filled cache with String objects totaling 25% of the Java heap > > >> >> 04:11:59,574 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> Slave 0 wrote 479352 values to the cache with a total size of 958,704 > > >> >> kb; > > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952 > > >> >> 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> Slave 1 wrote 479352 values to the cache with a total size of 958,704 > > >> >> kb; > > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319 > > >> >> 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> Slave 2 wrote 479352 values to the cache with a total size of 958,704 > > >> >> kb; > > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729 > > >> >> 04:11:59,575 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> Slave 3 wrote 479352 values to the cache with a total size of 958,704 > > >> >> kb; > > >> >> targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687 > > >> >> 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> The > > >> >> cache contains 1917408 values with a total size of 3,834,816 kb > > >> >> 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> 100 > > >> >> words were generated with a maximum length of 20 characters > > >> >> 04:11:59,576 INFO ?[org.radargun.stages.cache.RandomDataStage] (main) > > >> >> -------------------- > > >> >> > > >> >> These are the outputs from my getTotalSize() code: > > >> >> > > >> >> 04:11:59,591 INFO ?[org.radargun.service.Infinispan53CacheInfo] > > >> >> (main) > > >> >> org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for > > >> >> cache > > >> >> testCache > > >> >> 04:12:12,094 INFO 
?[org.radargun.service.Infinispan53CacheInfo] > > >> >> (main) > > >> >> cache.size() = 1917408 > > >> >> 04:12:26,283 INFO ?[org.radargun.service.Infinispan53CacheInfo] > > >> >> (main) > > >> >> cache.getAdvancedCache().size() = 1917408 > > >> >> 04:12:26,283 INFO ?[org.radargun.service.Infinispan53CacheInfo] > > >> >> (main) > > >> >> cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() > > >> >> = 2 > > >> >> 04:12:26,283 INFO ?[org.radargun.service.Infinispan53CacheInfo] > > >> >> (main) > > >> >> cache.getCacheManager().getMembers().size() = 4 > > >> >> 04:12:41,955 INFO ?[org.radargun.stages.cache.ClearCacheStage] (main) > > >> >> Cache size = 3834800 > > >> >> > > >> >> The "Cache size =" message is from the results of my distributed > > >> >> executor, > > >> >> and the other messages are informational. These outputs show that > > >> >> calling > > >> >> cache size on a distributed cache returns the size of the entire > > >> >> cache > > >> >> including any passivated entries, not just the size of the cache on > > >> >> the > > >> >> local node. This breaks the code of my distributed executor, but > > >> >> mostly > > >> >> makes it unnecessary if I can just call cache.size(). > > >> >> > > >> >> Is this an expected change in behavior? > > >> >> > > >> >> Thanks, > > >> >> Alan > > >> >> > > >> >> [1] > > >> >> https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src/main/java/org/radargun/service/Infinispan70CacheInfo.java#L39 > > >> >> [2] > > >> >> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettotalcachesize-test/6 > > >> >> > > >> > > > >> > > > >> > -- > > >> > Radim Vansa > > >> > JBoss DataGrid QA > > >> > > > >> > _______________________________________________ > > >> > infinispan-dev mailing list > > >> > infinispan-dev at lists.jboss.org > > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > >> > > >> Cheers, > > >> -- > > >> Mircea Markus > > >> Infinispan lead (www.infinispan.org) > > >> > > >> > > >> > > >> > > >> > > >> _______________________________________________ > > >> infinispan-dev mailing list > > >> infinispan-dev at lists.jboss.org > > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > >> > > > _______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From bban at redhat.com Tue Jul 29 09:56:06 2014 From: bban at redhat.com (Bela Ban) Date: Tue, 29 Jul 2014 15:56:06 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution Message-ID: <53D7A7F6.1010100@redhat.com> Hi guys, sorry for the long post, but I do think I ran into an important problem and we need to fix it ... :-) I've spent the last couple of days running the IspnPerfTest [1] perftest on Google Compute Engine (GCE), and I've run into a problem with Infinispan. It is a design problem and can be mitigated by sizing thread pools correctly, but cannot be eliminated entirely. Symptom: -------- IspnPerfTest has every node in a cluster perform 20'000 requests on keys in range [1..20000]. 
80% of the requests are reads and 20% writes.

By default, we have 25 requester threads per node and 100 nodes in a cluster, so a total of 2500 requester threads.

The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners:

It has 2 owners, a lock acquisition timeout of 5s and a repl timeout of 20s. Lock striping is off, so we have 1 lock per key.

When I run the test, I always get errors like those below:

org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [19386] for requestor [Thread[invoker-3,5,main]]! Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]]

and

org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed out

Investigation:
------------
When I looked at UNICAST3, I saw a lot of missing messages on the receive side and unacked messages on the send side. This caused me to look into the (mainly OOB) thread pools and - voila - maxed out !

I learned from Pedro that the Infinispan internal thread pool (with a default of 32 threads) can be configured, so I increased it to 300 and increased the OOB pools as well.

This mitigated the problem somewhat, but when I increased the requester threads to 100, I had the same problem again. Apparently, the Infinispan internal thread pool uses a rejection policy of "run" and thus uses the JGroups (OOB) thread when exhausted.

I learned (from Pedro and Mircea) that GETs and PUTs work as follows in dist-sync / 2 owners:
- GETs are sent to the primary and backup owners and the first response received is returned to the caller. No locks are acquired, so GETs shouldn't cause problems.
- A PUT(K) is sent to the primary owner of K
- The primary owner
     (1) locks K
     (2) updates the backup owner synchronously *while holding the lock*
     (3) releases the lock

Hypothesis
----------
(2) above is done while holding the lock. The sync update of the backup owner is done with the lock held to guarantee that the primary and backup owner of K have the same values for K.

However, the sync update *inside the lock scope* slows things down (can it also lead to deadlocks?); there's the risk that the request is dropped due to a full incoming thread pool, or that the response is not received because of the same, or that the locking at the backup owner blocks for some time.

If we have many threads modifying the same key, then we have a backlog of locking work against that key. Say we have 100 requester threads and a 100 node cluster. This means that we have 10'000 threads accessing keys; with 2'000 writers there's a big chance that some writers pick the same key at the same time.

For example, if we have 100 threads accessing key K and it takes 3ms to replicate K to the backup owner, then the last of the 100 threads waits ~300ms before it gets a chance to lock K on the primary owner and replicate it as well.

Just a small hiccup in sending the PUT to the primary owner, sending the modification to the backup owner, waiting for the response, or GC, and the delay will quickly become bigger.

Verification
----------
To verify the above, I set numOwners to 1. This means that the primary owner of K does *not* send the modification to the backup owner, it only locks K, modifies K and unlocks K again.

I ran the IspnPerfTest again on 100 nodes, with 25 requesters, and NO PROBLEM !

I then increased the requesters to 100, 150 and 200 and the test completed flawlessly ! Performance was around *40'000 requests per node per sec* on 4-core boxes !
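To make the backlog above concrete, here is a minimal sketch in plain Java of the write path the Hypothesis section describes. All names (PrimaryOwnerSketch, syncRpcToBackup) are illustrative stand-ins, not actual Infinispan classes or APIs; the only point is that the synchronous backup RPC is issued while the per-key lock is still held, so N concurrent writers of the same key serialize behind roughly N times the replication latency (the ~300ms figure above for 100 writers at 3ms each).

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative model of a primary owner handling PUT(K): the sync RPC to the
    // backup owner is issued while the per-key lock is still held, so with N
    // concurrent writers of K and an RPC latency of t, the last writer waits ~N*t.
    public class PrimaryOwnerSketch {

       private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
       private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

       public void put(String key, String value) throws InterruptedException {
          ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
          lock.lock();                       // (1) lock K
          try {
             data.put(key, value);           // local update
             syncRpcToBackup(key, value);    // (2) sync RPC to backup *inside* the lock scope
          } finally {
             lock.unlock();                  // (3) release the lock
          }
       }

       // Stand-in for the synchronous replication call; ~3 ms in the scenario above,
       // and potentially much longer when the remote/OOB pools on the backup are full.
       private void syncRpcToBackup(String key, String value) throws InterruptedException {
          Thread.sleep(3);
       }
    }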
Root cause
---------
*******************
The root cause is the sync RPC of K to the backup owner(s) of K while the primary owner holds the lock for K.
*******************

This causes a backlog of threads waiting for the lock and that backlog can grow to exhaust the thread pools. First the Infinispan internal thread pool, then the JGroups OOB thread pool. The latter causes retransmissions to get dropped, which compounds the problem...

Goal
----
The goal is to make sure that primary and backup owner(s) of K have the same value for K.

Simply sending the modification to the backup owner(s) asynchronously won't guarantee this, as modification messages might get processed out of order as they're OOB !

Suggested solution
----------------
The modification RPC needs to be invoked *outside of the lock scope*:
- lock K
- modify K
- unlock K
- send modification to backup owner(s) // outside the lock scope

The primary owner puts the modification of K into a queue from where a separate thread/task removes it. The thread then invokes the PUT(K) on the backup owner(s).

The queue has the modified keys in FIFO order, so the modifications arrive at the backup owner(s) in the right order.

This requires that the way GET is implemented changes slightly: instead of invoking a GET on all owners of K, we only invoke it on the primary owner, then the next-in-line etc.

The reason for this is that the backup owner(s) may not yet have received the modification of K.

This is a better impl anyway (we discussed this before) because it generates less traffic; in the normal case, all but 1 of the GET requests are unnecessary.

Improvement
-----------
The above solution can be simplified and even made more efficient. Re-using concepts from IRAC [2], we can simply store the modified *keys* in the modification queue. The modification replication thread removes the key, gets the current value and invokes a PUT/REMOVE on the backup owner(s).

Even better: a key is only ever added *once*, so if we have [5,2,17,3], adding key 2 is a no-op because the processing of key 2 (in second position in the queue) will fetch the up-to-date value anyway !

Misc
----
- Could we possibly use total order to send the updates in TO ? TBD (Pedro?)

Thoughts ?

[1] https://github.com/belaban/IspnPerfTest
[2] https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering

--
Bela Ban, JGroups lead (http://www.jgroups.org)

From bban at redhat.com Tue Jul 29 10:38:02 2014
From: bban at redhat.com (Bela Ban)
Date: Tue, 29 Jul 2014 16:38:02 +0200
Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution
In-Reply-To: <53D7A7F6.1010100@redhat.com>
References: <53D7A7F6.1010100@redhat.com>
Message-ID: <53D7B1CA.9000309@redhat.com>

Following up on my own email, I changed the config to use Pedro's excellent total order implementation:

With 100 nodes and 25 requester threads/node, I did NOT run into any locking issues !

I could even go up to 200 requester threads/node and the perf was ~ 7'000-8'000 requests/sec/node. Not too bad !

This really validates the concept of lockless total-order dissemination of TXs; for the first time, this has been tested on a large(r) scale (previously only on 25 nodes) and IT WORKS ! :-)

I still believe we should implement my suggested solution for non-TO configs, but short of configuring thread pools of 1000 threads or higher, I hope TO will allow me to finally test a 500 node Infinispan cluster !
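For the non-TO path, a minimal sketch of the per-key modification queue suggested in the previous message might look roughly like the following. Everything here is an assumption for illustration (BackupReplicator, putOnBackupOwners and so on are made-up names, not existing Infinispan APIs): a modified key is enqueued at most once, and a single replication thread dequeues keys in FIFO order, reads the current value and pushes it to the backup owner(s) outside the lock scope.

    import java.util.LinkedHashSet;
    import java.util.Map;

    // Illustrative sketch of the "Improvement" idea: the queue holds modified keys
    // (each at most once, in FIFO order); a dedicated thread dequeues a key, reads
    // the current value and replicates it to the backup owner(s) outside the lock scope.
    public class BackupReplicator implements Runnable {

       private final LinkedHashSet<String> modifiedKeys = new LinkedHashSet<>(); // FIFO order + dedup
       private final Map<String, String> data;                                   // the primary owner's local store

       public BackupReplicator(Map<String, String> data) {
          this.data = data;
       }

       // Called by the primary owner after "lock K; modify K; unlock K".
       public synchronized void enqueue(String key) {
          modifiedKeys.add(key);   // no-op if the key is already queued
          notifyAll();
       }

       private synchronized String takeNextKey() throws InterruptedException {
          while (modifiedKeys.isEmpty())
             wait();
          String key = modifiedKeys.iterator().next();
          modifiedKeys.remove(key);
          return key;
       }

       @Override
       public void run() {
          try {
             while (!Thread.currentThread().isInterrupted()) {
                String key = takeNextKey();
                // Read the *current* value: even if the key was modified several times
                // while queued, a single replication carries the latest state.
                String value = data.get(key);
                putOnBackupOwners(key, value);
             }
          } catch (InterruptedException e) {
             Thread.currentThread().interrupt();
          }
       }

       // Stand-in for the RPC to the backup owner(s); not an actual Infinispan call.
       private void putOnBackupOwners(String key, String value) {
          // send PUT(key, value), or REMOVE(key) if value == null
       }
    }

A real implementation would additionally have to deal with topology changes and with the modified GET path (ask the primary owner first), as discussed in the quoted message below.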
On 29/07/14 15:56, Bela Ban wrote: > Hi guys, > > sorry for the long post, but I do think I ran into an important problem > and we need to fix it ... :-) > > I've spent the last couple of days running the IspnPerfTest [1] perftest > on Google Compute Engine (GCE), and I've run into a problem with > Infinispan. It is a design problem and can be mitigated by sizing thread > pools correctly, but cannot be eliminated entirely. > > > Symptom: > -------- > IspnPerfTest has every node in a cluster perform 20'000 requests on keys > in range [1..20000]. > > 80% of the requests are reads and 20% writes. > > By default, we have 25 requester threads per node and 100 nodes in a > cluster, so a total of 2500 requester threads. > > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: > > > > > > > > > useEagerLocking="true" > eagerLockSingleNode="true" /> > isolationLevel="READ_COMMITTED" useLockStriping="false" /> > > > It has 2 owners, a lock acquisition timeout of 5s and a repl timeout of > 20s. Lock stripting is off, so we have 1 lock per key. > > When I run the test, I always get errors like those below: > > org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock > after [10 seconds] on key [19386] for requestor [Thread[invoker-3,5,main]]! > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > > and > > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed out > > > Investigation: > ------------ > When I looked at UNICAST3, I saw a lot of missing messages on the > receive side and unacked messages on the send side. This caused me to > look into the (mainly OOB) thread pools and - voila - maxed out ! > > I learned from Pedro that the Infinispan internal thread pool (with a > default of 32 threads) can be configured, so I increased it to 300 and > increased the OOB pools as well. > > This mitigated the problem somewhat, but when I increased the requester > threads to 100, I had the same problem again. Apparently, the Infinispan > internal thread pool uses a rejection policy of "run" and thus uses the > JGroups (OOB) thread when exhausted. > > I learned (from Pedro and Mircea) that GETs and PUTs work as follows in > dist-sync / 2 owners: > - GETs are sent to the primary and backup owners and the first response > received is returned to the caller. No locks are acquired, so GETs > shouldn't cause problems. > > - A PUT(K) is sent to the primary owner of K > - The primary owner > (1) locks K > (2) updates the backup owner synchronously *while holding the lock* > (3) releases the lock > > > Hypothesis > ---------- > (2) above is done while holding the lock. The sync update of the backup > owner is done with the lock held to guarantee that the primary and > backup owner of K have the same values for K. > > However, the sync update *inside the lock scope* slows things down (can > it also lead to deadlocks?); there's the risk that the request is > dropped due to a full incoming thread pool, or that the response is not > received because of the same, or that the locking at the backup owner > blocks for some time. > > If we have many threads modifying the same key, then we have a backlog > of locking work against that key. Say we have 100 requester threads and > a 100 node cluster. This means that we have 10'000 threads accessing > keys; with 2'000 writers there's a big chance that some writers pick the > same key at the same time. 
> > For example, if we have 100 threads accessing key K and it takes 3ms to > replicate K to the backup owner, then the last of the 100 threads waits > ~300ms before it gets a chance to lock K on the primary owner and > replicate it as well. > > Just a small hiccup in sending the PUT to the primary owner, sending the > modification to the backup owner, waitting for the response, or GC, and > the delay will quickly become bigger. > > > Verification > ---------- > To verify the above, I set numOwners to 1. This means that the primary > owner of K does *not* send the modification to the backup owner, it only > locks K, modifies K and unlocks K again. > > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, and NO > PROBLEM ! > > I then increased the requesters to 100, 150 and 200 and the test > completed flawlessly ! Performance was around *40'000 requests per node > per sec* on 4-core boxes ! > > > Root cause > --------- > ******************* > The root cause is the sync RPC of K to the backup owner(s) of K while > the primary owner holds the lock for K. > ******************* > > This causes a backlog of threads waiting for the lock and that backlog > can grow to exhaust the thread pools. First the Infinispan internal > thread pool, then the JGroups OOB thread pool. The latter causes > retransmissions to get dropped, which compounds the problem... > > > Goal > ---- > The goal is to make sure that primary and backup owner(s) of K have the > same value for K. > > Simply sending the modification to the backup owner(s) asynchronously > won't guarantee this, as modification messages might get processed out > of order as they're OOB ! > > > Suggested solution > ---------------- > The modification RPC needs to be invoked *outside of the lock scope*: > - lock K > - modify K > - unlock K > - send modification to backup owner(s) // outside the lock scope > > The primary owner puts the modification of K into a queue from where a > separate thread/task removes it. The thread then invokes the PUT(K) on > the backup owner(s). > > The queue has the modified keys in FIFO order, so the modifications > arrive at the backup owner(s) in the right order. > > This requires that the way GET is implemented changes slightly: instead > of invoking a GET on all owners of K, we only invoke it on the primary > owner, then the next-in-line etc. > > The reason for this is that the backup owner(s) may not yet have > received the modification of K. > > This is a better impl anyway (we discussed this before) becuse it > generates less traffic; in the normal case, all but 1 GET requests are > unnecessary. > > > > Improvement > ----------- > The above solution can be simplified and even made more efficient. > Re-using concepts from IRAC [2], we can simply store the modified *keys* > in the modification queue. The modification replication thread removes > the key, gets the current value and invokes a PUT/REMOVE on the backup > owner(s). > > Even better: a key is only ever added *once*, so if we have [5,2,17,3], > adding key 2 is a no-op because the processing of key 2 (in second > position in the queue) will fetch the up-to-date value anyway ! > > > Misc > ---- > - Could we possibly use total order to send the updates in TO ? TBD (Pedro?) > > > Thoughts ? 
> > > [1] https://github.com/belaban/IspnPerfTest > [2] > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > -- Bela Ban, JGroups lead (http://www.jgroups.org) From dan.berindei at gmail.com Tue Jul 29 10:39:04 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 29 Jul 2014 17:39:04 +0300 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: <53D7A7F6.1010100@redhat.com> References: <53D7A7F6.1010100@redhat.com> Message-ID: On Tue, Jul 29, 2014 at 4:56 PM, Bela Ban wrote: > Hi guys, > > sorry for the long post, but I do think I ran into an important problem > and we need to fix it ... :-) > > I've spent the last couple of days running the IspnPerfTest [1] perftest > on Google Compute Engine (GCE), and I've run into a problem with > Infinispan. It is a design problem and can be mitigated by sizing thread > pools correctly, but cannot be eliminated entirely. > > > Symptom: > -------- > IspnPerfTest has every node in a cluster perform 20'000 requests on keys > in range [1..20000]. > > 80% of the requests are reads and 20% writes. > > By default, we have 25 requester threads per node and 100 nodes in a > cluster, so a total of 2500 requester threads. > > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: > > > > > > > > > useEagerLocking="true" > eagerLockSingleNode="true" /> > isolationLevel="READ_COMMITTED" useLockStriping="false" /> > > > It has 2 owners, a lock acquisition timeout of 5s and a repl timeout of > 20s. Lock stripting is off, so we have 1 lock per key. > > When I run the test, I always get errors like those below: > > org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock > after [10 seconds] on key [19386] for requestor [Thread[invoker-3,5,main]]! > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > > and > > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed out > > > Investigation: > ------------ > When I looked at UNICAST3, I saw a lot of missing messages on the > receive side and unacked messages on the send side. This caused me to > look into the (mainly OOB) thread pools and - voila - maxed out ! > > I learned from Pedro that the Infinispan internal thread pool (with a > default of 32 threads) can be configured, so I increased it to 300 and > increased the OOB pools as well. > > This mitigated the problem somewhat, but when I increased the requester > threads to 100, I had the same problem again. Apparently, the Infinispan > internal thread pool uses a rejection policy of "run" and thus uses the > JGroups (OOB) thread when exhausted. > We can't use another rejection policy in the remote executor because the message won't be re-delivered by JGroups, and we can't use a queue either. Pedro is working on ISPN-2849, which should help with the remote/OOB thread pool exhaustion. It is a bit tricky, though, because our interceptors assume they will be able to access stack variables after replication. > > I learned (from Pedro and Mircea) that GETs and PUTs work as follows in > dist-sync / 2 owners: > - GETs are sent to the primary and backup owners and the first response > received is returned to the caller. No locks are acquired, so GETs > shouldn't cause problems. > > - A PUT(K) is sent to the primary owner of K > - The primary owner > (1) locks K > (2) updates the backup owner synchronously *while holding the lock* > (3) releases the lock > > > Hypothesis > ---------- > (2) above is done while holding the lock. 
The sync update of the backup > owner is done with the lock held to guarantee that the primary and > backup owner of K have the same values for K. > And something else: if the primary owner reports that a write was successful and then dies, a read should find the updated value on the backup owner(s). > However, the sync update *inside the lock scope* slows things down (can > it also lead to deadlocks?); there's the risk that the request is > dropped due to a full incoming thread pool, or that the response is not > received because of the same, or that the locking at the backup owner > blocks for some time. > There is no locking on the backup owner, so there are no deadlocks. There is indeed a risk of the OOB/remote thread pools being full. > > If we have many threads modifying the same key, then we have a backlog > of locking work against that key. Say we have 100 requester threads and > a 100 node cluster. This means that we have 10'000 threads accessing > keys; with 2'000 writers there's a big chance that some writers pick the > same key at the same time. > > For example, if we have 100 threads accessing key K and it takes 3ms to > replicate K to the backup owner, then the last of the 100 threads waits > ~300ms before it gets a chance to lock K on the primary owner and > replicate it as well. > > Just a small hiccup in sending the PUT to the primary owner, sending the > modification to the backup owner, waitting for the response, or GC, and > the delay will quickly become bigger. > > > Verification > ---------- > To verify the above, I set numOwners to 1. This means that the primary > owner of K does *not* send the modification to the backup owner, it only > locks K, modifies K and unlocks K again. > > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, and NO > PROBLEM ! > > I then increased the requesters to 100, 150 and 200 and the test > completed flawlessly ! Performance was around *40'000 requests per node > per sec* on 4-core boxes ! > > > Root cause > --------- > ******************* > The root cause is the sync RPC of K to the backup owner(s) of K while > the primary owner holds the lock for K. > ******************* > > This causes a backlog of threads waiting for the lock and that backlog > can grow to exhaust the thread pools. First the Infinispan internal > thread pool, then the JGroups OOB thread pool. The latter causes > retransmissions to get dropped, which compounds the problem... > > > Goal > ---- > The goal is to make sure that primary and backup owner(s) of K have the > same value for K. > > Simply sending the modification to the backup owner(s) asynchronously > won't guarantee this, as modification messages might get processed out > of order as they're OOB ! > > > Suggested solution > ---------------- > The modification RPC needs to be invoked *outside of the lock scope*: > - lock K > - modify K > - unlock K > - send modification to backup owner(s) // outside the lock scope > > The primary owner puts the modification of K into a queue from where a > separate thread/task removes it. The thread then invokes the PUT(K) on > the backup owner(s). > Does the replication thread execute the PUT(k) synchronously, or asynchronously? I assume asynchronously, otherwise the replication thread wouldn't be able to keep up with the writers. > > The queue has the modified keys in FIFO order, so the modifications > arrive at the backup owner(s) in the right order. > Sending the RPC to the backup owners asynchronously, while holding the key lock, would do the same thing. 
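A minimal sketch of that alternative, with purely illustrative names (putOnBackupOwners is not a real Infinispan or JGroups call), could be: the asynchronous send is still issued while the per-key lock is held, so sends for the same key leave the primary in modification order, but the writer no longer waits for the backup owner's response.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative sketch: the RPC to the backup owner(s) is submitted asynchronously
    // but still *inside* the key lock, so per-key sends are issued in modification
    // order while the writer does not block on the backup's reply.
    public class AsyncUnderLockSketch {

       private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
       private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
       private final ExecutorService sender = Executors.newSingleThreadExecutor(); // stand-in for an ordered async transport

       public void put(String key, String value) {
          ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
          lock.lock();
          try {
             data.put(key, value);
             sender.submit(() -> putOnBackupOwners(key, value)); // fire-and-forget, issued under the lock
          } finally {
             lock.unlock();
          }
       }

       // Stand-in for the replication RPC; not an actual Infinispan call.
       private void putOnBackupOwners(String key, String value) {
       }
    }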
> > This requires that the way GET is implemented changes slightly: instead > of invoking a GET on all owners of K, we only invoke it on the primary > owner, then the next-in-line etc. > What's the next-in-line owner? A backup won't have the last version of the data. > The reason for this is that the backup owner(s) may not yet have > received the modification of K. > > OTOH, if the primary owner dies, we have to ask a backup, and we can lose the modifications not yet replicated by the primary. > This is a better impl anyway (we discussed this before) becuse it > generates less traffic; in the normal case, all but 1 GET requests are > unnecessary. > > I have a WIP branch for this and it seemed to work fine. Test suite speed seemed about the same, but I didn't get to do a real performance test. > > > Improvement > ----------- > The above solution can be simplified and even made more efficient. > Re-using concepts from IRAC [2], we can simply store the modified *keys* > in the modification queue. The modification replication thread removes > the key, gets the current value and invokes a PUT/REMOVE on the backup > owner(s). > > Even better: a key is only ever added *once*, so if we have [5,2,17,3], > adding key 2 is a no-op because the processing of key 2 (in second > position in the queue) will fetch the up-to-date value anyway ! > > > Misc > ---- > - Could we possibly use total order to send the updates in TO ? TBD > (Pedro?) > > > Thoughts ? > > > [1] https://github.com/belaban/IspnPerfTest > [2] > > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140729/a004600e/attachment.html From dan.berindei at gmail.com Tue Jul 29 10:42:29 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 29 Jul 2014 17:42:29 +0300 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: <53D7B1CA.9000309@redhat.com> References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> Message-ID: Have you tried regular optimistic/pessimistic transactions as well? They *should* have less issues with the OOB thread pool than non-tx mode, and I'm quite curious how they stack against TO in such a large cluster. On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban wrote: > Following up on my own email, I changed the config to use Pedro's > excellent total order implementation: > > transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" > useEagerLocking="true" eagerLockSingleNode="true"> > > > With 100 nodes and 25 requester threads/node, I did NOT run into any > locking issues ! > > I could even go up to 200 requester threads/node and the perf was ~ > 7'000-8'000 requests/sec/node. Not too bad ! > > This really validates the concept of lockless total-order dissemination > of TXs; for the first time, this has been tested on a large(r) scale > (previously only on 25 nodes) and IT WORKS ! :-) > > I still believe we should implement my suggested solution for non-TO > configs, but short of configuring thread pools of 1000 threads or > higher, I hope TO will allow me to finally test a 500 node Infinispan > cluster ! 
> > > On 29/07/14 15:56, Bela Ban wrote: > > Hi guys, > > > > sorry for the long post, but I do think I ran into an important problem > > and we need to fix it ... :-) > > > > I've spent the last couple of days running the IspnPerfTest [1] perftest > > on Google Compute Engine (GCE), and I've run into a problem with > > Infinispan. It is a design problem and can be mitigated by sizing thread > > pools correctly, but cannot be eliminated entirely. > > > > > > Symptom: > > -------- > > IspnPerfTest has every node in a cluster perform 20'000 requests on keys > > in range [1..20000]. > > > > 80% of the requests are reads and 20% writes. > > > > By default, we have 25 requester threads per node and 100 nodes in a > > cluster, so a total of 2500 requester threads. > > > > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: > > > > > > > > > > > > > > > > > > > useEagerLocking="true" > > eagerLockSingleNode="true" /> > > > isolationLevel="READ_COMMITTED" useLockStriping="false" > /> > > > > > > It has 2 owners, a lock acquisition timeout of 5s and a repl timeout of > > 20s. Lock stripting is off, so we have 1 lock per key. > > > > When I run the test, I always get errors like those below: > > > > org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock > > after [10 seconds] on key [19386] for requestor > [Thread[invoker-3,5,main]]! > > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > > > > and > > > > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed out > > > > > > Investigation: > > ------------ > > When I looked at UNICAST3, I saw a lot of missing messages on the > > receive side and unacked messages on the send side. This caused me to > > look into the (mainly OOB) thread pools and - voila - maxed out ! > > > > I learned from Pedro that the Infinispan internal thread pool (with a > > default of 32 threads) can be configured, so I increased it to 300 and > > increased the OOB pools as well. > > > > This mitigated the problem somewhat, but when I increased the requester > > threads to 100, I had the same problem again. Apparently, the Infinispan > > internal thread pool uses a rejection policy of "run" and thus uses the > > JGroups (OOB) thread when exhausted. > > > > I learned (from Pedro and Mircea) that GETs and PUTs work as follows in > > dist-sync / 2 owners: > > - GETs are sent to the primary and backup owners and the first response > > received is returned to the caller. No locks are acquired, so GETs > > shouldn't cause problems. > > > > - A PUT(K) is sent to the primary owner of K > > - The primary owner > > (1) locks K > > (2) updates the backup owner synchronously *while holding the > lock* > > (3) releases the lock > > > > > > Hypothesis > > ---------- > > (2) above is done while holding the lock. The sync update of the backup > > owner is done with the lock held to guarantee that the primary and > > backup owner of K have the same values for K. > > > > However, the sync update *inside the lock scope* slows things down (can > > it also lead to deadlocks?); there's the risk that the request is > > dropped due to a full incoming thread pool, or that the response is not > > received because of the same, or that the locking at the backup owner > > blocks for some time. > > > > If we have many threads modifying the same key, then we have a backlog > > of locking work against that key. Say we have 100 requester threads and > > a 100 node cluster. 
This means that we have 10'000 threads accessing > > keys; with 2'000 writers there's a big chance that some writers pick the > > same key at the same time. > > > > For example, if we have 100 threads accessing key K and it takes 3ms to > > replicate K to the backup owner, then the last of the 100 threads waits > > ~300ms before it gets a chance to lock K on the primary owner and > > replicate it as well. > > > > Just a small hiccup in sending the PUT to the primary owner, sending the > > modification to the backup owner, waitting for the response, or GC, and > > the delay will quickly become bigger. > > > > > > Verification > > ---------- > > To verify the above, I set numOwners to 1. This means that the primary > > owner of K does *not* send the modification to the backup owner, it only > > locks K, modifies K and unlocks K again. > > > > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, and NO > > PROBLEM ! > > > > I then increased the requesters to 100, 150 and 200 and the test > > completed flawlessly ! Performance was around *40'000 requests per node > > per sec* on 4-core boxes ! > > > > > > Root cause > > --------- > > ******************* > > The root cause is the sync RPC of K to the backup owner(s) of K while > > the primary owner holds the lock for K. > > ******************* > > > > This causes a backlog of threads waiting for the lock and that backlog > > can grow to exhaust the thread pools. First the Infinispan internal > > thread pool, then the JGroups OOB thread pool. The latter causes > > retransmissions to get dropped, which compounds the problem... > > > > > > Goal > > ---- > > The goal is to make sure that primary and backup owner(s) of K have the > > same value for K. > > > > Simply sending the modification to the backup owner(s) asynchronously > > won't guarantee this, as modification messages might get processed out > > of order as they're OOB ! > > > > > > Suggested solution > > ---------------- > > The modification RPC needs to be invoked *outside of the lock scope*: > > - lock K > > - modify K > > - unlock K > > - send modification to backup owner(s) // outside the lock scope > > > > The primary owner puts the modification of K into a queue from where a > > separate thread/task removes it. The thread then invokes the PUT(K) on > > the backup owner(s). > > > > The queue has the modified keys in FIFO order, so the modifications > > arrive at the backup owner(s) in the right order. > > > > This requires that the way GET is implemented changes slightly: instead > > of invoking a GET on all owners of K, we only invoke it on the primary > > owner, then the next-in-line etc. > > > > The reason for this is that the backup owner(s) may not yet have > > received the modification of K. > > > > This is a better impl anyway (we discussed this before) becuse it > > generates less traffic; in the normal case, all but 1 GET requests are > > unnecessary. > > > > > > > > Improvement > > ----------- > > The above solution can be simplified and even made more efficient. > > Re-using concepts from IRAC [2], we can simply store the modified *keys* > > in the modification queue. The modification replication thread removes > > the key, gets the current value and invokes a PUT/REMOVE on the backup > > owner(s). > > > > Even better: a key is only ever added *once*, so if we have [5,2,17,3], > > adding key 2 is a no-op because the processing of key 2 (in second > > position in the queue) will fetch the up-to-date value anyway ! 
> > > > > > Misc > > ---- > > - Could we possibly use total order to send the updates in TO ? TBD > (Pedro?) > > > > > > Thoughts ? > > > > > > [1] https://github.com/belaban/IspnPerfTest > > [2] > > > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > > > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140729/987ddb2c/attachment-0001.html From bban at redhat.com Tue Jul 29 10:50:27 2014 From: bban at redhat.com (Bela Ban) Date: Tue, 29 Jul 2014 16:50:27 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> Message-ID: <53D7B4B3.4080008@redhat.com> On 29/07/14 16:42, Dan Berindei wrote: > Have you tried regular optimistic/pessimistic transactions as well? Yes, in my first impl. but since I'm making only 1 change per request, I thought a TX is overkill. > They *should* have less issues with the OOB thread pool than non-tx mode, and > I'm quite curious how they stack against TO in such a large cluster. Why would they have fewer issues with the thread pools ? AIUI, a TX involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when not using TXs. And we're sync anyway... > On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban > wrote: > > Following up on my own email, I changed the config to use Pedro's > excellent total order implementation: > > transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" > useEagerLocking="true" eagerLockSingleNode="true"> > > > With 100 nodes and 25 requester threads/node, I did NOT run into any > locking issues ! > > I could even go up to 200 requester threads/node and the perf was ~ > 7'000-8'000 requests/sec/node. Not too bad ! > > This really validates the concept of lockless total-order dissemination > of TXs; for the first time, this has been tested on a large(r) scale > (previously only on 25 nodes) and IT WORKS ! :-) > > I still believe we should implement my suggested solution for non-TO > configs, but short of configuring thread pools of 1000 threads or > higher, I hope TO will allow me to finally test a 500 node Infinispan > cluster ! > > > On 29/07/14 15:56, Bela Ban wrote: > > Hi guys, > > > > sorry for the long post, but I do think I ran into an important > problem > > and we need to fix it ... :-) > > > > I've spent the last couple of days running the IspnPerfTest [1] > perftest > > on Google Compute Engine (GCE), and I've run into a problem with > > Infinispan. It is a design problem and can be mitigated by sizing > thread > > pools correctly, but cannot be eliminated entirely. > > > > > > Symptom: > > -------- > > IspnPerfTest has every node in a cluster perform 20'000 requests > on keys > > in range [1..20000]. > > > > 80% of the requests are reads and 20% writes. > > > > By default, we have 25 requester threads per node and 100 nodes in a > > cluster, so a total of 2500 requester threads. 
> > > > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: > > > > > > > > > > > > > > > > > > > useEagerLocking="true" > > eagerLockSingleNode="true" /> > > > isolationLevel="READ_COMMITTED" > useLockStriping="false" /> > > > > > > It has 2 owners, a lock acquisition timeout of 5s and a repl > timeout of > > 20s. Lock stripting is off, so we have 1 lock per key. > > > > When I run the test, I always get errors like those below: > > > > org.infinispan.util.concurrent.TimeoutException: Unable to > acquire lock > > after [10 seconds] on key [19386] for requestor > [Thread[invoker-3,5,main]]! > > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > > > > and > > > > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed out > > > > > > Investigation: > > ------------ > > When I looked at UNICAST3, I saw a lot of missing messages on the > > receive side and unacked messages on the send side. This caused me to > > look into the (mainly OOB) thread pools and - voila - maxed out ! > > > > I learned from Pedro that the Infinispan internal thread pool (with a > > default of 32 threads) can be configured, so I increased it to > 300 and > > increased the OOB pools as well. > > > > This mitigated the problem somewhat, but when I increased the > requester > > threads to 100, I had the same problem again. Apparently, the > Infinispan > > internal thread pool uses a rejection policy of "run" and thus > uses the > > JGroups (OOB) thread when exhausted. > > > > I learned (from Pedro and Mircea) that GETs and PUTs work as > follows in > > dist-sync / 2 owners: > > - GETs are sent to the primary and backup owners and the first > response > > received is returned to the caller. No locks are acquired, so GETs > > shouldn't cause problems. > > > > - A PUT(K) is sent to the primary owner of K > > - The primary owner > > (1) locks K > > (2) updates the backup owner synchronously *while holding > the lock* > > (3) releases the lock > > > > > > Hypothesis > > ---------- > > (2) above is done while holding the lock. The sync update of the > backup > > owner is done with the lock held to guarantee that the primary and > > backup owner of K have the same values for K. > > > > However, the sync update *inside the lock scope* slows things > down (can > > it also lead to deadlocks?); there's the risk that the request is > > dropped due to a full incoming thread pool, or that the response > is not > > received because of the same, or that the locking at the backup owner > > blocks for some time. > > > > If we have many threads modifying the same key, then we have a > backlog > > of locking work against that key. Say we have 100 requester > threads and > > a 100 node cluster. This means that we have 10'000 threads accessing > > keys; with 2'000 writers there's a big chance that some writers > pick the > > same key at the same time. > > > > For example, if we have 100 threads accessing key K and it takes > 3ms to > > replicate K to the backup owner, then the last of the 100 threads > waits > > ~300ms before it gets a chance to lock K on the primary owner and > > replicate it as well. > > > > Just a small hiccup in sending the PUT to the primary owner, > sending the > > modification to the backup owner, waitting for the response, or > GC, and > > the delay will quickly become bigger. > > > > > > Verification > > ---------- > > To verify the above, I set numOwners to 1. 
This means that the > primary > > owner of K does *not* send the modification to the backup owner, > it only > > locks K, modifies K and unlocks K again. > > > > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, and NO > > PROBLEM ! > > > > I then increased the requesters to 100, 150 and 200 and the test > > completed flawlessly ! Performance was around *40'000 requests > per node > > per sec* on 4-core boxes ! > > > > > > Root cause > > --------- > > ******************* > > The root cause is the sync RPC of K to the backup owner(s) of K while > > the primary owner holds the lock for K. > > ******************* > > > > This causes a backlog of threads waiting for the lock and that > backlog > > can grow to exhaust the thread pools. First the Infinispan internal > > thread pool, then the JGroups OOB thread pool. The latter causes > > retransmissions to get dropped, which compounds the problem... > > > > > > Goal > > ---- > > The goal is to make sure that primary and backup owner(s) of K > have the > > same value for K. > > > > Simply sending the modification to the backup owner(s) asynchronously > > won't guarantee this, as modification messages might get > processed out > > of order as they're OOB ! > > > > > > Suggested solution > > ---------------- > > The modification RPC needs to be invoked *outside of the lock scope*: > > - lock K > > - modify K > > - unlock K > > - send modification to backup owner(s) // outside the lock scope > > > > The primary owner puts the modification of K into a queue from > where a > > separate thread/task removes it. The thread then invokes the > PUT(K) on > > the backup owner(s). > > > > The queue has the modified keys in FIFO order, so the modifications > > arrive at the backup owner(s) in the right order. > > > > This requires that the way GET is implemented changes slightly: > instead > > of invoking a GET on all owners of K, we only invoke it on the > primary > > owner, then the next-in-line etc. > > > > The reason for this is that the backup owner(s) may not yet have > > received the modification of K. > > > > This is a better impl anyway (we discussed this before) becuse it > > generates less traffic; in the normal case, all but 1 GET > requests are > > unnecessary. > > > > > > > > Improvement > > ----------- > > The above solution can be simplified and even made more efficient. > > Re-using concepts from IRAC [2], we can simply store the modified > *keys* > > in the modification queue. The modification replication thread > removes > > the key, gets the current value and invokes a PUT/REMOVE on the > backup > > owner(s). > > > > Even better: a key is only ever added *once*, so if we have > [5,2,17,3], > > adding key 2 is a no-op because the processing of key 2 (in second > > position in the queue) will fetch the up-to-date value anyway ! > > > > > > Misc > > ---- > > - Could we possibly use total order to send the updates in TO ? > TBD (Pedro?) > > > > > > Thoughts ? 
> > > > > > [1] https://github.com/belaban/IspnPerfTest > > [2] > > > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > > > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From sanne at infinispan.org Tue Jul 29 11:34:36 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 29 Jul 2014 16:34:36 +0100 Subject: [infinispan-dev] On ParserRegistry and classloaders In-Reply-To: <53831239.30601@redhat.com> References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com> <53831239.30601@redhat.com> Message-ID: All, in Search we wrap the Parser in a decorator which workarounds the classloader limitation. I still think you should fix this, it doesn't matter how/why it was changed. Sanne On 26 May 2014 11:06, Ion Savin wrote: > Hi Sanne, Galder, > > On 05/23/2014 07:08 PM, Sanne Grinovero wrote: >> On 23 May 2014 08:03, Galder Zamarre?o wrote: >>> >Hey Sanne, >>> > >>> >I?ve looked at ParserRegistry and not sure I see the changes you are referring to? >>> > >>> >>From what I?ve seen, ParserRegistry has taken class loader in the constructor since the start. >> Yes, and that was good as we've been using it: it might need >> directions to be pointed at the right modules to load extension >> points. >> >> My problem is not that the constructor takes a ClassLoader, but that >> other options have been removed; essentially in my scenario the module >> containing the extension points does not contain the configuration >> file I want it to load, and the actual classLoader I want the >> CacheManager to use is yet a different one. As explained below, >> assembling a single "catch all" ClassLoader to delegate to all doesn't >> work as some of these actually need to be strictly isolated to prevent >> ambiguities. >> >>> >I suspect you might be referring to classloader related changes as a result of OSGI integration? >> I didn't check but that sounds like a reasonable estimate. > > I had a look at the OSGi-related changes done for this class and they > don't alter the class interface in any way. The implementation changes > related to FileLookup seem to maintain the same behavior for non-OSGi > contexts also. > > Regards, > Ion Savin > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Tue Jul 29 14:06:31 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 29 Jul 2014 19:06:31 +0100 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: <53D7B4B3.4080008@redhat.com> References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: This is a nasty problem and I also feel passionately we need to get rid of it ASAP. I did have the same problems many times, and we discussed this also in Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this. 
You don't need TO, and you don't need to lock at all as long as you guarantee the backup owners are getting the number with some monotonicity sequence attached to it, all that backup owners need to do is ignore incoming commands which are outdated. Another aspect is that the "user thread" on the primary owner needs to wait (at least until we improve further) and only proceed after ACK from backup nodes, but this is better modelled through a state machine. (Also discussed in Farnborough). It's also conceptually linked to: - https://issues.jboss.org/browse/ISPN-1599 As you need to separate the locks of entries from the effective user facing lock, at least to implement transactions on top of this model. I expect this to improve performance in a very significant way, but it's getting embarrassing that it's still not done; at the next face to face meeting we should also reserve some time for retrospective sessions. Sanne On 29 July 2014 15:50, Bela Ban wrote: > > > On 29/07/14 16:42, Dan Berindei wrote: >> Have you tried regular optimistic/pessimistic transactions as well? > > Yes, in my first impl. but since I'm making only 1 change per request, I > thought a TX is overkill. > >> They *should* have less issues with the OOB thread pool than non-tx mode, and >> I'm quite curious how they stack against TO in such a large cluster. > > Why would they have fewer issues with the thread pools ? AIUI, a TX > involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when not using > TXs. And we're sync anyway... > > >> On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban > > wrote: >> >> Following up on my own email, I changed the config to use Pedro's >> excellent total order implementation: >> >> > transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" >> useEagerLocking="true" eagerLockSingleNode="true"> >> >> >> With 100 nodes and 25 requester threads/node, I did NOT run into any >> locking issues ! >> >> I could even go up to 200 requester threads/node and the perf was ~ >> 7'000-8'000 requests/sec/node. Not too bad ! >> >> This really validates the concept of lockless total-order dissemination >> of TXs; for the first time, this has been tested on a large(r) scale >> (previously only on 25 nodes) and IT WORKS ! :-) >> >> I still believe we should implement my suggested solution for non-TO >> configs, but short of configuring thread pools of 1000 threads or >> higher, I hope TO will allow me to finally test a 500 node Infinispan >> cluster ! >> >> >> On 29/07/14 15:56, Bela Ban wrote: >> > Hi guys, >> > >> > sorry for the long post, but I do think I ran into an important >> problem >> > and we need to fix it ... :-) >> > >> > I've spent the last couple of days running the IspnPerfTest [1] >> perftest >> > on Google Compute Engine (GCE), and I've run into a problem with >> > Infinispan. It is a design problem and can be mitigated by sizing >> thread >> > pools correctly, but cannot be eliminated entirely. >> > >> > >> > Symptom: >> > -------- >> > IspnPerfTest has every node in a cluster perform 20'000 requests >> on keys >> > in range [1..20000]. >> > >> > 80% of the requests are reads and 20% writes. >> > >> > By default, we have 25 requester threads per node and 100 nodes in a >> > cluster, so a total of 2500 requester threads. 
>> > >> > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: >> > >> > >> > >> > >> > >> > >> > >> > >> > > > useEagerLocking="true" >> > eagerLockSingleNode="true" /> >> > > > isolationLevel="READ_COMMITTED" >> useLockStriping="false" /> >> > >> > >> > It has 2 owners, a lock acquisition timeout of 5s and a repl >> timeout of >> > 20s. Lock stripting is off, so we have 1 lock per key. >> > >> > When I run the test, I always get errors like those below: >> > >> > org.infinispan.util.concurrent.TimeoutException: Unable to >> acquire lock >> > after [10 seconds] on key [19386] for requestor >> [Thread[invoker-3,5,main]]! >> > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] >> > >> > and >> > >> > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed out >> > >> > >> > Investigation: >> > ------------ >> > When I looked at UNICAST3, I saw a lot of missing messages on the >> > receive side and unacked messages on the send side. This caused me to >> > look into the (mainly OOB) thread pools and - voila - maxed out ! >> > >> > I learned from Pedro that the Infinispan internal thread pool (with a >> > default of 32 threads) can be configured, so I increased it to >> 300 and >> > increased the OOB pools as well. >> > >> > This mitigated the problem somewhat, but when I increased the >> requester >> > threads to 100, I had the same problem again. Apparently, the >> Infinispan >> > internal thread pool uses a rejection policy of "run" and thus >> uses the >> > JGroups (OOB) thread when exhausted. >> > >> > I learned (from Pedro and Mircea) that GETs and PUTs work as >> follows in >> > dist-sync / 2 owners: >> > - GETs are sent to the primary and backup owners and the first >> response >> > received is returned to the caller. No locks are acquired, so GETs >> > shouldn't cause problems. >> > >> > - A PUT(K) is sent to the primary owner of K >> > - The primary owner >> > (1) locks K >> > (2) updates the backup owner synchronously *while holding >> the lock* >> > (3) releases the lock >> > >> > >> > Hypothesis >> > ---------- >> > (2) above is done while holding the lock. The sync update of the >> backup >> > owner is done with the lock held to guarantee that the primary and >> > backup owner of K have the same values for K. >> > >> > However, the sync update *inside the lock scope* slows things >> down (can >> > it also lead to deadlocks?); there's the risk that the request is >> > dropped due to a full incoming thread pool, or that the response >> is not >> > received because of the same, or that the locking at the backup owner >> > blocks for some time. >> > >> > If we have many threads modifying the same key, then we have a >> backlog >> > of locking work against that key. Say we have 100 requester >> threads and >> > a 100 node cluster. This means that we have 10'000 threads accessing >> > keys; with 2'000 writers there's a big chance that some writers >> pick the >> > same key at the same time. >> > >> > For example, if we have 100 threads accessing key K and it takes >> 3ms to >> > replicate K to the backup owner, then the last of the 100 threads >> waits >> > ~300ms before it gets a chance to lock K on the primary owner and >> > replicate it as well. >> > >> > Just a small hiccup in sending the PUT to the primary owner, >> sending the >> > modification to the backup owner, waitting for the response, or >> GC, and >> > the delay will quickly become bigger. >> > >> > >> > Verification >> > ---------- >> > To verify the above, I set numOwners to 1. 
This means that the >> primary >> > owner of K does *not* send the modification to the backup owner, >> it only >> > locks K, modifies K and unlocks K again. >> > >> > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, and NO >> > PROBLEM ! >> > >> > I then increased the requesters to 100, 150 and 200 and the test >> > completed flawlessly ! Performance was around *40'000 requests >> per node >> > per sec* on 4-core boxes ! >> > >> > >> > Root cause >> > --------- >> > ******************* >> > The root cause is the sync RPC of K to the backup owner(s) of K while >> > the primary owner holds the lock for K. >> > ******************* >> > >> > This causes a backlog of threads waiting for the lock and that >> backlog >> > can grow to exhaust the thread pools. First the Infinispan internal >> > thread pool, then the JGroups OOB thread pool. The latter causes >> > retransmissions to get dropped, which compounds the problem... >> > >> > >> > Goal >> > ---- >> > The goal is to make sure that primary and backup owner(s) of K >> have the >> > same value for K. >> > >> > Simply sending the modification to the backup owner(s) asynchronously >> > won't guarantee this, as modification messages might get >> processed out >> > of order as they're OOB ! >> > >> > >> > Suggested solution >> > ---------------- >> > The modification RPC needs to be invoked *outside of the lock scope*: >> > - lock K >> > - modify K >> > - unlock K >> > - send modification to backup owner(s) // outside the lock scope >> > >> > The primary owner puts the modification of K into a queue from >> where a >> > separate thread/task removes it. The thread then invokes the >> PUT(K) on >> > the backup owner(s). >> > >> > The queue has the modified keys in FIFO order, so the modifications >> > arrive at the backup owner(s) in the right order. >> > >> > This requires that the way GET is implemented changes slightly: >> instead >> > of invoking a GET on all owners of K, we only invoke it on the >> primary >> > owner, then the next-in-line etc. >> > >> > The reason for this is that the backup owner(s) may not yet have >> > received the modification of K. >> > >> > This is a better impl anyway (we discussed this before) becuse it >> > generates less traffic; in the normal case, all but 1 GET >> requests are >> > unnecessary. >> > >> > >> > >> > Improvement >> > ----------- >> > The above solution can be simplified and even made more efficient. >> > Re-using concepts from IRAC [2], we can simply store the modified >> *keys* >> > in the modification queue. The modification replication thread >> removes >> > the key, gets the current value and invokes a PUT/REMOVE on the >> backup >> > owner(s). >> > >> > Even better: a key is only ever added *once*, so if we have >> [5,2,17,3], >> > adding key 2 is a no-op because the processing of key 2 (in second >> > position in the queue) will fetch the up-to-date value anyway ! >> > >> > >> > Misc >> > ---- >> > - Could we possibly use total order to send the updates in TO ? >> TBD (Pedro?) >> > >> > >> > Thoughts ? 
>> > >> > >> > [1] https://github.com/belaban/IspnPerfTest >> > [2] >> > >> https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering >> > >> >> -- >> Bela Ban, JGroups lead (http://www.jgroups.org) >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Tue Jul 29 16:29:29 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 29 Jul 2014 23:29:29 +0300 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: <53D7B4B3.4080008@redhat.com> References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: On Tue, Jul 29, 2014 at 5:50 PM, Bela Ban wrote: > > > On 29/07/14 16:42, Dan Berindei wrote: > > Have you tried regular optimistic/pessimistic transactions as well? > > Yes, in my first impl. but since I'm making only 1 change per request, I > thought a TX is overkill. > You are using txs with TO, right? > > > They *should* have less issues with the OOB thread pool than non-tx > mode, and > > I'm quite curious how they stack against TO in such a large cluster. > > Why would they have fewer issues with the thread pools ? AIUI, a TX > involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when not using > TXs. And we're sync anyway... > > Actually, 2 sync RPCs (prepare + commit) and 1 async RPC (tx completion notification). But we only keep the user thread busy across RPCs (unless L1 is enabled), so we actually need less OOB/remote threads. > > > On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban > > wrote: > > > > Following up on my own email, I changed the config to use Pedro's > > excellent total order implementation: > > > > > transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" > > useEagerLocking="true" eagerLockSingleNode="true"> > > > > > > With 100 nodes and 25 requester threads/node, I did NOT run into any > > locking issues ! > > > > I could even go up to 200 requester threads/node and the perf was ~ > > 7'000-8'000 requests/sec/node. Not too bad ! > > > > This really validates the concept of lockless total-order > dissemination > > of TXs; for the first time, this has been tested on a large(r) scale > > (previously only on 25 nodes) and IT WORKS ! :-) > > > > I still believe we should implement my suggested solution for non-TO > > configs, but short of configuring thread pools of 1000 threads or > > higher, I hope TO will allow me to finally test a 500 node Infinispan > > cluster ! > > > > > > On 29/07/14 15:56, Bela Ban wrote: > > > Hi guys, > > > > > > sorry for the long post, but I do think I ran into an important > > problem > > > and we need to fix it ... :-) > > > > > > I've spent the last couple of days running the IspnPerfTest [1] > > perftest > > > on Google Compute Engine (GCE), and I've run into a problem with > > > Infinispan. It is a design problem and can be mitigated by sizing > > thread > > > pools correctly, but cannot be eliminated entirely. 
> > > > > > > > > Symptom: > > > -------- > > > IspnPerfTest has every node in a cluster perform 20'000 requests > > on keys > > > in range [1..20000]. > > > > > > 80% of the requests are reads and 20% writes. > > > > > > By default, we have 25 requester threads per node and 100 nodes > in a > > > cluster, so a total of 2500 requester threads. > > > > > > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > useEagerLocking="true" > > > eagerLockSingleNode="true" /> > > > concurrencyLevel="1000" > > > isolationLevel="READ_COMMITTED" > > useLockStriping="false" /> > > > > > > > > > It has 2 owners, a lock acquisition timeout of 5s and a repl > > timeout of > > > 20s. Lock stripting is off, so we have 1 lock per key. > > > > > > When I run the test, I always get errors like those below: > > > > > > org.infinispan.util.concurrent.TimeoutException: Unable to > > acquire lock > > > after [10 seconds] on key [19386] for requestor > > [Thread[invoker-3,5,main]]! > > > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > > > > > > and > > > > > > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed > out > > > > > > > > > Investigation: > > > ------------ > > > When I looked at UNICAST3, I saw a lot of missing messages on the > > > receive side and unacked messages on the send side. This caused > me to > > > look into the (mainly OOB) thread pools and - voila - maxed out ! > > > > > > I learned from Pedro that the Infinispan internal thread pool > (with a > > > default of 32 threads) can be configured, so I increased it to > > 300 and > > > increased the OOB pools as well. > > > > > > This mitigated the problem somewhat, but when I increased the > > requester > > > threads to 100, I had the same problem again. Apparently, the > > Infinispan > > > internal thread pool uses a rejection policy of "run" and thus > > uses the > > > JGroups (OOB) thread when exhausted. > > > > > > I learned (from Pedro and Mircea) that GETs and PUTs work as > > follows in > > > dist-sync / 2 owners: > > > - GETs are sent to the primary and backup owners and the first > > response > > > received is returned to the caller. No locks are acquired, so GETs > > > shouldn't cause problems. > > > > > > - A PUT(K) is sent to the primary owner of K > > > - The primary owner > > > (1) locks K > > > (2) updates the backup owner synchronously *while holding > > the lock* > > > (3) releases the lock > > > > > > > > > Hypothesis > > > ---------- > > > (2) above is done while holding the lock. The sync update of the > > backup > > > owner is done with the lock held to guarantee that the primary and > > > backup owner of K have the same values for K. > > > > > > However, the sync update *inside the lock scope* slows things > > down (can > > > it also lead to deadlocks?); there's the risk that the request is > > > dropped due to a full incoming thread pool, or that the response > > is not > > > received because of the same, or that the locking at the backup > owner > > > blocks for some time. > > > > > > If we have many threads modifying the same key, then we have a > > backlog > > > of locking work against that key. Say we have 100 requester > > threads and > > > a 100 node cluster. This means that we have 10'000 threads > accessing > > > keys; with 2'000 writers there's a big chance that some writers > > pick the > > > same key at the same time. 
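(As a back-of-envelope check of that claim, assuming every writer picks its key uniformly at random from the 20'000 keys: with 2'000 concurrent writers the expected number of writer pairs landing on the same key is roughly (2000 * 1999 / 2) / 20'000, i.e. about 100, so simultaneous writes to the same key are practically guaranteed rather than merely likely.)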
> > > > > > For example, if we have 100 threads accessing key K and it takes > > 3ms to > > > replicate K to the backup owner, then the last of the 100 threads > > waits > > > ~300ms before it gets a chance to lock K on the primary owner and > > > replicate it as well. > > > > > > Just a small hiccup in sending the PUT to the primary owner, > > sending the > > > modification to the backup owner, waitting for the response, or > > GC, and > > > the delay will quickly become bigger. > > > > > > > > > Verification > > > ---------- > > > To verify the above, I set numOwners to 1. This means that the > > primary > > > owner of K does *not* send the modification to the backup owner, > > it only > > > locks K, modifies K and unlocks K again. > > > > > > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, > and NO > > > PROBLEM ! > > > > > > I then increased the requesters to 100, 150 and 200 and the test > > > completed flawlessly ! Performance was around *40'000 requests > > per node > > > per sec* on 4-core boxes ! > > > > > > > > > Root cause > > > --------- > > > ******************* > > > The root cause is the sync RPC of K to the backup owner(s) of K > while > > > the primary owner holds the lock for K. > > > ******************* > > > > > > This causes a backlog of threads waiting for the lock and that > > backlog > > > can grow to exhaust the thread pools. First the Infinispan > internal > > > thread pool, then the JGroups OOB thread pool. The latter causes > > > retransmissions to get dropped, which compounds the problem... > > > > > > > > > Goal > > > ---- > > > The goal is to make sure that primary and backup owner(s) of K > > have the > > > same value for K. > > > > > > Simply sending the modification to the backup owner(s) > asynchronously > > > won't guarantee this, as modification messages might get > > processed out > > > of order as they're OOB ! > > > > > > > > > Suggested solution > > > ---------------- > > > The modification RPC needs to be invoked *outside of the lock > scope*: > > > - lock K > > > - modify K > > > - unlock K > > > - send modification to backup owner(s) // outside the lock scope > > > > > > The primary owner puts the modification of K into a queue from > > where a > > > separate thread/task removes it. The thread then invokes the > > PUT(K) on > > > the backup owner(s). > > > > > > The queue has the modified keys in FIFO order, so the > modifications > > > arrive at the backup owner(s) in the right order. > > > > > > This requires that the way GET is implemented changes slightly: > > instead > > > of invoking a GET on all owners of K, we only invoke it on the > > primary > > > owner, then the next-in-line etc. > > > > > > The reason for this is that the backup owner(s) may not yet have > > > received the modification of K. > > > > > > This is a better impl anyway (we discussed this before) becuse it > > > generates less traffic; in the normal case, all but 1 GET > > requests are > > > unnecessary. > > > > > > > > > > > > Improvement > > > ----------- > > > The above solution can be simplified and even made more efficient. > > > Re-using concepts from IRAC [2], we can simply store the modified > > *keys* > > > in the modification queue. The modification replication thread > > removes > > > the key, gets the current value and invokes a PUT/REMOVE on the > > backup > > > owner(s). 
> > > > > > Even better: a key is only ever added *once*, so if we have > > [5,2,17,3], > > > adding key 2 is a no-op because the processing of key 2 (in second > > > position in the queue) will fetch the up-to-date value anyway ! > > > > > > > > > Misc > > > ---- > > > - Could we possibly use total order to send the updates in TO ? > > TBD (Pedro?) > > > > > > > > > Thoughts ? > > > > > > > > > [1] https://github.com/belaban/IspnPerfTest > > > [2] > > > > > > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > > > > > > > -- > > Bela Ban, JGroups lead (http://www.jgroups.org) > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org infinispan-dev at lists.jboss.org> > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140729/5513b6ab/attachment-0001.html From dan.berindei at gmail.com Tue Jul 29 17:14:50 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 30 Jul 2014 00:14:50 +0300 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: On Tue, Jul 29, 2014 at 9:06 PM, Sanne Grinovero wrote: > This is a nasty problem and I also feel passionately we need to get > rid of it ASAP. > I did have the same problems many times, and we discussed this also in > Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this. > > You don't need TO, and you don't need to lock at all as long as you > guarantee the backup owners are getting the number with some > monotonicity sequence attached to it, > all that backup owners need to do is ignore incoming commands which > are outdated. > This is more or less what TOA does - assign a monotonic sequence number to txs, and only apply them after all the previous txs in the sequence have been applied. The problem is getting that monotonic sequence when there are multiple originators and multiple primary owners also requires some extra RPCs. > Another aspect is that the "user thread" on the primary owner needs to > wait (at least until we improve further) and only proceed after ACK > from backup nodes, but this is better modelled through a state > machine. (Also discussed in Farnborough). > To be clear, I don't think keeping the user thread on the originator blocked until we have the write confirmations from all the backups is a problem - a sync operation has to block, and it also serves to rate-limit user operations. The problem appears when the originator is not the primary owner, and the thread blocking for backup ACKs is from the remote-executor pool (or OOB, when the remote-executor pool is exhausted). > It's also conceptually linked to: > - https://issues.jboss.org/browse/ISPN-1599 > As you need to separate the locks of entries from the effective user > facing lock, at least to implement transactions on top of this model. 
> I think we fixed ISPN-1599 when we changed passivation to use DataContainer.compute(). WDYT Pedro, is there anything else you'd like to do in the scope of ISPN-1599? > I expect this to improve performance in a very significant way, but > it's getting embarrassing that it's still not done; at the next face > to face meeting we should also reserve some time for retrospective > sessions. > Implementing the state machine-based interceptor stack may give us a performance boost, but I'm much more certain that it's a very complex, high risk task... and we don't have a stable test suite yet :) > > Sanne > > On 29 July 2014 15:50, Bela Ban wrote: > > > > > > On 29/07/14 16:42, Dan Berindei wrote: > >> Have you tried regular optimistic/pessimistic transactions as well? > > > > Yes, in my first impl. but since I'm making only 1 change per request, I > > thought a TX is overkill. > > > >> They *should* have less issues with the OOB thread pool than non-tx > mode, and > >> I'm quite curious how they stack against TO in such a large cluster. > > > > Why would they have fewer issues with the thread pools ? AIUI, a TX > > involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when not using > > TXs. And we're sync anyway... > > > > > >> On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban >> > wrote: > >> > >> Following up on my own email, I changed the config to use Pedro's > >> excellent total order implementation: > >> > >> >> transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" > >> useEagerLocking="true" eagerLockSingleNode="true"> > >> > >> > >> With 100 nodes and 25 requester threads/node, I did NOT run into any > >> locking issues ! > >> > >> I could even go up to 200 requester threads/node and the perf was ~ > >> 7'000-8'000 requests/sec/node. Not too bad ! > >> > >> This really validates the concept of lockless total-order > dissemination > >> of TXs; for the first time, this has been tested on a large(r) scale > >> (previously only on 25 nodes) and IT WORKS ! :-) > >> > >> I still believe we should implement my suggested solution for non-TO > >> configs, but short of configuring thread pools of 1000 threads or > >> higher, I hope TO will allow me to finally test a 500 node > Infinispan > >> cluster ! > >> > >> > >> On 29/07/14 15:56, Bela Ban wrote: > >> > Hi guys, > >> > > >> > sorry for the long post, but I do think I ran into an important > >> problem > >> > and we need to fix it ... :-) > >> > > >> > I've spent the last couple of days running the IspnPerfTest [1] > >> perftest > >> > on Google Compute Engine (GCE), and I've run into a problem with > >> > Infinispan. It is a design problem and can be mitigated by sizing > >> thread > >> > pools correctly, but cannot be eliminated entirely. > >> > > >> > > >> > Symptom: > >> > -------- > >> > IspnPerfTest has every node in a cluster perform 20'000 requests > >> on keys > >> > in range [1..20000]. > >> > > >> > 80% of the requests are reads and 20% writes. > >> > > >> > By default, we have 25 requester threads per node and 100 nodes > in a > >> > cluster, so a total of 2500 requester threads. > >> > > >> > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > >> > useEagerLocking="true" > >> > eagerLockSingleNode="true" /> > >> > concurrencyLevel="1000" > >> > isolationLevel="READ_COMMITTED" > >> useLockStriping="false" /> > >> > > >> > > >> > It has 2 owners, a lock acquisition timeout of 5s and a repl > >> timeout of > >> > 20s. 
Lock stripting is off, so we have 1 lock per key. > >> > > >> > When I run the test, I always get errors like those below: > >> > > >> > org.infinispan.util.concurrent.TimeoutException: Unable to > >> acquire lock > >> > after [10 seconds] on key [19386] for requestor > >> [Thread[invoker-3,5,main]]! > >> > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > >> > > >> > and > >> > > >> > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed > out > >> > > >> > > >> > Investigation: > >> > ------------ > >> > When I looked at UNICAST3, I saw a lot of missing messages on the > >> > receive side and unacked messages on the send side. This caused > me to > >> > look into the (mainly OOB) thread pools and - voila - maxed out ! > >> > > >> > I learned from Pedro that the Infinispan internal thread pool > (with a > >> > default of 32 threads) can be configured, so I increased it to > >> 300 and > >> > increased the OOB pools as well. > >> > > >> > This mitigated the problem somewhat, but when I increased the > >> requester > >> > threads to 100, I had the same problem again. Apparently, the > >> Infinispan > >> > internal thread pool uses a rejection policy of "run" and thus > >> uses the > >> > JGroups (OOB) thread when exhausted. > >> > > >> > I learned (from Pedro and Mircea) that GETs and PUTs work as > >> follows in > >> > dist-sync / 2 owners: > >> > - GETs are sent to the primary and backup owners and the first > >> response > >> > received is returned to the caller. No locks are acquired, so > GETs > >> > shouldn't cause problems. > >> > > >> > - A PUT(K) is sent to the primary owner of K > >> > - The primary owner > >> > (1) locks K > >> > (2) updates the backup owner synchronously *while holding > >> the lock* > >> > (3) releases the lock > >> > > >> > > >> > Hypothesis > >> > ---------- > >> > (2) above is done while holding the lock. The sync update of the > >> backup > >> > owner is done with the lock held to guarantee that the primary > and > >> > backup owner of K have the same values for K. > >> > > >> > However, the sync update *inside the lock scope* slows things > >> down (can > >> > it also lead to deadlocks?); there's the risk that the request is > >> > dropped due to a full incoming thread pool, or that the response > >> is not > >> > received because of the same, or that the locking at the backup > owner > >> > blocks for some time. > >> > > >> > If we have many threads modifying the same key, then we have a > >> backlog > >> > of locking work against that key. Say we have 100 requester > >> threads and > >> > a 100 node cluster. This means that we have 10'000 threads > accessing > >> > keys; with 2'000 writers there's a big chance that some writers > >> pick the > >> > same key at the same time. > >> > > >> > For example, if we have 100 threads accessing key K and it takes > >> 3ms to > >> > replicate K to the backup owner, then the last of the 100 threads > >> waits > >> > ~300ms before it gets a chance to lock K on the primary owner and > >> > replicate it as well. > >> > > >> > Just a small hiccup in sending the PUT to the primary owner, > >> sending the > >> > modification to the backup owner, waitting for the response, or > >> GC, and > >> > the delay will quickly become bigger. > >> > > >> > > >> > Verification > >> > ---------- > >> > To verify the above, I set numOwners to 1. This means that the > >> primary > >> > owner of K does *not* send the modification to the backup owner, > >> it only > >> > locks K, modifies K and unlocks K again. 
> >> > > >> > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, > and NO > >> > PROBLEM ! > >> > > >> > I then increased the requesters to 100, 150 and 200 and the test > >> > completed flawlessly ! Performance was around *40'000 requests > >> per node > >> > per sec* on 4-core boxes ! > >> > > >> > > >> > Root cause > >> > --------- > >> > ******************* > >> > The root cause is the sync RPC of K to the backup owner(s) of K > while > >> > the primary owner holds the lock for K. > >> > ******************* > >> > > >> > This causes a backlog of threads waiting for the lock and that > >> backlog > >> > can grow to exhaust the thread pools. First the Infinispan > internal > >> > thread pool, then the JGroups OOB thread pool. The latter causes > >> > retransmissions to get dropped, which compounds the problem... > >> > > >> > > >> > Goal > >> > ---- > >> > The goal is to make sure that primary and backup owner(s) of K > >> have the > >> > same value for K. > >> > > >> > Simply sending the modification to the backup owner(s) > asynchronously > >> > won't guarantee this, as modification messages might get > >> processed out > >> > of order as they're OOB ! > >> > > >> > > >> > Suggested solution > >> > ---------------- > >> > The modification RPC needs to be invoked *outside of the lock > scope*: > >> > - lock K > >> > - modify K > >> > - unlock K > >> > - send modification to backup owner(s) // outside the lock scope > >> > > >> > The primary owner puts the modification of K into a queue from > >> where a > >> > separate thread/task removes it. The thread then invokes the > >> PUT(K) on > >> > the backup owner(s). > >> > > >> > The queue has the modified keys in FIFO order, so the > modifications > >> > arrive at the backup owner(s) in the right order. > >> > > >> > This requires that the way GET is implemented changes slightly: > >> instead > >> > of invoking a GET on all owners of K, we only invoke it on the > >> primary > >> > owner, then the next-in-line etc. > >> > > >> > The reason for this is that the backup owner(s) may not yet have > >> > received the modification of K. > >> > > >> > This is a better impl anyway (we discussed this before) becuse it > >> > generates less traffic; in the normal case, all but 1 GET > >> requests are > >> > unnecessary. > >> > > >> > > >> > > >> > Improvement > >> > ----------- > >> > The above solution can be simplified and even made more > efficient. > >> > Re-using concepts from IRAC [2], we can simply store the modified > >> *keys* > >> > in the modification queue. The modification replication thread > >> removes > >> > the key, gets the current value and invokes a PUT/REMOVE on the > >> backup > >> > owner(s). > >> > > >> > Even better: a key is only ever added *once*, so if we have > >> [5,2,17,3], > >> > adding key 2 is a no-op because the processing of key 2 (in > second > >> > position in the queue) will fetch the up-to-date value anyway ! > >> > > >> > > >> > Misc > >> > ---- > >> > - Could we possibly use total order to send the updates in TO ? > >> TBD (Pedro?) > >> > > >> > > >> > Thoughts ? 
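The staged GET mentioned in the proposal (ask the primary owner first, and only fall back to the next owner when it does not answer) could be sketched like this; OwnerRpc and the per-owner timeout are made-up placeholders, not actual Infinispan API:

import java.util.List;
import java.util.concurrent.*;

// Sketch of "GET from the primary owner first, then the next-in-line".
public class StagedGetSketch<K, V> {

    public interface OwnerRpc<K, V> {
        CompletableFuture<V> get(K key);   // async GET sent to a single owner
    }

    private final long perOwnerTimeoutMillis;

    public StagedGetSketch(long perOwnerTimeoutMillis) {
        this.perOwnerTimeoutMillis = perOwnerTimeoutMillis;
    }

    // owners is ordered: primary owner first, then the backups.
    public V get(K key, List<OwnerRpc<K, V>> owners)
            throws InterruptedException, TimeoutException {
        for (OwnerRpc<K, V> owner : owners) {
            try {
                // Only one GET is in flight at a time, instead of one per owner.
                return owner.get(key).get(perOwnerTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // this owner did not answer in time (or failed): try the next-in-line
            }
        }
        throw new TimeoutException("No owner answered for key " + key);
    }
}

In the common case this is one request instead of numOwners requests, which is the traffic saving mentioned above.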
> >> > > >> > > >> > [1] https://github.com/belaban/IspnPerfTest > >> > [2] > >> > > >> > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > >> > > >> > >> -- > >> Bela Ban, JGroups lead (http://www.jgroups.org) > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org infinispan-dev at lists.jboss.org> > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > > > > -- > > Bela Ban, JGroups lead (http://www.jgroups.org) > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140730/9cb33afd/attachment-0001.html From sanne at infinispan.org Tue Jul 29 17:35:25 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 29 Jul 2014 22:35:25 +0100 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: On 29 July 2014 22:14, Dan Berindei wrote: > > On Tue, Jul 29, 2014 at 9:06 PM, Sanne Grinovero > wrote: >> >> This is a nasty problem and I also feel passionately we need to get >> rid of it ASAP. >> I did have the same problems many times, and we discussed this also in >> Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this. >> >> You don't need TO, and you don't need to lock at all as long as you >> guarantee the backup owners are getting the number with some >> monotonicity sequence attached to it, >> all that backup owners need to do is ignore incoming commands which >> are outdated. > > > This is more or less what TOA does - assign a monotonic sequence number to > txs, and only apply them after all the previous txs in the sequence have > been applied. The problem is getting that monotonic sequence when there are > multiple originators and multiple primary owners also requires some extra > RPCs. Let's not mix this up with requirements for other areas. The strategy I've proposed is only to be applied for the communication from the primary owner to its backups: the value to be written is well known as it's the primary owner which defines it unilaterally (for example if there is an atomic replacement to be computed) and there is no need for extra RPCs as the sequence is not related to a group of changes but for the specific entry only. There is no such thing as a need for consensus across owners, nor need for a central source for sequences. Also I don't see it as an alternative to TOA, I rather expect it to work nicely together: when TOA is enabled you could trust the originating sequence source rather than generate a per-entry sequence, and in neither case you need to actually use a Lock. I haven't thought how the sequences would need to interact (if they need), but they seem complementary to resolve different aspects, and also both benefit from the same cleanup and basic structure. 
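In code, the per-entry sequence on the primary owner could look roughly like the sketch below, again over plain JDK types with invented names (VersionedValue and BackupSender are not Infinispan interfaces). The sequence is taken while the write is applied, so it only needs to be monotonic per entry and per primary owner; no extra RPC and no cross-owner agreement is involved, and a backup merely drops any update whose sequence is not newer than the one it already stored:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the primary owner stamps every update with a per-entry monotonic sequence.
public class SequencedPrimarySketch<K, V> {

    public static final class VersionedValue<V> {
        public final long seq;
        public final V value;
        public VersionedValue(long seq, V value) { this.seq = seq; this.value = value; }
    }

    public interface BackupSender<K, V> {
        void sendUpdate(K key, VersionedValue<V> update);  // may be async, may be reordered
    }

    private final ConcurrentMap<K, AtomicLong> sequences = new ConcurrentHashMap<>();
    private final ConcurrentMap<K, V> dataContainer = new ConcurrentHashMap<>();
    private final BackupSender<K, V> backups;

    public SequencedPrimarySketch(BackupSender<K, V> backups) {
        this.backups = backups;
    }

    public void put(K key, V value) {
        // compute() keeps "take the next sequence + apply the write" atomic per entry,
        // without an explicit user-facing Lock object.
        long[] seq = new long[1];
        dataContainer.compute(key, (k, old) -> {
            seq[0] = sequences.computeIfAbsent(k, x -> new AtomicLong()).incrementAndGet();
            return value;
        });
        backups.sendUpdate(key, new VersionedValue<V>(seq[0], value));
    }
}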
>> Another aspect is that the "user thread" on the primary owner needs to >> wait (at least until we improve further) and only proceed after ACK >> from backup nodes, but this is better modelled through a state >> machine. (Also discussed in Farnborough). > > > To be clear, I don't think keeping the user thread on the originator blocked > until we have the write confirmations from all the backups is a problem - a > sync operation has to block, and it also serves to rate-limit user > operations. There are better ways to rate-limit than to make all operations slow; we don't need to block a thread, we need to react on the reply from the backup owners. You still have an inherent rate-limit in the outgoing packet queues: if these fill up, then and only then it's nice to introduce some back pressure. > The problem appears when the originator is not the primary owner, and the > thread blocking for backup ACKs is from the remote-executor pool (or OOB, > when the remote-executor pool is exhausted). Not following. I guess this is out of scope now that I clarified the proposed solution is only to be applied between primary and backups? >> >> It's also conceptually linked to: >> - https://issues.jboss.org/browse/ISPN-1599 >> As you need to separate the locks of entries from the effective user >> facing lock, at least to implement transactions on top of this model. > > > I think we fixed ISPN-1599 when we changed passivation to use > DataContainer.compute(). WDYT Pedro, is there anything else you'd like to do > in the scope of ISPN-1599? > >> >> I expect this to improve performance in a very significant way, but >> it's getting embarrassing that it's still not done; at the next face >> to face meeting we should also reserve some time for retrospective >> sessions. > > > Implementing the state machine-based interceptor stack may give us a > performance boost, but I'm much more certain that it's a very complex, high > risk task... and we don't have a stable test suite yet :) Cleaning up and removing some complexity such as TooManyExecutorsException might help to get it stable, and keep it there :) BTW it was quite stable for me until you changed the JGroups UDP default configuration. Sanne > > >> >> >> Sanne >> >> On 29 July 2014 15:50, Bela Ban wrote: >> > >> > >> > On 29/07/14 16:42, Dan Berindei wrote: >> >> Have you tried regular optimistic/pessimistic transactions as well? >> > >> > Yes, in my first impl. but since I'm making only 1 change per request, I >> > thought a TX is overkill. >> > >> >> They *should* have less issues with the OOB thread pool than non-tx >> >> mode, and >> >> I'm quite curious how they stack against TO in such a large cluster. >> > >> > Why would they have fewer issues with the thread pools ? AIUI, a TX >> > involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when not using >> > TXs. And we're sync anyway... >> > >> > >> >> On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban > >> > wrote: >> >> >> >> Following up on my own email, I changed the config to use Pedro's >> >> excellent total order implementation: >> >> >> >> > >> transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" >> >> useEagerLocking="true" eagerLockSingleNode="true"> >> >> >> >> >> >> With 100 nodes and 25 requester threads/node, I did NOT run into >> >> any >> >> locking issues ! >> >> >> >> I could even go up to 200 requester threads/node and the perf was ~ >> >> 7'000-8'000 requests/sec/node. Not too bad ! 
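For reference, switching a cache to the total order protocol programmatically would look roughly like this; the method names below are from memory of the 6.x/7.x ConfigurationBuilder API and may differ slightly between versions, so treat it as a sketch rather than a copy-paste configuration:

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;
import org.infinispan.transaction.TransactionProtocol;
import org.infinispan.util.concurrent.IsolationLevel;

public class TotalOrderConfigSketch {
    public static Configuration distSyncTotalOrder() {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.clustering().cacheMode(CacheMode.DIST_SYNC);
        builder.clustering().hash().numOwners(2);
        builder.clustering().sync().replTimeout(20000);
        builder.transaction()
               .transactionMode(TransactionMode.TRANSACTIONAL)
               .transactionProtocol(TransactionProtocol.TOTAL_ORDER)
               .lockingMode(LockingMode.OPTIMISTIC);
        builder.locking()
               .lockAcquisitionTimeout(5000)
               .useLockStriping(false)
               .concurrencyLevel(1000)
               .isolationLevel(IsolationLevel.READ_COMMITTED);
        return builder.build();
    }
}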
>> >> >> >> This really validates the concept of lockless total-order >> >> dissemination >> >> of TXs; for the first time, this has been tested on a large(r) >> >> scale >> >> (previously only on 25 nodes) and IT WORKS ! :-) >> >> >> >> I still believe we should implement my suggested solution for >> >> non-TO >> >> configs, but short of configuring thread pools of 1000 threads or >> >> higher, I hope TO will allow me to finally test a 500 node >> >> Infinispan >> >> cluster ! >> >> >> >> >> >> On 29/07/14 15:56, Bela Ban wrote: >> >> > Hi guys, >> >> > >> >> > sorry for the long post, but I do think I ran into an important >> >> problem >> >> > and we need to fix it ... :-) >> >> > >> >> > I've spent the last couple of days running the IspnPerfTest [1] >> >> perftest >> >> > on Google Compute Engine (GCE), and I've run into a problem with >> >> > Infinispan. It is a design problem and can be mitigated by >> >> sizing >> >> thread >> >> > pools correctly, but cannot be eliminated entirely. >> >> > >> >> > >> >> > Symptom: >> >> > -------- >> >> > IspnPerfTest has every node in a cluster perform 20'000 requests >> >> on keys >> >> > in range [1..20000]. >> >> > >> >> > 80% of the requests are reads and 20% writes. >> >> > >> >> > By default, we have 25 requester threads per node and 100 nodes >> >> in a >> >> > cluster, so a total of 2500 requester threads. >> >> > >> >> > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > > >> > useEagerLocking="true" >> >> > eagerLockSingleNode="true" /> >> >> > > >> concurrencyLevel="1000" >> >> > isolationLevel="READ_COMMITTED" >> >> useLockStriping="false" /> >> >> > >> >> > >> >> > It has 2 owners, a lock acquisition timeout of 5s and a repl >> >> timeout of >> >> > 20s. Lock stripting is off, so we have 1 lock per key. >> >> > >> >> > When I run the test, I always get errors like those below: >> >> > >> >> > org.infinispan.util.concurrent.TimeoutException: Unable to >> >> acquire lock >> >> > after [10 seconds] on key [19386] for requestor >> >> [Thread[invoker-3,5,main]]! >> >> > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] >> >> > >> >> > and >> >> > >> >> > org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed >> >> out >> >> > >> >> > >> >> > Investigation: >> >> > ------------ >> >> > When I looked at UNICAST3, I saw a lot of missing messages on >> >> the >> >> > receive side and unacked messages on the send side. This caused >> >> me to >> >> > look into the (mainly OOB) thread pools and - voila - maxed out >> >> ! >> >> > >> >> > I learned from Pedro that the Infinispan internal thread pool >> >> (with a >> >> > default of 32 threads) can be configured, so I increased it to >> >> 300 and >> >> > increased the OOB pools as well. >> >> > >> >> > This mitigated the problem somewhat, but when I increased the >> >> requester >> >> > threads to 100, I had the same problem again. Apparently, the >> >> Infinispan >> >> > internal thread pool uses a rejection policy of "run" and thus >> >> uses the >> >> > JGroups (OOB) thread when exhausted. >> >> > >> >> > I learned (from Pedro and Mircea) that GETs and PUTs work as >> >> follows in >> >> > dist-sync / 2 owners: >> >> > - GETs are sent to the primary and backup owners and the first >> >> response >> >> > received is returned to the caller. No locks are acquired, so >> >> GETs >> >> > shouldn't cause problems. 
>> >> > >> >> > - A PUT(K) is sent to the primary owner of K >> >> > - The primary owner >> >> > (1) locks K >> >> > (2) updates the backup owner synchronously *while holding >> >> the lock* >> >> > (3) releases the lock >> >> > >> >> > >> >> > Hypothesis >> >> > ---------- >> >> > (2) above is done while holding the lock. The sync update of the >> >> backup >> >> > owner is done with the lock held to guarantee that the primary >> >> and >> >> > backup owner of K have the same values for K. >> >> > >> >> > However, the sync update *inside the lock scope* slows things >> >> down (can >> >> > it also lead to deadlocks?); there's the risk that the request >> >> is >> >> > dropped due to a full incoming thread pool, or that the response >> >> is not >> >> > received because of the same, or that the locking at the backup >> >> owner >> >> > blocks for some time. >> >> > >> >> > If we have many threads modifying the same key, then we have a >> >> backlog >> >> > of locking work against that key. Say we have 100 requester >> >> threads and >> >> > a 100 node cluster. This means that we have 10'000 threads >> >> accessing >> >> > keys; with 2'000 writers there's a big chance that some writers >> >> pick the >> >> > same key at the same time. >> >> > >> >> > For example, if we have 100 threads accessing key K and it takes >> >> 3ms to >> >> > replicate K to the backup owner, then the last of the 100 >> >> threads >> >> waits >> >> > ~300ms before it gets a chance to lock K on the primary owner >> >> and >> >> > replicate it as well. >> >> > >> >> > Just a small hiccup in sending the PUT to the primary owner, >> >> sending the >> >> > modification to the backup owner, waitting for the response, or >> >> GC, and >> >> > the delay will quickly become bigger. >> >> > >> >> > >> >> > Verification >> >> > ---------- >> >> > To verify the above, I set numOwners to 1. This means that the >> >> primary >> >> > owner of K does *not* send the modification to the backup owner, >> >> it only >> >> > locks K, modifies K and unlocks K again. >> >> > >> >> > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, >> >> and NO >> >> > PROBLEM ! >> >> > >> >> > I then increased the requesters to 100, 150 and 200 and the test >> >> > completed flawlessly ! Performance was around *40'000 requests >> >> per node >> >> > per sec* on 4-core boxes ! >> >> > >> >> > >> >> > Root cause >> >> > --------- >> >> > ******************* >> >> > The root cause is the sync RPC of K to the backup owner(s) of K >> >> while >> >> > the primary owner holds the lock for K. >> >> > ******************* >> >> > >> >> > This causes a backlog of threads waiting for the lock and that >> >> backlog >> >> > can grow to exhaust the thread pools. First the Infinispan >> >> internal >> >> > thread pool, then the JGroups OOB thread pool. The latter causes >> >> > retransmissions to get dropped, which compounds the problem... >> >> > >> >> > >> >> > Goal >> >> > ---- >> >> > The goal is to make sure that primary and backup owner(s) of K >> >> have the >> >> > same value for K. >> >> > >> >> > Simply sending the modification to the backup owner(s) >> >> asynchronously >> >> > won't guarantee this, as modification messages might get >> >> processed out >> >> > of order as they're OOB ! 
>> >> > >> >> > >> >> > Suggested solution >> >> > ---------------- >> >> > The modification RPC needs to be invoked *outside of the lock >> >> scope*: >> >> > - lock K >> >> > - modify K >> >> > - unlock K >> >> > - send modification to backup owner(s) // outside the lock scope >> >> > >> >> > The primary owner puts the modification of K into a queue from >> >> where a >> >> > separate thread/task removes it. The thread then invokes the >> >> PUT(K) on >> >> > the backup owner(s). >> >> > >> >> > The queue has the modified keys in FIFO order, so the >> >> modifications >> >> > arrive at the backup owner(s) in the right order. >> >> > >> >> > This requires that the way GET is implemented changes slightly: >> >> instead >> >> > of invoking a GET on all owners of K, we only invoke it on the >> >> primary >> >> > owner, then the next-in-line etc. >> >> > >> >> > The reason for this is that the backup owner(s) may not yet have >> >> > received the modification of K. >> >> > >> >> > This is a better impl anyway (we discussed this before) becuse >> >> it >> >> > generates less traffic; in the normal case, all but 1 GET >> >> requests are >> >> > unnecessary. >> >> > >> >> > >> >> > >> >> > Improvement >> >> > ----------- >> >> > The above solution can be simplified and even made more >> >> efficient. >> >> > Re-using concepts from IRAC [2], we can simply store the >> >> modified >> >> *keys* >> >> > in the modification queue. The modification replication thread >> >> removes >> >> > the key, gets the current value and invokes a PUT/REMOVE on the >> >> backup >> >> > owner(s). >> >> > >> >> > Even better: a key is only ever added *once*, so if we have >> >> [5,2,17,3], >> >> > adding key 2 is a no-op because the processing of key 2 (in >> >> second >> >> > position in the queue) will fetch the up-to-date value anyway ! >> >> > >> >> > >> >> > Misc >> >> > ---- >> >> > - Could we possibly use total order to send the updates in TO ? >> >> TBD (Pedro?) >> >> > >> >> > >> >> > Thoughts ? 
>> >> > >> >> > >> >> > [1] https://github.com/belaban/IspnPerfTest >> >> > [2] >> >> > >> >> >> >> https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering >> >> > >> >> >> >> -- >> >> Bela Ban, JGroups lead (http://www.jgroups.org) >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> > >> > -- >> > Bela Ban, JGroups lead (http://www.jgroups.org) >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Wed Jul 30 04:02:00 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 30 Jul 2014 11:02:00 +0300 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: On Wed, Jul 30, 2014 at 12:35 AM, Sanne Grinovero wrote: > On 29 July 2014 22:14, Dan Berindei wrote: > > > > On Tue, Jul 29, 2014 at 9:06 PM, Sanne Grinovero > > wrote: > >> > >> This is a nasty problem and I also feel passionately we need to get > >> rid of it ASAP. > >> I did have the same problems many times, and we discussed this also in > >> Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this. > >> > >> You don't need TO, and you don't need to lock at all as long as you > >> guarantee the backup owners are getting the number with some > >> monotonicity sequence attached to it, > >> all that backup owners need to do is ignore incoming commands which > >> are outdated. > > > > > > This is more or less what TOA does - assign a monotonic sequence number > to > > txs, and only apply them after all the previous txs in the sequence have > > been applied. The problem is getting that monotonic sequence when there > are > > multiple originators and multiple primary owners also requires some extra > > RPCs. > > Let's not mix this up with requirements for other areas. > > The strategy I've proposed is only to be applied for the communication > from the primary owner to its backups: > the value to be written is well known as it's the primary owner which > defines it unilaterally (for example if there is an atomic replacement > to be computed) > and there is no need for extra RPCs as the sequence is not related to > a group of changes but for the specific entry only. > > There is no such thing as a need for consensus across owners, nor need > for a central source for sequences. > Just to make sure I understand correctly: your proposal is that in non-tx caches, the primary owner should generate some sort of version number while holding the entry lock, and replicate the write to the backup owners synchronously, but without holding the lock? 
Then the backup owners would check the version of the entry and only apply a newer write? if your proposal is only meant to apply to non-tx caches, you are right you don't have to worry about multiple primary owners... most of the time. But when the primary owner changes, then you do have 2 primary owners (if the new primary owner installs the new topology first), and you do need to coordinate between the 2. Slightly related: we also considered generating a version number on the client for consistency when the HotRod client retries after a primary owner failure [1]. But the clients can't create a monotonic sequence number, so we couldn't use that version number for this. [1] https://issues.jboss.org/browse/ISPN-2956 > Also I don't see it as an alternative to TOA, I rather expect it to > work nicely together: when TOA is enabled you could trust the > originating sequence source rather than generate a per-entry sequence, > and in neither case you need to actually use a Lock. > I haven't thought how the sequences would need to interact (if they > need), but they seem complementary to resolve different aspects, and > also both benefit from the same cleanup and basic structure. > We don't acquire locks at all on the backup owners - either in tx or non-tx caches. If state transfer is in progress, we use ConcurrentHashMap.compute() to store tracking information, which uses a synchronized block, so I suppose we do acquire locks. I assume your proposal would require a DataContainer.compute() or something similar on the backups, to ensure that the version check and the replacement are atomic. I still think TOA does what you want for tx caches. Your proposal would only work for non-tx caches, so you couldn't use them together. > >> Another aspect is that the "user thread" on the primary owner needs to > >> wait (at least until we improve further) and only proceed after ACK > >> from backup nodes, but this is better modelled through a state > >> machine. (Also discussed in Farnborough). > > > > > > To be clear, I don't think keeping the user thread on the originator > blocked > > until we have the write confirmations from all the backups is a problem > - a > > sync operation has to block, and it also serves to rate-limit user > > operations. > > > There are better ways to rate-limit than to make all operations slow; > we don't need to block a thread, we need to react on the reply from > the backup owners. > You still have an inherent rate-limit in the outgoing packet queues: > if these fill up, then and only then it's nice to introduce some back > pressure. > Sorry, you got me confused when you called the thread on the primary owner a "user thread". I agree that internal stuff can and should be asynchronous, callback based, but the user still has to see a synchronous blocking operation. > > The problem appears when the originator is not the primary owner, and the > > thread blocking for backup ACKs is from the remote-executor pool (or OOB, > > when the remote-executor pool is exhausted). > > Not following. I guess this is out of scope now that I clarified the > proposed solution is only to be applied between primary and backups? > > Yeah, I was just trying to clarify that there is no danger of exhausting the remote executor/OOB thread pools when the originator of the write command is the primary owner (as it happens in the HotRod server). 
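A compute()-style apply on the backup could indeed make the version check and the replacement a single atomic step, along these lines (a standalone sketch over a ConcurrentHashMap rather than the real DataContainer API; the sequence is assumed to be the one the primary owner attached to the update):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: a backup owner applies an update only if its sequence is newer than what it
// already holds; stale, reordered updates are simply dropped.
public class BackupOwnerSketch<K, V> {

    public static final class VersionedValue<V> {
        public final long seq;
        public final V value;
        public VersionedValue(long seq, V value) { this.seq = seq; this.value = value; }
    }

    private final ConcurrentMap<K, VersionedValue<V>> dataContainer = new ConcurrentHashMap<>();

    public void applyFromPrimary(K key, VersionedValue<V> update) {
        // compute() keeps "compare the sequence and replace the value" atomic per key,
        // so no separate lock acquisition is needed on the backup owner.
        dataContainer.compute(key, (k, current) ->
            (current == null || update.seq > current.seq) ? update : current);
    }
}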
> > >> > >> It's also conceptually linked to: > >> - https://issues.jboss.org/browse/ISPN-1599 > >> As you need to separate the locks of entries from the effective user > >> facing lock, at least to implement transactions on top of this model. > > > > > > I think we fixed ISPN-1599 when we changed passivation to use > > DataContainer.compute(). WDYT Pedro, is there anything else you'd like > to do > > in the scope of ISPN-1599? > > > >> > >> I expect this to improve performance in a very significant way, but > >> it's getting embarrassing that it's still not done; at the next face > >> to face meeting we should also reserve some time for retrospective > >> sessions. > > > > > > Implementing the state machine-based interceptor stack may give us a > > performance boost, but I'm much more certain that it's a very complex, > high > > risk task... and we don't have a stable test suite yet :) > > Cleaning up and removing some complexity such as > TooManyExecutorsException might help to get it stable, and keep it > there :) > BTW it was quite stable for me until you changed the JGroups UDP > default configuration. > > Do you really use UDP to run the tests? The default is TCP, but maybe the some tests doesn't use TestCacheManagerFactory... I was just aligning our configs with Bela's recommandations: MERGE3 instead of MERGE2 and the removal of UFC in TCP stacks. If they cause problems on your machine, you should make more noise :) Dan > Sanne > > > > > > >> > >> > >> Sanne > >> > >> On 29 July 2014 15:50, Bela Ban wrote: > >> > > >> > > >> > On 29/07/14 16:42, Dan Berindei wrote: > >> >> Have you tried regular optimistic/pessimistic transactions as well? > >> > > >> > Yes, in my first impl. but since I'm making only 1 change per > request, I > >> > thought a TX is overkill. > >> > > >> >> They *should* have less issues with the OOB thread pool than non-tx > >> >> mode, and > >> >> I'm quite curious how they stack against TO in such a large cluster. > >> > > >> > Why would they have fewer issues with the thread pools ? AIUI, a TX > >> > involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when not > using > >> > TXs. And we're sync anyway... > >> > > >> > > >> >> On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban >> >> > wrote: > >> >> > >> >> Following up on my own email, I changed the config to use Pedro's > >> >> excellent total order implementation: > >> >> > >> >> >> >> transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" > >> >> useEagerLocking="true" eagerLockSingleNode="true"> > >> >> > >> >> > >> >> With 100 nodes and 25 requester threads/node, I did NOT run into > >> >> any > >> >> locking issues ! > >> >> > >> >> I could even go up to 200 requester threads/node and the perf > was ~ > >> >> 7'000-8'000 requests/sec/node. Not too bad ! > >> >> > >> >> This really validates the concept of lockless total-order > >> >> dissemination > >> >> of TXs; for the first time, this has been tested on a large(r) > >> >> scale > >> >> (previously only on 25 nodes) and IT WORKS ! :-) > >> >> > >> >> I still believe we should implement my suggested solution for > >> >> non-TO > >> >> configs, but short of configuring thread pools of 1000 threads or > >> >> higher, I hope TO will allow me to finally test a 500 node > >> >> Infinispan > >> >> cluster ! > >> >> > >> >> > >> >> On 29/07/14 15:56, Bela Ban wrote: > >> >> > Hi guys, > >> >> > > >> >> > sorry for the long post, but I do think I ran into an > important > >> >> problem > >> >> > and we need to fix it ... 
:-) > >> >> > > >> >> > I've spent the last couple of days running the IspnPerfTest > [1] > >> >> perftest > >> >> > on Google Compute Engine (GCE), and I've run into a problem > with > >> >> > Infinispan. It is a design problem and can be mitigated by > >> >> sizing > >> >> thread > >> >> > pools correctly, but cannot be eliminated entirely. > >> >> > > >> >> > > >> >> > Symptom: > >> >> > -------- > >> >> > IspnPerfTest has every node in a cluster perform 20'000 > requests > >> >> on keys > >> >> > in range [1..20000]. > >> >> > > >> >> > 80% of the requests are reads and 20% writes. > >> >> > > >> >> > By default, we have 25 requester threads per node and 100 > nodes > >> >> in a > >> >> > cluster, so a total of 2500 requester threads. > >> >> > > >> >> > The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners: > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > >> >> > useEagerLocking="true" > >> >> > eagerLockSingleNode="true" /> > >> >> > >> >> concurrencyLevel="1000" > >> >> > isolationLevel="READ_COMMITTED" > >> >> useLockStriping="false" /> > >> >> > > >> >> > > >> >> > It has 2 owners, a lock acquisition timeout of 5s and a repl > >> >> timeout of > >> >> > 20s. Lock stripting is off, so we have 1 lock per key. > >> >> > > >> >> > When I run the test, I always get errors like those below: > >> >> > > >> >> > org.infinispan.util.concurrent.TimeoutException: Unable to > >> >> acquire lock > >> >> > after [10 seconds] on key [19386] for requestor > >> >> [Thread[invoker-3,5,main]]! > >> >> > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > >> >> > > >> >> > and > >> >> > > >> >> > org.infinispan.util.concurrent.TimeoutException: Node m8.1 > timed > >> >> out > >> >> > > >> >> > > >> >> > Investigation: > >> >> > ------------ > >> >> > When I looked at UNICAST3, I saw a lot of missing messages on > >> >> the > >> >> > receive side and unacked messages on the send side. This > caused > >> >> me to > >> >> > look into the (mainly OOB) thread pools and - voila - maxed > out > >> >> ! > >> >> > > >> >> > I learned from Pedro that the Infinispan internal thread pool > >> >> (with a > >> >> > default of 32 threads) can be configured, so I increased it to > >> >> 300 and > >> >> > increased the OOB pools as well. > >> >> > > >> >> > This mitigated the problem somewhat, but when I increased the > >> >> requester > >> >> > threads to 100, I had the same problem again. Apparently, the > >> >> Infinispan > >> >> > internal thread pool uses a rejection policy of "run" and thus > >> >> uses the > >> >> > JGroups (OOB) thread when exhausted. > >> >> > > >> >> > I learned (from Pedro and Mircea) that GETs and PUTs work as > >> >> follows in > >> >> > dist-sync / 2 owners: > >> >> > - GETs are sent to the primary and backup owners and the first > >> >> response > >> >> > received is returned to the caller. No locks are acquired, so > >> >> GETs > >> >> > shouldn't cause problems. > >> >> > > >> >> > - A PUT(K) is sent to the primary owner of K > >> >> > - The primary owner > >> >> > (1) locks K > >> >> > (2) updates the backup owner synchronously *while > holding > >> >> the lock* > >> >> > (3) releases the lock > >> >> > > >> >> > > >> >> > Hypothesis > >> >> > ---------- > >> >> > (2) above is done while holding the lock. The sync update of > the > >> >> backup > >> >> > owner is done with the lock held to guarantee that the primary > >> >> and > >> >> > backup owner of K have the same values for K. 
> >> >> > > >> >> > However, the sync update *inside the lock scope* slows things > >> >> down (can > >> >> > it also lead to deadlocks?); there's the risk that the request > >> >> is > >> >> > dropped due to a full incoming thread pool, or that the > response > >> >> is not > >> >> > received because of the same, or that the locking at the > backup > >> >> owner > >> >> > blocks for some time. > >> >> > > >> >> > If we have many threads modifying the same key, then we have a > >> >> backlog > >> >> > of locking work against that key. Say we have 100 requester > >> >> threads and > >> >> > a 100 node cluster. This means that we have 10'000 threads > >> >> accessing > >> >> > keys; with 2'000 writers there's a big chance that some > writers > >> >> pick the > >> >> > same key at the same time. > >> >> > > >> >> > For example, if we have 100 threads accessing key K and it > takes > >> >> 3ms to > >> >> > replicate K to the backup owner, then the last of the 100 > >> >> threads > >> >> waits > >> >> > ~300ms before it gets a chance to lock K on the primary owner > >> >> and > >> >> > replicate it as well. > >> >> > > >> >> > Just a small hiccup in sending the PUT to the primary owner, > >> >> sending the > >> >> > modification to the backup owner, waitting for the response, > or > >> >> GC, and > >> >> > the delay will quickly become bigger. > >> >> > > >> >> > > >> >> > Verification > >> >> > ---------- > >> >> > To verify the above, I set numOwners to 1. This means that the > >> >> primary > >> >> > owner of K does *not* send the modification to the backup > owner, > >> >> it only > >> >> > locks K, modifies K and unlocks K again. > >> >> > > >> >> > I ran the IspnPerfTest again on 100 nodes, with 25 requesters, > >> >> and NO > >> >> > PROBLEM ! > >> >> > > >> >> > I then increased the requesters to 100, 150 and 200 and the > test > >> >> > completed flawlessly ! Performance was around *40'000 requests > >> >> per node > >> >> > per sec* on 4-core boxes ! > >> >> > > >> >> > > >> >> > Root cause > >> >> > --------- > >> >> > ******************* > >> >> > The root cause is the sync RPC of K to the backup owner(s) of > K > >> >> while > >> >> > the primary owner holds the lock for K. > >> >> > ******************* > >> >> > > >> >> > This causes a backlog of threads waiting for the lock and that > >> >> backlog > >> >> > can grow to exhaust the thread pools. First the Infinispan > >> >> internal > >> >> > thread pool, then the JGroups OOB thread pool. The latter > causes > >> >> > retransmissions to get dropped, which compounds the problem... > >> >> > > >> >> > > >> >> > Goal > >> >> > ---- > >> >> > The goal is to make sure that primary and backup owner(s) of K > >> >> have the > >> >> > same value for K. > >> >> > > >> >> > Simply sending the modification to the backup owner(s) > >> >> asynchronously > >> >> > won't guarantee this, as modification messages might get > >> >> processed out > >> >> > of order as they're OOB ! > >> >> > > >> >> > > >> >> > Suggested solution > >> >> > ---------------- > >> >> > The modification RPC needs to be invoked *outside of the lock > >> >> scope*: > >> >> > - lock K > >> >> > - modify K > >> >> > - unlock K > >> >> > - send modification to backup owner(s) // outside the lock > scope > >> >> > > >> >> > The primary owner puts the modification of K into a queue from > >> >> where a > >> >> > separate thread/task removes it. The thread then invokes the > >> >> PUT(K) on > >> >> > the backup owner(s). 
> >> >> > > >> >> > The queue has the modified keys in FIFO order, so the > >> >> modifications > >> >> > arrive at the backup owner(s) in the right order. > >> >> > > >> >> > This requires that the way GET is implemented changes > slightly: > >> >> instead > >> >> > of invoking a GET on all owners of K, we only invoke it on the > >> >> primary > >> >> > owner, then the next-in-line etc. > >> >> > > >> >> > The reason for this is that the backup owner(s) may not yet > have > >> >> > received the modification of K. > >> >> > > >> >> > This is a better impl anyway (we discussed this before) becuse > >> >> it > >> >> > generates less traffic; in the normal case, all but 1 GET > >> >> requests are > >> >> > unnecessary. > >> >> > > >> >> > > >> >> > > >> >> > Improvement > >> >> > ----------- > >> >> > The above solution can be simplified and even made more > >> >> efficient. > >> >> > Re-using concepts from IRAC [2], we can simply store the > >> >> modified > >> >> *keys* > >> >> > in the modification queue. The modification replication thread > >> >> removes > >> >> > the key, gets the current value and invokes a PUT/REMOVE on > the > >> >> backup > >> >> > owner(s). > >> >> > > >> >> > Even better: a key is only ever added *once*, so if we have > >> >> [5,2,17,3], > >> >> > adding key 2 is a no-op because the processing of key 2 (in > >> >> second > >> >> > position in the queue) will fetch the up-to-date value anyway > ! > >> >> > > >> >> > > >> >> > Misc > >> >> > ---- > >> >> > - Could we possibly use total order to send the updates in TO > ? > >> >> TBD (Pedro?) > >> >> > > >> >> > > >> >> > Thoughts ? > >> >> > > >> >> > > >> >> > [1] https://github.com/belaban/IspnPerfTest > >> >> > [2] > >> >> > > >> >> > >> >> > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > >> >> > > >> >> > >> >> -- > >> >> Bela Ban, JGroups lead (http://www.jgroups.org) > >> >> _______________________________________________ > >> >> infinispan-dev mailing list > >> >> infinispan-dev at lists.jboss.org > >> >> > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> >> > >> >> > >> >> > >> >> > >> >> _______________________________________________ > >> >> infinispan-dev mailing list > >> >> infinispan-dev at lists.jboss.org > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> >> > >> > > >> > -- > >> > Bela Ban, JGroups lead (http://www.jgroups.org) > >> > _______________________________________________ > >> > infinispan-dev mailing list > >> > infinispan-dev at lists.jboss.org > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140730/7f546c35/attachment-0001.html From pedro at infinispan.org Wed Jul 30 04:49:14 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Wed, 30 Jul 2014 09:49:14 +0100 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: <53D8B18A.9030109@infinispan.org> On 07/29/2014 10:14 PM, Dan Berindei wrote: > > > > It's also conceptually linked to: > - https://issues.jboss.org/browse/ISPN-1599 > As you need to separate the locks of entries from the effective user > facing lock, at least to implement transactions on top of this model. > > > I think we fixed ISPN-1599 when we changed passivation to use > DataContainer.compute(). WDYT Pedro, is there anything else you'd like > to do in the scope of ISPN-1599? > > In the scope of ISPN-1599 I have nothing more in mind so far. But related to this discussion, I have ISPN-2849. From pedro at infinispan.org Wed Jul 30 05:00:45 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Wed, 30 Jul 2014 10:00:45 +0100 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: <53D8B43D.4060907@infinispan.org> On 07/30/2014 09:02 AM, Dan Berindei wrote: > > > if your proposal is only meant to apply to non-tx caches, you are right > you don't have to worry about multiple primary owners... most of the > time. But when the primary owner changes, then you do have 2 primary > owners (if the new primary owner installs the new topology first), and > you do need to coordinate between the 2. > I think it is the same for transactional cache. I.e. the commands wait for the transaction data from the new topology to be installed. In the non-tx caches, the old primary owner will send the next "sequence number" to the new primary owner and only after that, the new primary owner starts to give the orders. Otherwise, I can implement a total order version for non-tx caches and all the write serialization would be done in JGroups and Infinispan only has to apply the updates as soon as they are delivered. > Slightly related: we also considered generating a version number on the > client for consistency when the HotRod client retries after a primary > owner failure [1]. But the clients can't create a monotonic sequence > number, so we couldn't use that version number for this. > > [1] https://issues.jboss.org/browse/ISPN-2956 > > > Also I don't see it as an alternative to TOA, I rather expect it to > work nicely together: when TOA is enabled you could trust the > originating sequence source rather than generate a per-entry sequence, > and in neither case you need to actually use a Lock. > I haven't thought how the sequences would need to interact (if they > need), but they seem complementary to resolve different aspects, and > also both benefit from the same cleanup and basic structure. > > > We don't acquire locks at all on the backup owners - either in tx or > non-tx caches. If state transfer is in progress, we use > ConcurrentHashMap.compute() to store tracking information, which uses a > synchronized block, so I suppose we do acquire locks. I assume your > proposal would require a DataContainer.compute() or something similar on > the backups, to ensure that the version check and the replacement are > atomic. 
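To make the quoted point about atomicity concrete, here is a minimal sketch of what "apply only if newer" could look like on a backup owner, assuming the primary owner piggybacks a monotonic per-key sequence number on every replicated write. The names (BackupStore, VersionedValue, applyFromPrimary) are invented for the illustration; this is not the actual DataContainer API.

import java.util.concurrent.ConcurrentHashMap;

// Sketch: a backup owner applies a replicated write only if its sequence
// number is newer than the one it already holds. compute() makes the version
// check and the replacement a single atomic step, so a late (out-of-order,
// OOB) delivery cannot overwrite a newer value.
final class BackupStore {

    static final class VersionedValue {
        final long seqno;   // monotonic per-key sequence assigned by the primary owner
        final Object value;
        VersionedValue(long seqno, Object value) { this.seqno = seqno; this.value = value; }
    }

    private final ConcurrentHashMap<Object, VersionedValue> data = new ConcurrentHashMap<>();

    // Called when a replicated write from the primary owner is delivered.
    void applyFromPrimary(Object key, long seqno, Object value) {
        data.compute(key, (k, current) ->
                (current != null && current.seqno >= seqno)
                        ? current                               // stale update delivered late: ignore it
                        : new VersionedValue(seqno, value));
    }
}

On a primary owner change, the new primary would first have to learn the highest sequence number issued for the keys it takes over (the hand-over mentioned above), otherwise it could start issuing numbers that the backups would consider stale.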
> > I still think TOA does what you want for tx caches. Your proposal would > only work for non-tx caches, so you couldn't use them together. > > > >> Another aspect is that the "user thread" on the primary owner > needs to > >> wait (at least until we improve further) and only proceed after ACK > >> from backup nodes, but this is better modelled through a state > >> machine. (Also discussed in Farnborough). > > > > > > To be clear, I don't think keeping the user thread on the > originator blocked > > until we have the write confirmations from all the backups is a > problem - a > > sync operation has to block, and it also serves to rate-limit user > > operations. > > > There are better ways to rate-limit than to make all operations slow; > we don't need to block a thread, we need to react on the reply from > the backup owners. > You still have an inherent rate-limit in the outgoing packet queues: > if these fill up, then and only then it's nice to introduce some back > pressure. > > > Sorry, you got me confused when you called the thread on the primary > owner a "user thread". I agree that internal stuff can and should be > asynchronous, callback based, but the user still has to see a > synchronous blocking operation. > > > > The problem appears when the originator is not the primary owner, > and the > > thread blocking for backup ACKs is from the remote-executor pool > (or OOB, > > when the remote-executor pool is exhausted). > > Not following. I guess this is out of scope now that I clarified the > proposed solution is only to be applied between primary and backups? > > > Yeah, I was just trying to clarify that there is no danger of exhausting > the remote executor/OOB thread pools when the originator of the write > command is the primary owner (as it happens in the HotRod server). > > > >> > >> It's also conceptually linked to: > >> - https://issues.jboss.org/browse/ISPN-1599 > >> As you need to separate the locks of entries from the effective user > >> facing lock, at least to implement transactions on top of this > model. > > > > > > I think we fixed ISPN-1599 when we changed passivation to use > > DataContainer.compute(). WDYT Pedro, is there anything else you'd > like to do > > in the scope of ISPN-1599? > > > >> > >> I expect this to improve performance in a very significant way, but > >> it's getting embarrassing that it's still not done; at the next face > >> to face meeting we should also reserve some time for retrospective > >> sessions. > > > > > > Implementing the state machine-based interceptor stack may give us a > > performance boost, but I'm much more certain that it's a very > complex, high > > risk task... and we don't have a stable test suite yet :) > > Cleaning up and removing some complexity such as > TooManyExecutorsException might help to get it stable, and keep it > there :) > BTW it was quite stable for me until you changed the JGroups UDP > default configuration. > > > Do you really use UDP to run the tests? The default is TCP, but maybe > the some tests doesn't use TestCacheManagerFactory... > > I was just aligning our configs with Bela's recommandations: MERGE3 > instead of MERGE2 and the removal of UFC in TCP stacks. If they cause > problems on your machine, you should make more noise :) > > Dan > > Sanne > > > > > > >> > >> > >> Sanne > >> > >> On 29 July 2014 15:50, Bela Ban > wrote: > >> > > >> > > >> > On 29/07/14 16:42, Dan Berindei wrote: > >> >> Have you tried regular optimistic/pessimistic transactions as > well? > >> > > >> > Yes, in my first impl. 
but since I'm making only 1 change per > request, I > >> > thought a TX is overkill. > >> > > >> >> They *should* have less issues with the OOB thread pool than > non-tx > >> >> mode, and > >> >> I'm quite curious how they stack against TO in such a large > cluster. > >> > > >> > Why would they have fewer issues with the thread pools ? AIUI, > a TX > >> > involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when > not using > >> > TXs. And we're sync anyway... > >> > > >> > > >> >> On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban > >> >> >> wrote: > >> >> > >> >> Following up on my own email, I changed the config to use > Pedro's > >> >> excellent total order implementation: > >> >> > >> >> >> >> transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" > >> >> useEagerLocking="true" eagerLockSingleNode="true"> > >> >> > >> >> > >> >> With 100 nodes and 25 requester threads/node, I did NOT > run into > >> >> any > >> >> locking issues ! > >> >> > >> >> I could even go up to 200 requester threads/node and the > perf was ~ > >> >> 7'000-8'000 requests/sec/node. Not too bad ! > >> >> > >> >> This really validates the concept of lockless total-order > >> >> dissemination > >> >> of TXs; for the first time, this has been tested on a > large(r) > >> >> scale > >> >> (previously only on 25 nodes) and IT WORKS ! :-) > >> >> > >> >> I still believe we should implement my suggested solution for > >> >> non-TO > >> >> configs, but short of configuring thread pools of 1000 > threads or > >> >> higher, I hope TO will allow me to finally test a 500 node > >> >> Infinispan > >> >> cluster ! > >> >> > >> >> > >> >> On 29/07/14 15:56, Bela Ban wrote: > >> >> > Hi guys, > >> >> > > >> >> > sorry for the long post, but I do think I ran into an > important > >> >> problem > >> >> > and we need to fix it ... :-) > >> >> > > >> >> > I've spent the last couple of days running the > IspnPerfTest [1] > >> >> perftest > >> >> > on Google Compute Engine (GCE), and I've run into a > problem with > >> >> > Infinispan. It is a design problem and can be mitigated by > >> >> sizing > >> >> thread > >> >> > pools correctly, but cannot be eliminated entirely. > >> >> > > >> >> > > >> >> > Symptom: > >> >> > -------- > >> >> > IspnPerfTest has every node in a cluster perform > 20'000 requests > >> >> on keys > >> >> > in range [1..20000]. > >> >> > > >> >> > 80% of the requests are reads and 20% writes. > >> >> > > >> >> > By default, we have 25 requester threads per node and > 100 nodes > >> >> in a > >> >> > cluster, so a total of 2500 requester threads. > >> >> > > >> >> > The cache used is NON-TRANSACTIONAL / dist-sync / 2 > owners: > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > >> >> > useEagerLocking="true" > >> >> > eagerLockSingleNode="true" /> > >> >> > >> >> concurrencyLevel="1000" > >> >> > isolationLevel="READ_COMMITTED" > >> >> useLockStriping="false" /> > >> >> > > >> >> > > >> >> > It has 2 owners, a lock acquisition timeout of 5s and > a repl > >> >> timeout of > >> >> > 20s. Lock stripting is off, so we have 1 lock per key. > >> >> > > >> >> > When I run the test, I always get errors like those below: > >> >> > > >> >> > org.infinispan.util.concurrent.TimeoutException: Unable to > >> >> acquire lock > >> >> > after [10 seconds] on key [19386] for requestor > >> >> [Thread[invoker-3,5,main]]! 
> >> >> > Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > >> >> > > >> >> > and > >> >> > > >> >> > org.infinispan.util.concurrent.TimeoutException: Node > m8.1 timed > >> >> out > >> >> > > >> >> > > >> >> > Investigation: > >> >> > ------------ > >> >> > When I looked at UNICAST3, I saw a lot of missing > messages on > >> >> the > >> >> > receive side and unacked messages on the send side. > This caused > >> >> me to > >> >> > look into the (mainly OOB) thread pools and - voila - > maxed out > >> >> ! > >> >> > > >> >> > I learned from Pedro that the Infinispan internal > thread pool > >> >> (with a > >> >> > default of 32 threads) can be configured, so I > increased it to > >> >> 300 and > >> >> > increased the OOB pools as well. > >> >> > > >> >> > This mitigated the problem somewhat, but when I > increased the > >> >> requester > >> >> > threads to 100, I had the same problem again. > Apparently, the > >> >> Infinispan > >> >> > internal thread pool uses a rejection policy of "run" > and thus > >> >> uses the > >> >> > JGroups (OOB) thread when exhausted. > >> >> > > >> >> > I learned (from Pedro and Mircea) that GETs and PUTs > work as > >> >> follows in > >> >> > dist-sync / 2 owners: > >> >> > - GETs are sent to the primary and backup owners and > the first > >> >> response > >> >> > received is returned to the caller. No locks are > acquired, so > >> >> GETs > >> >> > shouldn't cause problems. > >> >> > > >> >> > - A PUT(K) is sent to the primary owner of K > >> >> > - The primary owner > >> >> > (1) locks K > >> >> > (2) updates the backup owner synchronously > *while holding > >> >> the lock* > >> >> > (3) releases the lock > >> >> > > >> >> > > >> >> > Hypothesis > >> >> > ---------- > >> >> > (2) above is done while holding the lock. The sync > update of the > >> >> backup > >> >> > owner is done with the lock held to guarantee that the > primary > >> >> and > >> >> > backup owner of K have the same values for K. > >> >> > > >> >> > However, the sync update *inside the lock scope* slows > things > >> >> down (can > >> >> > it also lead to deadlocks?); there's the risk that the > request > >> >> is > >> >> > dropped due to a full incoming thread pool, or that > the response > >> >> is not > >> >> > received because of the same, or that the locking at > the backup > >> >> owner > >> >> > blocks for some time. > >> >> > > >> >> > If we have many threads modifying the same key, then > we have a > >> >> backlog > >> >> > of locking work against that key. Say we have 100 > requester > >> >> threads and > >> >> > a 100 node cluster. This means that we have 10'000 threads > >> >> accessing > >> >> > keys; with 2'000 writers there's a big chance that > some writers > >> >> pick the > >> >> > same key at the same time. > >> >> > > >> >> > For example, if we have 100 threads accessing key K > and it takes > >> >> 3ms to > >> >> > replicate K to the backup owner, then the last of the 100 > >> >> threads > >> >> waits > >> >> > ~300ms before it gets a chance to lock K on the > primary owner > >> >> and > >> >> > replicate it as well. > >> >> > > >> >> > Just a small hiccup in sending the PUT to the primary > owner, > >> >> sending the > >> >> > modification to the backup owner, waitting for the > response, or > >> >> GC, and > >> >> > the delay will quickly become bigger. > >> >> > > >> >> > > >> >> > Verification > >> >> > ---------- > >> >> > To verify the above, I set numOwners to 1. 
This means > that the > >> >> primary > >> >> > owner of K does *not* send the modification to the > backup owner, > >> >> it only > >> >> > locks K, modifies K and unlocks K again. > >> >> > > >> >> > I ran the IspnPerfTest again on 100 nodes, with 25 > requesters, > >> >> and NO > >> >> > PROBLEM ! > >> >> > > >> >> > I then increased the requesters to 100, 150 and 200 > and the test > >> >> > completed flawlessly ! Performance was around *40'000 > requests > >> >> per node > >> >> > per sec* on 4-core boxes ! > >> >> > > >> >> > > >> >> > Root cause > >> >> > --------- > >> >> > ******************* > >> >> > The root cause is the sync RPC of K to the backup > owner(s) of K > >> >> while > >> >> > the primary owner holds the lock for K. > >> >> > ******************* > >> >> > > >> >> > This causes a backlog of threads waiting for the lock > and that > >> >> backlog > >> >> > can grow to exhaust the thread pools. First the Infinispan > >> >> internal > >> >> > thread pool, then the JGroups OOB thread pool. The > latter causes > >> >> > retransmissions to get dropped, which compounds the > problem... > >> >> > > >> >> > > >> >> > Goal > >> >> > ---- > >> >> > The goal is to make sure that primary and backup > owner(s) of K > >> >> have the > >> >> > same value for K. > >> >> > > >> >> > Simply sending the modification to the backup owner(s) > >> >> asynchronously > >> >> > won't guarantee this, as modification messages might get > >> >> processed out > >> >> > of order as they're OOB ! > >> >> > > >> >> > > >> >> > Suggested solution > >> >> > ---------------- > >> >> > The modification RPC needs to be invoked *outside of > the lock > >> >> scope*: > >> >> > - lock K > >> >> > - modify K > >> >> > - unlock K > >> >> > - send modification to backup owner(s) // outside the > lock scope > >> >> > > >> >> > The primary owner puts the modification of K into a > queue from > >> >> where a > >> >> > separate thread/task removes it. The thread then > invokes the > >> >> PUT(K) on > >> >> > the backup owner(s). > >> >> > > >> >> > The queue has the modified keys in FIFO order, so the > >> >> modifications > >> >> > arrive at the backup owner(s) in the right order. > >> >> > > >> >> > This requires that the way GET is implemented changes > slightly: > >> >> instead > >> >> > of invoking a GET on all owners of K, we only invoke > it on the > >> >> primary > >> >> > owner, then the next-in-line etc. > >> >> > > >> >> > The reason for this is that the backup owner(s) may > not yet have > >> >> > received the modification of K. > >> >> > > >> >> > This is a better impl anyway (we discussed this > before) becuse > >> >> it > >> >> > generates less traffic; in the normal case, all but 1 GET > >> >> requests are > >> >> > unnecessary. > >> >> > > >> >> > > >> >> > > >> >> > Improvement > >> >> > ----------- > >> >> > The above solution can be simplified and even made more > >> >> efficient. > >> >> > Re-using concepts from IRAC [2], we can simply store the > >> >> modified > >> >> *keys* > >> >> > in the modification queue. The modification > replication thread > >> >> removes > >> >> > the key, gets the current value and invokes a > PUT/REMOVE on the > >> >> backup > >> >> > owner(s). > >> >> > > >> >> > Even better: a key is only ever added *once*, so if we > have > >> >> [5,2,17,3], > >> >> > adding key 2 is a no-op because the processing of key > 2 (in > >> >> second > >> >> > position in the queue) will fetch the up-to-date value > anyway ! 
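As an illustration of the key-based modification queue described in the quoted "Improvement" section, a rough sketch is below; it assumes a single replication thread per primary owner, ignores shutdown, and the class name is made up for the example.

import java.util.LinkedHashSet;

// Sketch of the "keys only" modification queue: a key is queued at most once,
// and the replication thread always reads the *current* value of the key when
// it finally sends the update to the backup owner(s).
final class KeyReplicationQueue {

    private final LinkedHashSet<Object> keys = new LinkedHashSet<>();

    // Primary owner, after unlocking K: schedule K for async replication.
    synchronized void enqueue(Object key) {
        if (keys.add(key)) {    // adding an already-queued key is a no-op
            notifyAll();
        }
    }

    // Replication thread: take the oldest queued key (FIFO order).
    synchronized Object take() throws InterruptedException {
        while (keys.isEmpty()) {
            wait();
        }
        java.util.Iterator<Object> it = keys.iterator();
        Object key = it.next();
        it.remove();
        return key;
    }
}

The replication thread would then look up the current value of the returned key and invoke the PUT/REMOVE on the backup owner(s), so a key overwritten several times while queued is still replicated only once, with its latest value.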
> >> >> > > >> >> > > >> >> > Misc > >> >> > ---- > >> >> > - Could we possibly use total order to send the > updates in TO ? > >> >> TBD (Pedro?) > >> >> > > >> >> > > >> >> > Thoughts ? > >> >> > > >> >> > > >> >> > [1] https://github.com/belaban/IspnPerfTest > >> >> > [2] > >> >> > > >> >> > >> >> > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > >> >> > > >> >> > >> >> -- > >> >> Bela Ban, JGroups lead (http://www.jgroups.org) > >> >> _______________________________________________ > >> >> infinispan-dev mailing list > >> >> infinispan-dev at lists.jboss.org > > >> >> > > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> >> > >> >> > >> >> > >> >> > >> >> _______________________________________________ > >> >> infinispan-dev mailing list > >> >> infinispan-dev at lists.jboss.org > > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> >> > >> > > >> > -- > >> > Bela Ban, JGroups lead (http://www.jgroups.org) > >> > _______________________________________________ > >> > infinispan-dev mailing list > >> > infinispan-dev at lists.jboss.org > > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Wed Jul 30 05:13:38 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 30 Jul 2014 12:13:38 +0300 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: <53D8B43D.4060907@infinispan.org> References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> <53D8B43D.4060907@infinispan.org> Message-ID: On Wed, Jul 30, 2014 at 12:00 PM, Pedro Ruivo wrote: > > > On 07/30/2014 09:02 AM, Dan Berindei wrote: > > > > > > > if your proposal is only meant to apply to non-tx caches, you are right > > you don't have to worry about multiple primary owners... most of the > > time. But when the primary owner changes, then you do have 2 primary > > owners (if the new primary owner installs the new topology first), and > > you do need to coordinate between the 2. > > > > I think it is the same for transactional cache. I.e. the commands wait > for the transaction data from the new topology to be installed. In the > non-tx caches, the old primary owner will send the next "sequence > number" to the new primary owner and only after that, the new primary > owner starts to give the orders. > I'm not sure that's related, commands that wait for a newer topology do not block a thread since the ISPN-3527 fix. > > Otherwise, I can implement a total order version for non-tx caches and > all the write serialization would be done in JGroups and Infinispan only > has to apply the updates as soon as they are delivered. > Right, that sounds quite interesting. 
But you'd also need a less-blocking state transfer ;) > > Slightly related: we also considered generating a version number on the > > client for consistency when the HotRod client retries after a primary > > owner failure [1]. But the clients can't create a monotonic sequence > > number, so we couldn't use that version number for this. > > > > [1] https://issues.jboss.org/browse/ISPN-2956 > > > > > > Also I don't see it as an alternative to TOA, I rather expect it to > > work nicely together: when TOA is enabled you could trust the > > originating sequence source rather than generate a per-entry > sequence, > > and in neither case you need to actually use a Lock. > > I haven't thought how the sequences would need to interact (if they > > need), but they seem complementary to resolve different aspects, and > > also both benefit from the same cleanup and basic structure. > > > > > > We don't acquire locks at all on the backup owners - either in tx or > > non-tx caches. If state transfer is in progress, we use > > ConcurrentHashMap.compute() to store tracking information, which uses a > > synchronized block, so I suppose we do acquire locks. I assume your > > proposal would require a DataContainer.compute() or something similar on > > the backups, to ensure that the version check and the replacement are > > atomic. > > > > I still think TOA does what you want for tx caches. Your proposal would > > only work for non-tx caches, so you couldn't use them together. > > > > > > >> Another aspect is that the "user thread" on the primary owner > > needs to > > >> wait (at least until we improve further) and only proceed after > ACK > > >> from backup nodes, but this is better modelled through a state > > >> machine. (Also discussed in Farnborough). > > > > > > > > > To be clear, I don't think keeping the user thread on the > > originator blocked > > > until we have the write confirmations from all the backups is a > > problem - a > > > sync operation has to block, and it also serves to rate-limit user > > > operations. > > > > > > There are better ways to rate-limit than to make all operations slow; > > we don't need to block a thread, we need to react on the reply from > > the backup owners. > > You still have an inherent rate-limit in the outgoing packet queues: > > if these fill up, then and only then it's nice to introduce some back > > pressure. > > > > > > Sorry, you got me confused when you called the thread on the primary > > owner a "user thread". I agree that internal stuff can and should be > > asynchronous, callback based, but the user still has to see a > > synchronous blocking operation. > > > > > > > The problem appears when the originator is not the primary owner, > > and the > > > thread blocking for backup ACKs is from the remote-executor pool > > (or OOB, > > > when the remote-executor pool is exhausted). > > > > Not following. I guess this is out of scope now that I clarified the > > proposed solution is only to be applied between primary and backups? > > > > > > Yeah, I was just trying to clarify that there is no danger of exhausting > > the remote executor/OOB thread pools when the originator of the write > > command is the primary owner (as it happens in the HotRod server). > > > > > > >> > > >> It's also conceptually linked to: > > >> - https://issues.jboss.org/browse/ISPN-1599 > > >> As you need to separate the locks of entries from the effective > user > > >> facing lock, at least to implement transactions on top of this > > model. 
> > > > > > > > > I think we fixed ISPN-1599 when we changed passivation to use > > > DataContainer.compute(). WDYT Pedro, is there anything else you'd > > like to do > > > in the scope of ISPN-1599? > > > > > >> > > >> I expect this to improve performance in a very significant way, > but > > >> it's getting embarrassing that it's still not done; at the next > face > > >> to face meeting we should also reserve some time for > retrospective > > >> sessions. > > > > > > > > > Implementing the state machine-based interceptor stack may give > us a > > > performance boost, but I'm much more certain that it's a very > > complex, high > > > risk task... and we don't have a stable test suite yet :) > > > > Cleaning up and removing some complexity such as > > TooManyExecutorsException might help to get it stable, and keep it > > there :) > > BTW it was quite stable for me until you changed the JGroups UDP > > default configuration. > > > > > > Do you really use UDP to run the tests? The default is TCP, but maybe > > the some tests doesn't use TestCacheManagerFactory... > > > > I was just aligning our configs with Bela's recommandations: MERGE3 > > instead of MERGE2 and the removal of UFC in TCP stacks. If they cause > > problems on your machine, you should make more noise :) > > > > Dan > > > > Sanne > > > > > > > > > > >> > > >> > > >> Sanne > > >> > > >> On 29 July 2014 15:50, Bela Ban > > wrote: > > >> > > > >> > > > >> > On 29/07/14 16:42, Dan Berindei wrote: > > >> >> Have you tried regular optimistic/pessimistic transactions as > > well? > > >> > > > >> > Yes, in my first impl. but since I'm making only 1 change per > > request, I > > >> > thought a TX is overkill. > > >> > > > >> >> They *should* have less issues with the OOB thread pool than > > non-tx > > >> >> mode, and > > >> >> I'm quite curious how they stack against TO in such a large > > cluster. > > >> > > > >> > Why would they have fewer issues with the thread pools ? AIUI, > > a TX > > >> > involves 2 RPCs (PREPARE-COMMIT/ROLLBACK) compared to one when > > not using > > >> > TXs. And we're sync anyway... > > >> > > > >> > > > >> >> On Tue, Jul 29, 2014 at 5:38 PM, Bela Ban > > > >> >> >> wrote: > > >> >> > > >> >> Following up on my own email, I changed the config to use > > Pedro's > > >> >> excellent total order implementation: > > >> >> > > >> >> > >> >> transactionProtocol="TOTAL_ORDER" lockingMode="OPTIMISTIC" > > >> >> useEagerLocking="true" eagerLockSingleNode="true"> > > >> >> > > >> >> > > >> >> With 100 nodes and 25 requester threads/node, I did NOT > > run into > > >> >> any > > >> >> locking issues ! > > >> >> > > >> >> I could even go up to 200 requester threads/node and the > > perf was ~ > > >> >> 7'000-8'000 requests/sec/node. Not too bad ! > > >> >> > > >> >> This really validates the concept of lockless total-order > > >> >> dissemination > > >> >> of TXs; for the first time, this has been tested on a > > large(r) > > >> >> scale > > >> >> (previously only on 25 nodes) and IT WORKS ! :-) > > >> >> > > >> >> I still believe we should implement my suggested solution > for > > >> >> non-TO > > >> >> configs, but short of configuring thread pools of 1000 > > threads or > > >> >> higher, I hope TO will allow me to finally test a 500 node > > >> >> Infinispan > > >> >> cluster ! > > >> >> > > >> >> > > >> >> On 29/07/14 15:56, Bela Ban wrote: > > >> >> > Hi guys, > > >> >> > > > >> >> > sorry for the long post, but I do think I ran into an > > important > > >> >> problem > > >> >> > and we need to fix it ... 
:-) > > >> >> > > > >> >> > I've spent the last couple of days running the > > IspnPerfTest [1] > > >> >> perftest > > >> >> > on Google Compute Engine (GCE), and I've run into a > > problem with > > >> >> > Infinispan. It is a design problem and can be > mitigated by > > >> >> sizing > > >> >> thread > > >> >> > pools correctly, but cannot be eliminated entirely. > > >> >> > > > >> >> > > > >> >> > Symptom: > > >> >> > -------- > > >> >> > IspnPerfTest has every node in a cluster perform > > 20'000 requests > > >> >> on keys > > >> >> > in range [1..20000]. > > >> >> > > > >> >> > 80% of the requests are reads and 20% writes. > > >> >> > > > >> >> > By default, we have 25 requester threads per node and > > 100 nodes > > >> >> in a > > >> >> > cluster, so a total of 2500 requester threads. > > >> >> > > > >> >> > The cache used is NON-TRANSACTIONAL / dist-sync / 2 > > owners: > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > >> >> > useEagerLocking="true" > > >> >> > eagerLockSingleNode="true" /> > > >> >> > > >> >> concurrencyLevel="1000" > > >> >> > isolationLevel="READ_COMMITTED" > > >> >> useLockStriping="false" /> > > >> >> > > > >> >> > > > >> >> > It has 2 owners, a lock acquisition timeout of 5s and > > a repl > > >> >> timeout of > > >> >> > 20s. Lock stripting is off, so we have 1 lock per key. > > >> >> > > > >> >> > When I run the test, I always get errors like those > below: > > >> >> > > > >> >> > org.infinispan.util.concurrent.TimeoutException: > Unable to > > >> >> acquire lock > > >> >> > after [10 seconds] on key [19386] for requestor > > >> >> [Thread[invoker-3,5,main]]! > > >> >> > Lock held by > [Thread[OOB-194,ispn-perf-test,m5.1,5,main]] > > >> >> > > > >> >> > and > > >> >> > > > >> >> > org.infinispan.util.concurrent.TimeoutException: Node > > m8.1 timed > > >> >> out > > >> >> > > > >> >> > > > >> >> > Investigation: > > >> >> > ------------ > > >> >> > When I looked at UNICAST3, I saw a lot of missing > > messages on > > >> >> the > > >> >> > receive side and unacked messages on the send side. > > This caused > > >> >> me to > > >> >> > look into the (mainly OOB) thread pools and - voila - > > maxed out > > >> >> ! > > >> >> > > > >> >> > I learned from Pedro that the Infinispan internal > > thread pool > > >> >> (with a > > >> >> > default of 32 threads) can be configured, so I > > increased it to > > >> >> 300 and > > >> >> > increased the OOB pools as well. > > >> >> > > > >> >> > This mitigated the problem somewhat, but when I > > increased the > > >> >> requester > > >> >> > threads to 100, I had the same problem again. > > Apparently, the > > >> >> Infinispan > > >> >> > internal thread pool uses a rejection policy of "run" > > and thus > > >> >> uses the > > >> >> > JGroups (OOB) thread when exhausted. > > >> >> > > > >> >> > I learned (from Pedro and Mircea) that GETs and PUTs > > work as > > >> >> follows in > > >> >> > dist-sync / 2 owners: > > >> >> > - GETs are sent to the primary and backup owners and > > the first > > >> >> response > > >> >> > received is returned to the caller. No locks are > > acquired, so > > >> >> GETs > > >> >> > shouldn't cause problems. 
> > >> >> > > > >> >> > - A PUT(K) is sent to the primary owner of K > > >> >> > - The primary owner > > >> >> > (1) locks K > > >> >> > (2) updates the backup owner synchronously > > *while holding > > >> >> the lock* > > >> >> > (3) releases the lock > > >> >> > > > >> >> > > > >> >> > Hypothesis > > >> >> > ---------- > > >> >> > (2) above is done while holding the lock. The sync > > update of the > > >> >> backup > > >> >> > owner is done with the lock held to guarantee that the > > primary > > >> >> and > > >> >> > backup owner of K have the same values for K. > > >> >> > > > >> >> > However, the sync update *inside the lock scope* slows > > things > > >> >> down (can > > >> >> > it also lead to deadlocks?); there's the risk that the > > request > > >> >> is > > >> >> > dropped due to a full incoming thread pool, or that > > the response > > >> >> is not > > >> >> > received because of the same, or that the locking at > > the backup > > >> >> owner > > >> >> > blocks for some time. > > >> >> > > > >> >> > If we have many threads modifying the same key, then > > we have a > > >> >> backlog > > >> >> > of locking work against that key. Say we have 100 > > requester > > >> >> threads and > > >> >> > a 100 node cluster. This means that we have 10'000 > threads > > >> >> accessing > > >> >> > keys; with 2'000 writers there's a big chance that > > some writers > > >> >> pick the > > >> >> > same key at the same time. > > >> >> > > > >> >> > For example, if we have 100 threads accessing key K > > and it takes > > >> >> 3ms to > > >> >> > replicate K to the backup owner, then the last of the > 100 > > >> >> threads > > >> >> waits > > >> >> > ~300ms before it gets a chance to lock K on the > > primary owner > > >> >> and > > >> >> > replicate it as well. > > >> >> > > > >> >> > Just a small hiccup in sending the PUT to the primary > > owner, > > >> >> sending the > > >> >> > modification to the backup owner, waitting for the > > response, or > > >> >> GC, and > > >> >> > the delay will quickly become bigger. > > >> >> > > > >> >> > > > >> >> > Verification > > >> >> > ---------- > > >> >> > To verify the above, I set numOwners to 1. This means > > that the > > >> >> primary > > >> >> > owner of K does *not* send the modification to the > > backup owner, > > >> >> it only > > >> >> > locks K, modifies K and unlocks K again. > > >> >> > > > >> >> > I ran the IspnPerfTest again on 100 nodes, with 25 > > requesters, > > >> >> and NO > > >> >> > PROBLEM ! > > >> >> > > > >> >> > I then increased the requesters to 100, 150 and 200 > > and the test > > >> >> > completed flawlessly ! Performance was around *40'000 > > requests > > >> >> per node > > >> >> > per sec* on 4-core boxes ! > > >> >> > > > >> >> > > > >> >> > Root cause > > >> >> > --------- > > >> >> > ******************* > > >> >> > The root cause is the sync RPC of K to the backup > > owner(s) of K > > >> >> while > > >> >> > the primary owner holds the lock for K. > > >> >> > ******************* > > >> >> > > > >> >> > This causes a backlog of threads waiting for the lock > > and that > > >> >> backlog > > >> >> > can grow to exhaust the thread pools. First the > Infinispan > > >> >> internal > > >> >> > thread pool, then the JGroups OOB thread pool. The > > latter causes > > >> >> > retransmissions to get dropped, which compounds the > > problem... > > >> >> > > > >> >> > > > >> >> > Goal > > >> >> > ---- > > >> >> > The goal is to make sure that primary and backup > > owner(s) of K > > >> >> have the > > >> >> > same value for K. 
> > >> >> > > > >> >> > Simply sending the modification to the backup owner(s) > > >> >> asynchronously > > >> >> > won't guarantee this, as modification messages might > get > > >> >> processed out > > >> >> > of order as they're OOB ! > > >> >> > > > >> >> > > > >> >> > Suggested solution > > >> >> > ---------------- > > >> >> > The modification RPC needs to be invoked *outside of > > the lock > > >> >> scope*: > > >> >> > - lock K > > >> >> > - modify K > > >> >> > - unlock K > > >> >> > - send modification to backup owner(s) // outside the > > lock scope > > >> >> > > > >> >> > The primary owner puts the modification of K into a > > queue from > > >> >> where a > > >> >> > separate thread/task removes it. The thread then > > invokes the > > >> >> PUT(K) on > > >> >> > the backup owner(s). > > >> >> > > > >> >> > The queue has the modified keys in FIFO order, so the > > >> >> modifications > > >> >> > arrive at the backup owner(s) in the right order. > > >> >> > > > >> >> > This requires that the way GET is implemented changes > > slightly: > > >> >> instead > > >> >> > of invoking a GET on all owners of K, we only invoke > > it on the > > >> >> primary > > >> >> > owner, then the next-in-line etc. > > >> >> > > > >> >> > The reason for this is that the backup owner(s) may > > not yet have > > >> >> > received the modification of K. > > >> >> > > > >> >> > This is a better impl anyway (we discussed this > > before) becuse > > >> >> it > > >> >> > generates less traffic; in the normal case, all but 1 > GET > > >> >> requests are > > >> >> > unnecessary. > > >> >> > > > >> >> > > > >> >> > > > >> >> > Improvement > > >> >> > ----------- > > >> >> > The above solution can be simplified and even made more > > >> >> efficient. > > >> >> > Re-using concepts from IRAC [2], we can simply store > the > > >> >> modified > > >> >> *keys* > > >> >> > in the modification queue. The modification > > replication thread > > >> >> removes > > >> >> > the key, gets the current value and invokes a > > PUT/REMOVE on the > > >> >> backup > > >> >> > owner(s). > > >> >> > > > >> >> > Even better: a key is only ever added *once*, so if we > > have > > >> >> [5,2,17,3], > > >> >> > adding key 2 is a no-op because the processing of key > > 2 (in > > >> >> second > > >> >> > position in the queue) will fetch the up-to-date value > > anyway ! > > >> >> > > > >> >> > > > >> >> > Misc > > >> >> > ---- > > >> >> > - Could we possibly use total order to send the > > updates in TO ? > > >> >> TBD (Pedro?) > > >> >> > > > >> >> > > > >> >> > Thoughts ? 
> > >> >> > > > >> >> > > > >> >> > [1] https://github.com/belaban/IspnPerfTest > > >> >> > [2] > > >> >> > > > >> >> > > >> >> > > > https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering > > >> >> > > > >> >> > > >> >> -- > > >> >> Bela Ban, JGroups lead (http://www.jgroups.org) > > >> >> _______________________________________________ > > >> >> infinispan-dev mailing list > > >> >> infinispan-dev at lists.jboss.org > > > > >> >> > > > > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> _______________________________________________ > > >> >> infinispan-dev mailing list > > >> >> infinispan-dev at lists.jboss.org > > > > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > >> >> > > >> > > > >> > -- > > >> > Bela Ban, JGroups lead (http://www.jgroups.org) > > >> > _______________________________________________ > > >> > infinispan-dev mailing list > > >> > infinispan-dev at lists.jboss.org > > > > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > >> _______________________________________________ > > >> infinispan-dev mailing list > > >> infinispan-dev at lists.jboss.org > > > > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > > > > > _______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org infinispan-dev at lists.jboss.org> > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140730/cdfe7c5a/attachment-0001.html From rvansa at redhat.com Wed Jul 30 05:22:52 2014 From: rvansa at redhat.com (Radim Vansa) Date: Wed, 30 Jul 2014 11:22:52 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> Message-ID: <53D8B96C.5030801@redhat.com> > Investigation: > ------------ > When I looked at UNICAST3, I saw a lot of missing messages on the > receive side and unacked messages on the send side. This caused me to > look into the (mainly OOB) thread pools and - voila - maxed out ! > > I learned from Pedro that the Infinispan internal thread pool (with a > default of 32 threads) can be configured, so I increased it to 300 and > increased the OOB pools as well. > > This mitigated the problem somewhat, but when I increased the > requester > threads to 100, I had the same problem again. Apparently, the > Infinispan > internal thread pool uses a rejection policy of "run" and thus > uses the > JGroups (OOB) thread when exhausted. > > > We can't use another rejection policy in the remote executor because > the message won't be re-delivered by JGroups, and we can't use a queue > either. Can't we just send response "Node is busy" and cancel the operation? 
(at least in cases where this is possible - we can't do that safely for CommitCommand, but usually it could be doable, right?) And what's the problem with queues, besides that these can grow out of memory? Radim -- Radim Vansa JBoss DataGrid QA -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140730/512af253/attachment.html From bban at redhat.com Wed Jul 30 06:01:23 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 30 Jul 2014 12:01:23 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> Message-ID: <53D8C273.5020009@redhat.com> On 29/07/14 16:39, Dan Berindei wrote: > Investigation: > ------------ > This mitigated the problem somewhat, but when I increased the requester > threads to 100, I had the same problem again. Apparently, the Infinispan > internal thread pool uses a rejection policy of "run" and thus uses the > JGroups (OOB) thread when exhausted. > > > We can't use another rejection policy in the remote executor because the > message won't be re-delivered by JGroups, and we can't use a queue either. Yes I'm aware of that and "run" is our only option for the Infinispan internal thread pool > Suggested solution > ---------------- > The modification RPC needs to be invoked *outside of the lock scope*: > - lock K > - modify K > - unlock K > - send modification to backup owner(s) // outside the lock scope > > The primary owner puts the modification of K into a queue from where a > separate thread/task removes it. The thread then invokes the PUT(K) on > the backup owner(s). > > > Does the replication thread execute the PUT(k) synchronously, or > asynchronously? I assume asynchronously, otherwise the replication > thread wouldn't be able to keep up with the writers. Async would be preferred, but the order of the updates needs to be guaranteed. This could be done with the sequence numbers suggested by Sanne, or using total order/TOA. > The queue has the modified keys in FIFO order, so the modifications > arrive at the backup owner(s) in the right order. > > > Sending the RPC to the backup owners asynchronously, while holding the > key lock, would do the same thing. Yes - but if there's a chance that the send() system call blocks, e.g. on TCP when the send window is full, the async repl should be outside the lock scope. If those update messages to the backup owner(s) are regular (not OOB) messages, FIFO would ensure that they're processed in the order in which they were sent. If they're OOB messages, we'd have to somehow guarantee ordering, e.g. using seq numbers. > This requires that the way GET is implemented changes slightly: instead > of invoking a GET on all owners of K, we only invoke it on the primary > owner, then the next-in-line etc. > I have a WIP branch for this and it seemed to work fine. Test suite > speed seemed about the same, but I didn't get to do a real performance test. Hmm, it should reduce overall traffic, which indirectly should lead to better performance. I hope you wrap your work with a JIRA and commit it ! 
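To put the write path being discussed into code form: the sketch below updates the entry and assigns a monotonic per-key sequence number inside the lock scope, and performs the RPC to the backup owner(s) outside it. All names (PrimaryOwnerWritePath, BackupReplicator.replicateAsync) are invented for the example; this is not the actual Infinispan interceptor code, and the single ReentrantLock stands in for the per-key lock manager.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the primary owner modifies the entry and assigns the seqno while
// holding the key lock, but sends the update to the backup owner(s) only
// after releasing it. Backups ignore updates whose seqno is not newer than
// what they already hold, so correctness no longer depends on keeping the
// lock across the RPC.
final class PrimaryOwnerWritePath {

    interface BackupReplicator {
        // assumed asynchronous; ordering is restored on the backup via the seqno
        void replicateAsync(Object key, long seqno, Object value);
    }

    private final ConcurrentHashMap<Object, Object> data = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Object, AtomicLong> seqnos = new ConcurrentHashMap<>();
    private final Lock lock = new ReentrantLock();     // per-key lock in the real thing
    private final BackupReplicator replicator;

    PrimaryOwnerWritePath(BackupReplicator replicator) {
        this.replicator = replicator;
    }

    void put(Object key, Object value) {
        long seqno;
        lock.lock();                                   // lock K
        try {
            data.put(key, value);                      // modify K
            seqno = seqnos.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        } finally {
            lock.unlock();                             // unlock K ...
        }
        replicator.replicateAsync(key, seqno, value);  // ... then replicate, outside the lock scope
    }
}

If two writers race on the same key, the one that acquired the lock last gets the higher seqno, so even if its message overtakes the other one on the wire, the backups converge to the same value as the primary.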
-- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Wed Jul 30 06:05:42 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 30 Jul 2014 12:05:42 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: <53D8B96C.5030801@redhat.com> References: <53D7A7F6.1010100@redhat.com> <53D8B96C.5030801@redhat.com> Message-ID: <53D8C376.3010004@redhat.com> Quick update. Even when I changed the number of segments from 60 (default) to 1000 (on Dan's suggestion), I still got lock acquisition timeouts with regular TXs and in the TX-less case, too. With total order, I had not a single issue in a couple of runs, one of them on 500 nodes with 200 requester threads each (100'000 threads) ! I didn't go back and measure performance of total order with 1000 segments. -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Wed Jul 30 06:13:44 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 30 Jul 2014 12:13:44 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: <53D8C558.5020804@redhat.com> On 29/07/14 20:06, Sanne Grinovero wrote: > This is a nasty problem and I also feel passionately we need to get > rid of it ASAP. +1 > I did have the same problems many times, and we discussed this also in > Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this. > > You don't need TO, and you don't need to lock at all as long as you > guarantee the backup owners are getting the number with some > monotonicity sequence attached to it, > all that backup owners need to do is ignore incoming commands which > are outdated. Right. And we need to handle the scenario where we get updates from multiple members, e.g. P40,P41,Q6,Q7 in the case where the primary owner changed from P to Q (or from Q to P ?) > Another aspect is that the "user thread" on the primary owner needs to > wait (at least until we improve further) and only proceed after ACK > from backup nodes, but this is better modelled through a state > machine. (Also discussed in Farnborough). > > It's also conceptually linked to: > - https://issues.jboss.org/browse/ISPN-1599 > As you need to separate the locks of entries from the effective user > facing lock, at least to implement transactions on top of this model. > > I expect this to improve performance in a very significant way, but > it's getting embarrassing that it's still not done; at the next face > to face meeting we should also reserve some time for retrospective > sessions. Yes - there's a link to the agenda of the 2015 team meeting, please feel free to update the agenda. I'll send out an email re dates and location shortly. [1] https://mojo.redhat.com/docs/DOC-977279 -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Wed Jul 30 06:26:00 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 30 Jul 2014 12:26:00 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: <53D8C838.5060907@redhat.com> On 29/07/14 23:14, Dan Berindei wrote: > > On Tue, Jul 29, 2014 at 9:06 PM, Sanne Grinovero > wrote: > > This is a nasty problem and I also feel passionately we need to get > rid of it ASAP.
> I did have the same problems many times, and we discussed this also in > Farnborough; AFAIR Dan and Pedro had some excellent ideas to fix this. > > You don't need TO, and you don't need to lock at all as long as you > guarantee the backup owners are getting the number with some > monotonicity sequence attached to it, > all that backup owners need to do is ignore incoming commands which > are outdated. > > > This is more or less what TOA does - assign a monotonic sequence number > to txs, and only apply them after all the previous txs in the sequence > have been applied. The problem is getting that monotonic sequence when > there are multiple originators and multiple primary owners also requires > some extra RPCs. Yes - for a TX involving multiple keys, TOA's probably the way to go. However, for non-TXs caches and accessing single (or only few) keys, TOA's probably overkill. As long as we move the sync update RPC out of the lock scope, I'm fine with whatever solution you guys come up with. > Another aspect is that the "user thread" on the primary owner needs to > wait (at least until we improve further) and only proceed after ACK > from backup nodes, but this is better modelled through a state > machine. (Also discussed in Farnborough). > > > To be clear, I don't think keeping the user thread on the originator > blocked until we have the write confirmations from all the backups is a > problem - a sync operation has to block, and it also serves to > rate-limit user operations. I agree; sync mode implies user threads are blocking until an operation has completed. > The problem appears when the originator is not the primary owner, and > the thread blocking for backup ACKs is from the remote-executor pool (or > OOB, when the remote-executor pool is exhausted). > > > It's also conceptually linked to: > - https://issues.jboss.org/browse/ISPN-1599 > As you need to separate the locks of entries from the effective user > facing lock, at least to implement transactions on top of this model. > > > I think we fixed ISPN-1599 when we changed passivation to use > DataContainer.compute(). WDYT Pedro, is there anything else you'd like > to do in the scope of ISPN-1599? > > > I expect this to improve performance in a very significant way, but > it's getting embarrassing that it's still not done; at the next face > to face meeting we should also reserve some time for retrospective > sessions. > > > Implementing the state machine-based interceptor stack may give us a > performance boost, but I'm much more certain that it's a very complex, > high risk task... and we don't have a stable test suite yet :) Yes - this is something major, let's add it to the agenda -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Wed Jul 30 06:34:28 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 30 Jul 2014 12:34:28 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D7B1CA.9000309@redhat.com> <53D7B4B3.4080008@redhat.com> Message-ID: <53D8CA34.10206@redhat.com> On 29/07/14 23:35, Sanne Grinovero wrote: > The strategy I've proposed is only to be applied for the communication > from the primary owner to its backups: > the value to be written is well known as it's the primary owner which > defines it unilaterally (for example if there is an atomic replacement > to be computed) > and there is no need for extra RPCs as the sequence is not related to > a group of changes but for the specific entry only. 
How would this work with TXs involving multiple keys on different primary owners ? Each owner replicates with seqnos to the backup owners, so changes for single keys are received in order, but do we (need to) guarantee that TXs consistency is preserved ? In other words, do we preserve isolation: all changes of a TX are observed at the same logical time, across multiple backup owners ? > There is no such thing as a need for consensus across owners, nor need > for a central source for sequences. > > Also I don't see it as an alternative to TOA, I rather expect it to > work nicely together: when TOA is enabled you could trust the > originating sequence source rather than generate a per-entry sequence, > and in neither case you need to actually use a Lock. > I haven't thought how the sequences would need to interact (if they > need), but they seem complementary to resolve different aspects, and > also both benefit from the same cleanup and basic structure. > >>> Another aspect is that the "user thread" on the primary owner needs to >>> wait (at least until we improve further) and only proceed after ACK >>> from backup nodes, but this is better modelled through a state >>> machine. (Also discussed in Farnborough). >> >> >> To be clear, I don't think keeping the user thread on the originator blocked >> until we have the write confirmations from all the backups is a problem - a >> sync operation has to block, and it also serves to rate-limit user >> operations. > > > There are better ways to rate-limit than to make all operations slow; > we don't need to block a thread, we need to react on the reply from > the backup owners. Agreed. I think Dan mentioned it as a side effect. > You still have an inherent rate-limit in the outgoing packet queues: > if these fill up, then and only then it's nice to introduce some back > pressure. > > >> The problem appears when the originator is not the primary owner, and the >> thread blocking for backup ACKs is from the remote-executor pool (or OOB, >> when the remote-executor pool is exhausted). > > Not following. I guess this is out of scope now that I clarified the > proposed solution is only to be applied between primary and backups? > > >>> >>> It's also conceptually linked to: >>> - https://issues.jboss.org/browse/ISPN-1599 >>> As you need to separate the locks of entries from the effective user >>> facing lock, at least to implement transactions on top of this model. >> >> >> I think we fixed ISPN-1599 when we changed passivation to use >> DataContainer.compute(). WDYT Pedro, is there anything else you'd like to do >> in the scope of ISPN-1599? >> >>> >>> I expect this to improve performance in a very significant way, but >>> it's getting embarrassing that it's still not done; at the next face >>> to face meeting we should also reserve some time for retrospective >>> sessions. >> >> >> Implementing the state machine-based interceptor stack may give us a >> performance boost, but I'm much more certain that it's a very complex, high >> risk task... and we don't have a stable test suite yet :) > > Cleaning up and removing some complexity such as > TooManyExecutorsException might help to get it stable, and keep it > there :) > BTW it was quite stable for me until you changed the JGroups UDP > default configuration. 
> > Sanne -- Bela Ban, JGroups lead (http://www.jgroups.org) From dan.berindei at gmail.com Wed Jul 30 07:59:33 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 30 Jul 2014 14:59:33 +0300 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: <53D8B96C.5030801@redhat.com> References: <53D7A7F6.1010100@redhat.com> <53D8B96C.5030801@redhat.com> Message-ID: On Wed, Jul 30, 2014 at 12:22 PM, Radim Vansa wrote: > > Investigation: >> ------------ >> When I looked at UNICAST3, I saw a lot of missing messages on the >> receive side and unacked messages on the send side. This caused me to >> look into the (mainly OOB) thread pools and - voila - maxed out! >> >> I learned from Pedro that the Infinispan internal thread pool (with a >> default of 32 threads) can be configured, so I increased it to 300 and >> increased the OOB pools as well. >> >> This mitigated the problem somewhat, but when I increased the requester >> threads to 100, I had the same problem again. Apparently, the Infinispan >> internal thread pool uses a rejection policy of "run" and thus uses the >> JGroups (OOB) thread when exhausted. >> > > We can't use another rejection policy in the remote executor because the > message won't be re-delivered by JGroups, and we can't use a queue either. > > > Can't we just send a "Node is busy" response and cancel the operation? (at > least in cases where this is possible - we can't do that safely for > CommitCommand, but usually it could be doable, right?) And what's the > problem with queues, besides that these can grow out of memory? > No commit commands here, the cache is not transactional :) If the remote thread pool gets full on a backup node, there is no way to safely cancel the operation - other backup owners may have already applied the write. And even with numOwners=2, there are multiple backup owners during state transfer. We do throw an OutdatedTopologyException on the backups and retry the operation when the topology changes, we could do something similar when the remote executor thread pool is full. But 1) we have trouble preserving consistency when we retry, so we'd rather do it only when we really have to, and 2) repeated retries can be costly, as the primary needs to re-acquire the lock. The problem with queues is that commands are executed in the order they are in the queue. If a node has a remote executor thread pool of 100 threads and receives a prepare(tx1, put(k, v1)) command, then 1000 prepare(tx_i, put(k, v_i)) commands, and finally a commit(tx1) command, the commit(tx1) command will block until all but 99 of the prepare(tx_i, put(k, v_i)) commands have timed out. I have some thoughts on improving that independently of Pedro's work on locking [1], and I've just written that up as ISPN-4585 [2] [1] https://issues.jboss.org/browse/ISPN-2849 [2] https://issues.jboss.org/browse/ISPN-4585 > > > Radim > > -- > Radim Vansa > JBoss DataGrid QA > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140730/406b4d10/attachment-0001.html From rvansa at redhat.com Wed Jul 30 08:47:15 2014 From: rvansa at redhat.com (Radim Vansa) Date: Wed, 30 Jul 2014 14:47:15 +0200 Subject: [infinispan-dev] DIST-SYNC, put(), a problem and a solution In-Reply-To: References: <53D7A7F6.1010100@redhat.com> <53D8B96C.5030801@redhat.com> Message-ID: <53D8E953.4030608@redhat.com> On 07/30/2014 01:59 PM, Dan Berindei wrote: > > > > On Wed, Jul 30, 2014 at 12:22 PM, Radim Vansa > wrote: > > >> Investigation: >> ------------ >> When I looked at UNICAST3, I saw a lot of missing messages on the >> receive side and unacked messages on the send side. This >> caused me to >> look into the (mainly OOB) thread pools and - voila - maxed out! >> >> I learned from Pedro that the Infinispan internal thread pool >> (with a >> default of 32 threads) can be configured, so I increased it >> to 300 and >> increased the OOB pools as well. >> >> This mitigated the problem somewhat, but when I increased the >> requester >> threads to 100, I had the same problem again. Apparently, the >> Infinispan >> internal thread pool uses a rejection policy of "run" and >> thus uses the >> JGroups (OOB) thread when exhausted. >> >> >> We can't use another rejection policy in the remote executor >> because the message won't be re-delivered by JGroups, and we >> can't use a queue either. > > Can't we just send a "Node is busy" response and cancel the > operation? (at least in cases where this is possible - we can't do > that safely for CommitCommand, but usually it could be doable, > right?) And what's the problem with queues, besides that these can > grow out of memory? > > > No commit commands here, the cache is not transactional :) Sure, but any change to OOB -> remote thread pool would likely affect both non-tx and tx. > > If the remote thread pool gets full on a backup node, there is no way > to safely cancel the operation - other backup owners may have already > applied the write. And even with numOwners=2, there are multiple > backup owners during state transfer. I was thinking about delaying the write until the backup responds, but you're right, with 2 or more backups the situation is not that easy. > > We do throw an OutdatedTopologyException on the backups and retry the > operation when the topology changes, we could do something similar > when the remote executor thread pool is full. But 1) we have trouble > preserving consistency when we retry, so we'd rather do it only when > we really have to, and 2) repeated retries can be costly, as the > primary needs to re-acquire the lock. > > The problem with queues is that commands are executed in the order > they are in the queue. If a node has a remote executor thread pool of > 100 threads and receives a prepare(tx1, put(k, v1)) command, then 1000 > prepare(tx_i, put(k, v_i)) commands, and finally a commit(tx1) > command, the commit(tx1) command will block until all but 99 of > the prepare(tx_i, put(k, v_i)) commands have timed out. Makes sense > > I have some thoughts on improving that independently of Pedro's work > on locking [1], and I've just written that up as ISPN-4585 [2] > > [1] https://issues.jboss.org/browse/ISPN-2849 > [2] https://issues.jboss.org/browse/ISPN-4585 > ISPN-2849 sounds a lot like the state machine-based interceptor stack, I am looking forward to that! (although it's the music of the far future - ISPN 9, 10?) Thanks for those answers, Dan.
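To spell out what the "run" rejection policy means in practice, here is a minimal plain-Java sketch (invented names; this is not the actual Infinispan or JGroups wiring): java.util.concurrent's CallerRunsPolicy executes a rejected task on the calling thread, so once the internal pool is saturated the command runs on whichever thread tried to hand it off, which in this scenario is the JGroups OOB thread that delivered the message.

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RemoteCommandExecutor {
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            32, 32,                                      // core and max size, like the default internal pool
            60, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>(),            // no queueing of commands
            new ThreadPoolExecutor.CallerRunsPolicy());  // the "run" rejection policy

    // invoked from the JGroups (OOB) thread that delivered the command
    void deliver(Runnable command) {
        // while a worker is free the command runs asynchronously; once all 32
        // workers are busy, CallerRunsPolicy runs the command right here and
        // ties up the delivering OOB thread until the command completes
        pool.execute(command);
    }
}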
I should have realized most of that myself, but I don't have the capacity to hold all the wisdom about NBST algorithms online in my brain :) I hope some day I can catch a student looking for a diploma thesis willing to model at least the basic Infinispan algorithms and formally verify that it's (in)correct ;-). Radim > > > Radim > > -- > Radim Vansa > JBoss DataGrid QA > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140730/9046bf28/attachment.html From mmarkus at redhat.com Wed Jul 30 14:17:41 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 30 Jul 2014 19:17:41 +0100 Subject: [infinispan-dev] On ParserRegistry and classloaders In-Reply-To: References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com> <53831239.30601@redhat.com> Message-ID: <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com> Ion, Martin - what are your thoughts? On Jul 29, 2014, at 16:34, Sanne Grinovero wrote: > All, > in Search we wrap the Parser in a decorator which works around the > classloader limitation. > I still think you should fix this, it doesn't matter how/why it was changed. > > Sanne > > On 26 May 2014 11:06, Ion Savin wrote: >> Hi Sanne, Galder, >> >> On 05/23/2014 07:08 PM, Sanne Grinovero wrote: >>> On 23 May 2014 08:03, Galder Zamarreño wrote: >>>>> Hey Sanne, >>>>> >>>>> I've looked at ParserRegistry and I'm not sure I see the changes you are referring to? >>>>> >>>>>> From what I've seen, ParserRegistry has taken a class loader in the constructor since the start. >>> Yes, and that was good as we've been using it: it might need >>> directions to be pointed at the right modules to load extension >>> points. >>> >>> My problem is not that the constructor takes a ClassLoader, but that >>> other options have been removed; essentially in my scenario the module >>> containing the extension points does not contain the configuration >>> file I want it to load, and the actual classLoader I want the >>> CacheManager to use is yet a different one. As explained below, >>> assembling a single "catch all" ClassLoader to delegate to all doesn't >>> work as some of these actually need to be strictly isolated to prevent >>> ambiguities. >>> >>>>> I suspect you might be referring to classloader-related changes as a result of OSGI integration? >>> I didn't check but that sounds like a reasonable estimate. >> >> I had a look at the OSGi-related changes done for this class and they >> don't alter the class interface in any way. The implementation changes >> related to FileLookup seem to maintain the same behavior for non-OSGi >> contexts also.
>> >> Regards, >> Ion Savin >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From isavin at redhat.com Wed Jul 30 15:21:24 2014 From: isavin at redhat.com (Ion Savin) Date: Wed, 30 Jul 2014 22:21:24 +0300 Subject: [infinispan-dev] On ParserRegistry and classloaders In-Reply-To: <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com> References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com> <53831239.30601@redhat.com> <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com> Message-ID: <53D945B4.7070508@redhat.com> Hi Sanne, I don't see any changes in the ParserRegistry which would have removed the behavior you describe (at least looking at the OSGi changes). Can you please point me to some code which used to work in the past? I've found two classes which have some reference to Hibernate in comments and the factory was removed as part of the OSGi changes. Are these perhaps the changes which you are missing? https://github.com/infinispan/infinispan/blob/6.0.x/core/src/main/java/org/infinispan/util/FileLookup.java https://github.com/infinispan/infinispan/blob/6.0.x/core/src/main/java/org/infinispan/util/FileLookupFactory.java -- Ion Savin On 07/30/2014 09:17 PM, Mircea Markus wrote: > Ion, Martin - what are your thoughts? > > On Jul 29, 2014, at 16:34, Sanne Grinovero wrote: > >> All, >> in Search we wrap the Parser in a decorator which works around the >> classloader limitation. >> I still think you should fix this, it doesn't matter how/why it was changed. >> >> Sanne >> >> On 26 May 2014 11:06, Ion Savin wrote: >>> Hi Sanne, Galder, >>> >>> On 05/23/2014 07:08 PM, Sanne Grinovero wrote: >>>> On 23 May 2014 08:03, Galder Zamarreño wrote: >>>>>> Hey Sanne, >>>>>> >>>>>> I've looked at ParserRegistry and I'm not sure I see the changes you are referring to? >>>>>> >>>>>>> From what I've seen, ParserRegistry has taken a class loader in the constructor since the start. >>>> Yes, and that was good as we've been using it: it might need >>>> directions to be pointed at the right modules to load extension >>>> points. >>>> >>>> My problem is not that the constructor takes a ClassLoader, but that >>>> other options have been removed; essentially in my scenario the module >>>> containing the extension points does not contain the configuration >>>> file I want it to load, and the actual classLoader I want the >>>> CacheManager to use is yet a different one. As explained below, >>>> assembling a single "catch all" ClassLoader to delegate to all doesn't >>>> work as some of these actually need to be strictly isolated to prevent >>>> ambiguities. >>>> >>>>>> I suspect you might be referring to classloader-related changes as a result of OSGI integration? >>>> I didn't check but that sounds like a reasonable estimate. >>> >>> I had a look at the OSGi-related changes done for this class and they >>> don't alter the class interface in any way. The implementation changes >>> related to FileLookup seem to maintain the same behavior for non-OSGi >>> contexts also.
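As an illustration of the decorator idea mentioned in this thread, the workaround can be as small as switching the thread context classloader around the parse call. This is a hypothetical sketch only, not the real ParserRegistry API nor the actual Hibernate Search code; the Parser interface below is invented for the example.

interface Parser<T> {
    T parse(String resourceName);
}

class ClassLoaderSwitchingParser<T> implements Parser<T> {
    private final Parser<T> delegate;
    private final ClassLoader parseTimeLoader;  // the loader that can see the config file and the extension points

    ClassLoaderSwitchingParser(Parser<T> delegate, ClassLoader parseTimeLoader) {
        this.delegate = delegate;
        this.parseTimeLoader = parseTimeLoader;
    }

    @Override
    public T parse(String resourceName) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(parseTimeLoader);
        try {
            return delegate.parse(resourceName);
        } finally {
            current.setContextClassLoader(previous);  // always restore the caller's loader
        }
    }
}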
>>> >>> Regards, >>> Ion Savin >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > From vblagoje at redhat.com Thu Jul 31 04:20:16 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Thu, 31 Jul 2014 04:20:16 -0400 Subject: [infinispan-dev] Benchmarking Hadoop/Infinispan M/R implementation Message-ID: <53D9FC40.9080802@redhat.com> Pedro/Gustavo, How do you plan to benchmark our Hadoop implementation? It seems the TeraSort benchmark suite is an interesting option. Maybe not using a 1 TB data set right away, but eventually, why not? Especially now that we can easily run a 500-node cluster on GCE. I would love to see if we can, when you guys start benchmarking our Hadoop impl, give TeraSort a run on a regular Map/Reduce implementation as well. What do you think? Vladimir [1] http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/#terasort-benchmark-suite -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140731/d2e818a7/attachment-0001.html From gustavonalle at gmail.com Thu Jul 31 06:03:46 2014 From: gustavonalle at gmail.com (Gustavo Fernandes) Date: Thu, 31 Jul 2014 11:03:46 +0100 Subject: [infinispan-dev] Benchmarking Hadoop/Infinispan M/R implementation In-Reply-To: <53D9FC40.9080802@redhat.com> References: <53D9FC40.9080802@redhat.com> Message-ID: Yes, that would be interesting! Also, what do you think of running this benchmark on top of Infinispan's native map-reduce implementation, to compare with Hadoop/HDFS and Hadoop/Infinispan? Gustavo On Thu, Jul 31, 2014 at 9:20 AM, Vladimir Blagojevic wrote: > Pedro/Gustavo, > > How do you plan to benchmark our Hadoop implementation? It seems the TeraSort > benchmark suite is an interesting option. Maybe not using a 1 TB data set > right away, but eventually, why not? Especially now that we can easily run > a 500-node cluster on GCE. I would love to see if we can, when you guys > start benchmarking our Hadoop impl, give TeraSort a run on a regular > Map/Reduce implementation as well. > > What do you think? > > Vladimir > > [1] > http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/#terasort-benchmark-suite > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140731/5fe0e229/attachment.html From vblagoje at redhat.com Thu Jul 31 08:19:05 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Thu, 31 Jul 2014 08:19:05 -0400 Subject: [infinispan-dev] Benchmarking Hadoop/Infinispan M/R implementation In-Reply-To: References: <53D9FC40.9080802@redhat.com> Message-ID: <53DA3439.3010506@redhat.com> Yes, that is what I had in mind, Gustavo. I'll see how I can adapt these Mapper/Reducer classes for the sorting benchmark. We can run our native, Hadoop/HDFS and Hadoop/Infinispan variants.
That way we could get some really interesting results. Vladimir On 2014-07-31, 6:03 AM, Gustavo Fernandes wrote: > Yes, that would be interesting! > > Also, what do you think of running this benchmark on top of Infinispan's > native map-reduce implementation, to compare with Hadoop/HDFS and > Hadoop/Infinispan? > > > Gustavo > > > On Thu, Jul 31, 2014 at 9:20 AM, Vladimir Blagojevic > > wrote: > > Pedro/Gustavo, > > How do you plan to benchmark our Hadoop implementation? It seems > the TeraSort benchmark suite is an interesting option. Maybe not using > a 1 TB data set right away, but eventually, why not? Especially now > that we can easily run a 500-node cluster on GCE. I would love to > see if we can, when you guys start benchmarking our Hadoop impl, > give TeraSort a run on a regular Map/Reduce implementation as well. > > What do you think? > > Vladimir > > [1] > http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/#terasort-benchmark-suite > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140731/23b364d1/attachment.html
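For reference when planning the native-M/R variant of the benchmark, this is roughly the shape a job takes on Infinispan's own map/reduce API (org.infinispan.distexec.mapreduce as it stood around Infinispan 6/7; signatures should be double-checked against the version actually used). It is not a TeraSort port, only a skeleton: the mapper keys each record by a crude partition prefix and the reducer counts records per partition.

import java.util.Iterator;
import java.util.Map;

import org.infinispan.Cache;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.MapReduceTask;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

public class PartitionCountJob {

    public static class PrefixMapper implements Mapper<String, String, String, Long> {
        @Override
        public void map(String key, String value, Collector<String, Long> collector) {
            // use the first character of the key as a crude partition id
            collector.emit(key.isEmpty() ? "" : key.substring(0, 1), 1L);
        }
    }

    public static class CountReducer implements Reducer<String, Long> {
        @Override
        public Long reduce(String reducedKey, Iterator<Long> values) {
            long total = 0;
            while (values.hasNext()) {
                total += values.next();
            }
            return total;
        }
    }

    public static Map<String, Long> run(Cache<String, String> cache) {
        return new MapReduceTask<String, String, String, Long>(cache)
                .mappedWith(new PrefixMapper())
                .reducedWith(new CountReducer())
                .execute();
    }
}

An actual TeraSort-style comparison would additionally need range partitioning and an ordered merge of the reducer outputs, which this sketch deliberately leaves out.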