From ttarrant at redhat.com  Thu Oct  2 09:21:07 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Thu, 02 Oct 2014 15:21:07 +0200
Subject: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan
In-Reply-To: <851849146.38574524.1412064092251.JavaMail.zimbra@redhat.com>
References: <54257C74.2050600@redhat.com> <1898996727.37561681.1411744742097.JavaMail.zimbra@redhat.com> <54297D4F.9060009@jboss.com> <133530692.31414088.1412009840434.JavaMail.zimbra@redhat.com> <542A5583.4030908@redhat.com> <851849146.38574524.1412064092251.JavaMail.zimbra@redhat.com>
Message-ID: <542D5143.3070006@redhat.com>

I have successfully created a "hybrid" cluster between an application
using Infinispan in embedded mode and an Infinispan server, by doing the
following on the embedded side:

- use a JGroups Channel wrapped in a MuxHandler
- use a custom class resolver which simulates (or rather... hacks) the
behaviour of the ModularClassResolver when not using modules

You can find the code at my personal GitHub repo:

https://github.com/tristantarrant/infinispan-playground/tree/master/src/main/java/net/dataforte/infinispan/hybrid
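In spirit, the resolver part looks like this - a hand-written sketch, not
the code from the repo (the callbacks are the JBoss Marshalling
ClassResolver SPI; the module-name handling is simplified, and the wire
format has to match ModularClassResolver exactly, so refer to the repo
above for the working version):

    import java.io.IOException;
    import org.jboss.marshalling.AbstractClassResolver;
    import org.jboss.marshalling.Marshaller;
    import org.jboss.marshalling.Unmarshaller;

    public class HybridClassResolver extends AbstractClassResolver {
        @Override
        protected ClassLoader getClassLoader() {
            return Thread.currentThread().getContextClassLoader();
        }

        @Override
        public void annotateClass(Marshaller marshaller, Class<?> clazz) throws IOException {
            // ModularClassResolver prefixes each class name with the identifier
            // of the module that loaded it; without modules we fake the one the
            // server expects (simplified - the real code maps per artifact).
            marshaller.writeObject("org.infinispan:main");
        }

        @Override
        public Class<?> resolveClass(Unmarshaller unmarshaller, String name, long serialVersionUID)
                throws IOException, ClassNotFoundException {
            unmarshaller.readObject(); // skip the module identifier written by the peer
            return super.resolveClass(unmarshaller, name, serialVersionUID);
        }
    }

It gets plugged in via builder.serialization().classResolver(...), mirroring
what the server side does (see the EmbeddedCacheManagerConfigurationService
snippet quoted below).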
Suggestions and improvements are welcome.

Tristan

On 30/09/14 10:01, Stelios Koussouris wrote:
> Hi,
>
> To give a bit of context on this. We are doing a POC where the customer
> wishes to utilize JDG to speed up their application. We need (due to some
> customer requirements) to cluster EMBEDDED JDG (Infinispan library mode)
> with REMOTE JDG (Infinispan Server) nodes. The Infinispan jars should be
> the same, as they are only libraries and are on the same version. However,
> during "clustering" of the caches we started seeing errors which looked
> like they were due to the clustering info differing between the two types
> of cache instantiation (embedded vs. server).
>
> The result was a suggestion to create our own MuxChannel (I don't know if
> we have any other alternatives at this stage to cluster embedded with
> server Infinispan caches), but at the moment we are facing
> https://gist.github.com/skoussou/5edc5689446b67f85ae8
>
> Regards,
>
> Stylianos Kousouris
> Red Hat Middleware Consultant
>
> ----- Original Message -----
> From: "Tristan Tarrant"
> To: "infinispan -Dev List", "Kurt T Stam"
> Cc: "Stelios Koussouris", "Richard Achmatowicz"
> Sent: Tuesday, 30 September, 2014 8:02:27 AM
> Subject: Re: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan
>
> I don't know what Kurt is doing, but Stelios is attempting to cluster an
> application using embedded Infinispan deployed within WF together with
> an Infinispan Server instance.
> The application is managing its own caches, and therefore it is not
> interacting with the underlying Infinispan and JGroups subsystems in WF.
> Infinispan Server uses its Infinispan and JGroups subsystems (which are
> forked from WF's) and therefore is using MuxChannels.
>
> I told Stelios to use a MuxChannel-wrapped Channel in his application
> and it solved part of the issue (he was initially importing the one
> included in WF's jgroups subsystem, but now he's using his local copy),
> but now he has run into further problems, and I believe what Paul &
> Dennis have written might be correct.
>
> The code that configures this is in
> EmbeddedCacheManagerConfigurationService:
>
> GlobalConfigurationBuilder builder = new GlobalConfigurationBuilder();
> ModuleLoader moduleLoader = this.dependencies.getModuleLoader();
> builder.serialization().classResolver(ModularClassResolver.getInstance(moduleLoader));
>
> I don't know how you'd get a ModuleLoader from within a WF deployment,
> but I'm sure it can be done.
>
> Tristan
>
> On 29/09/14 18:57, Paul Ferraro wrote:
>> You should not need to use a MuxChannel. This would only be necessary
>> if there are other EAP services sharing the channel. Using a MuxChannel
>> allows your standalone Infinispan instance to filter these irrelevant
>> messages. However, in JDG, there should be no services other than
>> Infinispan using the channel - hence the MuxChannel stuff is unnecessary.
>>
>> I think Dennis' earlier response was spot on. EAP/JDG configures its
>> cache managers using a ModularClassResolver (which includes a module
>> name along with the class name when marshalling). Your standalone
>> Infinispan instances do not use this and therefore cannot make sense of
>> the message body.
>>
>> Paul
>>
>> ----- Original Message -----
>>> From: "Kurt T Stam"
>>> To: "Stelios Koussouris", "Radoslav Husar"
>>> Cc: "Galder Zamarreño", "Paul Ferraro", "Richard Achmatowicz",
>>> "infinispan -Dev List"
>>> Sent: Monday, September 29, 2014 11:39:59 AM
>>> Subject: Re: Clustering standalone Infinispan w/ WF running Infinispan
>>>
>>> Thanks for following up Stelios, I think Galder is traveling the next 2
>>> weeks.
>>>
>>> So - do we need fixes on both ends then so that the boot order does not
>>> matter? In which project(s) would we apply these changes? Or can they
>>> be applied in the end-user's code?
>>>
>>> Thx,
>>>
>>> --Kurt
>>>
>>> On 9/26/14, 11:19 AM, Stelios Koussouris wrote:
>>>> Hi,
>>>>
>>>> Rado: It is both ways, i.e. if I start the JDG Server first, I get the
>>>> issue on the library-mode side when I start that one. If I reverse the
>>>> startup order, I get it on the JDG Server side.
>>>>
>>>> Question:
>>>> ----------------------------------------------------------------------
>>>> ...IMO the channel needs to be wrapped as
>>>> org.jboss.as.clustering.jgroups.MuxChannel before passing to infinispan.
>>>> ...
>>>> ----------------------------------------------------------------------
>>>> For now this is not being done. If I wanted to do it manually on the
>>>> library side, where I can create the protocol programmatically, are we
>>>> talking about something like this?
>>>>
>>>> ProtocolStackConfigurator configurator =
>>>>     ConfiguratorFactory.getStackConfigurator("jgroups-udp.xml");
>>>> MuxChannel channel = new MuxChannel(configurator);
>>>> org.infinispan.remoting.transport.Transport transport = new
>>>>     org.infinispan.remoting.transport.jgroups.JGroupsTransport(channel);
>>>>
>>>> ....
>>>> then replace the below
>>>>
>>>> new GlobalConfigurationBuilder().clusteredDefault().globalJmxStatistics()
>>>>     .cacheManagerName("RDSCacheManager").allowDuplicateDomains(true).enable()
>>>>     .transport().clusterName("UDM-CLUSTER")
>>>>     .addProperty("configurationFile", "jgroups-udp.xml")
>>>>
>>>> WITH
>>>>
>>>> new GlobalConfigurationBuilder().clusteredDefault().globalJmxStatistics()
>>>>     .cacheManagerName("RDSCacheManager").allowDuplicateDomains(true).enable()
>>>>     .transport(Transport).clusterName("UDM-CLUSTER")
>>>>
>>>> Btw, someone mentioned that if I follow this method I need to know the
>>>> assigned mux ids, but it is not quite clear what that means with regard
>>>> to the JGroupsTransport configuration.
>>>>
>>>> Thanks,
>>>>
>>>> Stylianos Kousouris
>>>> Red Hat Middleware Consultant
>>>>
>>>> ----- Original Message -----
>>>> From: "Radoslav Husar"
>>>> To: "Galder Zamarreño", "Paul Ferraro"
>>>> Cc: "Richard Achmatowicz", "infinispan -Dev List",
>>>> "Stelios Koussouris", "Kurt T Stam"
>>>> Sent: Friday, 26 September, 2014 3:47:16 PM
>>>> Subject: Re: Clustering standalone Infinispan w/ WF running Infinispan
>>>>
>>>> From what Stelios is telling me, the question is a little bit the other
>>>> way round: he is using library-mode Infinispan and JGroups in EAP and
>>>> connecting to JDG. So the question is what JDG is doing with the stack,
>>>> not AS/WF, as its infinispan/jgroups subsystems are not used.
>>>>
>>>> Unfortunately I don't have access to the JDG repo, so I don't know what
>>>> changes have been made there, but if you are using the same jgroups
>>>> logic, IMO the channel needs to be wrapped as
>>>> org.jboss.as.clustering.jgroups.MuxChannel before passing to infinispan.
>>>>
>>>> Rado
>>>>
>>>> On 26/09/14 15:03, Galder Zamarreño wrote:
>>>>> Hey Paul,
>>>>>
>>>>> In the last couple of days, a couple of people have encountered the
>>>>> exception in [1] when trying to cluster a standalone Infinispan app,
>>>>> with its own JGroups configuration file, with an AS/WF-running
>>>>> Infinispan cache.
>>>>>
>>>>> From my POV, 3 possible causes:
>>>>>
>>>>> 1. Dependency mismatches between AS/WF and the standalone app. Having
>>>>> done some quick study of Kurt's case, apart from micro version changes,
>>>>> all looks good.
>>>>>
>>>>> 2. Mismatch in the Infinispan and/or JGroups configuration file.
>>>>>
>>>>> 3. AS/WF puts something on the clustered wire that standalone
>>>>> Infinispan does not expect. Are you still doing multiplexing? Could
>>>>> you be adding extra info to the wire?
>>>>>
>>>>> With this email, I'm trying to get some clarification from you on
>>>>> whether the issue could be due to the 3rd option. If it's either of
>>>>> the first two, it's a matter of digging and finding the difference,
>>>>> but if it's the 3rd one, it's more problematic.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> [1] https://gist.github.com/skoussou/92f062f2d0bd17168e01
>>>>> --
>>>>> Galder Zamarreño
>>>>> galder at redhat.com
>>>>> twitter.com/galderz

From paul.ferraro at redhat.com  Thu Oct  2 10:06:01 2014
From: paul.ferraro at redhat.com (Paul Ferraro)
Date: Thu, 2 Oct 2014 10:06:01 -0400 (EDT)
Subject: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan
In-Reply-To: <542D5143.3070006@redhat.com>
References: <54257C74.2050600@redhat.com> <1898996727.37561681.1411744742097.JavaMail.zimbra@redhat.com> <54297D4F.9060009@jboss.com> <133530692.31414088.1412009840434.JavaMail.zimbra@redhat.com> <542A5583.4030908@redhat.com> <851849146.38574524.1412064092251.JavaMail.zimbra@redhat.com> <542D5143.3070006@redhat.com>
Message-ID: <905338673.1908952.1412258761247.JavaMail.zimbra@redhat.com>

The only other obvious alternative of which I can think is to actually
start the application which uses embedded Infinispan using jboss-modules.
That way you don't need to hack the behavior of ModularClassResolver.
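In practice that means booting through the jboss-modules launcher rather
than a flat classpath - something like this, where the module name and
path are invented for illustration:

    java -jar jboss-modules.jar -mp /path/to/modules com.example.embedded-app

with a module.xml for com.example.embedded-app declaring a dependency on
the org.infinispan module, so classes are loaded (and identified on the
wire) by module.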
----- Original Message -----
> From: "Tristan Tarrant"
> Sent: Thursday, October 2, 2014 9:21:07 AM
> Subject: Re: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan
>
> I have successfully created a "hybrid" cluster between an application
> using Infinispan in embedded mode and an Infinispan server, by doing the
> following on the embedded side:
>
> - use a JGroups Channel wrapped in a MuxHandler
> - use a custom class resolver which simulates (or rather... hacks) the
> behaviour of the ModularClassResolver when not using modules
> [...]
From ttarrant at redhat.com  Thu Oct  2 10:38:21 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Thu, 02 Oct 2014 16:38:21 +0200
Subject: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan
In-Reply-To: <905338673.1908952.1412258761247.JavaMail.zimbra@redhat.com>
References: <54257C74.2050600@redhat.com> <1898996727.37561681.1411744742097.JavaMail.zimbra@redhat.com> <54297D4F.9060009@jboss.com> <133530692.31414088.1412009840434.JavaMail.zimbra@redhat.com> <542A5583.4030908@redhat.com> <851849146.38574524.1412064092251.JavaMail.zimbra@redhat.com> <542D5143.3070006@redhat.com> <905338673.1908952.1412258761247.JavaMail.zimbra@redhat.com>
Message-ID: <542D635D.6030609@redhat.com>

But then the module identifier wouldn't make sense: if you are embedding
infinispan-core.jar, it would definitely not send "org.infinispan:main"
as module:slot, which is what the server needs instead.

Tristan

On 02/10/14 16:06, Paul Ferraro wrote:
> The only other obvious alternative of which I can think is to actually
> start the application which uses embedded Infinispan using jboss-modules.
> That way you don't need to hack the behavior of ModularClassResolver.
> [...]
From paul.ferraro at redhat.com  Thu Oct  2 13:46:14 2014
From: paul.ferraro at redhat.com (Paul Ferraro)
Date: Thu, 2 Oct 2014 13:46:14 -0400 (EDT)
Subject: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan
In-Reply-To: <542D635D.6030609@redhat.com>
References: <54297D4F.9060009@jboss.com> <133530692.31414088.1412009840434.JavaMail.zimbra@redhat.com> <542A5583.4030908@redhat.com> <851849146.38574524.1412064092251.JavaMail.zimbra@redhat.com> <542D5143.3070006@redhat.com> <905338673.1908952.1412258761247.JavaMail.zimbra@redhat.com> <542D635D.6030609@redhat.com>
Message-ID: <1607379301.2119649.1412271974105.JavaMail.zimbra@redhat.com>

infinispan-core and its dependencies would need to be bundled as modules
using the same module descriptors as the server.
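For illustration, such a descriptor has roughly this shape (a skeleton
only - the authoritative module.xml is the one shipped in the server's
modules directory, with the complete resource and dependency lists):

    <module xmlns="urn:jboss:module:1.1" name="org.infinispan" slot="main">
        <resources>
            <resource-root path="infinispan-core.jar"/>
        </resources>
        <dependencies>
            <module name="org.infinispan.commons"/>
            <module name="org.jgroups"/>
            <module name="org.jboss.marshalling"/>
        </dependencies>
    </module>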
> > Tristan > > > On 02/10/14 16:06, Paul Ferraro wrote: > > The only other obvious alternative of which I can think is to actually > > start the application which uses embedded Infinispan using jboss-modules. > > That way you don't need to hack the behavior of ModularClassResolver. > > > > ----- Original Message ----- > >> From: "Tristan Tarrant" > >> To: "Stelios Koussouris" > >> Cc: "Kurt T Stam" , "infinispan -Dev List" > >> , "Richard > >> Achmatowicz" > >> Sent: Thursday, October 2, 2014 9:21:07 AM > >> Subject: Re: [infinispan-dev] Clustering standalone Infinispan w/ WF > >> running Infinispan > >> > >> I have successfully created a "hybrid" cluster between an application > >> using Infinispan in embedded mode and an Infinispan server by doing the > >> following on the embedded side: > >> > >> - use a JGroups Channel wrapped in a MuxHandler > >> - use a custom class resolver which simulates (or rather... hacks) the > >> behaviour of the ModularClassResolver when not using modules > >> > >> You can find the code at my personal GitHub repo: > >> > >> https://github.com/tristantarrant/infinispan-playground/tree/master/src/main/java/net/dataforte/infinispan/hybrid > >> > >> suggestions and improvements are welcome. > >> > >> Tristan > >> > >> On 30/09/14 10:01, Stelios Koussouris wrote: > >>> Hi, > >>> > >>> To give a bit of context on this. We are doing a POC where the customer > >>> wishes to utilize JDG to speed up their application. We need (due to some > >>> customer requirements) to cluster > >>> EMBEDDED JDG (infinispan library mode) with REMOTE JDG (Infinispan > >>> Server) > >>> nodes. The infinispan jars should be the same as they are only libraries > >>> and they > >>> are on the same version. However, during "clustering" of the caches we > >>> started seeing errors which looked like there were due to the fact that > >>> the clustering of the caches contained different > >>> info between the 2 types of cache instantiation (embedded vs server). > >>> > >>> The result was to for a suggestion to create our own MuxChannel (I don't > >>> know if we have any other alternatives at this stage to cluster embedded > >>> with server infinispan caches) but at the moment we are facing > >>> https://gist.github.com/skoussou/5edc5689446b67f85ae8 > >>> > >>> Regards, > >>> > >>> Stylianos Kousouris > >>> Red Hat Middleware Consultant > >>> > >>> ----- Original Message ----- > >>> From: "Tristan Tarrant" > >>> To: "infinispan -Dev List" , "Kurt T > >>> Stam" > >>> > >>> Cc: "Stelios Koussouris" , "Richard Achmatowicz" > >>> > >>> Sent: Tuesday, 30 September, 2014 8:02:27 AM > >>> Subject: Re: [infinispan-dev] Clustering standalone Infinispan w/ WF > >>> running Infinispan > >>> > >>> I don't know what Kurt is doing, but Stelios is attempting to cluster an > >>> application using embedded Infinispan deployed within WF together with > >>> an Infinispan Server instance. > >>> The application is managing its own caches, and therefore it is not > >>> interacting with the underlying Infinispan and JGroups subsystems in WF. > >>> Infinispan Server uses its Infinispan and JGroups subsystems (which are > >>> forked from WF's) and therefore are using MuxChannels. 
From rvansa at redhat.com  Fri Oct  3 04:30:10 2014
From: rvansa at redhat.com (Radim Vansa)
Date: Fri, 03 Oct 2014 10:30:10 +0200
Subject: [infinispan-dev] About size()
Message-ID: <542E5E92.7060504@redhat.com>

Hi,

recently we had a discussion about what size() returns, but I've realized
there are more things that users would like to know. My question is
whether you think that they would really appreciate it, or whether it's
just my QA point of view, where I sometimes compute the 'checksums' of a
cache to see if I didn't lose anything.

There are these sizes:
A) number of owned entries
B) number of entries stored locally in memory
C) number of entries stored in each local cache store
D) number of entries stored in each shared cache store
E) total number of entries in the cache

So far, we can get:
B via withFlags(SKIP_CACHE_LOAD).size()
(passivation ? B : 0) + firstNonZero(C, D) via size()
E via distributed iterators / MR
A via data container iteration + distribution manager query, but only
without a cache store
C or D through
getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()

I think that it would match users' expectations if size() returned E, and
for the rest we should have special methods on AdvancedCache. That would
of course change the meaning of size(), but I'd say finally to something
that has a firm meaning.
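To make the list concrete, a sketch of the views we can get today (the
Flag usage is public API; ComponentRegistry/PersistenceManager are
internal components, so that part is not a supported way to do it and may
change between versions):

    import org.infinispan.AdvancedCache;
    import org.infinispan.Cache;
    import org.infinispan.context.Flag;
    import org.infinispan.factories.ComponentRegistry;
    import org.infinispan.persistence.manager.PersistenceManager;

    public class SizeViews {
        public static void print(Cache<?, ?> cache) {
            AdvancedCache<?, ?> ac = cache.getAdvancedCache();
            // (B): local in-memory entries, cache stores not consulted
            int inMemory = ac.withFlags(Flag.SKIP_CACHE_LOAD).size();
            // (C)/(D): ask the stores themselves - internal API!
            ComponentRegistry cr = ac.getComponentRegistry();
            PersistenceManager pm = cr.getLocalComponent(PersistenceManager.class);
            // for each store returned by pm.getStores(...) one can query its size
            System.out.println("in memory: " + inMemory);
        }
    }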
WDYT?

Radim

--
Radim Vansa
JBoss DataGrid QA

From dereed at redhat.com  Fri Oct  3 13:38:50 2014
From: dereed at redhat.com (Dennis Reed)
Date: Fri, 03 Oct 2014 12:38:50 -0500
Subject: [infinispan-dev] About size()
In-Reply-To: <542E5E92.7060504@redhat.com>
References: <542E5E92.7060504@redhat.com>
Message-ID: <542EDF2A.7080807@redhat.com>

Since size() is defined by the ConcurrentMap interface, it already has a
precisely defined meaning. The only "correct" implementation is E.

The current non-correct implementation was just because it's expensive
to calculate correctly. I'm not sure the current impl is really that
useful for anything.

-Dennis

On 10/03/2014 03:30 AM, Radim Vansa wrote:
> Hi,
>
> recently we had a discussion about what size() returns, but I've realized
> there are more things that users would like to know.
> [...]

From radhamohanmaheshwari at gmail.com  Fri Oct  3 15:36:07 2014
From: radhamohanmaheshwari at gmail.com (Radha Mohan Maheshwari)
Date: Sat, 4 Oct 2014 01:06:07 +0530
Subject: [infinispan-dev] Configure named cache in remote infinispan 6.0.2 cluster
In-Reply-To: <54196237.2060906@redhat.com>
References: <54196237.2060906@redhat.com>
Message-ID:

Hi all,

I am getting an exception while enabling JMX in Infinispan 6 server:

WARNING: Failed to load the specified log manager class org.jboss.logmanager.LogManager
Oct 04, 2014 1:01:36 AM org.jboss.msc.service.ServiceLogger_$logger greeting
INFO: JBoss MSC version 1.0.4.GA
Oct 04, 2014 1:01:36 AM org.jboss.as.server.ApplicationServerService start
INFO: JBAS015899: JBoss Infinispan Server 6.0.2.Final (AS 7.2.0.Final) starting
Oct 04, 2014 1:01:38 AM org.jboss.as.controller.AbstractOperationContext executeStep
ERROR: JBAS014612: Operation ("parallel-extension-add") failed - address: ([])
java.lang.RuntimeException: JBAS014670: Failed initializing module org.jboss.as.logging
    at org.jboss.as.controller.extension.ParallelExtensionAddHandler$1.execute(ParallelExtensionAddHandler.java:99)
    at org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext.java:440)
    at org.jboss.as.controller.AbstractOperationContext.doCompleteStep(AbstractOperationContext.java:322)
    at org.jboss.as.controller.AbstractOperationContext.completeStepInternal(AbstractOperationContext.java:229)
    at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:224)
    at org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:172)
    at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:225)
    at org.jboss.as.server.ServerService.boot(ServerService.java:333)
    at org.jboss.as.server.ServerService.boot(ServerService.java:308)
    at org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:188)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: JBAS011592: The logging subsystem requires the log manager to be org.jboss.logmanager.LogManager. The subsystem has not be initialized and cannot be used. To use JBoss Log Manager you must add the system property "java.util.logging.manager" and set it to "org.jboss.logmanager.LogManager"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.jboss.as.controller.extension.ParallelExtensionAddHandler$1.execute(ParallelExtensionAddHandler.java:91)
    ... 10 more
Caused by: java.lang.IllegalStateException: JBAS011592: The logging subsystem requires the log manager to be org.jboss.logmanager.LogManager. The subsystem has not be initialized and cannot be used. To use JBoss Log Manager you must add the system property "java.util.logging.manager" and set it to "org.jboss.logmanager.LogManager"
    at org.jboss.as.logging.LoggingExtension.initialize(LoggingExtension.java:103)
    at org.jboss.as.controller.extension.ExtensionAddHandler.initializeExtension(ExtensionAddHandler.java:97)
    at org.jboss.as.controller.extension.ParallelExtensionAddHandler$ExtensionInitializeTask.call(ParallelExtensionAddHandler.java:127)
    at org.jboss.as.controller.extension.ParallelExtensionAddHandler$ExtensionInitializeTask.call(ParallelExtensionAddHandler.java:113)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    at org.jboss.threads.JBossThread.run(JBossThread.java:122)

Radha Mohan Maheshwari
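The IllegalStateException above spells out the fix: the JVM has to boot
with JBoss Log Manager selected before any java.util.logging code runs.
Assuming the stock launch scripts, that means passing this property to
the JVM (e.g. via JAVA_OPTS in bin/standalone.conf):

    -Djava.util.logging.manager=org.jboss.logmanager.LogManager

The initial WARNING ("Failed to load the specified log manager class")
suggests the property was set but the jboss-logmanager classes were not
visible to the system class loader, so the jar also has to be loadable
at that point.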
From sanne at infinispan.org  Mon Oct  6 06:57:36 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Mon, 6 Oct 2014 11:57:36 +0100
Subject: [infinispan-dev] About size()
In-Reply-To: <542EDF2A.7080807@redhat.com>
References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com>
Message-ID:

On 3 October 2014 18:38, Dennis Reed wrote:
> Since size() is defined by the ConcurrentMap interface, it already has a
> precisely defined meaning. The only "correct" implementation is E.

+1

> The current non-correct implementation was just because it's expensive
> to calculate correctly. I'm not sure the current impl is really that
> useful for anything.

+1

And not just size() but many others from ConcurrentMap. The question is
if we should drop the interface and all the methods which aren't
efficiently implementable, or fix all those methods.

In the past I loved that I could inject "Infinispan superpowers" into an
application making extensive use of Map and ConcurrentMap without
changes, but that has been deceiving and required great care, such as
verifying that these features would not be used anywhere in the code.
I ended up wrapping the Cache implementation in a custom adapter which
would also implement ConcurrentMap but would throw a RuntimeException if
any of the "unallowed" methods was called; at least I would detect
violations safely.
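The shape of it was roughly this - a from-memory sketch, not the original
code, with only a few methods shown (the real adapter implemented the
whole ConcurrentMap interface):

    import org.infinispan.Cache;

    // Delegates the well-defined operations, fails fast on the ones that
    // would silently return local-only or partial answers.
    public class GuardedCacheMap<K, V> {
        private final Cache<K, V> cache;

        public GuardedCacheMap(Cache<K, V> cache) {
            this.cache = cache;
        }

        public V get(Object key) {
            return cache.get(key); // safe: well-defined in all modes
        }

        public V put(K key, V value) {
            return cache.put(key, value); // safe
        }

        public int size() {
            // a local-only answer would lie in DIST mode
            throw new UnsupportedOperationException("size() is not allowed");
        }

        public boolean containsValue(Object value) {
            throw new UnsupportedOperationException("containsValue() is not allowed");
        }
    }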
I still think that for the time being - until a better solution is
planned - we should throw exceptions... alas, that's an old conversation
and it was never done.

Sanne

From ttarrant at redhat.com  Mon Oct  6 07:44:40 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Mon, 06 Oct 2014 13:44:40 +0200
Subject: [infinispan-dev] About size()
In-Reply-To:
References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com>
Message-ID: <543280A8.5040109@redhat.com>

I think we should provide correct implementations of size() (and others)
and provide shortcut implementations using our usual Flag API (e.g.
SKIP_REMOTE_LOOKUP).
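From the caller's perspective it would look something like this (a sketch
of the proposed semantics, not current behaviour; the flags shown exist
today, but the exact flag set for size() would still need to be defined):

    // 'cache' is an org.infinispan.Cache<K, V>
    int total = cache.size();   // correct, cluster-wide, expensive
    int local = cache.getAdvancedCache()
                     .withFlags(Flag.SKIP_REMOTE_LOOKUP, Flag.SKIP_CACHE_LOAD)
                     .size();   // explicit cheap shortcut: local memory only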
>>> >>> There are those sizes: >>> A) number of owned entries >>> B) number of entries stored locally in memory >>> C) number of entries stored in each local cache store >>> D) number of entries stored in each shared cache store >>> E) total number of entries in cache >>> >>> So far, we can get >>> B via withFlags(SKIP_CACHE_LOAD).size() >>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>> E via distributed iterators / MR >>> A via data container iteration + distribution manager query, but only >>> without cache store >>> C or D through >>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>> >>> I think that it would go along with users' expectations if size() >>> returned E and for the rest we should have special methods on >>> AdvancedCache. That would of course change the meaning of size(), but >>> I'd say that finally to something that has firm meaning. >>> >>> WDYT? >>> >>> Radim >>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > From sanne at infinispan.org Mon Oct 6 07:57:29 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 6 Oct 2014 12:57:29 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <543280A8.5040109@redhat.com> References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> Message-ID: On 6 October 2014 12:44, Tristan Tarrant wrote: > I think we should provide correct implementations of size() (and others) > and provide shortcut implementations using our usual Flag API (e.g. > SKIP_REMOTE_LOOKUP). Right that would be very nice. Same for CacheStore interaction: all cachestores should be included unless skipped explicitly. Sanne > > Tristan > > On 06/10/14 12:57, Sanne Grinovero wrote: >> On 3 October 2014 18:38, Dennis Reed wrote: >>> Since size() is defined by the ConcurrentMap interface, it already has a >>> precisely defined meaning. The only "correct" implementation is E. >> +1 >> >>> The current non-correct implementation was just because it's expensive >>> to calculate correctly. I'm not sure the current impl is really that >>> useful for anything. >> +1 >> >> And not just size() but many others from ConcurrentMap. >> The question is if we should drop the interface and all the methods >> which aren't efficiently implementable, or fix all those methods. >> >> In the past I loved that I could inject "Infinispan superpowers" into >> an application making extensive use of Map and ConcurrentMap without >> changes, but that has been deceiving and required great care such as >> verifying that these features would not be used anywhere in the code. >> I ended up wrapping the Cache implementation in a custom adapter which >> would also implement ConcurrentMap but would throw a RuntimeException >> if any of the "unallowed" methods was called, at least I would detect >> violations safely. >> >> I still think that for the time being - until a better solution is >> planned - we should throw exceptions.. alas that's an old conversation >> and it was never done. 
>> Sanne >> >>> -Dennis >>> >>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>> Hi, >>>> >>>> recently we had a discussion about what size() returns, but I've >>>> realized there are more things that users would like to know. My >>>> question is whether you think that they would really appreciate it, or >>>> whether it's just my QA point of view where I sometimes compute the >>>> 'checksums' of cache to see if I didn't lost anything. >>>> >>>> There are those sizes: >>>> A) number of owned entries >>>> B) number of entries stored locally in memory >>>> C) number of entries stored in each local cache store >>>> D) number of entries stored in each shared cache store >>>> E) total number of entries in cache >>>> >>>> So far, we can get >>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>> E via distributed iterators / MR >>>> A via data container iteration + distribution manager query, but only >>>> without cache store >>>> C or D through >>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>> >>>> I think that it would go along with users' expectations if size() >>>> returned E and for the rest we should have special methods on >>>> AdvancedCache. That would of course change the meaning of size(), but >>>> I'd say that finally to something that has firm meaning. >>>> >>>> WDYT? >>>> >>>> Radim >>>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Tue Oct 7 03:47:11 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Tue, 07 Oct 2014 09:47:11 +0200 Subject: [infinispan-dev] Weekly Infinispan IRC meeting 2014-10-06 Message-ID: <54339A7F.7080201@redhat.com> Get the minutes from here: http://transcripts.jboss.org/meeting/irc.freenode.org/infinispan/2014/infinispan.2014-10-06-14.02.log.html From ttarrant at redhat.com Tue Oct 7 03:48:30 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Tue, 07 Oct 2014 09:48:30 +0200 Subject: [infinispan-dev] Weekly Infinispan IRC meeting 2014-09-29 Message-ID: <54339ACE.6090805@redhat.com> I forgot to send this last week :) Get the minutes from here: http://transcripts.jboss.org/meeting/irc.freenode.org/infinispan/2014/infinispan.2014-09-29-14.02.log.html Tristan From isavin at redhat.com Tue Oct 7 04:06:30 2014 From: isavin at redhat.com (Ion Savin) Date: Tue, 07 Oct 2014 11:06:30 +0300 Subject: [infinispan-dev] GSoC 2015 Message-ID: <54339F06.2020107@redhat.com> http://google-opensource.blogspot.ro/2014/10/google-summer-of-code-2015-and-google.html http://www.google-melange.com/gsoc/events/google/gsoc2015 From pedro at infinispan.org Tue Oct 7 04:09:17 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Tue, 07 Oct 2014 11:09:17 +0300 Subject: [infinispan-dev] Weekly Infinispan IRC meeting 2014-10-06 In-Reply-To: <54339A7F.7080201@redhat.com> References: <54339A7F.7080201@redhat.com> Message-ID: <54339FAD.20502@infinispan.org> My update: Last week: * I worked on Cross-Site state transfer: applied the last review comments and finally got it integrated
(it will be available in the next release). * Also, the code was backported to product. * I started adding the Cross-Site state transfer configuration to the server mode. * Reviewed and integrated pull requests. This week: * LEADS meeting in Crete. Cheers, Pedro On 10/07/2014 10:47 AM, Tristan Tarrant wrote: > Get the minutes from here: > > http://transcripts.jboss.org/meeting/irc.freenode.org/infinispan/2014/infinispan.2014-10-06-14.02.log.html > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From isavin at redhat.com Tue Oct 7 04:16:23 2014 From: isavin at redhat.com (Ion Savin) Date: Tue, 07 Oct 2014 11:16:23 +0300 Subject: [infinispan-dev] Weekly Infinispan IRC meeting 2014-10-06 In-Reply-To: <54339A7F.7080201@redhat.com> References: <54339A7F.7080201@redhat.com> Message-ID: <5433A157.1090106@redhat.com> Last week: * HRCPP-174 MSI installer not working on WIN32 platforms * cleanup + OSGi tests for https://github.com/infinispan/infinispan/pull/2640 * product work * integrated the uberjar fixes This week: * finish tests and integrate PR #2640 * ISPN-3836 TCCL socket leak * HRCPP-173 The HotRod client should support a separate CH for each cache -- Ion Savin From ttarrant at redhat.com Tue Oct 7 05:28:03 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Tue, 07 Oct 2014 11:28:03 +0200 Subject: [infinispan-dev] Feature branches on Infinispan GitHub repo Message-ID: <5433B223.1000304@redhat.com> Hi guys, since Vladimir and myself are starting work on the server management console task (ISPN-4800), I have created the feature branch to which pull requests will be issued directly on the Infinispan GitHub repository. https://github.com/infinispan/infinispan/tree/ISPN-4800/management_ui So when you see that branch appear when you pull from "origin", know that it wasn't pushed there by mistake :) Tristan From rvansa at redhat.com Tue Oct 7 07:32:04 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 07 Oct 2014 13:32:04 +0200 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> Message-ID: <5433CF34.8010209@redhat.com> If you have one local and one shared cache store, how should the command behave? a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP, SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no SKIP_BACKUP_ENTRIES flag right now), where this method returns localStore.size() for first non-shared cache store + passivation ? dataContainer.size(SKIP_BACKUP_ENTRIES) : 0) b) distexec/MR sum of sharedStore.size() + passivation ? sum of dataContainer.size(SKIP_BACKUP_ENTRIES) : 0 c) MR that would count the entries d) wrapper on distributed entry iteration with converters set to return 0-sized entries And what about nodes with different configuration? Radim On 10/06/2014 01:57 PM, Sanne Grinovero wrote: > On 6 October 2014 12:44, Tristan Tarrant wrote: >> I think we should provide correct implementations of size() (and others) >> and provide shortcut implementations using our usual Flag API (e.g. >> SKIP_REMOTE_LOOKUP). > Right that would be very nice. Same for CacheStore interaction: all > cachestores should be included unless skipped explicitly.
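(Editorial sketch of option (c) above: counting entries with a Map/Reduce task, written against the 7.x org.infinispan.distexec.mapreduce API. It illustrates the approach only, assuming the mapping phase visits each key exactly once, which is what option (c) relies on; as discussed later in the thread, a rehash while the task runs can still skew or abort the count.)

  import java.util.Iterator;
  import java.util.Map;
  import org.infinispan.Cache;
  import org.infinispan.distexec.mapreduce.Collector;
  import org.infinispan.distexec.mapreduce.Mapper;
  import org.infinispan.distexec.mapreduce.MapReduceTask;
  import org.infinispan.distexec.mapreduce.Reducer;

  class MapReduceCount {
     static <K, V> long count(Cache<K, V> cache) {
        Map<String, Long> result = new MapReduceTask<K, V, String, Long>(cache)
           .mappedWith(new Mapper<K, V, String, Long>() {
              @Override
              public void map(K key, V value, Collector<String, Long> collector) {
                 collector.emit("count", 1L); // every entry contributes exactly one
              }
           })
           .reducedWith(new Reducer<String, Long>() {
              @Override
              public Long reduce(String reducedKey, Iterator<Long> iter) {
                 long sum = 0;
                 while (iter.hasNext()) sum += iter.next();
                 return sum;
              }
           })
           .execute();
        Long total = result.get("count");
        return total == null ? 0L : total;
     }
  }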
> > Sanne > >> Tristan >> >> On 06/10/14 12:57, Sanne Grinovero wrote: >>> On 3 October 2014 18:38, Dennis Reed wrote: >>>> Since size() is defined by the ConcurrentMap interface, it already has a >>>> precisely defined meaning. The only "correct" implementation is E. >>> +1 >>> >>>> The current non-correct implementation was just because it's expensive >>>> to calculate correctly. I'm not sure the current impl is really that >>>> useful for anything. >>> +1 >>> >>> And not just size() but many others from ConcurrentMap. >>> The question is if we should drop the interface and all the methods >>> which aren't efficiently implementable, or fix all those methods. >>> >>> In the past I loved that I could inject "Infinispan superpowers" into >>> an application making extensive use of Map and ConcurrentMap without >>> changes, but that has been deceiving and required great care such as >>> verifying that these features would not be used anywhere in the code. >>> I ended up wrapping the Cache implementation in a custom adapter which >>> would also implement ConcurrentMap but would throw a RuntimeException >>> if any of the "unallowed" methods was called, at least I would detect >>> violations safely. >>> >>> I still think that for the time being - until a better solution is >>> planned - we should throw exceptions.. alas that's an old conversation >>> and it was never done. >>> >>> Sanne >>> >>> >>>> -Dennis >>>> >>>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>>> Hi, >>>>> >>>>> recently we had a discussion about what size() returns, but I've >>>>> realized there are more things that users would like to know. My >>>>> question is whether you think that they would really appreciate it, or >>>>> whether it's just my QA point of view where I sometimes compute the >>>>> 'checksums' of cache to see if I didn't lost anything. >>>>> >>>>> There are those sizes: >>>>> A) number of owned entries >>>>> B) number of entries stored locally in memory >>>>> C) number of entries stored in each local cache store >>>>> D) number of entries stored in each shared cache store >>>>> E) total number of entries in cache >>>>> >>>>> So far, we can get >>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>> E via distributed iterators / MR >>>>> A via data container iteration + distribution manager query, but only >>>>> without cache store >>>>> C or D through >>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>> >>>>> I think that it would go along with users' expectations if size() >>>>> returned E and for the rest we should have special methods on >>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>> I'd say that finally to something that has firm meaning. >>>>> >>>>> WDYT? 
>>>>> >>>>> Radim >>>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From mudokonman at gmail.com Tue Oct 7 08:21:02 2014 From: mudokonman at gmail.com (William Burns) Date: Tue, 7 Oct 2014 08:21:02 -0400 Subject: [infinispan-dev] About size() In-Reply-To: <5433CF34.8010209@redhat.com> References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> <5433CF34.8010209@redhat.com> Message-ID: On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa wrote: > If you have one local and one shared cache store, how should the command > behave? > > a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP, > SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no > SKIP_BACKUP_ENTRIES flag right now), where this method returns > localStore.size() for first non-shared cache store + passivation ? > dataContainer.size(SKIP_BACKUP_ENTRIES) : 0) Calling the size method in either distexec or MR will give you inflated numbers as you need to pay attention to the numOwners to get a proper count. > b) distexec/MR sum of sharedStore.size() + passivation ? sum of > dataContainer.size(SKIP_BACKUP_ENTRIES) : 0 Calling the size on a shared cache actually should work somewhat well (assuming all entries are stored in the shared cache). The problem is if passivation is enabled as you point out because you also have to check the data container which means you can also have an issue with concurrent activations and passivations (which you can't verify properly in either case without knowing the keys). > c) MR that would count the entries This is the only reliable way to do this with MR. And unfortunately if a rehash occurs I am not sure if you would get inconsistent numbers or an Exception. In the latter at least you should be able to make sure that you have the proper number when it does return without exception. I can't say how it works with multiple loaders though, my guess is that it may process the entry more than once so it depends on if your mapper is smart enough to realize it. > d) wrapper on distributed entry iteration with converters set to return > 0-sized entries Entry iterator can't return 0 sized entries (just the values). The keys are required to make sure that the count is correct and also to ensure that if a rehash happens in the middle it can properly continue to operate without having to start over. Entry iterator should work properly irrespective of the number of stores/loaders that are configured, since it keep track of already seen keys (so duplicates are ignored). > > And what about nodes with different configuration? Hard to know without knowing what the differences are. 
> > Radim > > On 10/06/2014 01:57 PM, Sanne Grinovero wrote: >> On 6 October 2014 12:44, Tristan Tarrant wrote: >>> I think we should provide correct implementations of size() (and others) >>> and provide shortcut implementations using our usual Flag API (e.g. >>> SKIP_REMOTE_LOOKUP). >> Right that would be very nice. Same for CacheStore interaction: all >> cachestores should be included unless skipped explicitly. >> >> Sanne >> >>> Tristan >>> >>> On 06/10/14 12:57, Sanne Grinovero wrote: >>>> On 3 October 2014 18:38, Dennis Reed wrote: >>>>> Since size() is defined by the ConcurrentMap interface, it already has a >>>>> precisely defined meaning. The only "correct" implementation is E. >>>> +1 This is one of the things I have been wanting to do is actually implement the other Map methods across the entire cache. However to do a lot of these in a memory conscious way they would need to be ran ignoring any ongoing transactions. Actually having this requirement allows these methods to be implemented quite easily especially in conjunction with the EntryIterator. I almost made a PR for it a while back, but it seemed a little zealous to do at the same time and it didn't seem that people were pushing for it very hard (maybe that was a wrong assumption). Also I wasn't quite sure the transactional part not being functional anymore would be a deterrent. >>>> >>>>> The current non-correct implementation was just because it's expensive >>>>> to calculate correctly. I'm not sure the current impl is really that >>>>> useful for anything. >>>> +1 >>>> >>>> And not just size() but many others from ConcurrentMap. >>>> The question is if we should drop the interface and all the methods >>>> which aren't efficiently implementable, or fix all those methods. >>>> >>>> In the past I loved that I could inject "Infinispan superpowers" into >>>> an application making extensive use of Map and ConcurrentMap without >>>> changes, but that has been deceiving and required great care such as >>>> verifying that these features would not be used anywhere in the code. >>>> I ended up wrapping the Cache implementation in a custom adapter which >>>> would also implement ConcurrentMap but would throw a RuntimeException >>>> if any of the "unallowed" methods was called, at least I would detect >>>> violations safely. >>>> >>>> I still think that for the time being - until a better solution is >>>> planned - we should throw exceptions.. alas that's an old conversation >>>> and it was never done. >>>> >>>> Sanne >>>> >>>> >>>>> -Dennis >>>>> >>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>>>> Hi, >>>>>> >>>>>> recently we had a discussion about what size() returns, but I've >>>>>> realized there are more things that users would like to know. My >>>>>> question is whether you think that they would really appreciate it, or >>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>> >>>>>> There are those sizes: >>>>>> A) number of owned entries >>>>>> B) number of entries stored locally in memory >>>>>> C) number of entries stored in each local cache store >>>>>> D) number of entries stored in each shared cache store >>>>>> E) total number of entries in cache >>>>>> >>>>>> So far, we can get >>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>> (passivation ? 
B : 0) + firstNonZero(C, D) via size() >>>>>> E via distributed iterators / MR >>>>>> A via data container iteration + distribution manager query, but only >>>>>> without cache store >>>>>> C or D through >>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>> >>>>>> I think that it would go along with users' expectations if size() >>>>>> returned E and for the rest we should have special methods on >>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>> I'd say that finally to something that has firm meaning. >>>>>> >>>>>> WDYT? >>>>>> >>>>>> Radim >>>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Tue Oct 7 08:43:31 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 07 Oct 2014 14:43:31 +0200 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> <5433CF34.8010209@redhat.com> Message-ID: <5433DFF3.2060100@redhat.com> On 10/07/2014 02:21 PM, William Burns wrote: > On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa wrote: >> If you have one local and one shared cache store, how should the command >> behave? >> >> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP, >> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no >> SKIP_BACKUP_ENTRIES flag right now), where this method returns >> localStore.size() for first non-shared cache store + passivation ? >> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0) > Calling the size method in either distexec or MR will give you > inflated numbers as you need to pay attention to the numOwners to get > a proper count. That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be able to report only primary-owned entries, or we have to iterate and apply the filtering outside. > >> b) distexec/MR sum of sharedStore.size() + passivation ? sum of >> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0 > Calling the size on a shared cache actually should work somewhat well > (assuming all entries are stored in the shared cache). The problem is > if passivation is enabled as you point out because you also have to > check the data container which means you can also have an issue with > concurrent activations and passivations (which you can't verify > properly in either case without knowing the keys). > >> c) MR that would count the entries > This is the only reliable way to do this with MR. And unfortunately > if a rehash occurs I am not sure if you would get inconsistent numbers > or an Exception. 
In the latter at least you should be able to make > sure that you have the proper number when it does return without > exception. I can't say how it works with multiple loaders though, my > guess is that it may process the entry more than once so it depends on > if your mapper is smart enough to realize it. I don't think that reporting incorrect size is *that* harmful - even ConcurrentMap interface says that it's just a wild guess and when things are changing, you can't rely on that. > >> d) wrapper on distributed entry iteration with converters set to return >> 0-sized entries > Entry iterator can't return 0 sized entries (just the values). The > keys are required to make sure that the count is correct and also to > ensure that if a rehash happens in the middle it can properly continue > to operate without having to start over. Entry iterator should work > properly irrespective of the number of stores/loaders that are > configured, since it keep track of already seen keys (so duplicates > are ignored). Ok, I was simplifying that a bit. And by the way, I don't really like the fact that for distributed entry iteration you need to be able to keep all keys from one segment at one moment in memory. But fine - distributed entry iteration is probably not the right way. > > >> And what about nodes with different configuration? > Hard to know without knowing what the differences are. I had in my mind different loaders and passivation configuration (e.g. some node could use shared store and some don't - do we want to handle such obscure configs? Can we design that without the need to have complicated decision trees what to include and what not?). Radim > >> Radim >> >> On 10/06/2014 01:57 PM, Sanne Grinovero wrote: >>> On 6 October 2014 12:44, Tristan Tarrant wrote: >>>> I think we should provide correct implementations of size() (and others) >>>> and provide shortcut implementations using our usual Flag API (e.g. >>>> SKIP_REMOTE_LOOKUP). >>> Right that would be very nice. Same for CacheStore interaction: all >>> cachestores should be included unless skipped explicitly. >>> >>> Sanne >>> >>>> Tristan >>>> >>>> On 06/10/14 12:57, Sanne Grinovero wrote: >>>>> On 3 October 2014 18:38, Dennis Reed wrote: >>>>>> Since size() is defined by the ConcurrentMap interface, it already has a >>>>>> precisely defined meaning. The only "correct" implementation is E. >>>>> +1 > This is one of the things I have been wanting to do is actually > implement the other Map methods across the entire cache. However to > do a lot of these in a memory conscious way they would need to be ran > ignoring any ongoing transactions. Actually having this requirement > allows these methods to be implemented quite easily especially in > conjunction with the EntryIterator. I almost made a PR for it a while > back, but it seemed a little zealous to do at the same time and it > didn't seem that people were pushing for it very hard (maybe that was > a wrong assumption). Also I wasn't quite sure the transactional part > not being functional anymore would be a deterrent. > >>>>>> The current non-correct implementation was just because it's expensive >>>>>> to calculate correctly. I'm not sure the current impl is really that >>>>>> useful for anything. >>>>> +1 >>>>> >>>>> And not just size() but many others from ConcurrentMap. >>>>> The question is if we should drop the interface and all the methods >>>>> which aren't efficiently implementable, or fix all those methods. 
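(Editorial sketch of the entry-iteration variant being discussed, assuming the 7.x distributed entry iterator API, i.e. AdvancedCache.filterEntries()/EntryIterable and the AcceptAllKeyValueFilter helper. The iterator deduplicates by key and tracks rehash per segment, which is exactly why it has to hold the keys of a segment in memory at once.)

  import org.infinispan.AdvancedCache;
  import org.infinispan.filter.AcceptAllKeyValueFilter;
  import org.infinispan.iteration.EntryIterable;

  class IteratorCount {
     static long count(AdvancedCache<Object, Object> cache) throws Exception {
        long n = 0;
        // cluster-wide iteration; duplicates from multiple stores are filtered out
        try (EntryIterable<Object, Object> entries =
              cache.filterEntries(AcceptAllKeyValueFilter.getInstance())) {
           for (Object ignored : entries) {
              n++;
           }
        }
        return n;
     }
  }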
>>>>> >>>>> In the past I loved that I could inject "Infinispan superpowers" into >>>>> an application making extensive use of Map and ConcurrentMap without >>>>> changes, but that has been deceiving and required great care such as >>>>> verifying that these features would not be used anywhere in the code. >>>>> I ended up wrapping the Cache implementation in a custom adapter which >>>>> would also implement ConcurrentMap but would throw a RuntimeException >>>>> if any of the "unallowed" methods was called, at least I would detect >>>>> violations safely. >>>>> >>>>> I still think that for the time being - until a better solution is >>>>> planned - we should throw exceptions.. alas that's an old conversation >>>>> and it was never done. >>>>> >>>>> Sanne >>>>> >>>>> >>>>>> -Dennis >>>>>> >>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>>>>> Hi, >>>>>>> >>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>> realized there are more things that users would like to know. My >>>>>>> question is whether you think that they would really appreciate it, or >>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>>> >>>>>>> There are those sizes: >>>>>>> A) number of owned entries >>>>>>> B) number of entries stored locally in memory >>>>>>> C) number of entries stored in each local cache store >>>>>>> D) number of entries stored in each shared cache store >>>>>>> E) total number of entries in cache >>>>>>> >>>>>>> So far, we can get >>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>> E via distributed iterators / MR >>>>>>> A via data container iteration + distribution manager query, but only >>>>>>> without cache store >>>>>>> C or D through >>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>> >>>>>>> I think that it would go along with users' expectations if size() >>>>>>> returned E and for the rest we should have special methods on >>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>> I'd say that finally to something that has firm meaning. >>>>>>> >>>>>>> WDYT? 
>>>>>>> >>>>>>> Radim >>>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From sanne at infinispan.org Tue Oct 7 09:16:06 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 7 Oct 2014 14:16:06 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <5433DFF3.2060100@redhat.com> References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> <5433CF34.8010209@redhat.com> <5433DFF3.2060100@redhat.com> Message-ID: Considering all these very valid concerns I'd return on my proposal for throwing runtime exceptions via an (optional) decorator. I'd have such a decorator in place by default, so that we make it very clear that - while you can remove it - the behaviour of such methods is "unusual" and that a user would be better off avoiding them unless he's into the advanced stuff. As said before, that worked very well for me in the past and it was great that - even while I did know - I had a safety guard to highlight unintended refactorings by others on my team who didn't know the black art of using Infinispan correctly. Sanne On 7 October 2014 13:43, Radim Vansa wrote: > On 10/07/2014 02:21 PM, William Burns wrote: >> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa wrote: >>> If you have one local and one shared cache store, how should the command >>> behave? >>> >>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP, >>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no >>> SKIP_BACKUP_ENTRIES flag right now), where this method returns >>> localStore.size() for first non-shared cache store + passivation ? >>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0) >> Calling the size method in either distexec or MR will give you >> inflated numbers as you need to pay attention to the numOwners to get >> a proper count. > > That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be > able to report only primary-owned entries, or we have to iterate and > apply the filtering outside. > >> >>> b) distexec/MR sum of sharedStore.size() + passivation ? sum of >>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0 >> Calling the size on a shared cache actually should work somewhat well >> (assuming all entries are stored in the shared cache). 
The problem is >> if passivation is enabled as you point out because you also have to >> check the data container which means you can also have an issue with >> concurrent activations and passivations (which you can't verify >> properly in either case without knowing the keys). >> >>> c) MR that would count the entries >> This is the only reliable way to do this with MR. And unfortunately >> if a rehash occurs I am not sure if you would get inconsistent numbers >> or an Exception. In the latter at least you should be able to make >> sure that you have the proper number when it does return without >> exception. I can't say how it works with multiple loaders though, my >> guess is that it may process the entry more than once so it depends on >> if your mapper is smart enough to realize it. > > I don't think that reporting incorrect size is *that* harmful - even > ConcurrentMap interface says that it's just a wild guess and when things > are changing, you can't rely on that. > >> >>> d) wrapper on distributed entry iteration with converters set to return >>> 0-sized entries >> Entry iterator can't return 0 sized entries (just the values). The >> keys are required to make sure that the count is correct and also to >> ensure that if a rehash happens in the middle it can properly continue >> to operate without having to start over. Entry iterator should work >> properly irrespective of the number of stores/loaders that are >> configured, since it keep track of already seen keys (so duplicates >> are ignored). > > Ok, I was simplifying that a bit. And by the way, I don't really like > the fact that for distributed entry iteration you need to be able to > keep all keys from one segment at one moment in memory. But fine - > distributed entry iteration is probably not the right way. > >> >> >>> And what about nodes with different configuration? >> Hard to know without knowing what the differences are. > > I had in my mind different loaders and passivation configuration (e.g. > some node could use shared store and some don't - do we want to handle > such obscure configs? Can we design that without the need to have > complicated decision trees what to include and what not?). > > Radim > >> >>> Radim >>> >>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote: >>>> On 6 October 2014 12:44, Tristan Tarrant wrote: >>>>> I think we should provide correct implementations of size() (and others) >>>>> and provide shortcut implementations using our usual Flag API (e.g. >>>>> SKIP_REMOTE_LOOKUP). >>>> Right that would be very nice. Same for CacheStore interaction: all >>>> cachestores should be included unless skipped explicitly. >>>> >>>> Sanne >>>> >>>>> Tristan >>>>> >>>>> On 06/10/14 12:57, Sanne Grinovero wrote: >>>>>> On 3 October 2014 18:38, Dennis Reed wrote: >>>>>>> Since size() is defined by the ConcurrentMap interface, it already has a >>>>>>> precisely defined meaning. The only "correct" implementation is E. >>>>>> +1 >> This is one of the things I have been wanting to do is actually >> implement the other Map methods across the entire cache. However to >> do a lot of these in a memory conscious way they would need to be ran >> ignoring any ongoing transactions. Actually having this requirement >> allows these methods to be implemented quite easily especially in >> conjunction with the EntryIterator. I almost made a PR for it a while >> back, but it seemed a little zealous to do at the same time and it >> didn't seem that people were pushing for it very hard (maybe that was >> a wrong assumption). 
Also I wasn't quite sure the transactional part >> not being functional anymore would be a deterrent. >> >>>>>>> The current non-correct implementation was just because it's expensive >>>>>>> to calculate correctly. I'm not sure the current impl is really that >>>>>>> useful for anything. >>>>>> +1 >>>>>> >>>>>> And not just size() but many others from ConcurrentMap. >>>>>> The question is if we should drop the interface and all the methods >>>>>> which aren't efficiently implementable, or fix all those methods. >>>>>> >>>>>> In the past I loved that I could inject "Infinispan superpowers" into >>>>>> an application making extensive use of Map and ConcurrentMap without >>>>>> changes, but that has been deceiving and required great care such as >>>>>> verifying that these features would not be used anywhere in the code. >>>>>> I ended up wrapping the Cache implementation in a custom adapter which >>>>>> would also implement ConcurrentMap but would throw a RuntimeException >>>>>> if any of the "unallowed" methods was called, at least I would detect >>>>>> violations safely. >>>>>> >>>>>> I still think that for the time being - until a better solution is >>>>>> planned - we should throw exceptions.. alas that's an old conversation >>>>>> and it was never done. >>>>>> >>>>>> Sanne >>>>>> >>>>>> >>>>>>> -Dennis >>>>>>> >>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>>> realized there are more things that users would like to know. My >>>>>>>> question is whether you think that they would really appreciate it, or >>>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>>>> >>>>>>>> There are those sizes: >>>>>>>> A) number of owned entries >>>>>>>> B) number of entries stored locally in memory >>>>>>>> C) number of entries stored in each local cache store >>>>>>>> D) number of entries stored in each shared cache store >>>>>>>> E) total number of entries in cache >>>>>>>> >>>>>>>> So far, we can get >>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>>> E via distributed iterators / MR >>>>>>>> A via data container iteration + distribution manager query, but only >>>>>>>> without cache store >>>>>>>> C or D through >>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>>> >>>>>>>> I think that it would go along with users' expectations if size() >>>>>>>> returned E and for the rest we should have special methods on >>>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>>> I'd say that finally to something that has firm meaning. >>>>>>>> >>>>>>>> WDYT? 
>>>>>>>> >>>>>>>> Radim >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> -- >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mudokonman at gmail.com Tue Oct 7 09:17:54 2014 From: mudokonman at gmail.com (William Burns) Date: Tue, 7 Oct 2014 09:17:54 -0400 Subject: [infinispan-dev] About size() In-Reply-To: <5433DFF3.2060100@redhat.com> References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> <5433CF34.8010209@redhat.com> <5433DFF3.2060100@redhat.com> Message-ID: On Tue, Oct 7, 2014 at 8:43 AM, Radim Vansa wrote: > On 10/07/2014 02:21 PM, William Burns wrote: >> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa wrote: >>> If you have one local and one shared cache store, how should the command >>> behave? >>> >>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP, >>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no >>> SKIP_BACKUP_ENTRIES flag right now), where this method returns >>> localStore.size() for first non-shared cache store + passivation ? >>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0) >> Calling the size method in either distexec or MR will give you >> inflated numbers as you need to pay attention to the numOwners to get >> a proper count. > > That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be > able to report only primary-owned entries, or we have to iterate and > apply the filtering outside. If we added this functionality then yes, it would be promoted up to the same status as the MR entry counting, though it would still have issues with rehash, as well as with concurrent activations and passivations. > >> >>> b) distexec/MR sum of sharedStore.size() + passivation ? sum of >>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0 >> Calling the size on a shared cache actually should work somewhat well >> (assuming all entries are stored in the shared cache). The problem is >> if passivation is enabled as you point out because you also have to >> check the data container which means you can also have an issue with >> concurrent activations and passivations (which you can't verify >> properly in either case without knowing the keys).
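(Editorial sketch of the naive distexec sum from point (a), to make the inflation concrete: with numOwners = 2 every entry is counted on both of its owners, so the raw sum is roughly twice the real size. Written against the 7.x org.infinispan.distexec API; a sketch, not a proposed implementation.)

  import java.io.Serializable;
  import java.util.Set;
  import java.util.concurrent.Future;
  import org.infinispan.Cache;
  import org.infinispan.context.Flag;
  import org.infinispan.distexec.DefaultExecutorService;
  import org.infinispan.distexec.DistributedCallable;

  class NaiveClusterSize {
     static class LocalSize<K, V> implements DistributedCallable<K, V, Integer>, Serializable {
        private transient Cache<K, V> cache;

        @Override
        public void setEnvironment(Cache<K, V> cache, Set<K> inputKeys) {
           this.cache = cache;
        }

        @Override
        public Integer call() {
           // counts primary *and* backup copies held in memory on this node
           return cache.getAdvancedCache().withFlags(Flag.SKIP_CACHE_LOAD).size();
        }
     }

     static <K, V> long rawSum(Cache<K, V> cache) throws Exception {
        DefaultExecutorService des = new DefaultExecutorService(cache);
        long sum = 0;
        for (Future<Integer> f : des.submitEverywhere(new LocalSize<K, V>())) {
           sum += f.get(); // inflated by numOwners, as noted above
        }
        return sum;
     }
  }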
>> >>> c) MR that would count the entries >> This is the only reliable way to do this with MR. And unfortunately >> if a rehash occurs I am not sure if you would get inconsistent numbers >> or an Exception. In the latter at least you should be able to make >> sure that you have the proper number when it does return without >> exception. I can't say how it works with multiple loaders though, my >> guess is that it may process the entry more than once so it depends on >> if your mapper is smart enough to realize it. > > I don't think that reporting incorrect size is *that* harmful - even > ConcurrentMap interface says that it's just a wild guess and when things > are changing, you can't rely on that. ConcurrentMap doesn't say anything about the size method, actually. ConcurrentHashMap has some verbiage saying that it might not be completely correct under concurrent modification, though. It isn't really a wild guess for ConcurrentHashMap: the worst case is that you count a value that was there but has since been removed, or miss a value that was recently added. Really, the guarantee from CHM is that it counts each individual segment properly for a glimpse of time for that segment; the problem is that each segment could change (since they are counted at different times). But the values missed in ConcurrentHashMap are totally different from losing an entire segment due to a rehash. You could theoretically have a rehash occur right after MR started iterating and see no values for that segment, or only a very small subset. There is a much larger margin of error in this case for which values are seen and which are not. > >> >>> d) wrapper on distributed entry iteration with converters set to return >> >>> 0-sized entries >> Entry iterator can't return 0 sized entries (just the values). The >> keys are required to make sure that the count is correct and also to >> ensure that if a rehash happens in the middle it can properly continue >> to operate without having to start over. Entry iterator should work >> properly irrespective of the number of stores/loaders that are >> configured, since it keep track of already seen keys (so duplicates >> are ignored). > > Ok, I was simplifying that a bit. And by the way, I don't really like > the fact that for distributed entry iteration you need to be able to > keep all keys from one segment at one moment in memory. But fine - > distributed entry iteration is probably not the right way. I agree it is annoying to have to keep the keys, but it is one of the few ways to reliably get all the values without losing one. Actually this approach provides a much closer approximation to what ConcurrentHashMap provides for its size implementation, since it can't drop a segment. It is pretty much required to do it this way to do keySet, entrySet, and values where you don't have the luxury of dropping whole swaths of entries like you do when calling the size() method (even if the value(s) was there the entire time). > >> >> >>> And what about nodes with different configuration? >> Hard to know without knowing what the differences are. > > I had in my mind different loaders and passivation configuration (e.g. > some node could use shared store and some don't - do we want to handle > such obscure configs? Can we design that without the need to have > complicated decision trees what to include and what not?). Well the last sentence means we have to use MR or Entry Iterator since we can't call size on the shared loader.
I would think that it should still work irrespective of the loader configuration (except for MR with multiple loaders). The main issue I can think of is that if everyone isn't using the shared loader that you could have stale values in the loader if you don't always have a node using the shared loader up (assuming purge at startup isn't enabled). > > Radim > >> >>> Radim >>> >>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote: >>>> On 6 October 2014 12:44, Tristan Tarrant wrote: >>>>> I think we should provide correct implementations of size() (and others) >>>>> and provide shortcut implementations using our usual Flag API (e.g. >>>>> SKIP_REMOTE_LOOKUP). >>>> Right that would be very nice. Same for CacheStore interaction: all >>>> cachestores should be included unless skipped explicitly. >>>> >>>> Sanne >>>> >>>>> Tristan >>>>> >>>>> On 06/10/14 12:57, Sanne Grinovero wrote: >>>>>> On 3 October 2014 18:38, Dennis Reed wrote: >>>>>>> Since size() is defined by the ConcurrentMap interface, it already has a >>>>>>> precisely defined meaning. The only "correct" implementation is E. >>>>>> +1 >> This is one of the things I have been wanting to do is actually >> implement the other Map methods across the entire cache. However to >> do a lot of these in a memory conscious way they would need to be ran >> ignoring any ongoing transactions. Actually having this requirement >> allows these methods to be implemented quite easily especially in >> conjunction with the EntryIterator. I almost made a PR for it a while >> back, but it seemed a little zealous to do at the same time and it >> didn't seem that people were pushing for it very hard (maybe that was >> a wrong assumption). Also I wasn't quite sure the transactional part >> not being functional anymore would be a deterrent. >> >>>>>>> The current non-correct implementation was just because it's expensive >>>>>>> to calculate correctly. I'm not sure the current impl is really that >>>>>>> useful for anything. >>>>>> +1 >>>>>> >>>>>> And not just size() but many others from ConcurrentMap. >>>>>> The question is if we should drop the interface and all the methods >>>>>> which aren't efficiently implementable, or fix all those methods. >>>>>> >>>>>> In the past I loved that I could inject "Infinispan superpowers" into >>>>>> an application making extensive use of Map and ConcurrentMap without >>>>>> changes, but that has been deceiving and required great care such as >>>>>> verifying that these features would not be used anywhere in the code. >>>>>> I ended up wrapping the Cache implementation in a custom adapter which >>>>>> would also implement ConcurrentMap but would throw a RuntimeException >>>>>> if any of the "unallowed" methods was called, at least I would detect >>>>>> violations safely. >>>>>> >>>>>> I still think that for the time being - until a better solution is >>>>>> planned - we should throw exceptions.. alas that's an old conversation >>>>>> and it was never done. >>>>>> >>>>>> Sanne >>>>>> >>>>>> >>>>>>> -Dennis >>>>>>> >>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>>> realized there are more things that users would like to know. My >>>>>>>> question is whether you think that they would really appreciate it, or >>>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>>> 'checksums' of cache to see if I didn't lost anything. 
>>>>>>>> >>>>>>>> There are those sizes: >>>>>>>> A) number of owned entries >>>>>>>> B) number of entries stored locally in memory >>>>>>>> C) number of entries stored in each local cache store >>>>>>>> D) number of entries stored in each shared cache store >>>>>>>> E) total number of entries in cache >>>>>>>> >>>>>>>> So far, we can get >>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>>> E via distributed iterators / MR >>>>>>>> A via data container iteration + distribution manager query, but only >>>>>>>> without cache store >>>>>>>> C or D through >>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>>> >>>>>>>> I think that it would go along with users' expectations if size() >>>>>>>> returned E and for the rest we should have special methods on >>>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>>> I'd say that finally to something that has firm meaning. >>>>>>>> >>>>>>>> WDYT? >>>>>>>> >>>>>>>> Radim >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> -- >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Tue Oct 7 09:42:10 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 07 Oct 2014 15:42:10 +0200 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> <5433CF34.8010209@redhat.com> <5433DFF3.2060100@redhat.com> Message-ID: <5433EDB2.8020603@redhat.com> Considering the frequency of "How do I get the number of entries in cache", "How do I get all keys" on all forums, I think that backing to runtime exception would not satisfy the users. On 10/07/2014 03:16 PM, Sanne Grinovero wrote: > Considering all these very valid concerns I'd return on my proposal > for throwing runtime exceptions via an (optional) decorator. > > I'd have such a decorator in place by default, so that we make it very > clear that - while you can remove it - the behaviour of such methods > is "unusual" and that a user would be better off avoiding them unless > he's into the advanced stuff. 
> > As said before, that worked very well for me in the past and it was > great that - even while I did know - I had a safety guard to highlight > unintended refactorings by others on my team who didn't know the black > art of using Infinispan correctly. > > Sanne > > > > On 7 October 2014 13:43, Radim Vansa wrote: >> On 10/07/2014 02:21 PM, William Burns wrote: >>> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa wrote: >>>> If you have one local and one shared cache store, how should the command >>>> behave? >>>> >>>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP, >>>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no >>>> SKIP_BACKUP_ENTRIES flag right now), where this method returns >>>> localStore.size() for first non-shared cache store + passivation ? >>>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0) >>> Calling the size method in either distexec or MR will give you >>> inflated numbers as you need to pay attention to the numOwners to get >>> a proper count. >> That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be >> able to report only primary-owned entries, or we have to iterate and >> apply the filtering outside. >> >>>> b) distexec/MR sum of sharedStore.size() + passivation ? sum of >>>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0 >>> Calling the size on a shared cache actually should work somewhat well >>> (assuming all entries are stored in the shared cache). The problem is >>> if passivation is enabled as you point out because you also have to >>> check the data container which means you can also have an issue with >>> concurrent activations and passivations (which you can't verify >>> properly in either case without knowing the keys). >>> >>>> c) MR that would count the entries >>> This is the only reliable way to do this with MR. And unfortunately >>> if a rehash occurs I am not sure if you would get inconsistent numbers >>> or an Exception. In the latter at least you should be able to make >>> sure that you have the proper number when it does return without >>> exception. I can't say how it works with multiple loaders though, my >>> guess is that it may process the entry more than once so it depends on >>> if your mapper is smart enough to realize it. >> I don't think that reporting incorrect size is *that* harmful - even >> ConcurrentMap interface says that it's just a wild guess and when things >> are changing, you can't rely on that. >> >>>> d) wrapper on distributed entry iteration with converters set to return >>>> 0-sized entries >>> Entry iterator can't return 0 sized entries (just the values). The >>> keys are required to make sure that the count is correct and also to >>> ensure that if a rehash happens in the middle it can properly continue >>> to operate without having to start over. Entry iterator should work >>> properly irrespective of the number of stores/loaders that are >>> configured, since it keep track of already seen keys (so duplicates >>> are ignored). >> Ok, I was simplifying that a bit. And by the way, I don't really like >> the fact that for distributed entry iteration you need to be able to >> keep all keys from one segment at one moment in memory. But fine - >> distributed entry iteration is probably not the right way. >> >>> >>>> And what about nodes with different configuration? >>> Hard to know without knowing what the differences are. >> I had in my mind different loaders and passivation configuration (e.g. 
>> some node could use shared store and some don't - do we want to handle >> such obscure configs? Can we design that without the need to have >> complicated decision trees what to include and what not?). >> >> Radim >> >>>> Radim >>>> >>>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote: >>>>> On 6 October 2014 12:44, Tristan Tarrant wrote: >>>>>> I think we should provide correct implementations of size() (and others) >>>>>> and provide shortcut implementations using our usual Flag API (e.g. >>>>>> SKIP_REMOTE_LOOKUP). >>>>> Right that would be very nice. Same for CacheStore interaction: all >>>>> cachestores should be included unless skipped explicitly. >>>>> >>>>> Sanne >>>>> >>>>>> Tristan >>>>>> >>>>>> On 06/10/14 12:57, Sanne Grinovero wrote: >>>>>>> On 3 October 2014 18:38, Dennis Reed wrote: >>>>>>>> Since size() is defined by the ConcurrentMap interface, it already has a >>>>>>>> precisely defined meaning. The only "correct" implementation is E. >>>>>>> +1 >>> This is one of the things I have been wanting to do is actually >>> implement the other Map methods across the entire cache. However to >>> do a lot of these in a memory conscious way they would need to be ran >>> ignoring any ongoing transactions. Actually having this requirement >>> allows these methods to be implemented quite easily especially in >>> conjunction with the EntryIterator. I almost made a PR for it a while >>> back, but it seemed a little zealous to do at the same time and it >>> didn't seem that people were pushing for it very hard (maybe that was >>> a wrong assumption). Also I wasn't quite sure the transactional part >>> not being functional anymore would be a deterrent. >>> >>>>>>>> The current non-correct implementation was just because it's expensive >>>>>>>> to calculate correctly. I'm not sure the current impl is really that >>>>>>>> useful for anything. >>>>>>> +1 >>>>>>> >>>>>>> And not just size() but many others from ConcurrentMap. >>>>>>> The question is if we should drop the interface and all the methods >>>>>>> which aren't efficiently implementable, or fix all those methods. >>>>>>> >>>>>>> In the past I loved that I could inject "Infinispan superpowers" into >>>>>>> an application making extensive use of Map and ConcurrentMap without >>>>>>> changes, but that has been deceiving and required great care such as >>>>>>> verifying that these features would not be used anywhere in the code. >>>>>>> I ended up wrapping the Cache implementation in a custom adapter which >>>>>>> would also implement ConcurrentMap but would throw a RuntimeException >>>>>>> if any of the "unallowed" methods was called, at least I would detect >>>>>>> violations safely. >>>>>>> >>>>>>> I still think that for the time being - until a better solution is >>>>>>> planned - we should throw exceptions.. alas that's an old conversation >>>>>>> and it was never done. >>>>>>> >>>>>>> Sanne >>>>>>> >>>>>>> >>>>>>>> -Dennis >>>>>>>> >>>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>>>> realized there are more things that users would like to know. My >>>>>>>>> question is whether you think that they would really appreciate it, or >>>>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>>>> 'checksums' of cache to see if I didn't lost anything. 
>>>>>>>>> >>>>>>>>> There are those sizes: >>>>>>>>> A) number of owned entries >>>>>>>>> B) number of entries stored locally in memory >>>>>>>>> C) number of entries stored in each local cache store >>>>>>>>> D) number of entries stored in each shared cache store >>>>>>>>> E) total number of entries in cache >>>>>>>>> >>>>>>>>> So far, we can get >>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>>>> E via distributed iterators / MR >>>>>>>>> A via data container iteration + distribution manager query, but only >>>>>>>>> without cache store >>>>>>>>> C or D through >>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>>>> >>>>>>>>> I think that it would go along with users' expectations if size() >>>>>>>>> returned E and for the rest we should have special methods on >>>>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>>>> I'd say that finally to something that has firm meaning. >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>>> >>>>>>>>> Radim >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> infinispan-dev mailing list >>>>>>>> infinispan-dev at lists.jboss.org >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> -- >>>> Radim Vansa >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From sanne at infinispan.org Tue Oct 7 10:23:03 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 7 Oct 2014 15:23:03 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <5433EDB2.8020603@redhat.com> References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> <5433CF34.8010209@redhat.com> <5433DFF3.2060100@redhat.com> <5433EDB2.8020603@redhat.com> Message-ID: On 7 October 2014 14:42, Radim Vansa wrote: > Considering the frequency of "How do I get the number of entries in > cache", "How do I get all keys" on all forums, I think that backing to > runtime exception would not satisfy the users. Correct but when they get to ask the right question, that's usually after several hours of debugging and swearing. 
With an immediate exception, they get immediate advice and hopefully a
hint of where to look in the docs for the special methods like
statistics, how to disable the exception-throwing decorator, or how to
implement their own M/R job with the exact flags they need.

From ttarrant at redhat.com  Tue Oct  7 10:59:25 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Tue, 07 Oct 2014 16:59:25 +0200
Subject: [infinispan-dev] About size()
In-Reply-To: 
References: <542E5E92.7060504@redhat.com>
	<542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com>
	<5433CF34.8010209@redhat.com> <5433DFF3.2060100@redhat.com>
	<5433EDB2.8020603@redhat.com>
Message-ID: <5433FFCD.10409@redhat.com>

I'm not sure an idiot-proof API is what we want to encourage. I'd rather
tell users to RTFM.

Tristan

On 07/10/14 16:23, Sanne Grinovero wrote:
> On 7 October 2014 14:42, Radim Vansa wrote:
>> Considering the frequency of "How do I get the number of entries in
>> cache", "How do I get all keys" on all forums, I think that backing to
>> runtime exception would not satisfy the users.
> Correct but when they get to ask the right question, that's usually
> after several hours of debugging and swearing.
> With an immediate exception, they get immediate advice and hopefully a
> hint of where to look in the docs for the special methods like
> statistics, how to disable the exception-throwing decorator, or how to
> implement their own M/R job with the exact flags they need.
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>

From dan.berindei at gmail.com  Tue Oct 7 12:28:07 2014
From: dan.berindei at gmail.com (Dan Berindei)
Date: Tue, 7 Oct 2014 19:28:07 +0300
Subject: [infinispan-dev] Infinispan 7.0.0.CR1 is out!
Message-ID: 

Dear Community,

We are gearing up towards a great Infinispan 7.0.0, and we are happy to
announce our first candidate release!

Notable features and improvements in this release:
* Cross-site state transfer now handles failures (ISPN-4025)
* Easier management of Protobuf schemas (ISPN-4357)
* New uberjars-based distribution (ISPN-4728)
* The HotRod protocol and Java client now have a size() operation
(ISPN-4736)
* Cluster listeners' filters and converters can now see the old value and
metadata (ISPN-4753)

See the full announcement here: http://goo.gl/ERslmk

Cheers
Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141007/f3cf39ba/attachment.html

From sanne at infinispan.org  Tue Oct 7 12:48:53 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 7 Oct 2014 17:48:53 +0100
Subject: [infinispan-dev] About size()
In-Reply-To: <5433FFCD.10409@redhat.com>
References: <542E5E92.7060504@redhat.com>
	<542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com>
	<5433CF34.8010209@redhat.com> <5433DFF3.2060100@redhat.com>
	<5433EDB2.8020603@redhat.com> <5433FFCD.10409@redhat.com>
Message-ID: 

On 7 October 2014 15:59, Tristan Tarrant wrote:
> I'm not sure an idiot-proof API is what we want to encourage. I'd rather
> tell users to RTFM.

I'm not thinking about idiots at all. As I said I made such a decorator
for my own sake, as it was handy to spot bad-usage cases I had missed,
or would miss after future refactorings.

I would not underestimate users: if you explain the problem and make
sure people see your message, everyone will be able to find many clever
solutions to get what they need.
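For the record, the shape of that guard is trivial - something along
these lines (a from-memory sketch with made-up names, not the actual
adapter; a real one would implement ConcurrentMap and forward the
remaining methods in the same way):

import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: delegate the point operations, fail fast on
// the bulk ones that cannot be implemented reliably on a distributed cache.
public class GuardedCache<K, V> extends AbstractMap<K, V> {

   private final Map<K, V> delegate; // e.g. an org.infinispan.Cache

   public GuardedCache(Map<K, V> delegate) {
      this.delegate = delegate;
   }

   @Override
   public V get(Object key) {
      return delegate.get(key);
   }

   @Override
   public V put(K key, V value) {
      return delegate.put(key, value);
   }

   @Override
   public V remove(Object key) {
      return delegate.remove(key);
   }

   // size(), keySet(), values() and all iteration funnel through entrySet()
   // in AbstractMap, so one override catches the whole "unallowed" family.
   @Override
   public Set<Map.Entry<K, V>> entrySet() {
      throw new UnsupportedOperationException(
            "Bulk operations are not reliable on a distributed cache - see the docs");
   }
}

The value is simply that a violation blows up immediately at the call
site, with a message you control, instead of silently returning
local-only results.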
On the other hand, relying on users to read these specific javadocs is
foolish: since you're inheriting the ConcurrentMap contract, people are
going to use the ConcurrentMap type in some cases and clients of that
API will have access to the ConcurrentMap javadoc exclusively. Maybe
it's even code which was written before the introduction of Infinispan
in a project, or written by a team which had never heard of
Infinispan... you know, people build expectations on top of type safety.

You could certainly put a link to TFM in the exception message.

Sanne

>
> Tristan
>
> On 07/10/14 16:23, Sanne Grinovero wrote:
>> On 7 October 2014 14:42, Radim Vansa wrote:
>>> Considering the frequency of "How do I get the number of entries in
>>> cache", "How do I get all keys" on all forums, I think that backing to
>>> runtime exception would not satisfy the users.
>> Correct but when they get to ask the right question, that's usually
>> after several hours of debugging and swearing.
>> With an immediate exception, they get immediate advice and hopefully a
>> hint of where to look in the docs for the special methods like
>> statistics, how to disable the exception-throwing decorator, or how to
>> implement their own M/R job with the exact flags they need.
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From dan.berindei at gmail.com  Wed Oct 8 10:02:22 2014
From: dan.berindei at gmail.com (Dan Berindei)
Date: Wed, 8 Oct 2014 17:02:22 +0300
Subject: [infinispan-dev] About size()
In-Reply-To: 
References: <542E5E92.7060504@redhat.com>
	<542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com>
	<5433CF34.8010209@redhat.com> <5433DFF3.2060100@redhat.com>
Message-ID: 

On Tue, Oct 7, 2014 at 4:17 PM, William Burns wrote:

> On Tue, Oct 7, 2014 at 8:43 AM, Radim Vansa wrote:
> > On 10/07/2014 02:21 PM, William Burns wrote:
> >> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa wrote:
> >>> If you have one local and one shared cache store, how should the
> command
> >>> behave?
> >>>
> >>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP,
> >>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no
> >>> SKIP_BACKUP_ENTRIES flag right now), where this method returns
> >>> localStore.size() for first non-shared cache store + passivation ?
> >>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
> >> Calling the size method in either distexec or MR will give you
> >> inflated numbers as you need to pay attention to the numOwners to get
> >> a proper count.
> >
> > That's what I meant by the SKIP_BACKUP_ENTRIES - dataContainer should be
> > able to report only primary-owned entries, or we have to iterate and
> > apply the filtering outside.
>
> If we added this functionality then yes it would be promoted up to MR
> counting entries status though it would still have issues with rehash.
> As well as issues with concurrent activations and passivations.
>

I think we can use something like OutdatedTopologyException to make sure
we count each segment once, on the primary owner. But in order to verify
that a particular node is the primary owner we'd have to load each cache
store entry, so performance with cache stores will be pretty bad.

Dealing with concurrent activations/passivations is even trickier.
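For reference, the M/R counting we keep coming back to is roughly the
following (just a sketch, class names made up for the example; it counts
whatever the map phase sees, so it inherits all the rehash and
passivation caveats above):

import java.util.Iterator;
import java.util.Map;

import org.infinispan.Cache;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.MapReduceTask;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

public class MapReduceSize {

   // Emits 1 under a single shared key for every entry seen on each node.
   static class EntryCountMapper<K, V> implements Mapper<K, V, String, Long> {
      @Override
      public void map(K key, V value, Collector<String, Long> collector) {
         collector.emit("count", 1L);
      }
   }

   // Sums up the partial counts produced by the mappers.
   static class EntryCountReducer implements Reducer<String, Long> {
      @Override
      public Long reduce(String reducedKey, Iterator<Long> iter) {
         long total = 0;
         while (iter.hasNext()) {
            total += iter.next();
         }
         return total;
      }
   }

   public static <K, V> long mapReduceSize(Cache<K, V> cache) {
      Map<String, Long> result = new MapReduceTask<K, V, String, Long>(cache)
            .mappedWith(new EntryCountMapper<K, V>())
            .reducedWith(new EntryCountReducer())
            .execute();
      Long count = result.get("count");
      return count == null ? 0 : count;
   }
}

It's cheap on memory, because only the partial sums travel back to the
task originator, but a rehash in the middle of the task can still make
it miss the entries of a whole segment.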
> > > > >> > >>> b) distexec/MR sum of sharedStore.size() + passivation ? sum of > >>> dataContainer.size(SKIP_BACKUP_ENTRIES) : 0 > >> Calling the size on a shared cache actually should work somewhat well > >> (assuming all entries are stored in the shared cache). The problem is > >> if passivation is enabled as you point out because you also have to > >> check the data container which means you can also have an issue with > >> concurrent activations and passivations (which you can't verify > >> properly in either case without knowing the keys). > >> > >>> c) MR that would count the entries > >> This is the only reliable way to do this with MR. And unfortunately > >> if a rehash occurs I am not sure if you would get inconsistent numbers > >> or an Exception. In the latter at least you should be able to make > >> sure that you have the proper number when it does return without > >> exception. I can't say how it works with multiple loaders though, my > >> guess is that it may process the entry more than once so it depends on > >> if your mapper is smart enough to realize it. > > > > I don't think that reporting incorrect size is *that* harmful - even > > ConcurrentMap interface says that it's just a wild guess and when things > > are changing, you can't rely on that. > > ConcurrentMap doesn't say anything about size method actually. > ConcurrentHashMap has some verbage about saying that it might not be > completely correct under concurrent modification though. > > It isn't a wild guess really though for ConcurrentHashMap. The worst > is that you could count a value that was there but it is now removed > or you don't count a value that was recently added. Really the > guarantee from CHM is that it counts each individual segment properly > for a glimpse of time for that segment, the problem is that each > segment could change (since they are counted at different times). But > the values missing in ConcurrentHashMap are totally different than > losing an entire segment due to a rehash. You could theoretically > have a rehash occur right after MR started iterating and see no values > for that segment or a very small subset. There is a much larger > margin of error in this case for what values are seen and which are > not. > > Interesting... the Map javadoc seems to assume linearizability, maybe because the original implementation was Hashtable :) So there is precedent for relaxing the definition of size(). But of course some users will still expect a 0 error margin when there are no concurrent writes, so I agree we don't get a free pass to ignore rehashes and activations during get(). > > > >> > >>> d) wrapper on distributed entry iteration with converters set to return > >>> 0-sized entries > >> Entry iterator can't return 0 sized entries (just the values). The > >> keys are required to make sure that the count is correct and also to > >> ensure that if a rehash happens in the middle it can properly continue > >> to operate without having to start over. Entry iterator should work > >> properly irrespective of the number of stores/loaders that are > >> configured, since it keep track of already seen keys (so duplicates > >> are ignored). > > > > Ok, I was simplifying that a bit. And by the way, I don't really like > > the fact that for distributed entry iteration you need to be able to > > keep all keys from one segment at one moment in memory. But fine - > > distributed entry iteration is probably not the right way. 
> > I agree it is annoying to have to keep the keys, but it is one of the > few ways to reliably get all the values without losing one. Actually > this approach provides a much closer approximation to what > ConcurrentHashMap provides for its size implementation, since it can't > drop a segment. It is pretty much required to do it this way to do > keySet, entrySet, and values where you don't have the luxury of > dropping whole swaths of entries like you do with calling size() > method (even if the value(s) was there the entire time). > If we decide to improve size(), I'd vote to use distributed entry iterators. We may be able to avoid sending all the keys to the originator when the cache doesn't have any stores. But with a store it looks like we can't avoid reading all the keys from the store, so skipping the transfer of the keys wouldn't help that much. > > > > >> > >> > >>> And what about nodes with different configuration? > >> Hard to know without knowing what the differences are. > > > > I had in my mind different loaders and passivation configuration (e.g. > > some node could use shared store and some don't - do we want to handle > > such obscure configs? Can we design that without the need to have > > complicated decision trees what to include and what not?). > > Well the last sentence means we have to use MR or Entry Iterator since > we can't call size on the shared loader. I would think that it should > still work irrespective of the loader configuration (except for MR > with multiple loaders). The main issue I can think of is that if > everyone isn't using the shared loader that you could have stale > values in the loader if you don't always have a node using the shared > loader up (assuming purge at startup isn't enabled). > We really shouldn't support different store/loader configurations on each node, except for minor stuff like paths. > > > > > Radim > > > >> > >>> Radim > >>> > >>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote: > >>>> On 6 October 2014 12:44, Tristan Tarrant wrote: > >>>>> I think we should provide correct implementations of size() (and > others) > >>>>> and provide shortcut implementations using our usual Flag API (e.g. > >>>>> SKIP_REMOTE_LOOKUP). > >>>> Right that would be very nice. Same for CacheStore interaction: all > >>>> cachestores should be included unless skipped explicitly. > >>>> > >>>> Sanne > >>>> > >>>>> Tristan > >>>>> > >>>>> On 06/10/14 12:57, Sanne Grinovero wrote: > >>>>>> On 3 October 2014 18:38, Dennis Reed wrote: > >>>>>>> Since size() is defined by the ConcurrentMap interface, it already > has a > >>>>>>> precisely defined meaning. The only "correct" implementation is E. > >>>>>> +1 > >> This is one of the things I have been wanting to do is actually > >> implement the other Map methods across the entire cache. However to > >> do a lot of these in a memory conscious way they would need to be ran > >> ignoring any ongoing transactions. Actually having this requirement > >> allows these methods to be implemented quite easily especially in > >> conjunction with the EntryIterator. I almost made a PR for it a while > >> back, but it seemed a little zealous to do at the same time and it > >> didn't seem that people were pushing for it very hard (maybe that was > >> a wrong assumption). Also I wasn't quite sure the transactional part > >> not being functional anymore would be a deterrent. > >> > >>>>>>> The current non-correct implementation was just because it's > expensive > >>>>>>> to calculate correctly. 
I'm not sure the current impl is really > that > >>>>>>> useful for anything. > >>>>>> +1 > >>>>>> > >>>>>> And not just size() but many others from ConcurrentMap. > >>>>>> The question is if we should drop the interface and all the methods > >>>>>> which aren't efficiently implementable, or fix all those methods. > >>>>>> > >>>>>> In the past I loved that I could inject "Infinispan superpowers" > into > >>>>>> an application making extensive use of Map and ConcurrentMap without > >>>>>> changes, but that has been deceiving and required great care such as > >>>>>> verifying that these features would not be used anywhere in the > code. > >>>>>> I ended up wrapping the Cache implementation in a custom adapter > which > >>>>>> would also implement ConcurrentMap but would throw a > RuntimeException > >>>>>> if any of the "unallowed" methods was called, at least I would > detect > >>>>>> violations safely. > >>>>>> > >>>>>> I still think that for the time being - until a better solution is > >>>>>> planned - we should throw exceptions.. alas that's an old > conversation > >>>>>> and it was never done. > >>>>>> > >>>>>> Sanne > >>>>>> > >>>>>> > >>>>>>> -Dennis > >>>>>>> > >>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> recently we had a discussion about what size() returns, but I've > >>>>>>>> realized there are more things that users would like to know. My > >>>>>>>> question is whether you think that they would really appreciate > it, or > >>>>>>>> whether it's just my QA point of view where I sometimes compute > the > >>>>>>>> 'checksums' of cache to see if I didn't lost anything. > >>>>>>>> > >>>>>>>> There are those sizes: > >>>>>>>> A) number of owned entries > >>>>>>>> B) number of entries stored locally in memory > >>>>>>>> C) number of entries stored in each local cache store > >>>>>>>> D) number of entries stored in each shared cache store > >>>>>>>> E) total number of entries in cache > >>>>>>>> > >>>>>>>> So far, we can get > >>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() > >>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() > >>>>>>>> E via distributed iterators / MR > >>>>>>>> A via data container iteration + distribution manager query, but > only > >>>>>>>> without cache store > >>>>>>>> C or D through > >>>>>>>> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >>>>>>>> > >>>>>>>> I think that it would go along with users' expectations if size() > >>>>>>>> returned E and for the rest we should have special methods on > >>>>>>>> AdvancedCache. That would of course change the meaning of size(), > but > >>>>>>>> I'd say that finally to something that has firm meaning. > >>>>>>>> > >>>>>>>> WDYT? 
> >>>>>>>> > >>>>>>>> Radim > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> infinispan-dev mailing list > >>>>>>> infinispan-dev at lists.jboss.org > >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>>> _______________________________________________ > >>>>>> infinispan-dev mailing list > >>>>>> infinispan-dev at lists.jboss.org > >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>>> > >>>>>> > >>>>> _______________________________________________ > >>>>> infinispan-dev mailing list > >>>>> infinispan-dev at lists.jboss.org > >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> infinispan-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> > >>> -- > >>> Radim Vansa > >>> JBoss DataGrid QA > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Radim Vansa > > JBoss DataGrid QA > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141008/0665c484/attachment-0001.html From mmarkus at redhat.com Wed Oct 8 10:03:13 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 8 Oct 2014 15:03:13 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <542E5E92.7060504@redhat.com> References: <542E5E92.7060504@redhat.com> Message-ID: <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> On Oct 3, 2014, at 9:30, Radim Vansa wrote: > Hi, > > recently we had a discussion about what size() returns, but I've > realized there are more things that users would like to know. My > question is whether you think that they would really appreciate it, or > whether it's just my QA point of view where I sometimes compute the > 'checksums' of cache to see if I didn't lost anything. > > There are those sizes: > A) number of owned entries > B) number of entries stored locally in memory > C) number of entries stored in each local cache store > D) number of entries stored in each shared cache store > E) total number of entries in cache > > So far, we can get > B via withFlags(SKIP_CACHE_LOAD).size() > (passivation ? B : 0) + firstNonZero(C, D) via size() > E via distributed iterators / MR > A via data container iteration + distribution manager query, but only > without cache store > C or D through > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > > I think that it would go along with users' expectations if size() > returned E and for the rest we should have special methods on > AdvancedCache. That would of course change the meaning of size(), but > I'd say that finally to something that has firm meaning. > > WDYT? 
There were a lot of arguments in the past about whether size() and the
other methods that operate over all the elements (keySet, values) are
useful, because:
- they are approximate (data changes during iteration)
- they are very resource consuming and might be misused (this is the
reason we chose to give size() its current local semantic)

These methods (size, keys, values) are useful for people and I think we
were not wise to implement them only on top of the local data: this is
like preferring efficiency over correctness. This also created a lot of
confusion with our users, with questions like "why doesn't size()
return the correct value?" being asked regularly. I totally agree that
size() should return E (i.e. everything that is stored within the grid,
including persistence) and that its performance implications should be
documented accordingly. For keySet and values we should stop
implementing them (throw an exception) and point users to Will's
distributed iterator, which is a nicer way to achieve the desired
behavior.

>
> Radim
>
> --
> Radim Vansa
> JBoss DataGrid QA
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com  Wed Oct 8 10:09:26 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 8 Oct 2014 15:09:26 +0100
Subject: [infinispan-dev] About size()
In-Reply-To: <542EDF2A.7080807@redhat.com>
References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com>
Message-ID: 

On Oct 3, 2014, at 18:38, Dennis Reed wrote:

> Since size() is defined by the ConcurrentMap interface, it already has a
> precisely defined meaning. The only "correct" implementation is E.
>
> The current non-correct implementation was just because it's expensive
> to calculate correctly. I'm not sure the current impl is really that
> useful for anything.

+1

>
> -Dennis
>
> On 10/03/2014 03:30 AM, Radim Vansa wrote:
>> Hi,
>>
>> recently we had a discussion about what size() returns, but I've
>> realized there are more things that users would like to know. My
>> question is whether you think that they would really appreciate it, or
>> whether it's just my QA point of view where I sometimes compute the
>> 'checksums' of cache to see if I didn't lost anything.
>>
>> There are those sizes:
>> A) number of owned entries
>> B) number of entries stored locally in memory
>> C) number of entries stored in each local cache store
>> D) number of entries stored in each shared cache store
>> E) total number of entries in cache
>>
>> So far, we can get
>> B via withFlags(SKIP_CACHE_LOAD).size()
>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>> E via distributed iterators / MR
>> A via data container iteration + distribution manager query, but only
>> without cache store
>> C or D through
>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>
>> I think that it would go along with users' expectations if size()
>> returned E and for the rest we should have special methods on
>> AdvancedCache. That would of course change the meaning of size(), but
>> I'd say that finally to something that has firm meaning.
>>
>> WDYT?
>> >> Radim >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Oct 8 10:11:55 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 8 Oct 2014 15:11:55 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <543280A8.5040109@redhat.com> References: <542E5E92.7060504@redhat.com> <542EDF2A.7080807@redhat.com> <543280A8.5040109@redhat.com> Message-ID: On Oct 6, 2014, at 12:44, Tristan Tarrant wrote: > I think we should provide correct implementations of size() (and others) > and provide shortcut implementations using our usual Flag API (e.g. > SKIP_REMOTE_LOOKUP). for keySet and values, Will's distributed iteration is a way nicer way of doing it, as it only fetches the data iteratively. Better to throw an exception and point user to the distributed iterator. > > Tristan > > On 06/10/14 12:57, Sanne Grinovero wrote: >> On 3 October 2014 18:38, Dennis Reed wrote: >>> Since size() is defined by the ConcurrentMap interface, it already has a >>> precisely defined meaning. The only "correct" implementation is E. >> +1 >> >>> The current non-correct implementation was just because it's expensive >>> to calculate correctly. I'm not sure the current impl is really that >>> useful for anything. >> +1 >> >> And not just size() but many others from ConcurrentMap. >> The question is if we should drop the interface and all the methods >> which aren't efficiently implementable, or fix all those methods. >> >> In the past I loved that I could inject "Infinispan superpowers" into >> an application making extensive use of Map and ConcurrentMap without >> changes, but that has been deceiving and required great care such as >> verifying that these features would not be used anywhere in the code. >> I ended up wrapping the Cache implementation in a custom adapter which >> would also implement ConcurrentMap but would throw a RuntimeException >> if any of the "unallowed" methods was called, at least I would detect >> violations safely. >> >> I still think that for the time being - until a better solution is >> planned - we should throw exceptions.. alas that's an old conversation >> and it was never done. >> >> Sanne >> >> >>> -Dennis >>> >>> On 10/03/2014 03:30 AM, Radim Vansa wrote: >>>> Hi, >>>> >>>> recently we had a discussion about what size() returns, but I've >>>> realized there are more things that users would like to know. My >>>> question is whether you think that they would really appreciate it, or >>>> whether it's just my QA point of view where I sometimes compute the >>>> 'checksums' of cache to see if I didn't lost anything. >>>> >>>> There are those sizes: >>>> A) number of owned entries >>>> B) number of entries stored locally in memory >>>> C) number of entries stored in each local cache store >>>> D) number of entries stored in each shared cache store >>>> E) total number of entries in cache >>>> >>>> So far, we can get >>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>> (passivation ? 
B : 0) + firstNonZero(C, D) via size() >>>> E via distributed iterators / MR >>>> A via data container iteration + distribution manager query, but only >>>> without cache store >>>> C or D through >>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>> >>>> I think that it would go along with users' expectations if size() >>>> returned E and for the rest we should have special methods on >>>> AdvancedCache. That would of course change the meaning of size(), but >>>> I'd say that finally to something that has firm meaning. >>>> >>>> WDYT? >>>> >>>> Radim >>>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From dan.berindei at gmail.com Wed Oct 8 10:11:59 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 8 Oct 2014 17:11:59 +0300 Subject: [infinispan-dev] About size() In-Reply-To: <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: > On Oct 3, 2014, at 9:30, Radim Vansa wrote: > > > Hi, > > > > recently we had a discussion about what size() returns, but I've > > realized there are more things that users would like to know. My > > question is whether you think that they would really appreciate it, or > > whether it's just my QA point of view where I sometimes compute the > > 'checksums' of cache to see if I didn't lost anything. > > > > There are those sizes: > > A) number of owned entries > > B) number of entries stored locally in memory > > C) number of entries stored in each local cache store > > D) number of entries stored in each shared cache store > > E) total number of entries in cache > > > > So far, we can get > > B via withFlags(SKIP_CACHE_LOAD).size() > > (passivation ? B : 0) + firstNonZero(C, D) via size() > > E via distributed iterators / MR > > A via data container iteration + distribution manager query, but only > > without cache store > > C or D through > > > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > > > > I think that it would go along with users' expectations if size() > > returned E and for the rest we should have special methods on > > AdvancedCache. That would of course change the meaning of size(), but > > I'd say that finally to something that has firm meaning. > > > > WDYT? > > There was a lot of arguments in past whether size() and other methods that > operate over all the elements (keySet, values) are useful because: > - they are approximate (data changes during iteration) > - they are very resource consuming and might be miss-used (this is the > reason we chosen to use size() with its current local semantic) > > These methods (size, keys, values) are useful for people and I think we > were not wise to implement them only on top of the local data: this is like > preferring efficiency over correctness. 
This also created a lot of > confusion with our users, question like size() doesn't return the correct > value being asked regularly. I totally agree that size() returns E (i.e. > everything that is stored within the grid, including persistence) and it's > performance implications to be documented accordingly. For keySet and > values - we should stop implementing them (throw exception) and point users > to Will's distributed iterator which is a nicer way to achieve the desired > behavior. > We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. > > > > > Radim > > > > -- > > Radim Vansa > > JBoss DataGrid QA > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141008/72903689/attachment.html From mmarkus at redhat.com Wed Oct 8 10:13:55 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 8 Oct 2014 15:13:55 +0100 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> Message-ID: <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> On Oct 8, 2014, at 15:11, Dan Berindei wrote: > > On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: > On Oct 3, 2014, at 9:30, Radim Vansa wrote: > > > Hi, > > > > recently we had a discussion about what size() returns, but I've > > realized there are more things that users would like to know. My > > question is whether you think that they would really appreciate it, or > > whether it's just my QA point of view where I sometimes compute the > > 'checksums' of cache to see if I didn't lost anything. > > > > There are those sizes: > > A) number of owned entries > > B) number of entries stored locally in memory > > C) number of entries stored in each local cache store > > D) number of entries stored in each shared cache store > > E) total number of entries in cache > > > > So far, we can get > > B via withFlags(SKIP_CACHE_LOAD).size() > > (passivation ? B : 0) + firstNonZero(C, D) via size() > > E via distributed iterators / MR > > A via data container iteration + distribution manager query, but only > > without cache store > > C or D through > > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > > > > I think that it would go along with users' expectations if size() > > returned E and for the rest we should have special methods on > > AdvancedCache. That would of course change the meaning of size(), but > > I'd say that finally to something that has firm meaning. > > > > WDYT? 
> > There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: > - they are approximate (data changes during iteration) > - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) > > These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. > > We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. Yes, that's what I meant as well. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mudokonman at gmail.com Wed Oct 8 10:42:16 2014 From: mudokonman at gmail.com (William Burns) Date: Wed, 8 Oct 2014 10:42:16 -0400 Subject: [infinispan-dev] About size() In-Reply-To: <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: So it seems we would want to change this for 7.0 if possible since it would be a bigger change for something like 7.1 and 8.0 would be even further out. I should be able to put this together for CR2. It seems that we want to implement keySet, values and entrySet methods using the entry iterator approach. It is however unclear for the size method if we want to use MR entry counting and not worry about the rehash and passivation issues since it is just an estimation anyways. Or if we want to also use the entry iterator which should be closer approximation but will require more network overhead and memory usage. Also we didn't really talk about the fact that these methods would ignore ongoing transactions and if that is a concern or not. - Will On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: > > On Oct 8, 2014, at 15:11, Dan Berindei wrote: > >> >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >> >> > Hi, >> > >> > recently we had a discussion about what size() returns, but I've >> > realized there are more things that users would like to know. My >> > question is whether you think that they would really appreciate it, or >> > whether it's just my QA point of view where I sometimes compute the >> > 'checksums' of cache to see if I didn't lost anything. >> > >> > There are those sizes: >> > A) number of owned entries >> > B) number of entries stored locally in memory >> > C) number of entries stored in each local cache store >> > D) number of entries stored in each shared cache store >> > E) total number of entries in cache >> > >> > So far, we can get >> > B via withFlags(SKIP_CACHE_LOAD).size() >> > (passivation ? 
B : 0) + firstNonZero(C, D) via size() >> > E via distributed iterators / MR >> > A via data container iteration + distribution manager query, but only >> > without cache store >> > C or D through >> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >> > >> > I think that it would go along with users' expectations if size() >> > returned E and for the rest we should have special methods on >> > AdvancedCache. That would of course change the meaning of size(), but >> > I'd say that finally to something that has firm meaning. >> > >> > WDYT? >> >> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >> - they are approximate (data changes during iteration) >> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >> >> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >> >> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. > > Yes, that's what I meant as well. > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Wed Oct 8 10:57:58 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 8 Oct 2014 17:57:58 +0300 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 5:42 PM, William Burns wrote: > So it seems we would want to change this for 7.0 if possible since it > would be a bigger change for something like 7.1 and 8.0 would be even > further out. I should be able to put this together for CR2. > I'm not 100% convinced that we need it for 7.x. For 8.0 I would recommend removing the size() method altogether, and providing some looser "statistics" instead. > > It seems that we want to implement keySet, values and entrySet methods > using the entry iterator approach. > > It is however unclear for the size method if we want to use MR entry > counting and not worry about the rehash and passivation issues since > it is just an estimation anyways. Or if we want to also use the entry > iterator which should be closer approximation but will require more > network overhead and memory usage. > +1 to use the entry iterator from me, ignoring state transfer we can get some pretty wild fluctuations in the size of the cache. We could use a distributed task for Cache.isEmpty() instead of size() == 0, though. 
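Something along these lines, say (just a sketch with illustrative names;
it only looks at the in-memory data containers, so a fuller version
would also have to consult the stores):

import java.io.Serializable;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Future;

import org.infinispan.Cache;
import org.infinispan.distexec.DefaultExecutorService;
import org.infinispan.distexec.DistributedCallable;
import org.infinispan.distexec.DistributedExecutorService;

public class DistributedIsEmpty {

   // Runs on every node and reports whether that node holds any entry in
   // memory. Entries living only in a cache store are not seen here.
   static class LocallyEmpty<K, V>
         implements DistributedCallable<K, V, Boolean>, Serializable {

      private transient Cache<K, V> cache;

      @Override
      public void setEnvironment(Cache<K, V> cache, Set<K> inputKeys) {
         this.cache = cache;
      }

      @Override
      public Boolean call() {
         return cache.getAdvancedCache().getDataContainer().size() == 0;
      }
   }

   public static <K, V> boolean isEmpty(Cache<K, V> cache) throws Exception {
      DistributedExecutorService des = new DefaultExecutorService(cache);
      try {
         List<Future<Boolean>> results =
               des.submitEverywhere(new LocallyEmpty<K, V>());
         for (Future<Boolean> f : results) {
            if (!f.get()) {
               return false; // some node still holds at least one entry
            }
         }
         return true;
      } finally {
         des.shutdown();
      }
   }
}

Unlike a full count, this can bail out as soon as some node reports an
entry, and it doesn't care about numOwners at all.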
> > Also we didn't really talk about the fact that these methods would > ignore ongoing transactions and if that is a concern or not. > > It might be a concern for the Hibernate 2LC impl, it was their TCK that prompted the last round of discussions about clear(). We haven't talked about what size(), keySet() and values() should return for an invalidation cache either... I forget, does the distributed entry iterator work with invalidation caches? > - Will > > On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: > > > > On Oct 8, 2014, at 15:11, Dan Berindei wrote: > > > >> > >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus > wrote: > >> On Oct 3, 2014, at 9:30, Radim Vansa wrote: > >> > >> > Hi, > >> > > >> > recently we had a discussion about what size() returns, but I've > >> > realized there are more things that users would like to know. My > >> > question is whether you think that they would really appreciate it, or > >> > whether it's just my QA point of view where I sometimes compute the > >> > 'checksums' of cache to see if I didn't lost anything. > >> > > >> > There are those sizes: > >> > A) number of owned entries > >> > B) number of entries stored locally in memory > >> > C) number of entries stored in each local cache store > >> > D) number of entries stored in each shared cache store > >> > E) total number of entries in cache > >> > > >> > So far, we can get > >> > B via withFlags(SKIP_CACHE_LOAD).size() > >> > (passivation ? B : 0) + firstNonZero(C, D) via size() > >> > E via distributed iterators / MR > >> > A via data container iteration + distribution manager query, but only > >> > without cache store > >> > C or D through > >> > > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >> > > >> > I think that it would go along with users' expectations if size() > >> > returned E and for the rest we should have special methods on > >> > AdvancedCache. That would of course change the meaning of size(), but > >> > I'd say that finally to something that has firm meaning. > >> > > >> > WDYT? > >> > >> There was a lot of arguments in past whether size() and other methods > that operate over all the elements (keySet, values) are useful because: > >> - they are approximate (data changes during iteration) > >> - they are very resource consuming and might be miss-used (this is the > reason we chosen to use size() with its current local semantic) > >> > >> These methods (size, keys, values) are useful for people and I think we > were not wise to implement them only on top of the local data: this is like > preferring efficiency over correctness. This also created a lot of > confusion with our users, question like size() doesn't return the correct > value being asked regularly. I totally agree that size() returns E (i.e. > everything that is stored within the grid, including persistence) and it's > performance implications to be documented accordingly. For keySet and > values - we should stop implementing them (throw exception) and point users > to Will's distributed iterator which is a nicer way to achieve the desired > behavior. > >> > >> We can also implement keySet() and values() on top of the distributed > entry iterator and document that using the iterator directly is better. > > > > Yes, that's what I meant as well. 
> > > > Cheers, > > -- > > Mircea Markus > > Infinispan lead (www.infinispan.org) > > > > > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141008/870e35b8/attachment.html From mudokonman at gmail.com Wed Oct 8 11:14:12 2014 From: mudokonman at gmail.com (William Burns) Date: Wed, 8 Oct 2014 11:14:12 -0400 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 10:57 AM, Dan Berindei wrote: > > > On Wed, Oct 8, 2014 at 5:42 PM, William Burns wrote: >> >> So it seems we would want to change this for 7.0 if possible since it >> would be a bigger change for something like 7.1 and 8.0 would be even >> further out. I should be able to put this together for CR2. > > > I'm not 100% convinced that we need it for 7.x. For 8.0 I would recommend > removing the size() method altogether, and providing some looser > "statistics" instead. Yeah I guess I don't know enough about the demand for these methods or what people wanted to use them for to know what kind of priority they should be given. It sounds like you are talking about decoupling from the Map/ConcurrentMap interface completely then, right? So we would also eliminate the other bulk methods (keySet, values, entrySet)? > >> >> >> It seems that we want to implement keySet, values and entrySet methods >> using the entry iterator approach. >> >> It is however unclear for the size method if we want to use MR entry >> counting and not worry about the rehash and passivation issues since >> it is just an estimation anyways. Or if we want to also use the entry >> iterator which should be closer approximation but will require more >> network overhead and memory usage. > > > +1 to use the entry iterator from me, ignoring state transfer we can get > some pretty wild fluctuations in the size of the cache. That is personally my feeling as well, but I tend to err more on the side of correctness to begin with. > We could use a distributed task for Cache.isEmpty() instead of size() == 0, > though. Yes that should be a good optimization either way. > >> >> >> Also we didn't really talk about the fact that these methods would >> ignore ongoing transactions and if that is a concern or not. >> > > It might be a concern for the Hibernate 2LC impl, it was their TCK that > prompted the last round of discussions about clear(). Although I wonder how much these methods are even used since they only work for Local, Replication or Invalidation caches in their current state (and didn't even use loaders until 6.0). > > We haven't talked about what size(), keySet() and values() should return for > an invalidation cache either... I forget, does the distributed entry > iterator work with invalidation caches? It works the same as a local cache so only the local node contents are returned. Replicated does the same thing, distributed is the only special case. 
This was the only thing that made sense to me, but if you have any ideas that would be great to hear for possibly enhancing Invalidation iteration. > > >> >> - Will >> >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >> > >> > On Oct 8, 2014, at 15:11, Dan Berindei wrote: >> > >> >> >> >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus >> >> wrote: >> >> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >> >> >> >> > Hi, >> >> > >> >> > recently we had a discussion about what size() returns, but I've >> >> > realized there are more things that users would like to know. My >> >> > question is whether you think that they would really appreciate it, >> >> > or >> >> > whether it's just my QA point of view where I sometimes compute the >> >> > 'checksums' of cache to see if I didn't lost anything. >> >> > >> >> > There are those sizes: >> >> > A) number of owned entries >> >> > B) number of entries stored locally in memory >> >> > C) number of entries stored in each local cache store >> >> > D) number of entries stored in each shared cache store >> >> > E) total number of entries in cache >> >> > >> >> > So far, we can get >> >> > B via withFlags(SKIP_CACHE_LOAD).size() >> >> > (passivation ? B : 0) + firstNonZero(C, D) via size() >> >> > E via distributed iterators / MR >> >> > A via data container iteration + distribution manager query, but only >> >> > without cache store >> >> > C or D through >> >> > >> >> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >> >> > >> >> > I think that it would go along with users' expectations if size() >> >> > returned E and for the rest we should have special methods on >> >> > AdvancedCache. That would of course change the meaning of size(), but >> >> > I'd say that finally to something that has firm meaning. >> >> > >> >> > WDYT? >> >> >> >> There was a lot of arguments in past whether size() and other methods >> >> that operate over all the elements (keySet, values) are useful because: >> >> - they are approximate (data changes during iteration) >> >> - they are very resource consuming and might be miss-used (this is the >> >> reason we chosen to use size() with its current local semantic) >> >> >> >> These methods (size, keys, values) are useful for people and I think we >> >> were not wise to implement them only on top of the local data: this is like >> >> preferring efficiency over correctness. This also created a lot of confusion >> >> with our users, question like size() doesn't return the correct value being >> >> asked regularly. I totally agree that size() returns E (i.e. everything that >> >> is stored within the grid, including persistence) and it's performance >> >> implications to be documented accordingly. For keySet and values - we should >> >> stop implementing them (throw exception) and point users to Will's >> >> distributed iterator which is a nicer way to achieve the desired behavior. >> >> >> >> We can also implement keySet() and values() on top of the distributed >> >> entry iterator and document that using the iterator directly is better. >> > >> > Yes, that's what I meant as well. 
>> > >> > Cheers,
>> > -- 
>> > Mircea Markus
>> > Infinispan lead (www.infinispan.org)
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > infinispan-dev mailing list
>> > infinispan-dev at lists.jboss.org
>> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From rvansa at redhat.com  Wed Oct 8 11:19:42 2014
From: rvansa at redhat.com (Radim Vansa)
Date: Wed, 08 Oct 2014 17:19:42 +0200
Subject: [infinispan-dev] About size()
In-Reply-To: 
References: <542E5E92.7060504@redhat.com>
	<3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com>
	<593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com>
Message-ID: <5435560E.2030206@redhat.com>

Users expect that size() will be constant-time (or linear to cluster
size), and a generally fast operation. I'd prefer to keep it that way.
Though, even the MR way (used for HotRod size() now) needs to crawl
through all the entries locally.

'Heretic, not very well thought out and changing too many things' idea:
what about having the data container segment-aware? Then you'd just
bcast a SizeCommand with a given topologyId and sum up the sizes of
primary-owned segments... It's not a complete solution, but at least
that would enable us to get the number of locally owned entries quite
fast. Though, you can't do that easily with cache stores (without
changing the SPI).

Regarding cache stores, IMO we're damned anyway: when calling
cacheStore.size(), it can report more entries, as some haven't been
expired yet, or it can report fewer entries, as some can be expired due
to [1]. Or, we'll enumerate all the entries, and that's going to be
slow (btw., [1] reminded me that we should enumerate both the
datacontainer AND the cachestores even if passivation is not enabled).

Radim

[1] https://issues.jboss.org/browse/ISPN-3202

On 10/08/2014 04:42 PM, William Burns wrote:
> So it seems we would want to change this for 7.0 if possible since it
> would be a bigger change for something like 7.1 and 8.0 would be even
> further out.  I should be able to put this together for CR2.
>
> It seems that we want to implement keySet, values and entrySet methods
> using the entry iterator approach.
>
> It is however unclear for the size method if we want to use MR entry
> counting and not worry about the rehash and passivation issues since
> it is just an estimation anyways.  Or if we want to also use the entry
> iterator which should be closer approximation but will require more
> network overhead and memory usage.
>
> Also we didn't really talk about the fact that these methods would
> ignore ongoing transactions and if that is a concern or not.
>
> - Will
>
> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote:
>> On Oct 8, 2014, at 15:11, Dan Berindei wrote:
>>
>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote:
>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote:
>>>
>>>> Hi,
>>>>
>>>> recently we had a discussion about what size() returns, but I've
>>>> realized there are more things that users would like to know. My
>>>> question is whether you think that they would really appreciate it, or
>>>> whether it's just my QA point of view where I sometimes compute the
>>>> 'checksums' of cache to see if I didn't lost anything.
>>>> >>>> There are those sizes: >>>> A) number of owned entries >>>> B) number of entries stored locally in memory >>>> C) number of entries stored in each local cache store >>>> D) number of entries stored in each shared cache store >>>> E) total number of entries in cache >>>> >>>> So far, we can get >>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>> E via distributed iterators / MR >>>> A via data container iteration + distribution manager query, but only >>>> without cache store >>>> C or D through >>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>> >>>> I think that it would go along with users' expectations if size() >>>> returned E and for the rest we should have special methods on >>>> AdvancedCache. That would of course change the meaning of size(), but >>>> I'd say that finally to something that has firm meaning. >>>> >>>> WDYT? >>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>> - they are approximate (data changes during iteration) >>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>> >>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >>> >>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >> Yes, that's what I meant as well. >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From dan.berindei at gmail.com Wed Oct 8 12:23:06 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 8 Oct 2014 19:23:06 +0300 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 6:14 PM, William Burns wrote: > On Wed, Oct 8, 2014 at 10:57 AM, Dan Berindei > wrote: > > > > > > On Wed, Oct 8, 2014 at 5:42 PM, William Burns > wrote: > >> > >> So it seems we would want to change this for 7.0 if possible since it > >> would be a bigger change for something like 7.1 and 8.0 would be even > >> further out. I should be able to put this together for CR2. > > > > > > I'm not 100% convinced that we need it for 7.x. For 8.0 I would recommend > > removing the size() method altogether, and providing some looser > > "statistics" instead. 
> > Yeah I guess I don't know enough about the demand for these methods or > what people wanted to use them for to know what kind of priority they > should be given. > > It sounds like you are talking about decoupling from the > Map/ConcurrentMap interface completely then, right? So we would also > eliminate the other bulk methods (keySet, values, entrySet)? > Yes, I would base the Cache interface on JSR-107's Cache, which doesn't have size() or the other methods. > > > > >> > >> > >> It seems that we want to implement keySet, values and entrySet methods > >> using the entry iterator approach. > >> > >> It is however unclear for the size method if we want to use MR entry > >> counting and not worry about the rehash and passivation issues since > >> it is just an estimation anyways. Or if we want to also use the entry > >> iterator which should be closer approximation but will require more > >> network overhead and memory usage. > > > > > > +1 to use the entry iterator from me, ignoring state transfer we can get > > some pretty wild fluctuations in the size of the cache. > > That is personally my feeling as well, but I tend to err more on the > side of correctness to begin with. > > > We could use a distributed task for Cache.isEmpty() instead of size() == > 0, > > though. > > Yes that should be a good optimization either way. > > > > >> > >> > >> Also we didn't really talk about the fact that these methods would > >> ignore ongoing transactions and if that is a concern or not. > >> > > > > It might be a concern for the Hibernate 2LC impl, it was their TCK that > > prompted the last round of discussions about clear(). > > Although I wonder how much these methods are even used since they only > work for Local, Replication or Invalidation caches in their current > state (and didn't even use loaders until 6.0). > There is some more information about the test in the mailing list discussion [1] There's also a JIRA for clear() [2] I think 2LC almost never uses distribution, so size() being local-only didn't matter, but making it non-tx could cause problems - at least for that particular test. [1] http://lists.jboss.org/pipermail/infinispan-dev/2013-October/013914.html [2] https://issues.jboss.org/browse/ISPN-3656 > > > > We haven't talked about what size(), keySet() and values() should return > for > > an invalidation cache either... I forget, does the distributed entry > > iterator work with invalidation caches? > > It works the same as a local cache so only the local node contents are > returned. Replicated does the same thing, distributed is the only > special case. This was the only thing that made sense to me, but if > you have any ideas that would be great to hear for possibly enhancing > Invalidation iteration. > Sounds good to me. cache.get(k) will search on all the nodes via ClusterLoader, so there is a certain appeal in making the entry iterator do the same. But invalidation caches are used with an external (non-CacheLoader) source of data anyway, so we can never return "all the entries". > > > > > >> > >> - Will > >> > >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus > wrote: > >> > > >> > On Oct 8, 2014, at 15:11, Dan Berindei > wrote: > >> > > >> >> > >> >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus > >> >> wrote: > >> >> On Oct 3, 2014, at 9:30, Radim Vansa wrote: > >> >> > >> >> > Hi, > >> >> > > >> >> > recently we had a discussion about what size() returns, but I've > >> >> > realized there are more things that users would like to know. 
My > >> >> > question is whether you think that they would really appreciate it, > >> >> > or > >> >> > whether it's just my QA point of view where I sometimes compute the > >> >> > 'checksums' of cache to see if I didn't lost anything. > >> >> > > >> >> > There are those sizes: > >> >> > A) number of owned entries > >> >> > B) number of entries stored locally in memory > >> >> > C) number of entries stored in each local cache store > >> >> > D) number of entries stored in each shared cache store > >> >> > E) total number of entries in cache > >> >> > > >> >> > So far, we can get > >> >> > B via withFlags(SKIP_CACHE_LOAD).size() > >> >> > (passivation ? B : 0) + firstNonZero(C, D) via size() > >> >> > E via distributed iterators / MR > >> >> > A via data container iteration + distribution manager query, but > only > >> >> > without cache store > >> >> > C or D through > >> >> > > >> >> > > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >> >> > > >> >> > I think that it would go along with users' expectations if size() > >> >> > returned E and for the rest we should have special methods on > >> >> > AdvancedCache. That would of course change the meaning of size(), > but > >> >> > I'd say that finally to something that has firm meaning. > >> >> > > >> >> > WDYT? > >> >> > >> >> There was a lot of arguments in past whether size() and other methods > >> >> that operate over all the elements (keySet, values) are useful > because: > >> >> - they are approximate (data changes during iteration) > >> >> - they are very resource consuming and might be miss-used (this is > the > >> >> reason we chosen to use size() with its current local semantic) > >> >> > >> >> These methods (size, keys, values) are useful for people and I think > we > >> >> were not wise to implement them only on top of the local data: this > is like > >> >> preferring efficiency over correctness. This also created a lot of > confusion > >> >> with our users, question like size() doesn't return the correct > value being > >> >> asked regularly. I totally agree that size() returns E (i.e. > everything that > >> >> is stored within the grid, including persistence) and it's > performance > >> >> implications to be documented accordingly. For keySet and values - > we should > >> >> stop implementing them (throw exception) and point users to Will's > >> >> distributed iterator which is a nicer way to achieve the desired > behavior. > >> >> > >> >> We can also implement keySet() and values() on top of the distributed > >> >> entry iterator and document that using the iterator directly is > better. > >> > > >> > Yes, that's what I meant as well. 
> >> > > >> > Cheers, > >> > -- > >> > Mircea Markus > >> > Infinispan lead (www.infinispan.org) > >> > > >> > > >> > > >> > > >> > > >> > _______________________________________________ > >> > infinispan-dev mailing list > >> > infinispan-dev at lists.jboss.org > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141008/e3158960/attachment.html From dan.berindei at gmail.com Wed Oct 8 12:41:42 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 8 Oct 2014 19:41:42 +0300 Subject: [infinispan-dev] About size() In-Reply-To: <5435560E.2030206@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 6:19 PM, Radim Vansa wrote: > Users expect that size() will be constant-time (or linear to cluster > size), and generally fast operation. I'd prefer to keep it that way. > Though, even the MR way (used for HotRod size() now) needs to crawl > through all the entries locally. > They might expect that, but there is nothing in the Map API suggesting it. > > 'Heretic, not very well though of and changing too many things' idea: > what about having data container segment-aware? Then you'd just bcast > SizeCommand with given topologyId and sum up sizes of primary-owned > segments... It's not a complete solution, but at least that would enable > to get the number of locally owned entries quite fast. Though, you can't > do that easily with cache stores (without changing SPI). > We could create a separate DataContainer for each segment. But would it really be worth the trouble? I don't know of anyone using size() for something other than checking that their data was properly loaded into the cache, and they don't need a super-fast size() for that. > > Regarding cache stores, IMO we're damned anyway: when calling > cacheStore.size(), it can report more entries as those haven't been > expired yet, it can report less entries as those can be expired due to > [1]. Or, we'll enumerate all the entries, and that's going to be slow > (btw., [1] reminded me that we should enumerate both datacontainer AND > cachestores even if passivation is not enabled). > Exactly, we need to iterate all the entries from the stores if we want something remotely accurate (although I had forgotten about expiration also being a problem). Otherwise we could just leave size() as it is now, it's pretty fast :) > > Radim > > [1] https://issues.jboss.org/browse/ISPN-3202 > > On 10/08/2014 04:42 PM, William Burns wrote: > > So it seems we would want to change this for 7.0 if possible since it > > would be a bigger change for something like 7.1 and 8.0 would be even > > further out. I should be able to put this together for CR2. 
> > > > It seems that we want to implement keySet, values and entrySet methods > > using the entry iterator approach. > > > > It is however unclear for the size method if we want to use MR entry > > counting and not worry about the rehash and passivation issues since > > it is just an estimation anyways. Or if we want to also use the entry > > iterator which should be closer approximation but will require more > > network overhead and memory usage. > > > > Also we didn't really talk about the fact that these methods would > > ignore ongoing transactions and if that is a concern or not. > > > > - Will > > > > On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus > wrote: > >> On Oct 8, 2014, at 15:11, Dan Berindei wrote: > >> > >>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus > wrote: > >>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: > >>> > >>>> Hi, > >>>> > >>>> recently we had a discussion about what size() returns, but I've > >>>> realized there are more things that users would like to know. My > >>>> question is whether you think that they would really appreciate it, or > >>>> whether it's just my QA point of view where I sometimes compute the > >>>> 'checksums' of cache to see if I didn't lost anything. > >>>> > >>>> There are those sizes: > >>>> A) number of owned entries > >>>> B) number of entries stored locally in memory > >>>> C) number of entries stored in each local cache store > >>>> D) number of entries stored in each shared cache store > >>>> E) total number of entries in cache > >>>> > >>>> So far, we can get > >>>> B via withFlags(SKIP_CACHE_LOAD).size() > >>>> (passivation ? B : 0) + firstNonZero(C, D) via size() > >>>> E via distributed iterators / MR > >>>> A via data container iteration + distribution manager query, but only > >>>> without cache store > >>>> C or D through > >>>> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >>>> > >>>> I think that it would go along with users' expectations if size() > >>>> returned E and for the rest we should have special methods on > >>>> AdvancedCache. That would of course change the meaning of size(), but > >>>> I'd say that finally to something that has firm meaning. > >>>> > >>>> WDYT? > >>> There was a lot of arguments in past whether size() and other methods > that operate over all the elements (keySet, values) are useful because: > >>> - they are approximate (data changes during iteration) > >>> - they are very resource consuming and might be miss-used (this is the > reason we chosen to use size() with its current local semantic) > >>> > >>> These methods (size, keys, values) are useful for people and I think > we were not wise to implement them only on top of the local data: this is > like preferring efficiency over correctness. This also created a lot of > confusion with our users, question like size() doesn't return the correct > value being asked regularly. I totally agree that size() returns E (i.e. > everything that is stored within the grid, including persistence) and it's > performance implications to be documented accordingly. For keySet and > values - we should stop implementing them (throw exception) and point users > to Will's distributed iterator which is a nicer way to achieve the desired > behavior. > >>> > >>> We can also implement keySet() and values() on top of the distributed > entry iterator and document that using the iterator directly is better. > >> Yes, that's what I meant as well. 
> >> > >> Cheers, > >> -- > >> Mircea Markus > >> Infinispan lead (www.infinispan.org) > >> > >> > >> > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141008/2e3cacec/attachment-0001.html From mudokonman at gmail.com Wed Oct 8 12:53:42 2014 From: mudokonman at gmail.com (William Burns) Date: Wed, 8 Oct 2014 12:53:42 -0400 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 12:41 PM, Dan Berindei wrote: > > > On Wed, Oct 8, 2014 at 6:19 PM, Radim Vansa wrote: >> >> Users expect that size() will be constant-time (or linear to cluster >> size), and generally fast operation. I'd prefer to keep it that way. >> Though, even the MR way (used for HotRod size() now) needs to crawl >> through all the entries locally. > > > They might expect that, but there is nothing in the Map API suggesting it. > >> >> >> 'Heretic, not very well though of and changing too many things' idea: >> what about having data container segment-aware? Then you'd just bcast >> SizeCommand with given topologyId and sum up sizes of primary-owned >> segments... It's not a complete solution, but at least that would enable >> to get the number of locally owned entries quite fast. Though, you can't >> do that easily with cache stores (without changing SPI). > > > We could create a separate DataContainer for each segment. But would it > really be worth the trouble? I don't know of anyone using size() for > something other than checking that their data was properly loaded into the > cache, and they don't need a super-fast size() for that. Having a DataContainer per segment would actually reduce required memory usage for the distributed iterator as well, since we can query data by segment much more efficiently and close out segments one by one per node instead of having to keep multiple open at once. When I asked about this before it was kind of a we can deal with it later kind thing. I would think this would increase ST operation time as well. > >> >> >> Regarding cache stores, IMO we're damned anyway: when calling >> cacheStore.size(), it can report more entries as those haven't been >> expired yet, it can report less entries as those can be expired due to >> [1]. Or, we'll enumerate all the entries, and that's going to be slow >> (btw., [1] reminded me that we should enumerate both datacontainer AND >> cachestores even if passivation is not enabled). > > > Exactly, we need to iterate all the entries from the stores if we want > something remotely accurate (although I had forgotten about expiration also > being a problem). 
Otherwise we could just leave size() as it is now, it's > pretty fast :) > >> >> >> Radim >> >> [1] https://issues.jboss.org/browse/ISPN-3202 >> >> On 10/08/2014 04:42 PM, William Burns wrote: >> > So it seems we would want to change this for 7.0 if possible since it >> > would be a bigger change for something like 7.1 and 8.0 would be even >> > further out. I should be able to put this together for CR2. >> > >> > It seems that we want to implement keySet, values and entrySet methods >> > using the entry iterator approach. >> > >> > It is however unclear for the size method if we want to use MR entry >> > counting and not worry about the rehash and passivation issues since >> > it is just an estimation anyways. Or if we want to also use the entry >> > iterator which should be closer approximation but will require more >> > network overhead and memory usage. >> > >> > Also we didn't really talk about the fact that these methods would >> > ignore ongoing transactions and if that is a concern or not. >> > >> > - Will >> > >> > On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus >> > wrote: >> >> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >> >> >> >>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus >> >>> wrote: >> >>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >> >>> >> >>>> Hi, >> >>>> >> >>>> recently we had a discussion about what size() returns, but I've >> >>>> realized there are more things that users would like to know. My >> >>>> question is whether you think that they would really appreciate it, >> >>>> or >> >>>> whether it's just my QA point of view where I sometimes compute the >> >>>> 'checksums' of cache to see if I didn't lost anything. >> >>>> >> >>>> There are those sizes: >> >>>> A) number of owned entries >> >>>> B) number of entries stored locally in memory >> >>>> C) number of entries stored in each local cache store >> >>>> D) number of entries stored in each shared cache store >> >>>> E) total number of entries in cache >> >>>> >> >>>> So far, we can get >> >>>> B via withFlags(SKIP_CACHE_LOAD).size() >> >>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >> >>>> E via distributed iterators / MR >> >>>> A via data container iteration + distribution manager query, but only >> >>>> without cache store >> >>>> C or D through >> >>>> >> >>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >> >>>> >> >>>> I think that it would go along with users' expectations if size() >> >>>> returned E and for the rest we should have special methods on >> >>>> AdvancedCache. That would of course change the meaning of size(), but >> >>>> I'd say that finally to something that has firm meaning. >> >>>> >> >>>> WDYT? >> >>> There was a lot of arguments in past whether size() and other methods >> >>> that operate over all the elements (keySet, values) are useful because: >> >>> - they are approximate (data changes during iteration) >> >>> - they are very resource consuming and might be miss-used (this is the >> >>> reason we chosen to use size() with its current local semantic) >> >>> >> >>> These methods (size, keys, values) are useful for people and I think >> >>> we were not wise to implement them only on top of the local data: this is >> >>> like preferring efficiency over correctness. This also created a lot of >> >>> confusion with our users, question like size() doesn't return the correct >> >>> value being asked regularly. I totally agree that size() returns E (i.e. 
>> >>> everything that is stored within the grid, including persistence) and it's >> >>> performance implications to be documented accordingly. For keySet and values >> >>> - we should stop implementing them (throw exception) and point users to >> >>> Will's distributed iterator which is a nicer way to achieve the desired >> >>> behavior. >> >>> >> >>> We can also implement keySet() and values() on top of the distributed >> >>> entry iterator and document that using the iterator directly is better. >> >> Yes, that's what I meant as well. >> >> >> >> Cheers, >> >> -- >> >> Mircea Markus >> >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Thu Oct 9 02:48:38 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Thu, 9 Oct 2014 09:48:38 +0300 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 7:53 PM, William Burns wrote: > On Wed, Oct 8, 2014 at 12:41 PM, Dan Berindei > wrote: > > > > > > On Wed, Oct 8, 2014 at 6:19 PM, Radim Vansa wrote: > >> > >> Users expect that size() will be constant-time (or linear to cluster > >> size), and generally fast operation. I'd prefer to keep it that way. > >> Though, even the MR way (used for HotRod size() now) needs to crawl > >> through all the entries locally. > > > > > > They might expect that, but there is nothing in the Map API suggesting > it. > > > >> > >> > >> 'Heretic, not very well though of and changing too many things' idea: > >> what about having data container segment-aware? Then you'd just bcast > >> SizeCommand with given topologyId and sum up sizes of primary-owned > >> segments... It's not a complete solution, but at least that would enable > >> to get the number of locally owned entries quite fast. Though, you can't > >> do that easily with cache stores (without changing SPI). > > > > > > We could create a separate DataContainer for each segment. But would it > > really be worth the trouble? I don't know of anyone using size() for > > something other than checking that their data was properly loaded into > the > > cache, and they don't need a super-fast size() for that. > > Having a DataContainer per segment would actually reduce required > memory usage for the distributed iterator as well, since we can query > data by segment much more efficiently and close out segments one by > one per node instead of having to keep multiple open at once. When I > asked about this before it was kind of a we can deal with it later > kind thing. I would think this would increase ST operation time as > well. 
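To make the "segment-aware data container" idea concrete, here is a minimal sketch. The class is hypothetical (it is not the actual DataContainer SPI), and the key-to-segment mapping below is a stand-in for what the real ConsistentHash would provide.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SegmentedDataContainer<K, V> {

   private final List<ConcurrentMap<K, V>> segments;

   SegmentedDataContainer(int numSegments) {
      segments = new ArrayList<ConcurrentMap<K, V>>(numSegments);
      for (int i = 0; i < numSegments; i++) {
         segments.add(new ConcurrentHashMap<K, V>());
      }
   }

   // stand-in for the real ConsistentHash key-to-segment mapping
   private int segmentOf(Object key) {
      return (key.hashCode() & Integer.MAX_VALUE) % segments.size();
   }

   public V put(K key, V value) { return segments.get(segmentOf(key)).put(key, value); }

   public V get(Object key) { return segments.get(segmentOf(key)).get(key); }

   // per-segment size and iteration become cheap: no need to hash every key
   public int size(int segment) { return segments.get(segment).size(); }

   public Iterator<Map.Entry<K, V>> iterator(int segment) {
      return segments.get(segment).entrySet().iterator();
   }
}

A SizeCommand could then sum size(s) over the primary-owned segments, and state transfer or the distributed iterator could stream one segment at a time instead of keeping several open at once.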
>

You mean it would improve ST performance, because it wouldn't have to compute the hash of each key in the data container? I don't think we have ever considered splitting the data container for ST, as it didn't seem worth the trouble.

OTOH we wanted to add a segment-based query to the cache loader SPI ever since we started designing NBST :)

>
> >>
> >> Regarding cache stores, IMO we're damned anyway: when calling
> >> cacheStore.size(), it can report more entries as those haven't been
> >> expired yet, it can report less entries as those can be expired due to
> >> [1]. Or, we'll enumerate all the entries, and that's going to be slow
> >> (btw., [1] reminded me that we should enumerate both datacontainer AND
> >> cachestores even if passivation is not enabled).
> >
> > Exactly, we need to iterate all the entries from the stores if we want
> > something remotely accurate (although I had forgotten about expiration also
> > being a problem). Otherwise we could just leave size() as it is now, it's
> > pretty fast :)
> >
> >>
> >> Radim
> >>
> >> [1] https://issues.jboss.org/browse/ISPN-3202
> >>
> >> On 10/08/2014 04:42 PM, William Burns wrote:
> >> > So it seems we would want to change this for 7.0 if possible since it
> >> > would be a bigger change for something like 7.1 and 8.0 would be even
> >> > further out. I should be able to put this together for CR2.
> >> >
> >> > It seems that we want to implement keySet, values and entrySet methods
> >> > using the entry iterator approach.
> >> >
> >> > It is however unclear for the size method if we want to use MR entry
> >> > counting and not worry about the rehash and passivation issues since
> >> > it is just an estimation anyways. Or if we want to also use the entry
> >> > iterator which should be a closer approximation but will require more
> >> > network overhead and memory usage.
> >> >
> >> > Also we didn't really talk about the fact that these methods would
> >> > ignore ongoing transactions and if that is a concern or not.
> >> >
> >> > - Will
> >> >
> >> > On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus
> >> > wrote:
> >> >> On Oct 8, 2014, at 15:11, Dan Berindei wrote:
> >> >>
> >> >>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus
> >> >>> wrote:
> >> >>> On Oct 3, 2014, at 9:30, Radim Vansa wrote:
> >> >>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> recently we had a discussion about what size() returns, but I've
> >> >>>> realized there are more things that users would like to know. My
> >> >>>> question is whether you think that they would really appreciate it, or
> >> >>>> whether it's just my QA point of view where I sometimes compute the
> >> >>>> 'checksums' of cache to see if I didn't lose anything.
> >> >>>>
> >> >>>> There are those sizes:
> >> >>>> A) number of owned entries
> >> >>>> B) number of entries stored locally in memory
> >> >>>> C) number of entries stored in each local cache store
> >> >>>> D) number of entries stored in each shared cache store
> >> >>>> E) total number of entries in cache
> >> >>>>
> >> >>>> So far, we can get
> >> >>>> B via withFlags(SKIP_CACHE_LOAD).size()
> >> >>>> (passivation ? 
B : 0) + firstNonZero(C, D) via size() > >> >>>> E via distributed iterators / MR > >> >>>> A via data container iteration + distribution manager query, but > only > >> >>>> without cache store > >> >>>> C or D through > >> >>>> > >> >>>> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >> >>>> > >> >>>> I think that it would go along with users' expectations if size() > >> >>>> returned E and for the rest we should have special methods on > >> >>>> AdvancedCache. That would of course change the meaning of size(), > but > >> >>>> I'd say that finally to something that has firm meaning. > >> >>>> > >> >>>> WDYT? > >> >>> There was a lot of arguments in past whether size() and other > methods > >> >>> that operate over all the elements (keySet, values) are useful > because: > >> >>> - they are approximate (data changes during iteration) > >> >>> - they are very resource consuming and might be miss-used (this is > the > >> >>> reason we chosen to use size() with its current local semantic) > >> >>> > >> >>> These methods (size, keys, values) are useful for people and I think > >> >>> we were not wise to implement them only on top of the local data: > this is > >> >>> like preferring efficiency over correctness. This also created a > lot of > >> >>> confusion with our users, question like size() doesn't return the > correct > >> >>> value being asked regularly. I totally agree that size() returns E > (i.e. > >> >>> everything that is stored within the grid, including persistence) > and it's > >> >>> performance implications to be documented accordingly. For keySet > and values > >> >>> - we should stop implementing them (throw exception) and point > users to > >> >>> Will's distributed iterator which is a nicer way to achieve the > desired > >> >>> behavior. > >> >>> > >> >>> We can also implement keySet() and values() on top of the > distributed > >> >>> entry iterator and document that using the iterator directly is > better. > >> >> Yes, that's what I meant as well. > >> >> > >> >> Cheers, > >> >> -- > >> >> Mircea Markus > >> >> Infinispan lead (www.infinispan.org) > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> _______________________________________________ > >> >> infinispan-dev mailing list > >> >> infinispan-dev at lists.jboss.org > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > _______________________________________________ > >> > infinispan-dev mailing list > >> > infinispan-dev at lists.jboss.org > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> -- > >> Radim Vansa > >> JBoss DataGrid QA > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141009/d0a0c200/attachment-0001.html
From rory.odonnell at oracle.com  Thu Oct  9 05:27:16 2014
From: rory.odonnell at oracle.com (Rory O'Donnell Oracle, Dublin Ireland)
Date: Thu, 09 Oct 2014 10:27:16 +0100
Subject: [infinispan-dev] Early Access builds for JDK 9 b33 and JDK 8u40 b09 are available on java.net
Message-ID: <543654F4.1040605@oracle.com>

Hi Galder,

Early Access build for JDK 9 b33 is available on java.net; a summary of changes is listed here.

Early Access build for JDK 8u40 b09 is available on java.net; a summary of changes is listed here.

Rgds, Rory

--
Rgds, Rory O'Donnell
Quality Engineering Manager
Oracle EMEA, Dublin, Ireland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141009/6d8f30d7/attachment.html
From emmanuel at hibernate.org  Thu Oct  9 08:18:38 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Thu, 9 Oct 2014 15:18:38 +0300
Subject: [infinispan-dev] TopologySafe Map / Reduce
Message-ID:

Pedro and I have been having discussions with the LEADS guys on their experience of Map / Reduce, especially around stability during topology changes.

This ties to the .size() thread you guys have been exchanging on (I only could read it partially).

On the requirements, theirs is pretty straightforward and expected I think from most users.
They are fine with inconsistencies with entries created/updated/deleted between the M/R start and the end.
They are *not* fine with seeing the same key/value several times for the duration of the M/R execution. This AFAIK can happen when a topology change occurs.

Here is a proposal.
Why not run the M/R job not per node but rather per segment?
The point is that segments are stable across topology changes. The M/R tasks would then be about iterating over the keys in a given segment.

The M/R request would send the task per segment to each node where the segment is primary.
(We can imagine interesting things like sending it to one of the backups for workload optimization purposes, or sending it to both primary and backups and doing comparisons.)
The M/R requester would be in an interesting situation. It could detect that a segment M/R never returns and trigger a new computation on a node other than the one initially chosen.

One tricky question around that is when the M/R job stores data in an intermediary state. We need some sort of way to expose the user indirectly to segments so that we can evict per-segment intermediary caches in case of failure or retry.

But before getting ahead of ourselves, what do you think of the general idea? Even without a retry framework, this approach would be more stable than our current per-node approach during topology changes and improve dependability.

Emmanuel
_______________________________________________
infinispan-dev mailing list
infinispan-dev at lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

From mudokonman at gmail.com  Thu Oct  9 08:40:12 2014
From: mudokonman at gmail.com (William Burns)
Date: Thu, 9 Oct 2014 08:40:12 -0400
Subject: [infinispan-dev] TopologySafe Map / Reduce
In-Reply-To:
References:
Message-ID:

Actually this was something I was hoping to get to possibly in the near future.

I already have to do https://issues.jboss.org/browse/ISPN-4358 which will require rewriting parts of the distributed entry iterator. In doing so I was planning on breaking this out to a more generic framework where you could run a given operation by segment, guaranteeing it was only run once per entry. 
In doing so I was thinking I could try to move M/R on top of this to allow it to also be resilient to rehash events. Additional comments inline. On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard wrote: > Pedro and I have been having discussions with the LEADS guys on their experience of Map / Reduce especially around stability during topology changes. > > This ties to the .size() thread you guys have been exchanging on (I only could read it partially). > > On the requirements, theirs is pretty straightforward and expected I think from most users. > They are fine with inconsistencies with entries create/updated/deleted between the M/R start and the end. There is no way we can fix this without adding a very strict isolation level like SERIALIZABLE. > They are *not* fine with seeing the same key/value several time for the duration of the M/R execution. This AFAIK can happen when a topology change occurs. This can happen if it was processed on one node and then rehash migrates the entry to another and runs it there. > > Here is a proposal. > Why not run the M/R job not per node but rather per segment? > The point is that segments are stable across topology changes. The M/R tasks would then be about iterating over the keys in a given segment. > > The M/R request would send the task per segments on each node where the segment is primary. This is exactly what the iterator does today but also watches for rehashes to send the request to a new owner when the segment moves between nodes. > (We can imagine interesting things like sending it to one of the backups for workload optimization purposes or sending it to both primary and backups and to comparisons). > The M/R requester would be in an interesting situation. It could detect that a segment M/R never returns and trigger a new computation on another node than the one initially sent. > > One tricky question around that is when the M/R job store data in an intermediary state. We need some sort of way to expose the user indirectly to segments so that we can evict per segment intermediary caches in case of failure or retry. This was one place I was thinking I would need to take special care to look into when doing a conversion like this. > > But before getting ahead of ourselves, what do you thing of the general idea? Even without retry framework, this approach would be more stable than our current per node approach during topology changes and improve dependability. Doing it solely based on segment would remove the possibility of having duplicates. However without a mechanism to send a new request on rehash it would be possible to only find a subset of values (if a segment is removed while iterating on it). > > Emmanuel > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Thu Oct 9 09:41:39 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Thu, 9 Oct 2014 16:41:39 +0300 Subject: [infinispan-dev] TopologySafe Map / Reduce In-Reply-To: References: Message-ID: On Thu, Oct 9, 2014 at 3:40 PM, William Burns wrote: > Actually this was something I was hoping to get to possibly in the near > future. > > I already have to do https://issues.jboss.org/browse/ISPN-4358 which > will require rewriting parts of the distributed entry iterator. 
In > doing so I was planning on breaking this out to a more generic > framework where you could run a given operation by segment > guaranteeing it was only ran once per entry. In doing so I was > thinking I could try to move M/R on top of this to allow it to also be > resilient to rehash events. > > Additional comments inline. > > On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard > wrote: > > Pedro and I have been having discussions with the LEADS guys on their > experience of Map / Reduce especially around stability during topology > changes. > > > > This ties to the .size() thread you guys have been exchanging on (I only > could read it partially). > > > > On the requirements, theirs is pretty straightforward and expected I > think from most users. > > They are fine with inconsistencies with entries create/updated/deleted > between the M/R start and the end. > > There is no way we can fix this without adding a very strict isolation > level like SERIALIZABLE. > > They are *not* fine with seeing the same key/value several time for the > duration of the M/R execution. This AFAIK can happen when a topology change > occurs. > > This can happen if it was processed on one node and then rehash > migrates the entry to another and runs it there. > > > > > Here is a proposal. > > Why not run the M/R job not per node but rather per segment? > > The point is that segments are stable across topology changes. The M/R > tasks would then be about iterating over the keys in a given segment. > > > > The M/R request would send the task per segments on each node where the > segment is primary. > > This is exactly what the iterator does today but also watches for > rehashes to send the request to a new owner when the segment moves > between nodes. > > > (We can imagine interesting things like sending it to one of the backups > for workload optimization purposes or sending it to both primary and > backups and to comparisons). > > The M/R requester would be in an interesting situation. It could detect > that a segment M/R never returns and trigger a new computation on another > node than the one initially sent. > > > > One tricky question around that is when the M/R job store data in an > intermediary state. We need some sort of way to expose the user indirectly > to segments so that we can evict per segment intermediary caches in case of > failure or retry. > > This was one place I was thinking I would need to take special care to > look into when doing a conversion like this. > I'd rather not expose this to the user. Instead, we could split the intermediary values for each key by the source segment, and do the invalidation of the retried segments in our M/R framework (e.g. when we detect that the primary owner at the start of the map/combine phase is not an owner at all at the end). I think we have another problem with the publishing of intermediary values not being idempotent. The default configuration for the intermediate cache is non-transactional, and retrying the put(delta) command after a topology change could add the same intermediate values twice. A transactional intermediary cache should be safe, though, because the tx won't commit on the old owner until the new owner knows about the tx. > > > > But before getting ahead of ourselves, what do you thing of the general > idea? Even without retry framework, this approach would be more stable than > our current per node approach during topology changes and improve > dependability. 
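To illustrate the suggestion above of splitting the intermediary values by their source segment, here is a sketch of the kind of compound key that would allow it. The class is hypothetical, not the actual intermediate-cache layout.

import java.io.Serializable;

final class IntermediateKey implements Serializable {
   final Object reduceKey;  // the intermediate (reduce-phase) key
   final int sourceSegment; // segment of the input entry that produced the values

   IntermediateKey(Object reduceKey, int sourceSegment) {
      this.reduceKey = reduceKey;
      this.sourceSegment = sourceSegment;
   }

   @Override
   public boolean equals(Object o) {
      if (!(o instanceof IntermediateKey)) return false;
      IntermediateKey other = (IntermediateKey) o;
      return sourceSegment == other.sourceSegment && reduceKey.equals(other.reduceKey);
   }

   @Override
   public int hashCode() {
      return 31 * reduceKey.hashCode() + sourceSegment;
   }
}

Invalidating a retried segment s then means removing every IntermediateKey whose sourceSegment == s, while the reduce phase for a given reduceKey iterates over the values of all source segments.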
> > Doing it solely based on segment would remove the possibility of > having duplicates. However without a mechanism to send a new request > on rehash it would be possible to only find a subset of values (if a > segment is removed while iterating on it). > > > > > Emmanuel > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141009/34d4e311/attachment.html From mmarkus at redhat.com Thu Oct 9 11:46:44 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 9 Oct 2014 16:46:44 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <5435560E.2030206@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> Message-ID: <081C49F0-4B51-4486-BD7C-19926E5A9178@redhat.com> On Oct 8, 2014, at 16:19, Radim Vansa wrote: > Users expect that size() will be constant-time (or linear to cluster > size), and generally fast operation. I'd prefer to keep it that way. > Though, even the MR way (used for HotRod size() now) needs to crawl > through all the entries locally. yes, but first of all they expect size to be correct, then fast. > > 'Heretic, not very well though of and changing too many things' idea: > what about having data container segment-aware? Then you'd just bcast > SizeCommand with given topologyId and sum up sizes of primary-owned > segments... It's not a complete solution, but at least that would enable > to get the number of locally owned entries quite fast. Though, you can't > do that easily with cache stores (without changing SPI). that would help and there were discussions to do this for other reasons as well: e.g. ST would migrate the data without iterating over the state in the DC. Not doable in the scope of ISPN 7.0, though. > > Regarding cache stores, IMO we're damned anyway: when calling > cacheStore.size(), it can report more entries as those haven't been > expired yet, it can report less entries as those can be expired due to > [1]. Or, we'll enumerate all the entries, and that's going to be slow > (btw., [1] reminded me that we should enumerate both datacontainer AND > cachestores even if passivation is not enabled). > > Radim > > [1] https://issues.jboss.org/browse/ISPN-3202 > > On 10/08/2014 04:42 PM, William Burns wrote: >> So it seems we would want to change this for 7.0 if possible since it >> would be a bigger change for something like 7.1 and 8.0 would be even >> further out. I should be able to put this together for CR2. >> >> It seems that we want to implement keySet, values and entrySet methods >> using the entry iterator approach. >> >> It is however unclear for the size method if we want to use MR entry >> counting and not worry about the rehash and passivation issues since >> it is just an estimation anyways. Or if we want to also use the entry >> iterator which should be closer approximation but will require more >> network overhead and memory usage. >> >> Also we didn't really talk about the fact that these methods would >> ignore ongoing transactions and if that is a concern or not. 
>> >> - Will >> >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >>> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >>> >>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >>>> >>>>> Hi, >>>>> >>>>> recently we had a discussion about what size() returns, but I've >>>>> realized there are more things that users would like to know. My >>>>> question is whether you think that they would really appreciate it, or >>>>> whether it's just my QA point of view where I sometimes compute the >>>>> 'checksums' of cache to see if I didn't lost anything. >>>>> >>>>> There are those sizes: >>>>> A) number of owned entries >>>>> B) number of entries stored locally in memory >>>>> C) number of entries stored in each local cache store >>>>> D) number of entries stored in each shared cache store >>>>> E) total number of entries in cache >>>>> >>>>> So far, we can get >>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>> E via distributed iterators / MR >>>>> A via data container iteration + distribution manager query, but only >>>>> without cache store >>>>> C or D through >>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>> >>>>> I think that it would go along with users' expectations if size() >>>>> returned E and for the rest we should have special methods on >>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>> I'd say that finally to something that has firm meaning. >>>>> >>>>> WDYT? >>>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>>> - they are approximate (data changes during iteration) >>>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>>> >>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >>>> >>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >>> Yes, that's what I meant as well. 
>>>
>>> Cheers,
>>> --
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com  Thu Oct  9 11:47:25 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Thu, 9 Oct 2014 16:47:25 +0100
Subject: [infinispan-dev] About size()
In-Reply-To:
References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com>
Message-ID: <45C26FF2-348C-4C3D-B030-AAAB81B725EF@redhat.com>

On Oct 8, 2014, at 15:42, William Burns wrote:

> So it seems we would want to change this for 7.0 if possible since it
> would be a bigger change for something like 7.1 and 8.0 would be even
> further out. I should be able to put this together for CR2.

+1, please go for it.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com  Thu Oct  9 11:48:49 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Thu, 9 Oct 2014 16:48:49 +0100
Subject: [infinispan-dev] About size()
In-Reply-To:
References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com>
Message-ID: <5C898CCC-55F0-4556-883B-B5EC4F2A62C5@redhat.com>

On Oct 8, 2014, at 15:57, Dan Berindei wrote:

> On Wed, Oct 8, 2014 at 5:42 PM, William Burns wrote:
> So it seems we would want to change this for 7.0 if possible since it
> would be a bigger change for something like 7.1 and 8.0 would be even
> further out. I should be able to put this together for CR2.
>
> I'm not 100% convinced that we need it for 7.x. For 8.0 I would recommend removing the size() method altogether, and providing some looser "statistics" instead.

8.0 will happen in ~1 year's time; this is a small change for the better, so +1 to have it in for CR2.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From pedro at infinispan.org  Thu Oct  9 17:13:13 2014
From: pedro at infinispan.org (Pedro Ruivo)
Date: Fri, 10 Oct 2014 00:13:13 +0300
Subject: [infinispan-dev] TopologySafe Map / Reduce
In-Reply-To:
References:
Message-ID: <5436FA69.6030300@infinispan.org>

On 10/09/2014 03:40 PM, William Burns wrote:
> Actually this was something I was hoping to get to possibly in the near future.
>
> I already have to do https://issues.jboss.org/browse/ISPN-4358 which
> will require rewriting parts of the distributed entry iterator. In
> doing so I was planning on breaking this out to a more generic
> framework where you could run a given operation by segment,
> guaranteeing it was only run once per entry. In doing so I was
> thinking I could try to move M/R on top of this to allow it to also be
> resilient to rehash events.
>
> Additional comments inline. 
>
> On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard wrote:
> > Pedro and I have been having discussions with the LEADS guys on their experience of Map / Reduce, especially around stability during topology changes.
> >
> > This ties to the .size() thread you guys have been exchanging on (I only could read it partially).
> >
> > On the requirements, theirs is pretty straightforward and expected I think from most users.
> > They are fine with inconsistencies with entries created/updated/deleted between the M/R start and the end.
>
> There is no way we can fix this without adding a very strict isolation
> level like SERIALIZABLE.

Snapshot Isolation should be fine, but I don't want to get into that discussion right now :)

>
> > They are *not* fine with seeing the same key/value several times for the duration of the M/R execution. This AFAIK can happen when a topology change occurs.
>
> This can happen if it was processed on one node and then rehash
> migrates the entry to another and runs it there.
>
> >
> > Here is a proposal.
> > Why not run the M/R job not per node but rather per segment?
> > The point is that segments are stable across topology changes. The M/R tasks would then be about iterating over the keys in a given segment.
> >
> > The M/R request would send the task per segment to each node where the segment is primary.
>
> This is exactly what the iterator does today but also watches for
> rehashes to send the request to a new owner when the segment moves
> between nodes.
>
> > (We can imagine interesting things like sending it to one of the backups for workload optimization purposes, or sending it to both primary and backups and doing comparisons.)
> > The M/R requester would be in an interesting situation. It could detect that a segment M/R never returns and trigger a new computation on a node other than the one initially chosen.
> >
> > One tricky question around that is when the M/R job stores data in an intermediary state. We need some sort of way to expose the user indirectly to segments so that we can evict per-segment intermediary caches in case of failure or retry.
>
> This was one place I was thinking I would need to take special care to
> look into when doing a conversion like this.
>
> >
> > But before getting ahead of ourselves, what do you think of the general idea? Even without a retry framework, this approach would be more stable than our current per-node approach during topology changes and improve dependability.
>
> Doing it solely based on segment would remove the possibility of
> having duplicates. However, without a mechanism to send a new request
> on rehash it would be possible to only find a subset of values (if a
> segment is removed while iterating on it).

True. I think the retry mechanism is the best approach (sketched below). Another alternative would be to implement a Map getBySegment(int) operation that goes remote if the segment is not local. 
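The shape of that retry mechanism, as a sketch: every name below is hypothetical, and it shows the control flow only, not a real API.

import java.util.Set;
import org.infinispan.remoting.transport.Address;

abstract class PerSegmentMapPhase {

   abstract Set<Integer> segments();                     // all segments of the cache
   abstract Address primaryOwner(int segment);           // assumed topology lookup
   abstract void mapSegmentOn(Address node, int segment) throws SegmentMovedException;
   abstract void discardIntermediateValues(int segment);

   void run() {
      for (int segment : segments()) {
         for (;;) {
            try {
               mapSegmentOn(primaryOwner(segment), segment);
               break; // this segment has now been processed exactly once
            } catch (SegmentMovedException moved) {
               // a rehash moved the segment: drop partial results and retry
               // on whoever the new primary owner is
               discardIntermediateValues(segment);
            }
         }
      }
   }
}

// hypothetical signal that a topology change moved the segment mid-task
class SegmentMovedException extends Exception {
}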
> >> >> Emmanuel >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From pedro at infinispan.org Thu Oct 9 17:16:06 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Fri, 10 Oct 2014 00:16:06 +0300 Subject: [infinispan-dev] TopologySafe Map / Reduce In-Reply-To: References: Message-ID: <5436FB16.3000003@infinispan.org> On 10/09/2014 04:41 PM, Dan Berindei wrote: > > > On Thu, Oct 9, 2014 at 3:40 PM, William Burns > wrote: > > Actually this was something I was hoping to get to possibly in the > near future. > > I already have to do https://issues.jboss.org/browse/ISPN-4358 which > will require rewriting parts of the distributed entry iterator. In > doing so I was planning on breaking this out to a more generic > framework where you could run a given operation by segment > guaranteeing it was only ran once per entry. In doing so I was > thinking I could try to move M/R on top of this to allow it to also be > resilient to rehash events. > > Additional comments inline. > > On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard > > wrote: > > Pedro and I have been having discussions with the LEADS guys on their experience of Map / Reduce especially around stability during topology changes. > > > > This ties to the .size() thread you guys have been exchanging on (I only could read it partially). > > > > On the requirements, theirs is pretty straightforward and expected I think from most users. > > They are fine with inconsistencies with entries create/updated/deleted between the M/R start and the end. > > There is no way we can fix this without adding a very strict isolation > level like SERIALIZABLE. > > > > They are *not* fine with seeing the same key/value several time for the duration of the M/R execution. This AFAIK can happen when a topology change occurs. > > This can happen if it was processed on one node and then rehash > migrates the entry to another and runs it there. > > > > > Here is a proposal. > > Why not run the M/R job not per node but rather per segment? > > The point is that segments are stable across topology changes. The M/R tasks would then be about iterating over the keys in a given segment. > > > > The M/R request would send the task per segments on each node where the segment is primary. > > This is exactly what the iterator does today but also watches for > rehashes to send the request to a new owner when the segment moves > between nodes. > > > (We can imagine interesting things like sending it to one of the backups for workload optimization purposes or sending it to both primary and backups and to comparisons). > > The M/R requester would be in an interesting situation. It could detect that a segment M/R never returns and trigger a new computation on another node than the one initially sent. > > > > One tricky question around that is when the M/R job store data in an intermediary state. We need some sort of way to expose the user indirectly to segments so that we can evict per segment intermediary caches in case of failure or retry. > > This was one place I was thinking I would need to take special care to > look into when doing a conversion like this. > > > I'd rather not expose this to the user. 
Instead, we could split the > intermediary values for each key by the source segment, and do the > invalidation of the retried segments in our M/R framework (e.g. when we > detect that the primary owner at the start of the map/combine phase is > not an owner at all at the end). > > I think we have another problem with the publishing of intermediary > values not being idempotent. The default configuration for the > intermediate cache is non-transactional, and retrying the put(delta) > command after a topology change could add the same intermediate values > twice. A transactional intermediary cache should be safe, though, > because the tx won't commit on the old owner until the new owner knows > about the tx. can you elaborate on it? anyway, I think the retry mechanism should solve it. If we detect a topology change (during the iteration of segment _i_) and the segment _i_ is moved, then we can cancel the iteration, remove all the intermediate values generated in segment _i_ and restart (on the primary owner). > > > > > > But before getting ahead of ourselves, what do you thing of the general idea? Even without retry framework, this approach would be more stable than our current per node approach during topology changes and improve dependability. > > Doing it solely based on segment would remove the possibility of > having duplicates. However without a mechanism to send a new request > on rehash it would be possible to only find a subset of values (if a > segment is removed while iterating on it). > > > > > Emmanuel > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Fri Oct 10 03:03:37 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 10 Oct 2014 10:03:37 +0300 Subject: [infinispan-dev] TopologySafe Map / Reduce In-Reply-To: <5436FB16.3000003@infinispan.org> References: <5436FB16.3000003@infinispan.org> Message-ID: On Fri, Oct 10, 2014 at 12:16 AM, Pedro Ruivo wrote: > > > On 10/09/2014 04:41 PM, Dan Berindei wrote: > > > > > > On Thu, Oct 9, 2014 at 3:40 PM, William Burns > > wrote: > > > > Actually this was something I was hoping to get to possibly in the > > near future. > > > > I already have to do https://issues.jboss.org/browse/ISPN-4358 which > > will require rewriting parts of the distributed entry iterator. In > > doing so I was planning on breaking this out to a more generic > > framework where you could run a given operation by segment > > guaranteeing it was only ran once per entry. In doing so I was > > thinking I could try to move M/R on top of this to allow it to also > be > > resilient to rehash events. > > > > Additional comments inline. > > > > On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard > > > wrote: > > > Pedro and I have been having discussions with the LEADS guys on > their experience of Map / Reduce especially around stability during > topology changes. > > > > > > This ties to the .size() thread you guys have been exchanging on > (I only could read it partially). 
> > > > > > On the requirements, theirs is pretty straightforward and expected > I think from most users. > > > They are fine with inconsistencies with entries > create/updated/deleted between the M/R start and the end. > > > > There is no way we can fix this without adding a very strict > isolation > > level like SERIALIZABLE. > > > > > > > They are *not* fine with seeing the same key/value several time > for the duration of the M/R execution. This AFAIK can happen when a > topology change occurs. > > > > This can happen if it was processed on one node and then rehash > > migrates the entry to another and runs it there. > > > > > > > > Here is a proposal. > > > Why not run the M/R job not per node but rather per segment? > > > The point is that segments are stable across topology changes. The > M/R tasks would then be about iterating over the keys in a given segment. > > > > > > The M/R request would send the task per segments on each node > where the segment is primary. > > > > This is exactly what the iterator does today but also watches for > > rehashes to send the request to a new owner when the segment moves > > between nodes. > > > > > (We can imagine interesting things like sending it to one of the > backups for workload optimization purposes or sending it to both primary > and backups and to comparisons). > > > The M/R requester would be in an interesting situation. It could > detect that a segment M/R never returns and trigger a new computation on > another node than the one initially sent. > > > > > > One tricky question around that is when the M/R job store data in > an intermediary state. We need some sort of way to expose the user > indirectly to segments so that we can evict per segment intermediary caches > in case of failure or retry. > > > > This was one place I was thinking I would need to take special care > to > > look into when doing a conversion like this. > > > > > > I'd rather not expose this to the user. Instead, we could split the > > intermediary values for each key by the source segment, and do the > > invalidation of the retried segments in our M/R framework (e.g. when we > > detect that the primary owner at the start of the map/combine phase is > > not an owner at all at the end). > > > > I think we have another problem with the publishing of intermediary > > values not being idempotent. The default configuration for the > > intermediate cache is non-transactional, and retrying the put(delta) > > command after a topology change could add the same intermediate values > > twice. A transactional intermediary cache should be safe, though, > > because the tx won't commit on the old owner until the new owner knows > > about the tx. > > can you elaborate on it? >

say we have a cache with numOwners=2, owners(k) = [A, B]

C will become the primary owner of k, but for now owners(k) = [A, B, C]
O sends put(delta) to A (the primary)
A sends put(delta) to B, C
B sees a topology change (owners(k) = [C, B]), doesn't apply the delta and replies with an OutdatedTopologyException
C applies the delta
A resends put(delta) to C (new primary)
C sends put(delta) to B, applies the delta again

I think it could be solved with versions, I just wanted to point out that we don't do that now.

> > anyway, I think the retry mechanism should solve it. If we detect a > topology change (during the iteration of segment _i_) and the segment > _i_ is moved, then we can cancel the iteration, remove all the > intermediate values generated in segment _i_ and restart (on the primary > owner).
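Just to make sure we mean the same thing, a rough sketch of that per-segment retry loop (all the helper names below - entriesInSegment, stillPrimaryOwner, invalidateIntermediateValues, currentTopologyId - are hypothetical, this is not the current API):

    // Sketch only: iterate one segment, and if its ownership changed while
    // we were mapping, drop everything the segment produced and start over.
    <KIn, VIn, KOut, VOut> void mapSegment(int segment,
            Mapper<KIn, VIn, KOut, VOut> mapper,
            Collector<KOut, VOut> collector) {   // collector tags values with 'segment'
        for (;;) {
            int topologyAtStart = currentTopologyId();                 // hypothetical
            for (InternalCacheEntry entry : entriesInSegment(segment)) {  // hypothetical
                mapper.map((KIn) entry.getKey(), (VIn) entry.getValue(), collector);
            }
            if (stillPrimaryOwner(segment, topologyAtStart)) {         // hypothetical
                return;          // segment stayed put, its intermediate values are good
            }
            // topology changed under us: invalidate everything this segment
            // produced and loop
            invalidateIntermediateValues(segment);                     // hypothetical
        }
    }

In the sketch the retry happens in place; in practice the requester would cancel the task and re-send it to the new primary owner of the segment.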
> The problem is that the intermediate keys aren't in the same segment: we want the reduce phase to access only keys local to the reducing node, and keys in different input segments can yield values for the same intermediate key. So like you say, we'd have to retry on every topology change in the intermediary cache, not just the ones affecting segment _i_. There's another complication: in the scenario above, O may only get the topology update with owners(k) = [C, B] after the map/combine phase completed. So the originator of the M/R job would have to watch for topology changes seen by any node, and invalidate/retry any input segments that could have been affected. All that without slowing down the no-topology-change case too much... > > > > > > > > > But before getting ahead of ourselves, what do you thing of the > general idea? Even without retry framework, this approach would be more > stable than our current per node approach during topology changes and > improve dependability. > > > > Doing it solely based on segment would remove the possibility of > > having duplicates. However without a mechanism to send a new request > > on rehash it would be possible to only find a subset of values (if a > > segment is removed while iterating on it). > > > > > > > > Emmanuel > > > _______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org infinispan-dev at lists.jboss.org> > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/b324ae41/attachment-0001.html From rory.odonnell at oracle.com Fri Oct 10 06:01:36 2014 From: rory.odonnell at oracle.com (Rory O'Donnell Oracle, Dublin Ireland) Date: Fri, 10 Oct 2014 11:01:36 +0100 Subject: [infinispan-dev] Analysis of infinispan-6.0.2 dependency on JDK-Internal APIs In-Reply-To: <54227EF9.6090403@oracle.com> References: <54227901.8020509@oracle.com> <54227EF9.6090403@oracle.com> Message-ID: <5437AE80.6050302@oracle.com> Hi Galder, Did you have time to review the report, any feedback ? Rgds,Rory On 24/09/2014 09:21, Rory O'Donnell Oracle, Dublin Ireland wrote: > Below is a text output of the report for infinispan-6.0.2. > > Rgds,Rory > > ------------------------------------------------------------------------ > > > JDK Internal API Usage Report for infinispan-6.0.2.Final-all > > The OpenJDK Quality Outreach campaign has run a compatibility report > to identify usage of JDK-internal APIs. Usage of these JDK-internal > APIs could pose compatibility issues, as the Java team explained in > 1996 > . > We have created this report to help you identify which JDK-internal > APIs your project uses, what to use instead, and where those changes > should go. Making these changes will improve your compatibility, and > in some cases give better performance. 
> > Migrating away from the JDK-internal APIs now will give your team > adequate time for testing before the release of JDK 9. If you are > unable to migrate away from an internal API, please provide us with an > explanation below to help us understand it better. As a reminder, > supported APIs are determined by the OpenJDK's Java Community Process > and not by Oracle. > > This report was generated by jdeps > > through static analysis of artifacts: it does not identify any usage > of those APIs through reflection or dynamic bytecode. You may also run > jdeps on your own > > if you would prefer. > > Summary of the analysis of the jar files within > infinispan-6.0.2.Final-all: > > * Numer of jar files depending on JDK-internal APIs: 10 > * Internal APIs that have known replacements: 0 > * Internal APIs that have no supported replacements: 73 > > > APIs that have known replacements > : > > ID Replace Usage of With Inside > > > JDK-internal APIs without supported replacements: > > ID Internal APIs (do not use) Used by > 1 com.sun.org.apache.xml.internal.utils.PrefixResolver > > * lib/freemarker-2.3.11.jar > > Explanation... > 2 com.sun.org.apache.xpath.internal.XPath > > * lib/freemarker-2.3.11.jar > > Explanation... > 3 com.sun.org.apache.xpath.internal.XPathContext > > * lib/freemarker-2.3.11.jar > > Explanation... > 4 com.sun.org.apache.xpath.internal.objects.XBoolean > > * lib/freemarker-2.3.11.jar > > Explanation... > 5 com.sun.org.apache.xpath.internal.objects.XNodeSet > > * lib/freemarker-2.3.11.jar > > Explanation... > 6 com.sun.org.apache.xpath.internal.objects.XNull > > * lib/freemarker-2.3.11.jar > > Explanation... > 7 com.sun.org.apache.xpath.internal.objects.XNumber > > * lib/freemarker-2.3.11.jar > > Explanation... > 8 com.sun.org.apache.xpath.internal.objects.XObject > > * lib/freemarker-2.3.11.jar > > Explanation... > 9 com.sun.org.apache.xpath.internal.objects.XString > > * lib/freemarker-2.3.11.jar > > Explanation... > 10 org.w3c.dom.html.HTMLAnchorElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 11 org.w3c.dom.html.HTMLAppletElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 12 org.w3c.dom.html.HTMLAreaElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 13 org.w3c.dom.html.HTMLBRElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 14 org.w3c.dom.html.HTMLBaseElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 15 org.w3c.dom.html.HTMLBaseFontElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 16 org.w3c.dom.html.HTMLBodyElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 17 org.w3c.dom.html.HTMLButtonElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 18 org.w3c.dom.html.HTMLCollection > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 19 org.w3c.dom.html.HTMLDListElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 20 org.w3c.dom.html.HTMLDirectoryElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 21 org.w3c.dom.html.HTMLDivElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 22 org.w3c.dom.html.HTMLDocument > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 23 org.w3c.dom.html.HTMLElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 24 org.w3c.dom.html.HTMLFieldSetElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 25 org.w3c.dom.html.HTMLFontElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 26 org.w3c.dom.html.HTMLFormElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... 
> 27 org.w3c.dom.html.HTMLFrameElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 28 org.w3c.dom.html.HTMLFrameSetElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 29 org.w3c.dom.html.HTMLHRElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 30 org.w3c.dom.html.HTMLHeadElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 31 org.w3c.dom.html.HTMLHeadingElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 32 org.w3c.dom.html.HTMLHtmlElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 33 org.w3c.dom.html.HTMLIFrameElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 34 org.w3c.dom.html.HTMLImageElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 35 org.w3c.dom.html.HTMLInputElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 36 org.w3c.dom.html.HTMLIsIndexElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 37 org.w3c.dom.html.HTMLLIElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 38 org.w3c.dom.html.HTMLLabelElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 39 org.w3c.dom.html.HTMLLegendElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 40 org.w3c.dom.html.HTMLLinkElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 41 org.w3c.dom.html.HTMLMapElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 42 org.w3c.dom.html.HTMLMenuElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 43 org.w3c.dom.html.HTMLMetaElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 44 org.w3c.dom.html.HTMLModElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 45 org.w3c.dom.html.HTMLOListElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 46 org.w3c.dom.html.HTMLObjectElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 47 org.w3c.dom.html.HTMLOptGroupElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 48 org.w3c.dom.html.HTMLOptionElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 49 org.w3c.dom.html.HTMLParagraphElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 50 org.w3c.dom.html.HTMLParamElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 51 org.w3c.dom.html.HTMLPreElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 52 org.w3c.dom.html.HTMLQuoteElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 53 org.w3c.dom.html.HTMLScriptElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 54 org.w3c.dom.html.HTMLSelectElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 55 org.w3c.dom.html.HTMLStyleElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 56 org.w3c.dom.html.HTMLTableCaptionElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 57 org.w3c.dom.html.HTMLTableCellElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 58 org.w3c.dom.html.HTMLTableColElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 59 org.w3c.dom.html.HTMLTableElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 60 org.w3c.dom.html.HTMLTableRowElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 61 org.w3c.dom.html.HTMLTableSectionElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 62 org.w3c.dom.html.HTMLTextAreaElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 63 org.w3c.dom.html.HTMLTitleElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 64 org.w3c.dom.html.HTMLUListElement > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 65 org.w3c.dom.ranges.DocumentRange > > * lib/xercesImpl-2.9.1.jar > > Explanation... 
> 66 org.w3c.dom.ranges.Range > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 67 org.w3c.dom.ranges.RangeException > > * lib/xercesImpl-2.9.1.jar > > Explanation... > 68 sun.misc.Signal > > * lib/aesh-0.33.7.jar > > Explanation... > 69 sun.misc.SignalHandler > > * lib/aesh-0.33.7.jar > > Explanation... > 70 sun.misc.Unsafe > > * lib/avro-1.7.5.jar > * lib/guava-12.0.jar > * lib/infinispan-commons-6.0.2.Final.jar > * lib/mvel2-2.0.12.jar > * lib/scala-library-2.10.2.jar > > Explanation... > 71 sun.nio.ch.FileChannelImpl > > * lib/leveldb-0.5.jar > > Explanation... > 72 sun.reflect.ReflectionFactory > > * lib/jboss-marshalling-1.4.4.Final.jar > > Explanation... > 73 sun.reflect.ReflectionFactory$GetReflectionFactoryAction > > * lib/jboss-marshalling-1.4.4.Final.jar > > Explanation... > > > Identify External Replacements > > You should use a separate third-party library that performs this > functionality. > > ID Internal API (grouped by package) Used By Identify External > Replacement > > > ------------------------------------------------------------------------ > > > On 24/09/2014 08:55, Rory O'Donnell Oracle, Dublin Ireland wrote: >> Hi Galder, >> >> As part of the preparations for JDK 9, Oracle?s engineers have been >> analyzing open source projects like yours to understand usage. One >> area of concern involves identifying compatibility problems, such as >> reliance on JDK-internal APIs. >> >> Our engineers have already prepared guidance on migrating some of the >> more common usage patterns of JDK-internal APIs to supported public >> interfaces. The list is on the OpenJDK wiki [0], along with >> instructions on how to run the jdeps analysis tool yourself . >> >> As part of the ongoing development of JDK 9, I would like to >> encourage migration from JDK-internal APIs towards the supported Java >> APIs. I have prepared a report for your project rele ase >> infinispan-6.0.2 based on the jdeps output. >> >> The report is attached to this e-mail. >> >> For anything where your migration path is unclear, I would appreciate >> comments on the JDK-internal API usage patterns in the attached jdeps >> report - in particular comments elaborating on the rationale for them >> - either to me or on this mailing list. >> >> Finding suitable replacements for unsupported interfaces is not >> always straightforward, which is why I am reaching out to you early >> in the JDK 9 development cycle so you can give feedback about new >> APIs that may be needed to facilitate this exercise. >> >> Thank you in advance for any efforts and feedback helping us make JDK >> 9 better. >> >> Rgds,Rory >> >> [0] >> https://wiki.openjdk.java.net/display/JDK8/Java+Dependency+Analysis+Tool >> >> >> -- >> Rgds,Rory O'Donnell >> Quality Engineering Manager >> Oracle EMEA , Dublin, Ireland >> >> >> >> > > -- > Rgds,Rory O'Donnell > Quality Engineering Manager > Oracle EMEA , Dublin, Ireland > > > > -- Rgds,Rory O'Donnell Quality Engineering Manager Oracle EMEA , Dublin, Ireland -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/68d38cac/attachment-0001.html From dan.berindei at gmail.com Fri Oct 10 07:37:00 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 10 Oct 2014 14:37:00 +0300 Subject: [infinispan-dev] Analysis of infinispan-6.0.2 dependency on JDK-Internal APIs In-Reply-To: <5437AE80.6050302@oracle.com> References: <54227901.8020509@oracle.com> <54227EF9.6090403@oracle.com> <5437AE80.6050302@oracle.com> Message-ID: Hi Rory Galder is on PTO for another week, so I'll try to answer instead. We only use sun.misc.Unsafe directly, in order to implement a variation of Doug Lea's ConcurrentHashMapV8 that accepts a custom Equivalence (implementation of equality/hashCode). I guess we'll have to switch to AtomicFieldUpdaters if we want it to work with JDK 9, and possibly move to the volatile extensions once they are implemented. The rest of the internal class usages seem to be from our dependencies on WildFly, JBoss Marshalling, LevelDB, Smooks, and JBoss MicroContainer. Smooks and JBoss MicroContainer likely won't see any updates for JDK 9, but they're only used in the demos so they're not critical. JBoss Marshalling is used in the core, however, so we'll need a release from them before we can run anything on JDK 9. Cheers Dan On Fri, Oct 10, 2014 at 1:01 PM, Rory O'Donnell Oracle, Dublin Ireland < rory.odonnell at oracle.com> wrote: > Hi Galder, > > Did you have time to review the report, any feedback ? > > Rgds,Rory > > On 24/09/2014 09:21, Rory O'Donnell Oracle, Dublin Ireland wrote: > > Below is a text output of the report for infinispan-6.0.2. > > Rgds,Rory > > ------------------------------ > JDK Internal API Usage Report for infinispan-6.0.2.Final-all > > The OpenJDK Quality Outreach campaign has run a compatibility report to > identify usage of JDK-internal APIs. Usage of these JDK-internal APIs could > pose compatibility issues, as the Java team explained in 1996 > . We > have created this report to help you identify which JDK-internal APIs your > project uses, what to use instead, and where those changes should go. > Making these changes will improve your compatibility, and in some cases > give better performance. > > Migrating away from the JDK-internal APIs now will give your team adequate > time for testing before the release of JDK 9. If you are unable to migrate > away from an internal API, please provide us with an explanation below to > help us understand it better. As a reminder, supported APIs are determined > by the OpenJDK's Java Community Process and not by Oracle. > > This report was generated by jdeps > > through static analysis of artifacts: it does not identify any usage of > those APIs through reflection or dynamic bytecode. You may also run jdeps > on your own > > if you would prefer. > > Summary of the analysis of the jar files within > infinispan-6.0.2.Final-all: > > - Numer of jar files depending on JDK-internal APIs: 10 > - Internal APIs that have known replacements: 0 > - Internal APIs that have no supported replacements: 73 > > APIs that have known replacements > > : ID Replace Usage of With Inside JDK-internal APIs without supported > replacements: ID Internal APIs (do not use) Used by 1 > com.sun.org.apache.xml.internal.utils.PrefixResolver > > - lib/freemarker-2.3.11.jar > > Explanation... 2 com.sun.org.apache.xpath.internal.XPath > > - lib/freemarker-2.3.11.jar > > Explanation... 3 com.sun.org.apache.xpath.internal.XPathContext > > - lib/freemarker-2.3.11.jar > > Explanation... 
> [... items 4-42 of the quoted report snipped; identical to the list quoted in full in Rory's message above ...]
43 org.w3c.dom.html.HTMLMetaElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 44 org.w3c.dom.html.HTMLModElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 45 org.w3c.dom.html.HTMLOListElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 46 org.w3c.dom.html.HTMLObjectElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 47 org.w3c.dom.html.HTMLOptGroupElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 48 org.w3c.dom.html.HTMLOptionElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 49 org.w3c.dom.html.HTMLParagraphElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 50 org.w3c.dom.html.HTMLParamElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 51 org.w3c.dom.html.HTMLPreElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 52 org.w3c.dom.html.HTMLQuoteElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 53 org.w3c.dom.html.HTMLScriptElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 54 org.w3c.dom.html.HTMLSelectElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 55 org.w3c.dom.html.HTMLStyleElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 56 org.w3c.dom.html.HTMLTableCaptionElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 57 org.w3c.dom.html.HTMLTableCellElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 58 org.w3c.dom.html.HTMLTableColElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 59 org.w3c.dom.html.HTMLTableElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 60 org.w3c.dom.html.HTMLTableRowElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 61 org.w3c.dom.html.HTMLTableSectionElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 62 org.w3c.dom.html.HTMLTextAreaElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 63 org.w3c.dom.html.HTMLTitleElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 64 org.w3c.dom.html.HTMLUListElement > > - lib/xercesImpl-2.9.1.jar > > Explanation... 65 org.w3c.dom.ranges.DocumentRange > > - lib/xercesImpl-2.9.1.jar > > Explanation... 66 org.w3c.dom.ranges.Range > > - lib/xercesImpl-2.9.1.jar > > Explanation... 67 org.w3c.dom.ranges.RangeException > > - lib/xercesImpl-2.9.1.jar > > Explanation... 68 sun.misc.Signal > > - lib/aesh-0.33.7.jar > > Explanation... 69 sun.misc.SignalHandler > > - lib/aesh-0.33.7.jar > > Explanation... 70 sun.misc.Unsafe > > - lib/avro-1.7.5.jar > - lib/guava-12.0.jar > - lib/infinispan-commons-6.0.2.Final.jar > - lib/mvel2-2.0.12.jar > - lib/scala-library-2.10.2.jar > > Explanation... 71 sun.nio.ch.FileChannelImpl > > - lib/leveldb-0.5.jar > > Explanation... 72 sun.reflect.ReflectionFactory > > - lib/jboss-marshalling-1.4.4.Final.jar > > Explanation... 73 > sun.reflect.ReflectionFactory$GetReflectionFactoryAction > > - lib/jboss-marshalling-1.4.4.Final.jar > > Explanation... Identify External Replacements > > You should use a separate third-party library that performs this > functionality. > ID Internal API (grouped by package) Used By Identify External > Replacement > ------------------------------ > > > On 24/09/2014 08:55, Rory O'Donnell Oracle, Dublin Ireland wrote: > > Hi Galder, > > As part of the preparations for JDK 9, Oracle?s engineers have been > analyzing open source projects like yours to understand usage. One area of > concern involves identifying compatibility problems, such as reliance on > JDK-internal APIs. > > Our engineers have already prepared guidance on migrating some of the more > common usage patterns of JDK-internal APIs to supported public interfaces. 
> The list is on the OpenJDK wiki [0], along with instructions on how to run > the jdeps analysis tool yourself . > > As part of the ongoing development of JDK 9, I would like to encourage > migration from JDK-internal APIs towards the supported Java APIs. I have > prepared a report for your project rele ase infinispan-6.0.2 based on the > jdeps output. > > The report is attached to this e-mail. > > For anything where your migration path is unclear, I would appreciate > comments on the JDK-internal API usage patterns in the attached jdeps > report - in particular comments elaborating on the rationale for them - > either to me or on this mailing list. > > Finding suitable replacements for unsupported interfaces is not always > straightforward, which is why I am reaching out to you early in the JDK 9 > development cycle so you can give feedback about new APIs that may be > needed to facilitate this exercise. > > Thank you in advance for any efforts and feedback helping us make JDK 9 > better. > > Rgds,Rory > > [0] > https://wiki.openjdk.java.net/display/JDK8/Java+Dependency+Analysis+Tool > > > -- > Rgds,Rory O'Donnell > Quality Engineering Manager > Oracle EMEA , Dublin, Ireland > > > > > > > -- > Rgds,Rory O'Donnell > Quality Engineering Manager > Oracle EMEA , Dublin, Ireland > > > > > > > -- > Rgds,Rory O'Donnell > Quality Engineering Manager > Oracle EMEA , Dublin, Ireland > > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/8dbe5058/attachment-0001.html From rory.odonnell at oracle.com Fri Oct 10 08:26:33 2014 From: rory.odonnell at oracle.com (Rory O'Donnell Oracle, Dublin Ireland) Date: Fri, 10 Oct 2014 13:26:33 +0100 Subject: [infinispan-dev] Analysis of infinispan-6.0.2 dependency on JDK-Internal APIs In-Reply-To: References: <54227901.8020509@oracle.com> <54227EF9.6090403@oracle.com> <5437AE80.6050302@oracle.com> Message-ID: <5437D079.5090902@oracle.com> Hi Dan, Thank you for the feedback. Rgds,Rory On 10/10/2014 12:37, Dan Berindei wrote: > Hi Rory > > Galder is on PTO for another week, so I'll try to answer instead. > > We only use sun.misc.Unsafe directly, in order to implement a > variation of Doug Lea's ConcurrentHashMapV8 that accepts a custom > Equivalence (implementation of equality/hashCode). I guess we'll have > to switch to AtomicFieldUpdaters if we want it to work with JDK 9, and > possibly move to the volatile extensions once they are implemented. > > The rest of the internal class usages seem to be from our dependencies > on WildFly, JBoss Marshalling, LevelDB, Smooks, and JBoss > MicroContainer. Smooks and JBoss MicroContainer likely won't see any > updates for JDK 9, but they're only used in the demos so they're not > critical. JBoss Marshalling is used in the core, however, so we'll > need a release from them before we can run anything on JDK 9. > > Cheers > Dan > > > On Fri, Oct 10, 2014 at 1:01 PM, Rory O'Donnell Oracle, Dublin Ireland > > wrote: > > Hi Galder, > > Did you have time to review the report, any feedback ? > > Rgds,Rory > > On 24/09/2014 09:21, Rory O'Donnell Oracle, Dublin Ireland wrote: >> Below is a text output of the report for infinispan-6.0.2. 
>> >> Rgds,Rory >> >> ------------------------------------------------------------------------ >> >> >> JDK Internal API Usage Report for infinispan-6.0.2.Final-all >> >> The OpenJDK Quality Outreach campaign has run a compatibility >> report to identify usage of JDK-internal APIs. Usage of these >> JDK-internal APIs could pose compatibility issues, as the Java >> team explained in 1996 >> . >> We have created this report to help you identify which >> JDK-internal APIs your project uses, what to use instead, and >> where those changes should go. Making these changes will improve >> your compatibility, and in some cases give better performance. >> >> Migrating away from the JDK-internal APIs now will give your team >> adequate time for testing before the release of JDK 9. If you are >> unable to migrate away from an internal API, please provide us >> with an explanation below to help us understand it better. As a >> reminder, supported APIs are determined by the OpenJDK's Java >> Community Process and not by Oracle. >> >> This report was generated by jdeps >> >> through static analysis of artifacts: it does not identify any >> usage of those APIs through reflection or dynamic bytecode. You >> may also run jdeps on your own >> >> if you would prefer. >> >> Summary of the analysis of the jar files within >> infinispan-6.0.2.Final-all: >> >> * Numer of jar files depending on JDK-internal APIs: 10 >> * Internal APIs that have known replacements: 0 >> * Internal APIs that have no supported replacements: 73 >> >> >> APIs that have known replacements >> : >> >> ID Replace Usage of With Inside >> >> >> JDK-internal APIs without supported replacements: >> >> ID Internal APIs (do not use) Used by >> 1 com.sun.org.apache.xml.internal.utils.PrefixResolver >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 2 com.sun.org.apache.xpath.internal.XPath >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 3 com.sun.org.apache.xpath.internal.XPathContext >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 4 com.sun.org.apache.xpath.internal.objects.XBoolean >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 5 com.sun.org.apache.xpath.internal.objects.XNodeSet >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 6 com.sun.org.apache.xpath.internal.objects.XNull >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 7 com.sun.org.apache.xpath.internal.objects.XNumber >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 8 com.sun.org.apache.xpath.internal.objects.XObject >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 9 com.sun.org.apache.xpath.internal.objects.XString >> >> * lib/freemarker-2.3.11.jar >> >> Explanation... >> 10 org.w3c.dom.html.HTMLAnchorElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 11 org.w3c.dom.html.HTMLAppletElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 12 org.w3c.dom.html.HTMLAreaElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 13 org.w3c.dom.html.HTMLBRElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 14 org.w3c.dom.html.HTMLBaseElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 15 org.w3c.dom.html.HTMLBaseFontElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 16 org.w3c.dom.html.HTMLBodyElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 17 org.w3c.dom.html.HTMLButtonElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 18 org.w3c.dom.html.HTMLCollection >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... 
>> [... items 19-55 of the quoted report snipped; identical to the list quoted in full earlier in the thread ...]
>> 56 org.w3c.dom.html.HTMLTableCaptionElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 57 org.w3c.dom.html.HTMLTableCellElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 58 org.w3c.dom.html.HTMLTableColElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 59 org.w3c.dom.html.HTMLTableElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 60 org.w3c.dom.html.HTMLTableRowElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 61 org.w3c.dom.html.HTMLTableSectionElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 62 org.w3c.dom.html.HTMLTextAreaElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 63 org.w3c.dom.html.HTMLTitleElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 64 org.w3c.dom.html.HTMLUListElement >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 65 org.w3c.dom.ranges.DocumentRange >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 66 org.w3c.dom.ranges.Range >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 67 org.w3c.dom.ranges.RangeException >> >> * lib/xercesImpl-2.9.1.jar >> >> Explanation... >> 68 sun.misc.Signal >> >> * lib/aesh-0.33.7.jar >> >> Explanation... >> 69 sun.misc.SignalHandler >> >> * lib/aesh-0.33.7.jar >> >> Explanation... >> 70 sun.misc.Unsafe >> >> * lib/avro-1.7.5.jar >> * lib/guava-12.0.jar >> * lib/infinispan-commons-6.0.2.Final.jar >> * lib/mvel2-2.0.12.jar >> * lib/scala-library-2.10.2.jar >> >> Explanation... >> 71 sun.nio.ch.FileChannelImpl >> >> * lib/leveldb-0.5.jar >> >> Explanation... >> 72 sun.reflect.ReflectionFactory >> >> * lib/jboss-marshalling-1.4.4.Final.jar >> >> Explanation... >> 73 sun.reflect.ReflectionFactory$GetReflectionFactoryAction >> >> * lib/jboss-marshalling-1.4.4.Final.jar >> >> Explanation... >> >> >> Identify External Replacements >> >> You should use a separate third-party library that performs this >> functionality. >> >> ID Internal API (grouped by package) Used By Identify External >> Replacement >> >> >> ------------------------------------------------------------------------ >> >> >> On 24/09/2014 08:55, Rory O'Donnell Oracle, Dublin Ireland wrote: >>> Hi Galder, >>> >>> As part of the preparations for JDK 9, Oracle?s engineers have >>> been analyzing open source projects like yours to understand >>> usage. One area of concern involves identifying compatibility >>> problems, such as reliance on JDK-internal APIs. >>> >>> Our engineers have already prepared guidance on migrating some >>> of the more common usage patterns of JDK-internal APIs to >>> supported public interfaces. The list is on the OpenJDK wiki >>> [0], along with instructions on how to run the jdeps analysis >>> tool yourself . >>> >>> As part of the ongoing development of JDK 9, I would like to >>> encourage migration from JDK-internal APIs towards the supported >>> Java APIs. I have prepared a report for your project rele ase >>> infinispan-6.0.2 based on the jdeps output. >>> >>> The report is attached to this e-mail. >>> >>> For anything where your migration path is unclear, I would >>> appreciate comments on the JDK-internal API usage patterns in >>> the attached jdeps report - in particular comments elaborating >>> on the rationale for them - either to me or on this mailing list. 
>>> >>> Finding suitable replacements for unsupported interfaces is not >>> always straightforward, which is why I am reaching out to you >>> early in the JDK 9 development cycle so you can give feedback >>> about new APIs that may be needed to facilitate this exercise. >>> >>> Thank you in advance for any efforts and feedback helping us >>> make JDK 9 better. >>> >>> Rgds,Rory >>> >>> [0] >>> https://wiki.openjdk.java.net/display/JDK8/Java+Dependency+Analysis+Tool >>> >>> >>> >>> -- >>> Rgds,Rory O'Donnell >>> Quality Engineering Manager >>> Oracle EMEA , Dublin, Ireland >>> >>> >>> >>> >> >> -- >> Rgds,Rory O'Donnell >> Quality Engineering Manager >> Oracle EMEA , Dublin, Ireland >> >> >> >> > > -- > Rgds,Rory O'Donnell > Quality Engineering Manager > Oracle EMEA , Dublin, Ireland > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > -- Rgds,Rory O'Donnell Quality Engineering Manager Oracle EMEA , Dublin, Ireland -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/af8ed8aa/attachment-0001.html From mudokonman at gmail.com Fri Oct 10 08:38:04 2014 From: mudokonman at gmail.com (William Burns) Date: Fri, 10 Oct 2014 08:38:04 -0400 Subject: [infinispan-dev] About size() In-Reply-To: <5435560E.2030206@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa wrote: > Users expect that size() will be constant-time (or linear to cluster > size), and generally fast operation. I'd prefer to keep it that way. > Though, even the MR way (used for HotRod size() now) needs to crawl > through all the entries locally. Many in memory collections require O(n) to do size such as ConcurrentLinkedQueue, so I wouldn't say size should always be expected to be constant time or O(c) where c is # of nodes. Granted a user can expect anything they want. > > 'Heretic, not very well though of and changing too many things' idea: > what about having data container segment-aware? Then you'd just bcast > SizeCommand with given topologyId and sum up sizes of primary-owned > segments... It's not a complete solution, but at least that would enable > to get the number of locally owned entries quite fast. Though, you can't > do that easily with cache stores (without changing SPI). > > Regarding cache stores, IMO we're damned anyway: when calling > cacheStore.size(), it can report more entries as those haven't been > expired yet, it can report less entries as those can be expired due to > [1]. Or, we'll enumerate all the entries, and that's going to be slow > (btw., [1] reminded me that we should enumerate both datacontainer AND > cachestores even if passivation is not enabled). This is precisely what the distributed iterator does. And also support for expired entries was recently integrated as I missed that in the original implementation [a] [a] https://issues.jboss.org/browse/ISPN-4643 > > Radim > > [1] https://issues.jboss.org/browse/ISPN-3202 > > On 10/08/2014 04:42 PM, William Burns wrote: >> So it seems we would want to change this for 7.0 if possible since it >> would be a bigger change for something like 7.1 and 8.0 would be even >> further out. I should be able to put this together for CR2. 
>> >> It seems that we want to implement keySet, values and entrySet methods >> using the entry iterator approach. >> >> It is however unclear for the size method if we want to use MR entry >> counting and not worry about the rehash and passivation issues since >> it is just an estimation anyways. Or if we want to also use the entry >> iterator which should be closer approximation but will require more >> network overhead and memory usage. >> >> Also we didn't really talk about the fact that these methods would >> ignore ongoing transactions and if that is a concern or not. >> >> - Will >> >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >>> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >>> >>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >>>> >>>>> Hi, >>>>> >>>>> recently we had a discussion about what size() returns, but I've >>>>> realized there are more things that users would like to know. My >>>>> question is whether you think that they would really appreciate it, or >>>>> whether it's just my QA point of view where I sometimes compute the >>>>> 'checksums' of cache to see if I didn't lost anything. >>>>> >>>>> There are those sizes: >>>>> A) number of owned entries >>>>> B) number of entries stored locally in memory >>>>> C) number of entries stored in each local cache store >>>>> D) number of entries stored in each shared cache store >>>>> E) total number of entries in cache >>>>> >>>>> So far, we can get >>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>> E via distributed iterators / MR >>>>> A via data container iteration + distribution manager query, but only >>>>> without cache store >>>>> C or D through >>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>> >>>>> I think that it would go along with users' expectations if size() >>>>> returned E and for the rest we should have special methods on >>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>> I'd say that finally to something that has firm meaning. >>>>> >>>>> WDYT? >>>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>>> - they are approximate (data changes during iteration) >>>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>>> >>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >>>> >>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >>> Yes, that's what I meant as well. 
>>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Fri Oct 10 09:53:49 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 10 Oct 2014 15:53:49 +0200 Subject: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan In-Reply-To: <5437E23E.4050508@jboss.com> References: <54257C74.2050600@redhat.com> <1898996727.37561681.1411744742097.JavaMail.zimbra@redhat.com> <54297D4F.9060009@jboss.com> <133530692.31414088.1412009840434.JavaMail.zimbra@redhat.com> <542A5583.4030908@redhat.com> <851849146.38574524.1412064092251.JavaMail.zimbra@redhat.com> <542D5143.3070006@redhat.com> <5437E23E.4050508@jboss.com> Message-ID: <5437E4ED.6020109@redhat.com> Markdown chewed on my markup :) https://raw.githubusercontent.com/tristantarrant/infinispan-playground-hybrid/master/README.md On 10/10/14 15:42, Kurt T Stam wrote: > Hi Tristan, > > I'm trying to follow your instructions but am I bit confused by the > following: > > "You will also need to modify the following file: > > modules/system/layers/base/org/jboss/as/clustering/infinispan/main/module.xml > > > by adding the following line to its dependencies:" > > What do I have to add? > > Thx, > > --Kurt > > > > On 10/2/14, 9:21 AM, Tristan Tarrant wrote: >> I have successfully created a "hybrid" cluster between an application >> using Infinispan in embedded mode and an Infinispan server by doing >> the following on the embedded side: >> >> - use a JGroups Channel wrapped in a MuxHandler >> - use a custom class resolver which simulates (or rather... hacks) >> the behaviour of the ModularClassResolver when not using modules >> >> You can find the code at my personal GitHub repo: >> >> https://github.com/tristantarrant/infinispan-playground/tree/master/src/main/java/net/dataforte/infinispan/hybrid >> >> >> suggestions and improvements are welcome. >> >> Tristan >> >> On 30/09/14 10:01, Stelios Koussouris wrote: >>> Hi, >>> >>> To give a bit of context on this. We are doing a POC where the >>> customer wishes to utilize JDG to speed up their application. We >>> need (due to some customer requirements) to cluster >>> EMBEDDED JDG (infinispan library mode) with REMOTE JDG (Infinispan >>> Server) nodes. The infinispan jars should be the same as they are >>> only libraries and they >>> are on the same version. However, during "clustering" of the caches >>> we started seeing errors which looked like there were due to the >>> fact that the clustering of the caches contained different >>> info between the 2 types of cache instantiation (embedded vs server). 
>>> >>> The result was to for a suggestion to create our own MuxChannel (I >>> don't know if we have any other alternatives at this stage to >>> cluster embedded with server infinispan caches) but at the moment we >>> are facing https://gist.github.com/skoussou/5edc5689446b67f85ae8 >>> >>> Regards, >>> >>> Stylianos Kousouris >>> Red Hat Middleware Consultant >>> >>> ----- Original Message ----- >>> From: "Tristan Tarrant" >>> To: "infinispan -Dev List" , "Kurt T >>> Stam" >>> Cc: "Stelios Koussouris" , "Richard >>> Achmatowicz" >>> Sent: Tuesday, 30 September, 2014 8:02:27 AM >>> Subject: Re: [infinispan-dev] Clustering standalone Infinispan w/ WF >>> running Infinispan >>> >>> I don't know what Kurt is doing, but Stelios is attempting to >>> cluster an >>> application using embedded Infinispan deployed within WF together with >>> an Infinispan Server instance. >>> The application is managing its own caches, and therefore it is not >>> interacting with the underlying Infinispan and JGroups subsystems in >>> WF. >>> Infinispan Server uses its Infinispan and JGroups subsystems (which are >>> forked from WF's) and therefore are using MuxChannels. >>> >>> I told Stelios to use a MuxChannel-wrapped Channel in his application >>> and it solved part of the issue (he was initially importing the one >>> included in the WF's jgroups subsystem, but now he's using his local >>> copy), but now he has run into further problems and I believe what Paul >>> & Dennis have written might be correct. >>> >>> The code that configures this is in >>> EmbeddedCacheManagerConfigurationService: >>> >>> GlobalConfigurationBuilder builder = new GlobalConfigurationBuilder(); >>> ModuleLoader moduleLoader = this.dependencies.getModuleLoader(); >>> builder.serialization().classResolver(ModularClassResolver.getInstance(moduleLoader)); >>> >>> >>> I don't know how you'd get a ModuleLoader from within a WF deployment, >>> but I'm sure it can be done. >>> >>> Tristan >>> >>> On 29/09/14 18:57, Paul Ferraro wrote: >>>> You should not need to use a MuxChannel. This would only be >>>> necessary if there are other EAP services sharing the channel. >>>> Using a MuxChannel allows your standalone Infinispan instance to >>>> filter these irrelevant messages. However, in JDG, there should be >>>> no other services other than Infinispan using the channel - hence >>>> the MuxChannel stuff is unnecessary. >>>> >>>> I think Dennis earlier response was spot on. EAP/JDG configures >>>> it's cache managers using a ModularClassResolver (which includes a >>>> module name along with the class name when marshalling). Your >>>> standalone Infinispan instances do not use this and therefore >>>> cannot make sense of the message body. >>>> >>>> Paul >>>> >>>> ----- Original Message ----- >>>>> From: "Kurt T Stam" >>>>> To: "Stelios Koussouris" , "Radoslav Husar" >>>>> >>>>> Cc: "Galder Zamarre?o" , "Paul Ferraro" >>>>> , "Richard Achmatowicz" >>>>> , "infinispan -Dev List" >>>>> >>>>> Sent: Monday, September 29, 2014 11:39:59 AM >>>>> Subject: Re: Clustering standalone Infinispan w/ WF running >>>>> Infinispan >>>>> >>>>> Thanks for following up Stelios, I think Galder is traveling the >>>>> next 2 >>>>> weeks. >>>>> >>>>> So - do we need fixes on both ends then so that the boot order >>>>> does not >>>>> matter? In which project(s) would we apply >>>>> there changes? Or can they be applied in the end-user's code? 
>>>>> >>>>> Thx, >>>>> >>>>> --Kurt >>>>> >>>>> >>>>> >>>>> On 9/26/14, 11:19 AM, Stelios Koussouris wrote: >>>>>> Hi, >>>>>> >>>>>> Rado: It is both ways. ie. if I start first the JDG Server I get >>>>>> the issue >>>>>> on the library mode side when I start that one. If reverse the >>>>>> order of >>>>>> startup I get it in the JDG Server side. >>>>>> >>>>>> Question: >>>>>> ----------------------------------------------------------------------------------------------------------------------- >>>>>> >>>>>> ...IMO the channel needs to be wrapped as >>>>>> org.jboss.as.clustering.jgroups.MuxChannel before passing to >>>>>> infinispan. >>>>>> ... >>>>>> ----------------------------------------------------------------------------------------------------------------------- >>>>>> >>>>>> For now that this is not being done. If I wanted to do it >>>>>> manually on the >>>>>> library side where I can create the protocol programmatically we are >>>>>> talking about something like this? >>>>>> >>>>>> ProtocolStackConfigurator configurator = >>>>>> ConfiguratorFactory.getStackConfigurator("jgroups-udp.xml"); >>>>>> MuxChannel channel = new MuxChannel(configurator); >>>>>> org.infinispan.remoting.transport.Transport transport = new >>>>>> org.infinispan.remoting.transport.jgroups.JGroupsTransport(channel); >>>>>> >>>>>> .... >>>>>> then replace the below >>>>>> new >>>>>> GlobalConfigurationBuilder().clusteredDefault().globalJmxStatistics().cacheManagerName("RDSCacheManager").allowDuplicateDomains(true).enable().transport().clusterName("UDM-CLUSTER").addProperty("configurationFile", >>>>>> >>>>>> "jgroups-udp.xml") >>>>>> WITH >>>>>> new >>>>>> GlobalConfigurationBuilder().clusteredDefault().globalJmxStatistics().cacheManagerName("RDSCacheManager").allowDuplicateDomains(true).enable().transport(Transport).clusterName("UDM-CLUSTER") >>>>>> >>>>>> >>>>>> Btw, someone mentioned that if I follow this method I need to to >>>>>> know the >>>>>> assigned mux ids, but that is not quite clear what it means with >>>>>> regards >>>>>> to the JGroupsTransport configuration >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Stylianos Kousouris >>>>>> Red Hat Middleware Consultant >>>>>> >>>>>> ----- Original Message ----- >>>>>> From: "Radoslav Husar" >>>>>> To: "Galder Zamarre?o" , "Paul Ferraro" >>>>>> >>>>>> Cc: "Richard Achmatowicz" , "infinispan -Dev >>>>>> List" >>>>>> , "Stelios Koussouris" >>>>>> , "Kurt T Stam" >>>>>> Sent: Friday, 26 September, 2014 3:47:16 PM >>>>>> Subject: Re: Clustering standalone Infinispan w/ WF running >>>>>> Infinispan >>>>>> >>>>>> From what Stelios is telling me the question is a little bit >>>>>> other way >>>>>> round: he is using library mode infinispan and jgroups in EAP and >>>>>> connecting to JDG. So the question is what JDG is doing with the >>>>>> stack, >>>>>> not AS/WF as its infinispan/jgroups subsystem is not used. >>>>>> >>>>>> Unfortunately I don't have access to the JDG repo so I don't know >>>>>> what >>>>>> changes have been made there but if you are using the same jgroups >>>>>> logic, IMO the channel needs to be wrapped as >>>>>> org.jboss.as.clustering.jgroups.MuxChannel before passing to >>>>>> infinispan. >>>>>> >>>>>> Rado >>>>>> >>>>>> On 26/09/14 15:03, Galder Zamarre?o wrote: >>>>>>> Hey Paul, >>>>>>> >>>>>>> In the last couple of days, a couple of people have encountered the >>>>>>> exception in [1] when trying to cluster a standalone Infinispan >>>>>>> app with >>>>>>> its own JGroups configuration file with a AS/WF running >>>>>>> Infinispan cache. 
>>>>>>> >>>>>>> From my POV, 3 possible causes: >>>>>>> >>>>>>> 1. Dependency mismatches between AS/WF and the standalone app. >>>>>>> Having done >>>>>>> some quick study of Kurt's case, apart from micro version >>>>>>> changes, all >>>>>>> looks good. >>>>>>> >>>>>>> 2. Mismatch in the Infinispan and/or JGroups configuration file. >>>>>>> >>>>>>> 3. AS/WF puts something on the clustered wire that standalone >>>>>>> Infinispan >>>>>>> does not expect. Are you still doing multiplexing? Could you be >>>>>>> adding >>>>>>> extra info to the wire? >>>>>>> >>>>>>> With this email, I'm trying to get some clarification from you >>>>>>> on whether the >>>>>>> issue could be due to the 3rd option. If it's either of the first >>>>>>> two, it's a >>>>>>> matter of digging and finding the difference, but if it's the 3rd >>>>>>> one, it's >>>>>>> more problematic. >>>>>>> >>>>>>> Any ideas? >>>>>>> >>>>>>> [1] https://gist.github.com/skoussou/92f062f2d0bd17168e01 >>>>>>> -- >>>>>>> Galder Zamarreño >>>>>>> galder at redhat.com >>>>>>> twitter.com/galderz >>>>>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> > From rvansa at redhat.com Fri Oct 10 10:03:16 2014 From: rvansa at redhat.com (Radim Vansa) Date: Fri, 10 Oct 2014 16:03:16 +0200 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> Message-ID: <5437E724.4040705@redhat.com> On 10/10/2014 02:38 PM, William Burns wrote: > On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa wrote: >> Users expect that size() will be a constant-time (or linear to cluster >> size), generally fast operation. I'd prefer to keep it that way. >> Though, even the MR way (used for HotRod size() now) needs to crawl >> through all the entries locally. > Many in-memory collections require O(n) to do size, such as > ConcurrentLinkedQueue, so I wouldn't say size should always be > expected to be constant time or O(c) where c is # of nodes. Granted, a > user can expect anything they want. OK, I stand corrected. Moreover, I was generalizing myself to all users, a common mistake :) Anyway, monitoring tools love nice charts, and I can imagine monitoring software polling every second to update that cool chart with cache size. Do we want a fast but imprecise variant of this operation in some statistics class? Radim > >> 'Heretic, not very well thought of and changing too many things' idea: >> what about having the data container segment-aware? Then you'd just bcast >> SizeCommand with a given topologyId and sum up sizes of primary-owned >> segments... It's not a complete solution, but at least that would enable >> getting the number of locally owned entries quite fast. Though, you can't >> do that easily with cache stores (without changing the SPI). >> >> Regarding cache stores, IMO we're damned anyway: when calling >> cacheStore.size(), it can report more entries, as some haven't been >> expired yet, or it can report fewer entries, as some can be expired due to >> [1]. Or, we'll enumerate all the entries, and that's going to be slow >> (btw., [1] reminded me that we should enumerate both the datacontainer AND >> the cachestores even if passivation is not enabled). > This is precisely what the distributed iterator does.
And also > support for expired entries was recently integrated as I missed that > in the original implementation [a] > > [a] https://issues.jboss.org/browse/ISPN-4643 > >> Radim >> >> [1] https://issues.jboss.org/browse/ISPN-3202 >> >> On 10/08/2014 04:42 PM, William Burns wrote: >>> So it seems we would want to change this for 7.0 if possible since it >>> would be a bigger change for something like 7.1 and 8.0 would be even >>> further out. I should be able to put this together for CR2. >>> >>> It seems that we want to implement keySet, values and entrySet methods >>> using the entry iterator approach. >>> >>> It is however unclear for the size method if we want to use MR entry >>> counting and not worry about the rehash and passivation issues since >>> it is just an estimation anyways. Or if we want to also use the entry >>> iterator which should be closer approximation but will require more >>> network overhead and memory usage. >>> >>> Also we didn't really talk about the fact that these methods would >>> ignore ongoing transactions and if that is a concern or not. >>> >>> - Will >>> >>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >>>> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >>>> >>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >>>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> recently we had a discussion about what size() returns, but I've >>>>>> realized there are more things that users would like to know. My >>>>>> question is whether you think that they would really appreciate it, or >>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>> >>>>>> There are those sizes: >>>>>> A) number of owned entries >>>>>> B) number of entries stored locally in memory >>>>>> C) number of entries stored in each local cache store >>>>>> D) number of entries stored in each shared cache store >>>>>> E) total number of entries in cache >>>>>> >>>>>> So far, we can get >>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>> E via distributed iterators / MR >>>>>> A via data container iteration + distribution manager query, but only >>>>>> without cache store >>>>>> C or D through >>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>> >>>>>> I think that it would go along with users' expectations if size() >>>>>> returned E and for the rest we should have special methods on >>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>> I'd say that finally to something that has firm meaning. >>>>>> >>>>>> WDYT? >>>>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>>>> - they are approximate (data changes during iteration) >>>>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>>>> >>>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. 
everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >>>>> >>>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >>>> Yes, that's what I meant as well. >>>> >>>> Cheers, >>>> -- >>>> Mircea Markus >>>> Infinispan lead (www.infinispan.org) >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From ttarrant at redhat.com Fri Oct 10 10:06:18 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 10 Oct 2014 16:06:18 +0200 Subject: [infinispan-dev] About size() In-Reply-To: <5437E724.4040705@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> Message-ID: <5437E7DA.6060101@redhat.com> What's wrong with sum(Datacontainer.size())/numOwners ? Tristan On 10/10/14 16:03, Radim Vansa wrote: > On 10/10/2014 02:38 PM, William Burns wrote: >> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa wrote: >>> Users expect that size() will be constant-time (or linear to cluster >>> size), and generally fast operation. I'd prefer to keep it that way. >>> Though, even the MR way (used for HotRod size() now) needs to crawl >>> through all the entries locally. >> Many in memory collections require O(n) to do size such as >> ConcurrentLinkedQueue, so I wouldn't say size should always be >> expected to be constant time or O(c) where c is # of nodes. Granted a >> user can expect anything they want. > OK, I stand corrected. Moreover, I was generalizing myself to all users, > a common mistake :) > > Anyway, monitoring tools love nice charts, and I can imagine monitoring > software polling every 1 second to update that cool chart with cache > size. Do we want a fast but imprecise variant of this operation in some > statistics class? > > Radim > >>> 'Heretic, not very well though of and changing too many things' idea: >>> what about having data container segment-aware? Then you'd just bcast >>> SizeCommand with given topologyId and sum up sizes of primary-owned >>> segments... It's not a complete solution, but at least that would enable >>> to get the number of locally owned entries quite fast. Though, you can't >>> do that easily with cache stores (without changing SPI). 
>>> >>> Regarding cache stores, IMO we're damned anyway: when calling >>> cacheStore.size(), it can report more entries as those haven't been >>> expired yet, it can report less entries as those can be expired due to >>> [1]. Or, we'll enumerate all the entries, and that's going to be slow >>> (btw., [1] reminded me that we should enumerate both datacontainer AND >>> cachestores even if passivation is not enabled). >> This is precisely what the distributed iterator does. And also >> support for expired entries was recently integrated as I missed that >> in the original implementation [a] >> >> [a] https://issues.jboss.org/browse/ISPN-4643 >> >>> Radim >>> >>> [1] https://issues.jboss.org/browse/ISPN-3202 >>> >>> On 10/08/2014 04:42 PM, William Burns wrote: >>>> So it seems we would want to change this for 7.0 if possible since it >>>> would be a bigger change for something like 7.1 and 8.0 would be even >>>> further out. I should be able to put this together for CR2. >>>> >>>> It seems that we want to implement keySet, values and entrySet methods >>>> using the entry iterator approach. >>>> >>>> It is however unclear for the size method if we want to use MR entry >>>> counting and not worry about the rehash and passivation issues since >>>> it is just an estimation anyways. Or if we want to also use the entry >>>> iterator which should be closer approximation but will require more >>>> network overhead and memory usage. >>>> >>>> Also we didn't really talk about the fact that these methods would >>>> ignore ongoing transactions and if that is a concern or not. >>>> >>>> - Will >>>> >>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >>>>> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >>>>> >>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >>>>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>> realized there are more things that users would like to know. My >>>>>>> question is whether you think that they would really appreciate it, or >>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>>> >>>>>>> There are those sizes: >>>>>>> A) number of owned entries >>>>>>> B) number of entries stored locally in memory >>>>>>> C) number of entries stored in each local cache store >>>>>>> D) number of entries stored in each shared cache store >>>>>>> E) total number of entries in cache >>>>>>> >>>>>>> So far, we can get >>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>> E via distributed iterators / MR >>>>>>> A via data container iteration + distribution manager query, but only >>>>>>> without cache store >>>>>>> C or D through >>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>> >>>>>>> I think that it would go along with users' expectations if size() >>>>>>> returned E and for the rest we should have special methods on >>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>> I'd say that finally to something that has firm meaning. >>>>>>> >>>>>>> WDYT? 
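To make the workarounds above concrete, a minimal sketch against the public API; only B has a plain public call, the others need the internals already quoted:

    import org.infinispan.AdvancedCache;
    import org.infinispan.context.Flag;

    public final class SizeSamples {
        // B: entries held locally in memory, skipping any cache store
        static int localInMemorySize(AdvancedCache<?, ?> cache) {
            return cache.withFlags(Flag.SKIP_CACHE_LOAD).size();
        }

        // Roughly the same number (modulo expired entries), read straight
        // off the data container without passing the interceptor chain
        static int dataContainerSize(AdvancedCache<?, ?> cache) {
            return cache.getDataContainer().size();
        }
    }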
>>>>>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>>>>> - they are approximate (data changes during iteration) >>>>>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>>>>> >>>>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >>>>>> >>>>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >>>>> Yes, that's what I meant as well. >>>>> >>>>> Cheers, >>>>> -- >>>>> Mircea Markus >>>>> Infinispan lead (www.infinispan.org) >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> -- >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > From rvansa at redhat.com Fri Oct 10 10:18:27 2014 From: rvansa at redhat.com (Radim Vansa) Date: Fri, 10 Oct 2014 16:18:27 +0200 Subject: [infinispan-dev] About size() In-Reply-To: <5437E7DA.6060101@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> Message-ID: <5437EAB3.8030202@redhat.com> That we should expose that as one method, not forcing people to implement the sum() themselves. And possibly cachestores, again. Radim On 10/10/2014 04:06 PM, Tristan Tarrant wrote: > What's wrong with sum(Datacontainer.size())/numOwners ? > > Tristan > > On 10/10/14 16:03, Radim Vansa wrote: >> On 10/10/2014 02:38 PM, William Burns wrote: >>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa wrote: >>>> Users expect that size() will be constant-time (or linear to cluster >>>> size), and generally fast operation. I'd prefer to keep it that way. >>>> Though, even the MR way (used for HotRod size() now) needs to crawl >>>> through all the entries locally. >>> Many in memory collections require O(n) to do size such as >>> ConcurrentLinkedQueue, so I wouldn't say size should always be >>> expected to be constant time or O(c) where c is # of nodes. Granted a >>> user can expect anything they want. >> OK, I stand corrected. 
Moreover, I was generalizing myself to all users, >> a common mistake :) >> >> Anyway, monitoring tools love nice charts, and I can imagine monitoring >> software polling every 1 second to update that cool chart with cache >> size. Do we want a fast but imprecise variant of this operation in some >> statistics class? >> >> Radim >> >>>> 'Heretic, not very well though of and changing too many things' idea: >>>> what about having data container segment-aware? Then you'd just bcast >>>> SizeCommand with given topologyId and sum up sizes of primary-owned >>>> segments... It's not a complete solution, but at least that would enable >>>> to get the number of locally owned entries quite fast. Though, you can't >>>> do that easily with cache stores (without changing SPI). >>>> >>>> Regarding cache stores, IMO we're damned anyway: when calling >>>> cacheStore.size(), it can report more entries as those haven't been >>>> expired yet, it can report less entries as those can be expired due to >>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow >>>> (btw., [1] reminded me that we should enumerate both datacontainer AND >>>> cachestores even if passivation is not enabled). >>> This is precisely what the distributed iterator does. And also >>> support for expired entries was recently integrated as I missed that >>> in the original implementation [a] >>> >>> [a] https://issues.jboss.org/browse/ISPN-4643 >>> >>>> Radim >>>> >>>> [1] https://issues.jboss.org/browse/ISPN-3202 >>>> >>>> On 10/08/2014 04:42 PM, William Burns wrote: >>>>> So it seems we would want to change this for 7.0 if possible since it >>>>> would be a bigger change for something like 7.1 and 8.0 would be even >>>>> further out. I should be able to put this together for CR2. >>>>> >>>>> It seems that we want to implement keySet, values and entrySet methods >>>>> using the entry iterator approach. >>>>> >>>>> It is however unclear for the size method if we want to use MR entry >>>>> counting and not worry about the rehash and passivation issues since >>>>> it is just an estimation anyways. Or if we want to also use the entry >>>>> iterator which should be closer approximation but will require more >>>>> network overhead and memory usage. >>>>> >>>>> Also we didn't really talk about the fact that these methods would >>>>> ignore ongoing transactions and if that is a concern or not. >>>>> >>>>> - Will >>>>> >>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >>>>>> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >>>>>> >>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>>> realized there are more things that users would like to know. My >>>>>>>> question is whether you think that they would really appreciate it, or >>>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>>>> >>>>>>>> There are those sizes: >>>>>>>> A) number of owned entries >>>>>>>> B) number of entries stored locally in memory >>>>>>>> C) number of entries stored in each local cache store >>>>>>>> D) number of entries stored in each shared cache store >>>>>>>> E) total number of entries in cache >>>>>>>> >>>>>>>> So far, we can get >>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>>> (passivation ? 
B : 0) + firstNonZero(C, D) via size() >>>>>>>> E via distributed iterators / MR >>>>>>>> A via data container iteration + distribution manager query, but only >>>>>>>> without cache store >>>>>>>> C or D through >>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>>> >>>>>>>> I think that it would go along with users' expectations if size() >>>>>>>> returned E and for the rest we should have special methods on >>>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>>> I'd say that finally to something that has firm meaning. >>>>>>>> >>>>>>>> WDYT? >>>>>>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>>>>>> - they are approximate (data changes during iteration) >>>>>>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>>>>>> >>>>>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >>>>>>> >>>>>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >>>>>> Yes, that's what I meant as well. 
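As an illustration of the sum-and-divide idea, a sketch over the distributed executor; LocalSizeTask is our own class, not an Infinispan one, and the result is only an estimate (rehashes in flight skew it):

    import java.io.Serializable;
    import java.util.Set;
    import java.util.concurrent.Future;
    import org.infinispan.Cache;
    import org.infinispan.distexec.DefaultExecutorService;
    import org.infinispan.distexec.DistributedCallable;
    import org.infinispan.distexec.DistributedExecutorService;

    public class LocalSizeTask implements DistributedCallable<Object, Object, Integer>, Serializable {
        private transient Cache<Object, Object> cache;

        @Override
        public void setEnvironment(Cache<Object, Object> cache, Set<Object> inputKeys) {
            this.cache = cache;
        }

        @Override
        public Integer call() {
            // local, in-memory entries only; no cache store is consulted here
            return cache.getAdvancedCache().getDataContainer().size();
        }

        static long estimatedClusterSize(Cache<Object, Object> cache) throws Exception {
            DistributedExecutorService des = new DefaultExecutorService(cache);
            long sum = 0;
            for (Future<Integer> f : des.submitEverywhere(new LocalSizeTask()))
                sum += f.get();
            int numOwners = cache.getCacheConfiguration().clustering().hash().numOwners();
            return sum / numOwners; // each entry is counted once per owner
        }
    }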
>>>>>> >>>>>> Cheers, >>>>>> -- >>>>>> Mircea Markus >>>>>> Infinispan lead (www.infinispan.org) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> -- >>>> Radim Vansa >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From mmarkus at redhat.com Fri Oct 10 10:20:52 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 10 Oct 2014 15:20:52 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <5437EAB3.8030202@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> <5437EAB3.8030202@redhat.com> Message-ID: <6F085B91-2882-4636-BDC4-1E47815B05D0@redhat.com> On Oct 10, 2014, at 15:18, Radim Vansa wrote: > That we should expose that as one method, not forcing people to > implement the sum() themselves. Hmm, isn't the method you mention cache.size() ? :-) > > And possibly cachestores, again. AdvancedCacheLoader has size. > > Radim > > On 10/10/2014 04:06 PM, Tristan Tarrant wrote: >> What's wrong with sum(Datacontainer.size())/numOwners ? >> >> Tristan >> >> On 10/10/14 16:03, Radim Vansa wrote: >>> On 10/10/2014 02:38 PM, William Burns wrote: >>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa wrote: >>>>> Users expect that size() will be constant-time (or linear to cluster >>>>> size), and generally fast operation. I'd prefer to keep it that way. >>>>> Though, even the MR way (used for HotRod size() now) needs to crawl >>>>> through all the entries locally. >>>> Many in memory collections require O(n) to do size such as >>>> ConcurrentLinkedQueue, so I wouldn't say size should always be >>>> expected to be constant time or O(c) where c is # of nodes. Granted a >>>> user can expect anything they want. >>> OK, I stand corrected. Moreover, I was generalizing myself to all users, >>> a common mistake :) >>> >>> Anyway, monitoring tools love nice charts, and I can imagine monitoring >>> software polling every 1 second to update that cool chart with cache >>> size. Do we want a fast but imprecise variant of this operation in some >>> statistics class? >>> >>> Radim >>> >>>>> 'Heretic, not very well though of and changing too many things' idea: >>>>> what about having data container segment-aware? Then you'd just bcast >>>>> SizeCommand with given topologyId and sum up sizes of primary-owned >>>>> segments... It's not a complete solution, but at least that would enable >>>>> to get the number of locally owned entries quite fast. 
Though, you can't >>>>> do that easily with cache stores (without changing SPI). >>>>> >>>>> Regarding cache stores, IMO we're damned anyway: when calling >>>>> cacheStore.size(), it can report more entries as those haven't been >>>>> expired yet, it can report less entries as those can be expired due to >>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow >>>>> (btw., [1] reminded me that we should enumerate both datacontainer AND >>>>> cachestores even if passivation is not enabled). >>>> This is precisely what the distributed iterator does. And also >>>> support for expired entries was recently integrated as I missed that >>>> in the original implementation [a] >>>> >>>> [a] https://issues.jboss.org/browse/ISPN-4643 >>>> >>>>> Radim >>>>> >>>>> [1] https://issues.jboss.org/browse/ISPN-3202 >>>>> >>>>> On 10/08/2014 04:42 PM, William Burns wrote: >>>>>> So it seems we would want to change this for 7.0 if possible since it >>>>>> would be a bigger change for something like 7.1 and 8.0 would be even >>>>>> further out. I should be able to put this together for CR2. >>>>>> >>>>>> It seems that we want to implement keySet, values and entrySet methods >>>>>> using the entry iterator approach. >>>>>> >>>>>> It is however unclear for the size method if we want to use MR entry >>>>>> counting and not worry about the rehash and passivation issues since >>>>>> it is just an estimation anyways. Or if we want to also use the entry >>>>>> iterator which should be closer approximation but will require more >>>>>> network overhead and memory usage. >>>>>> >>>>>> Also we didn't really talk about the fact that these methods would >>>>>> ignore ongoing transactions and if that is a concern or not. >>>>>> >>>>>> - Will >>>>>> >>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >>>>>>> >>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>>>> realized there are more things that users would like to know. My >>>>>>>>> question is whether you think that they would really appreciate it, or >>>>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>>>>> >>>>>>>>> There are those sizes: >>>>>>>>> A) number of owned entries >>>>>>>>> B) number of entries stored locally in memory >>>>>>>>> C) number of entries stored in each local cache store >>>>>>>>> D) number of entries stored in each shared cache store >>>>>>>>> E) total number of entries in cache >>>>>>>>> >>>>>>>>> So far, we can get >>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>>>> E via distributed iterators / MR >>>>>>>>> A via data container iteration + distribution manager query, but only >>>>>>>>> without cache store >>>>>>>>> C or D through >>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>>>> >>>>>>>>> I think that it would go along with users' expectations if size() >>>>>>>>> returned E and for the rest we should have special methods on >>>>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>>>> I'd say that finally to something that has firm meaning. >>>>>>>>> >>>>>>>>> WDYT? 
>>>>>>>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>>>>>>> - they are approximate (data changes during iteration) >>>>>>>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>>>>>>> >>>>>>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. >>>>>>>> >>>>>>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >>>>>>> Yes, that's what I meant as well. >>>>>>> >>>>>>> Cheers, >>>>>>> -- >>>>>>> Mircea Markus >>>>>>> Infinispan lead (www.infinispan.org) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> -- >>>>> Radim Vansa >>>>> JBoss DataGrid QA >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From dan.berindei at gmail.com Fri Oct 10 10:23:35 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 10 Oct 2014 17:23:35 +0300 Subject: [infinispan-dev] About size() In-Reply-To: <5437E7DA.6060101@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> Message-ID: Exactly, in a monitoring application you wouldn't need the exact number of key-value mappings in the cache. The number of entries in memory and/or on disk should be much more interesting, and we wouldn't have to worry about duplicated/missing/expired entries to show that. On Fri, Oct 10, 2014 at 5:06 PM, Tristan Tarrant wrote: > What's wrong with sum(Datacontainer.size())/numOwners ? 
> > Tristan > > On 10/10/14 16:03, Radim Vansa wrote: > > On 10/10/2014 02:38 PM, William Burns wrote: > >> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa wrote: > >>> Users expect that size() will be constant-time (or linear to cluster > >>> size), and generally fast operation. I'd prefer to keep it that way. > >>> Though, even the MR way (used for HotRod size() now) needs to crawl > >>> through all the entries locally. > >> Many in memory collections require O(n) to do size such as > >> ConcurrentLinkedQueue, so I wouldn't say size should always be > >> expected to be constant time or O(c) where c is # of nodes. Granted a > >> user can expect anything they want. > > OK, I stand corrected. Moreover, I was generalizing myself to all users, > > a common mistake :) > > > > Anyway, monitoring tools love nice charts, and I can imagine monitoring > > software polling every 1 second to update that cool chart with cache > > size. Do we want a fast but imprecise variant of this operation in some > > statistics class? > > > > Radim > > > >>> 'Heretic, not very well though of and changing too many things' idea: > >>> what about having data container segment-aware? Then you'd just bcast > >>> SizeCommand with given topologyId and sum up sizes of primary-owned > >>> segments... It's not a complete solution, but at least that would > enable > >>> to get the number of locally owned entries quite fast. Though, you > can't > >>> do that easily with cache stores (without changing SPI). > >>> > >>> Regarding cache stores, IMO we're damned anyway: when calling > >>> cacheStore.size(), it can report more entries as those haven't been > >>> expired yet, it can report less entries as those can be expired due to > >>> [1]. Or, we'll enumerate all the entries, and that's going to be slow > >>> (btw., [1] reminded me that we should enumerate both datacontainer AND > >>> cachestores even if passivation is not enabled). > >> This is precisely what the distributed iterator does. And also > >> support for expired entries was recently integrated as I missed that > >> in the original implementation [a] > >> > >> [a] https://issues.jboss.org/browse/ISPN-4643 > >> > >>> Radim > >>> > >>> [1] https://issues.jboss.org/browse/ISPN-3202 > >>> > >>> On 10/08/2014 04:42 PM, William Burns wrote: > >>>> So it seems we would want to change this for 7.0 if possible since it > >>>> would be a bigger change for something like 7.1 and 8.0 would be even > >>>> further out. I should be able to put this together for CR2. > >>>> > >>>> It seems that we want to implement keySet, values and entrySet methods > >>>> using the entry iterator approach. > >>>> > >>>> It is however unclear for the size method if we want to use MR entry > >>>> counting and not worry about the rehash and passivation issues since > >>>> it is just an estimation anyways. Or if we want to also use the entry > >>>> iterator which should be closer approximation but will require more > >>>> network overhead and memory usage. > >>>> > >>>> Also we didn't really talk about the fact that these methods would > >>>> ignore ongoing transactions and if that is a concern or not. 
> >>>> > >>>> - Will > >>>> > >>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus > wrote: > >>>>> On Oct 8, 2014, at 15:11, Dan Berindei > wrote: > >>>>> > >>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus > wrote: > >>>>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: > >>>>>> > >>>>>>> Hi, > >>>>>>> > >>>>>>> recently we had a discussion about what size() returns, but I've > >>>>>>> realized there are more things that users would like to know. My > >>>>>>> question is whether you think that they would really appreciate > it, or > >>>>>>> whether it's just my QA point of view where I sometimes compute the > >>>>>>> 'checksums' of cache to see if I didn't lost anything. > >>>>>>> > >>>>>>> There are those sizes: > >>>>>>> A) number of owned entries > >>>>>>> B) number of entries stored locally in memory > >>>>>>> C) number of entries stored in each local cache store > >>>>>>> D) number of entries stored in each shared cache store > >>>>>>> E) total number of entries in cache > >>>>>>> > >>>>>>> So far, we can get > >>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() > >>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() > >>>>>>> E via distributed iterators / MR > >>>>>>> A via data container iteration + distribution manager query, but > only > >>>>>>> without cache store > >>>>>>> C or D through > >>>>>>> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >>>>>>> > >>>>>>> I think that it would go along with users' expectations if size() > >>>>>>> returned E and for the rest we should have special methods on > >>>>>>> AdvancedCache. That would of course change the meaning of size(), > but > >>>>>>> I'd say that finally to something that has firm meaning. > >>>>>>> > >>>>>>> WDYT? > >>>>>> There was a lot of arguments in past whether size() and other > methods that operate over all the elements (keySet, values) are useful > because: > >>>>>> - they are approximate (data changes during iteration) > >>>>>> - they are very resource consuming and might be miss-used (this is > the reason we chosen to use size() with its current local semantic) > >>>>>> > >>>>>> These methods (size, keys, values) are useful for people and I > think we were not wise to implement them only on top of the local data: > this is like preferring efficiency over correctness. This also created a > lot of confusion with our users, question like size() doesn't return the > correct value being asked regularly. I totally agree that size() returns E > (i.e. everything that is stored within the grid, including persistence) and > it's performance implications to be documented accordingly. For keySet and > values - we should stop implementing them (throw exception) and point users > to Will's distributed iterator which is a nicer way to achieve the desired > behavior. > >>>>>> > >>>>>> We can also implement keySet() and values() on top of the > distributed entry iterator and document that using the iterator directly is > better. > >>>>> Yes, that's what I meant as well. 
> >>>>> > >>>>> Cheers, > >>>>> -- > >>>>> Mircea Markus > >>>>> Infinispan lead (www.infinispan.org) > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> infinispan-dev mailing list > >>>>> infinispan-dev at lists.jboss.org > >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> infinispan-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> -- > >>> Radim Vansa > >>> JBoss DataGrid QA > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/09687daf/attachment-0001.html From dan.berindei at gmail.com Fri Oct 10 10:25:55 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 10 Oct 2014 17:25:55 +0300 Subject: [infinispan-dev] About size() In-Reply-To: <6F085B91-2882-4636-BDC4-1E47815B05D0@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> <5437EAB3.8030202@redhat.com> <6F085B91-2882-4636-BDC4-1E47815B05D0@redhat.com> Message-ID: On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus wrote: > > On Oct 10, 2014, at 15:18, Radim Vansa wrote: > > > That we should expose that as one method, not forcing people to > > implement the sum() themselves. > > Hmm, isn't the method you mention cache.size() ? :-) > Nope, because we decided to make cache.size() precise-but-slow :) > > > > > And possibly cachestores, again. > > AdvancedCacheLoader has size. > > > > > Radim > > > > On 10/10/2014 04:06 PM, Tristan Tarrant wrote: > >> What's wrong with sum(Datacontainer.size())/numOwners ? > >> > >> Tristan > >> > >> On 10/10/14 16:03, Radim Vansa wrote: > >>> On 10/10/2014 02:38 PM, William Burns wrote: > >>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa > wrote: > >>>>> Users expect that size() will be constant-time (or linear to cluster > >>>>> size), and generally fast operation. I'd prefer to keep it that way. > >>>>> Though, even the MR way (used for HotRod size() now) needs to crawl > >>>>> through all the entries locally. > >>>> Many in memory collections require O(n) to do size such as > >>>> ConcurrentLinkedQueue, so I wouldn't say size should always be > >>>> expected to be constant time or O(c) where c is # of nodes. Granted a > >>>> user can expect anything they want. > >>> OK, I stand corrected. Moreover, I was generalizing myself to all > users, > >>> a common mistake :) > >>> > >>> Anyway, monitoring tools love nice charts, and I can imagine monitoring > >>> software polling every 1 second to update that cool chart with cache > >>> size. Do we want a fast but imprecise variant of this operation in some > >>> statistics class? 
> >>> > >>> Radim > >>> > >>>>> 'Heretic, not very well though of and changing too many things' idea: > >>>>> what about having data container segment-aware? Then you'd just bcast > >>>>> SizeCommand with given topologyId and sum up sizes of primary-owned > >>>>> segments... It's not a complete solution, but at least that would > enable > >>>>> to get the number of locally owned entries quite fast. Though, you > can't > >>>>> do that easily with cache stores (without changing SPI). > >>>>> > >>>>> Regarding cache stores, IMO we're damned anyway: when calling > >>>>> cacheStore.size(), it can report more entries as those haven't been > >>>>> expired yet, it can report less entries as those can be expired due > to > >>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow > >>>>> (btw., [1] reminded me that we should enumerate both datacontainer > AND > >>>>> cachestores even if passivation is not enabled). > >>>> This is precisely what the distributed iterator does. And also > >>>> support for expired entries was recently integrated as I missed that > >>>> in the original implementation [a] > >>>> > >>>> [a] https://issues.jboss.org/browse/ISPN-4643 > >>>> > >>>>> Radim > >>>>> > >>>>> [1] https://issues.jboss.org/browse/ISPN-3202 > >>>>> > >>>>> On 10/08/2014 04:42 PM, William Burns wrote: > >>>>>> So it seems we would want to change this for 7.0 if possible since > it > >>>>>> would be a bigger change for something like 7.1 and 8.0 would be > even > >>>>>> further out. I should be able to put this together for CR2. > >>>>>> > >>>>>> It seems that we want to implement keySet, values and entrySet > methods > >>>>>> using the entry iterator approach. > >>>>>> > >>>>>> It is however unclear for the size method if we want to use MR entry > >>>>>> counting and not worry about the rehash and passivation issues since > >>>>>> it is just an estimation anyways. Or if we want to also use the > entry > >>>>>> iterator which should be closer approximation but will require more > >>>>>> network overhead and memory usage. > >>>>>> > >>>>>> Also we didn't really talk about the fact that these methods would > >>>>>> ignore ongoing transactions and if that is a concern or not. > >>>>>> > >>>>>> - Will > >>>>>> > >>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus > wrote: > >>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei > wrote: > >>>>>>> > >>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus > wrote: > >>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: > >>>>>>>> > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> recently we had a discussion about what size() returns, but I've > >>>>>>>>> realized there are more things that users would like to know. My > >>>>>>>>> question is whether you think that they would really appreciate > it, or > >>>>>>>>> whether it's just my QA point of view where I sometimes compute > the > >>>>>>>>> 'checksums' of cache to see if I didn't lost anything. > >>>>>>>>> > >>>>>>>>> There are those sizes: > >>>>>>>>> A) number of owned entries > >>>>>>>>> B) number of entries stored locally in memory > >>>>>>>>> C) number of entries stored in each local cache store > >>>>>>>>> D) number of entries stored in each shared cache store > >>>>>>>>> E) total number of entries in cache > >>>>>>>>> > >>>>>>>>> So far, we can get > >>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() > >>>>>>>>> (passivation ? 
B : 0) + firstNonZero(C, D) via size() > >>>>>>>>> E via distributed iterators / MR > >>>>>>>>> A via data container iteration + distribution manager query, but > only > >>>>>>>>> without cache store > >>>>>>>>> C or D through > >>>>>>>>> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >>>>>>>>> > >>>>>>>>> I think that it would go along with users' expectations if size() > >>>>>>>>> returned E and for the rest we should have special methods on > >>>>>>>>> AdvancedCache. That would of course change the meaning of > size(), but > >>>>>>>>> I'd say that finally to something that has firm meaning. > >>>>>>>>> > >>>>>>>>> WDYT? > >>>>>>>> There was a lot of arguments in past whether size() and other > methods that operate over all the elements (keySet, values) are useful > because: > >>>>>>>> - they are approximate (data changes during iteration) > >>>>>>>> - they are very resource consuming and might be miss-used (this > is the reason we chosen to use size() with its current local semantic) > >>>>>>>> > >>>>>>>> These methods (size, keys, values) are useful for people and I > think we were not wise to implement them only on top of the local data: > this is like preferring efficiency over correctness. This also created a > lot of confusion with our users, question like size() doesn't return the > correct value being asked regularly. I totally agree that size() returns E > (i.e. everything that is stored within the grid, including persistence) and > it's performance implications to be documented accordingly. For keySet and > values - we should stop implementing them (throw exception) and point users > to Will's distributed iterator which is a nicer way to achieve the desired > behavior. > >>>>>>>> > >>>>>>>> We can also implement keySet() and values() on top of the > distributed entry iterator and document that using the iterator directly is > better. > >>>>>>> Yes, that's what I meant as well. 
> >>>>>>> > >>>>>>> Cheers, > >>>>>>> -- > >>>>>>> Mircea Markus > >>>>>>> Infinispan lead (www.infinispan.org) > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> infinispan-dev mailing list > >>>>>>> infinispan-dev at lists.jboss.org > >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>>> _______________________________________________ > >>>>>> infinispan-dev mailing list > >>>>>> infinispan-dev at lists.jboss.org > >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>> -- > >>>>> Radim Vansa > >>>>> JBoss DataGrid QA > >>>>> > >>>>> _______________________________________________ > >>>>> infinispan-dev mailing list > >>>>> infinispan-dev at lists.jboss.org > >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> infinispan-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Radim Vansa > > JBoss DataGrid QA > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/cb48fd01/attachment.html From vblagoje at redhat.com Fri Oct 10 11:13:05 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 10 Oct 2014 11:13:05 -0400 Subject: [infinispan-dev] TopologySafe Map / Reduce In-Reply-To: References: <5436FB16.3000003@infinispan.org> Message-ID: <5437F781.8020808@redhat.com> On 2014-10-10, 3:03 AM, Dan Berindei wrote: > > The problem is that the intermediate keys aren't in the same segment: > we want the reduce phase to access only keys local to the reducing > node, and keys in different input segments can yield values for the > same intermediate key. So like you say, we'd have to retry on every > topology change in the intermediary cache, not just the ones affecting > segment _i_. > If we have to retry for all segments on every topology change then I am not sure why it would make sense to work on this optimization and topology handling mechanism at all. We have to handle the cases where one node might have completed the map phase and inserted deltas, while another has only started inserting deltas, and a third is still in the map phase and has not inserted any deltas at all. The same applies to the reduce portion. It seems to me that in the end any algorithm we come up with will not be much better than: detect topology change, retry the map/reduce job.
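For reference, the baseline Vladimir describes (detect topology change, retry the whole job) can be sketched as a small wrapper; topologyId() is a hypothetical hook standing in for whatever the DistributionManager exposes, not an existing API:

    import java.util.concurrent.Callable;

    abstract class TopologyRetry {
        abstract int topologyId(); // hypothetical: the current cache topology id

        <R> R run(Callable<R> wholeJob, int maxAttempts) throws Exception {
            for (int i = 0; i < maxAttempts; i++) {
                int before = topologyId();
                R result = wholeJob.call();     // the complete map/reduce job
                if (topologyId() == before)
                    return result;              // no rehash overlapped the job
                // a rehash may have left partial deltas in the intermediate
                // cache, so discard everything and run the whole job again
            }
            throw new IllegalStateException("Topology kept changing; giving up");
        }
    }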
Vladimir From ttarrant at redhat.com Fri Oct 10 11:22:40 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 10 Oct 2014 17:22:40 +0200 Subject: [infinispan-dev] About size() In-Reply-To: <5437EAB3.8030202@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> <5437EAB3.8030202@redhat.com> Message-ID: <5437F9C0.8070200@redhat.com> I meant as a quick implementation of that, not that we want to tell users to do it. Tristan On 10/10/14 16:18, Radim Vansa wrote: > That we should expose that as one method, not forcing people to > implement the sum() themselves. > > And possibly cachestores, again. > > Radim > > On 10/10/2014 04:06 PM, Tristan Tarrant wrote: >> What's wrong with sum(Datacontainer.size())/numOwners ? >> >> Tristan >> >> On 10/10/14 16:03, Radim Vansa wrote: >>> On 10/10/2014 02:38 PM, William Burns wrote: >>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa wrote: >>>>> Users expect that size() will be constant-time (or linear to cluster >>>>> size), and generally fast operation. I'd prefer to keep it that way. >>>>> Though, even the MR way (used for HotRod size() now) needs to crawl >>>>> through all the entries locally. >>>> Many in memory collections require O(n) to do size such as >>>> ConcurrentLinkedQueue, so I wouldn't say size should always be >>>> expected to be constant time or O(c) where c is # of nodes. Granted a >>>> user can expect anything they want. >>> OK, I stand corrected. Moreover, I was generalizing myself to all users, >>> a common mistake :) >>> >>> Anyway, monitoring tools love nice charts, and I can imagine monitoring >>> software polling every 1 second to update that cool chart with cache >>> size. Do we want a fast but imprecise variant of this operation in some >>> statistics class? >>> >>> Radim >>> >>>>> 'Heretic, not very well though of and changing too many things' idea: >>>>> what about having data container segment-aware? Then you'd just bcast >>>>> SizeCommand with given topologyId and sum up sizes of primary-owned >>>>> segments... It's not a complete solution, but at least that would enable >>>>> to get the number of locally owned entries quite fast. Though, you can't >>>>> do that easily with cache stores (without changing SPI). >>>>> >>>>> Regarding cache stores, IMO we're damned anyway: when calling >>>>> cacheStore.size(), it can report more entries as those haven't been >>>>> expired yet, it can report less entries as those can be expired due to >>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow >>>>> (btw., [1] reminded me that we should enumerate both datacontainer AND >>>>> cachestores even if passivation is not enabled). >>>> This is precisely what the distributed iterator does. And also >>>> support for expired entries was recently integrated as I missed that >>>> in the original implementation [a] >>>> >>>> [a] https://issues.jboss.org/browse/ISPN-4643 >>>> >>>>> Radim >>>>> >>>>> [1] https://issues.jboss.org/browse/ISPN-3202 >>>>> >>>>> On 10/08/2014 04:42 PM, William Burns wrote: >>>>>> So it seems we would want to change this for 7.0 if possible since it >>>>>> would be a bigger change for something like 7.1 and 8.0 would be even >>>>>> further out. I should be able to put this together for CR2. 
>>>>>> >>>>>> It seems that we want to implement keySet, values and entrySet methods >>>>>> using the entry iterator approach. >>>>>> >>>>>> It is however unclear for the size method if we want to use MR entry >>>>>> counting and not worry about the rehash and passivation issues since >>>>>> it is just an estimation anyways. Or if we want to also use the entry >>>>>> iterator which should be closer approximation but will require more >>>>>> network overhead and memory usage. >>>>>> >>>>>> Also we didn't really talk about the fact that these methods would >>>>>> ignore ongoing transactions and if that is a concern or not. >>>>>> >>>>>> - Will >>>>>> >>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus wrote: >>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei wrote: >>>>>>> >>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus wrote: >>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>>>> realized there are more things that users would like to know. My >>>>>>>>> question is whether you think that they would really appreciate it, or >>>>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>>>>> >>>>>>>>> There are those sizes: >>>>>>>>> A) number of owned entries >>>>>>>>> B) number of entries stored locally in memory >>>>>>>>> C) number of entries stored in each local cache store >>>>>>>>> D) number of entries stored in each shared cache store >>>>>>>>> E) total number of entries in cache >>>>>>>>> >>>>>>>>> So far, we can get >>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>>>> E via distributed iterators / MR >>>>>>>>> A via data container iteration + distribution manager query, but only >>>>>>>>> without cache store >>>>>>>>> C or D through >>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>>>> >>>>>>>>> I think that it would go along with users' expectations if size() >>>>>>>>> returned E and for the rest we should have special methods on >>>>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>>>> I'd say that finally to something that has firm meaning. >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>> There was a lot of arguments in past whether size() and other methods that operate over all the elements (keySet, values) are useful because: >>>>>>>> - they are approximate (data changes during iteration) >>>>>>>> - they are very resource consuming and might be miss-used (this is the reason we chosen to use size() with its current local semantic) >>>>>>>> >>>>>>>> These methods (size, keys, values) are useful for people and I think we were not wise to implement them only on top of the local data: this is like preferring efficiency over correctness. This also created a lot of confusion with our users, question like size() doesn't return the correct value being asked regularly. I totally agree that size() returns E (i.e. everything that is stored within the grid, including persistence) and it's performance implications to be documented accordingly. For keySet and values - we should stop implementing them (throw exception) and point users to Will's distributed iterator which is a nicer way to achieve the desired behavior. 
>>>>>>>> >>>>>>>> We can also implement keySet() and values() on top of the distributed entry iterator and document that using the iterator directly is better. >>>>>>> Yes, that's what I meant as well. >>>>>>> >>>>>>> Cheers, >>>>>>> -- >>>>>>> Mircea Markus >>>>>>> Infinispan lead (www.infinispan.org) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> -- >>>>> Radim Vansa >>>>> JBoss DataGrid QA >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Fri Oct 10 12:06:53 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 10 Oct 2014 19:06:53 +0300 Subject: [infinispan-dev] TopologySafe Map / Reduce In-Reply-To: <5437F781.8020808@redhat.com> References: <5436FB16.3000003@infinispan.org> <5437F781.8020808@redhat.com> Message-ID: On Fri, Oct 10, 2014 at 6:13 PM, Vladimir Blagojevic wrote: > On 2014-10-10, 3:03 AM, Dan Berindei wrote: > > > > The problem is that the intermediate keys aren't in the same segment: > > we want the reduce phase to access only keys local to the reducing > > node, and keys in different input segments can yield values for the > > same intermediate key. So like you say, we'd have to retry on every > > topology change in the intermediary cache, not just the ones affecting > > segment _i_. > > > If we have to retry for all segments on every topology change, then I am > not sure why it would make sense to work on this optimization and > topology handling mechanism at all. We have to handle the cases where > one node might have completed the map phase and inserted deltas, while another > has only started inserting deltas, and a third one is still doing the > map phase and has not inserted any deltas at all. The same goes for the > reduce portion. It seems to me that in the end any algorithm we come up > with will not be much better than: detect topology change, retry > map/reduce job. > Initially that was my thinking as well. But if the originator invokes the map/combine phase for only one segment at a time, it will have to retry only one segment per cluster node, not all the segments. And each node would write to separate keys in the intermediate cache, making it easy to clean up only one node's work. So it would still be worth it, as usually numSegments >> clusterSize. Plus we don't need this broad retry strategy if the intermediate cache is transactional (I think). The biggest downside I see is that it would be horribly slow if the cache store doesn't support efficient iteration of a single segment. So we might want to implement a full retry strategy as well, if some cache stores can't support that.
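To make the per-segment retry concrete, here is a rough sketch of the originator-side loop I have in mind. Every type and name below is invented purely for illustration; none of this is existing Infinispan API:

interface SegmentMapTasks {
   // Runs map/combine for one segment on its current primary owner;
   // throws if the owner changed (or died) while the task was running.
   void runMapPhase(int segment) throws OwnershipChangedException;

   // Wipes only the intermediate values produced for this segment.
   void discardIntermediateValues(int segment);
}

class OwnershipChangedException extends Exception {}

final class PerSegmentCoordinator {
   private final SegmentMapTasks tasks;

   PerSegmentCoordinator(SegmentMapTasks tasks) {
      this.tasks = tasks;
   }

   // Retries individual segments instead of restarting the whole job.
   void mapAllSegments(int numSegments) {
      java.util.BitSet remaining = new java.util.BitSet(numSegments);
      remaining.set(0, numSegments);
      while (!remaining.isEmpty()) {
         for (int s = remaining.nextSetBit(0); s >= 0; s = remaining.nextSetBit(s + 1)) {
            try {
               tasks.runMapPhase(s);
               remaining.clear(s); // this segment is done for good
            } catch (OwnershipChangedException e) {
               tasks.discardIntermediateValues(s); // redo only this segment
            }
         }
      }
   }
}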
Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141010/4191fde3/attachment.html From mudokonman at gmail.com Fri Oct 10 12:30:54 2014 From: mudokonman at gmail.com (William Burns) Date: Fri, 10 Oct 2014 12:30:54 -0400 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: On Wed, Oct 8, 2014 at 12:23 PM, Dan Berindei wrote: > > > On Wed, Oct 8, 2014 at 6:14 PM, William Burns wrote: >> >> On Wed, Oct 8, 2014 at 10:57 AM, Dan Berindei >> wrote: >> > >> > >> > On Wed, Oct 8, 2014 at 5:42 PM, William Burns >> > wrote: >> >> >> >> So it seems we would want to change this for 7.0 if possible since it >> >> would be a bigger change for something like 7.1 and 8.0 would be even >> >> further out. I should be able to put this together for CR2. >> > >> > >> > I'm not 100% convinced that we need it for 7.x. For 8.0 I would >> > recommend >> > removing the size() method altogether, and providing some looser >> > "statistics" instead. >> >> Yeah I guess I don't know enough about the demand for these methods or >> what people wanted to use them for to know what kind of priority they >> should be given. >> >> It sounds like you are talking about decoupling from the >> Map/ConcurrentMap interface completely then, right? So we would also >> eliminate the other bulk methods (keySet, values, entrySet)? > > > Yes, I would base the Cache interface on JSR-107's Cache, which doesn't have > size() or the other methods. > >> >> >> > >> >> >> >> >> >> It seems that we want to implement keySet, values and entrySet methods >> >> using the entry iterator approach. >> >> >> >> It is however unclear for the size method if we want to use MR entry >> >> counting and not worry about the rehash and passivation issues since >> >> it is just an estimation anyways. Or if we want to also use the entry >> >> iterator which should be closer approximation but will require more >> >> network overhead and memory usage. >> > >> > >> > +1 to use the entry iterator from me, ignoring state transfer we can get >> > some pretty wild fluctuations in the size of the cache. >> >> That is personally my feeling as well, but I tend to err more on the >> side of correctness to begin with. >> >> > We could use a distributed task for Cache.isEmpty() instead of size() == >> > 0, >> > though. >> >> Yes that should be a good optimization either way. >> >> > >> >> >> >> >> >> Also we didn't really talk about the fact that these methods would >> >> ignore ongoing transactions and if that is a concern or not. >> >> >> > >> > It might be a concern for the Hibernate 2LC impl, it was their TCK that >> > prompted the last round of discussions about clear(). >> >> Although I wonder how much these methods are even used since they only >> work for Local, Replication or Invalidation caches in their current >> state (and didn't even use loaders until 6.0). > > > There is some more information about the test in the mailing list discussion > [1] > There's also a JIRA for clear() [2] > > I think 2LC almost never uses distribution, so size() being local-only > didn't matter, but making it non-tx could cause problems - at least for that > particular test. 
I had toyed around with the following idea before, though I never thought of it solely in the scope of the size method: I have a solution that would work mostly for transactional caches. Essentially the size method would always operate in a READ_COMMITTED-like state; using REPEATABLE_READ doesn't seem feasible since we can't keep all the contents in memory. Essentially the iterator would be run, and for each key that is found it would check the context to see if the key is there. If the context entry is marked as removed, it doesn't count the key; if the key is in the context, it marks the key as found and counts it; and if it is not in the context, it counts it. Then after iteration it finds all the keys in the context that were not found and adds them to the count as well. This way it doesn't need additional memory (besides iteration costs), as all the context information is already in memory. My original thought was to also make the EntryIterator transactional in the same way, which also means the keySet, entrySet and values methods could do the same things. The main stumbling block I had was the fact that the iterator and the various collections returned could be used outside of the ongoing transaction, which didn't seem to make much sense to me. But maybe these should be changed to be more like the backing views that HashMap, ConcurrentHashMap etc. use for their methods, where instead they would pick up the transaction if there is one in the current thread, and if there is no transaction just start an implicit one. This however would be a big change from how these collections currently work, in that today they are in-memory copies only. What do you guys think? > > [1] http://lists.jboss.org/pipermail/infinispan-dev/2013-October/013914.html > [2] https://issues.jboss.org/browse/ISPN-3656 > >> >> > >> > We haven't talked about what size(), keySet() and values() should return >> > for >> > an invalidation cache either... I forget, does the distributed entry >> > iterator work with invalidation caches? >> >> It works the same as a local cache, so only the local node contents are >> returned. Replicated does the same thing; distributed is the only >> special case. This was the only thing that made sense to me, but if >> you have any ideas it would be great to hear them, for possibly enhancing >> invalidation iteration. > > Sounds good to me. > > cache.get(k) will search on all the nodes via ClusterLoader, so there is a > certain appeal in making the entry iterator do the same. But invalidation > caches are used with an external (non-CacheLoader) source of data anyway, so > we can never return "all the entries". > >> >> > >> > >> >> >> >> - Will >> >> >> >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus >> >> wrote: >> >> > >> >> > On Oct 8, 2014, at 15:11, Dan Berindei >> >> > wrote: >> >> > >> >> >> >> >> >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus >> >> >> wrote: >> >> >> On Oct 3, 2014, at 9:30, Radim Vansa wrote: >> >> >> >> >> >> > Hi, >> >> >> > >> >> >> > recently we had a discussion about what size() returns, but I've >> >> >> > realized there are more things that users would like to know. My >> >> >> > question is whether you think that they would really appreciate >> >> >> > it, >> >> >> > or >> >> >> > whether it's just my QA point of view where I sometimes compute >> >> >> > the >> >> >> > 'checksums' of cache to see if I didn't lose anything.
>> >> >> > >> >> >> > There are those sizes: >> >> >> > A) number of owned entries >> >> >> > B) number of entries stored locally in memory >> >> >> > C) number of entries stored in each local cache store >> >> >> > D) number of entries stored in each shared cache store >> >> >> > E) total number of entries in cache >> >> >> > >> >> >> > So far, we can get >> >> >> > B via withFlags(SKIP_CACHE_LOAD).size() >> >> >> > (passivation ? B : 0) + firstNonZero(C, D) via size() >> >> >> > E via distributed iterators / MR >> >> >> > A via data container iteration + distribution manager query, but >> >> >> > only >> >> >> > without cache store >> >> >> > C or D through >> >> >> > >> >> >> > >> >> >> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >> >> >> > >> >> >> > I think that it would go along with users' expectations if size() >> >> >> > returned E and for the rest we should have special methods on >> >> >> > AdvancedCache. That would of course change the meaning of size(), >> >> >> > but >> >> >> > I'd say that finally to something that has firm meaning. >> >> >> > >> >> >> > WDYT? >> >> >> >> >> >> There was a lot of arguments in past whether size() and other >> >> >> methods >> >> >> that operate over all the elements (keySet, values) are useful >> >> >> because: >> >> >> - they are approximate (data changes during iteration) >> >> >> - they are very resource consuming and might be miss-used (this is >> >> >> the >> >> >> reason we chosen to use size() with its current local semantic) >> >> >> >> >> >> These methods (size, keys, values) are useful for people and I think >> >> >> we >> >> >> were not wise to implement them only on top of the local data: this >> >> >> is like >> >> >> preferring efficiency over correctness. This also created a lot of >> >> >> confusion >> >> >> with our users, question like size() doesn't return the correct >> >> >> value being >> >> >> asked regularly. I totally agree that size() returns E (i.e. >> >> >> everything that >> >> >> is stored within the grid, including persistence) and it's >> >> >> performance >> >> >> implications to be documented accordingly. For keySet and values - >> >> >> we should >> >> >> stop implementing them (throw exception) and point users to Will's >> >> >> distributed iterator which is a nicer way to achieve the desired >> >> >> behavior. >> >> >> >> >> >> We can also implement keySet() and values() on top of the >> >> >> distributed >> >> >> entry iterator and document that using the iterator directly is >> >> >> better. >> >> > >> >> > Yes, that's what I meant as well. 
>> >> > >> >> > Cheers, >> >> > -- >> >> > Mircea Markus >> >> > Infinispan lead (www.infinispan.org) >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > _______________________________________________ >> >> > infinispan-dev mailing list >> >> > infinispan-dev at lists.jboss.org >> >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > >> > >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Fri Oct 10 13:51:24 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 10 Oct 2014 18:51:24 +0100 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> <5437EAB3.8030202@redhat.com> <6F085B91-2882-4636-BDC4-1E47815B05D0@redhat.com> Message-ID: <82DC8B96-DAE7-42D0-976A-1CEB1EF7A212@redhat.com> On Oct 10, 2014, at 15:25, Dan Berindei wrote: > On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus wrote: > > On Oct 10, 2014, at 15:18, Radim Vansa wrote: > > > That we should expose that as one method, not forcing people to > > implement the sum() themselves. > > Hmm, isn't the method you mention cache.size() ? :-) > > Nope, because we decided to make cache.size() precise-but-slow :) It's not possible to make it precise unless we provide snapshot isolation /MVCC support. IMO the formula Tristan provides is a good enough approximation of the size of the data. And definitely way better than what we currently have. (Looking at CHM.size() they offer an "accurate" size of the map by counting it in a loop and making sure that the size is reproducible. I don't think that's accurate in the general case, though, as you might count intermediate sizes in that loop). Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Fri Oct 10 13:52:19 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 10 Oct 2014 18:52:19 +0100 Subject: [infinispan-dev] About size() In-Reply-To: <5437F9C0.8070200@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> <5437EAB3.8030202@redhat.com> <5437F9C0.8070200@redhat.com> Message-ID: <12870F52-4C81-400F-B2A0-382A690DA94A@redhat.com> On Oct 10, 2014, at 16:22, Tristan Tarrant wrote: > I meant as a quick implementation of that, not that we want to tell > users to do it. +1. That would be accurate enough for practical reasons. 
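For the record, that quick estimate is trivial to prototype on top of the distributed executor we already ship. A minimal sketch (untested, and of course only as good as the formula itself):

import java.io.Serializable;
import java.util.Set;
import java.util.concurrent.Future;
import org.infinispan.Cache;
import org.infinispan.distexec.DefaultExecutorService;
import org.infinispan.distexec.DistributedCallable;

public class SizeEstimator {

   // Returns the number of entries in the local data container of a node.
   static class LocalContainerSize
         implements DistributedCallable<Object, Object, Integer>, Serializable {
      private transient Cache<Object, Object> cache;

      @Override
      public void setEnvironment(Cache<Object, Object> cache, Set<Object> inputKeys) {
         this.cache = cache;
      }

      @Override
      public Integer call() {
         return cache.getAdvancedCache().getDataContainer().size();
      }
   }

   // sum(DataContainer.size()) / numOwners, as discussed above.
   public static long estimatedSize(Cache<Object, Object> cache) throws Exception {
      DefaultExecutorService distExec = new DefaultExecutorService(cache);
      try {
         long sum = 0;
         for (Future<Integer> f : distExec.submitEverywhere(new LocalContainerSize())) {
            sum += f.get();
         }
         int numOwners = cache.getCacheConfiguration().clustering().hash().numOwners();
         return sum / numOwners;
      } finally {
         distExec.shutdown();
      }
   }
}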
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Fri Oct 10 14:01:52 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 10 Oct 2014 19:01:52 +0100 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: On Oct 10, 2014, at 17:30, William Burns wrote: >>>>> Also we didn't really talk about the fact that these methods would >>>>> ignore ongoing transactions and if that is a concern or not. >>>>> >>>> >>>> It might be a concern for the Hibernate 2LC impl, it was their TCK that >>>> prompted the last round of discussions about clear(). >>> >>> Although I wonder how much these methods are even used since they only >>> work for Local, Replication or Invalidation caches in their current >>> state (and didn't even use loaders until 6.0). >> >> >> There is some more information about the test in the mailing list discussion >> [1] >> There's also a JIRA for clear() [2] >> >> I think 2LC almost never uses distribution, so size() being local-only >> didn't matter, but making it non-tx could cause problems - at least for that >> particular test. > > I had toyed around with the following idea before, but I never thought > of it in the scope of the size method solely, but I have a solution > that would work mostly for transactional caches. Essentially the size > method would always operate in a READ_COMMITTED like state, using > REPEATABLE_READ doesn't seem feasible since we can't keep all the > contents in memory. Essentially the iterator would be ran and for > each key that is found it checks the context to see if it is there. > If the context entry is marked as removed it doesn't count the key, if > the key is there it marks the key as found and counts it, and if it is > not found it counts it. Then after iteration it finds all the keys in > the context that were not found and also adds them to the count. This > way it doesn't need to store additional memory (besides iteration > costs) as all the context information is in memory. sounds good to me. > > My original thought was to also make the EntryIterator transactional > in the same way which also means the keySet, entrySet and values > methods could do the same things. The main reason stumbling block I > had was the fact that the iterator and various collections returned > could be used outside of the ongoing transaction which didn't seem to > make much sense to me. But maybe these should be changed to be more > like backing maps which HashMap, ConcurrentHashMap etc use for their > methods, where instead it would pick up the transaction if there is > one in the current thread and if there is no transaction just start an > implicit one. or if they are outside of a transaction to deny progress > This however was a big change from how these > collections work currently in that they are in memory copies only. > > What do you guys think? I think that keeping track of the context entries is a better way of iterating so +1. As you mentioned, we should also make it clear that RC semantic applies. 
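To spell out the bookkeeping Will describes, in sketch form (all names invented; the real implementation would read this information from the invocation context):

import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class TxAwareSize {
   // contextWrites: key -> TRUE if written in this tx, FALSE if removed in it.
   static long size(Iterator<Object> iteratedKeys, Map<Object, Boolean> contextWrites) {
      Set<Object> seenInContext = new HashSet<Object>();
      long count = 0;
      while (iteratedKeys.hasNext()) {
         Object key = iteratedKeys.next();
         Boolean state = contextWrites.get(key);
         if (state == null) {
            count++; // untouched by the transaction
         } else if (state.booleanValue()) {
            count++; // overwritten in the tx, still present
            seenInContext.add(key);
         } else {
            seenInContext.add(key); // removed in the tx: not counted
         }
      }
      // keys created by the tx that the iterator could not see yet
      for (Map.Entry<Object, Boolean> e : contextWrites.entrySet()) {
         if (e.getValue().booleanValue() && !seenInContext.contains(e.getKey())) {
            count++;
         }
      }
      return count;
   }
}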
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From emmanuel at hibernate.org Fri Oct 10 11:49:48 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Fri, 10 Oct 2014 18:49:48 +0300 Subject: [infinispan-dev] TopologySafe Map / Reduce In-Reply-To: References: <5436FB16.3000003@infinispan.org> Message-ID: <20141010154948.GD5052@hibernate.org> When wrestling with the subject, here is what I had in mind. The M/R coordinator node sends the M task per segment to the node where the segment is primary. Each "per-segment" M task is executed and is offered a way to push intermediary results into a temp cache. The intermediary results are stored with a composite key [intermKey-i, seg-j]. The M/R coordinator waits for all M tasks to return. If one does not (timeout, rehash), the following happens: - delete [intermKey-i, seg-i] (that operation could be handled by the new per-segment M task before the map task is effectively started) - ship the M task for that segment-i to the new primary owner of segment-i When all M tasks have returned, the Reduce phase will read all [intermKey-i, *] keys and reduce them. Note that if the reduction phase is itself distributed, we could apply the same per-segment keying and task-shipping split for it. Again, the tricky part is to expose the ability to write to intermediary caches per segment without exposing segments per se, as well as to let someone see a concatenated view of intermKey-i from all segment subkeys during reduction. Thoughts? Dan, I did not quite get what alternative approach you wanted to propose. Care to respin it for a slow brain? :) Emmanuel On Fri 2014-10-10 10:03, Dan Berindei wrote: > > > I'd rather not expose this to the user. Instead, we could split the > > > intermediary values for each key by the source segment, and do the > > > invalidation of the retried segments in our M/R framework (e.g. when we > > > detect that the primary owner at the start of the map/combine phase is > > > not an owner at all at the end). > > > > > > I think we have another problem with the publishing of intermediary > > > values not being idempotent. The default configuration for the > > > intermediate cache is non-transactional, and retrying the put(delta) > > > command after a topology change could add the same intermediate values > > > twice. A transactional intermediary cache should be safe, though, > > > because the tx won't commit on the old owner until the new owner knows > > > about the tx. > > > > can you elaborate on it? > > > > say we have a cache with numOwners=2, owners(k) = [A, B] > C will become the primary owner of k, but for now owners(k) = [A, B, C] > O sends put(delta) to A (the primary) > A sends put(delta) to B, C > B sees a topology change (owners(k) = [C, B]), doesn't apply the delta and > replies with an OutdatedTopologyException > C applies the delta > A resends put(delta) to C (new primary) > C sends put(delta) to B, applies the delta again > > I think it could be solved with versions, I just wanted to point out that > we don't do that now. > > > > > > anyway, I think the retry mechanism should solve it. If we detect a > > topology change (during the iteration of segment _i_) and the segment > > _i_ is moved, then we can cancel the iteration, remove all the > > intermediate values generated in segment _i_ and restart (on the primary > > owner).
> > > > The problem is that the intermediate keys aren't in the same segment: we > want the reduce phase to access only keys local to the reducing node, and > keys in different input segments can yield values for the same intermediate > key. So like you say, we'd have to retry on every topology change in the > intermediary cache, not just the ones affecting segment _i_. > > There's another complication: in the scenario above, O may only get the > topology update with owners(k) = [C, B] after the map/combine phase > completed. So the originator of the M/R job would have to watch for > topology changes seen by any node, and invalidate/retry any input segments > that could have been affected. All that without slowing down the > no-topology-change case too much... > > > > > > > > > > > > > > But before getting ahead of ourselves, what do you thing of the > > general idea? Even without retry framework, this approach would be more > > stable than our current per node approach during topology changes and > > improve dependability. > > > > > > Doing it solely based on segment would remove the possibility of > > > having duplicates. However without a mechanism to send a new request > > > on rehash it would be possible to only find a subset of values (if a > > > segment is removed while iterating on it). From sanne at infinispan.org Fri Oct 10 16:36:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 10 Oct 2014 21:36:35 +0100 Subject: [infinispan-dev] Naming of project modules Message-ID: All, I occasionally have to hard-reset my whole workspace, delete the Eclipse projects, and re-import them, especially when I switch between branches. I have lots of projects, and they are all nicely "grouped" as Eclipse shows projects in alphabetical order, and all projects use a consistent prefix like "hibernate-ogm-" or "wildfly-", etc.. But Infinispan often manages to fool me, as most modules have an "infinispan-" prefix, but not all of them follow this rule so some get to hide out of sight (I literally have hundreds of projects in my primary workspace). Could we please make sure they all have a name starting with "infinispan-" ? If you agree I'm happy to send a PR to fix the couple of exotic ones. Sanne From sanne at infinispan.org Sat Oct 11 12:54:57 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Sat, 11 Oct 2014 17:54:57 +0100 Subject: [infinispan-dev] On ParserRegistry and classloaders In-Reply-To: <53D945B4.7070508@redhat.com> References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com> <53831239.30601@redhat.com> <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com> <53D945B4.7070508@redhat.com> Message-ID: Hi all, sorry it took me quite some time to return on this, but it's still troublesome and quite critical to allow using Infinispan as a module. I'm again trying to use Infinispan's JBoss Modules as a (library) for deployed application, and it fails because of: Caused by: java.lang.ClassNotFoundException: org.infinispan.remoting.transport.jgroups.JGroupsTransport from [Module \"deployment.ModuleMemberRegistrationIT.war:main\" from Service Module Loader]"}} Debugging the application server, the stack is pointing to this method : public TransportConfigurationBuilder defaultTransport() { Transport transport = Util.getInstance(DEFAULT_TRANSPORT, this.getGlobalConfig().getClassLoader()); <------- Failure here transport(transport); return this; } The getClassLoader() function returns a reference to the application deployment (as you can see in the error above). 
This code makes sense I think, but inspecting the Util.getInstance method we see that it's not strictly using the classloaders we pass: it's going to look in a specific sequence as defined by this function: public static ClassLoader[] getClassLoaders(ClassLoader appClassLoader) { return new ClassLoader[] { appClassLoader, // User defined classes OsgiClassLoader.getInstance(), // OSGi bundle context needs to be on top of TCCL, system CL, etc. Util.class.getClassLoader(), // Infinispan classes (not always on TCCL [modular env]) ClassLoader.getSystemClassLoader(), // Used when load time instrumentation is in effect Thread.currentThread().getContextClassLoader() //Used by jboss-as stuff }; } Now to clarify one thing: the deployment does NOT have direct visibility to the Infinispan modules. The application depends on Hibernate Search, and I can't allow Infinispan to leak visibility to the application. So going through the list, none of the classloaders actually have the class from Infinispan Core, because: appClassLoader -> doesn't have access to any Infinispan jar OsgiClassLoader.getInstance() -> OSGi only (right? I'm not too clear on this one.. BTW the OsgiClassLoader.getInstance() is clearly not threadsafe) Util.class.getClassLoader(), // Infinispan classes (not always on TCCL [modular env]) <- (That's the comment in the code) Well NO, this comment was probably correct when Util.class was included in infinispan-core... but now this function just returns the modular classloader of Infinispan Commons! It doesn't expose the JGroupsTransport. I hoped I could configure the ClassLoader, but I've already done that: 1 ClassLoader ispnClassLoader = ParserRegistry.class.getClassLoader(); 2 configurationParser = new ParserRegistry( ispnClassLoader ); // Make sure the Parser is using the module having access to Infinispan Core 3 ConfigurationBuilderHolder builderHolder = configurationParser.parse( is ); // I have to pass a stream so that I can load the resource from the user module 4 patchInfinispanClassLoader( builderHolder ); // AFTER I have the builderHolder, I can use a utility method to correct the classloaders But the above exception is triggered at line *3*, so I have no opportunity to amend any classloader. In summary: defaultTransport() is triggered during XML parsing and ignores any programmatically set classloader. A patch proposal would be to change the following line from the GlobalConfigurationBuilder() constructor: ClassLoader defaultCL = Util.isOSGiContext() ? GlobalConfigurationBuilder.class.getClassLoader() : Thread.currentThread().getContextClassLoader(); to a simple: ClassLoader defaultCL = GlobalConfigurationBuilder.class.getClassLoader(); But the function generating the chain of classloaders probably also needs to be fixed. I suspect it would be much easier if we could agree to: A) accept classloaders on the methods which need them B) stick to the given classloaders, without applying unrequested and surprising overrides My workaround in Search is going to be to set the ContextClassLoader; that resolves my problem, but it's piling up on other workarounds related to Infinispan initialization and classloaders. I'm not opening JIRAs here as I'm not clear on which of these are bugs and how many are intentional... please make that judgement call.
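For reference, the workaround boils down to something like this (the ParserRegistry constructor and parse(InputStream) are the existing API, the rest is plain JDK):

import java.io.InputStream;
import org.infinispan.configuration.parsing.ConfigurationBuilderHolder;
import org.infinispan.configuration.parsing.ParserRegistry;

public class TcclWorkaround {

   // Parses the configuration with the TCCL temporarily switched to the
   // module that can actually see infinispan-core, then restores it.
   public static ConfigurationBuilderHolder parseWithInfinispanTccl(InputStream is) {
      Thread thread = Thread.currentThread();
      ClassLoader previous = thread.getContextClassLoader();
      ClassLoader ispnLoader = ParserRegistry.class.getClassLoader();
      thread.setContextClassLoader(ispnLoader);
      try {
         return new ParserRegistry(ispnLoader).parse(is);
      } finally {
         thread.setContextClassLoader(previous);
      }
   }
}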
Thanks, Sanne On 30 July 2014 20:21, Ion Savin wrote: > Hi Sanne, > > I don't see any changes in the ParserRegistry which would have removed the > behavior you describe (at least looking at the OSGi changes). Can you point > me please to some code which used to work in the past? > > I've found two classes which have some reference to Hibernate in comments > and the factory was removed part of the OSGi changes. Are these perhaps the > changes which you are missing? > > https://github.com/infinispan/infinispan/blob/6.0.x/core/src/main/java/org/infinispan/util/FileLookup.java > > https://github.com/infinispan/infinispan/blob/6.0.x/core/src/main/java/org/infinispan/util/FileLookupFactory.java > > -- > Ion Savin > > > On 07/30/2014 09:17 PM, Mircea Markus wrote: >> >> Ion, Martin - what are your thoughts? >> >> On Jul 29, 2014, at 16:34, Sanne Grinovero wrote: >> >>> All, >>> in Search we wrap the Parser in a decorator which workarounds the >>> classloader limitation. >>> I still think you should fix this, it doesn't matter how/why it was >>> changed. >>> >>> Sanne >>> >>> On 26 May 2014 11:06, Ion Savin wrote: >>>> >>>> Hi Sanne, Galder, >>>> >>>> On 05/23/2014 07:08 PM, Sanne Grinovero wrote: >>>>> >>>>> On 23 May 2014 08:03, Galder Zamarre?o wrote: >>>>>>> >>>>>>> Hey Sanne, >>>>>>> >>>>>>> I?ve looked at ParserRegistry and not sure I see the changes you are >>>>>>> referring to? >>>>>>> >>>>>>>> From what I?ve seen, ParserRegistry has taken class loader in the >>>>>>>> constructor since the start. >>>>> >>>>> Yes, and that was good as we've been using it: it might need >>>>> directions to be pointed at the right modules to load extension >>>>> points. >>>>> >>>>> My problem is not that the constructor takes a ClassLoader, but that >>>>> other options have been removed; essentially in my scenario the module >>>>> containing the extension points does not contain the configuration >>>>> file I want it to load, and the actual classLoader I want the >>>>> CacheManager to use is yet a different one. As explained below, >>>>> assembling a single "catch all" ClassLoader to delegate to all doesn't >>>>> work as some of these actually need to be strictly isolated to prevent >>>>> ambiguities. >>>>> >>>>>>> I suspect you might be referring to classloader related changes as a >>>>>>> result of OSGI integration? >>>>> >>>>> I didn't check but that sounds like a reasonable estimate. >>>> >>>> >>>> I had a look at the OSGi-related changes done for this class and they >>>> don't alter the class interface in any way. The implementation changes >>>> related to FileLookup seem to maintain the same behavior for non-OSGi >>>> contexts also. >>>> >>>> Regards, >>>> Ion Savin >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> Cheers, >> > From dan.berindei at gmail.com Mon Oct 13 03:13:10 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 13 Oct 2014 10:13:10 +0300 Subject: [infinispan-dev] Naming of project modules In-Reply-To: References: Message-ID: +1 On Fri, Oct 10, 2014 at 11:36 PM, Sanne Grinovero wrote: > All, > > I occasionally have to hard-reset my whole workspace, delete the > Eclipse projects, and re-import them, especially when I switch between > branches. 
> > I have lots of projects, and they are all nicely "grouped" as Eclipse > shows projects in alphabetical order, and all projects use a > consistent prefix like "hibernate-ogm-" or "wildfly-", etc.. > > But Infinispan often manages to fool me, as most modules have an > "infinispan-" prefix, but not all of them follow this rule so some get > to hide out of sight (I literally have hundreds of projects in my > primary workspace). > > Could we please make sure they all have a name starting with "infinispan-" > ? > > If you agree I'm happy to send a PR to fix the couple of exotic ones. > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141013/3dbd8f16/attachment.html From dan.berindei at gmail.com Mon Oct 13 04:45:42 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 13 Oct 2014 11:45:42 +0300 Subject: [infinispan-dev] TopologySafe Map / Reduce In-Reply-To: <20141010154948.GD5052@hibernate.org> References: <5436FB16.3000003@infinispan.org> <20141010154948.GD5052@hibernate.org> Message-ID: On Fri, Oct 10, 2014 at 6:49 PM, Emmanuel Bernard wrote: > When wrestling with the subject, here is what I had in mind. > > The M/R coordinator node sends the M task per segment on the node where > the segment is primary. > What's M? Is it just a shorthand for "map", or is it a new parameter that controls the number of map/combine tasks sent at once? > Each "per-segment" M task is executed and is offered the way to push > intermediary results in a temp cache. > Just to be clear, the user-provided mapper and combiner don't know anything about the intermediary cache (which doesn't have to be temporary, if it's shared by all M/R tasks). They only interact with the Collector interface. The map/combine task on the other hand is our code, and it deals with the intermediary cache directly. > The intermediary results are stored with a composite key [imtermKey-i, > seg-j]. > The M/R coordinator waits for all M tasks to return. If one does not > (timeout, rehash), the following happens: > We can't allow time out map tasks, or they will keep writing to the intermediate cache in parallel with the retried tasks. So the originator has to wait for a response from each node to which it sent a map task. > - delete [intermKey-i, seg-i] (that operation could be handled by the > new per-segment M before the map task is effectively started) > - ship the M task for that segment-i to the new primary owner of > segment-i > > When all M tasks are received the Reduce phase will read all [intermKey-i, > *] > keys and reduce them. > Note that if the reduction phase is itself distributed, we could apply > the same key per segment and shipping split for these. > Sure, we have to retry reduce tasks when the primary owner changes, and it makes sense to retry as little as possible. > > Again the tricky part is to expose the ability to write to intermediary > caches per segment without exposing segments per se as well as let > someone see a concatenated view if intermKey-i from all segments subkeys > during reduction. > Writing to and reading from the intermediate cache is already abstracted from user code (in the Mapper and Reducer interfaces). So we don't need to worry about exposing extra details to the user. > > Thoughts? 
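By the way, the composite intermediary key itself is the easy part; something along these lines (illustration only, not existing code):

import java.io.Serializable;

// Values produced for intermKey while mapping segment seg live under
// [intermKey, seg], so one segment's output can be wiped and regenerated
// without touching the output of the other segments.
final class IntermediateKey implements Serializable {
   private final Object intermKey;
   private final int segment;

   IntermediateKey(Object intermKey, int segment) {
      this.intermKey = intermKey;
      this.segment = segment;
   }

   @Override
   public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof IntermediateKey)) return false;
      IntermediateKey other = (IntermediateKey) o;
      return segment == other.segment && intermKey.equals(other.intermKey);
   }

   @Override
   public int hashCode() {
      return 31 * intermKey.hashCode() + segment;
   }
}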
> > Dan, I did not quite get what alternative approach you wanted to > propose. Care to respin it for a slow brain? :) > I think where we differ is that I don't think user code needs to know about how we store the intermediate values and what we retry, as long as their mappers/combiners/reducers don't have side effects. Otherwise I was thinking on the same lines: send 1 map/combine task for each segment (maybe with a cap on the number of segments being processed at the same time on each node), split the intermediate values per input segment, cancel+retry each map task if the topology changes and the executing node is no longer an owner. If the reduce phase is distributed, run 1 reduce task per segment as well, and cancel+retry the reduce task if the executing node is no longer an owner. I had some ideas about assigning each map/combine phase a UUID and making the intermediate keys [intermKey, seg, mctask] to allow the originator to retry a map/combine task without waiting for the previous one to finish, but I don't think I mentioned that before :) There are also some details that I'm worried about: 1) If the reduce phase is distributed, and the intermediate cache is non-transactional, any topology change in the intermediate cache will require us to retry all the map/combine tasks that were running at the time on any node (even if some nodes did not detect the topology change yet). So it would make sense to limit the number of map/combine tasks that are processed at one time, in order to limit the amount of tasks we retry (OR require the intermediate cache to be transactional). 2) Running a separate map/combine task for each segment is not really an option until we implement the the segment-aware data container and cache stores. Without that change, it will make everything much slower, because of all the extra iterations for each segment. 3) And finally, all this will be overkill when the input cache is small, and the time needed to process the data is comparable to the time needed to send all those extra RPCs. So I'm thinking it might be better to adopt Vladimir's suggestion to retry everything if we detect a topology change in the input and/or intermediate cache at the end of the M/R task, at least in the first phase. Cheers Dan > > Emmanuel > > On Fri 2014-10-10 10:03, Dan Berindei wrote: > > > > I'd rather not expose this to the user. Instead, we could split the > > > > intermediary values for each key by the source segment, and do the > > > > invalidation of the retried segments in our M/R framework (e.g. when > we > > > > detect that the primary owner at the start of the map/combine phase > is > > > > not an owner at all at the end). > > > > > > > > I think we have another problem with the publishing of intermediary > > > > values not being idempotent. The default configuration for the > > > > intermediate cache is non-transactional, and retrying the put(delta) > > > > command after a topology change could add the same intermediate > values > > > > twice. A transactional intermediary cache should be safe, though, > > > > because the tx won't commit on the old owner until the new owner > knows > > > > about the tx. > > > > > > can you elaborate on it? 
> > > > > > > say we have a cache with numOwners=2, owners(k) = [A, B] > > C will become the primary owner of k, but for now owners(k) = [A, B, C] > > O sends put(delta) to A (the primary) > > A sends put(delta) to B, C > > B sees a topology change (owners(k) = [C, B]), doesn't apply the delta > and > > replies with an OutdatedTopologyException > > C applies the delta > > A resends put(delta) to C (new primary) > > C sends put(delta) to B, applies the delta again > > > > I think it could be solved with versions, I just wanted to point out that > > we don't do that now. > > > > > > > > > > anyway, I think the retry mechanism should solve it. If we detect a > > > topology change (during the iteration of segment _i_) and the segment > > > _i_ is moved, then we can cancel the iteration, remove all the > > > intermediate values generated in segment _i_ and restart (on the > primary > > > owner). > > > > > > > The problem is that the intermediate keys aren't in the same segment: we > > want the reduce phase to access only keys local to the reducing node, and > > keys in different input segments can yield values for the same > intermediate > > key. So like you say, we'd have to retry on every topology change in the > > intermediary cache, not just the ones affecting segment _i_. > > > > There's another complication: in the scenario above, O may only get the > > topology update with owners(k) = [C, B] after the map/combine phase > > completed. So the originator of the M/R job would have to watch for > > topology changes seen by any node, and invalidate/retry any input > segments > > that could have been affected. All that without slowing down the > > no-topology-change case too much... > > > > > > > > > > > > > > > > > > > But before getting ahead of ourselves, what do you thing of the > > > general idea? Even without retry framework, this approach would be more > > > stable than our current per node approach during topology changes and > > > improve dependability. > > > > > > > > Doing it solely based on segment would remove the possibility of > > > > having duplicates. However without a mechanism to send a new > request > > > > on rehash it would be possible to only find a subset of values > (if a > > > > segment is removed while iterating on it). > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141013/c244ab87/attachment-0001.html From dan.berindei at gmail.com Mon Oct 13 06:14:17 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 13 Oct 2014 13:14:17 +0300 Subject: [infinispan-dev] About size() In-Reply-To: <82DC8B96-DAE7-42D0-976A-1CEB1EF7A212@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <5435560E.2030206@redhat.com> <5437E724.4040705@redhat.com> <5437E7DA.6060101@redhat.com> <5437EAB3.8030202@redhat.com> <6F085B91-2882-4636-BDC4-1E47815B05D0@redhat.com> <82DC8B96-DAE7-42D0-976A-1CEB1EF7A212@redhat.com> Message-ID: On Fri, Oct 10, 2014 at 8:51 PM, Mircea Markus wrote: > On Oct 10, 2014, at 15:25, Dan Berindei wrote: > > > On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus > wrote: > > > > On Oct 10, 2014, at 15:18, Radim Vansa wrote: > > > > > That we should expose that as one method, not forcing people to > > > implement the sum() themselves. > > > > Hmm, isn't the method you mention cache.size() ? :-) > > > > Nope, because we decided to make cache.size() precise-but-slow :) > > It's not possible to make it precise unless we provide snapshot isolation > /MVCC support. > IMO the formula Tristan provides is a good enough approximation of the size > of the data. And definitely way better than what we currently have. > (Looking at CHM.size() they offer an "accurate" size of the map by > counting it in a loop and making sure that the size is reproducible. I > don't think that's accurate in the general case, though, as you might count > intermediate sizes in that loop). > CHM.size() actually tracks the modCount of each segment and locks all the segments in the final retry, so the result should be accurate. CHMV8 doesn't do that; instead it keeps a striped counter and doesn't try very hard to get a reproducible sum from the counter cells. I thought we had concluded size() should be stable (and accurate) when there is no write activity, and the way to implement that is with the entry iterator. The result of Tristan's formula can change without any write activity on the cache, just because there is a state transfer in progress. For monitoring tools I'd rather have separate methods entriesInMemory() -> sum(dataContainer.size()) and entriesInStores() -> sum(cacheStore.size()) that are allowed to rise and fall as nodes join and leave the cache. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141013/217c8164/attachment.html From sanne at infinispan.org Mon Oct 13 07:23:05 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 13 Oct 2014 12:23:05 +0100 Subject: [infinispan-dev] On ParserRegistry and classloaders In-Reply-To: References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com> <53831239.30601@redhat.com> <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com> <53D945B4.7070508@redhat.com> Message-ID: So it seems I can work around most of the below issues by temporarily setting the ContextClassLoader, but then there is an additional problem: since the JGroups configuration is referenced by resource name in the Infinispan configuration and can't be passed as an InputStream, Infinispan is unable to load the correct resource.
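To illustrate: programmatically, the transport configuration only accepts a resource name, which Infinispan then resolves against classloaders of its own choosing (the file name below is just a placeholder):

import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;

public class TransportConfigExample {
   public static GlobalConfiguration build() {
      GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
      // The JGroups stack can only be referenced by resource name; there is
      // no variant taking an InputStream resolved from the caller's module.
      global.transport()
            .defaultTransport()
            .addProperty("configurationFile", "my-jgroups-udp.xml");
      return global.build();
   }
}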
I'd consider this quite bad for modular usage, and the following issue makes it more confusing as the error message reported to the user reports instead an NPE regarding the Transaction Table: https://issues.jboss.org/browse/ISPN-2145 I hope that improving the error message at least could be prioritized for version 7.0 Final. Sanne On 11 October 2014 17:54, Sanne Grinovero wrote: > Hi all, > sorry it took me quite some time to return on this, but it's still > troublesome and quite critical to allow using Infinispan as a module. > > I'm again trying to use Infinispan's JBoss Modules as a (library) for > deployed application, and it fails because of: > > Caused by: java.lang.ClassNotFoundException: > org.infinispan.remoting.transport.jgroups.JGroupsTransport from > [Module \"deployment.ModuleMemberRegistrationIT.war:main\" from > Service Module Loader]"}} > > Debugging the application server, the stack is pointing to this method : > > public TransportConfigurationBuilder defaultTransport() { > Transport transport = Util.getInstance(DEFAULT_TRANSPORT, > this.getGlobalConfig().getClassLoader()); <------- Failure here > transport(transport); > return this; > } > > The getClassLoader() function returns a reference to the application > deployment (as you can see in the error above). > This code makes sense I think, but then inspecting in the > "Util.getInstance" method we see that it's not strictly using the > classloaders we pass, but it's going to look in a specific sequence as > defined by this function: > > public static ClassLoader[] getClassLoaders(ClassLoader appClassLoader) { > return new ClassLoader[] { > appClassLoader, // User defined classes > OsgiClassLoader.getInstance(), // OSGi bundle context > needs to be on top of TCCL, system CL, etc. > Util.class.getClassLoader(), // Infinispan classes (not > always on TCCL [modular env]) > ClassLoader.getSystemClassLoader(), // Used when load time > instrumentation is in effect > Thread.currentThread().getContextClassLoader() //Used by > jboss-as stuff > }; > } > > > Now to clarify one thing: the deployment does NOT have direct > visibility to the Infinispan modules. The application is depending on > Hibernate Search, and I can't allow Infinispan to leak visibility to > the application. > So going through the list, none of the classloaders actually have the > class from Infinispan Core because: > > appClassLoader -> doesn't have access to any Infinispan jar > > OsgiClassLoader.getInstance() -> OSGi only (right? I'm not too clear > on this one.. BTW the OsgiClassLoader.getInstance() is clearly not > threadsafe) > > Util.class.getClassLoader(), // Infinispan classes (not always on > TCCL [modular env]) <- (That's the comment in the code) > > Well NO, this comment was probably correct when Util.class was > included into infinispan-core.. but now this function just returns the > modular classloader of Infinispan Commons ! It doesn't expose the > JGroupsTransport. 
> > I hoped I could configure the ClassLoader, but I've already done that: > > 1 ClassLoader ispnClassLoadr = ParserRegistry.class.getClassLoader(); > 2 configurationParser = new ParserRegistry( ispnClassLoadr ); // Make > sure the Parser is using the module having access to Infinispan Core > 3 ConfigurationBuilderHolder builderHolder = > configurationParser.parse( is ); // I have to pass a stream so that I > can load the resource from the user module > 4 patchInfinispanClassLoader( builderHolder ); // AFTER I have the > builderHolder, I can use an utility method to correct the classloaders > > But the above exception is triggered at line *3* so I have no > opportunity to amend any classloader. > > > In summary: the defaultTransport() is triggered during XML parsing and > it's going to ignore any programmatically set classloader. > > A patch proposal would be to change the following line from the > GlobalConfigurationBuilder() constructor: > > ClassLoader defaultCL = Util.isOSGiContext() ? > GlobalConfigurationBuilder.class.getClassLoader() : > Thread.currentThread().getContextClassLoader(); > > to a simple: > > ClassLoader defaultCL = GlobalConfigurationBuilder.class.getClassLoader(); > > But the function generating the chain of classloaders should probably > also need to be fixed.. > I suspect it would be much easier if we could agree to: > A) accept classloaders on the methods which need it > B) Stick to the given classloaders without applying unrequested and > surprising overrides > > My workaround in Search is going to be to set the ContextClassloader; > that resolved my problem but it's piling up on other workarounds > related to Infinispan initialization and classloaders. I'm not opening > JIRA's here as I'm not clear on which ones are bugs, and how many of > these are intentional.. please take that judgement task. > > Thanks, > Sanne > > > > > On 30 July 2014 20:21, Ion Savin wrote: >> Hi Sanne, >> >> I don't see any changes in the ParserRegistry which would have removed the >> behavior you describe (at least looking at the OSGi changes). Can you point >> me please to some code which used to work in the past? >> >> I've found two classes which have some reference to Hibernate in comments >> and the factory was removed part of the OSGi changes. Are these perhaps the >> changes which you are missing? >> >> https://github.com/infinispan/infinispan/blob/6.0.x/core/src/main/java/org/infinispan/util/FileLookup.java >> >> https://github.com/infinispan/infinispan/blob/6.0.x/core/src/main/java/org/infinispan/util/FileLookupFactory.java >> >> -- >> Ion Savin >> >> >> On 07/30/2014 09:17 PM, Mircea Markus wrote: >>> >>> Ion, Martin - what are your thoughts? >>> >>> On Jul 29, 2014, at 16:34, Sanne Grinovero wrote: >>> >>>> All, >>>> in Search we wrap the Parser in a decorator which workarounds the >>>> classloader limitation. >>>> I still think you should fix this, it doesn't matter how/why it was >>>> changed. >>>> >>>> Sanne >>>> >>>> On 26 May 2014 11:06, Ion Savin wrote: >>>>> >>>>> Hi Sanne, Galder, >>>>> >>>>> On 05/23/2014 07:08 PM, Sanne Grinovero wrote: >>>>>> >>>>>> On 23 May 2014 08:03, Galder Zamarre?o wrote: >>>>>>>> >>>>>>>> Hey Sanne, >>>>>>>> >>>>>>>> I?ve looked at ParserRegistry and not sure I see the changes you are >>>>>>>> referring to? >>>>>>>> >>>>>>>>> From what I?ve seen, ParserRegistry has taken class loader in the >>>>>>>>> constructor since the start. 
>>>>>> >>>>>> Yes, and that was good as we've been using it: it might need >>>>>> directions to be pointed at the right modules to load extension >>>>>> points. >>>>>> >>>>>> My problem is not that the constructor takes a ClassLoader, but that >>>>>> other options have been removed; essentially in my scenario the module >>>>>> containing the extension points does not contain the configuration >>>>>> file I want it to load, and the actual classLoader I want the >>>>>> CacheManager to use is yet a different one. As explained below, >>>>>> assembling a single "catch all" ClassLoader to delegate to all doesn't >>>>>> work as some of these actually need to be strictly isolated to prevent >>>>>> ambiguities. >>>>>> >>>>>>>> I suspect you might be referring to classloader related changes as a >>>>>>>> result of OSGI integration? >>>>>> >>>>>> I didn't check but that sounds like a reasonable estimate. >>>>> >>>>> >>>>> I had a look at the OSGi-related changes done for this class and they >>>>> don't alter the class interface in any way. The implementation changes >>>>> related to FileLookup seem to maintain the same behavior for non-OSGi >>>>> contexts also. >>>>> >>>>> Regards, >>>>> Ion Savin >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> Cheers, >>> >> From isavin at redhat.com Mon Oct 13 08:40:43 2014 From: isavin at redhat.com (Ion Savin) Date: Mon, 13 Oct 2014 15:40:43 +0300 Subject: [infinispan-dev] On ParserRegistry and classloaders In-Reply-To: References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com> <53831239.30601@redhat.com> <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com> <53D945B4.7070508@redhat.com> Message-ID: <543BC84B.7090807@redhat.com> Hi Sanne, > OsgiClassLoader.getInstance() -> OSGi only (right? I'm not too clear Yes, this CL is OSGi-only. Won't load anything outside an OSGi container. > on this one.. BTW the OsgiClassLoader.getInstance() is clearly not > threadsafe) Thanks for catching this! Opened: https://issues.jboss.org/browse/ISPN-4829 -- Ion Savin From dan.berindei at gmail.com Mon Oct 13 09:06:30 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 13 Oct 2014 16:06:30 +0300 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus wrote: > > On Oct 10, 2014, at 17:30, William Burns wrote: > > >>>>> Also we didn't really talk about the fact that these methods would > >>>>> ignore ongoing transactions and if that is a concern or not. > >>>>> > >>>> > >>>> It might be a concern for the Hibernate 2LC impl, it was their TCK > that > >>>> prompted the last round of discussions about clear(). > >>> > >>> Although I wonder how much these methods are even used since they only > >>> work for Local, Replication or Invalidation caches in their current > >>> state (and didn't even use loaders until 6.0). 
> >> > >> > >> There is some more information about the test in the mailing list > discussion > >> [1] > >> There's also a JIRA for clear() [2] > >> > >> I think 2LC almost never uses distribution, so size() being local-only > >> didn't matter, but making it non-tx could cause problems - at least for > that > >> particular test. > > > > I had toyed around with the following idea before, but I never thought > > of it in the scope of the size method solely, but I have a solution > > that would work mostly for transactional caches. Essentially the size > > method would always operate in a READ_COMMITTED like state, using > > REPEATABLE_READ doesn't seem feasible since we can't keep all the > > contents in memory. Essentially the iterator would be ran and for > > each key that is found it checks the context to see if it is there. > > If the context entry is marked as removed it doesn't count the key, if > > the key is there it marks the key as found and counts it, and if it is > > not found it counts it. Then after iteration it finds all the keys in > > the context that were not found and also adds them to the count. This > > way it doesn't need to store additional memory (besides iteration > > costs) as all the context information is in memory. > > sounds good to me. > Mircea, you have to decide whether you want the precise estimation using the entry iterator or the loose estimation using dataContainer.size() :) I guess we can't make size() read everything into the invocation context, so READ_COMMITTED is all we can provide if we want to keep size() transactional. Maybe we don't really need it though... Will, could you investigate the failing test that started the clear() thread [1] to see if it really needs size() to be transactional? > > > > My original thought was to also make the EntryIterator transactional > > in the same way which also means the keySet, entrySet and values > > methods could do the same things. The main reason stumbling block I > > had was the fact that the iterator and various collections returned > > could be used outside of the ongoing transaction which didn't seem to > > make much sense to me. But maybe these should be changed to be more > > like backing maps which HashMap, ConcurrentHashMap etc use for their > > methods, where instead it would pick up the transaction if there is > > one in the current thread and if there is no transaction just start an > > implicit one. > > or if they are outside of a transaction to deny progress > I don't think it's fair to require an explicit transaction for every entrySet(). It should be possible to start an iteration without a transaction, and only to invalidate an iteration started from an explicit transaction the moment the transaction is committed/rolled back (although it would complicate rules a bit). And what happens if the user writes to the cache while it's iterating through the cache-backed collection? Should the user see the new entry in the iteration, or not? I don't think you can figure out at the end of the iteration which keys were included without keeping all the keys on the originator. > > > This however was a big change from how these > > collections work currently in that they are in memory copies only. > > > > What do you guys think? > > I think that keeping track of the context entries is a better way of > iterating so +1. As you mentioned, we should also make it clear that RC > semantic applies. 
> > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141013/4116000a/attachment-0001.html From mmarkus at redhat.com Mon Oct 13 11:55:02 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 13 Oct 2014 16:55:02 +0100 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> Message-ID: <8BAD2C28-ADC3-4D1E-8B5D-F0D17B35C83C@redhat.com> On Oct 13, 2014, at 14:06, Dan Berindei wrote: > > > On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus wrote: > > On Oct 10, 2014, at 17:30, William Burns wrote: > > >>>>> Also we didn't really talk about the fact that these methods would > >>>>> ignore ongoing transactions and if that is a concern or not. > >>>>> > >>>> > >>>> It might be a concern for the Hibernate 2LC impl, it was their TCK that > >>>> prompted the last round of discussions about clear(). > >>> > >>> Although I wonder how much these methods are even used since they only > >>> work for Local, Replication or Invalidation caches in their current > >>> state (and didn't even use loaders until 6.0). > >> > >> > >> There is some more information about the test in the mailing list discussion > >> [1] > >> There's also a JIRA for clear() [2] > >> > >> I think 2LC almost never uses distribution, so size() being local-only > >> didn't matter, but making it non-tx could cause problems - at least for that > >> particular test. > > > > I had toyed around with the following idea before, but I never thought > > of it in the scope of the size method solely, but I have a solution > > that would work mostly for transactional caches. Essentially the size > > method would always operate in a READ_COMMITTED like state, using > > REPEATABLE_READ doesn't seem feasible since we can't keep all the > > contents in memory. Essentially the iterator would be ran and for > > each key that is found it checks the context to see if it is there. > > If the context entry is marked as removed it doesn't count the key, if > > the key is there it marks the key as found and counts it, and if it is > > not found it counts it. Then after iteration it finds all the keys in > > the context that were not found and also adds them to the count. This > > way it doesn't need to store additional memory (besides iteration > > costs) as all the context information is in memory. > > sounds good to me. > > Mircea, you have to decide whether you want the precise estimation using the entry iterator or the loose estimation using dataContainer.size() :) > > I guess we can't make size() read everything into the invocation context, so READ_COMMITTED is all we can provide if we want to keep size() transactional. Maybe we don't really need it though... Will, could you investigate the failing test that started the clear() thread [1] to see if it really needs size() to be transactional? I'm okay with both approaches TBH, both are much better than what we currently have. The accurate one is more costly but seems to be the solution of choice so let's go for it. 
> > > > > > My original thought was to also make the EntryIterator transactional
> > in the same way which also means the keySet, entrySet and values
> > methods could do the same things. The main reason stumbling block I
> > had was the fact that the iterator and various collections returned
> > could be used outside of the ongoing transaction which didn't seem to
> > make much sense to me. But maybe these should be changed to be more
> > like backing maps which HashMap, ConcurrentHashMap etc use for their
> > methods, where instead it would pick up the transaction if there is
> > one in the current thread and if there is no transaction just start an
> > implicit one.
> > or if they are outside of a transaction to deny progress
> > I don't think it's fair to require an explicit transaction for every entrySet(). It should be possible to start an iteration without a transaction, and only to invalidate an iteration started from an explicit transaction the moment the transaction is committed/rolled back (although it would complicate rules a bit).
> > And what happens if the user writes to the cache while it's iterating through the cache-backed collection? Should the user see the new entry in the iteration, or not? I don't think you can figure out at the end of the iteration which keys were included without keeping all the keys on the originator.
If the modification is done outside the iterator one might expect a ConcurrentModificationException, as is the case with some JDK iterators.
> >
> >
> > This however was a big change from how these
> > collections work currently in that they are in memory copies only.
> >
> > What do you guys think?
> I think that keeping track of the context entries is a better way of iterating so +1. As you mentioned, we should also make it clear that RC semantic applies.
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From rvansa at redhat.com  Tue Oct 14 02:55:49 2014
From: rvansa at redhat.com (Radim Vansa)
Date: Tue, 14 Oct 2014 08:55:49 +0200
Subject: [infinispan-dev] About size()
In-Reply-To: <8BAD2C28-ADC3-4D1E-8B5D-F0D17B35C83C@redhat.com>
References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <8BAD2C28-ADC3-4D1E-8B5D-F0D17B35C83C@redhat.com>
Message-ID: <543CC8F5.9000403@redhat.com>

On 10/13/2014 05:55 PM, Mircea Markus wrote:
> On Oct 13, 2014, at 14:06, Dan Berindei wrote:
>
>>
>> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus wrote:
>>
>> On Oct 10, 2014, at 17:30, William Burns wrote:
>>
>>>>>>> Also we didn't really talk about the fact that these methods would
>>>>>>> ignore ongoing transactions and if that is a concern or not.
>>>>>>>
>>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK that
>>>>>> prompted the last round of discussions about clear().
>>>>> Although I wonder how much these methods are even used since they only
>>>>> work for Local, Replication or Invalidation caches in their current
>>>>> state (and didn't even use loaders until 6.0).
>>>> >>>> There is some more information about the test in the mailing list discussion >>>> [1] >>>> There's also a JIRA for clear() [2] >>>> >>>> I think 2LC almost never uses distribution, so size() being local-only >>>> didn't matter, but making it non-tx could cause problems - at least for that >>>> particular test. >>> I had toyed around with the following idea before, but I never thought >>> of it in the scope of the size method solely, but I have a solution >>> that would work mostly for transactional caches. Essentially the size >>> method would always operate in a READ_COMMITTED like state, using >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the >>> contents in memory. Essentially the iterator would be ran and for >>> each key that is found it checks the context to see if it is there. >>> If the context entry is marked as removed it doesn't count the key, if >>> the key is there it marks the key as found and counts it, and if it is >>> not found it counts it. Then after iteration it finds all the keys in >>> the context that were not found and also adds them to the count. This >>> way it doesn't need to store additional memory (besides iteration >>> costs) as all the context information is in memory. >> sounds good to me. >> >> Mircea, you have to decide whether you want the precise estimation using the entry iterator or the loose estimation using dataContainer.size() :) >> >> I guess we can't make size() read everything into the invocation context, so READ_COMMITTED is all we can provide if we want to keep size() transactional. Maybe we don't really need it though... Will, could you investigate the failing test that started the clear() thread [1] to see if it really needs size() to be transactional? > I'm okay with both approaches TBH, both are much better than what we currently have. The accurate one is more costly but seems to be the solution of choice so let's go for it. > >> >>> My original thought was to also make the EntryIterator transactional >>> in the same way which also means the keySet, entrySet and values >>> methods could do the same things. The main reason stumbling block I >>> had was the fact that the iterator and various collections returned >>> could be used outside of the ongoing transaction which didn't seem to >>> make much sense to me. But maybe these should be changed to be more >>> like backing maps which HashMap, ConcurrentHashMap etc use for their >>> methods, where instead it would pick up the transaction if there is >>> one in the current thread and if there is no transaction just start an >>> implicit one. >> or if they are outside of a transaction to deny progress >> >> I don't think it's fair to require an explicit transaction for every entrySet(). It should be possible to start an iteration without a transaction, and only to invalidate an iteration started from an explicit transaction the moment the transaction is committed/rolled back (although it would complicate rules a bit). >> >> And what happens if the user writes to the cache while it's iterating through the cache-backed collection? Should the user see the new entry in the iteration, or not? I don't think you can figure out at the end of the iteration which keys were included without keeping all the keys on the originator. > If the modification is done outside the iterator one might expect an ConcurrentModificationException, as it is the case with some JDK iterators. -1 We're aiming at high performance cache with a lot of changes while the operation is executed. 
This way, the iteration would never complete, unless you explicitly switch the cache to read only mode (either through Infinispan operation or in application). I think that adding isCacheModified() or isTopologyChanged() to the iterator would make sense, if that's not too complicated to implement. Though, if we want non-disturbed iteration, snapshot isolation is the only answer. Radim > >> >> >>> This however was a big change from how these >>> collections work currently in that they are in memory copies only. >>> >>> What do you guys think? >> I think that keeping track of the context entries is a better way of iterating so +1. As you mentioned, we should also make it clear that RC semantic applies. >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > Cheers, -- Radim Vansa JBoss DataGrid QA From dan.berindei at gmail.com Tue Oct 14 03:33:14 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 14 Oct 2014 10:33:14 +0300 Subject: [infinispan-dev] About size() In-Reply-To: <543CC8F5.9000403@redhat.com> References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <8BAD2C28-ADC3-4D1E-8B5D-F0D17B35C83C@redhat.com> <543CC8F5.9000403@redhat.com> Message-ID: On Tue, Oct 14, 2014 at 9:55 AM, Radim Vansa wrote: > On 10/13/2014 05:55 PM, Mircea Markus wrote: > > On Oct 13, 2014, at 14:06, Dan Berindei wrote: > > > >> > >> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus > wrote: > >> > >> On Oct 10, 2014, at 17:30, William Burns wrote: > >> > >>>>>>> Also we didn't really talk about the fact that these methods would > >>>>>>> ignore ongoing transactions and if that is a concern or not. > >>>>>>> > >>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK > that > >>>>>> prompted the last round of discussions about clear(). > >>>>> Although I wonder how much these methods are even used since they > only > >>>>> work for Local, Replication or Invalidation caches in their current > >>>>> state (and didn't even use loaders until 6.0). > >>>> > >>>> There is some more information about the test in the mailing list > discussion > >>>> [1] > >>>> There's also a JIRA for clear() [2] > >>>> > >>>> I think 2LC almost never uses distribution, so size() being local-only > >>>> didn't matter, but making it non-tx could cause problems - at least > for that > >>>> particular test. > >>> I had toyed around with the following idea before, but I never thought > >>> of it in the scope of the size method solely, but I have a solution > >>> that would work mostly for transactional caches. Essentially the size > >>> method would always operate in a READ_COMMITTED like state, using > >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the > >>> contents in memory. Essentially the iterator would be ran and for > >>> each key that is found it checks the context to see if it is there. > >>> If the context entry is marked as removed it doesn't count the key, if > >>> the key is there it marks the key as found and counts it, and if it is > >>> not found it counts it. 
Then after iteration it finds all the keys in > >>> the context that were not found and also adds them to the count. This > >>> way it doesn't need to store additional memory (besides iteration > >>> costs) as all the context information is in memory. > >> sounds good to me. > >> > >> Mircea, you have to decide whether you want the precise estimation > using the entry iterator or the loose estimation using dataContainer.size() > :) > >> > >> I guess we can't make size() read everything into the invocation > context, so READ_COMMITTED is all we can provide if we want to keep size() > transactional. Maybe we don't really need it though... Will, could you > investigate the failing test that started the clear() thread [1] to see if > it really needs size() to be transactional? > > I'm okay with both approaches TBH, both are much better than what we > currently have. The accurate one is more costly but seems to be the > solution of choice so let's go for it. > > > >> > >>> My original thought was to also make the EntryIterator transactional > >>> in the same way which also means the keySet, entrySet and values > >>> methods could do the same things. The main reason stumbling block I > >>> had was the fact that the iterator and various collections returned > >>> could be used outside of the ongoing transaction which didn't seem to > >>> make much sense to me. But maybe these should be changed to be more > >>> like backing maps which HashMap, ConcurrentHashMap etc use for their > >>> methods, where instead it would pick up the transaction if there is > >>> one in the current thread and if there is no transaction just start an > >>> implicit one. > >> or if they are outside of a transaction to deny progress > >> > >> I don't think it's fair to require an explicit transaction for every > entrySet(). It should be possible to start an iteration without a > transaction, and only to invalidate an iteration started from an explicit > transaction the moment the transaction is committed/rolled back (although > it would complicate rules a bit). > >> > >> And what happens if the user writes to the cache while it's iterating > through the cache-backed collection? Should the user see the new entry in > the iteration, or not? I don't think you can figure out at the end of the > iteration which keys were included without keeping all the keys on the > originator. > > If the modification is done outside the iterator one might expect an > ConcurrentModificationException, as it is the case with some JDK iterators. > > -1 We're aiming at high performance cache with a lot of changes while > the operation is executed. This way, the iteration would never complete, > unless you explicitly switch the cache to read only mode (either through > Infinispan operation or in application). > I was referring only to changes made in the same transaction, not changes made by other transactions. But you make a good point, we can't throw a ConcurrentModificationException if the user for writes in the same transaction and ignore other transactions. > > I think that adding isCacheModified() or isTopologyChanged() to the > iterator would make sense, if that's not too complicated to implement. > Though, if we want non-disturbed iteration, snapshot isolation is the > only answer. > isCacheModified() is probably too costly to implement. isTopologyChanged() could be done, but I'm not sure what's the use case, as the entry iterator abstracts topology changes from the user. 
I don't think we want undisturbed iteration, at least not at this point. Personally, I just want to have a good story on why the iteration behaves in a certain way. By my standards, explaining that changes made by other transactions may completely/partially/not at all be visible in the iteration is fine, explaining that changes made by the same transaction may or may not be visible is not. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141014/f69e85b5/attachment-0001.html From mudokonman at gmail.com Tue Oct 14 08:11:42 2014 From: mudokonman at gmail.com (William Burns) Date: Tue, 14 Oct 2014 08:11:42 -0400 Subject: [infinispan-dev] About size() In-Reply-To: References: <542E5E92.7060504@redhat.com> <3EA0122E-8293-49EB-8CB7-F67FA2E58532@redhat.com> <593A31AE-2C9C-4B90-9048-0EDBADCA1ADF@redhat.com> <8BAD2C28-ADC3-4D1E-8B5D-F0D17B35C83C@redhat.com> <543CC8F5.9000403@redhat.com> Message-ID: On Tue, Oct 14, 2014 at 3:33 AM, Dan Berindei wrote: > > > On Tue, Oct 14, 2014 at 9:55 AM, Radim Vansa wrote: >> >> On 10/13/2014 05:55 PM, Mircea Markus wrote: >> > On Oct 13, 2014, at 14:06, Dan Berindei wrote: >> > >> >> >> >> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus >> >> wrote: >> >> >> >> On Oct 10, 2014, at 17:30, William Burns wrote: >> >> >> >>>>>>> Also we didn't really talk about the fact that these methods would >> >>>>>>> ignore ongoing transactions and if that is a concern or not. >> >>>>>>> >> >>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK >> >>>>>> that >> >>>>>> prompted the last round of discussions about clear(). >> >>>>> Although I wonder how much these methods are even used since they >> >>>>> only >> >>>>> work for Local, Replication or Invalidation caches in their current >> >>>>> state (and didn't even use loaders until 6.0). >> >>>> >> >>>> There is some more information about the test in the mailing list >> >>>> discussion >> >>>> [1] >> >>>> There's also a JIRA for clear() [2] >> >>>> >> >>>> I think 2LC almost never uses distribution, so size() being >> >>>> local-only >> >>>> didn't matter, but making it non-tx could cause problems - at least >> >>>> for that >> >>>> particular test. >> >>> I had toyed around with the following idea before, but I never thought >> >>> of it in the scope of the size method solely, but I have a solution >> >>> that would work mostly for transactional caches. Essentially the size >> >>> method would always operate in a READ_COMMITTED like state, using >> >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the >> >>> contents in memory. Essentially the iterator would be ran and for >> >>> each key that is found it checks the context to see if it is there. >> >>> If the context entry is marked as removed it doesn't count the key, if >> >>> the key is there it marks the key as found and counts it, and if it is >> >>> not found it counts it. Then after iteration it finds all the keys in >> >>> the context that were not found and also adds them to the count. This >> >>> way it doesn't need to store additional memory (besides iteration >> >>> costs) as all the context information is in memory. >> >> sounds good to me. 
>> >> >> >> Mircea, you have to decide whether you want the precise estimation >> >> using the entry iterator or the loose estimation using dataContainer.size() >> >> :) >> >> >> >> I guess we can't make size() read everything into the invocation >> >> context, so READ_COMMITTED is all we can provide if we want to keep size() >> >> transactional. Maybe we don't really need it though... Will, could you >> >> investigate the failing test that started the clear() thread [1] to see if >> >> it really needs size() to be transactional? >> > I'm okay with both approaches TBH, both are much better than what we >> > currently have. The accurate one is more costly but seems to be the solution >> > of choice so let's go for it. >> > >> >> >> >>> My original thought was to also make the EntryIterator transactional >> >>> in the same way which also means the keySet, entrySet and values >> >>> methods could do the same things. The main reason stumbling block I >> >>> had was the fact that the iterator and various collections returned >> >>> could be used outside of the ongoing transaction which didn't seem to >> >>> make much sense to me. But maybe these should be changed to be more >> >>> like backing maps which HashMap, ConcurrentHashMap etc use for their >> >>> methods, where instead it would pick up the transaction if there is >> >>> one in the current thread and if there is no transaction just start an >> >>> implicit one. >> >> or if they are outside of a transaction to deny progress >> >> >> >> I don't think it's fair to require an explicit transaction for every >> >> entrySet(). It should be possible to start an iteration without a >> >> transaction, and only to invalidate an iteration started from an explicit >> >> transaction the moment the transaction is committed/rolled back (although it >> >> would complicate rules a bit). >> >> >> >> And what happens if the user writes to the cache while it's iterating >> >> through the cache-backed collection? Should the user see the new entry in >> >> the iteration, or not? I don't think you can figure out at the end of the >> >> iteration which keys were included without keeping all the keys on the >> >> originator. >> > If the modification is done outside the iterator one might expect an >> > ConcurrentModificationException, as it is the case with some JDK iterators. >> >> -1 We're aiming at high performance cache with a lot of changes while >> the operation is executed. This way, the iteration would never complete, >> unless you explicitly switch the cache to read only mode (either through >> Infinispan operation or in application). > > > I was referring only to changes made in the same transaction, not changes > made by other transactions. But you make a good point, we can't throw a > ConcurrentModificationException if the user for writes in the same > transaction and ignore other transactions. > >> >> >> I think that adding isCacheModified() or isTopologyChanged() to the >> iterator would make sense, if that's not too complicated to implement. >> Though, if we want non-disturbed iteration, snapshot isolation is the >> only answer. > > > isCacheModified() is probably too costly to implement. > isTopologyChanged() could be done, but I'm not sure what's the use case, as > the entry iterator abstracts topology changes from the user. > > I don't think we want undisturbed iteration, at least not at this point. > Personally, I just want to have a good story on why the iteration behaves in > a certain way. 
By my standards, explaining that changes made by other
> transactions may completely/partially/not at all be visible in the iteration
> is fine, explaining that changes made by the same transaction may or may not
> be visible is not.

Sorry I didn't respond earlier. But these commands would check the transaction context before returning the value to the user. This requires user interaction for this to occur, so we can guarantee they will always see their updated value if they have one in the transaction (even if one is run in between the iteration). The big thing is whether or not another transaction's update is seen when we don't have an update for that key (which depends on whether the segment is completed before the update or not).

There should be no need to tell if the cache was modified or the topology changed (the former would have a very high performance impact to detect with a DIST cache).

To be honest the wrapper classes would just be delegating to the Cache for the vast majority of operations (get, remove, contains etc.). It would only be when someone specifically uses the iterator on the various collections that the distributed iterator would even be used. This way the various collections would be backing maps, like HashMap and ConcurrentHashMap have, except that they have to check the transaction as well. The values collection would be extremely limited in its supported methods though, pretty much only to iteration and size.

>
> Cheers
> Dan
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From slaskawi at redhat.com  Tue Oct 14 09:23:40 2014
From: slaskawi at redhat.com (=?UTF-8?B?U2ViYXN0aWFuIMWBYXNrYXdpZWM=?=)
Date: Tue, 14 Oct 2014 15:23:40 +0200
Subject: [infinispan-dev] Multiple Spring modules
Message-ID: <543D23DC.3050206@redhat.com>

Hey!

Currently I'm working on Spring 3 and 4 support and because these versions are not compatible (in terms of Cache API), we probably would need to have 2 modules for Spring.

Now the question is - how to maintain them? Here are the options which come to my mind:

1. Create copy of Spring 3 module and put everything into newly created Spring 4, then update versions and implement new methods in Cache interface.
Pros:
- 1 OSGi bundle - transparent upgrade - just replace spring bundle
- Easy to maintain Spring 4 only fixes
Cons:
- Code duplication
2. Extract common part and create 2 modules which depend on it - very hard because Cache interface is logically at the bottom of the structure. Everything depends on it.
Pros:
- No code duplication
Cons:
- Increased code complexity
- 2 bundles needed - common + spring 3/4
3. Make Spring 4 module depend on Spring 3 and replace Cache implementations, run Maven Shade plugin to put everything together
Pros:
- No code duplication
Cons:
- Hacking into code, no intuitive design
- Will probably work in this specific case, further maintenance might be hard.
4. Implement 2 missing methods in Spring module without the @Override annotation. This way it should work against both Spring 3 and 4 (a sketch of this trick follows at the end of this message).
Pros:
- Really small change and single jar will support both spring 3 and 4
Cons:
- Spring version ranges in pom (not sure if it fits into Infinispan design and BOMs)
- Not intuitive

I like option #1 - much easier maintenance + we might start using Spring 4 features without breaking Spring 3 module. Option #4 is also not that bad...

Which option would you prefer?
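To illustrate option #4, a minimal sketch of the trick, assuming the two missing methods are the putIfAbsent and typed get additions of the Spring 4.x Cache interface (the class and its wiring are hypothetical, not the actual Infinispan Spring module code). The two methods at the bottom deliberately carry no @Override, so the class still compiles against the Spring 3 jar yet satisfies the Spring 4 interface at runtime:

    import org.infinispan.commons.api.BasicCache;
    import org.springframework.cache.Cache;
    import org.springframework.cache.support.SimpleValueWrapper;

    public class HypotheticalSpringCache implements Cache {

        private final BasicCache<Object, Object> nativeCache;

        public HypotheticalSpringCache(BasicCache<Object, Object> nativeCache) {
            this.nativeCache = nativeCache;
        }

        // Spring 3 contract
        @Override public String getName() { return nativeCache.getName(); }
        @Override public Object getNativeCache() { return nativeCache; }
        @Override public ValueWrapper get(Object key) {
            Object value = nativeCache.get(key);
            return value != null ? new SimpleValueWrapper(value) : null;
        }
        @Override public void put(Object key, Object value) { nativeCache.put(key, value); }
        @Override public void evict(Object key) { nativeCache.remove(key); }
        @Override public void clear() { nativeCache.clear(); }

        // Spring 4 additions: no @Override, so this still compiles vs Spring 3
        public ValueWrapper putIfAbsent(Object key, Object value) {
            Object existing = nativeCache.putIfAbsent(key, value);
            return existing != null ? new SimpleValueWrapper(existing) : null;
        }

        @SuppressWarnings("unchecked")
        public <T> T get(Object key, Class<T> type) {
            Object value = nativeCache.get(key);
            if (value != null && type != null && !type.isInstance(value)) {
                throw new IllegalStateException("Cached value is not of type " + type.getName());
            }
            return (T) value;
        }
    }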
Best regards
Sebastian

From gustavonalle at gmail.com  Tue Oct 14 09:41:37 2014
From: gustavonalle at gmail.com (Gustavo Fernandes)
Date: Tue, 14 Oct 2014 14:41:37 +0100
Subject: [infinispan-dev] Multiple Spring modules
In-Reply-To: <543D23DC.3050206@redhat.com>
References: <543D23DC.3050206@redhat.com>
Message-ID:

I prefer #1, since it decouples Spring 3 from Spring 4. For example, Spring 4.1 is bringing many improvements on Cache [1], which I'm not sure will be available on the 3.2.x maintenance branch.

[1] http://spring.io/blog/2014/06/16/further-cache-improvements-in-spring-4-1

Gustavo

On Tue, Oct 14, 2014 at 2:23 PM, Sebastian Łaskawiec wrote:
> Hey!
>
> Currently I'm working on Spring 3 and 4 support and because these
> versions are not compatible (in terms of Cache API), we probably would
> need to have 2 modules for Spring.
>
> Now the question is - how to maintain them? Here are the options which
> comes into my mind:
>
> 1. Create copy of Spring 3 module and put everything into newly created
> Spring 4, then update versions and implement new methods in Cache
> interface.
> Pros:
> - 1 OSGi bundle - transparent upgrade - just replace spring bundle
> - Easy to maintain Spring 4 only fixes
> Cons:
> - Code duplication
> 2. Extract common part and create 2 modules which depend on it - very
> hard because Cache interface is logically at the bottom of the
> structure. Everything depends on it.
> Pros:
> - No code duplication
> Cons:
> - Increased code complexity
> - 2 bundles needed - common + spring 3/4
> 3. Make Spring 4 module depend on Spring 3 and replace Cache
> implementations, run Maven Shade plugin to put everything together
> Pros:
> - No code duplication
> Cons:
> - Hacking into code, no intuitive design
> - Will probably work in this specific case, further maintenance
> might be hard.
> 4. Implement 2 missing methods in Spring module without @override
> annotation. This way it should work against Spring 3 and 4
> Pros:
> - Really small change and single jar will support both spring 3
> and 4
> Cons:
> - Spring version ranges in pom (not sure if it fits into
> Infinispan design and BOMs)
> - Not intuitive
>
> I like option #1 - much easier maintenance + we might start using Spring
> 4 features without breaking Spring 3 module. Option #4 is also not that
> bad...
>
> Which option would you prefer?
>
> Best regards
> Sebastian
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From emmanuel at hibernate.org  Wed Oct 15 07:54:32 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Wed, 15 Oct 2014 13:54:32 +0200
Subject: [infinispan-dev] Feedback and requests on clustered and remote listeners
In-Reply-To:
References: <1C064841-B234-4EC8-AEE3-467B1E52ED0B@hibernate.org> <2FD61F5D-67A6-4D6C-A0A0-D924914BAD14@hibernate.org> <21CE8853-8A48-4061-BD6C-55EE93BECE33@hibernate.org> <06B6FC9A-9886-4C02-B821-EC5C0864A948@redhat.com> <6FDDA588-82B0-4DB3-95C9-527CDD8B5219@hibernate.org> <821068D0-D857-4006-AC70-A0798C4756C1@redhat.com> <4FA0A74E-4DA4-42F1-BE0C-C07E401633A0@hibernate.org>
Message-ID:

Sorry for the long delay.
I looked at your PR early last week and it looked good to me.
I think it might be slightly more efficient to offer a single contract that does both the filtering and the conversion (especially when one considers reading the value in raw protobuf and doing the parsing only once). But your approach solves the old / new value feature.
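For the single filter-plus-conversion contract suggested above, a sketch of what it could look like; the interface and method names are illustrative only (Infinispan later tracked a contract along these lines under ISPN-4850, mentioned further down):

    import org.infinispan.metadata.Metadata;

    public interface FilterConverter<K, V, C> {
        // Returning null drops the event; any other value is the converted
        // payload shipped to the listener. A single callback means the raw
        // (e.g. protobuf) value only has to be read and parsed once.
        C filterAndConvert(K key, V oldValue, Metadata oldMetadata,
                           V newValue, Metadata newMetadata);
    }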
Emmanuel On 30 Sep 2014, at 18:13, William Burns wrote: > I have put it on a branch on github and you can try it out and let me > know what you think. > > I still have a few things I may want to change though: > > 1. I don't like how pre events are yet as they don't give you the > previous value and new value as post events do > 2. The enum to tell the type has become a bit more complicated and I > think I am going to change it to a class > 3. I also have some internal changes that should require less memory > allocations I wanted to clean up. > > https://github.com/wburns/infinispan/tree/ISPN-4753 > > Thanks, > > - Will > > On Fri, Sep 26, 2014 at 4:06 AM, Emmanuel Bernard > wrote: >> You lost me at actually ;) but if you have some code or even a gist showing how a user would use and interact with these changes, I can give you some feedback on the use cases I had in mind and if they fit. >> >> >>> On 25 sept. 2014, at 15:20, William Burns wrote: >>> >>> Actually while working and thinking on this it seems it may be easiest >>> to exclude the usage of KeyValueFilter in the listener pieces >>> completely and instead leave the annotation as it is now. Instead the >>> provided CacheEventFilter would be wrapped by a KeyValueFilter >>> implement that just called the new method as if it was a create event >>> for each value while iterating on them. I am thinking this is the >>> cleanest. Do you guys have any opinions? It would also keep intact a >>> lot of existing code and APIs. >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From slaskawi at redhat.com Wed Oct 15 08:43:07 2014 From: slaskawi at redhat.com (=?UTF-8?B?U2ViYXN0aWFuIMWBYXNrYXdpZWM=?=) Date: Wed, 15 Oct 2014 14:43:07 +0200 Subject: [infinispan-dev] Multiple Spring modules In-Reply-To: References: <543D23DC.3050206@redhat.com> Message-ID: <543E6BDB.7050803@redhat.com> Hey! After several discussions we decided to create 2 separate modules for Spring 3 and Spring 4 support. As Gustavo mentioned, there are a lot of new things in Spring 4.1 which are connected to caching. Once we start supporting Spring 3 and 4 integration using the same jar - new features might be hard (if not impossible) to introduce. Having 2 separate jars gives us flexibility which might be useful in the future. The code might be found here: *https://github.com/infinispan/infinispan/pull/2957* Best regards Sebastian On 10/14/2014 03:41 PM, Gustavo Fernandes wrote: > I prefer #1, since it decouples Spring 3 from Spring 4. For example, > Spring 4.1 is bringing many improvements on Cache [1], which I'm not > sure if it will available on 3.2.x maintenance branch. > > > [1] http://spring.io/blog/2014/06/16/further-cache-improvements-in-spring-4-1 > > wrote: >> >> 1. Create copy of Spring 3 module and put everything into newly created >> Spring 4, then update versions and implement new methods in Cache >> interface. >> Pros: >> - 1 OSGi bundle - transparent upgrade - just replace spring bundle >> - Easy to maintain Spring 4 only fixes >> Cons: >> - Code duplication >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141015/d55b108f/attachment.html From mudokonman at gmail.com Wed Oct 15 09:40:10 2014 From: mudokonman at gmail.com (William Burns) Date: Wed, 15 Oct 2014 09:40:10 -0400 Subject: [infinispan-dev] Feedback and requests on clustered and remote listeners In-Reply-To: References: <1C064841-B234-4EC8-AEE3-467B1E52ED0B@hibernate.org> <2FD61F5D-67A6-4D6C-A0A0-D924914BAD14@hibernate.org> <21CE8853-8A48-4061-BD6C-55EE93BECE33@hibernate.org> <06B6FC9A-9886-4C02-B821-EC5C0864A948@redhat.com> <6FDDA588-82B0-4DB3-95C9-527CDD8B5219@hibernate.org> <821068D0-D857-4006-AC70-A0798C4756C1@redhat.com> <4FA0A74E-4DA4-42F1-BE0C-C07E401633A0@hibernate.org> Message-ID: On Wed, Oct 15, 2014 at 7:54 AM, Emmanuel Bernard wrote: > Sorry for the long delay. > I looked at your PR early last week and it looked good to me. > I think it might be slighly more efficient to offer a single contract that do both the filtering and the conversion (esp when one consider reading the value in raw protobuf and do the parsing once). But your approach solves the old / new value feature. I agree, that is something that will still need to be added to the listeners. Most likely it will be implemented in a similar fashion as KeyValueFilterConverter was. I have created [1] to track it. [1] https://issues.jboss.org/browse/ISPN-4850 > > Emmanuel > > On 30 Sep 2014, at 18:13, William Burns wrote: > >> I have put it on a branch on github and you can try it out and let me >> know what you think. >> >> I still have a few things I may want to change though: >> >> 1. I don't like how pre events are yet as they don't give you the >> previous value and new value as post events do >> 2. The enum to tell the type has become a bit more complicated and I >> think I am going to change it to a class >> 3. I also have some internal changes that should require less memory >> allocations I wanted to clean up. >> >> https://github.com/wburns/infinispan/tree/ISPN-4753 >> >> Thanks, >> >> - Will >> >> On Fri, Sep 26, 2014 at 4:06 AM, Emmanuel Bernard >> wrote: >>> You lost me at actually ;) but if you have some code or even a gist showing how a user would use and interact with these changes, I can give you some feedback on the use cases I had in mind and if they fit. >>> >>> >>>> On 25 sept. 2014, at 15:20, William Burns wrote: >>>> >>>> Actually while working and thinking on this it seems it may be easiest >>>> to exclude the usage of KeyValueFilter in the listener pieces >>>> completely and instead leave the annotation as it is now. Instead the >>>> provided CacheEventFilter would be wrapped by a KeyValueFilter >>>> implement that just called the new method as if it was a create event >>>> for each value while iterating on them. I am thinking this is the >>>> cleanest. Do you guys have any opinions? It would also keep intact a >>>> lot of existing code and APIs. 
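A sketch of the wrapping Will describes above: presenting every iterated entry to the CacheEventFilter as if it were a creation event. The signatures approximate the 7.0-era SPI, and how a CREATE-typed EventType is obtained is left to the caller, so treat this as illustrative rather than the actual implementation:

    import org.infinispan.filter.KeyValueFilter;
    import org.infinispan.metadata.Metadata;
    import org.infinispan.notifications.cachelistener.filter.CacheEventFilter;
    import org.infinispan.notifications.cachelistener.filter.EventType;

    public class EventFilterAsKeyValueFilter<K, V> implements KeyValueFilter<K, V> {

        private final CacheEventFilter<K, V> eventFilter;
        private final EventType createEvent; // assumption: a CREATE-typed EventType

        public EventFilterAsKeyValueFilter(CacheEventFilter<K, V> eventFilter,
                                           EventType createEvent) {
            this.eventFilter = eventFilter;
            this.createEvent = createEvent;
        }

        @Override
        public boolean accept(K key, V value, Metadata metadata) {
            // During iteration there is no previous value, so each entry is
            // offered to the event filter as a creation: old value and old
            // metadata are null.
            return eventFilter.accept(key, null, null, value, metadata, createEvent);
        }
    }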
>>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev

From emmanuel at hibernate.org  Wed Oct 15 12:21:55 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Wed, 15 Oct 2014 18:21:55 +0200
Subject: [infinispan-dev] TopologySafe Map / Reduce
In-Reply-To:
References: <5436FB16.3000003@infinispan.org> <5437F781.8020808@redhat.com>
Message-ID:

On 10 Oct 2014, at 18:06, Dan Berindei wrote:
>
> The biggest downside I see is that it would be horribly slow if the cache store doesn't support efficient iteration of a single segment. So we might want to implement a full retry strategy as well, if some cache stores can't support that.
>
My understanding from a discussion with Pedro (in a hard, cold and sinister place but that's another story) is that *today* M/R is kinda horrible for global cache stores anyway, which have to do the key per node filtering dance. So it's not significantly worse. Plus, I said we should do the work per segment, but in reality if you send 5 per-segment Map tasks to the same node, you can optimize and run them in a single loop, making them separate pieces of work in appearance only.
-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141015/ea9e06eb/attachment.html

From emmanuel at hibernate.org  Wed Oct 15 12:41:21 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Wed, 15 Oct 2014 18:41:21 +0200
Subject: [infinispan-dev] TopologySafe Map / Reduce
In-Reply-To:
References: <5436FB16.3000003@infinispan.org> <20141010154948.GD5052@hibernate.org>
Message-ID: <7A961E49-612C-47FF-ACC4-64F0B4821022@hibernate.org>

On 13 Oct 2014, at 10:45, Dan Berindei wrote:
>
> On Fri, Oct 10, 2014 at 6:49 PM, Emmanuel Bernard wrote:
> When wrestling with the subject, here is what I had in mind.
>
> The M/R coordinator node sends the M task per segment on the node where
> the segment is primary.
>
> What's M? Is it just a shorthand for "map", or is it a new parameter that controls the number of map/combine tasks sent at once?
M is short for Map. Sorry.
>
> Each "per-segment" M task is executed and is offered the way to push
> intermediary results in a temp cache.
>
> Just to be clear, the user-provided mapper and combiner don't know anything about the intermediary cache (which doesn't have to be temporary, if it's shared by all M/R tasks). They only interact with the Collector interface.
> The map/combine task on the other hand is our code, and it deals with the intermediary cache directly.
Interesting, Evangelos, do you actually use the collector interface or actual explicit intermediary caches in your approach? If that's the collector interface, I guess that's easier to hide that sharding business.
>
> The intermediary results are stored with a composite key [intermKey-i, seg-j].
> The M/R coordinator waits for all M tasks to return. If one does not
> (timeout, rehash), the following happens:
>
> We can't allow time out map tasks, or they will keep writing to the intermediate cache in parallel with the retried tasks. So the originator has to wait for a response from each node to which it sent a map task.
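To make the [intermKey-i, seg-j] scheme concrete, here is a hypothetical composite key for the intermediate cache (pure illustration, not Infinispan code); the task id field anticipates the [intermKey, seg, mctask] refinement Dan describes below, which lets a retried map task write under a fresh id while the results of a stale attempt are deleted or simply ignored:

    import java.io.Serializable;
    import java.util.Objects;
    import java.util.UUID;

    public final class IntermediateKey implements Serializable {

        private final Object intermKey; // the key emitted by the mapper/combiner
        private final int segment;      // the input segment the values came from
        private final UUID mapTaskId;   // identifies one map/combine attempt

        public IntermediateKey(Object intermKey, int segment, UUID mapTaskId) {
            this.intermKey = intermKey;
            this.segment = segment;
            this.mapTaskId = mapTaskId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IntermediateKey)) return false;
            IntermediateKey other = (IntermediateKey) o;
            return segment == other.segment
                  && Objects.equals(intermKey, other.intermKey)
                  && Objects.equals(mapTaskId, other.mapTaskId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(intermKey, segment, mapTaskId);
        }
    }

The reduce phase would then fold together all entries sharing the same intermKey across segments, keeping only the latest task id per segment.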
So the originator has to wait for a response from each node to which it sent a map task. OK. I guess the originator can see that a node is out of the cluster though and act accordingly. > > - delete [intermKey-i, seg-i] (that operation could be handled by the > new per-segment M before the map task is effectively started) > - ship the M task for that segment-i to the new primary owner of > segment-i > > When all M tasks are received the Reduce phase will read all [intermKey-i, *] > keys and reduce them. > Note that if the reduction phase is itself distributed, we could apply > the same key per segment and shipping split for these. > > Sure, we have to retry reduce tasks when the primary owner changes, and it makes sense to retry as little as possible. > > > Again the tricky part is to expose the ability to write to intermediary > caches per segment without exposing segments per se as well as let > someone see a concatenated view if intermKey-i from all segments subkeys > during reduction. > > Writing to and reading from the intermediate cache is already abstracted from user code (in the Mapper and Reducer interfaces). So we don't need to worry about exposing extra details to the user. > > > Thoughts? > > Dan, I did not quite get what alternative approach you wanted to > propose. Care to respin it for a slow brain? :) > > I think where we differ is that I don't think user code needs to know about how we store the intermediate values and what we retry, as long as their mappers/combiners/reducers don't have side effects. Right but my understanding from the LEADS guys was that they had side effects on their M/Rs. Waiting for Evangelos to speak up. > > Otherwise I was thinking on the same lines: send 1 map/combine task for each segment (maybe with a cap on the number of segments being processed at the same time on each node), split the intermediate values per input segment, cancel+retry each map task if the topology changes and the executing node is no longer an owner. If the reduce phase is distributed, run 1 reduce task per segment as well, and cancel+retry the reduce task if the executing node is no longer an owner. > > I had some ideas about assigning each map/combine phase a UUID and making the intermediate keys [intermKey, seg, mctask] to allow the originator to retry a map/combine task without waiting for the previous one to finish, but I don't think I mentioned that before :) Nice touch, that fixes the rogue node / timeout problem. > There are also some details that I'm worried about: > > 1) If the reduce phase is distributed, and the intermediate cache is non-transactional, any topology change in the intermediate cache will require us to retry all the map/combine tasks that were running at the time on any node (even if some nodes did not detect the topology change yet). So it would make sense to limit the number of map/combine tasks that are processed at one time, in order to limit the amount of tasks we retry (OR require the intermediate cache to be transactional). I am not fully following that. What matters in the end it seems is for the originator to detect a topology change and discard things accordingly, no? If the other nodes are slaves of that originator for the purpose of that M/R, we are good. > > 2) Running a separate map/combine task for each segment is not really an option until we implement the the segment-aware data container and cache stores. Without that change, it will make everything much slower, because of all the extra iterations for each segment. 
> See my other email about physically merging the per-segment work into per-node work when you ship that work.
> 3) And finally, all this will be overkill when the input cache is small, and the time needed to process the data is comparable to the time needed to send all those extra RPCs.
>
> So I'm thinking it might be better to adopt Vladimir's suggestion to retry everything if we detect a topology change in the input and/or intermediate cache at the end of the M/R task, at least in the first phase.
You half lost me, but I think that with my proposal to physically merge the RPC calls per node instead of per segment, that problem would be alleviated.

Emmanuel
-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141015/476af049/attachment.html

From emmanuel at hibernate.org  Thu Oct 16 05:23:54 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Thu, 16 Oct 2014 11:23:54 +0200
Subject: [infinispan-dev] JAR distribution once the grid has been deployed
Message-ID: <77B40530-3309-420B-B720-DAC56410A102@hibernate.org>

Hi all,

I know this has been discussed in the past (by Tristan I think), but I don't know how concrete the plans have become since then.

One major issue with all the distributed execution code interfaces we have is that they require each node to have in its classpath both the implementation of these interfaces and the class files corresponding to the key and value being processed. My understanding is that this is true of distexec, Map / Reduce and (clustered) listeners.

Evangelos from the LEADS project sort of worked around this problem by creating specialized versions of his distexec that load the necessary JARs from the grid itself (in a set of keys) and create a classloader that references these JARs.

In a sequence, it conceptually looks like this:

- have the generic classloader distexec version in each grid node's classpath at start time
- when a new remote execution is required, load each necessary JAR into a specific key in a specific cache
- the generic distexec receives the necessary keys, loads each JAR and creates a classloader out of them
- the generic distexec loads and launches the specific code that needs to be executed (based on the FQCN of the code to execute) from the created classloader (a rough sketch follows at the end of this message)

There are a few problems with that including:

- it requires a lot of manual work from the user
- big JARs make the key / value per JAR logic explode a bit. The algorithms LEADS use have 300 MB sized JARs
- god knows what security leaks this can lead to

So I wondered whether we have a better alternative planned, and whether there is a wiki page discussing the needs and potential approaches. As an intermediary step we could turn this approach into a tutorial, or side classes that people can borrow from for each of the use cases.
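A rough sketch of steps 2-4 of the sequence above, assuming each JAR is stored as a byte[] value in a dedicated cache and the code to execute implements Runnable; all names are hypothetical, and a real version would need to address the security and cleanup concerns listed:

    import java.io.File;
    import java.io.IOException;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.nio.file.Files;
    import java.util.ArrayList;
    import java.util.List;
    import org.infinispan.Cache;

    public class GridClassLoading {

        // Build a classloader from JARs stored in the grid: each key in
        // jarKeys maps to the raw bytes of one JAR in jarCache.
        static ClassLoader classLoaderFromGrid(Cache<String, byte[]> jarCache,
                                               List<String> jarKeys,
                                               ClassLoader parent) throws IOException {
            List<URL> urls = new ArrayList<>();
            for (String key : jarKeys) {
                byte[] jarBytes = jarCache.get(key);
                // URLClassLoader needs a URL, so spill the bytes to a temp file
                File tmp = File.createTempFile("grid-", ".jar");
                tmp.deleteOnExit();
                Files.write(tmp.toPath(), jarBytes);
                urls.add(tmp.toURI().toURL());
            }
            return new URLClassLoader(urls.toArray(new URL[0]), parent);
        }

        // Load and instantiate the task to execute by its FQCN.
        static Runnable loadTask(ClassLoader cl, String fqcn) throws Exception {
            return (Runnable) Class.forName(fqcn, true, cl)
                                   .getDeclaredConstructor().newInstance();
        }
    }

The 300 MB JAR problem mentioned above would hit the single get() here, which is one reason chunking the bytes across several keys gets tempting and messy.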
Emmanuel From isavin at redhat.com Fri Oct 17 12:11:31 2014 From: isavin at redhat.com (Ion Savin) Date: Fri, 17 Oct 2014 19:11:31 +0300 Subject: [infinispan-dev] On ParserRegistry and classloaders In-Reply-To: References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com> <53831239.30601@redhat.com> <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com> <53D945B4.7070508@redhat.com> Message-ID: <54413FB3.3020409@redhat.com> Hi Sanne, > Caused by: java.lang.ClassNotFoundException: > org.infinispan.remoting.transport.jgroups.JGroupsTransport from > [Module \"deployment.ModuleMemberRegistrationIT.war:main\" from > Service Module Loader]"}} Can you please share also the full stack for this exception? Thanks! -- Ion Savin From sanne at infinispan.org Fri Oct 17 14:15:58 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 17 Oct 2014 19:15:58 +0100 Subject: [infinispan-dev] Improving the performance of index writers Message-ID: Hi all, we have been breaking down the problem of latency during Index Writing into smaller manageable tasks, you can find the general overview JIRA here : - https://issues.jboss.org/browse/ISPN-4847 As you can see some minor improvements have been fixed already, and while each of them provides only minor 10% to 30% improvements, some provide more and combined the composite ratio is getting interesting. While these minor issues (even combined) won't give us the many orders of magnitude performance improvements that we'd like to see, they are important as they are paving the road to the more significant efficiency improvements. I documented the main idea here, as it belongs into the Hibernate Search engine: https://hibernate.atlassian.net/browse/HSEARCH-1699 I don't expect that to be implemented overnight, but Gustavo already sent a PR for the ASYNC case, which is based on the same principle of avoiding the commits but is simpler to implement: https://hibernate.atlassian.net/browse/HSEARCH-1693 We expect this one to be a proof of concept for the performance that we'll get from HSEARCH-1699, and also I think it's very useful on its own: previously users of ASYNC indexing were forced into a "very async" architecture which might have been a bit too hard to manage, while now being able to set a maximum delay for the async operation I also expect that to be an acceptable compromise for a much wider range of use cases. Essentially this will decouple the achievable throughput of indexed caches from the RPC latency, although obviously this latency will still be the limiting factor for some dimensions, especially the response time for a single synchronous indexed write will still be affected primarily by the ability of Infinispan to improve the number of blocking RPCs needed for a single write. Feedback very welcome! Sanne From galder at redhat.com Mon Oct 20 03:16:43 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 20 Oct 2014 09:16:43 +0200 Subject: [infinispan-dev] Feedback and requests on clustered and remote listeners In-Reply-To: References: <1C064841-B234-4EC8-AEE3-467B1E52ED0B@hibernate.org> <2FD61F5D-67A6-4D6C-A0A0-D924914BAD14@hibernate.org> <21CE8853-8A48-4061-BD6C-55EE93BECE33@hibernate.org> <06B6FC9A-9886-4C02-B821-EC5C0864A948@redhat.com> <6FDDA588-82B0-4DB3-95C9-527CDD8B5219@hibernate.org> <821068D0-D857-4006-AC70-A0798C4756C1@redhat.com> <4FA0A74E-4DA4-42F1-BE0C-C07E401633A0@hibernate.org> Message-ID: Hi all, Thanks Will for implementing this. 
I've created [1] to investigate whether any changes would be required for Hot Rod remote listeners to take advantage of this.

Cheers,

[1] https://issues.jboss.org/browse/ISPN-4857

On 30 Sep 2014, at 18:13, William Burns wrote:

> I have put it on a branch on github and you can try it out and let me
> know what you think.
>
> I still have a few things I may want to change though:
>
> 1. I don't like how pre events are yet as they don't give you the
> previous value and new value as post events do
> 2. The enum to tell the type has become a bit more complicated and I
> think I am going to change it to a class
> 3. I also have some internal changes that should require less memory
> allocations I wanted to clean up.
>
> https://github.com/wburns/infinispan/tree/ISPN-4753
>
> Thanks,
>
> - Will
>
> On Fri, Sep 26, 2014 at 4:06 AM, Emmanuel Bernard
> wrote:
>> You lost me at actually ;) but if you have some code or even a gist showing how a user would use and interact with these changes, I can give you some feedback on the use cases I had in mind and if they fit.
>>
>>
>>> On 25 sept. 2014, at 15:20, William Burns wrote:
>>>
>>> Actually while working and thinking on this it seems it may be easiest
>>> to exclude the usage of KeyValueFilter in the listener pieces
>>> completely and instead leave the annotation as it is now. Instead the
>>> provided CacheEventFilter would be wrapped by a KeyValueFilter
>>> implement that just called the new method as if it was a create event
>>> for each value while iterating on them. I am thinking this is the
>>> cleanest. Do you guys have any opinions? It would also keep intact a
>>> lot of existing code and APIs.
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

From isavin at redhat.com  Mon Oct 20 05:13:06 2014
From: isavin at redhat.com (Ion Savin)
Date: Mon, 20 Oct 2014 12:13:06 +0300
Subject: [infinispan-dev] my status
Message-ID: <5444D222.6040807@redhat.com>

Hi all,

I'll be missing the meeting today so here is my status:

Last week:
* resolved ISPN-4784 and ISPN-4251
* spent a good amount of time studying JBoss Modules and how we package infinispan as AS modules

This week still:
* ISPN-3836 TxCleanupService can cause TCCL leak
* HRCPP-173 The HotRod client should support a separate CH for each cache

--
Ion Savin

From emmanuel at hibernate.org  Mon Oct 20 07:59:37 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Mon, 20 Oct 2014 13:59:37 +0200
Subject: [infinispan-dev] Improving the performance of index writers
In-Reply-To:
References:
Message-ID: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org>

HSEARCH-1699 looks good. A few comments.

Maybe from a user point of view we want to expose the number of ms the user is ok to delay a commit due to indexing, which would mean that you can wait up to that number before calling it a day and emptying the queue. The big question I have, which you allude to, is whether this mechanism should have some kind of back pressure mechanism by also capping the queue size.

BTW, in the following paragraph, either you lost me or you are talking nonsense:

> Systems with a high degree of parallelism will benefit from this, and the performance should converge to the performance you would have without ever doing a commit; however if the frequency of commits is approaching zero, it also means that the average latency of each operation will get significantly higher. Still, in such situations assuming we are for example stacking up a million changesets between each commit, that implies this solution would be approximately a million times faster than the existing design (A million would not be realistic of course as it implies a million of parallel requests).

I think you can only converge to an average of 1/2 * (commit + configured delay time) latency wise. I am assuming latency is what people are interested in, not the average CPU / memory load of indexing.
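To put hypothetical numbers on that 1/2 * (commit + delay) estimate: with a 10 ms commit cost and a 50 ms configured delay, requests arriving at uniformly random points in the window would see on average 1/2 * (10 + 50) = 30 ms of added latency, so the convergence is in throughput (changesets per commit), not in the response time of an individual synchronous write.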
BTW, in the following paragraph, either you lost me or you are talking non sense: > Systems with an high degree of parallelism will benefit from this, and the performance should converge to the performance you would have without every doing a commit; however if the frequency of commits is apparoching to zero, it also means that the average latency of each operation will get significantly higher. Still, in such situations assuming we are for example stacking up a million changesets between each commit, that implies this solution would be approximately a million times faster than the existing design (A million would not be realistic of course as it implies a million of parallel requests). I think you can only converge to an average of 1/2 * (commit + configured delay time) latency wise. I am assuming latency is what people are interested in, not the average CPU / memory load of indexing. Emmanuel On 17 Oct 2014, at 20:15, Sanne Grinovero wrote: > Hi all, > we have been breaking down the problem of latency during Index > Writing into smaller manageable tasks, you can find the general > overview JIRA here : > > - https://issues.jboss.org/browse/ISPN-4847 > > As you can see some minor improvements have been fixed already, and > while each of them provides only minor 10% to 30% improvements, some > provide more and combined the composite ratio is getting interesting. > > While these minor issues (even combined) won't give us the many orders > of magnitude performance improvements that we'd like to see, they are > important as they are paving the road to the more significant > efficiency improvements. > > I documented the main idea here, as it belongs into the Hibernate Search engine: > > https://hibernate.atlassian.net/browse/HSEARCH-1699 > > I don't expect that to be implemented overnight, but Gustavo already > sent a PR for the ASYNC case, which is based on the same principle of > avoiding the commits but is simpler to implement: > > https://hibernate.atlassian.net/browse/HSEARCH-1693 > > We expect this one to be a proof of concept for the performance that > we'll get from HSEARCH-1699, and also I think it's very useful on its > own: previously users of ASYNC indexing were forced into a "very > async" architecture which might have been a bit too hard to manage, > while now being able to set a maximum delay for the async operation I > also expect that to be an acceptable compromise for a much wider range > of use cases. > > Essentially this will decouple the achievable throughput of indexed > caches from the RPC latency, although obviously this latency will > still be the limiting factor for some dimensions, especially the > response time for a single synchronous indexed write will still be > affected primarily by the ability of Infinispan to improve the number > of blocking RPCs needed for a single write. > > Feedback very welcome! > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Mon Oct 20 08:10:47 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 20 Oct 2014 13:10:47 +0100 Subject: [infinispan-dev] Improving the performance of index writers In-Reply-To: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org> References: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org> Message-ID: On 20 October 2014 12:59, Emmanuel Bernard wrote: > HSEARCH-1699 looks good. A few comments. 
> > Maybe from a user point of you we want to expose the number of ms the user is ok to delay a commit due to indexing. Which would mean that you can wait up to that number before calling it a day and emptying the queue. The big question I have which you elude too is whether this mechanism should have some kind of back pressure mechanism by also caping the queue size.

Gustavo is implementing that for the ASYNC backend, but the SYNC
backend will always block the user thread until the commit is done
(and some commit is going to be done ASAP).
About to write a mail to hibernate-dev to discuss the ASYNC backend
property name and exact semantics.

> > BTW, in the following paragraph, either you lost me or you are talking non sense:
>
>> Systems with an high degree of parallelism will benefit from this, and the performance should converge to the performance you would have without every doing a commit; however if the frequency of commits is apparoching to zero, it also means that the average latency of each operation will get significantly higher. Still, in such situations assuming we are for example stacking up a million changesets between each commit, that implies this solution would be approximately a million times faster than the existing design (A million would not be realistic of course as it implies a million of parallel requests).
>
> I think you can only converge to an average of 1/2 * (commit + configured delay time) latency wise. I am assuming latency is what people are interested in, not the average CPU / memory load of indexing.

I'm sorry, I'm confused. There is no configured delay time for the SYNC
backend discussed on HSEARCH-1699, are you talking about the Async
one? But my paragraph above is strictly referring to the strategy
meant to be applied to the Sync one.

Thanks for the feedback! BTW I didn't cross-post to hibernate-dev as
this was meant as a heads-up for the Infinispan team, which otherwise
would not have visibility into what we're planning, but I should really
start a discussion thread for the details on hibernate-dev.
Infinispan developers: if you're interested in following this subject,
please comment on the JIRAs or join the hibernate-dev mailing list.

Sanne

>
> Emmanuel
>
> On 17 Oct 2014, at 20:15, Sanne Grinovero wrote:
>
>> Hi all,
>> we have been breaking down the problem of latency during Index
>> Writing into smaller manageable tasks, you can find the general
>> overview JIRA here :
>>
>> - https://issues.jboss.org/browse/ISPN-4847
>>
>> As you can see some minor improvements have been fixed already, and
>> while each of them provides only minor 10% to 30% improvements, some
>> provide more and combined the composite ratio is getting interesting.
>>
>> While these minor issues (even combined) won't give us the many orders
>> of magnitude performance improvements that we'd like to see, they are
>> important as they are paving the road to the more significant
>> efficiency improvements.
>> >> I documented the main idea here, as it belongs into the Hibernate Search engine: >> >> https://hibernate.atlassian.net/browse/HSEARCH-1699 >> >> I don't expect that to be implemented overnight, but Gustavo already >> sent a PR for the ASYNC case, which is based on the same principle of >> avoiding the commits but is simpler to implement: >> >> https://hibernate.atlassian.net/browse/HSEARCH-1693 >> >> We expect this one to be a proof of concept for the performance that >> we'll get from HSEARCH-1699, and also I think it's very useful on its >> own: previously users of ASYNC indexing were forced into a "very >> async" architecture which might have been a bit too hard to manage, >> while now being able to set a maximum delay for the async operation I >> also expect that to be an acceptable compromise for a much wider range >> of use cases. >> >> Essentially this will decouple the achievable throughput of indexed >> caches from the RPC latency, although obviously this latency will >> still be the limiting factor for some dimensions, especially the >> response time for a single synchronous indexed write will still be >> affected primarily by the ability of Infinispan to improve the number >> of blocking RPCs needed for a single write. >> >> Feedback very welcome! >> >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From gustavonalle at gmail.com Mon Oct 20 08:40:16 2014 From: gustavonalle at gmail.com (Gustavo Fernandes) Date: Mon, 20 Oct 2014 13:40:16 +0100 Subject: [infinispan-dev] Improving the performance of index writers In-Reply-To: References: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org> Message-ID: On Mon, Oct 20, 2014 at 1:10 PM, Sanne Grinovero wrote: > On 20 October 2014 12:59, Emmanuel Bernard wrote: >> HSEARCH-1699 looks good. A few comments. >> >> Maybe from a user point of you we want to expose the number of ms the user is ok to delay a commit due to indexing. Which would mean that you can wait up to that number before calling it a day and emptying the queue. The big question I have which you elude too is whether this mechanism should have some kind of back pressure mechanism by also caping the queue size. > > Gustavo is implementing that for the ASYNC backend, but the SYNC > backend will always block the user thread until the commit is done > (and some commit is going to be done ASAP). > About to write a mail to hibernate-dev to discuss the ASYNC backend > property name and exact semantics. > Current ASYNC proposal [1] involves a refresh interval to explicitly flush the indexes. There's still a queue involved though; the queue is filled with indexing work to be applied and will block if flooded with multiple producers. 
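A rough sketch of what such a refresh-interval consumer could look like - purely illustrative names, not the actual HSEARCH-1693 code: index work is applied as soon as it arrives, but the expensive commit only happens when the interval elapses, and the bounded queue is what blocks flooding producers.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class TimedFlushLoop implements Runnable {

    // Bounded queue: a flood of producers blocks here instead of exhausting memory
    private final BlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>(10_000);
    private final long refreshIntervalNanos;
    private long lastCommit = System.nanoTime();
    private boolean dirty = false;

    public TimedFlushLoop(long refreshIntervalMs) {
        this.refreshIntervalNanos = TimeUnit.MILLISECONDS.toNanos(refreshIntervalMs);
    }

    // Called by application threads; blocks when the queue is full
    public void submit(Runnable indexWork) throws InterruptedException {
        workQueue.put(indexWork);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                long remaining = lastCommit + refreshIntervalNanos - System.nanoTime();
                if (remaining <= 0) {
                    if (dirty) {
                        commit();       // periodic, explicit flush of the index
                        dirty = false;
                    }
                    lastCommit = System.nanoTime();
                    continue;
                }
                Runnable work = workQueue.poll(remaining, TimeUnit.NANOSECONDS);
                if (work != null) {
                    work.run();         // apply index work; no commit yet
                    dirty = true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void commit() { /* IndexWriter.commit() would go here */ }
}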
The difference resides in the consumption side: instead of the flow
{apply, commit, apply, commit, ...} it will do
{apply(1..*), commit, apply(1..*), commit, ...}

[1] https://github.com/hibernate/hibernate-search/pull/681

Gustavo

From emmanuel at hibernate.org Mon Oct 20 09:55:58 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Mon, 20 Oct 2014 15:55:58 +0200
Subject: [infinispan-dev] Improving the performance of index writers
In-Reply-To:
References: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org>
Message-ID:

On 20 Oct 2014, at 14:10, Sanne Grinovero wrote:

> On 20 October 2014 12:59, Emmanuel Bernard wrote:
>> HSEARCH-1699 looks good. A few comments.
>>
>> Maybe from a user point of you we want to expose the number of ms the user
>> is ok to delay a commit due to indexing. Which would mean that you can wait
>> up to that number before calling it a day and emptying the queue. The big
>> question I have which you elude too is whether this mechanism should have
>> some kind of back pressure mechanism by also caping the queue size.
>
> Gustavo is implementing that for the ASYNC backend, but the SYNC
> backend will always block the user thread until the commit is done
> (and some commit is going to be done ASAP).
> About to write a mail to hibernate-dev to discuss the ASYNC backend
> property name and exact semantics.

I understand that the sync mode will block until the commit is done. What I am saying is that for HSEARCH-1699 (SYNC) (and probably also for the ASYNC mode), you can ask the user "how much more" he is willing to wait for the index to be committed compared to "as fast as possible". That becomes your window of aggregation. Does that make sense?

> BTW, in the following paragraph, either you lost me or you are talking non
> sense:
>
>> Systems with an high degree of parallelism will benefit from this, and the
>> performance should converge to the performance you would have without every
>> doing a commit; however if the frequency of commits is apparoching to zero,
>> it also means that the average latency of each operation will get
>> significantly higher. Still, in such situations assuming we are for example
>> stacking up a million changesets between each commit, that implies this
>> solution would be approximately a million times faster than the existing
>> design (A million would not be realistic of course as it implies a million
>> of parallel requests).
>
> I think you can only converge to an average of 1/2 * (commit + configured
> delay time) latency wise. I am assuming latency is what people are
> interested in, not the average CPU / memory load of indexing.
>
> I'm sorry I'm confused. There is no configured delay time for the SYNC
> backend discussed on HSEARCH-1699, are you talking about the Async
> one? But my paragraph above is strictly referring tot the strategy
> meant to be applied for the Sync one.

There is a delay: it is what you call the "target frequency of commits". And the alternative that I proposed is not so much a frequency as how much longer you delay a flush in the hope of getting more work in.

In your model of a fixed frequency, then the average delay is 1/2 * 1/frequency + commit time.
Or do you have something different in mind?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141020/7d32ff75/attachment-0001.html

From ttarrant at redhat.com Mon Oct 20 10:47:31 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Mon, 20 Oct 2014 16:47:31 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
Message-ID: <54452083.4020502@redhat.com>

Hi guys,

with the imminent release of 7.0.0.CR2 we are reaching the end of this
release cycle. There have been a ton of improvements (maybe too many)
and a lot of time has passed since the previous version (maybe too much).
Following up on my previous e-mail about future plans, here's a recap of
a plan which I believe will allow us to move at a much quicker pace.

For the next minor releases I would like to suggest the following strategy:
- use a 3-month timebox where we strive to maintain master in an "always releasable" state
- complex feature work will need to happen on dedicated feature branches, using the usual GitHub pull-request workflow
- only when a feature is complete (code, tests, docs, reviewed, CI-checked) will it be merged back into master
- if a feature is running late it will be postponed to the following minor release so as not to hinder other development

I am also going to suggest dropping the cherry-picking approach and going with git merge. In order to achieve this we need CI to be always in top form with 0 failures in master. This will allow merging a PR directly from GitHub's interface. We obviously need to trust our tools and our existing code base.

This is the plan for 7.1.0:

13 November 7.1.0.Alpha1
18 December 7.1.0.Beta1
15 January 7.1.0.CR1
30 January 7.1.0.Final


Tristan

From ttarrant at redhat.com Mon Oct 20 10:48:52 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Mon, 20 Oct 2014 16:48:52 +0200
Subject: [infinispan-dev] Weekly Infinispan IRC meeting 2014-10-20
Message-ID: <544520D4.9050100@redhat.com>

Get the minutes from here:

http://transcripts.jboss.org/meeting/irc.freenode.org/infinispan/2014/infinispan.2014-10-20-14.02.log.html

Tristan

From sanne at infinispan.org Mon Oct 20 10:57:04 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Mon, 20 Oct 2014 15:57:04 +0100
Subject: [infinispan-dev] Improving the performance of index writers
In-Reply-To:
References: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org>
Message-ID:

On 20 October 2014 14:55, Emmanuel Bernard wrote:
>
> On 20 Oct 2014, at 14:10, Sanne Grinovero wrote:
>
> On 20 October 2014 12:59, Emmanuel Bernard wrote:
>
> HSEARCH-1699 looks good. A few comments.
>
> Maybe from a user point of you we want to expose the number of ms the user
> is ok to delay a commit due to indexing. Which would mean that you can wait
> up to that number before calling it a day and emptying the queue. The big
> question I have which you elude too is whether this mechanism should have
> some kind of back pressure mechanism by also caping the queue size.
>
>
> Gustavo is implementing that for the ASYNC backend, but the SYNC
> backend will always block the user thread until the commit is done
> (and some commit is going to be done ASAP).
> About to write a mail to hibernate-dev to discuss the ASYNC backend
> property name and exact semantics.
>
>
> I understand that the sync mode will block until the commit is done. what I
> am saying is that for HSEARCH-1699 (SYNC) (and probably also for the ASYNC
> mode), you can ask the user ?how much more? is he willing to wait for the
> index to be committed compared to ?as fast as possible?. That becomes your
> window of aggregation.
> Does that make sense?
>
>
> BTW, in the following paragraph, either you lost me or you are talking non
> sense:
>
>> Systems with an high degree of parallelism will benefit from this, and the
>> performance should converge to the performance you would have without every
>> doing a commit; however if the frequency of commits is apparoching to zero,
>> it also means that the average latency of each operation will get
>> significantly higher. Still, in such situations assuming we are for example
>> stacking up a million changesets between each commit, that implies this
>> solution would be approximately a million times faster than the existing
>> design (A million would not be realistic of course as it implies a million
>> of parallel requests).
>
>
> I think you can only converge to an average of 1/2 * (commit + configured
> delay time) latency wise. I am assuming latency is what people are
> interested in, not the average CPU / memory load of indexing.
>
>
> I'm sorry I'm confused. There is no configured delay time for the SYNC
> backend discussed on HSEARCH-1699, are you talking about the Async
> one? But my paragraph above is strictly referring tot the strategy
> meant to be applied for the Sync one.
>
>
> There is a delay. it is what you call the "target frequency of commits?. And
> my alternative that i proposed is not su much a frequency rather than how
> much more you delay a flush in the hope of getting more work in.

No, there is no delay: in case there is a constant flow of incoming
write operations, the write loop will degenerate into something like
(pseudo code and overly simplified):

while (true) {
   apply(getNextChangeset());
   commit();
}

So it's a busy loop with no waits: the "target frequency of commits"
will naturally match the maximum frequency of commits which the
storage can handle, and since (as we've said) applying the changes is
not a cost, it's essentially the same as

while (true) {
   commit();
}

That code will loop faster if the commits are quick. The point being
that the number of changes which we can apply in period T does not
depend on the time it takes to do commit operations on the underlying
storage.

The real code will need to be a bit more complex, for example to
handle this case:

while (true) {
   changeset = getNextChangeset();
   if (changeset.isEmpty) {
      waitWithoutBurningCPU();
   }
   else {
      apply(all pending changes);
      commit();
   }
}

> In your model of a fixed frequency, they the average delay is 1/2 *
> 1/frequency + commit time.
> Or do you have something different in mind?

I hope the above example clarifies. It's not a fixed frequency, it's
"as fast as it can", but with latency not better than what can be
performed by a single commit. What I'm attempting to explain when
comparing "frequency" is that this is the optimal speed for each
situation, especially compared to the current solution, and regardless
of queueing up.
There is an inherent form of back pressure: it's limited by the cost
of the single commit, which will delay further changesets in the
queue, but the queue depth doesn't get larger than 1 and we don't
risk running out of space as it blocks producers, blocking the
application.
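Spelling that pseudocode out, a minimal self-contained sketch of the batching loop could look like the code below. This is illustrative only - the names are invented, it is not the actual Hibernate Search backend, and it leaves out notifying each blocked producer once its changeset has been committed, which a real sync backend must do.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingWriterLoop implements Runnable {

    interface Changeset { }

    // Bounded handoff: producers block when the writer falls behind,
    // which is the inherent back pressure described above.
    private final BlockingQueue<Changeset> queue = new LinkedBlockingQueue<>(1024);

    public void enqueue(Changeset changeset) throws InterruptedException {
        queue.put(changeset); // blocks the application thread if full
    }

    @Override
    public void run() {
        List<Changeset> batch = new ArrayList<>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                batch.add(queue.take()); // wait for work without burning CPU
                queue.drainTo(batch);    // grab everything that piled up during the last commit
                for (Changeset changeset : batch) {
                    apply(changeset);    // cheap, in-memory
                }
                commit();                // expensive; amortized over the whole batch
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void apply(Changeset changeset) { /* IndexWriter updates */ }
    private void commit() { /* IndexWriter.commit() */ }
}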
Sanne

>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From emmanuel at hibernate.org Mon Oct 20 12:21:58 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Mon, 20 Oct 2014 18:21:58 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <54452083.4020502@redhat.com>
References: <54452083.4020502@redhat.com>
Message-ID:

There is a difference between cherry-picking and rebasing when it comes to reapplying work on top of a branch. Do you dislike both equally compared to a merge (aka the railroad nexus git history approach)?

On 20 Oct 2014, at 16:47, Tristan Tarrant wrote:

> Hi guys,
>
> with the imminent release of 7.0.0.CR2 we are reaching the end of this
> release cycle. There have been a ton of improvements (maybe too many)
> and a lot of time has passed since the previous version (maybe to much).
> Following up on my previous e-mail about future plans, here's a recap of
> a plan which I believe will allow us to move at a much quicker pace:
>
> For the next minor releases I would like to suggest the following strategy:
> - use a 3 month timebox where we strive to maintain master in an "always releasable" state
> - complex feature work will need to happen onto dedicated feature branches, using the usual GitHub pull-request workflow
> - only when a feature is complete (code, tests, docs, reviewed, CI-checked) it will be merged back into master
> - if a feature is running late it will be postponed to the following minor release so as not to hinder other development
>
> I am also going to suggest dropping the cherry-picking approach and going with git merge. In order to achieve this we need CI to be always in top form with 0 failures in master. This will allow merging a PR directly from GitHub's interface. We obviously need to trust our tools and our existing code base.
>
> This is the plan for 7.1.0:
>
> 13 November 7.1.0.Alpha1
> 18 December 7.1.0.Beta1
> 15 January 7.1.0.CR1
> 30 January 7.1.0.Final
>
>
> Tristan
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From mmarkus at redhat.com Mon Oct 20 12:28:14 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 20 Oct 2014 17:28:14 +0100
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To:
References: <54452083.4020502@redhat.com>
Message-ID: <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com>

On Oct 20, 2014, at 17:21, Emmanuel Bernard wrote:

> There is a difference between cherry picking and rebasing when it comes to reapply a work on top of a branch.

What is the difference? :-)

> Do you dislike both equally compared to a merge (aka railroad nexus git history approach)?

Using GitHub's "merge" button is pretty convenient IMO, even though the history is not as nice as with a rebase (or cherry-pick; I miss the difference for now)

>
>
> On 20 Oct 2014, at 16:47, Tristan Tarrant wrote:
>
>> Hi guys,
>>
>> with the imminent release of 7.0.0.CR2 we are reaching the end of this
>> release cycle. There have been a ton of improvements (maybe too many)
>> and a lot of time has passed since the previous version (maybe to much).
>> Following up on my previous e-mail about future plans, here's a recap of
>> a plan which I believe will allow us to move at a much quicker pace:
>>
>> For the next minor releases I would like to suggest the following strategy:
>> - use a 3 month timebox where we strive to maintain master in an "always releasable" state
>> - complex feature work will need to happen onto dedicated feature branches, using the usual GitHub pull-request workflow
>> - only when a feature is complete (code, tests, docs, reviewed, CI-checked) it will be merged back into master
>> - if a feature is running late it will be postponed to the following minor release so as not to hinder other development
>>
>> I am also going to suggest dropping the cherry-picking approach and going with git merge. In order to achieve this we need CI to be always in top form with 0 failures in master. This will allow merging a PR directly from GitHub's interface. We obviously need to trust our tools and our existing code base.
>>
>> This is the plan for 7.1.0:
>>
>> 13 November 7.1.0.Alpha1
>> 18 December 7.1.0.Beta1
>> 15 January 7.1.0.CR1
>> 30 January 7.1.0.Final
>>
>>
>> Tristan
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From emmanuel at hibernate.org Mon Oct 20 12:37:18 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Mon, 20 Oct 2014 18:37:18 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com>
References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com>
Message-ID:

A rebase is a one-liner op per branch you want to reapply, whereas cherry-picking requires you to manually select the commits you want. Underneath, in git's guts, it probably does the same thing.

I have to admit I have barely had the occasion to want to click the GitHub UI button: except for simple documentation, reviewing code almost always requires fetching the branch and looking at it in an IDE of sorts for a proper review. The documentation bit actually even requires a local run, since Markdown / Asciidoc and the like tend to fail silently on a syntax mistake.

On 20 Oct 2014, at 18:28, Mircea Markus wrote:

>
> On Oct 20, 2014, at 17:21, Emmanuel Bernard wrote:
>
>> There is a difference between cherry picking and rebasing when it comes to reapply a work on top of a branch.
>
> What is the difference? :-)
>
>> Do you dislike both equally compared to a merge (aka railroad nexus git history approach)?
>
> Using github's "merge" button is pretty convenient imo, even though the history is not as nice as with a rebase (or cherry-pick, I miss the difference for now )
>
>>
>>
>> On 20 Oct 2014, at 16:47, Tristan Tarrant wrote:
>>
>>> Hi guys,
>>>
>>> with the imminent release of 7.0.0.CR2 we are reaching the end of this
>>> release cycle. There have been a ton of improvements (maybe too many)
>>> and a lot of time has passed since the previous version (maybe to much).
>>> Following up on my previous e-mail about future plans, here's a recap of >>> a plan which I believe will allow us to move at a much quicker pace: >>> >>> For the next minor releases I would like to suggest the following strategy: >>> - use a 3 month timebox where we strive to maintain master in an "always releasable" state >>> - complex feature work will need to happen onto dedicated feature branches, using the usual GitHub pull-request workflow >>> - only when a feature is complete (code, tests, docs, reviewed, CI-checked) it will be merged back into master >>> - if a feature is running late it will be postponed to the following minor release so as not to hinder other development >>> >>> I am also going to suggest dropping the cherry-picking approach and going with git merge. In order to achieve this we need CI to be always in top form with 0 failures in master. This will allow merging a PR directly from GitHub's interface. We obviously need to trust our tools and our existing code base. >>> >>> This is the plan for 7.1.0: >>> >>> 13 November 7.1.0.Alpha1 >>> 18 December 7.1.0.Beta1 >>> 15 January 7.1.0.CR1 >>> 30 January 7.1.0.Final >>> >>> >>> Tristan >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141020/b61fadeb/attachment-0001.html From ttarrant at redhat.com Mon Oct 20 12:40:54 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 20 Oct 2014 18:40:54 +0200 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> Message-ID: <54453B16.6050202@redhat.com> Sure, you still want to review it in your IDE, and maybe run local tests, but ultimately merging via the GitHub UI. Tristan On 20/10/14 18:37, Emmanuel Bernard wrote: > rebase is a oneliner op per branch you want to reapply whereas cherry > picking requires to manually select the commits you want. Underneath > in git guts it probably does the same. > > I have to admit I barely had the occasion to want to click the GitHub > UI button as except for simple documentation, reviewing code almost > always require to fetch the branch and look at it in an IDE of sort > for proper review. The documentation bit is actually even requiring > local run since Markdown / Asciidoc and all tend to silently fail a > syntax mistake. > > On 20 Oct 2014, at 18:28, Mircea Markus > wrote: > >> >> On Oct 20, 2014, at 17:21, Emmanuel Bernard > > wrote: >> >>> There is a difference between cherry picking and rebasing when it >>> comes to reapply a work on top of a branch. >> >> What is the difference? :-) >> >>> Do you dislike both equally compared to a merge (aka railroad nexus >>> git history approach)? 
>> >> Using github's "merge" button is pretty convenient imo, even though >> the history is not as nice as with a rebase (or cherry-pick, I miss >> the difference for now ) >> >>> >>> >>> On 20 Oct 2014, at 16:47, Tristan Tarrant >> > wrote: >>> >>>> Hi guys, >>>> >>>> with the imminent release of 7.0.0.CR2 we are reaching the end of this >>>> release cycle. There have been a ton of improvements (maybe too many) >>>> and a lot of time has passed since the previous version (maybe to >>>> much). >>>> Following up on my previous e-mail about future plans, here's a >>>> recap of >>>> a plan which I believe will allow us to move at a much quicker pace: >>>> >>>> For the next minor releases I would like to suggest the following >>>> strategy: >>>> - use a 3 month timebox where we strive to maintain master in an >>>> "always releasable" state >>>> - complex feature work will need to happen onto dedicated feature >>>> branches, using the usual GitHub pull-request workflow >>>> - only when a feature is complete (code, tests, docs, reviewed, >>>> CI-checked) it will be merged back into master >>>> - if a feature is running late it will be postponed to the >>>> following minor release so as not to hinder other development >>>> >>>> I am also going to suggest dropping the cherry-picking approach and >>>> going with git merge. In order to achieve this we need CI to be >>>> always in top form with 0 failures in master. This will allow >>>> merging a PR directly from GitHub's interface. We obviously need to >>>> trust our tools and our existing code base. >>>> >>>> This is the plan for 7.1.0: >>>> >>>> 13 November 7.1.0.Alpha1 >>>> 18 December 7.1.0.Beta1 >>>> 15 January 7.1.0.CR1 >>>> 30 January 7.1.0.Final >>>> >>>> >>>> Tristan >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org ) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Mon Oct 20 12:45:54 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 20 Oct 2014 17:45:54 +0100 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: <54453B16.6050202@redhat.com> References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> <54453B16.6050202@redhat.com> Message-ID: On 20 October 2014 17:40, Tristan Tarrant wrote: > Sure, you still want to review it in your IDE, and maybe run local > tests, but ultimately merging via the GitHub UI. If you do one thing locally, and then "ultimately" press a button there you didn't test the same thing. Sanne > > Tristan > > On 20/10/14 18:37, Emmanuel Bernard wrote: >> rebase is a oneliner op per branch you want to reapply whereas cherry >> picking requires to manually select the commits you want. Underneath >> in git guts it probably does the same. 
>> >> I have to admit I barely had the occasion to want to click the GitHub >> UI button as except for simple documentation, reviewing code almost >> always require to fetch the branch and look at it in an IDE of sort >> for proper review. The documentation bit is actually even requiring >> local run since Markdown / Asciidoc and all tend to silently fail a >> syntax mistake. >> >> On 20 Oct 2014, at 18:28, Mircea Markus > > wrote: >> >>> >>> On Oct 20, 2014, at 17:21, Emmanuel Bernard >> > wrote: >>> >>>> There is a difference between cherry picking and rebasing when it >>>> comes to reapply a work on top of a branch. >>> >>> What is the difference? :-) >>> >>>> Do you dislike both equally compared to a merge (aka railroad nexus >>>> git history approach)? >>> >>> Using github's "merge" button is pretty convenient imo, even though >>> the history is not as nice as with a rebase (or cherry-pick, I miss >>> the difference for now ) >>> >>>> >>>> >>>> On 20 Oct 2014, at 16:47, Tristan Tarrant >>> > wrote: >>>> >>>>> Hi guys, >>>>> >>>>> with the imminent release of 7.0.0.CR2 we are reaching the end of this >>>>> release cycle. There have been a ton of improvements (maybe too many) >>>>> and a lot of time has passed since the previous version (maybe to >>>>> much). >>>>> Following up on my previous e-mail about future plans, here's a >>>>> recap of >>>>> a plan which I believe will allow us to move at a much quicker pace: >>>>> >>>>> For the next minor releases I would like to suggest the following >>>>> strategy: >>>>> - use a 3 month timebox where we strive to maintain master in an >>>>> "always releasable" state >>>>> - complex feature work will need to happen onto dedicated feature >>>>> branches, using the usual GitHub pull-request workflow >>>>> - only when a feature is complete (code, tests, docs, reviewed, >>>>> CI-checked) it will be merged back into master >>>>> - if a feature is running late it will be postponed to the >>>>> following minor release so as not to hinder other development >>>>> >>>>> I am also going to suggest dropping the cherry-picking approach and >>>>> going with git merge. In order to achieve this we need CI to be >>>>> always in top form with 0 failures in master. This will allow >>>>> merging a PR directly from GitHub's interface. We obviously need to >>>>> trust our tools and our existing code base. 
>>>>> >>>>> This is the plan for 7.1.0: >>>>> >>>>> 13 November 7.1.0.Alpha1 >>>>> 18 December 7.1.0.Beta1 >>>>> 15 January 7.1.0.CR1 >>>>> 30 January 7.1.0.Final >>>>> >>>>> >>>>> Tristan >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org ) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Mon Oct 20 12:49:24 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 20 Oct 2014 18:49:24 +0200 Subject: [infinispan-dev] Improving the performance of index writers In-Reply-To: References: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org> Message-ID: <3176738B-7837-46D1-AD33-C36EF9E755F3@hibernate.org> So assuming an idle index loop, the first commit would lead to the execution of the indexing work and flush. If two or more commits come during that flush time, then the queue would be > 1 and ?batching? would occur. Correct? On 20 Oct 2014, at 16:57, Sanne Grinovero wrote: > On 20 October 2014 14:55, Emmanuel Bernard wrote: >> >> On 20 Oct 2014, at 14:10, Sanne Grinovero wrote: >> >> On 20 October 2014 12:59, Emmanuel Bernard wrote: >> >> HSEARCH-1699 looks good. A few comments. >> >> Maybe from a user point of you we want to expose the number of ms the user >> is ok to delay a commit due to indexing. Which would mean that you can wait >> up to that number before calling it a day and emptying the queue. The big >> question I have which you elude too is whether this mechanism should have >> some kind of back pressure mechanism by also caping the queue size. >> >> >> Gustavo is implementing that for the ASYNC backend, but the SYNC >> backend will always block the user thread until the commit is done >> (and some commit is going to be done ASAP). >> About to write a mail to hibernate-dev to discuss the ASYNC backend >> property name and exact semantics. >> >> >> I understand that the sync mode will block until the commit is done. what I >> am saying is that for HSEARCH-1699 (SYNC) (and probably also for the ASYNC >> mode), you can ask the user ?how much more? is he willing to wait for the >> index to be committed compared to ?as fast as possible?. That becomes your >> window of aggregation. Does that make sense? >> >> >> >> >> BTW, in the following paragraph, either you lost me or you are talking non >> sense: >> >> Systems with an high degree of parallelism will benefit from this, and the >> performance should converge to the performance you would have without every >> doing a commit; however if the frequency of commits is apparoching to zero, >> it also means that the average latency of each operation will get >> significantly higher. 
Still, in such situations assuming we are for example >> stacking up a million changesets between each commit, that implies this >> solution would be approximately a million times faster than the existing >> design (A million would not be realistic of course as it implies a million >> of parallel requests). >> >> >> I think you can only converge to an average of 1/2 * (commit + configured >> delay time) latency wise. I am assuming latency is what people are >> interested in, not the average CPU / memory load of indexing. >> >> >> I'm sorry I'm confused. There is no configured delay time for the SYNC >> backend discussed on HSEARCH-1699, are you talking about the Async >> one? But my paragraph above is strictly referring tot the strategy >> meant to be applied for the Sync one. >> >> >> There is a delay. it is what you call the "target frequency of commits?. And >> my alternative that i proposed is not su much a frequency rather than how >> much more you delay a flush in the hope of getting more work in. > > No there is no delay, in case there is a constant flow of incoming > write operations, the write loop will degenerate in something like > (pseudo code and overly simplified): > > while (true) { > apply(getNextChangeset()) > commit(); > } > > So it's a busy loop with no waits: the "target frequency of commits" > will naturally match the maximum frequency of commits which the > storage can handle, as we've said that applying the changes is not a > cost, it's essentially the same as > > while (true) { > commit(); > } > > That code will loop faster if the commits are quick. The point being > that the number of changes which we can apply in period T, does not > depend on the time it taks to do commit operations on the underlying > storage. > > The real code will need to be a bit more complex, for example to > handle this case: > > while (true) { > changeset = getNextChangeset(); > if (changeset.isEmpty) { > waitWithoutBurningCPU(); > } > else { > apply(all pending changes) > commit(); > } > > >> >> In your model of a fixed frequency, they the average delay is 1/2 * >> 1/frequency + commit time. >> Or do you have something different in mind? > > I hope the above example clarifies. It's not a fixed frequency, it's > "as fast as it can", but with latency not better than what can be > performed by a single commit. What I'm attempting to explain when > comparing "frequency" is that this is the optimal speed for each > situation, especially compared to current solution, and regardless of > queueing up. > There is an inherent form of back pressure: it's limited by the cost > of the single commit, which will delay further changesets in the > queue.. but the queue depth doesn't get larger than 1 and we don't > risk running out of space as it blocks producers, blocking the > application. 
> > Sanne > > >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Mon Oct 20 12:51:58 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 20 Oct 2014 18:51:58 +0200 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> <54453B16.6050202@redhat.com> Message-ID: <54453DAE.8000707@redhat.com> My assumption is that the test is run by CI. Tristan On 20/10/14 18:45, Sanne Grinovero wrote: > On 20 October 2014 17:40, Tristan Tarrant wrote: >> Sure, you still want to review it in your IDE, and maybe run local >> tests, but ultimately merging via the GitHub UI. > If you do one thing locally, and then "ultimately" press a button > there you didn't test the same thing. > > Sanne > >> Tristan >> >> On 20/10/14 18:37, Emmanuel Bernard wrote: >>> rebase is a oneliner op per branch you want to reapply whereas cherry >>> picking requires to manually select the commits you want. Underneath >>> in git guts it probably does the same. >>> >>> I have to admit I barely had the occasion to want to click the GitHub >>> UI button as except for simple documentation, reviewing code almost >>> always require to fetch the branch and look at it in an IDE of sort >>> for proper review. The documentation bit is actually even requiring >>> local run since Markdown / Asciidoc and all tend to silently fail a >>> syntax mistake. >>> >>> On 20 Oct 2014, at 18:28, Mircea Markus >> > wrote: >>> >>>> On Oct 20, 2014, at 17:21, Emmanuel Bernard >>> > wrote: >>>> >>>>> There is a difference between cherry picking and rebasing when it >>>>> comes to reapply a work on top of a branch. >>>> What is the difference? :-) >>>> >>>>> Do you dislike both equally compared to a merge (aka railroad nexus >>>>> git history approach)? >>>> Using github's "merge" button is pretty convenient imo, even though >>>> the history is not as nice as with a rebase (or cherry-pick, I miss >>>> the difference for now ) >>>> >>>>> >>>>> On 20 Oct 2014, at 16:47, Tristan Tarrant >>>> > wrote: >>>>> >>>>>> Hi guys, >>>>>> >>>>>> with the imminent release of 7.0.0.CR2 we are reaching the end of this >>>>>> release cycle. There have been a ton of improvements (maybe too many) >>>>>> and a lot of time has passed since the previous version (maybe to >>>>>> much). >>>>>> Following up on my previous e-mail about future plans, here's a >>>>>> recap of >>>>>> a plan which I believe will allow us to move at a much quicker pace: >>>>>> >>>>>> For the next minor releases I would like to suggest the following >>>>>> strategy: >>>>>> - use a 3 month timebox where we strive to maintain master in an >>>>>> "always releasable" state >>>>>> - complex feature work will need to happen onto dedicated feature >>>>>> branches, using the usual GitHub pull-request workflow >>>>>> - only when a feature is complete (code, tests, docs, reviewed, >>>>>> CI-checked) it will be merged back into master >>>>>> - if a feature is running late it will be postponed to the >>>>>> following minor release so as not to hinder other development >>>>>> >>>>>> I am also going to suggest dropping the cherry-picking approach and >>>>>> going with git merge. 
>>>>>> In order to achieve this we need CI to be
>>>>>> always in top form with 0 failures in master. This will allow
>>>>>> merging a PR directly from GitHub's interface. We obviously need to
>>>>>> trust our tools and our existing code base.
>>>>>>
>>>>>> This is the plan for 7.1.0:
>>>>>>
>>>>>> 13 November 7.1.0.Alpha1
>>>>>> 18 December 7.1.0.Beta1
>>>>>> 15 January 7.1.0.CR1
>>>>>> 30 January 7.1.0.Final
>>>>>>
>>>>>>
>>>>>> Tristan
>>>>>>
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>
>>>> Cheers,
>>>> --
>>>> Mircea Markus
>>>> Infinispan lead (www.infinispan.org)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From emmanuel at hibernate.org Mon Oct 20 12:55:47 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Mon, 20 Oct 2014 18:55:47 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <54453B16.6050202@redhat.com>
References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> <54453B16.6050202@redhat.com>
Message-ID:

So you review locally and potentially run locally, and then you switch from your terminal console or IDE to wherever the button is in your 350 opened tabs, because it's faster than git push upstream master. I am having a hard time seeing the convenience unless you do browser-only reviews.

On 20 Oct 2014, at 18:40, Tristan Tarrant wrote:

> Sure, you still want to review it in your IDE, and maybe run local
> tests, but ultimately merging via the GitHub UI.
>
> Tristan
>
> On 20/10/14 18:37, Emmanuel Bernard wrote:
>> rebase is a oneliner op per branch you want to reapply whereas cherry
>> picking requires to manually select the commits you want. Underneath
>> in git guts it probably does the same.
>>> >>> Using github's "merge" button is pretty convenient imo, even though >>> the history is not as nice as with a rebase (or cherry-pick, I miss >>> the difference for now ) >>> >>>> >>>> >>>> On 20 Oct 2014, at 16:47, Tristan Tarrant >>> > wrote: >>>> >>>>> Hi guys, >>>>> >>>>> with the imminent release of 7.0.0.CR2 we are reaching the end of this >>>>> release cycle. There have been a ton of improvements (maybe too many) >>>>> and a lot of time has passed since the previous version (maybe to >>>>> much). >>>>> Following up on my previous e-mail about future plans, here's a >>>>> recap of >>>>> a plan which I believe will allow us to move at a much quicker pace: >>>>> >>>>> For the next minor releases I would like to suggest the following >>>>> strategy: >>>>> - use a 3 month timebox where we strive to maintain master in an >>>>> "always releasable" state >>>>> - complex feature work will need to happen onto dedicated feature >>>>> branches, using the usual GitHub pull-request workflow >>>>> - only when a feature is complete (code, tests, docs, reviewed, >>>>> CI-checked) it will be merged back into master >>>>> - if a feature is running late it will be postponed to the >>>>> following minor release so as not to hinder other development >>>>> >>>>> I am also going to suggest dropping the cherry-picking approach and >>>>> going with git merge. In order to achieve this we need CI to be >>>>> always in top form with 0 failures in master. This will allow >>>>> merging a PR directly from GitHub's interface. We obviously need to >>>>> trust our tools and our existing code base. >>>>> >>>>> This is the plan for 7.1.0: >>>>> >>>>> 13 November 7.1.0.Alpha1 >>>>> 18 December 7.1.0.Beta1 >>>>> 15 January 7.1.0.CR1 >>>>> 30 January 7.1.0.Final >>>>> >>>>> >>>>> Tristan >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org ) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Mon Oct 20 13:09:57 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 20 Oct 2014 18:09:57 +0100 Subject: [infinispan-dev] Improving the performance of index writers In-Reply-To: <3176738B-7837-46D1-AD33-C36EF9E755F3@hibernate.org> References: <403215A0-6143-4EB4-B77E-553DB04FD587@hibernate.org> <3176738B-7837-46D1-AD33-C36EF9E755F3@hibernate.org> Message-ID: On 20 October 2014 17:49, Emmanuel Bernard wrote: > So assuming an idle index loop, the first commit would lead to the execution of the indexing work and flush. If two or more commits come during that flush time, then the queue would be > 1 and ?batching? would occur. Correct? Yes that's the idea. 
As soon as the IndexWriter thread is done with that first commit, it
gets back to check whether more work was queued up in the meantime, and
takes *all* of it for processing.

The interesting part is why taking *all* of it should not be a scary
concept here:
Memory-wise it's not a problem, as all those changesets already are on
the stack, and they still prevent further work from being created (as
producers are blocked), so there is no additional memory cost (compared
to the previous approach).
Time-wise, applying one or "all" makes no (significant) difference, as
we've seen that applying each changeset is very efficient, while the
commit is what introduces a more significant delay (orders of magnitude
difference).
Only if we had millions of changesets in the batch would there be some
noticeable change in latency, as the first threads to have enqueued
something would have to wait slightly longer than normal; but even
then, the average latency of all producers would be lower than with the
current approach, as all other producers are not waiting in a long
line.

Sanne

>
>
>
> On 20 Oct 2014, at 16:57, Sanne Grinovero wrote:
>
>> On 20 October 2014 14:55, Emmanuel Bernard wrote:
>>>
>>> On 20 Oct 2014, at 14:10, Sanne Grinovero wrote:
>>>
>>> On 20 October 2014 12:59, Emmanuel Bernard wrote:
>>>
>>> HSEARCH-1699 looks good. A few comments.
>>>
>>> Maybe from a user point of you we want to expose the number of ms the user
>>> is ok to delay a commit due to indexing. Which would mean that you can wait
>>> up to that number before calling it a day and emptying the queue. The big
>>> question I have which you elude too is whether this mechanism should have
>>> some kind of back pressure mechanism by also caping the queue size.
>>>
>>>
>>> Gustavo is implementing that for the ASYNC backend, but the SYNC
>>> backend will always block the user thread until the commit is done
>>> (and some commit is going to be done ASAP).
>>> About to write a mail to hibernate-dev to discuss the ASYNC backend
>>> property name and exact semantics.
>>>
>>>
>>> I understand that the sync mode will block until the commit is done. what I
>>> am saying is that for HSEARCH-1699 (SYNC) (and probably also for the ASYNC
>>> mode), you can ask the user ?how much more? is he willing to wait for the
>>> index to be committed compared to ?as fast as possible?. That becomes your
>>> window of aggregation. Does that make sense?
>>>
>>>
>>>
>>>
>>> BTW, in the following paragraph, either you lost me or you are talking non
>>> sense:
>>>
>>> Systems with an high degree of parallelism will benefit from this, and the
>>> performance should converge to the performance you would have without every
>>> doing a commit; however if the frequency of commits is apparoching to zero,
>>> it also means that the average latency of each operation will get
>>> significantly higher. Still, in such situations assuming we are for example
>>> stacking up a million changesets between each commit, that implies this
>>> solution would be approximately a million times faster than the existing
>>> design (A million would not be realistic of course as it implies a million
>>> of parallel requests).
>>>
>>>
>>> I think you can only converge to an average of 1/2 * (commit + configured
>>> delay time) latency wise. I am assuming latency is what people are
>>> interested in, not the average CPU / memory load of indexing.
>>>
>>>
>>> I'm sorry I'm confused. There is no configured delay time for the SYNC
>>> backend discussed on HSEARCH-1699, are you talking about the Async
>>> one?
But my paragraph above is strictly referring tot the strategy >>> meant to be applied for the Sync one. >>> >>> >>> There is a delay. it is what you call the "target frequency of commits?. And >>> my alternative that i proposed is not su much a frequency rather than how >>> much more you delay a flush in the hope of getting more work in. >> >> No there is no delay, in case there is a constant flow of incoming >> write operations, the write loop will degenerate in something like >> (pseudo code and overly simplified): >> >> while (true) { >> apply(getNextChangeset()) >> commit(); >> } >> >> So it's a busy loop with no waits: the "target frequency of commits" >> will naturally match the maximum frequency of commits which the >> storage can handle, as we've said that applying the changes is not a >> cost, it's essentially the same as >> >> while (true) { >> commit(); >> } >> >> That code will loop faster if the commits are quick. The point being >> that the number of changes which we can apply in period T, does not >> depend on the time it taks to do commit operations on the underlying >> storage. >> >> The real code will need to be a bit more complex, for example to >> handle this case: >> >> while (true) { >> changeset = getNextChangeset(); >> if (changeset.isEmpty) { >> waitWithoutBurningCPU(); >> } >> else { >> apply(all pending changes) >> commit(); >> } >> >> >>> >>> In your model of a fixed frequency, they the average delay is 1/2 * >>> 1/frequency + commit time. >>> Or do you have something different in mind? >> >> I hope the above example clarifies. It's not a fixed frequency, it's >> "as fast as it can", but with latency not better than what can be >> performed by a single commit. What I'm attempting to explain when >> comparing "frequency" is that this is the optimal speed for each >> situation, especially compared to current solution, and regardless of >> queueing up. >> There is an inherent form of back pressure: it's limited by the cost >> of the single commit, which will delay further changesets in the >> queue.. but the queue depth doesn't get larger than 1 and we don't >> risk running out of space as it blocks producers, blocking the >> application. >> >> Sanne >> >> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From an1310 at hotmail.com Mon Oct 20 14:35:35 2014 From: an1310 at hotmail.com (Erik Salter) Date: Mon, 20 Oct 2014 14:35:35 -0400 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: <54452083.4020502@redhat.com> References: <54452083.4020502@redhat.com> Message-ID: With this more agile release cycle, can users expect minor releases to be compatible with the previous release? Or will we still need to use the RollingUpgrade path? Regards, Erik On 10/20/14, 10:47 AM, "Tristan Tarrant" wrote: >Hi guys, > >with the imminent release of 7.0.0.CR2 we are reaching the end of this >release cycle. There have been a ton of improvements (maybe too many) >and a lot of time has passed since the previous version (maybe to much). 
>Following up on my previous e-mail about future plans, here's a recap of
>a plan which I believe will allow us to move at a much quicker pace:
>
>For the next minor releases I would like to suggest the following
>strategy:
>- use a 3 month timebox where we strive to maintain master in an "always
>releasable" state
>- complex feature work will need to happen onto dedicated feature
>branches, using the usual GitHub pull-request workflow
>- only when a feature is complete (code, tests, docs, reviewed,
>CI-checked) it will be merged back into master
>- if a feature is running late it will be postponed to the following
>minor release so as not to hinder other development
>
>I am also going to suggest dropping the cherry-picking approach and going
>with git merge. In order to achieve this we need CI to be always in top
>form with 0 failures in master. This will allow merging a PR directly
>from GitHub's interface. We obviously need to trust our tools and our
>existing code base.
>
>This is the plan for 7.1.0:
>
>13 November 7.1.0.Alpha1
>18 December 7.1.0.Beta1
>15 January 7.1.0.CR1
>30 January 7.1.0.Final
>
>
>Tristan
>
>_______________________________________________
>infinispan-dev mailing list
>infinispan-dev at lists.jboss.org
>https://lists.jboss.org/mailman/listinfo/infinispan-dev

From slaskawi at redhat.com Tue Oct 21 03:41:05 2014
From: slaskawi at redhat.com (=?UTF-8?B?U2ViYXN0aWFuIMWBYXNrYXdpZWM=?=)
Date: Tue, 21 Oct 2014 09:41:05 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <54452083.4020502@redhat.com>
References: <54452083.4020502@redhat.com>
Message-ID: <54460E11.2000705@redhat.com>

+1000, I think that's a big step in a good direction.

As Tristan said, having 0 test failures is essential here. I would say
even more: Pull Requests without a green tick from CI shouldn't be
considered "ready for review".

The 0-test-failures rule has one additional side effect: if for some
reason a test failure gets into the repo (e.g. by merging 2 similar PRs
which change test data, especially in text files), all other Pull
Requests will start to fail from that point. This will oblige us to fix
the failure before merging further Pull Requests, and it makes tracking
down failure-introducing commits really easy (or at least much easier
than it is now).

+1 for merging using the GitHub UI - it simply saves us time. Searching
through the railroad history is a bit harder, but on the other hand
"git log --merges --graph" shows you a very nice history of merged
features (not individual commits). It might be really useful in some
cases.

Thanks!
Sebastian

On 10/20/2014 04:47 PM, Tristan Tarrant wrote:
> I am also going to suggest dropping the cherry-picking approach and going with git merge. In order to achieve this we need CI to be always in top form with 0 failures in master. This will allow merging a PR directly from GitHub's interface. We obviously need to trust our tools and our existing code base.

From sanne at infinispan.org Tue Oct 21 04:27:11 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 21 Oct 2014 09:27:11 +0100
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <54460E11.2000705@redhat.com>
References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com>
Message-ID:

On 21 October 2014 08:41, Sebastian Łaskawiec wrote:
> +1000, I think that's a big step in a good direction.
>
> As Tristan said, having 0 test failures is essential here. I would say
> even more: Pull Requests without a green tick from CI shouldn't be
> considered "ready for review".
>
> Having the 0 test failures rule has one additional side effect - if for
> some reason a test failure gets into the repo (e.g. merging 2 similar
> PRs which change test data - especially in text files) - all other Pull
> Requests will start to fail from that point. This will oblige us to fix
> the failure before merging further Pull Requests. This makes tracking
> down failure commits really easy (or at least much easier than it is
> now).

I totally agree here, but it never worked: people regularly ignore
failing tests, for various reasons.
We've had similar good intentions expressed many times, but I simply
have no reason to believe that this time it's going to work out.

> +1 for merging using the Github UI - it simply saves us time. Searching
> through railroad history is a bit harder - but on the other hand "git
> log --merges --graph" shows you a very nice history of merged features
> (not individual commits). It might be really useful in some cases.

I'm sceptical but we can try. I should admit that part of my scepticism
comes from having had bad experiences with merge, but that's probably
caused by the fact that I generally don't use them at all in my
workflows.
That said, I did face major horror stories caused by merge - especially
from occasional contributors who might get confused by it - and I
wouldn't try it on projects I care for.

Good luck.
Sanne

>
> Thanks!
> Sebastian
>
> On 10/20/2014 04:47 PM, Tristan Tarrant wrote:
>> I am also going to suggest dropping the cherry-picking approach and
>> going with git merge. In order to achieve this we need CI to be always
>> in top form with 0 failures in master. This will allow merging a PR
>> directly from GitHub's interface. We obviously need to trust our tools
>> and our existing code base.
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From slaskawi at redhat.com  Tue Oct 21 04:59:19 2014
From: slaskawi at redhat.com (=?UTF-8?B?U2ViYXN0aWFuIMWBYXNrYXdpZWM=?=)
Date: Tue, 21 Oct 2014 10:59:19 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: 
References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com>
Message-ID: <54462067.3050501@redhat.com>

I think we can work on this one...

First of all - contributors need to know about this rule, perhaps
updating [1] might be a good idea. An official announcement on the
mailing list might also be helpful (this email thread is already pretty
long, so it might be missed by many folks).

Secondly - we need to stop integrating Pull Requests with new failures.
It's a bit harder when we have some existing failures, because there is
always an excuse (this failure is not related, it's just an unstable
test etc). But once we have a clean build - it's a "binary" decision.
I think we might also add some descriptive comment to a Pull Request
when the build is unstable - something like "This Pull Request won't be
integrated, because it's unstable. Fix it first." (a sketch of how such
a gate could be automated follows this message).

[1] http://infinispan.org/docs/7.0.x/contributing/contributing.html

On 10/21/2014 10:27 AM, Sanne Grinovero wrote:
> I totally agree here, but it never worked: people regularly ignore
> failing tests, for various reasons.
> We've had similar good intentions expressed many times, but I simply
> have no reason to believe that this time it's going to work out.
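The comment gate Sebastian describes above maps naturally onto GitHub's
commit status API. The following is only an illustrative sketch, not
anything the project has in place: it assumes a TeamCity webhook hands
over the head commit of the pull request, and that a token with
repo:status scope is available in the GITHUB_TOKEN environment variable.
Posting a "failure" status makes the red mark show up on the pull
request page, so nobody has to police unstable PRs by hand:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UnstableBuildGate {

   // 'sha' is the head commit of the pull request, as reported by CI.
   static void markUnstable(String sha) throws Exception {
      URL url = new URL("https://api.github.com/repos/infinispan/infinispan/statuses/" + sha);
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestMethod("POST");
      // Token with repo:status scope; read from the environment, never hard-coded.
      conn.setRequestProperty("Authorization", "token " + System.getenv("GITHUB_TOKEN"));
      conn.setRequestProperty("Content-Type", "application/json");
      conn.setDoOutput(true);
      String body = "{\"state\":\"failure\","
            + "\"context\":\"teamcity/testsuite\","
            + "\"description\":\"This Pull Request won't be integrated, the build is unstable. Fix it first.\"}";
      try (OutputStream out = conn.getOutputStream()) {
         out.write(body.getBytes(StandardCharsets.UTF_8));
      }
      if (conn.getResponseCode() != 201) {   // the statuses endpoint answers 201 Created
         throw new IllegalStateException("GitHub replied " + conn.getResponseCode());
      }
   }
}

The POST /repos/:owner/:repo/statuses/:sha call is the standard GitHub
API; everything around it - who invokes this and when - is exactly the
part the thread still has to agree on.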
From rory.odonnell at oracle.com Tue Oct 21 06:29:03 2014 From: rory.odonnell at oracle.com (Rory O'Donnell) Date: Tue, 21 Oct 2014 11:29:03 +0100 Subject: [infinispan-dev] Early Access builds for JDK 9 b35 and JDK 8u40 b10 are available on java.net Message-ID: <5446356F.4080709@oracle.com> Hi Galder, Early Access build for JDK 9 b35 is available on java.net, summary of changes are listed here Early Access build for JDK 8u40 b10 is available on java.net, summary of changes are listed here. Rgds,Rory -- Rgds,Rory O'Donnell Quality Engineering Manager Oracle EMEA , Dublin, Ireland -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141021/a335a393/attachment.html From dan.berindei at gmail.com Tue Oct 21 06:47:06 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 21 Oct 2014 13:47:06 +0300 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: <54462067.3050501@redhat.com> References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com> <54462067.3050501@redhat.com> Message-ID: On Tue, Oct 21, 2014 at 11:59 AM, Sebastian ?askawiec wrote: > I think we can work on this one... > > First of all - contributors need to know about this rule, perhaps > updating [1] might be a good idea. Official announcement on mailing list > might be also helpful (this email thread is already pretty long, so it > might be missed by many folks). > Like Sanne said, we did decide to go for a green test suite a great many times on this list, so I don't think lack of awareness is an issue. In fact, I was volunteered to monitor the TeamCity test results and create a blocker issue for each failing test some time ago, but finding the proper owner for bugs proved to be quite time consuming so I haven't been sticking to it. This thread did motivate me to create a few new blocker issues, however :) > > Secondly - we need to stop integrating Pull Requests with new failures. > It's a bit harder when we have some existing failures, because there is > always an excuse (this failure is not related, it's just an unstable > test etc). But once we have clean build - it's a "binary" decision. > I think we might also add some descriptive comment to Pull Request when > the build is unstable - something like "This Pull Request won't be > integrated, because it's unstable. Fix it first.". > Of course, the question is how we are going to achieve that magical clean build status... At one point we moved the random failing tests to the unstable group/category to remove them from the main build, but I've noticed that we've never really brought back an unstable test, at least in the core, so I've stopped doing it. I figured having the test failures in every email from TeamCity would motivate people to look into those failures, but I'm not sure anyone reads the build failure emails from TeamCity :) And it's not enough to have one master build with 0 failures. We had that before, and it didn't really help. We have to have at least a week with 0 failures on master (with JDK7, with JDK8, and with TRACE enabled, on the slow EC2 agent, on the fast OpenStack agents etc.) before we can say the build is really clean. I'm not saying we can't do it, but I share Sanne's pessimism: this time is not that different from the last time we committed to a clean build. 
> > [1] http://infinispan.org/docs/7.0.x/contributing/contributing.html > > On 10/21/2014 10:27 AM, Sanne Grinovero wrote: > > I totally agree here, but it never worked: people regularly ignore > > failing tests, for various reasons. > > We've had similar good intentions expressed many times, but I simply > > have no reason to believe that this time it's going to work out. > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141021/0d3c859a/attachment-0001.html From rvansa at redhat.com Tue Oct 21 06:53:27 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 21 Oct 2014 12:53:27 +0200 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: <54462067.3050501@redhat.com> References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com> <54462067.3050501@redhat.com> Message-ID: <54463B27.2080505@redhat.com> We might also want to define reproducer test commits, and how should these be integrated. Is the recommended workflow to have one commit with issue reproducer test (so that we can see that it was broken prior to the fix my checking out just this commit), and following commit fixing the issue? Should those two commits be squashed when pulling the PR (so that any master commit is always green), or cherry-picked one-by-one (having only the actual HEAD on master green any time)? I miss the guideline. Thanks Radim On 10/21/2014 10:59 AM, Sebastian ?askawiec wrote: > I think we can work on this one... > > First of all - contributors need to know about this rule, perhaps > updating [1] might be a good idea. Official announcement on mailing list > might be also helpful (this email thread is already pretty long, so it > might be missed by many folks). > > Secondly - we need to stop integrating Pull Requests with new failures. > It's a bit harder when we have some existing failures, because there is > always an excuse (this failure is not related, it's just an unstable > test etc). But once we have clean build - it's a "binary" decision. > I think we might also add some descriptive comment to Pull Request when > the build is unstable - something like "This Pull Request won't be > integrated, because it's unstable. Fix it first.". > > [1] http://infinispan.org/docs/7.0.x/contributing/contributing.html > > On 10/21/2014 10:27 AM, Sanne Grinovero wrote: >> I totally agree here, but it never worked: people regularly ignore >> failing tests, for various reasons. >> We've had similar good intentions expressed many times, but I simply >> have no reason to believe that this time it's going to work out. 
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Radim Vansa
JBoss DataGrid QA

From slaskawi at redhat.com  Tue Oct 21 07:35:31 2014
From: slaskawi at redhat.com (=?UTF-8?B?U2ViYXN0aWFuIMWBYXNrYXdpZWM=?=)
Date: Tue, 21 Oct 2014 13:35:31 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: 
References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com>
	<54462067.3050501@redhat.com>
Message-ID: <54464503.7020903@redhat.com>

On 10/21/2014 12:47 PM, Dan Berindei wrote:
> In fact, I was volunteered to monitor the TeamCity test results and
> create a blocker issue for each failing test some time ago, but
> finding the proper owner for bugs proved to be quite time consuming so
> I haven't been sticking to it. This thread did motivate me to create a
> few new blocker issues, however :)

I believe we need to change our strategy at this point. We don't want to
create new issues - we want to motivate everybody to fix them (and fix
them fast). As I said - when a failure gets into our repo - all
successive Pull Requests will start to fail. Nobody will be able to
integrate his changes and everybody (not everybody - some guys who are
in a hurry) will probably want to unblock themselves... The easiest way
to do that is to fix the build...

This is the main idea... To make a failing test a serious problem and
not just another "easy to ignore" issue...

> Of course, the question is how we are going to achieve that magical
> clean build status...

I've got some idea - it's pretty controversial, but maybe you will like
it :)

* Remove every failing test from our code base - just delete it (no
ignoring, no adding to a separate testsuite - just delete).
* Create a separate branch and place all those tests there - simply
revert the commit which removed them from master.
* Organize a failed-test-bounty with our Community - ask them to fix as
many as possible during a fixed amount of time (a month or two? maybe
shorter?).
* Every contributor in the failed-test-bounty will be listed in the
"Thanks" section of the release notes
* After the bounty is over, we'll just delete the tests which were not
fixed...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141021/8c816e8c/attachment.html

From sanne at infinispan.org  Tue Oct 21 07:46:42 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 21 Oct 2014 12:46:42 +0100
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <54464503.7020903@redhat.com>
References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com>
	<54462067.3050501@redhat.com> <54464503.7020903@redhat.com>
Message-ID: 

Hi Sebastian,
I'm not against the idea at all, I would really love it and I agree
that this is the biggest pain and waste of time in trying to make
progress on Infinispan. I'm just trying to be realistic and warn you
that these great intentions didn't work in the past.
Now my hope is that maybe we now have more people agreeing on how
important this is, so maybe we're in a better position, and I'm always
willing to try this again.

I'm sceptical though about embarking on a PR process which assumes that
we're going to be able to keep the testsuite green just because we
decide so; let's first work on a green testsuite, and then prove that
we can keep it that way for a reasonable time... then we can talk about
building something on reliable foundations.
[BTW I can't answer inline as you're sending HTML formatted email and gmail isn't properly separating your quote from Dan's previous words] Why would you delete the tests and not simply ignore them? You can revert the change as well. Both strategies would have the same effect, but ignoring them you don't need to take notes of what disappeared; best part, is that code using APIs get refactored together with other changes, and you minimize conflicts as you minimize the number of lines being changed. Sanne On 21 October 2014 12:35, Sebastian ?askawiec wrote: > On 10/21/2014 12:47 PM, Dan Berindei wrote: > > In fact, I was volunteered to monitor the TeamCity test results and create a > blocker issue for each failing test some time ago, but finding the proper > owner for bugs proved to be quite time consuming so I haven't been sticking > to it. This thread did motivate me to create a few new blocker issues, > however :) > > I believe we need to change our strategy in this point. We don't want to > create new issues - we want to motivate everybody to fix it (and fix it > fast). As I said - when the failure gets into our repo - all successive Pull > Requests will start to fail. Nobody will be able to integrate his changes > and everybody (not everybody - some guys which are in hurry) will probably > want to unblock themselves... The easiest way to do that is to fix the > build... > > This is the main idea... To make failing test a serious problem and not just > another "easy to ignore" issue... > > Of course, the question is how we are going to achieve that magical clean > build status... > > I've got some idea - it's pretty controversial, but maybe you will like it > :) > > Remove every failing test from our code base - just delete it (no ignoring, > no adding to separate testsuite - just delete). > Create separate branch and place all those tests there - simply revert > commit which removed them from master. > Organize failed-test-bounty with our Community - ask them to fix as many as > possible during fixed amount of time (a month or two? maybe shorter?). > Every contributor in failed-test-bounty will be listed in "Thanks" section > of the release notes > After the bounty is over, we'll just delete tests which were not fixed... > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From gustavonalle at gmail.com Tue Oct 21 08:05:52 2014 From: gustavonalle at gmail.com (Gustavo Fernandes) Date: Tue, 21 Oct 2014 13:05:52 +0100 Subject: [infinispan-dev] Infinispan 7.0.0.CR2 released! Message-ID: Dear all, We are proud to announce the second release candidate for Infinispan 7.0 Release details on http://blog.infinispan.org/2014/10/infinispan-7.html Thanks everyone for their contributions! Cheers, Gustavo From emmanuel at hibernate.org Tue Oct 21 09:13:01 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Tue, 21 Oct 2014 15:13:01 +0200 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: <54464503.7020903@redhat.com> References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com> <54462067.3050501@redhat.com> <54464503.7020903@redhat.com> Message-ID: <20141021131301.GK3066@hibernate.org> No shower and no beer until the next fully green result (including tests left over as unstable). 
That should be motivating enough ;) On Tue 2014-10-21 13:35, Sebastian ?askawiec wrote: > On 10/21/2014 12:47 PM, Dan Berindei wrote: > >In fact, I was volunteered to monitor the TeamCity test results and create > >a blocker issue for each failing test some time ago, but finding the > >proper owner for bugs proved to be quite time consuming so I haven't been > >sticking to it. This thread did motivate me to create a few new blocker > >issues, however :) > I believe we need to change our strategy in this point. We don't want to > create new issues - we want to motivate everybody to fix it (and fix it > fast). As I said - when the failure gets into our repo - all successive Pull > Requests will start to fail. Nobody will be able to integrate his changes > and everybody (not everybody - some guys which are in hurry) will probably > want to unblock themselves... The easiest way to do that is to fix the > build... > > This is the main idea... To make failing test a serious problem and not just > another "easy to ignore" issue... > >Of course, the question is how we are going to achieve that magical clean > >build status... > I've got some idea - it's pretty controversial, but maybe you will like it > :) > > * Remove every failing test from our code base - just delete it (no > ignoring, no adding to separate testsuite - just delete). > * Create separate branch and place all those tests there - simply > revert commit which removed them from master. > * Organize failed-test-bounty with our Community - ask them to fix as > many as possible during fixed amount of time (a month or two? maybe > shorter?). > * Every contributor in failed-test-bounty will be listed in "Thanks" > section of the release notes > * After the bounty is over, we'll just delete tests which were not > fixed... > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Tue Oct 21 09:27:57 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 21 Oct 2014 16:27:57 +0300 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com> <54462067.3050501@redhat.com> <54464503.7020903@redhat.com> Message-ID: On Tue, Oct 21, 2014 at 2:46 PM, Sanne Grinovero wrote: > Hi Sebastian, > I'm not against the idea at all, I would really love it and I agree > that this is the biggest pain and waste of time in trying to make > progress on Infinispan. I'm just trying to be realistic and warn you > that these great intentions didn't work in the past. > Now my hope is that maybe we now have more people agreeing on how > important this is, so maybe we're in a better position, and I'm always > willing to try this again. > > I'm sceptical though on embarking into a PR process which assumes that > we're going to be able to keep the testsuite green just because we > decide so; let's first work on a green testsuite, and then proof that > we can keep it that way for a resonsable time.. then we can talk about > building something on reliable foundations. > +1 > > [BTW I can't answer inline as you're sending HTML formatted email and > gmail isn't properly separating your quote from Dan's previous words] > > Why would you delete the tests and not simply ignore them? You can > revert the change as well. 
> Both strategies would have the same effect, but ignoring them you > don't need to take notes of what disappeared; best part, is that code > using APIs get refactored together with other changes, and you > minimize conflicts as you minimize the number of lines being changed. > Hey, if fixing compilation errors would have been enough, we wouldn't have so many tests in the unstable group that *always* fail. So Sebastian has a point, if we don't have a clear deadline for bringing the failing tests back into the main build, we might as well delete them. Except I don't want to delete tests, I want to make them work. And I didn't see anything in his proposal about tests that fail only once a week, or only on a certain agent, or only if TRACE logging is enabled... Cheers Dan > > Sanne > > On 21 October 2014 12:35, Sebastian ?askawiec wrote: > > On 10/21/2014 12:47 PM, Dan Berindei wrote: > > > > In fact, I was volunteered to monitor the TeamCity test results and > create a > > blocker issue for each failing test some time ago, but finding the proper > > owner for bugs proved to be quite time consuming so I haven't been > sticking > > to it. This thread did motivate me to create a few new blocker issues, > > however :) > > > > I believe we need to change our strategy in this point. We don't want to > > create new issues - we want to motivate everybody to fix it (and fix it > > fast). As I said - when the failure gets into our repo - all successive > Pull > > Requests will start to fail. Nobody will be able to integrate his changes > > and everybody (not everybody - some guys which are in hurry) will > probably > > want to unblock themselves... The easiest way to do that is to fix the > > build... > > > > This is the main idea... To make failing test a serious problem and not > just > > another "easy to ignore" issue... > > > > Of course, the question is how we are going to achieve that magical clean > > build status... > > > > I've got some idea - it's pretty controversial, but maybe you will like > it > > :) > > > > Remove every failing test from our code base - just delete it (no > ignoring, > > no adding to separate testsuite - just delete). > > Create separate branch and place all those tests there - simply revert > > commit which removed them from master. > > Organize failed-test-bounty with our Community - ask them to fix as many > as > > possible during fixed amount of time (a month or two? maybe shorter?). > > Every contributor in failed-test-bounty will be listed in "Thanks" > section > > of the release notes > > After the bounty is over, we'll just delete tests which were not fixed... > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141021/59459b95/attachment-0001.html

From gustavonalle at gmail.com  Tue Oct 21 10:08:40 2014
From: gustavonalle at gmail.com (Gustavo Fernandes)
Date: Tue, 21 Oct 2014 15:08:40 +0100
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <54464503.7020903@redhat.com>
References: <54452083.4020502@redhat.com> <54460E11.2000705@redhat.com>
	<54462067.3050501@redhat.com> <54464503.7020903@redhat.com>
Message-ID: 

>
> * Remove every failing test from our code base - just delete it (no
> ignoring, no adding to a separate testsuite - just delete).
> * Create a separate branch and place all those tests there - simply
> revert the commit which removed them from master.
> * Organize a failed-test-bounty with our Community - ask them to fix as
> many as possible during a fixed amount of time (a month or two? maybe
> shorter?).
> * Every contributor in the failed-test-bounty will be listed in the
> "Thanks" section of the release notes
> * After the bounty is over, we'll just delete the tests which were not
> fixed...

-1 to delete tests, unless it's clear they are rubbish.

A failing test, especially one that fails some times, on some specific
machines, when it rains, is potentially exposing a race condition in the
code it is testing: I can't see any other way to handle it than
understanding why it fails and fixing either the test or the feature.

Gustavo

>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From sanne at infinispan.org  Tue Oct 21 17:46:48 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 21 Oct 2014 22:46:48 +0100
Subject: [infinispan-dev] On ParserRegistry and classloaders
In-Reply-To: <54413FB3.3020409@redhat.com>
References: <9A22CC58-5B20-4D6B-BA6B-B4A23493979F@redhat.com>
	<53831239.30601@redhat.com> <114950B5-AA9F-485C-8E3C-AC36299974AB@redhat.com>
	<53D945B4.7070508@redhat.com> <54413FB3.3020409@redhat.com>
Message-ID: 

On 17 October 2014 17:11, Ion Savin wrote:
> Hi Sanne,
>
>> Caused by: java.lang.ClassNotFoundException:
>> org.infinispan.remoting.transport.jgroups.JGroupsTransport from
>> [Module \"deployment.ModuleMemberRegistrationIT.war:main\" from
>> Service Module Loader]"}}
>
>
> Can you please share also the full stack for this exception?
> Thanks!

Hi Ion,
I'm currently unable to find time to reassemble the failing experiment
as I've moved on to alternatives.
I will try to give you directions so that we can have a proper test
failing for this in the Infinispan project.

Sanne

From vjuranek at redhat.com  Wed Oct 22 03:43:15 2014
From: vjuranek at redhat.com (Vojtech Juranek)
Date: Wed, 22 Oct 2014 09:43:15 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: 
References: <54452083.4020502@redhat.com> <54464503.7020903@redhat.com>
Message-ID: <4999398.dViRhNXb2H@localhost>

On Tuesday 21 October 2014 12:46:42 Sanne Grinovero wrote:
> I'm sceptical though about embarking on a PR process which assumes that
> we're going to be able to keep the testsuite green just because we
> decide so; let's first work on a green testsuite, and then prove that
> we can keep it that way for a reasonable time...

+1, a green testsuite is a mandatory condition for any other action.
Let's start with smaller steps - I'd propose to make the Master JDK 7
build green first.
Currently [1] the following tests fail:
* ExampleConfigsIT.testXsiteConfig - should be fixed by PR #2973
* HotRodRemoteCacheIT.testPutAsync - I investigated it a few days ago;
it's IMHO not a random failure but a regular bug - ISPN-4813
* HotRodRemoteCacheIT.testCustom*, testEven* - I'll volunteer to
investigate this issue
* StateTransferSuppressIT.testRebalanceWith* - any volunteer for
investigating this one?

Once the root causes are investigated, we can fix the issues and try to
keep this build green.

> then we can talk about
> building something on reliable foundations.

actually, if there's a failing test and it's clear which commit has
caused it, we can stop merging PRs from the developer who has introduced
the regression until he fixes it. This should be a clear message that
the first thing he should work on is to fix the test. IMHO we can start
with it now and we don't need a green testsuite for it

[1] http://ci.infinispan.org/viewLog.html?buildId=13323&tab=buildResultsDiv&buildTypeId=bt8&logTab=
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
Url : http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141022/2caee779/attachment.bin

From dan.berindei at gmail.com  Wed Oct 22 04:39:49 2014
From: dan.berindei at gmail.com (Dan Berindei)
Date: Wed, 22 Oct 2014 11:39:49 +0300
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To: <4999398.dViRhNXb2H@localhost>
References: <54452083.4020502@redhat.com> <54464503.7020903@redhat.com>
	<4999398.dViRhNXb2H@localhost>
Message-ID: 

On Wed, Oct 22, 2014 at 10:43 AM, Vojtech Juranek wrote:

> On Tuesday 21 October 2014 12:46:42 Sanne Grinovero wrote:
> > I'm sceptical though about embarking on a PR process which assumes that
> > we're going to be able to keep the testsuite green just because we
> > decide so; let's first work on a green testsuite, and then prove that
> > we can keep it that way for a reasonable time...
>
> +1, a green testsuite is a mandatory condition for any other action.
> Let's start with smaller steps - I'd propose to make the Master JDK 7
> build green first.
>

+1

> Currently [1] the following tests fail:
> * ExampleConfigsIT.testXsiteConfig - should be fixed by PR #2973
>

Integrated

> * HotRodRemoteCacheIT.testPutAsync - I investigated it a few days ago;
> it's IMHO not a random failure but a regular bug - ISPN-4813
>

Does anyone volunteer for ISPN-4813?

> * HotRodRemoteCacheIT.testCustom*, testEven* - I'll volunteer to
> investigate this issue
> * StateTransferSuppressIT.testRebalanceWith* - any volunteer for
> investigating this one?
>

I'll get this one, it might be related to the partition handling work.

There are a couple more failures in the last 5 days that we should
probably look at:
http://ci.infinispan.org/project.html?projectId=Infinispan&buildTypeId=bt8&page=1&tab=tests

> Once the root causes are investigated, we can fix the issues and try to
> keep this build green.
>
> > then we can talk about
> > building something on reliable foundations.
>
> actually, if there's a failing test and it's clear which commit has
> caused it, we can stop merging PRs from the developer who has introduced
> the regression until he fixes it. This should be a clear message that
> the first thing he should work on is to fix the test. IMHO we can start
> with it now and we don't need a green testsuite for it
>

In theory that sounds good, but who do we stop merging for on account of
the current failures? :)

> [1]
> http://ci.infinispan.org/viewLog.html?buildId=13323&tab=buildResultsDiv&buildTypeId=bt8&logTab=
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141022/86883021/attachment.html

From galder at redhat.com  Wed Oct 22 04:58:53 2014
From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=)
Date: Wed, 22 Oct 2014 10:58:53 +0200
Subject: [infinispan-dev] Analysis of infinispan-6.0.2 dependency on
	JDK-Internal APIs
In-Reply-To: 
References: <54227901.8020509@oracle.com> <54227EF9.6090403@oracle.com>
	<5437AE80.6050302@oracle.com>
Message-ID: <722448FF-6E48-414A-AE09-C3737A799315@redhat.com>

On 10 Oct 2014, at 13:37, Dan Berindei wrote:

> Hi Rory
>
> Galder is on PTO for another week, so I'll try to answer instead.
>
> We only use sun.misc.Unsafe directly, in order to implement a variation
> of Doug Lea's ConcurrentHashMapV8 that accepts a custom Equivalence
> (implementation of equality/hashCode). I guess we'll have to switch to
> AtomicFieldUpdaters if we want it to work with JDK 9, and possibly move
> to the volatile extensions once they are implemented. By that point,
> Doug might even have an updated ConcurrentHashMapV8 version that can be
> run with JDK 9.

We could ask on the concurrency-interest list, but I'd give them some
time yet.

> The rest of the internal class usages seem to be from our dependencies
> on WildFly, JBoss Marshalling, LevelDB, Smooks, and JBoss
> MicroContainer. Smooks and JBoss MicroContainer likely won't see any
> updates for JDK 9, but they're only used in the demos so they're not
> critical. JBoss Marshalling is used in the core, however, so we'll need
> a release from them before we can run anything on JDK 9.
>
> Cheers
> Dan
>
>
> On Fri, Oct 10, 2014 at 1:01 PM, Rory O'Donnell Oracle, Dublin Ireland wrote:
> Hi Galder,
>
> Did you have time to review the report, any feedback ?
>
> Rgds,Rory
>
> On 24/09/2014 09:21, Rory O'Donnell Oracle, Dublin Ireland wrote:
>> Below is a text output of the report for infinispan-6.0.2.
>>
>> Rgds,Rory
>>
>> JDK Internal API Usage Report for infinispan-6.0.2.Final-all
>>
>> The OpenJDK Quality Outreach campaign has run a compatibility report
>> to identify usage of JDK-internal APIs. Usage of these JDK-internal
>> APIs could pose compatibility issues, as the Java team explained in
>> 1996. We have created this report to help you identify which
>> JDK-internal APIs your project uses, what to use instead, and where
>> those changes should go. Making these changes will improve your
>> compatibility, and in some cases give better performance.
>>
>> Migrating away from the JDK-internal APIs now will give your team
>> adequate time for testing before the release of JDK 9. If you are
>> unable to migrate away from an internal API, please provide us with
>> an explanation below to help us understand it better. As a reminder,
>> supported APIs are determined by the OpenJDK's Java Community Process
>> and not by Oracle.
>>
>> This report was generated by jdeps through static analysis of
>> artifacts: it does not identify any usage of those APIs through
>> reflection or dynamic bytecode. You may also run jdeps on your own if
>> you would prefer.
>>
>> Summary of the analysis of the jar files within
>> infinispan-6.0.2.Final-all:
>>
>> - Number of jar files depending on JDK-internal APIs: 10
>> - Internal APIs that have known replacements: 0
>> - Internal APIs that have no supported replacements: 73
>>
>> APIs that have known replacements: none.
>>
>> JDK-internal APIs without supported replacements, grouped by the jar
>> that uses them (the original report lists these as 73 numbered rows):
>>
>> lib/freemarker-2.3.11.jar:
>>   com.sun.org.apache.xml.internal.utils.PrefixResolver
>>   com.sun.org.apache.xpath.internal.XPath
>>   com.sun.org.apache.xpath.internal.XPathContext
>>   com.sun.org.apache.xpath.internal.objects.XBoolean, XNodeSet, XNull,
>>   XNumber, XObject, XString
>>
>> lib/xercesImpl-2.9.1.jar:
>>   org.w3c.dom.html.HTMLAnchorElement, HTMLAppletElement,
>>   HTMLAreaElement, HTMLBRElement, HTMLBaseElement, HTMLBaseFontElement,
>>   HTMLBodyElement, HTMLButtonElement, HTMLCollection, HTMLDListElement,
>>   HTMLDirectoryElement, HTMLDivElement, HTMLDocument, HTMLElement,
>>   HTMLFieldSetElement, HTMLFontElement, HTMLFormElement,
>>   HTMLFrameElement, HTMLFrameSetElement, HTMLHRElement, HTMLHeadElement,
>>   HTMLHeadingElement, HTMLHtmlElement, HTMLIFrameElement,
>>   HTMLImageElement, HTMLInputElement, HTMLIsIndexElement, HTMLLIElement,
>>   HTMLLabelElement, HTMLLegendElement, HTMLLinkElement, HTMLMapElement,
>>   HTMLMenuElement, HTMLMetaElement, HTMLModElement, HTMLOListElement,
>>   HTMLObjectElement, HTMLOptGroupElement, HTMLOptionElement,
>>   HTMLParagraphElement, HTMLParamElement, HTMLPreElement,
>>   HTMLQuoteElement, HTMLScriptElement, HTMLSelectElement,
>>   HTMLStyleElement, HTMLTableCaptionElement, HTMLTableCellElement,
>>   HTMLTableColElement, HTMLTableElement, HTMLTableRowElement,
>>   HTMLTableSectionElement, HTMLTextAreaElement, HTMLTitleElement,
>>   HTMLUListElement (all in org.w3c.dom.html)
>>   org.w3c.dom.ranges.DocumentRange, Range, RangeException
>>
>> lib/aesh-0.33.7.jar:
>>   sun.misc.Signal, sun.misc.SignalHandler
>>
>> lib/avro-1.7.5.jar, lib/guava-12.0.jar,
>> lib/infinispan-commons-6.0.2.Final.jar, lib/mvel2-2.0.12.jar,
>> lib/scala-library-2.10.2.jar:
>>   sun.misc.Unsafe
>>
>> lib/leveldb-0.5.jar:
>>   sun.nio.ch.FileChannelImpl
>>
>> lib/jboss-marshalling-1.4.4.Final.jar:
>>   sun.reflect.ReflectionFactory
>>   sun.reflect.ReflectionFactory$GetReflectionFactoryAction
>>
>> Identify External Replacements
>>
>> You should use a separate third-party library that performs this
>> functionality.
>> >> ID Internal API (grouped by package) Used By Identify External Replacement >> >> >> On 24/09/2014 08:55, Rory O'Donnell Oracle, Dublin Ireland wrote: >>> Hi Galder, >>> >>> As part of the preparations for JDK 9, Oracle?s engineers have been analyzing open source projects like yours to understand usage. One area of concern involves identifying compatibility problems, such as reliance on JDK-internal APIs. >>> >>> Our engineers have already prepared guidance on migrating some of the more common usage patterns of JDK-internal APIs to supported public interfaces. The list is on the OpenJDK wiki [0], along with instructions on how to run the jdeps analysis tool yourself . >>> >>> As part of the ongoing development of JDK 9, I would like to encourage migration from JDK-internal APIs towards the supported Java APIs. I have prepared a report for your project rele ase infinispan-6.0.2 based on the jdeps output. >>> >>> The report is attached to this e-mail. >>> >>> For anything where your migration path is unclear, I would appreciate comments on the JDK-internal API usage patterns in the attached jdeps report - in particular comments elaborating on the rationale for them - either to me or on this mailing list. >>> >>> Finding suitable replacements for unsupported interfaces is not always straightforward, which is why I am reaching out to you early in the JDK 9 development cycle so you can give feedback about new APIs that may be needed to facilitate this exercise. >>> >>> Thank you in advance for any efforts and feedback helping us make JDK 9 better. >>> >>> Rgds,Rory >>> >>> [0] https://wiki.openjdk.java.net/display/JDK8/Java+Dependency+Analysis+Tool >>> >>> >>> -- >>> Rgds,Rory O'Donnell >>> Quality Engineering Manager >>> Oracle EMEA , Dublin, Ireland >>> >>> >>> >>> >>> >> >> -- >> Rgds,Rory O'Donnell >> Quality Engineering Manager >> Oracle EMEA , Dublin, Ireland >> >> >> >> >> > > -- > Rgds,Rory O'Donnell > Quality Engineering Manager > Oracle EMEA , Dublin, Ireland > > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz From vjuranek at redhat.com Wed Oct 22 04:59:57 2014 From: vjuranek at redhat.com (Vojtech Juranek) Date: Wed, 22 Oct 2014 10:59:57 +0200 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: References: <54452083.4020502@redhat.com> <4999398.dViRhNXb2H@localhost> Message-ID: <56561795.2zjSlQ0m7Z@localhost> > Does anyone volunteer for ISPN-4813? me not:-) as this one is IMHO little bit tricky - it's (at least partially) related to the question "which size is the correct size?". Was there any clear conclusion in recent " About size()" thread? > > actually, if there's a failing test and it's clear which commit has caused > > it, we can stop merging PRs from the developer > > who has introduced the regression until he fix it. This should be clear > > message, that the first thing he should > > work on is to fix the test. IMHO we can start with it now and don't need > > green testsuite for it > > In theory that sounds good, but who do we stop merging for on account the > current failures? :) probably nobody, we need to find volunteers in this case. 
I meant it only for issues where it's clear who introduced the regression (e.g. me for recently failing NodeAuth*IT tests). As above, not a complete solution, but rather a small step to move to the state where we would like to be -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 473 bytes Desc: This is a digitally signed message part. Url : http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141022/d7d2c8bf/attachment.bin From galder at redhat.com Wed Oct 22 06:32:04 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 22 Oct 2014 12:32:04 +0200 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> <54453B16.6050202@redhat.com> Message-ID: <5F4D7312-0B6C-4C30-B5E9-F75F1973FEE5@redhat.com> Guys, Jason from Wildfly provided some interesting information a while back on the benefits of ?merge? approach vs cherry-pick. To paraphrase: > I used to be anti-merge because I thought it made things harder for users to grok. That was back when git wasn?t mainstream though. > > Now that everyone uses git I think its a good thing. There are some really nice benefits to it: > > 1. The original history from the author is preserved > 2. The author does not have to toss their branch to avoid a conflict introduced by a pull after their PR is merged > 3. Changes introduced by conflict resolution are kept separate from the authors. So you know if the problem was caused by the merge or the change > itself > 4. The person who merged the change is recorded in the git history, so you have an audit record of who allowed the change in if you have multiple mergers > 5. PRs sometimes include multiple commits, and a merge commit allows you to see which commits encompass the overall change > 6. Due to 5, bisecting is quicker > 7. It?s easier to revert a merge commit > 8. Github PRs automatically close when you perform a merge > 9. You can use the big green button with automated CI > > There are however some drawbacks: > > 1. If you revert a merge, you need to create a new merge to bring it back. This can be a little confusing if you do it wrong > 2. You have to know how to interact with merge commits in the tools (e.g. revert requires -m 1) > 3. git log, for whatever reason defaults to date ordering instead of topographical ordering, which can look confusing since it doesn?t represent when > changes were actually merged. Thats solvable using git log ?topo-order > Merging merge commits is of course nasty, but you don?t have to allow it. You can just require that authors rebase their history when they need it to > be more current vs a git pull. Merging then follows a nice clear one level nesting.? The key thing IMO is: Avoiding merge commits but making sure that everyone rebases their changes :) So, +1 to Tristan?s suggestion but making sure we avoid merge commits! On 20 Oct 2014, at 18:55, Emmanuel Bernard wrote: > So you review locally and potentially run locally and then you switch from your terminal console or IDE to wherever the button is in your 350 opened tabs because it?s faster than git push upstream master. I am having a hard time to see the convenience unless you do browser only reviews. > > > On 20 Oct 2014, at 18:40, Tristan Tarrant wrote: > >> Sure, you still want to review it in your IDE, and maybe run local >> tests, but ultimately merging via the GitHub UI. 
>> >> Tristan >> >> On 20/10/14 18:37, Emmanuel Bernard wrote: >>> rebase is a oneliner op per branch you want to reapply whereas cherry >>> picking requires to manually select the commits you want. Underneath >>> in git guts it probably does the same. >>> >>> I have to admit I barely had the occasion to want to click the GitHub >>> UI button as except for simple documentation, reviewing code almost >>> always require to fetch the branch and look at it in an IDE of sort >>> for proper review. The documentation bit is actually even requiring >>> local run since Markdown / Asciidoc and all tend to silently fail a >>> syntax mistake. >>> >>> On 20 Oct 2014, at 18:28, Mircea Markus >> > wrote: >>> >>>> >>>> On Oct 20, 2014, at 17:21, Emmanuel Bernard >>> > wrote: >>>> >>>>> There is a difference between cherry picking and rebasing when it >>>>> comes to reapply a work on top of a branch. >>>> >>>> What is the difference? :-) >>>> >>>>> Do you dislike both equally compared to a merge (aka railroad nexus >>>>> git history approach)? >>>> >>>> Using github's "merge" button is pretty convenient imo, even though >>>> the history is not as nice as with a rebase (or cherry-pick, I miss >>>> the difference for now ) >>>> >>>>> >>>>> >>>>> On 20 Oct 2014, at 16:47, Tristan Tarrant >>>> > wrote: >>>>> >>>>>> Hi guys, >>>>>> >>>>>> with the imminent release of 7.0.0.CR2 we are reaching the end of this >>>>>> release cycle. There have been a ton of improvements (maybe too many) >>>>>> and a lot of time has passed since the previous version (maybe to >>>>>> much). >>>>>> Following up on my previous e-mail about future plans, here's a >>>>>> recap of >>>>>> a plan which I believe will allow us to move at a much quicker pace: >>>>>> >>>>>> For the next minor releases I would like to suggest the following >>>>>> strategy: >>>>>> - use a 3 month timebox where we strive to maintain master in an >>>>>> "always releasable" state >>>>>> - complex feature work will need to happen onto dedicated feature >>>>>> branches, using the usual GitHub pull-request workflow >>>>>> - only when a feature is complete (code, tests, docs, reviewed, >>>>>> CI-checked) it will be merged back into master >>>>>> - if a feature is running late it will be postponed to the >>>>>> following minor release so as not to hinder other development >>>>>> >>>>>> I am also going to suggest dropping the cherry-picking approach and >>>>>> going with git merge. In order to achieve this we need CI to be >>>>>> always in top form with 0 failures in master. This will allow >>>>>> merging a PR directly from GitHub's interface. We obviously need to >>>>>> trust our tools and our existing code base. 
>>>>>> >>>>>> This is the plan for 7.1.0: >>>>>> >>>>>> 13 November 7.1.0.Alpha1 >>>>>> 18 December 7.1.0.Beta1 >>>>>> 15 January 7.1.0.CR1 >>>>>> 30 January 7.1.0.Final >>>>>> >>>>>> >>>>>> Tristan >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> Cheers, >>>> -- >>>> Mircea Markus >>>> Infinispan lead (www.infinispan.org ) >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz From dan.berindei at gmail.com Wed Oct 22 07:04:52 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 22 Oct 2014 14:04:52 +0300 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: <56561795.2zjSlQ0m7Z@localhost> References: <54452083.4020502@redhat.com> <4999398.dViRhNXb2H@localhost> <56561795.2zjSlQ0m7Z@localhost> Message-ID: On Wed, Oct 22, 2014 at 11:59 AM, Vojtech Juranek wrote: > > > Does anyone volunteer for ISPN-4813? > > me not:-) as this one is IMHO little bit tricky - it's (at least partially) > related to the question "which size is the correct size?". Was there any > clear > conclusion in recent " About size()" thread? > > > > > actually, if there's a failing test and it's clear which commit has > caused > > > it, we can stop merging PRs from the developer > > > who has introduced the regression until he fix it. This should be clear > > > message, that the first thing he should > > > work on is to fix the test. IMHO we can start with it now and don't > need > > > green testsuite for it > > > > In theory that sounds good, but who do we stop merging for on account the > > current failures? :) > > probably nobody, we need to find volunteers in this case. I meant it only > for > issues where it's clear who introduced the regression (e.g. me for recently > failing NodeAuth*IT tests). As above, not a complete solution, but rather a > small step to move to the state where we would like to be > Yeah, that's what I meant, most of the time it's hard to track down who caused a specific failure. BTW, I forgot to mention something in the previous email: when you investigate a test failure please mark the test as investigated in TeamCity (preferably with a link to an issue in JIRA). And if it's an intermittent failure, make sure it's not marked as fixed automatically (TC is a little greedy here, it thinks the test is fixed the moment it doesn't fail in a build). Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141022/7cd0a27f/attachment.html From dan.berindei at gmail.com Wed Oct 22 07:29:59 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 22 Oct 2014 14:29:59 +0300 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: <5F4D7312-0B6C-4C30-B5E9-F75F1973FEE5@redhat.com> References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> <54453B16.6050202@redhat.com> <5F4D7312-0B6C-4C30-B5E9-F75F1973FEE5@redhat.com> Message-ID: On Wed, Oct 22, 2014 at 1:32 PM, Galder Zamarre?o wrote: > Guys, Jason from Wildfly provided some interesting information a while > back on the benefits of ?merge? approach vs cherry-pick. To paraphrase: > > > I used to be anti-merge because I thought it made things harder for > users to grok. That was back when git wasn?t mainstream though. > > > > Now that everyone uses git I think its a good thing. There are some > really nice benefits to it: > > > > 1. The original history from the author is preserved > TBH most of the time I don't care about my history, I always have stupid commit messages until I squash my commits and "prettify" the commit messages. > > 2. The author does not have to toss their branch to avoid a conflict > introduced by a pull after their PR is merged > I think this can only happen if the branch had conflicts and the committer resolved them, but we require authors to rebase so I don't think this has been a problem for us. > > 3. Changes introduced by conflict resolution are kept separate from the > authors. So you know if the problem was caused by the merge or the change > > itself > 4. The person who merged the change is recorded in the git history, so > you have an audit record of who allowed the change in if you have multiple > mergers > Git records the committer as well. You just have to do a "git rebase -f master" locally to make sure the committer field is updated before you push into upstream. > > 5. PRs sometimes include multiple commits, and a merge commit allows you > to see which commits encompass the overall change > > 6. Due to 5, bisecting is quicker > +1, bisecting is cool, even though I've never been able to use it on Infinispan :) > > 7. It?s easier to revert a merge commit > > 8. Github PRs automatically close when you perform a merge > > 9. You can use the big green button with automated CI > > > > There are however some drawbacks: > > > > 1. If you revert a merge, you need to create a new merge to bring it > back. This can be a little confusing if you do it wrong > > 2. You have to know how to interact with merge commits in the tools > (e.g. revert requires -m 1) > 3. git log, for whatever reason defaults to date ordering instead of > topographical ordering, which can look confusing since it doesn?t represent > when > changes were actually merged. Thats solvable using git log > ?topo-order > > > Merging merge commits is of course nasty, but you don?t have to allow > it. You can just require that authors rebase their history when they need > it to > be more current vs a git pull. Merging then follows a nice clear > one level nesting.? > > The key thing IMO is: Avoiding merge commits but making sure that everyone > rebases their changes :) > > So, +1 to Tristan?s suggestion but making sure we avoid merge commits! > Sorry, you've lost me here :) Doesn't every merge have a merge commit? 
I have seen recommendations elsewhere to use the GitHub Merge button but still force the PR authors to rebase anyway; however, I'm not sure how we could enforce that. I don't think there is any cue in the GitHub UI that the PR could/should be rebased. Maybe we could add a git hook to do that check, and force the integrator to rebase locally if the PR is not yet rebased?

> [snip: the rest of the quoted thread is reproduced in full in Galder's message below]

From galder at redhat.com Wed Oct 22 06:32:04 2014
From: galder at redhat.com (Galder Zamarreño)
Date: Wed, 22 Oct 2014 12:32:04 +0200
Subject: [infinispan-dev] Infinispan 7.1 plan
In-Reply-To:
References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> <54453B16.6050202@redhat.com>
Message-ID: <5F4D7312-0B6C-4C30-B5E9-F75F1973FEE5@redhat.com>

Guys, Jason from Wildfly provided some interesting information a while back on the benefits of the "merge" approach vs cherry-pick. To paraphrase:

> I used to be anti-merge because I thought it made things harder for users to grok. That was back when git wasn't mainstream though.
>
> Now that everyone uses git I think it's a good thing. There are some really nice benefits to it:
>
> 1. The original history from the author is preserved
> 2. The author does not have to toss their branch to avoid a conflict introduced by a pull after their PR is merged
> 3. Changes introduced by conflict resolution are kept separate from the author's. So you know if the problem was caused by the merge or the change itself
> 4.
The person who merged the change is recorded in the git history, so you have an audit record of who allowed the change in if you have multiple mergers > 5. PRs sometimes include multiple commits, and a merge commit allows you to see which commits encompass the overall change > 6. Due to 5, bisecting is quicker > 7. It?s easier to revert a merge commit > 8. Github PRs automatically close when you perform a merge > 9. You can use the big green button with automated CI > > There are however some drawbacks: > > 1. If you revert a merge, you need to create a new merge to bring it back. This can be a little confusing if you do it wrong > 2. You have to know how to interact with merge commits in the tools (e.g. revert requires -m 1) > 3. git log, for whatever reason defaults to date ordering instead of topographical ordering, which can look confusing since it doesn?t represent when > changes were actually merged. Thats solvable using git log ?topo-order > Merging merge commits is of course nasty, but you don?t have to allow it. You can just require that authors rebase their history when they need it to > be more current vs a git pull. Merging then follows a nice clear one level nesting.? The key thing IMO is: Avoiding merge commits but making sure that everyone rebases their changes :) So, +1 to Tristan?s suggestion but making sure we avoid merge commits! On 20 Oct 2014, at 18:55, Emmanuel Bernard wrote: > So you review locally and potentially run locally and then you switch from your terminal console or IDE to wherever the button is in your 350 opened tabs because it?s faster than git push upstream master. I am having a hard time to see the convenience unless you do browser only reviews. > > > On 20 Oct 2014, at 18:40, Tristan Tarrant wrote: > >> Sure, you still want to review it in your IDE, and maybe run local >> tests, but ultimately merging via the GitHub UI. >> >> Tristan >> >> On 20/10/14 18:37, Emmanuel Bernard wrote: >>> rebase is a oneliner op per branch you want to reapply whereas cherry >>> picking requires to manually select the commits you want. Underneath >>> in git guts it probably does the same. >>> >>> I have to admit I barely had the occasion to want to click the GitHub >>> UI button as except for simple documentation, reviewing code almost >>> always require to fetch the branch and look at it in an IDE of sort >>> for proper review. The documentation bit is actually even requiring >>> local run since Markdown / Asciidoc and all tend to silently fail a >>> syntax mistake. >>> >>> On 20 Oct 2014, at 18:28, Mircea Markus >> > wrote: >>> >>>> >>>> On Oct 20, 2014, at 17:21, Emmanuel Bernard >>> > wrote: >>>> >>>>> There is a difference between cherry picking and rebasing when it >>>>> comes to reapply a work on top of a branch. >>>> >>>> What is the difference? :-) >>>> >>>>> Do you dislike both equally compared to a merge (aka railroad nexus >>>>> git history approach)? >>>> >>>> Using github's "merge" button is pretty convenient imo, even though >>>> the history is not as nice as with a rebase (or cherry-pick, I miss >>>> the difference for now ) >>>> >>>>> >>>>> >>>>> On 20 Oct 2014, at 16:47, Tristan Tarrant >>>> > wrote: >>>>> >>>>>> Hi guys, >>>>>> >>>>>> with the imminent release of 7.0.0.CR2 we are reaching the end of this >>>>>> release cycle. There have been a ton of improvements (maybe too many) >>>>>> and a lot of time has passed since the previous version (maybe to >>>>>> much). 
>>>>>> Following up on my previous e-mail about future plans, here's a >>>>>> recap of >>>>>> a plan which I believe will allow us to move at a much quicker pace: >>>>>> >>>>>> For the next minor releases I would like to suggest the following >>>>>> strategy: >>>>>> - use a 3 month timebox where we strive to maintain master in an >>>>>> "always releasable" state >>>>>> - complex feature work will need to happen onto dedicated feature >>>>>> branches, using the usual GitHub pull-request workflow >>>>>> - only when a feature is complete (code, tests, docs, reviewed, >>>>>> CI-checked) it will be merged back into master >>>>>> - if a feature is running late it will be postponed to the >>>>>> following minor release so as not to hinder other development >>>>>> >>>>>> I am also going to suggest dropping the cherry-picking approach and >>>>>> going with git merge. In order to achieve this we need CI to be >>>>>> always in top form with 0 failures in master. This will allow >>>>>> merging a PR directly from GitHub's interface. We obviously need to >>>>>> trust our tools and our existing code base. >>>>>> >>>>>> This is the plan for 7.1.0: >>>>>> >>>>>> 13 November 7.1.0.Alpha1 >>>>>> 18 December 7.1.0.Beta1 >>>>>> 15 January 7.1.0.CR1 >>>>>> 30 January 7.1.0.Final >>>>>> >>>>>> >>>>>> Tristan >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> Cheers, >>>> -- >>>> Mircea Markus >>>> Infinispan lead (www.infinispan.org ) >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz From ttarrant at redhat.com Thu Oct 23 05:51:41 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Thu, 23 Oct 2014 10:51:41 +0100 Subject: [infinispan-dev] Infinispan 7.1 plan In-Reply-To: References: <54452083.4020502@redhat.com> <1D5EA9BC-D392-4B08-B134-1EC579CB25DE@redhat.com> <54453B16.6050202@redhat.com> <5F4D7312-0B6C-4C30-B5E9-F75F1973FEE5@redhat.com> Message-ID: <5448CFAD.7050907@redhat.com> On 22/10/14 12:29, Dan Berindei wrote: > On Wed, Oct 22, 2014 at 1:32 PM, Galder Zamarre?o > wrote: > > Guys, Jason from Wildfly provided some interesting information a > while back on the benefits of ?merge? approach vs cherry-pick. To > paraphrase: > > > 1. The original history from the author is preserved > > > TBH most of the time I don't care about my history, I always have > stupid commit messages until I squash my commits and "prettify" the > commit messages. Indeed: I commit and squash all the time. 
I'm only interested in seeing multiple commits in the following cases:
- they are actually subtasks of the main PR
- the pull request is for a feature which was developed collaboratively by multiple developers

> > 2. The author does not have to toss their branch to avoid a conflict introduced by a pull after their PR is merged
>
> I think this can only happen if the branch had conflicts and the committer resolved them, but we require authors to rebase so I don't think this has been a problem for us.

So the best approach seems to actually be rebase/pull, which gets us what we want.

Tristan

From sanne at infinispan.org Thu Oct 23 07:39:40 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Thu, 23 Oct 2014 12:39:40 +0100
Subject: [infinispan-dev] Retro-weaving Java8 bytecode to Java7 compatibility (and Java6!)
Message-ID:

I just found this project:
https://github.com/orfjackal/retrolambda

I have no idea how reliable the output code could be; maybe someone would be interested in exploring the option?

Sanne

From galder at redhat.com Thu Oct 23 11:58:13 2014
From: galder at redhat.com (Galder Zamarreño)
Date: Thu, 23 Oct 2014 17:58:13 +0200
Subject: [infinispan-dev] Should JMX size stat consider expired entries?
Message-ID: <28B724E9-29AB-4DEE-9C2E-CC466F855C47@redhat.com>

Hi all,

The reason [1] was failing was due to a set of circumstances that essentially meant that the JMX size statistic was counting expired entries. We had a brief discussion on IRC and there's some divergence on how precise the JMX size stat should be [2].

I'm OK with expired entries being counted, since that means that size is fast to retrieve that way. If you need to start calculating whether each entry is expired, etc., it would slow it down. However, if we go down that route, we need to change all tests that rely on the JMX size stat for their assertions. There are quite a few of those in the Server integration testsuite, e.g. AbstractRemoteCacheIT.

Thoughts?

[1] https://issues.jboss.org/browse/ISPN-4813
[2] https://gist.github.com/galderz/62ae5120f5ac50ceabef
--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

From andreas.kruthoff at nexustelecom.com Fri Oct 24 03:55:14 2014
From: andreas.kruthoff at nexustelecom.com (Andreas Kruthoff)
Date: Fri, 24 Oct 2014 09:55:14 +0200
Subject: [infinispan-dev] ClassNotFoundException infinispan-embedded-7.0.0.CR2.jar
Message-ID: <544A05E2.8070101@nexustelecom.com>

Hi

I was running a cache in distributed mode and suddenly got the following ClassNotFoundException (see below).

My classpath:
infinispan-embedded-7.0.0.CR2.jar
jboss-transaction-api_1.1_spec-1.0.1.Final.jar

Am I missing something?
Exception in thread "main" org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:170) at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869) at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:638) at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:627) at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:530) at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:216) at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:764) at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:584) at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:539) at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:416) at ch.nexustelecom.lbd.engine.ImsiCache.init(ImsiCache.java:49) at ch.nexustelecom.dexclient.engine.DefaultDexClientEngine.init(DefaultDexClientEngine.java:120) at ch.nexustelecom.dexclient.DexClient.initClient(DexClient.java:169) at ch.nexustelecom.dexclient.tool.DexClientManager.startup(DexClientManager.java:196) at ch.nexustelecom.dexclient.tool.DexClientManager.main(DexClientManager.java:83) Caused by: org.infinispan.commons.CacheException: java.lang.ClassNotFoundException: org.infinispan.partionhandling.impl.AvailabilityMode at org.infinispan.commons.util.Util.rewrapAsCacheException(Util.java:655) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:176) at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536) at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:388) at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:102) at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:108) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168) ... 
14 more Caused by: java.lang.ClassNotFoundException: org.infinispan.partionhandling.impl.AvailabilityMode at java.net.URLClassLoader$1.run(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.jboss.marshalling.AbstractClassResolver.loadClass(AbstractClassResolver.java:131) at org.jboss.marshalling.AbstractClassResolver.resolveClass(AbstractClassResolver.java:112) at org.jboss.marshalling.river.RiverUnmarshaller.doReadClassDescriptor(RiverUnmarshaller.java:1002) at org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject(RiverUnmarshaller.java:1239) at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:272) at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209) at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41) at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:76) at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:62) at org.infinispan.marshall.core.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:424) at org.infinispan.marshall.core.ExternalizerTable.readObject(ExternalizerTable.java:221) at org.infinispan.marshall.core.JBossMarshaller$ExternalizerTableProxy.readObject(JBossMarshaller.java:148) at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351) at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209) at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41) at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:79) at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:64) at org.infinispan.marshall.core.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:424) at org.infinispan.marshall.core.ExternalizerTable.readObject(ExternalizerTable.java:221) at org.infinispan.marshall.core.JBossMarshaller$ExternalizerTableProxy.readObject(JBossMarshaller.java:148) at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351) at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209) at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41) at org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller.objectFromObjectStream(AbstractJBossMarshaller.java:135) at org.infinispan.marshall.core.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:101) at org.infinispan.commons.marshall.AbstractDelegatingMarshaller.objectFromByteBuffer(AbstractDelegatingMarshaller.java:80) at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectFromBuffer(MarshallerAdapter.java:28) at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390) at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:250) at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:674) at org.jgroups.JChannel.up(JChannel.java:733) at 
org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030) at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:146) at org.jgroups.protocols.RSVP.up(RSVP.java:190) at org.jgroups.protocols.FRAG2.up(FRAG2.java:165) at org.jgroups.protocols.FlowControl.up(FlowControl.java:390) at org.jgroups.protocols.FlowControl.up(FlowControl.java:379) at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1042) at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234) at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1034) at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:752) at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:399) at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:610) at org.jgroups.protocols.BARRIER.up(BARRIER.java:152) at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155) at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200) at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:297) at org.jgroups.protocols.MERGE3.up(MERGE3.java:288) at org.jgroups.protocols.Discovery.up(Discovery.java:277) at org.jgroups.protocols.TP.passMessageUp(TP.java:1568) at org.jgroups.protocols.TP$MyHandler.run(TP.java:1787) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) This email and any attachment may contain confidential information which is intended for use only by the addressee(s) named above. If you received this email by mistake, please notify the sender immediately, and delete the email from your system. You are prohibited from copying, disseminating or otherwise using the email or any attachment. From dan.berindei at gmail.com Fri Oct 24 04:27:42 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 24 Oct 2014 11:27:42 +0300 Subject: [infinispan-dev] ClassNotFoundException infinispan-embedded-7.0.0.CR2.jar In-Reply-To: <544A05E2.8070101@nexustelecom.com> References: <544A05E2.8070101@nexustelecom.com> Message-ID: Hi Andreas The AvailabilityMode enum moved between 7.0.0.CR1 and 7.0.0.CR2, so my guess is that you have one node running CR2, and another node running CR1 (or a previous version). We don't support running two different versions of Infinispan in the same cluster, so this is expected. I admit we need work a bit on the error message, though, so I've created [1] Cheers Dan [1] https://issues.jboss.org/browse/ISPN-4879 On Fri, Oct 24, 2014 at 10:55 AM, Andreas Kruthoff < andreas.kruthoff at nexustelecom.com> wrote: > Hi > > I was running a cache in distributed mode and suddenly got the following > ClassNotFoundException (see below). > > My classapth: > infinispan-embedded-7.0.0.CR2.jar > jboss-transaction-api_1.1_spec-1.0.1.Final.jar > > Am I missing something? 
> [snip: the full stack trace from the original message, trimmed]

From ttarrant at redhat.com Fri Oct 24 04:32:39 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Fri, 24 Oct 2014 09:32:39 +0100
Subject: [infinispan-dev] ClassNotFoundException infinispan-embedded-7.0.0.CR2.jar
In-Reply-To: <544A05E2.8070101@nexustelecom.com>
References: <544A05E2.8070101@nexustelecom.com>
Message-ID: <544A0EA7.5090701@redhat.com>

Hi Andreas,

that class was moved from org.infinispan.partitionhandling.impl to org.infinispan.partitionhandling between CR1 and CR2.

Tristan

On 24/10/14 08:55, Andreas Kruthoff wrote:
> Hi
>
> I was running a cache in distributed mode and suddenly got the following ClassNotFoundException (see below).
>
> My classpath:
> infinispan-embedded-7.0.0.CR2.jar
> jboss-transaction-api_1.1_spec-1.0.1.Final.jar
>
> Am I missing something?
> [snip: the same stack trace quoted again, trimmed]

From sanne at infinispan.org Sun Oct 26 05:27:10 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Sun, 26 Oct 2014 10:27:10 +0100
Subject: [infinispan-dev] Docker images now available for Infinispan Server
Message-ID:

https://twitter.com/marekgoldmann/status/526060068945817601

Thanks Marek!

From emmanuel at hibernate.org Sun Oct 26 06:13:26 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Sun, 26 Oct 2014 11:13:26 +0100
Subject: [infinispan-dev] Docker images now available for Infinispan Server
In-Reply-To:
References:
Message-ID:

We should add a link to the download page of the website.

> On 26 oct. 2014, at 10:27, Sanne Grinovero wrote:
>
> https://twitter.com/marekgoldmann/status/526060068945817601
>
> Thanks Marek!
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From gustavonalle at gmail.com Sun Oct 26 07:08:08 2014 From: gustavonalle at gmail.com (Gustavo Fernandes) Date: Sun, 26 Oct 2014 11:08:08 +0000 Subject: [infinispan-dev] Docker images now available for Infinispan Server In-Reply-To: References: Message-ID: <0A0997F0-315D-4D22-8386-415BE3E1C505@gmail.com> I don't think a manual download is needed, since docker will pull the image automatically from the registry which is in sync with https://github.com/jboss-dockerfiles/infinispan It'd be nice to put a chapter in the doc, specially on how to create a cluster, since AFAICT the containers in the image are being launched with bin/standalone.sh by default. Gustavo On 26 Oct 2014, at 10:13, Emmanuel Bernard wrote: > We should add a link to the download page of the website. > > >> On 26 oct. 2014, at 10:27, Sanne Grinovero wrote: >> >> https://twitter.com/marekgoldmann/status/526060068945817601 >> >> Thanks Marek! >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141026/e137b1df/attachment.html From vjuranek at redhat.com Sun Oct 26 16:17:11 2014 From: vjuranek at redhat.com (Vojtech Juranek) Date: Sun, 26 Oct 2014 21:17:11 +0100 Subject: [infinispan-dev] Docker images now available for Infinispan Server In-Reply-To: <0A0997F0-315D-4D22-8386-415BE3E1C505@gmail.com> References: <0A0997F0-315D-4D22-8386-415BE3E1C505@gmail.com> Message-ID: <15211174.nyA0uKPGbk@localhost> On Sunday 26 October 2014 11:08:08 Gustavo Fernandes wrote: > It'd be nice to put a chapter in the doc, specially on how to create a > cluster I'm going to prepare one more Dockerfile for library mode (wildfly + ispn modules). Once done, I'll prepare some small doc chapter how to use it Vojta -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 473 bytes Desc: This is a digitally signed message part. Url : http://lists.jboss.org/pipermail/infinispan-dev/attachments/20141026/ccebd629/attachment.bin From emmanuel at hibernate.org Sun Oct 26 17:29:21 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Sun, 26 Oct 2014 22:29:21 +0100 Subject: [infinispan-dev] Docker images now available for Infinispan Server In-Reply-To: <0A0997F0-315D-4D22-8386-415BE3E1C505@gmail.com> References: <0A0997F0-315D-4D22-8386-415BE3E1C505@gmail.com> Message-ID: <13D8ADEB-B64A-4406-813D-9BAC5ACCECC5@hibernate.org> Not necessarily to ?download? the docker image. Rather point in the dl options that you can use Docker and point to the hub page. On 26 Oct 2014, at 12:08, Gustavo Fernandes wrote: > I don't think a manual download is needed, since docker will pull the image automatically from the registry which is in sync with https://github.com/jboss-dockerfiles/infinispan > It'd be nice to put a chapter in the doc, specially on how to create a cluster, since AFAICT the containers in the image are being launched with bin/standalone.sh by default. 
>
> Gustavo
>
> [snip: quoted thread and mailing list footers trimmed]

From gustavonalle at gmail.com Sun Oct 26 18:46:02 2014
From: gustavonalle at gmail.com (Gustavo Fernandes)
Date: Sun, 26 Oct 2014 22:46:02 +0000
Subject: [infinispan-dev] Docker images now available for Infinispan Server
In-Reply-To: <13D8ADEB-B64A-4406-813D-9BAC5ACCECC5@hibernate.org>
References: <0A0997F0-315D-4D22-8386-415BE3E1C505@gmail.com> <13D8ADEB-B64A-4406-813D-9BAC5ACCECC5@hibernate.org>
Message-ID: <193673DB-2C19-4BD2-971B-182CDE108EE4@gmail.com>

On 26 Oct 2014, at 21:29, Emmanuel Bernard wrote:
> Not necessarily to "download" the docker image. Rather point in the dl options that you can use Docker and point to the hub page.

Makes much more sense, apologies for my misunderstanding :)

Gustavo

From isavin at redhat.com Mon Oct 27 06:39:48 2014
From: isavin at redhat.com (Ion Savin)
Date: Mon, 27 Oct 2014 12:39:48 +0200
Subject: [infinispan-dev] Infinispan HotRod C++ Client 7.0.0.CR2 released!
Message-ID: <544E20F4.70105@redhat.com>

Hi all,

Infinispan HotRod C++ Client 7.0.0.CR2 is now available! Thanks to everyone involved for the changes and bug reports contributed!

Release details on http://blog.infinispan.org/2014/10/infinispan-hotrod-c-client-700cr2.html

--
Ion Savin

From mgencur at redhat.com Mon Oct 27 11:47:45 2014
From: mgencur at redhat.com (Martin Gencur)
Date: Mon, 27 Oct 2014 16:47:45 +0100
Subject: [infinispan-dev] Docker images now available for Infinispan Server
In-Reply-To:
References:
Message-ID: <544E6921.1000507@redhat.com>

Thanks Vojtech Juranek for actually creating the Docker file :)

And thanks Marek for integrating it and tweeting about it.

Martin

On 26.10.2014 10:27, Sanne Grinovero wrote:
> https://twitter.com/marekgoldmann/status/526060068945817601
>
> Thanks Marek!

From sanne at infinispan.org Mon Oct 27 17:24:40 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Mon, 27 Oct 2014 22:24:40 +0100
Subject: [infinispan-dev] Docker images now available for Infinispan Server
In-Reply-To: <544E6921.1000507@redhat.com>
References: <544E6921.1000507@redhat.com>
Message-ID:

On 27 October 2014 16:47, Martin Gencur wrote:
> Thanks Vojtech Juranek for actually creating the Docker file :)

Thanks Vojtech! I had no idea you were working on that :)

Sanne

> [snip: rest of the quoted message trimmed]
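For readers who want to try the image: assuming it is published on the Docker Hub as jboss/infinispan-server (check the hub page for the actual name) and that it ships the server's standard configuration files, usage looks roughly like this:

    # pull and run with the default standalone configuration
    docker pull jboss/infinispan-server
    docker run -it jboss/infinispan-server

    # Gustavo's clustering point: override the default command so the
    # server starts with the clustered profile instead
    docker run -it jboss/infinispan-server ./bin/standalone.sh -c clustered.xml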
From an1310 at hotmail.com Mon Oct 27 18:00:46 2014
From: an1310 at hotmail.com (Erik Salter)
Date: Mon, 27 Oct 2014 18:00:46 -0400
Subject: [infinispan-dev] Rebalancing flag as part of the CacheStatusResponse
Message-ID:

Hi all,

This topic came up in a separate discussion with Mircea, and he suggested I post something on the mailing list for a wider audience.

I have a business case where I need the value of the rebalancing flag read by the joining nodes. Let's say we have a TACH where we want our keys striped across machines, racks, etc. Due to how NBST works, if we start a bunch of nodes on one side of the topology marker, we'll end up with all keys dog-piling on the first node that joins before being disseminated to the other nodes. In other words, the first joining node on the other side of the topology acts as a "pivot". That's bad, especially if the key is marked as DELTA_WRITE, where the receiving node must pull the key from the readCH before applying the changelog. So not only do we have a single choke-point, but it's made worse by the initial burst of every write requiring numOwner threads for remote reads.

If we disable rebalancing and start up the nodes on the other side of the topology, we can process this in a single view change. But there's a catch -- and this is the reason I added the state of the flag. We've run into a case where the current coordinator changed (crash or a MERGE) as the other nodes were starting up, and the new coordinator was elected from the new side of the topology. So we had two separate but balanced CHs on both sides of the topology, and data integrity went out the window. Hence the flag. Note also that this deployment requires the awaitInitialTransfer flag to be false.

In a real production environment, this has saved me more times than I can count. Node failover/failback is now reasonably deterministic, with a simple operational procedure for our customer(s) to follow.

The question is whether this feature would be useful for the community. Even with the new partition handling, I think this implementation is still viable and may warrant inclusion in 7.0 (or 7.1). What does the team think? I welcome any and all feedback.

Regards,

Erik Salter
Cisco Systems, SPVTG
(404) 317-0693
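For context, the flag Erik is describing is the one exposed through JMX on the LocalTopologyManager component. A rough sketch of reading and clearing it from within the JVM - the ObjectName below assumes the default cache manager name and the 7.x naming layout, and that the MBeans are registered in the platform MBean server, so adjust it for your deployment:

    import java.lang.management.ManagementFactory;
    import javax.management.Attribute;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class RebalancingFlag {
       public static void main(String[] args) throws Exception {
          MBeanServer server = ManagementFactory.getPlatformMBeanServer();
          ObjectName topo = new ObjectName(
                "org.infinispan:type=CacheManager,name=\"DefaultCacheManager\","
                + "component=LocalTopologyManager");
          // read the cluster-wide rebalancing flag
          boolean enabled = (Boolean) server.getAttribute(topo, "RebalancingEnabled");
          System.out.println("rebalancing enabled: " + enabled);
          // disable rebalancing before starting the nodes on the other side
          server.setAttribute(topo, new Attribute("RebalancingEnabled", false));
       }
    }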
From vblagoje at redhat.com Mon Oct 27 19:08:04 2014
From: vblagoje at redhat.com (Vladimir Blagojevic)
Date: Mon, 27 Oct 2014 19:08:04 -0400
Subject: [infinispan-dev] JAR distribution once the grid has been deployed
In-Reply-To: <77B40530-3309-420B-B720-DAC56410A102@hibernate.org>
References: <77B40530-3309-420B-B720-DAC56410A102@hibernate.org>
Message-ID: <544ED054.9070004@redhat.com>

Emmanuel,

I don't think we have any plans in place. I agree with you - we should at least provide hooks for these classloaders, and possibly implement a simple approach as a proof-of-concept/tutorial for others to hook in their own mechanism of class loading. We can reference Evangelos' approach as one example of how this could be done.

Vladimir

On 2014-10-16, 5:23 AM, Emmanuel Bernard wrote:
> Hi all,
>
> I know this has been discussed in the past (by Tristan I think), but I don't know how concrete the plans have become since then.
>
> One major issue with all the distributed execution code interfaces we have is that they require having in the classpath of each node both the implementation of these interfaces and the class files corresponding to the key and value being processed. My understanding is that this is true of distexec, Map/Reduce and (clustered) listeners.
>
> Evangelos from the LEADS project sort of worked around this problem by creating specialized versions of his distexec that load the necessary JARs from the grid itself (from a set of keys) and create a classloader that references these JARs. In sequence, it conceptually looks like this:
>
> - have the generic classloading distexec version in each of the grid nodes' classpaths at start time
> - when a new remote execution is required, load each necessary JAR into a specific key in a specific cache
> - the generic distexec receives the necessary keys, loads each JAR and creates a classloader out of them
> - the generic distexec loads and launches the specific code that needs to be executed (based on the FQCN of the code to execute) from the created classloader
>
> There are a few problems with that, including:
> - it requires a lot of manual work from the user
> - big JARs make the key/value-per-JAR logic explode a bit. The algorithms LEADS use have 300 MB sized JARs
> - god knows what security leaks this can lead to
>
> So I wondered if we have a better alternative and plans, and if there was a wiki page discussing the needs and potential approaches.
> As an intermediary step we could make this approach a tutorial or side classes that people can borrow from for each of the use cases.
>
> Emmanuel

From ttarrant at redhat.com Tue Oct 28 04:39:06 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Tue, 28 Oct 2014 09:39:06 +0100
Subject: [infinispan-dev] JAR distribution once the grid has been deployed
In-Reply-To: <77B40530-3309-420B-B720-DAC56410A102@hibernate.org>
References: <77B40530-3309-420B-B720-DAC56410A102@hibernate.org>
Message-ID: <544F562A.40707@redhat.com>

Hi Emmanuel,

the plan is to leverage both the domain management and deployment features that are part of the server. Galder has already introduced deployment of custom filters/converters, and this code can be extended to support all of the other extension points we support (distexec, mapreduce, entities, custom cache loaders/stores, etc).

The other alternative I am going to explore is server-side scripting based on JSR-223, which follows the general philosophy of Infinispan Server of being language and protocol independent. I have some initial POC code for this @ [1].

Tristan

[1] https://github.com/tristantarrant/infinispan/commit/3e5ec12a071ff489447a611c9da0657d9641d306

On 16/10/14 11:23, Emmanuel Bernard wrote:
> [snip: Emmanuel's message, quoted in full in Vladimir's reply above]
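To illustrate what a JSR-223 based approach buys: the server can evaluate user-supplied scripts against bindings it controls, without any user classes on its classpath. A minimal sketch of the mechanism (plain javax.script, not the POC's actual API - the binding name is made up):

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.SimpleBindings;

    public class ScriptingSketch {
       public static void main(String[] args) throws Exception {
          ScriptEngine js = new ScriptEngineManager().getEngineByName("javascript");
          SimpleBindings bindings = new SimpleBindings();
          // a real server would bind caches, keys, parameters, etc. here
          bindings.put("greeting", "Hello");
          Object result = js.eval("greeting + ', Infinispan'", bindings);
          System.out.println(result); // Hello, Infinispan
       }
    }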
From ttarrant at redhat.com Tue Oct 28 04:39:06 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Tue, 28 Oct 2014 09:39:06 +0100
Subject: [infinispan-dev] JAR distribution once the grid has been deployed
In-Reply-To: <77B40530-3309-420B-B720-DAC56410A102@hibernate.org>
References: <77B40530-3309-420B-B720-DAC56410A102@hibernate.org>
Message-ID: <544F562A.40707@redhat.com>

Hi Emmanuel,

the plan is to leverage both the domain management and deployment
features that are part of the server. Galder has already introduced
deployment of custom filters/converters, and this code can be extended
to support all of the other extension points (distexec, mapreduce,
entities, custom cache loaders/stores, etc.).

The other alternative I am going to explore is server-side scripting
based on JSR-223, which follows the general philosophy of Infinispan
Server of being language- and protocol-independent. I have some initial
POC code for this at [1].

Tristan

[1] https://github.com/tristantarrant/infinispan/commit/3e5ec12a071ff489447a611c9da0657d9641d306

On 16/10/14 11:23, Emmanuel Bernard wrote:
> Hi all,
>
> I know this has been discussed in the past (by Tristan I think), but I
> don't know how concrete the plans have become since then.
> [...]
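For readers unfamiliar with JSR-223, the general shape of the scripting
idea can be sketched with nothing but the standard javax.script API and
an embedded cache. This is an illustration of the concept, not the POC
code at [1]; in the server, the script text would arrive from a remote
client rather than be inlined.

    import javax.script.Bindings;
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import org.infinispan.Cache;
    import org.infinispan.manager.DefaultCacheManager;

    public class ScriptingSketch {
        public static void main(String[] args) throws Exception {
            DefaultCacheManager cm = new DefaultCacheManager();
            try {
                Cache<String, String> cache = cm.getCache();
                cache.put("greeting", "Hello");

                // The JDK ships a JavaScript engine (Nashorn on Java 8),
                // so a script can be evaluated against the cache
                ScriptEngine js =
                      new ScriptEngineManager().getEngineByName("javascript");
                Bindings bindings = js.createBindings();
                bindings.put("cache", cache);
                // Inlined here for illustration only
                Object result =
                      js.eval("cache.get('greeting') + ', world'", bindings);
                System.out.println(result); // prints: Hello, world
            } finally {
                cm.stop();
            }
        }
    }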
From isavin at redhat.com Tue Oct 28 07:41:46 2014
From: isavin at redhat.com (Ion Savin)
Date: Tue, 28 Oct 2014 13:41:46 +0200
Subject: [infinispan-dev] Infinispan HotRod .NET Client 7.0.0.CR2 released!
Message-ID: <544F80FA.8080703@redhat.com>

Hi all,

Infinispan HotRod .NET Client 7.0.0.CR2 is now available! Thanks to
everyone involved for the changes and bug reports contributed!

Release details at
http://blog.infinispan.org/2014/10/infinispan-hotrod-c-client-700cr2.html

--
Ion Savin

From belaran at gmail.com Wed Oct 29 07:38:23 2014
From: belaran at gmail.com (Romain Pelisse)
Date: Wed, 29 Oct 2014 12:38:23 +0100
Subject: [infinispan-dev] Taking over Mongo DB Cache Store ?
Message-ID:

Hi all,

I've been looking into a piece of ISPN I could contribute to, and I think
I could help support the MongoDB Cache Store [1]. I've checked it out and
none of the forks seem to be ahead of this version, and it seems it needs
to be ported to ISPN 7.x - which I'm inclined to work on.

However, before doing so, I wanted to check with the community whether
anybody has such plans...

[1] https://github.com/infinispan/infinispan-cachestore-mongodb

--
Romain PELISSE,
"The trouble with having an open mind, of course, is that people will
insist on coming along and trying to put things in it" -- Terry Pratchett
Belaran ins Prussia (blog) (... finally up and running !)

From ttarrant at redhat.com Wed Oct 29 09:47:32 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Wed, 29 Oct 2014 14:47:32 +0100
Subject: [infinispan-dev] Infinispan tutorial
Message-ID: <5450EFF4.6050103@redhat.com>

Hi guys,

I've been working on how to spruce up our website, docs and code samples.
While quickstarts are ok, they come as monolithic blobs which tell you
nothing about how you got there. For this reason I believe a step-by-step
tutorial approach is better, and I've been looking at the AngularJS
tutorials [0] as good examples of how to achieve this.

I have created a repo [1] on my GitHub user where each commit is a step
in the tutorial. I have tagged the commits 'step-n' so that you can check
out any of the steps and run them:

    git checkout step-1
    mvn clean package exec:java

The GitHub web interface can be used to show the diff between steps, so
that it can be linked from the docs [2].

Currently I'm not aiming to build a real application (although
suggestions are welcome in this sense), but just going through the
basics, adding features one by one, etc.

Comments are welcome.

Tristan

---
[0] https://docs.angularjs.org/tutorial/step_00
[1] https://github.com/tristantarrant/infinispan-embedded-tutorial
[2] https://github.com/tristantarrant/infinispan-embedded-tutorial/compare/step-0...step-1?diff=unified
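For readers who want a taste without cloning the repo, the very first
step of an embedded tutorial would plausibly look like the sketch below
(this is a guess at the shape of step-1, not a copy of the actual
commit): start a cache manager, obtain the default cache, and use it like
a map.

    import org.infinispan.Cache;
    import org.infinispan.manager.DefaultCacheManager;

    public class Step1 {
        public static void main(String[] args) {
            // An embedded cache manager with default configuration
            DefaultCacheManager cacheManager = new DefaultCacheManager();
            try {
                // The default cache behaves like a ConcurrentMap
                Cache<String, String> cache = cacheManager.getCache();
                cache.put("key", "value");
                System.out.println(cache.get("key"));
            } finally {
                cacheManager.stop();
            }
        }
    }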
From jholusa at redhat.com Thu Oct 30 06:26:56 2014
From: jholusa at redhat.com (Jiri Holusa)
Date: Thu, 30 Oct 2014 06:26:56 -0400 (EDT)
Subject: [infinispan-dev] Infinispan 7 documentation
In-Reply-To: <105852793.2425953.1414663764767.JavaMail.zimbra@redhat.com>
Message-ID: <1151447816.2432147.1414664816470.JavaMail.zimbra@redhat.com>

Hi guys,

I wanted to share some user experience feedback with you. At university,
I had a lecture about NoSQL datastores and Infinispan was also mentioned.
The lecturer also showed some code examples. To my surprise, he used
Infinispan 6. So after the lecture I asked him why version 6, not 7, and
his answer was quite surprising. He told me that he got angry at the
Infinispan 7 documentation, because many code snippet examples were from
the old version 6 and he was basically unable to configure it in a
reasonable time. So he threw it away and switched back to Infinispan 6.

I just wanted to start a little discussion about this, because I think
this is quite a big issue. I noticed that part of this issue was fixed
just recently (18 hours ago, nice coincidence :)) by [1] (+10000
Gustavo), but there are still some out-of-date examples. The message I
want to convey is that we should pay attention to this (I know, boring)
stuff, because we're basically discouraging users and the community from
using the newest version. Every customer/user will start playing with the
community version, and if he's not able to set it up in a few moments, he
will move on to another product. And we don't want that, right? :)

I also applaud Tristan's effort with the step-by-step tutorial; that's
exactly what users want, and I would be happy to help in any way
(verifying, keeping it up to date, whatever).

Conclusion: let's pay more attention to documentation. It's the entry
point for every newcomer, and we want to make the best first impression
possible :)

Thanks,
Jirka

P.S.: I don't see the changes from [1] in the Infinispan User Guide [2],
am I missing something or will it appear there later?

[1] https://github.com/infinispan/infinispan/pull/3011/
[2] http://infinispan.org/docs/7.0.x/user_guide/user_guide.html

From bertrama at umich.edu Thu Oct 30 06:42:54 2014
From: bertrama at umich.edu (Albert Bertram)
Date: Thu, 30 Oct 2014 06:42:54 -0400
Subject: [infinispan-dev] PHP hot rod client
Message-ID:

Hi,

A couple of years ago there were a few messages on this list about a
potential PHP Hot Rod client. I haven't seen any further discussion of
it, but I find myself in the same situation described before: I want to
have a Drupal installation write cache data to Infinispan, and I'd prefer
it to go via the Hot Rod protocol rather than the memcached protocol.

I haven't seen any further evidence of a Hot Rod client native to PHP out
on the open web, so I wrote a small wrapper around the Hot Rod C++ client
which works for my purposes so far. The code is at
https://github.com/bertrama/php-hotrod

I wanted to send a note to the list to ask a couple of questions: Would
anyone else be interested in this PHP extension? Are there
client-oriented benchmarks I should run? I looked around for some, but
didn't find any. Specifically, I want to compare the performance of this
PHP Hot Rod client to the PHP memcached client when talking to the same
Infinispan server.

Thanks!

Albert Bertram

From gustavonalle at gmail.com Thu Oct 30 07:03:50 2014
From: gustavonalle at gmail.com (Gustavo Fernandes)
Date: Thu, 30 Oct 2014 11:03:50 +0000
Subject: [infinispan-dev] Infinispan 7 documentation
In-Reply-To: <1151447816.2432147.1414664816470.JavaMail.zimbra@redhat.com>
References: <105852793.2425953.1414663764767.JavaMail.zimbra@redhat.com>
	<1151447816.2432147.1414664816470.JavaMail.zimbra@redhat.com>
Message-ID:

> I noticed that part of this issue was fixed just recently (18 hours
> ago, nice coincidence :)) by [1] (+10000 Gustavo), but there are still
> some out-of-date examples.

Could you open a JIRA for it?

> P.S.: I don't see the changes from [1] in the Infinispan User Guide
> [2], am I missing something or will it appear there later?

It will appear very soon with the 7.0.0.Final release.

> [1] https://github.com/infinispan/infinispan/pull/3011/
> [2] http://infinispan.org/docs/7.0.x/user_guide/user_guide.html

From belaran at gmail.com Thu Oct 30 08:08:08 2014
From: belaran at gmail.com (Romain Pelisse)
Date: Thu, 30 Oct 2014 13:08:08 +0100
Subject: [infinispan-dev] Taking over Mongo DB Cache Store ?
In-Reply-To:
References:
Message-ID:

Really? Nobody?

Ok, I therefore claim the mongodb extension for my own! AH AH AH
(a diabolical laugh, something like a James Bond villain would do)

More seriously, I'll go on with porting the code to the 7.x base and see
if I can increase the test coverage a bit. If anybody has feature
requests, or other wishes, please let me know (or assign the JIRA to me
if there is one).

On 29 October 2014 12:38, Romain Pelisse wrote:
> Hi all,
>
> I've been looking into a piece of ISPN I could contribute to, and I
> think I could help support the MongoDB Cache Store [1].
> [...]

--
Romain PELISSE,
"The trouble with having an open mind, of course, is that people will
insist on coming along and trying to put things in it" -- Terry Pratchett
Belaran ins Prussia (blog) (... finally up and running !)
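For anyone wondering what the port to 7.x involves: the current store SPI
lives in org.infinispan.persistence.spi. A skeleton along the lines below
is roughly what the ported MongoDB store would implement; raw types are
used for brevity, the MongoDB calls are left as stubs, and the exact
method signatures should be checked against the 7.x javadoc.

    import org.infinispan.marshall.core.MarshalledEntry;
    import org.infinispan.persistence.spi.CacheLoader;
    import org.infinispan.persistence.spi.CacheWriter;
    import org.infinispan.persistence.spi.InitializationContext;

    public class MongoDBStoreSkeleton implements CacheLoader, CacheWriter {

        private InitializationContext ctx;

        @Override
        public void init(InitializationContext ctx) {
            // Store configuration and marshaller are available here
            this.ctx = ctx;
        }

        @Override
        public void start() {
            // Open the MongoDB connection (stub)
        }

        @Override
        public void stop() {
            // Close the MongoDB connection (stub)
        }

        @Override
        public MarshalledEntry load(Object key) {
            // Look the key up in MongoDB and rebuild an entry via
            // ctx.getMarshalledEntryFactory() (stub)
            return null;
        }

        @Override
        public boolean contains(Object key) {
            return load(key) != null;
        }

        @Override
        public void write(MarshalledEntry entry) {
            // Map entry.getKey()/getValue() to a MongoDB document (stub)
        }

        @Override
        public boolean delete(Object key) {
            // Remove the document; return whether it existed (stub)
            return false;
        }
    }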
From ttarrant at redhat.com Thu Oct 30 08:18:08 2014
From: ttarrant at redhat.com (Tristan Tarrant)
Date: Thu, 30 Oct 2014 13:18:08 +0100
Subject: [infinispan-dev] Taking over Mongo DB Cache Store ?
In-Reply-To:
References:
Message-ID: <54522C80.1040608@redhat.com>

Hi Romain,

you just reminded me that I wanted to reply, but I got sidetracked.
Ideally I would also like to bring the Cassandra cachestore back from the
dead, since there has been interest in that direction.

Tristan

On 30/10/14 13:08, Romain Pelisse wrote:
> Really? Nobody?
>
> Ok, I therefore claim the mongodb extension for my own! AH AH AH
> (a diabolical laugh, something like a James Bond villain would do)
>
> More seriously, I'll go on with porting the code to the 7.x base and
> see if I can increase the test coverage a bit.
> [...]
From belaran at gmail.com Thu Oct 30 08:29:08 2014
From: belaran at gmail.com (Romain Pelisse)
Date: Thu, 30 Oct 2014 13:29:08 +0100
Subject: [infinispan-dev] Taking over Mongo DB Cache Store ?
In-Reply-To: <54522C80.1040608@redhat.com>
References: <54522C80.1040608@redhat.com>
Message-ID:

Hi Tristan,

Well, my first idea was to look into Cassandra, but the thing is, I have
zero knowledge of Cassandra (while I have like 0.3 % knowledge of
MongoDB). But maybe once I've maintained the MongoDB CacheStore for a bit
I'll take a stab at the Cassandra one.

On 30 October 2014 13:18, Tristan Tarrant wrote:
> Hi Romain,
>
> you just reminded me that I wanted to reply, but I got sidetracked.
> Ideally I would also like to bring the Cassandra cachestore back from
> the dead, since there has been interest in that direction.
>
> Tristan
> [...]

--
Romain PELISSE,
"The trouble with having an open mind, of course, is that people will
insist on coming along and trying to put things in it" -- Terry Pratchett
Belaran ins Prussia (blog) (... finally up and running !)

From sebastian.laskawiec at gmail.com Thu Oct 30 13:49:02 2014
From: sebastian.laskawiec at gmail.com (Sebastian Łaskawiec)
Date: Thu, 30 Oct 2014 18:49:02 +0100
Subject: [infinispan-dev] Infinispan tutorial
In-Reply-To: <5450EFF4.6050103@redhat.com>
References: <5450EFF4.6050103@redhat.com>
Message-ID:

Hi Tristan!

I really like this idea! Recently I've been studying the Developer
Materials for EAP and I noticed that the page layout is pretty much the
same as Angular's (CDI example: [1] - navigation on the left and the
topic in the top bar).

I'm just thinking - maybe we could place this tutorial in the JBoss Data
Grid Quickstarts section [2]? It seems a perfect place for this kind of
material.

Best regards,
Sebastian

[1] http://www.jboss.org/quickstarts/eap/payment-cdi-event/index.html
[2] http://www.jboss.org/quickstarts/datagrid/

2014-10-29 14:47 GMT+01:00 Tristan Tarrant:
> Hi guys,
>
> I've been working on how to spruce up our website, docs and code
> samples. While quickstarts are ok, they come as monolithic blobs which
> tell you nothing about how you got there.
> [...]
--
Sebastian Łaskawiec