From bban at redhat.com Sat Mar 1 04:29:53 2014
From: bban at redhat.com (Bela Ban)
Date: Sat, 01 Mar 2014 10:29:53 +0100
Subject: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions
In-Reply-To: <1393611384264-4028925.post@n3.nabble.com>
References: <1393611384264-4028925.post@n3.nabble.com>
Message-ID: <5311A891.5090306@redhat.com>

Hi Ben,

why don't you post an edited version of my private replies to you to this topic as well, so we have some background?

In a nutshell, you want to use a zero-copy transport between processes that run on the *same physical box* (e.g. shipping only pointers to native shared memory between processes). Currently, using TCP or UDP between processes on the same box still requires 1 or 2 copies, even when a loopback device is used.

I'm interested in adding such a transport in JGroups 4, in which I plan to revamp the transport to adopt an NIO based scheme, accommodating both UDP and TCP. This is all still in the planning phase, but one feature will be to have multiple transports running in the same stack and sending messages alternatively via different transports. E.g. multicasts would use UDP whereas unicasts would use TCP (by default), but this could be overridden per message (with flags).

If we then had 5 physical boxes, with 20 processes on each box, for a total of 100 nodes, then we could configure the stacks to run both SHR_MEM and UDP: a group-wide message (a multicast) would be sent via SHR_MEM *and* UDP. The SHR_MEM transport would disseminate the message to all 20 processes on the same physical box, using shared memory. The UDP transport would be configured as non-loopback (IP_MULTICAST_LOOP=false), which means that the message would be multicast to the other 4 physical boxes, but the local multicast would be dropped. The other boxes would then use SHR_MEM to disseminate the message locally to all 20 processes.

Just an idea atm; this could also be done via RELAY2, but the QoS would not be the same.
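[Editor's note] The IP_MULTICAST_LOOP behaviour Bela describes is a standard socket option, not JGroups-specific machinery. A minimal JDK-only sketch (this is not JGroups code; the real UDP transport would apply the option to its own multicast socket):

```java
import java.net.StandardProtocolFamily;
import java.net.StandardSocketOptions;
import java.nio.channels.DatagramChannel;

public class LoopbackDemo {
    public static void main(String[] args) throws Exception {
        try (DatagramChannel ch = DatagramChannel.open(StandardProtocolFamily.INET)) {
            // With IP_MULTICAST_LOOP disabled, datagrams we multicast are not
            // looped back to receivers on this host -- so the local group would
            // see the message only via the SHR_MEM transport, as described above.
            ch.setOption(StandardSocketOptions.IP_MULTICAST_LOOP, false);
            System.out.println(ch.getOption(StandardSocketOptions.IP_MULTICAST_LOOP)); // prints false
        }
    }
}
```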
I'm planning on releasing 3.5 in 6-8 weeks from now. This includes a community baking phase during which I'll be working on a deep-dive course on JGroups. So a *very tentative* schedule is to start on 4.0 at the beginning of summer.

On 28/02/14 19:16, cotton-ben wrote:
> Hi Mircea, Manik, Bela, et al.
>
> I want to more publicly muse on this SUBJ line. Here now, then maybe in the ISPN /user/ forum, then maybe JSR-347 provider wide. I know we had a semi-private (Bela led) exchange, but I want to be more public with this conversation.
>
> Long post again, sorry.
>
> This is just an open musing. I realize this musing should not expect to be accommodated by any "oh, we've got to do this in ISPN/JGRPs now!" response ... there is absolutely only the most infrequent use-case that would /today/ be served by addressing this musing ... but tomorrow that /will/ be a different story.
>
> Questions:
>
> Does the concept of ISPN/JGRPs transport between "Cluster" nodes currently depend on OSI transport layer sockets' participation(s)?
>
> In other words, if all the nodes on my "Cluster" have locality=127.0.0.1, is ISPN/JGRPs accommodating enough to use a native OS IPC choice as an intra-node transport?
>
> Or, is it true that my transport choices are always limited to just {TCP,UDP} -- independent of the participating nodes' locality (and that I am thus forced to go over an OSI loopback)?
>
> If my transport choices are only limited to {TCP,UDP} for all node locality, then I might ask that you consider additional upcoming modern Java transport options.
>
> With the ambitions of upcoming OpenJDK JEPs, that will make mainstream an API capability that today is only available via sun.misc.Unsafe, Java will soon have "more complete" transport options that will include all of
>
> { TCP, UDP, RDMA/SDP, IPC }
>
> Some examples of upcoming accommodating providers:
>
> 1. RDMA/SDP: via Infiniband VERBS (works today in JDK 7 on OSI physical layer IB NICs, does not work over Ethernet)
> 2. IPC: via OpenHFT's SHM as IPC solution (will work this year)
>
> Again, I realize that these transport choices are useful today only in a very rare use case. However, should these transports be in your offering to ISPN/JGRPs customers, then ISPN/JGRPs becomes -- like all of Java has become in recent years -- increasingly more attractive to /all/ HPC Linux supercomputing use cases (not just ours).
>
> --
> View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925.html
> Sent from the Infinispan Developer List mailing list archive at Nabble.com.
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Bela Ban, JGroups lead (http://www.jgroups.org)

From ben.cotton at ALUMNI.RUTGERS.EDU Sat Mar 1 05:22:04 2014
From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben)
Date: Sat, 1 Mar 2014 02:22:04 -0800 (PST)
Subject: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions
In-Reply-To: <5311A891.5090306@redhat.com>
References: <1393611384264-4028925.post@n3.nabble.com> <5311A891.5090306@redhat.com>
Message-ID: <013b01cf3537$fd62d5a0$f82880e0$@alumni.rutgers.edu>

> In a nutshell, you want to use a zero-copy transport between processes that run on the *same physical box* (e.g. shipping only pointers to native shared memory between processes).

This is exactly what we want. Without this capability our beloved RedHat ISPN/JGRPS stack pays no homage to our architected locality premiums.

Remember when EJB made the fatal mistake of forcing remoteness on its API's end-user? Just plain LAZY these EJB designers! How dare they? They ended up with the grossly inelegant hack of providing a localremote() interface.
OBSCENE. When you hack up something to the point of being forced to cattle-prod the API with something called "localremote()" (a literal oxymoron in your API. Are you kidding me?) you just have to laugh and say "damn, that is BROKE!". It prejudiced Java in the eyes of the HPC community. For years.

Now Bela, we totally respect that you won't ever put C++/JNI into JGRPs -- even to accommodate our architected supercomputing locality premiums. We get that. But now, Bela! But now! We have a 100% Java solution to take us where we want to go -- an IPC transport over /dev/shm -- without a single line of C++ nor JNI. It is beautiful. It solves a problem of ours.

By doing this soon in JGRPs, Bela, you can realize two beautiful outcomes:

1. JGRPs remains magnifique: 100% Pure Java, not a single JNI bridge to C++ to native kernel system calls (we agree that is unattractive)
2. You liberate the JGRPs end-user from having remoteness (which OSI loopback is) forced down their throats

> I'm interested in adding such a transport in JGroups 4,

This thrills us!

> in which I plan to revamp the transport to adopt an NIO based scheme

I may be mistaken, but I think the OpenHFT solution for using SHM as an IPC transport has big advantages over using the NIO bridges to Off-Heap capabilities. Doesn't NIO's DirectByteBuffer have a capacity delinquency that will immediately betray big data views? I'm not sure about this; I'll get back to you (publicly, here).

> a group-wide message (a multicast) would be sent via SHR_MEM *and* UDP.

Perfect.

> why don't you post an edited version of my private replies to you to this topic as well, so we have some background?

Very good idea. Will do.

The bottom line is this: let's fix this. Let's work together with OpenHFT to find the best "locality is a premium!" solution (I don't think it is NIO, but I might be wrong). After we fix this, let's demand the JGRID world fix it.
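[Editor's note] The "capacity delinquency" Ben asks about is presumably ByteBuffer's int-based capacity: a single direct or mapped buffer tops out at Integer.MAX_VALUE bytes (~2 GB), which matters for big data views. The /dev/shm idea itself needs nothing exotic; a hedged, JDK-only sketch of memory-mapped IPC (a temp file stands in for /dev/shm so it runs anywhere; this is not the OpenHFT implementation):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ShmIpcSketch {
    public static void main(String[] args) throws Exception {
        // On Linux, a file created under /dev/shm lives in RAM; two processes
        // mapping the same file can exchange data without going through the
        // kernel's socket copy path.
        File f = File.createTempFile("shm-demo", ".dat"); // stand-in for /dev/shm/...
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer producer = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            MappedByteBuffer consumer = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
            producer.putLong(0, 42L);                // "writer" process side
            System.out.println(consumer.getLong(0)); // "reader" side sees 42
        }
        // Caveat: FileChannel.map takes a long size but each MappedByteBuffer
        // is still capped at Integer.MAX_VALUE bytes, so "big data" views need
        // multiple buffers or direct native-memory management.
    }
}
```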
We can go to 347 (BTW, OpenHFT's Peter Lawrey is now being seated on the 347 EG) and specify that providing a transport that accommodates locality is required to be JGRID standard.

Let's make it better. :-)

Thank you Bela (and RedHat).
--
View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925p4028929.html
Sent from the Infinispan Developer List mailing list archive at Nabble.com.

From bban at redhat.com Mon Mar 3 07:36:53 2014
From: bban at redhat.com (Bela Ban)
Date: Mon, 03 Mar 2014 13:36:53 +0100
Subject: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions
In-Reply-To: <013b01cf3537$fd62d5a0$f82880e0$@alumni.rutgers.edu>
References: <1393611384264-4028925.post@n3.nabble.com> <5311A891.5090306@redhat.com> <013b01cf3537$fd62d5a0$f82880e0$@alumni.rutgers.edu>
Message-ID: <53147765.6000505@redhat.com>

On 01/03/14 11:22, cotton-ben wrote:
>> in which I plan to revamp the transport to adopt an NIO based scheme
>
> I may be mistaken, but I think the OpenHFT solution for using SHM as an IPC transport has big advantages over using the NIO bridges to Off-Heap capabilities.

NIO.2 will be used for UDP and TCP only; I'm not talking about the shmem transport.
--
Bela Ban, JGroups lead (http://www.jgroups.org)

From ben.cotton at ALUMNI.RUTGERS.EDU Mon Mar 3 10:46:35 2014
From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben)
Date: Mon, 3 Mar 2014 07:46:35 -0800 (PST)
Subject: [infinispan-dev] Infinispan embedded off-heap cache
In-Reply-To: <1392670937953-4028836.post@n3.nabble.com>
References: <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com>
Message-ID: <1393861594878-4028931.post@n3.nabble.com>

Quick update: it is my understanding that Peter Lawrey will make available an OpenHFT HC Alpha Release in Maven Central next weekend. At that time, Dmitry Gordeev and I will take the OpenHFT dependency tag and proceed to build a branch of Red Hat's ISPN 7 that will treat net.openhft.collections.SharedHashMap as a Red Hat Infinispan 7 default impl of a fully JSR-107 interoperable off-heap javax.cache.Cache ...

A diagram of this build effort can be found here: https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as-JCACHE-Impl.jpg ... The Red Hat view of this effort will be tracked here: https://issues.jboss.org/browse/ISPN-871 ... The code that defines the impl will be here: https://github.com/Cotton-Ben/infinispan ...

--
View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028931.html
Sent from the Infinispan Developer List mailing list archive at Nabble.com.
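[Editor's note] Background on the adaptation Ben describes: SharedHashMap exposes, as far as I recall, a java.util.concurrent.ConcurrentMap view, so a JCache-facing impl can mostly delegate to it. A stripped-down, hedged sketch: ConcurrentHashMap stands in for the off-heap net.openhft.collections.SharedHashMap, and a made-up MiniCache interface stands in for javax.cache.Cache (none of this is the actual ISPN-871 code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal stand-in for the javax.cache.Cache operations being adapted.
interface MiniCache<K, V> {
    V get(K key);
    void put(K key, V value);
    boolean remove(K key);
}

// Sketch of the adapter: since the backing map is a ConcurrentMap, the
// cache impl is mostly delegation. In the real build the backing map would
// be the off-heap SharedHashMap, not an on-heap ConcurrentHashMap.
class SharedHashMapCache<K, V> implements MiniCache<K, V> {
    private final ConcurrentMap<K, V> backing;

    SharedHashMapCache(ConcurrentMap<K, V> backing) { this.backing = backing; }

    public V get(K key) { return backing.get(key); }
    public void put(K key, V value) { backing.put(key, value); }
    public boolean remove(K key) { return backing.remove(key) != null; }
}

public class AdapterDemo {
    public static void main(String[] args) {
        MiniCache<String, Long> cache = new SharedHashMapCache<>(new ConcurrentHashMap<>());
        cache.put("account-1", 100L);
        System.out.println(cache.get("account-1")); // prints 100
    }
}
```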
From ben.cotton at ALUMNI.RUTGERS.EDU Mon Mar 3 16:04:29 2014
From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben)
Date: Mon, 3 Mar 2014 13:04:29 -0800 (PST)
Subject: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions
In-Reply-To: <53147765.6000505@redhat.com>
References: <1393611384264-4028925.post@n3.nabble.com> <5311A891.5090306@redhat.com> <013b01cf3537$fd62d5a0$f82880e0$@alumni.rutgers.edu> <53147765.6000505@redhat.com>
Message-ID: <1393880669040-4028933.post@n3.nabble.com>

> /I may be mistaken, but I think the OpenHFT solution for using SHM as an IPC transport has big advantages over using the NIO bridges to Off-Heap capabilities./
>
> NIO.2 will be used for UDP and TCP only, I'm not talking about the shmem transport.

*Perfect.*

--
View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925p4028933.html
Sent from the Infinispan Developer List mailing list archive at Nabble.com.

From pedro at infinispan.org Tue Mar 4 06:14:25 2014
From: pedro at infinispan.org (Pedro Ruivo)
Date: Tue, 04 Mar 2014 11:14:25 +0000
Subject: [infinispan-dev] Cross-Site replication with local caches?
Message-ID: <5315B591.1000008@infinispan.org>

Hi,

simple question today: do we support cross-site replication for local caches (i.e. not clustered)?

Cheers,
Pedro

From mmarkus at redhat.com Tue Mar 4 09:53:17 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Tue, 4 Mar 2014 14:53:17 +0000
Subject: [infinispan-dev] Cross-Site replication with local caches?
In-Reply-To: <5315B591.1000008@infinispan.org>
References: <5315B591.1000008@infinispan.org>
Message-ID: <705A2FA6-9E99-4F87-BD79-89C401D54AF6@redhat.com>

I don't see a reason not to. Lower prio imo.

On Mar 4, 2014, at 11:14 AM, Pedro Ruivo wrote:
> Hi,
>
> simple question today: do we support cross-site replication for local caches (i.e. not clustered)?
>
> Cheers,
> Pedro
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From emmanuel at hibernate.org Tue Mar 4 13:02:04 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Tue, 4 Mar 2014 19:02:04 +0100
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To:
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
Message-ID:

On 28 Feb 2014, at 22:14, Mircea Markus wrote:

> On Feb 26, 2014, at 5:14 PM, Emmanuel Bernard wrote:
>
>> On 25 Feb 2014, at 16:08, Mircea Markus wrote:
>>
>>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote:
>>>
>>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
>>>>>
>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>>>>>
>>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>>>>
>>>>> Curious to hear the whole story :-)
>>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod).
>>>>
>>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>>> >>> People are going to use infinispan with one cache per entity, because it makes sense: >>> - different config (repl/dist | persistent/non-persistent) for different data types >>> - have map/reduce tasks running only the Person entires not on Dog as well, when you want to select (Person) where age > 18 >>> I don't see a reason to forbid this, on the contrary. The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well. >>> >> >> I do disagree on your assessment. >> I did write a whole essay on why I think your view is problematic - I was getting tired of repeating myself ;P >> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity > > Thanks for writing this up, it is a good taxonomy of data storage schemes and querying. > >> >> To anecdotally answer your specific example, yes different configs for different entities is an interesting benefit but it has to outweigh the drawbacks. > > Using a single cache for all the types is practical at all :-) Just to expand my idea, people prefer using different caches for many reasons: > - security: Account cache has a different security requirements than the News cache > - data consistency: News is a non-transactional cache, Account require pessimistic XA transactions > - expiry: expire last year's news from the system. Not the same for Accounts > - availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache > - logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though. This kind of reasons reminds me in the RDBMS world of why people use different databases. In fact, I have had experience where literally News was a different database than Accounts. 
But again, in this model, in one database you have many tables.

>> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.
>
> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-fold:
> - performance: you iterate over data that is not related to your query.

If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them?

> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.

Well, it's called type safety; some people find it good ;)
By the way, OGM does abstract a class from its representation in the datastore (including its name). But that's another story ;)

>> I think that Dogs and any domestic animal are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross cache M/R is not doable today AFAIK, and even if it were, it's still M/R with all its drawbacks.
>> To me, the Cache API and Hot Rod are well suited for what I call a self contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache.
>
> I see where you come from but I don't think requiring people to use a single cache for all the entities is an option. Besides a natural logical separation, different data has different storage requirements: security, access patterns, consistency, durability, availability etc.
> For most of the non-trivial use cases, using a single cache just won't do.

Let me rephrase and sum up my position.
If you are storing unrelated data, use different caches if you want, that's fine.
If you are storing related data, store it as one root entity and embeddable objects (i.e. one cache entry for the whole graph); you can have one root entity per cache, that's fine.
If you are storing related entities and want to do queries on them: you are more or less screwed today with Infinispan and need a higher level abstraction.
So _recommending_ one entity = one cache to me is wrong. It's more one entity graph = one cache, which is vastly different and has deep consequences (see my wiki page).

>> One cache per entity does make sense for APIs that do support what I call connected entities. Hibernate OGM specifically.
>
> OGM does a great job covering this, but it is very specific: Java only and OOP - our C/S mode, hotrod specifically, is language independent and not OOP.

Sure. Outside of Java, consider that connected objects are not a supported feature of Infinispan. When I say not supported, I mean:
- no inter object query
- no automatically handled relation between objects

> Also I would like to comment on the following statements:
> "I believe a cache API and Hot Rod are well suited to address up to the self contained object graph use case with a couple of relations maintained manually by the application but that cannot be queried. For the connected entities use case, only a high level paradigm is suited like JPA."
>
> I don't think storing object graphs should be under scrutiny here: Infinispan C/S mode (and that's where most of the client focus is BTW) has a schema (protobuf) that does not support object graphs. I also think expecting people to use multiple caches for multiple data types is a solid assumption to start from. And here's me speculating: these data types have logical relations between them so people will ask for querying.
> In order to run queries on multiple data types, you can either merge them together (your suggestion) or support some sort of new cross-cache indexing/querying/api. x-cache querying is more flexible and less restraining than merging data, but from what I understand from you it has certain implementation challenges. There's no pressure to take a decision now around supporting queries spreading multiple caches - just something to keep an eye on when dealing with use cases/users. ATM merging data is the only solution available, let's wait and see if people ask for more.

That's fine I guess. My problem is that nowhere in your documentation do you guys list what can and cannot be done in that regard. And when you call a data structure an entity, it comes with implicit baggage: relations between entities, polymorphism etc. So this needs to be clarified.

Now circling back to the main topic. Doing cross-cache queries on non related data (as they live in different caches) might be useful in some situations but is generally not interesting, as we specifically don't support joins (they are not related).

So back to the root of that discussion now: what was the use case(s) that led us to believe we need cross-cache query support?

Emmanuel

From sanne at infinispan.org Tue Mar 4 14:39:44 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 4 Mar 2014 19:39:44 +0000
Subject: [infinispan-dev] Remove command -> OutdatedTopologyException
Message-ID:

Hi all,
while running some tests today, I found many logs at ERROR level about org.infinispan.statetransfer.OutdatedTopologyException, all caused by Remove commands. It's not clear to me if these were all just logged, or if we're expecting these exceptions to be handled by the client..
either case I think this is an implementation detail which should be handled automatically and not classified as a critical error.

Sanne

From sanne at infinispan.org Tue Mar 4 17:03:12 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 4 Mar 2014 22:03:12 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To:
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
Message-ID:

My impression is that this thread is boiling down to an identity problem of the project. It is a key/value store with some properties which make it a great fit for a Cache. I don't think there are doubts that relational queries are useful for resolving a certain category of problems, and no doubt someone will like to do that on Infinispan, but it should be stated clearly that this is beyond the scope of the project: we're not in the business of relational queries, because the fundamental building blocks are not fit for it. So this should be a clear no-go, as it should be clear for many other features which will be requested over time: just because we could implement some part doesn't make the project any better.

That said, it is possible for us to provide some additional features which are a good fit to build on top of these fundamental building blocks, and also happen to make the fundamental API much more useful. I'm referring to the simple queries we have today, the various remoting protocols, management extensions, better eviction policies, ...
So I'm not questioning many of the nice features we have today, but it's a dangerous game if the limitations of what it can or can not do start to blur. To make an example, it was four years ago that I suggested that Infinispan should not expose the full Map interface without actually implementing it in its full glory, and people regularly bang their head against it. The limitations are clear in the javadoc, but still the type is compatible with Map and so sometimes the client wasn't written with Infinispan in mind: just an example, but it's dangerous to expose a contract which we can't maintain in all its intricate complex forms. To quote Emmanuel: "So back to the root of that discussion now, what was the use case(s) that lead to believe we need cross-cache query support?" The intent I had in my original email was having several bullet points, and none are related to use cases but rather to internal efficiency concerns regarding sharing Search services, bootstrap time, configuration simplifications. The configuration one used to be quite painful but that improved recently, and for most of these points we can find alternative or even better solutions. I even have additional reasons to not do it: as previously discussed, Map/Reduce tasks should be able to output results to a different cache. That's reasonable as you need to "park" a potentially large output somewhere, but also doesn't imply cross-cache M/R search, which would be a requirement for this feature to be consistent. If any updates on different Caches - and therefore its content - should be considered highly unrelated: might be using a different distribution, different cachestore, a different Transaction. If any kind of soft-join was "implied" across caches, we would never be able to maintain consistency, a causality order (in future), a consistent backup/snapshot, etc.. a nightmare for users and for us to firefight "bugs". 
So my conclusion is that it's dangerous, confusing, and also complex to achieve: we won't implement it. Sanne On 4 March 2014 18:02, Emmanuel Bernard wrote: > > On 28 Feb 2014, at 22:14, Mircea Markus wrote: > > > On Feb 26, 2014, at 5:14 PM, Emmanuel Bernard > wrote: > > > On 25 Feb 2014, at 16:08, Mircea Markus wrote: > > > On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard > wrote: > > On 24 f?vr. 2014, at 17:39, Mircea Markus wrote: > > > On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard > wrote: > > By the way, Mircea, Sanne and I had quite a long discussion about this one > and the idea of one cache per entity. It turns out that the right (as in > easy) solution does involve a higher level programming model like OGM > provides. You can simulate it yourself using the Infinispan APIs but it is > just cumbersome. > > > Curious to hear the whole story :-) > We cannot mandate all the suers to use OGM though, one of the reasons being > OGM is not platform independent (hotrod). > > > Then solve all the issues I have raised with a magic wand and come back to > me when you have done it, I'm interested. > > > People are going to use infinispan with one cache per entity, because it > makes sense: > - different config (repl/dist | persistent/non-persistent) for different > data types > - have map/reduce tasks running only the Person entires not on Dog as well, > when you want to select (Person) where age > 18 > I don't see a reason to forbid this, on the contrary. The way I see it the > relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a > better abstraction and should be recommended as such for the Java clients, > but ultimately we're a general purpose storage engine that is available to > different platforms as well. > > > I do disagree on your assessment. 
> I did write a whole essay on why I think your view is problematic - I was > getting tired of repeating myself ;P > https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity > > > Thanks for writing this up, it is a good taxonomy of data storage schemes > and querying. > > > To anecdotally answer your specific example, yes different configs for > different entities is an interesting benefit but it has to outweigh the > drawbacks. > > > Using a single cache for all the types is not practical at all :-) Just to > expand my idea, people prefer using different caches for many reasons: > - security: the Account cache has different security requirements than the > News cache > - data consistency: News is a non-transactional cache, Accounts require > pessimistic XA transactions > - expiry: expire last year's news from the system. Not the same for Accounts > - availability: I want the Accounts cache to be backed up to another site. I > don't want that for the News cache > - logical data grouping: mixing Accounts with News doesn't make sense. I > might want to know which account appeared in the news, though. > > > This kind of reasoning reminds me in the RDBMS world of why people use > different databases. > In fact, I have had experience where literally News was a different database > than Accounts. > > But again in this model, in one database, you have many tables. > > > If you have to do a map reduce for tasks as simple as age > 18, I think your > system had better be prepared to run gazillions of M/R jobs. > > > I want to run a simple M/R job in the evening to determine who turns 18 > tomorrow, to congratulate them. Once a day, not gazillions of times, and I > don't need to index the age field just for that. Also when it comes to > Map/Reduce, the drawback of holding all the data in a single cache is > twofold: > - performance: you iterate over the data that is not related to your query. 
> > > If the data are never related (query wise), then we are in the database > split category. Which is fine. But if some of your queries are related, what > do you do? Deny the user the ability to do them? > > - programming model: the Map/Reduce implementation has a dependency on both > Dog and Person. If I add Cats to the cache, I'll need to update the M/R code > to be aware of that as well. Same if I rename/remove Dog. Not nice. > > > Well it's called type safety, some people find it good ;) > By the way, OGM does abstract a class from its representation in the > datastore (including its name). But that's another story ;) > > > I think that Dogs, and any domestic animal, are fundamentally related to humans > - Person in your case. So queries involving both will be required - a cross-cache > M/R is not doable today AFAIK and even if it was, it's still M/R and > all its drawbacks. > To me, the Cache API and Hot Rod are well suited for what I call a self > contained object graph (i.e. where Dog would be an embedded object of Person > and not a separate Entity). In that situation, there is a single cache. > > > I see where you come from but I don't think requiring people to use a single > cache for all the entities is an option. Besides a natural logical > separation, different data has different storage requirements: security, > access patterns, consistency, durability, availability etc. For most of the > non-trivial use cases, using a single cache just won't do. > > > Let me rephrase and sum up my position. > If you are storing unrelated data, use different caches if you want, that's > fine. > If you are storing related data, store it as one root entity and embeddable > objects (i.e. one cache entry for the whole graph) > you can have one root entity per cache, that's fine. > If you are storing related entities and want to do queries on them: you are > more or less screwed today with Infinispan and need a higher level > abstraction. 
> > So _recommending_ one entity = one cache to me is wrong. It's more one > entity graph = one cache which is vastly different and has deep consequences > (see my wiki page). > > > One cache per entity does make sense for APIs that do support what I call > connected entities. Hibernate OGM specifically. > > > OGM does a great job covering this, but it is very specific: Java only and > OOP - our C/S mode, hotrod specifically, is language independent and not > OOP. > > > Sure. Outside of Java consider that connected objects are not a supported > feature of Infinispan. By not supported, I mean: > - no inter object query > - no automatically handled relation between objects > > Also I would like to comment on the following statements: > "I believe a cache API and Hot Rod are well suited to address up to the self > contained object graph use case with a couple of relations maintained > manually by the application but that cannot be queried. For the connected > entities use case, only a high level paradigm is suited like JPA." > > I don't think storing object graphs should be under scrutiny here: > Infinispan C/S mode (and that's where most of the client focus is BTW) has > a schema (protobuf) that does not support object graphs. I also think > expecting people to use multiple caches for multiple data types is a solid > assumption to start from. And here's me speculating: these data types have > logical relations between them so people will ask for querying. In order to > run queries on multiple data types, you can either merge them together (your > suggestion) or support some sort of new cross-cache indexing/querying/api. > x-cache querying is more flexible and less restraining than merging data, > but from what I understand from you it has certain implementation challenges. > There's no pressure to take a decision now around supporting queries > spreading multiple caches - just something to keep an eye on when dealing > with use cases/users. 
ATM merging data is the only solution available, let's > wait and see if people ask for more. > > > That's fine I guess. My problem is that nowhere in your documentation do > you guys list what can and cannot be done in that regard. And when you call > a data structure an entity, it comes with an implicit baggage: relation > between entities, polymorphism etc. > So this needs to be clarified. > > Now circling back to the main topic. Doing cross-cache query on non-related > data (as they are in different caches) might be useful in some situations > but is generally not interesting as we specifically don't support joins > (they are not related). > So back to the root of that discussion now, what was the use case(s) that > lead to believe we need cross-cache query support? > > Emmanuel > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ales.justin at gmail.com Wed Mar 5 06:21:13 2014 From: ales.justin at gmail.com (Ales Justin) Date: Wed, 5 Mar 2014 12:21:13 +0100 Subject: [infinispan-dev] grouping and GridFS Message-ID: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> Just having a discussion with Bela about this. I guess having "grouping" on GridFS' content would make sense. e.g. put all chunks on the same node Is this doable? Afaiu, we would need to have some sort of "similarity" function for content's metadata? -Ales From isavin at redhat.com Wed Mar 5 07:20:15 2014 From: isavin at redhat.com (Ion Savin) Date: Wed, 05 Mar 2014 14:20:15 +0200 Subject: [infinispan-dev] Infinispan HotRod C# Client 7.0.0.Alpha1 Message-ID: <5317167F.50204@redhat.com> Hi all, Infinispan HotRod C# Client 7.0.0.Alpha1 is now available. This new version is a C# wrapper over the native client and brings support for L2 and L3 client intelligence levels in addition to L1. 
As more features are added to the native client they will make their way into the C# client as well. You can find the .msi installer on the download page [1] and the source code on GitHub [2]. Please give it a try and let us know what you think. [1] http://infinispan.org/hotrod-clients/ [2] https://github.com/infinispan/dotnet-client Regards, Ion Savin From sanne at infinispan.org Wed Mar 5 07:31:52 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 5 Mar 2014 12:31:52 +0000 Subject: [infinispan-dev] grouping and GridFS In-Reply-To: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> References: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> Message-ID: Why do you chunk at all if you want them stored together? I only use chunking if I can't avoid it, to spread large files. On 5 Mar 2014 11:22, "Ales Justin" wrote: > Just having a discussion with Bela about this. > > I guess having "grouping" on GridFS' content would make sense. > e.g. put all chunks on the same node > > Is this doable? > Afaiu, we would need to have some sort of "similarity" function for > content's metadata? > > -Ales > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140305/33386150/attachment.html From mmarkus at redhat.com Wed Mar 5 09:48:51 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Mar 2014 14:48:51 +0000 Subject: [infinispan-dev] Infinispan HotRod C# Client 7.0.0.Alpha1 In-Reply-To: <5317167F.50204@redhat.com> References: <5317167F.50204@redhat.com> Message-ID: great job! On Mar 5, 2014, at 12:20 PM, Ion Savin wrote: > Hi all, > > Infinispan HotRod C# Client 7.0.0.Alpha1 is now available. 
> > This new version is a C# wrapper over the native client and brings > support for L2 and L3 client intelligence levels in addition to L1. As > more features are added to the native client they will make their way > into the C# client as well. > > You can find the the .msi installer on the download page [1] and the > source code on GitHub [2]. > > Please give it a try and let us know what you think. > > [1] http://infinispan.org/hotrod-clients/ > [2] https://github.com/infinispan/dotnet-client > > Regards, > Ion Savin > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ales.justin at gmail.com Wed Mar 5 09:54:09 2014 From: ales.justin at gmail.com (Ales Justin) Date: Wed, 5 Mar 2014 15:54:09 +0100 Subject: [infinispan-dev] grouping and GridFS In-Reply-To: References: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> Message-ID: <07D6F794-191A-45DA-B94C-A61356727576@gmail.com> > Why do you chunk at all if you want them stored together? > > I only use chunking if I can't avoid it, to spread large files. > That's what's GridFS all about -- store very large files. Hence chunking. So you're saying we should know the limit of what we can store on 1 node, if bigger, spread, therefore no grouping. -Ales > On 5 Mar 2014 11:22, "Ales Justin" wrote: > Just having a discussion with Bela about this. > > I guess having "grouping" on GridFS' content would make sense. > e.g. put all chunks on the same node > > Is this doable? > Afaiu, we would need to have some sort of "similarity" function for content's metadata? 
> > -Ales > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140305/cc27af64/attachment-0001.html From sanne at infinispan.org Wed Mar 5 10:01:49 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 5 Mar 2014 15:01:49 +0000 Subject: [infinispan-dev] grouping and GridFS In-Reply-To: <07D6F794-191A-45DA-B94C-A61356727576@gmail.com> References: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> <07D6F794-191A-45DA-B94C-A61356727576@gmail.com> Message-ID: On 5 March 2014 14:54, Ales Justin wrote: > Why do you chunk at all if you want them stored together? > > I only use chunking if I can't avoid it, to spread large files. > > That's what's GridFS all about -- store very large files. > Hence chunking. > > So you're saying we should know the limit of what we can store on 1 node, > if bigger, spread, therefore no grouping. Yes, but a very conservative approximation would be good enough: you don't need hardware specifications to figure out a reasonable threshold. If I had to make up a number out of thin air, I'd pick something around 10MB: any file below that threshold would not use chunking and be nicely stored together to be retrieved efficiently; beyond that start distributing. (this figure could probably use some testing if you're looking into performance) Sanne > > -Ales > > On 5 Mar 2014 11:22, "Ales Justin" wrote: >> >> Just having a discussion with Bela about this. >> >> I guess having "grouping" on GridFS' content would make sense. >> e.g. put all chunks on the same node >> >> Is this doable? 
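Sanne's threshold rule above - store small files whole, chunk only beyond a size limit - can be sketched as follows. All names and the 10MB/1MB figures are illustrative (as he says, the figure is made up out of thin air and would need testing):

```java
public class ChunkPolicy {
    // Files up to this size are stored whole (one key, retrieved in one hop);
    // anything larger is split so chunks can spread across nodes.
    // 10MB is Sanne's "out of thin air" figure - tune it by testing.
    static final long CHUNK_THRESHOLD = 10L * 1024 * 1024;
    static final int CHUNK_SIZE = 1024 * 1024; // 1MB chunks, arbitrary

    static int chunksFor(long fileSize) {
        if (fileSize <= CHUNK_THRESHOLD) {
            return 1; // store as a single entry, no chunking
        }
        // split into fixed-size chunks, rounding up
        return (int) ((fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(chunksFor(5L * 1024 * 1024));  // 1: below threshold
        System.out.println(chunksFor(64L * 1024 * 1024)); // 64: distributed
    }
}
```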
>> Afaiu, we would need to have some sort of "similarity" function for >> content's metadata? >> >> -Ales >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mudokonman at gmail.com Wed Mar 5 10:02:23 2014 From: mudokonman at gmail.com (William Burns) Date: Wed, 5 Mar 2014 10:02:23 -0500 Subject: [infinispan-dev] Cluster Listeners 7.0.0.Alpha1 Message-ID: Hello everyone, I am happy to announce that the latest Infinispan 7.0.0.Alpha1 build has the first pass of Cluster Listeners implemented. You can read all about the details at the blog post [1]. You can get the latest build of Infinispan from our site [2]. Try it out and let us know what you guys think. Happy listening! - Will [1] http://blog.infinispan.org/2014/03/embedded-cluster-listeners-in.html [2] http://infinispan.org/download/ From ales.justin at gmail.com Wed Mar 5 10:04:07 2014 From: ales.justin at gmail.com (Ales Justin) Date: Wed, 5 Mar 2014 16:04:07 +0100 Subject: [infinispan-dev] grouping and GridFS In-Reply-To: References: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> <07D6F794-191A-45DA-B94C-A61356727576@gmail.com> Message-ID: <88DE0A43-1D10-48B9-834A-6C99870F6A68@gmail.com> But yeah, the moment I start chunking, I would still like to have the grouped -- same node. Or that doesn't make sense? (hence having this discussion ;-) -Ales On 05 Mar 2014, at 16:01, Sanne Grinovero wrote: > On 5 March 2014 14:54, Ales Justin wrote: >> Why do you chunk at all if you want them stored together? 
>> >> I only use chunking if I can't avoid it, to spread large files. >> >> That's what's GridFS all about -- store very large files. >> Hence chunking. >> >> So you're saying we should know the limit of what we can store on 1 node, >> if bigger, spread, therefore no grouping. > > Yes, but a very conservative approximation would be good enough: you > don't need hardware specifications to figure out a reasonable > threshold. > If I had to make up a number out of thin air, I'd pick something > around 10MB: any file below that threshold would not use chunking and > be nicely stored together to be retrieved efficiently; beyond that > start distributing. > (this figure could probably use some testing if you're looking into performance) > > Sanne > >> >> -Ales >> >> On 5 Mar 2014 11:22, "Ales Justin" wrote: >>> >>> Just having a discussion with Bela about this. >>> >>> I guess having "grouping" on GridFS' content would make sense. >>> e.g. put all chunks on the same node >>> >>> Is this doable? >>> Afaiu, we would need to have some sort of "similarity" function for >>> content's metadata? 
>>> >>> -Ales >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From bban at redhat.com Wed Mar 5 10:26:40 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 05 Mar 2014 16:26:40 +0100 Subject: [infinispan-dev] Since it's almost Friday: the next code name for Infinispan Message-ID: <53174230.5030106@redhat.com> https://www.facebook.com/HumanFishBrewery -- Bela Ban, JGroups lead (http://www.jgroups.org) From vblagoje at redhat.com Wed Mar 5 10:31:26 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 05 Mar 2014 10:31:26 -0500 Subject: [infinispan-dev] Since it's almost Friday: the next code name for Infinispan In-Reply-To: <53174230.5030106@redhat.com> References: <53174230.5030106@redhat.com> Message-ID: <5317434E.2050407@redhat.com> +1 Are you in Slovenia? On 3/5/2014, 10:26 AM, Bela Ban wrote: > https://www.facebook.com/HumanFishBrewery From ales.justin at gmail.com Wed Mar 5 10:34:47 2014 From: ales.justin at gmail.com (Ales Justin) Date: Wed, 5 Mar 2014 16:34:47 +0100 Subject: [infinispan-dev] Since it's almost Friday: the next code name for Infinispan In-Reply-To: <5317434E.2050407@redhat.com> References: <53174230.5030106@redhat.com> <5317434E.2050407@redhat.com> Message-ID: > +1 Are you in Slovenia? No, just an IRC chat with me. 
:-) To add to this link: * http://en.wikipedia.org/wiki/Olm > On 3/5/2014, 10:26 AM, Bela Ban wrote: >> https://www.facebook.com/HumanFishBrewery -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140305/d185d4ef/attachment.html From bban at redhat.com Wed Mar 5 10:39:01 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 05 Mar 2014 16:39:01 +0100 Subject: [infinispan-dev] Since it's almost Friday: the next code name for Infinispan In-Reply-To: <5317434E.2050407@redhat.com> References: <53174230.5030106@redhat.com> <5317434E.2050407@redhat.com> Message-ID: <53174515.8030002@redhat.com> No, my good friend Ales sent me that link... On 05/03/14 16:31, Vladimir Blagojevic wrote: > +1 Are you in Slovenia? > On 3/5/2014, 10:26 AM, Bela Ban wrote: >> https://www.facebook.com/HumanFishBrewery > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From mmarkus at redhat.com Wed Mar 5 11:29:27 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Mar 2014 16:29:27 +0000 Subject: [infinispan-dev] grouping and GridFS In-Reply-To: <88DE0A43-1D10-48B9-834A-6C99870F6A68@gmail.com> References: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> <07D6F794-191A-45DA-B94C-A61356727576@gmail.com> <88DE0A43-1D10-48B9-834A-6C99870F6A68@gmail.com> Message-ID: On Mar 5, 2014, at 3:04 PM, Ales Justin wrote: > But yeah, the moment I start chunking, I would still like to have the grouped -- same node. > Or that doesn't make sense? > (hence having this discussion ;-) > > -Ales > > On 05 Mar 2014, at 16:01, Sanne Grinovero wrote: > >> On 5 March 2014 14:54, Ales Justin wrote: >>> Why do you chunk at all if you want them stored together? >>> >>> I only use chunking if I can't avoid it, to spread large files. 
>>> >>> That's what's GridFS all about -- store very large files. >>> Hence chunking. >>> >>> So you're saying we should know the limit of what we can store on 1 node, >>> if bigger, spread, therefore no grouping. >> >> Yes, but a very conservative approximation would be good enough: you >> don't need hardware specifications to figure out a reasonable >> threshold. >> If I had to make up a number out of thin air, I'd pick something >> around 10MB: any file below that threshold would not use chunking and >> be nicely stored together to be retrieved efficiently; beyond that >> start distributing. I don't think that if they are collocated, fetching all the segments to another node brings better performance. Might be quite the opposite actually, as having the segments distributed allows fetching them in parallel. >> (this figure could probably use some testing if you're looking into performance) >> >> Sanne >> >>> >>> -Ales >>> >>> On 5 Mar 2014 11:22, "Ales Justin" wrote: >>>> >>>> Just having a discussion with Bela about this. >>>> >>>> I guess having "grouping" on GridFS' content would make sense. >>>> e.g. put all chunks on the same node >>>> >>>> Is this doable? >>>> Afaiu, we would need to have some sort of "similarity" function for >>>> content's metadata? 
>>>> >>>> -Ales >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Wed Mar 5 11:50:48 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 5 Mar 2014 16:50:48 +0000 Subject: [infinispan-dev] grouping and GridFS In-Reply-To: References: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> <07D6F794-191A-45DA-B94C-A61356727576@gmail.com> <88DE0A43-1D10-48B9-834A-6C99870F6A68@gmail.com> Message-ID: On 5 March 2014 16:29, Mircea Markus wrote: > > On Mar 5, 2014, at 3:04 PM, Ales Justin wrote: > >> But yeah, the moment I start chunking, I would still like to have the grouped -- same node. >> Or that doesn't make sense? >> (hence having this discussion ;-) >> >> -Ales >> >> On 05 Mar 2014, at 16:01, Sanne Grinovero wrote: >> >>> On 5 March 2014 14:54, Ales Justin wrote: >>>> Why do you chunk at all if you want them stored together? >>>> >>>> I only use chunking if I can't avoid it, to spread large files. >>>> >>>> That's what's GridFS all about -- store very large files. >>>> Hence chunking. 
>>>> >>>> So you're saying we should know the limit of what we can store on 1 node, >>>> if bigger, spread, therefore no grouping. >>> >>> Yes, but a very conservative approximation would be good enough: you >>> don't need hardware specifications to figure out a reasonable >>> threshold. >>> If I had to make up a number out of thin air, I'd pick something >>> around 10MB: any file below that threshold would not use chunking and >>> be nicely stored together to be retrieved efficiently; beyond that >>> start distributing. > > I don't think that if they are collocated, fetching all the segments to another node brings better performance. Might be quite the opposite actually, as having the segments distributed allows fetching them in parallel. +1, although we don't do parallel fetching yet. My opinion came from an angle of better spreading the data among the nodes: multiple small segments are better than say 2 files of one terabyte each, which would blow up any single node. But this advice obviously depends on the application. If you know that you will have many files, and you want to use other locality tricks (like run an executor to process all content of a file), then you obviously would have an advantage of keeping them on the same node. In that case though I'd question usage of chunking altogether. Sanne > >>> (this figure could probably use some testing if you're looking into performance) >>> >>> Sanne >>> >>>> >>>> -Ales >>>> >>>> On 5 Mar 2014 11:22, "Ales Justin" wrote: >>>>> >>>>> Just having a discussion with Bela about this. >>>>> >>>>> I guess having "grouping" on GridFS' content would make sense. >>>>> e.g. put all chunks on the same node >>>>> >>>>> Is this doable? >>>>> Afaiu, we would need to have some sort of "similarity" function for >>>>> content's metadata? 
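The co-location idea debated above maps naturally onto Infinispan's grouping API (the Grouper interface / @Group annotation): if chunk keys encode which file they belong to, hashing on the group rather than the full key would send every chunk of a file to the same owners. The sketch below is a self-contained illustration of the idea only - the "path#index" key format and the trivial hash standing in for the consistent hash are made up:

```java
public class ChunkGrouping {
    // Hypothetical chunk-key format: "<file-path>#<chunk-index>".
    // Hashing on the group (the file path) instead of the raw key
    // sends every chunk of a file to the same owner node.
    static String groupOf(String chunkKey) {
        int sep = chunkKey.lastIndexOf('#');
        return sep < 0 ? chunkKey : chunkKey.substring(0, sep);
    }

    static int ownerNode(String chunkKey, int numNodes) {
        // trivial stand-in for a consistent hash over the group
        return Math.floorMod(groupOf(chunkKey).hashCode(), numNodes);
    }

    public static void main(String[] args) {
        int n0 = ownerNode("videos/a.bin#0", 5);
        int n7 = ownerNode("videos/a.bin#7", 5);
        System.out.println(n0 == n7); // true: chunks of one file co-locate
    }
}
```

Note that this recreates exactly the trade-off discussed in the thread: co-located chunks lose the spreading (and potential parallel-fetch) benefits that chunking exists to provide.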
>>>>> >>>>> -Ales >>>>> >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dereed at redhat.com Wed Mar 5 11:57:51 2014 From: dereed at redhat.com (Dennis Reed) Date: Wed, 05 Mar 2014 10:57:51 -0600 Subject: [infinispan-dev] grouping and GridFS In-Reply-To: <88DE0A43-1D10-48B9-834A-6C99870F6A68@gmail.com> References: <6146D41E-DC29-4729-85AC-BD10B29DDAEA@gmail.com> <07D6F794-191A-45DA-B94C-A61356727576@gmail.com> <88DE0A43-1D10-48B9-834A-6C99870F6A68@gmail.com> Message-ID: <5317578F.6070706@redhat.com> It doesn't make sense. :) The reason grid file systems exist is to distribute the file around the cluster. 
(both for performance so the network interface of a single server isn't a bottleneck, and for disk space so the available space on a single server isn't a bottleneck) If you don't want to distribute the file, a grid filesystem probably isn't the right choice. -Dennis On 03/05/2014 09:04 AM, Ales Justin wrote: > But yeah, the moment I start chunking, I would still like to have them grouped -- same node. > Or that doesn't make sense? > (hence having this discussion ;-) > > -Ales > > On 05 Mar 2014, at 16:01, Sanne Grinovero wrote: > >> On 5 March 2014 14:54, Ales Justin wrote: >>> Why do you chunk at all if you want them stored together? >>> >>> I only use chunking if I can't avoid it, to spread large files. >>> >>> That's what's GridFS all about -- store very large files. >>> Hence chunking. >>> >>> So you're saying we should know the limit of what we can store on 1 node, >>> if bigger, spread, therefore no grouping. >> Yes, but a very conservative approximation would be good enough: you >> don't need hardware specifications to figure out a reasonable >> threshold. >> If I had to make up a number out of thin air, I'd pick something >> around 10MB: any file below that threshold would not use chunking and >> be nicely stored together to be retrieved efficiently; beyond that >> start distributing. >> (this figure could probably use some testing if you're looking into performance) >> >> Sanne >> >>> -Ales >>> >>> On 5 Mar 2014 11:22, "Ales Justin" wrote: >>>> Just having a discussion with Bela about this. >>>> >>>> I guess having "grouping" on GridFS' content would make sense. >>>> e.g. put all chunks on the same node >>>> >>>> Is this doable? >>>> Afaiu, we would need to have some sort of "similarity" function for >>>> content's metadata? 
>>>> >>>> -Ales >>>> From mmarkus at redhat.com Wed Mar 5 12:00:26 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Mar 2014 17:00:26 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> Message-ID: On Mar 4, 2014, at 10:03 PM, Sanne Grinovero wrote: > My impression is that this thread is boiling down to an identity > problem of the project. > > It is a key/value store with some properties which make it a great fit > for a Cache; I don't think there are doubts that relational queries > are useful to have to resolve a certain category of problems, and no > doubt someone will like to do that on Infinispan, but it should be > stated clearly that this is beyond the scope of the project: we're not > in the business of relational queries, > because the fundamental building blocks are not fit for it. > So this should be a clear no-go, > as it should be clear for many other features which will be requested > over time: just because we could implement some part doesn't make the > project any better. > > That said, it is possible for us to provide some additional features > which are a good fit to build on top of these fundamental building > blocks, and also happen to make the fundamental API much more useful. > I'm referring to the simple queries we have today, the various > remoting protocols, management extensions, better eviction policies,.. 
> So I'm not questioning many of the nice features we have today, but > it's a dangerous game if the limitations of what it can or can not do > start to blur. > > To make an example, it was four years ago that I suggested that > Infinispan should not expose the full Map interface without actually > implementing it in its full glory, and people regularly bang their > head against it. The limitations are clear in the javadoc, but still > the type is compatible with Map and so sometimes the client wasn't > written with Infinispan in mind: just an example, but it's dangerous > to expose a contract which we can't maintain in all its intricate > complex forms. Implementing the Map interface was not a good idea and it created a lot of confusion indeed; We've also chosen the worst way of doing it, by only returning the local data for keySet/valueSet operations (that can be controlled by users through Flags) - but that's another discussion. > > To quote Emmanuel: "So back to the root of that discussion now, what > was the use case(s) that lead to believe we need cross-cache query > support?" > > The intent I had in my original email was having several bullet > points, and none are related to use cases but rather to internal > efficiency concerns regarding sharing Search services, bootstrap time, > configuration simplifications. The configuration one used to be quite > painful but that improved recently, and for most of these points we > can find alternative or even better solutions. above that, cross cache querying is a useful feature. > > I even have additional reasons to not do it: as previously discussed, > Map/Reduce tasks should be able to output results to a different > cache. That's reasonable as you need to "park" a potentially large > output somewhere, but also doesn't imply cross-cache M/R search, By "cross-cache M/R search" you mean implementing cross-cache querying by using Map/Reduce? That would have terrible performance. 
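The single-cache Map/Reduce drawbacks quoted in this thread (iterating data unrelated to the query, and the task depending on every type in the cache) can be sketched without Infinispan - plain collections stand in for caches, with the Person/Dog types from the discussion:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MixedCacheScan {
    record Person(String name, int age) {}
    record Dog(String name) {}

    // Map phase over a mixed cache: every entry is visited, and the
    // task must know about (and skip) each unrelated type it may meet.
    static List<String> adults(Map<String, Object> mixedCache) {
        List<String> out = new ArrayList<>();
        for (Object value : mixedCache.values()) {
            if (value instanceof Person p && p.age() > 18) { // type filter
                out.add(p.name());
            }
            // Dog entries (and any type added later) are scanned and
            // discarded: wasted iteration plus a hidden type dependency.
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("p1", new Person("Ana", 20));
        cache.put("p2", new Person("Bob", 15));
        cache.put("d1", new Dog("Rex"));
        System.out.println(adults(cache)); // [Ana]
    }
}
```

With one cache per type, the `instanceof` filter (and the wasted pass over Dog entries) disappears - which is Mircea's argument; the counter-argument is everything cross-cache consistency costs, as Sanne lays out above.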
> which would be a requirement for this feature to be consistent. Updates > on different Caches - and therefore their content - should be > considered highly unrelated: they might be using a different distribution, a > different cachestore, a different Transaction. If any kind of > soft-join was "implied" across caches, we would never be able to > maintain consistency, a causality order (in future), a consistent > backup/snapshot, etc. - a nightmare for users and for us to firefight > "bugs". > > So my conclusion is that it's dangerous, confusing, and also complex > to achieve: we won't implement it. I agree with your points on the limitations this feature would have. On top of that, I'm not aware of many users asking for this ATM, so it would not justify the effort. > > Sanne > > > On 4 March 2014 18:02, Emmanuel Bernard wrote: >> >> On 28 Feb 2014, at 22:14, Mircea Markus wrote: >> >> >> On Feb 26, 2014, at 5:14 PM, Emmanuel Bernard >> wrote: >> >> >> On 25 Feb 2014, at 16:08, Mircea Markus wrote: >> >> >> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard >> wrote: >> >> On 24 févr. 2014, at 17:39, Mircea Markus wrote: >> >> >> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard >> wrote: >> >> By the way, Mircea, Sanne and I had quite a long discussion about this one >> and the idea of one cache per entity. It turns out that the right (as in >> easy) solution does involve a higher level programming model like OGM >> provides. You can simulate it yourself using the Infinispan APIs but it is >> just cumbersome. >> >> >> Curious to hear the whole story :-) >> We cannot mandate all the users to use OGM though, one of the reasons being >> OGM is not platform independent (hotrod). >> >> >> Then solve all the issues I have raised with a magic wand and come back to >> me when you have done it, I'm interested.
>> >> People are going to use infinispan with one cache per entity, because it >> makes sense: >> - different config (repl/dist | persistent/non-persistent) for different >> data types >> - have map/reduce tasks running only on the Person entries, not on Dog as well, >> when you want to select (Person) where age > 18 >> I don't see a reason to forbid this, on the contrary. The way I see it, the >> relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a >> better abstraction and should be recommended as such for the Java clients, >> but ultimately we're a general purpose storage engine that is available to >> different platforms as well. >> >> >> I do disagree with your assessment. >> I did write a whole essay on why I think your view is problematic - I was >> getting tired of repeating myself ;P >> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity >> >> >> Thanks for writing this up, it is a good taxonomy of data storage schemes >> and querying. >> >> >> To anecdotally answer your specific example, yes different configs for >> different entities is an interesting benefit but it has to outweigh the >> drawbacks. >> >> >> Using a single cache for all the types is not practical at all :-) Just to >> expand my idea, people prefer using different caches for many reasons: >> - security: the Account cache has different security requirements than the >> News cache >> - data consistency: News is a non-transactional cache, Accounts require >> pessimistic XA transactions >> - expiry: expire last year's news from the system. Not the same for Accounts >> - availability: I want the Accounts cache to be backed up to another site. I >> don't want that for the News cache >> - logical data grouping: mixing Accounts with News doesn't make sense. I >> might want to know which account appeared in the news, though. >> >> >> This kind of reasoning reminds me in the RDBMS world of why people use >> different databases.
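[Editor's aside: the per-cache requirements Mircea lists above translate directly into per-cache configuration. A rough sketch, assuming Infinispan 6.x's programmatic configuration API; the cache names, lifespans, and backup requirement are illustrative, and the cross-site backup part is only hinted at in a comment.]

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;

public class PerEntityCaches {
    public static void main(String[] args) {
        DefaultCacheManager manager = new DefaultCacheManager();

        // News: non-transactional, entries simply expire after a day
        Configuration news = new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.DIST_SYNC)
            .expiration().lifespan(24 * 60 * 60 * 1000L)
            .build();

        // Accounts: pessimistic transactions, no expiry
        // (cross-site backup would be configured here too, via the sites() builder)
        Configuration accounts = new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.DIST_SYNC)
            .transaction().transactionMode(TransactionMode.TRANSACTIONAL)
                          .lockingMode(LockingMode.PESSIMISTIC)
            .build();

        manager.defineConfiguration("news", news);
        manager.defineConfiguration("accounts", accounts);
    }
}
```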
>> In fact, I have had experience where literally News was a different database >> than Accounts. >> >> But again in this model, in one database, you have many tables. >> >> >> If you have to do a map reduce for tasks as simple as age > 18, I think your >> system had better be prepared to run gazillions of M/R jobs. >> >> >> I want to run a simple M/R job in the evening to determine who turns 18 >> tomorrow, to congratulate them. Once a day, not gazillions of times, and I >> don't need to index the age field just for that. Also when it comes to >> Map/Reduce, the drawback of holding all the data in a single cache is >> two-fold: >> - performance: you iterate over the data that is not related to your query. >> >> >> If the data are never related (query wise), then we are in the database >> split category. Which is fine. But if some of your queries are related, what >> do you do? Deny the user the ability to do them? >> >> - programming model: the Map/Reduce implementation has a dependency on both >> Dog and Person. If I add Cats to the cache, I'll need to update the M/R code >> to be aware of that as well. Same if I rename/remove Dog. Not nice. >> >> >> Well it's called type safety, some people find it good ;) >> By the way, OGM does abstract a class from its representation in the >> datastore (including its name). But that's another story ;) >> >> >> I think that Dogs and any domestic animal are fundamentally related to humans >> - Person in your case. So queries involving both will be required - a cross-cache >> M/R is not doable today AFAIK and even if it was, it's still M/R and >> all its drawbacks. >> To me, the Cache API and Hot Rod are well suited for what I call self-contained >> object graphs (i.e. where Dog would be an embedded object of Person >> and not a separate Entity). In that situation, there is a single cache. >> >> >> I see where you come from but I don't think requiring people to use a single >> cache for all the entities is an option.
Besides a natural logical >> separation, different data has different storage requirements: security, >> access patterns, consistency, durability, availability, etc. For most of the >> non-trivial use cases, using a single cache just won't do. >> >> >> Let me rephrase and sum up my position. >> If you are storing unrelated data, use different caches if you want, that's >> fine. >> If you are storing related data, store it as one root entity and embeddable >> objects (i.e. one cache entry for the whole graph); >> you can have one root entity per cache, that's fine. >> If you are storing related entities and want to do queries on them: you are >> more or less screwed today with Infinispan and need a higher level >> abstraction. >> >> So _recommending_ one entity = one cache to me is wrong. It's more one >> entity graph = one cache, which is vastly different and has deep consequences >> (see my wiki page). >> >> >> One cache per entity does make sense for APIs that do support what I call >> connected entities. Hibernate OGM specifically. >> >> >> OGM does a great job covering this, but it is very specific: Java only and >> OOP - our C/S mode, hotrod specifically, is language independent and not >> OOP. >> >> >> Sure. Outside of Java consider that connected objects are not a supported >> feature of Infinispan. When I say not supported, I mean: >> - no inter-object query >> - no automatically handled relations between objects >> >> Also I would like to comment on the following statements: >> "I believe a cache API and Hot Rod are well suited to address up to the self-contained >> object graph use case with a couple of relations maintained >> manually by the application but that cannot be queried. For the connected >> entities use case, only a high level paradigm is suited like JPA."
>> >> I don't think storing object graphs should be under scrutiny here: >> Infinispan C/S mode (and that's where most of the client focus is, BTW) has >> a schema (protobuf) that does not support object graphs. I also think >> expecting people to use multiple caches for multiple data types is a solid >> assumption to start from. And here's me speculating: these data types have >> logical relations between them, so people will ask for querying. In order to >> query multiple data types, you can either merge them together (your >> suggestion) or support some sort of new cross-cache indexing/querying/API. >> x-cache querying is more flexible and less restraining than merging data, >> but from what I understand from you it has certain implementation challenges. >> There's no pressure to take a decision now around supporting queries >> spanning multiple caches - just something to keep an eye on when dealing >> with use cases/users. ATM merging data is the only solution available, let's >> wait and see if people ask for more. >> >> >> That's fine I guess. My problem is that nowhere in your documentation do >> you guys list what can and cannot be done in that regard. And when you call >> a data structure an entity, it comes with an implicit baggage: relations >> between entities, polymorphism, etc. >> So this needs to be clarified. >> >> Now circling back to the main topic. Doing cross-cache query on non-related >> data (as they are in different caches) might be useful in some situations >> but is generally not interesting, as we specifically don't support joins >> (they are not related). >> So back to the root of that discussion now, what was the use case(s) that >> led you to believe we need cross-cache query support?
>> >> Emmanuel >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Mar 5 11:30:30 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Mar 2014 16:30:30 +0000 Subject: [infinispan-dev] Since it's almost Friday: the next code name for Infinispan In-Reply-To: <53174515.8030002@redhat.com> References: <53174230.5030106@redhat.com> <5317434E.2050407@redhat.com> <53174515.8030002@redhat.com> Message-ID: <9D17D7C8-1712-4A0D-B8E4-80EEC6CBC215@redhat.com> +1 :-) On Mar 5, 2014, at 3:39 PM, Bela Ban wrote: > No, my good friend Ales sent me that link... > > On 05/03/14 16:31, Vladimir Blagojevic wrote: >> +1 Are you in Slovenia? 
>> On 3/5/2014, 10:26 AM, Bela Ban wrote: >>> https://www.facebook.com/HumanFishBrewery >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Mar 5 12:16:47 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Mar 2014 17:16:47 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> Message-ID: <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> Sanne came with a good follow up to this email, just some small clarifications: On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >> >> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >> - performance: you iterate over the data that is not related to your query. 
> > If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? Here's where cross-cache query would have been used. As Sanne suggested (next post) these limitations outweigh the advantages. > >> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. > > Well it's called type safety, some people find it good ;) If anything, this model reduces type safety and reusability. E.g. say you want an M/R task to see how many Persons speak French. With the single cache model (both Dog and Person in the Cache) it would look something like: a) //pseudocode map (String k, Mammal value) { if (value instanceof Person) { //this is the ugly part if (((Person)value).speaks("French")) ... } else { // ignore it, it's a Dog } } Same thing written for a Cache: b) map (String k, Person value) { if (value.speaks("French")) ... } I don't think people would prefer writing a) instead of b) ;) Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ttarrant at redhat.com Wed Mar 5 15:12:19 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Wed, 05 Mar 2014 21:12:19 +0100 Subject: [infinispan-dev] JBoss Marshalling 1.4.4.Final + ASL + Infinispan 6 Message-ID: <53178523.50700@redhat.com> Dear all, David Lloyd has kindly released JBoss Marshalling 1.4.4.Final under the ASL: this means that all of our core deps are now ASL. Should we release a 6.0.2 with this dependency change only? There are ASL projects that cannot release without this [1]. Tristan [1] https://community.jboss.org/message/859503?et=watches.email.thread#859503 From david.lloyd at redhat.com Wed Mar 5 15:58:36 2014 From: david.lloyd at redhat.com (David M.
Lloyd) Date: Wed, 05 Mar 2014 14:58:36 -0600 Subject: [infinispan-dev] JBoss Marshalling 1.4.4.Final + ASL + Infinispan 6 In-Reply-To: <53178523.50700@redhat.com> References: <53178523.50700@redhat.com> Message-ID: <53178FFC.3040705@redhat.com> On 03/05/2014 02:12 PM, Tristan Tarrant wrote: > Dear all, > > David Lloyd has kindly release JBoss Marshalling 1.4.4.Final under the > ASL: this means that all of our core deps are now ASL. Should we release > a 6.0.2 with this dependency change only ? FWIW there's also a nice bugfix in there: the long-standing https://issues.jboss.org/browse/JBMAR-120 was recently solved, allowing objects which are Serializable but with read/writeObject methods which failed to read/write their fields to serialize safely, where before they would cause an exception. -- - DML From mark.k at ecobee.com Wed Mar 5 17:14:35 2014 From: mark.k at ecobee.com (Mark Kowaliszyn) Date: Wed, 5 Mar 2014 16:14:35 -0600 Subject: [infinispan-dev] Problem with HotRod cache updates Message-ID: <22DE63115B414846AF484B5EA093AD32B6DA88A159@AUSP01VMBX31.collaborationhost.net> Hi, I am using the RemoteCacheManager to access a cache on my cluster. Getting an entry and updating it on the client works no problem, however on the server, the cache receives an entry with a byte array cache key, rather than the original string I put. My server results in the following listener output when the cache put occurs: DEBUG 0305-16:49:16:789 Cache (thermostatCache) entry modified: [3, 62, 4, 49, 48, 48, 48] (local=true) {foundation.infinispan.listener.CacheLoggingListener.entryModified} [HotRod-HotRodServerServerWorker-19]({}) DEBUG 0305-16:49:16:794 ++++ string: [B at 690edaf3, new string: >1000 {foundation.infinispan.listener.CacheLoggingListener.entryModified} [HotRod-HotRodServerServerWorker-19]({}) The cache key in question is "1000". The output above is from a cache listener and the output is from CacheModifiedEvent.getKey(). 
I have some additional output to first do a toString() on the key, and a new string decoding the byte array. There are a few bytes prefixing the byte array which are not part of the cache key. In the cut/paste here, there are 2 characters missing, the "new string" has 2 unprintable character one before and one after the ">" character. What are the extra bytes in key? Why is the key inserted as bytes and not a string? The end effect is that my cluster cache gets a new junk entry in the cache with every client put. I did not see any documentation where it indicated I might need a custom key serializer. I am using strings for cache keys, nothing special. Updating the cache from the cluster-local cache works perfectly. Thanks, Mark From emmanuel at hibernate.org Thu Mar 6 04:21:40 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Thu, 6 Mar 2014 10:21:40 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> Message-ID: <20140306092140.GI31430@hibernate.org> On Wed 2014-03-05 17:16, Mircea Markus wrote: > Sanne came with a good follow up to this email, just some small clarifications: > > On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: > > >>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. > >> > >> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. 
Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: > >> - performance: you iterate over the data that is not related to your query. > > > > If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? > > Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations overcome the advantages. No. Cross-cache query if implemented will not support (efficiently enough) that kind of query. Cf my wiki page. > > > > >> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. > > > > Well it?s called type safety, some people find it good ;) > > If anything, this model reduces type safety and reusability. E.g. say you want a M/R task to see how many Persons speak French. With the single cache model(both Dog and Person int he Cache) it would look something like: > > a) > //pseudocode > map (String k, Mammal value) { > if (value instanceof Person)) { //this is the ugly part > if (((Person)value).speaks("French")) ... > } else { > //ignore it, it's an Dog > } > } > > Same thing written for a Cache: > > b) > map (String k, Person value) { > if (value.speaks("French")) ... > } > > I don't think people would prefer writing a) instead of b) ;) I concede that point. I would actually have stored Person { name: emmanuel dogs: [ Dog { name:django } ] } in the cache making it essentially a Cache. I would not have two caches Cache, Cache though because it would prevent me from doing efficient data correlations between Persons and Dogs. 
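[Editor's aside: Mircea's a)-vs-b) pseudocode, which Emmanuel concedes above, in compilable form. Mammal, Person, Dog, and speaks() are illustrative stand-ins, not Infinispan types; plain java.util.Map stands in for the cache.]

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MapperDemo {
    interface Mammal {}

    static class Dog implements Mammal {}

    static class Person implements Mammal {
        final Set<String> languages;
        Person(String... langs) { languages = new HashSet<>(Arrays.asList(langs)); }
        boolean speaks(String lang) { return languages.contains(lang); }
    }

    // a) one mixed cache: the mapper must know about every stored type
    static int frenchSpeakersMixed(Map<String, Mammal> cache) {
        int count = 0;
        for (Mammal value : cache.values()) {
            if (value instanceof Person) {                    // the ugly part
                if (((Person) value).speaks("French")) count++;
            }                                                 // Dogs silently skipped
        }
        return count;
    }

    // b) a dedicated Person cache: no instanceof, no cast
    static int frenchSpeakersTyped(Map<String, Person> cache) {
        int count = 0;
        for (Person value : cache.values()) {
            if (value.speaks("French")) count++;
        }
        return count;
    }
}
```

Adding a Cat to the mixed cache forces a change to frenchSpeakersMixed; the typed version is untouched.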
From sanne at infinispan.org Thu Mar 6 05:10:13 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 6 Mar 2014 10:10:13 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140306092140.GI31430@hibernate.org> References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <20140306092140.GI31430@hibernate.org> Message-ID: On 6 March 2014 09:21, Emmanuel Bernard wrote: > On Wed 2014-03-05 17:16, Mircea Markus wrote: >> Sanne came with a good follow up to this email, just some small clarifications: >> >> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >> >> >>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >> >> >> >> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >> >> - performance: you iterate over the data that is not related to your query. >> > >> > If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? >> >> Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations overcome the advantages. > > No. Cross-cache query if implemented will not support (efficiently > enough) that kind of query. Cf my wiki page. 
> >> >> > >> >> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. >> > >> > Well it?s called type safety, some people find it good ;) >> >> If anything, this model reduces type safety and reusability. E.g. say you want a M/R task to see how many Persons speak French. With the single cache model(both Dog and Person int he Cache) it would look something like: >> >> a) >> //pseudocode >> map (String k, Mammal value) { >> if (value instanceof Person)) { //this is the ugly part >> if (((Person)value).speaks("French")) ... >> } else { >> //ignore it, it's an Dog >> } >> } >> >> Same thing written for a Cache: >> >> b) >> map (String k, Person value) { >> if (value.speaks("French")) ... >> } >> >> I don't think people would prefer writing a) instead of b) ;) > > I concede that point. I would actually have stored > Person { > name: emmanuel > dogs: [ > Dog { name:django } > ] > } > in the cache making it essentially a Cache. > I would not have two caches Cache, Cache though > because it would prevent me from doing efficient data correlations > between Persons and Dogs. True. But even the example by Mircea of non-related Mammals / Persons, there is a more elegant solution by wiring it up on Paul's proposal of Cache views. I'd obtain a typesafe Cache instance from the root Cache, then run my M/R job on top of this. As I suggested in Palma, I love the idea of cache views but if implemented they need to be taking care of all aspects, so it would implicitly filter on the "instanceof Person" clause. So, pseudocode again: cache.onType(Person.class) .map(String k, Person value) { if (value.speaks("French")) ... } And I believe the API would be typesafe all the way to the Map implementation. 
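[Editor's aside: Sanne's cache-view idea can be sketched as below. onType() is a *proposed* API here, not an existing Infinispan call; MixedCache and TypedView are stand-ins showing how the view would apply the instanceof filter implicitly so the mapper itself stays typesafe.]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class TypedViewDemo {
    static class MixedCache {
        final Map<String, Object> data = new HashMap<>();

        // the proposed entry point: a typed view over the untyped cache
        <T> TypedView<T> onType(Class<T> type) {
            return new TypedView<>(data, type);
        }
    }

    static class TypedView<T> {
        private final Map<String, Object> backing;
        private final Class<T> type;

        TypedView(Map<String, Object> backing, Class<T> type) {
            this.backing = backing;
            this.type = type;
        }

        // the "instanceof Person" clause lives in the view, not the mapper
        void map(BiConsumer<String, T> mapper) {
            for (Map.Entry<String, Object> e : backing.entrySet()) {
                if (type.isInstance(e.getValue())) {
                    mapper.accept(e.getKey(), type.cast(e.getValue()));
                }
            }
        }
    }
}
```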
Sanne From rvansa at redhat.com Thu Mar 6 07:25:01 2014 From: rvansa at redhat.com (Radim Vansa) Date: Thu, 06 Mar 2014 13:25:01 +0100 Subject: [infinispan-dev] Problem with HotRod cache updates In-Reply-To: <22DE63115B414846AF484B5EA093AD32B6DA88A159@AUSP01VMBX31.collaborationhost.net> References: <22DE63115B414846AF484B5EA093AD32B6DA88A159@AUSP01VMBX31.collaborationhost.net> Message-ID: <5318691D.3050300@redhat.com> For such questions, please, use forum. Mailing list is intended for developers' discussions. The three bytes come from JBoss Marshalling, where the key type has to be marshalled as well. In this case the 3 and 62 identify that it's a java.lang.String and 4 is the string's length (variable-size encoded). If you want to access the cache both using HotRod and locally, please, use compatibility mode [1][2] Radim [1] http://infinispan.org/docs/6.0.x/user_guide/user_guide.html#_interoperability_between_embedded_and_remote_server_endpoints [2] http://infinispan.org/docs/6.0.x/infinispan_server_guide/infinispan_server_guide.html#_protocol_interoperability On 03/05/2014 11:14 PM, Mark Kowaliszyn wrote: > Hi, > > I am using the RemoteCacheManager to access a cache on my cluster. Getting an entry and updating it on the client works no problem, however on the server, the cache receives an entry with a byte array cache key, rather than the original string I put. > > My server results in the following listener output when the cache put occurs: > > DEBUG 0305-16:49:16:789 Cache (thermostatCache) entry modified: [3, 62, 4, 49, 48, 48, 48] (local=true) {foundation.infinispan.listener.CacheLoggingListener.entryModified} [HotRod-HotRodServerServerWorker-19]({}) > DEBUG 0305-16:49:16:794 ++++ string: [B at 690edaf3, new string: >1000 {foundation.infinispan.listener.CacheLoggingListener.entryModified} [HotRod-HotRodServerServerWorker-19]({}) > > The cache key in question is "1000". 
The output above is from a cache listener and the output is from CacheModifiedEvent.getKey(). I have some additional output to first do a toString() on the key, and a new string decoding the byte array. There are a few bytes prefixing the byte array which are not part of the cache key. In the cut/paste here, there are 2 characters missing; the "new string" has 2 unprintable characters, one before and one after the ">" character. > > What are the extra bytes in the key? Why is the key inserted as bytes and not a string? > > The end effect is that my cluster cache gets a new junk entry in the cache with every client put. I did not see any documentation where it indicated I might need a custom key serializer. I am using strings for cache keys, nothing special. > > Updating the cache from the cluster-local cache works perfectly. > > Thanks, > Mark > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From vblagoje at redhat.com Thu Mar 6 10:02:21 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Thu, 06 Mar 2014 10:02:21 -0500 Subject: [infinispan-dev] Cluster Listeners 7.0.0.Alpha1 In-Reply-To: References: Message-ID: <53188DFD.8060700@redhat.com> Very cool stuff Will! Just to clarify, one could define a clustered listener, filter and converter and tie them all up together. Such a listener will be invoked when the filter criteria are met, and the converter converts the key/value to some C object which is then sent off to the listener, right? So how do you define the method on the listener to be invoked with the converted object C as a parameter? On 3/5/2014, 10:02 AM, William Burns wrote: > Hello everyone, > > I am happy to announce that the latest Infinispan 7.0.0.Alpha1 build > has the first pass of Cluster Listeners implemented. > > You can read all about the details at the blog post [1].
> > You can get the latest build of Infinispan from our site [2]. > > Try it out and let us know what you guys think. > > Happy listening! > > - Will > > > [1] http://blog.infinispan.org/2014/03/embedded-cluster-listeners-in.html > [2] http://infinispan.org/download/ > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mudokonman at gmail.com Thu Mar 6 10:38:53 2014 From: mudokonman at gmail.com (William Burns) Date: Thu, 6 Mar 2014 10:38:53 -0500 Subject: [infinispan-dev] Cluster Listeners 7.0.0.Alpha1 In-Reply-To: <53188DFD.8060700@redhat.com> References: <53188DFD.8060700@redhat.com> Message-ID: The way Listeners work they are only half typed, which is unfortunate. It determines what method to invoke solely on annotations, so Listener methods can define any type for the Event that they want. The developer using a Converter would have to be cognizant that they should be the same or super type for the listener they use. Specifically CacheEntryEvent the type V must match or be a super type of C defined on the Converter if a converter is provided, but there are no compile checks for this :( Eg. @Listener class SomeListener { @CacheEntryCreated public void handleEvent(CacheEntryEvent event) { .... } } SomeListener listener = ... Converter converter = ... cache.addListener(listener, null, converter); So a user may get a ClassCastException at runtime if they retrieve those keys or values and they don't match. The only types that are specifically checked are the Key/Value types on both the KeyValueFilter and the Converter class and will throw a checked exception if they don't match. This example will have a checked exception, since we verify the types for those. Note the Filter has a type String for V but the Converter has Integer for V. Eg: KeyValueFilter filter .. Converter converter ... 
cache.addListener(listener, filter, converter); One thing this made me think of though is that we don't match the Cache variants with the filter and converter, this should be doable to enhance the hierarchy to ensure those types match the cache as well, but it might take a bit of refactoring to get it right. I have logged [1] to fix this. [1] https://issues.jboss.org/browse/ISPN-4079 On Thu, Mar 6, 2014 at 10:02 AM, Vladimir Blagojevic wrote: > Very cool stuff Will! Just to clarify, one could define clustered > listener, filter and converter and tie them all up together. Such a > listener will be invoked when filter criteria is met, and converter > converts key/value to some C object which is then sent of to listener, > right? So how do you define method on listener to be invoked with > converted object C as a parameter? > > > On 3/5/2014, 10:02 AM, William Burns wrote: >> Hello everyone, >> >> I am happy to announce that the latest Infinispan 7.0.0.Alpha1 build >> has the first pass of Cluster Listeners implemented. >> >> You can read all about the details at the blog post [1]. >> >> You can get the latest build of Infinispan from our site [2]. >> >> Try it out and let us know what you guys think. >> >> Happy listening! 
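[Editor's aside: the half-typed dispatch Will describes can be modelled with a toy, fully untyped dispatcher; Converter, RawListener, and fire() below are illustrative stand-ins, not the Infinispan notification API. A Converter producing a type the listener does not expect surfaces as a ClassCastException at runtime rather than a compile error.]

```java
public class ListenerTypingDemo {
    // the dispatcher only sees Object, like annotation-driven invocation
    interface Converter { Object convert(Object key, Object value); }
    interface RawListener { void onEvent(Object converted); }

    static void fire(Object key, Object value, Converter c, RawListener l) {
        l.onEvent(c.convert(key, value));
    }

    static boolean mismatchDetected() {
        Converter toLength = (k, v) -> ((String) v).length(); // produces an Integer
        RawListener expectsString = converted -> {
            String s = (String) converted;                    // blows up at runtime
            System.out.println(s);
        };
        try {
            fire("key", "1000", toLength, expectsString);
            return false;                                     // no error surfaced
        } catch (ClassCastException expected) {
            return true;                                      // the runtime failure
        }
    }
}
```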
>> >> - Will >> >> >> [1] http://blog.infinispan.org/2014/03/embedded-cluster-listeners-in.html >> [2] http://infinispan.org/download/ >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Thu Mar 6 13:08:39 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 6 Mar 2014 18:08:39 +0000 Subject: [infinispan-dev] Update on the testsuite state Message-ID: Results : Failed tests: NotifyingFutureTest.testExceptionOtherThread2:51->testExceptionOtherThread:68->testException:151 expected [true] but found [false] VersionedDistStateTransferTest.testStateTransfer:96->MultipleCacheManagersTest.waitForClusterToForm:232->MultipleCacheManagersTest.waitForClusterToForm:225 ? IllegalState Tests run: 4233, Failures: 2, Errors: 0, Skipped: 0 [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Infinispan BOM .................................... SUCCESS [0.135s] [INFO] Infinispan Common Parent .......................... SUCCESS [1.700s] [INFO] Infinispan Checkstyle Rules ....................... SUCCESS [2.395s] [INFO] Infinispan Commons ................................ SUCCESS [5.411s] [INFO] Infinispan Core ................................... FAILURE [9:51.344s] Pretty good, but no jackpot yet! I'll try again next week? 
From ben.cotton at ALUMNI.RUTGERS.EDU Thu Mar 6 13:24:10 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Thu, 6 Mar 2014 10:24:10 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1393861594878-4028931.post@n3.nabble.com> References: <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> Message-ID: <1394130250706-4028966.post@n3.nabble.com> FYI. https://github.com/OpenHFT/HugeCollections/issues/13 -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028966.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From mmarkus at redhat.com Fri Mar 7 09:44:59 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Mar 2014 14:44:59 +0000 Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1393861594878-4028931.post@n3.nabble.com> References: <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> Message-ID: <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> Hi Ben, In the diagram provided, the CacheImpl (your class) extends both from javax.cache.Cache and org.infinispan.container.DataContainer. 
The Cache and DataContainer interfaces are quite different, and I anticipate a single class implementing both to be hard to follow and potentially not very efficient. Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. On Mar 3, 2014, at 3:46 PM, cotton-ben wrote: > Quick Update: > > It is my understanding that Peter Lawrey will make available an OpenHFT HC > Alpha Release in Maven Central next weekend. At that time, Dmitry Gordeev > and I will take the OpenHFT dependency tag and proceed to build a branch of > Red Hat's ISPN 7 that will treat net.openhft.collections.SharedHashMap as a > Red Hat Infinispan 7 default impl of a fully JSR-107 interoperable off-heap > javax.cache.Cache ... > > A diagram of this build effort can be found here: > https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as-JCACHE-Impl.jpg > ... > > The Red Hat view of this effort will be tracked here: > https://issues.jboss.org/browse/ISPN-871 ... > > The code that defines the Impl will be here: > https://github.com/Cotton-Ben/infinispan ... > > > > > > -- > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028931.html > Sent from the Infinispan Developer List mailing list archive at Nabble.com.
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Fri Mar 7 09:47:59 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Mar 2014 14:47:59 +0000 Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1394130250706-4028966.post@n3.nabble.com> References: <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <1394130250706-4028966.post@n3.nabble.com> Message-ID: I imagine that you plan to contribute this back to ISPN? (which would be awesome, btw) How do you plan to organize your source code, btw? I think a new mvn module under infinispan, at the same level as core, would be a good idea, as we try to keep core's dependencies at a minimum. On Mar 6, 2014, at 6:24 PM, cotton-ben wrote: > > FYI. https://github.com/OpenHFT/HugeCollections/issues/13 > > > > > -- > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028966.html > Sent from the Infinispan Developer List mailing list archive at Nabble.com.
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Fri Mar 7 09:54:40 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Mar 2014 14:54:40 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140306092140.GI31430@hibernate.org> References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <20140306092140.GI31430@hibernate.org> Message-ID: <1F9A6E2D-D4C4-4499-8CD4-424503DB7166@redhat.com> On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard wrote: > On Wed 2014-03-05 17:16, Mircea Markus wrote: >> Sanne came with a good follow-up to this email, just some small clarifications: >> >> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >> >>>>> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs. >>>> >>>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is twofold: >>>> - performance: you iterate over the data that is not related to your query. >>> >>> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them?
As Sanne suggested (next post) these limitations overcome the advantages. > > No. Cross-cache query if implemented will not support (efficiently > enough) that kind of query. Cf my wiki page. yes, non-indexed joins would be exponential on the number of caches involved. Is it possible to use an index for x-cache joins with linear index update time and query? Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Fri Mar 7 10:21:09 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 7 Mar 2014 15:21:09 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <1F9A6E2D-D4C4-4499-8CD4-424503DB7166@redhat.com> References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <20140306092140.GI31430@hibernate.org> <1F9A6E2D-D4C4-4499-8CD4-424503DB7166@redhat.com> Message-ID: On 7 March 2014 14:54, Mircea Markus wrote: > > On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard wrote: > >> On Wed 2014-03-05 17:16, Mircea Markus wrote: >>> Sanne came with a good follow up to this email, just some small clarifications: >>> >>> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >>> >>>>>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >>>>> >>>>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >>>>> - performance: you iterate over the data that is not related to your query. 
>>>> >>>> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? >>> >>> Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations outweigh the advantages. >> >> No. Cross-cache query if implemented will not support (efficiently >> enough) that kind of query. Cf my wiki page. > > yes, non-indexed joins would be exponential on the number of caches involved. Technically non-indexed joins would be exponential on the number of caches (joins) involved *and* on the number of entries you have stored: I know you weren't suggesting doing it, but to confirm, it's even worse than a horrible idea ;-) And that's not even considering the subtle design catch of "load it all from all cachestores".. combined with "multiple times per join".. > Is it possible to use an index for x-cache joins with linear index update time and query? Index update cost is not linear but LogN: approximates to a constant cost. And we could cut this constant by 4 orders of magnitude if only I could safely differentiate between a put of a new entry vs. an update -> something which we'll need to brainstorm about. Query time is also significantly sub-linear in practice, but specifics will vary with the query type. Yes you could use indexes to improve x-cache joins, but you'll need an additional engine to coordinate that correctly, not least to manage data size buffers; essentially I think you'd need Teiid.
Sanne > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Fri Mar 7 10:27:36 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Mar 2014 15:27:36 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <20140306092140.GI31430@hibernate.org> <1F9A6E2D-D4C4-4499-8CD4-424503DB7166@redhat.com> Message-ID: <3116571F-7E96-4297-A6DE-F504375350E0@redhat.com> On Mar 7, 2014, at 3:21 PM, Sanne Grinovero wrote: > On 7 March 2014 14:54, Mircea Markus wrote: >> >> On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard wrote: >> >>> On Wed 2014-03-05 17:16, Mircea Markus wrote: >>>> Sanne came with a good follow up to this email, just some small clarifications: >>>> >>>> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >>>> >>>>>>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >>>>>> >>>>>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >>>>>> - performance: you iterate over the data that is not related to your query. >>>>> >>>>> If the data are never related (query wise), then we are in the database split category. Which is fine. 
But if some of your queries are related, what do you do? Deny the user the ability to do them? >>>> >>>> Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations overcome the advantages. >>> >>> No. Cross-cache query if implemented will not support (efficiently >>> enough) that kind of query. Cf my wiki page. >> >> yes, non-indexed joins would be exponential on the number of caches involved. > > Technically non-indexed joins would be exponential on the number of > caches (joins) involves *and* on the amount of entries you have > stored: I know you wheren't suggesting doing it, but to confirm it's > even worse than an horrible idea ;-) > And that's not even considering the subtle design catch of "load it > all from all cachestores".. combined with "multiple times per join".. I wasn't suggesting doing it, not only for performance but also for the limitations you mentioned in the previous emails. > >> Is it possible to use an index for x-cache joins with linear index update time and query? > > Index update cost is not linear but LogN: approximates to a constant > cost. you're counting RPCs here or index seeks? > And we could cut this constant by 4 orders of magnitude if only > I could safely differentiate between a put of a new entry vs. an > update -> something which we'll need to brainstorm about. > > Query time is also significantly sub-linear in practice, but specifics > will vary on the query type. > > Yes you could use indexes to improve x-cache joins, but you'll need an > additional engine to coordinate that correctly, not least to manage > data size buffers; essentially I think you'd need Teiid. 
> > Sanne > > >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Fri Mar 7 10:51:30 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 7 Mar 2014 15:51:30 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <3116571F-7E96-4297-A6DE-F504375350E0@redhat.com> References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <20140306092140.GI31430@hibernate.org> <1F9A6E2D-D4C4-4499-8CD4-424503DB7166@redhat.com> <3116571F-7E96-4297-A6DE-F504375350E0@redhat.com> Message-ID: On 7 March 2014 15:27, Mircea Markus wrote: > > On Mar 7, 2014, at 3:21 PM, Sanne Grinovero wrote: > >> On 7 March 2014 14:54, Mircea Markus wrote: >>> >>> On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard wrote: >>> >>>> On Wed 2014-03-05 17:16, Mircea Markus wrote: >>>>> Sanne came with a good follow up to this email, just some small clarifications: >>>>> >>>>> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >>>>> >>>>>>>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >>>>>>> >>>>>>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. 
Once a day, not gazzilions of times, and I don't need to index the age filed just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >>>>>>> - performance: you iterate over the data that is not related to your query. >>>>>> >>>>>> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? >>>>> >>>>> Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations overcome the advantages. >>>> >>>> No. Cross-cache query if implemented will not support (efficiently >>>> enough) that kind of query. Cf my wiki page. >>> >>> yes, non-indexed joins would be exponential on the number of caches involved. >> >> Technically non-indexed joins would be exponential on the number of >> caches (joins) involves *and* on the amount of entries you have >> stored: I know you wheren't suggesting doing it, but to confirm it's >> even worse than an horrible idea ;-) >> And that's not even considering the subtle design catch of "load it >> all from all cachestores".. combined with "multiple times per join".. > > I wasn't suggesting doing it, not only for performance but also for the limitations you mentioned in the previous emails. > >> >>> Is it possible to use an index for x-cache joins with linear index update time and query? >> >> Index update cost is not linear but LogN: approximates to a constant >> cost. > > you're counting RPCs here or index seeks? RPCs are constant, and independent from both the query type and the data size. For a local (or distributed) index there are zero RPCs, for DIST it depends on a factor of total index size, chunking and merging options, numowners, etc.. but these are fixed once defined - > constant number of RPCs. The count on index seeks do depend on the query type only, not on the size at all. 
I'm referring to the approximate computation cost of each index seek. > >> And we could cut this constant by 4 orders of magnitude if only >> I could safely differentiate between a put of a new entry vs. an >> update -> something which we'll need to brainstorm about. >> >> Query time is also significantly sub-linear in practice, but specifics >> will vary on the query type. >> >> Yes you could use indexes to improve x-cache joins, but you'll need an >> additional engine to coordinate that correctly, not least to manage >> data size buffers; essentially I think you'd need Teiid. >> >> Sanne >> >> >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Fri Mar 7 10:55:12 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Fri, 7 Mar 2014 16:55:12 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <20140306092140.GI31430@hibernate.org> <1F9A6E2D-D4C4-4499-8CD4-424503DB7166@redhat.com> Message-ID: 
On 07 Mar 2014, at 16:21, Sanne Grinovero wrote: > >> Is it possible to use an index for x-cache joins with linear index update time and query? > > Index update cost is not linear but LogN: approximates to a constant > cost. And we could cut this constant by 4 orders of magnitude if only > I could safely differentiate between a put of a new entry vs. an > update -> something which we'll need to brainstorm about. > > Query time is also significantly sub-linear in practice, but specifics > will vary on the query type. > > Yes you could use indexes to improve x-cache joins, but you'll need an > additional engine to coordinate that correctly, not least to manage > data size buffers; essentially I think you'd need Teiid. > I think Mircea means implementing joins across different caches if the foreign key is indexed. It would basically require N index seeks (N being the number of distinct fk values on your left-hand side) per join, plus some aggregation engine to link the matching elements from the two sides. So roughly N^M, where M = # of joins and N = # of distinct FK values involved. Does that reasoning sound right? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140307/b60f8971/attachment.html From sanne at infinispan.org Fri Mar 7 11:19:53 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 7 Mar 2014 16:19:53 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <20140306092140.GI31430@hibernate.org> <1F9A6E2D-D4C4-4499-8CD4-424503DB7166@redhat.com> Message-ID: On 7 March 2014 15:55, Emmanuel Bernard wrote: > > On 07 Mar 2014, at 16:21, Sanne Grinovero wrote: > > > Is it possible to use an index for x-cache joins with linear index update > time and query? > > > Index update cost is not linear but LogN: approximates to a constant > cost. And we could cut this constant by 4 orders of magnitude if only > I could safely differentiate between a put of a new entry vs. an > update -> something which we'll need to brainstorm about. > > Query time is also significantly sub-linear in practice, but specifics > will vary on the query type. > > Yes you could use indexes to improve x-cache joins, but you'll need an > additional engine to coordinate that correctly, not least to manage > data size buffers; essentially I think you'd need Teiid. > > > I think Mircea means implementing joins across different caches if the > foreign key is indexed. > > It would basically require N index seek (N being the number of fk value on > your left hand side) per join and some aggregation engine to link the > elements matching from the two side. > So N^M > > M= # of joins > N= # of distinct FK involved > > Does that reasoning sound right? Right. Teiid. 
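[Editor's note: to put a toy number on Emmanuel's N^M estimate above: with an index on the foreign key, a single join costs one seek per distinct FK value on the left-hand side. A runnable sketch, with plain Java maps standing in for the caches and their index:]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Two "caches": person -> cityId (the FK), and cityId -> cityName.
// An index on City's key makes each FK lookup a single seek, so one join
// costs |distinct FK values| seeks; chaining M such joins compounds the
// fan-out toward the N^M figure in the thread.
public class IndexedJoinSketch {
    public static void main(String[] args) {
        Map<String, Integer> personCache = Map.of("ben", 1, "mircea", 1, "emmanuel", 2);
        Map<Integer, String> cityCache = Map.of(1, "London", 2, "Paris");

        Set<Integer> distinctFks = new HashSet<>(personCache.values());
        int seeks = distinctFks.size(); // N = 2 distinct FK values -> 2 index seeks

        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, Integer> p : personCache.entrySet()) {
            joined.add(p.getKey() + "->" + cityCache.get(p.getValue())); // the "index seek"
        }
        Collections.sort(joined);

        System.out.println(joined); // [ben->London, emmanuel->Paris, mircea->London]
        System.out.println("distinct-FK seeks for one join: " + seeks); // 2
    }
}
```

The aggregation-engine part (buffering and linking the matched sides across nodes) is exactly what this toy omits, which is why the thread keeps pointing at Teiid.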
From sanne at infinispan.org Fri Mar 7 11:35:08 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 7 Mar 2014 16:35:08 +0000 Subject: [infinispan-dev] Update on the testsuite state In-Reply-To: References: Message-ID: today core passed, but: the Infinispan Hot Rod Client seems to fail because of an OutOfMemory Java heap space. On 6 March 2014 18:08, Sanne Grinovero wrote: > Results : > > Failed tests: > NotifyingFutureTest.testExceptionOtherThread2:51->testExceptionOtherThread:68->testException:151 > expected [true] but found [false] > VersionedDistStateTransferTest.testStateTransfer:96->MultipleCacheManagersTest.waitForClusterToForm:232->MultipleCacheManagersTest.waitForClusterToForm:225 > ? IllegalState > > Tests run: 4233, Failures: 2, Errors: 0, Skipped: 0 > > [INFO] ------------------------------------------------------------------------ > [INFO] Reactor Summary: > [INFO] > [INFO] Infinispan BOM .................................... SUCCESS [0.135s] > [INFO] Infinispan Common Parent .......................... SUCCESS [1.700s] > [INFO] Infinispan Checkstyle Rules ....................... SUCCESS [2.395s] > [INFO] Infinispan Commons ................................ SUCCESS [5.411s] > [INFO] Infinispan Core ................................... FAILURE [9:51.344s] > > > Pretty good, but no jackpot yet! > I'll try again next week? 
From ben.cotton at ALUMNI.RUTGERS.EDU Fri Mar 7 11:36:25 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 7 Mar 2014 08:36:25 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> References: <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> Message-ID: <5319F55E.8000001@alumni.rutgers.edu> Thank you for this insight Mircea ... Ultimately ... I want the OpenHFT SHM off-heap operand to behave *exactly* like a JCACHE ... Amenable to being soundly/completely operated upon by any/all parts of ISPN7's Impl of the JSR-107 API . Musing openly: Won't that (eventually) necessitate me physically implementing javax.cache.Cache ? > Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. I see what you mean. OK, for sure, this sounds much simpler than what I have put on my initial TODO list. Question: Will doing it this way in any manner suggest that my JSR-107 specific operators are being transitively "brokered" by the ISPN config onto my OpenHFT SHM operand? If possible, I want everything to be direct -- no API bridge. Thanks again, Mircea. -Ben and Dmitry Sent from my iPhone On 03/07/2014 09:45 AM, Mircea Markus-2 [via Infinispan Developer List] wrote: > Hi Ben, > > In the diagram provided, the CacheImpl (your class) extends both from > javax.cache.Cache and org.infinispan.container.DataContainer. 
> The Cache and DataContainer interfaces are quite different and I > anticipate an single class implementing both to be hard to follow and > potentially not very efficient. > Another way to do it is to have CacheImpl implement the DataContainer > only, and then configure Infinispan's JCache implementation to use > that custom DataContainer. > > On Mar 3, 2014, at 3:46 PM, cotton-ben <[hidden email] > > wrote: > > > Quick Update: > > > > It is my understandng that Peter Lawrey will make available an > OpenHFT HC > > Alpha Release in Maven Central next weekend. At that time, Dmitry > Gordeev > > and I will take the OpenHFT dependency tag and proceed to build a > branch of > > Red Hat's ISPN 7 that will treat > net.openhft.collections.SharedHashMap as a > > Red Hat Infinispan 7 default impl of a fully JSR-107 interoperable > off-heap > > javax.cache.Cache ... > > > > A diagram of this build effort can be found here: > > > https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as-JCACHE-Impl.jpg > > ... > > > > The Red Hat view of his effort will be tracked here: > > https://issues.jboss.org/browse/ISPN-871 ... > > > > The code that defines the Impl will be here > > https://github.com/Cotton-Ben/infinispan ... > > > > > > > > > > > > -- > > View this message in context: > http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028931.html > > Sent from the Infinispan Developer List mailing list archive at > Nabble.com. 
> > _______________________________________________ > > infinispan-dev mailing list > > [hidden email] > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > [hidden email] > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028976.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140307/cd8b567d/attachment.html From ben.cotton at ALUMNI.RUTGERS.EDU Fri Mar 7 11:37:41 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 7 Mar 2014 08:37:41 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> References: <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> Message-ID: <1394210261662-4028977.post@n3.nabble.com> Thank you for this insight Mircea ... Ultimately ... I want the OpenHFT SHM off-heap operand to behave *exactly* like a JCACHE ... Amenable to being soundly/completely operated upon by any/all parts of ISPN7's Impl of the JSR-107 API . Musing openly: Won't that (eventually) necessitate me physically implementing javax.cache.Cache ? /> Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. / I see what you mean. OK, for sure, this sounds much simpler than what I have put on my initial TODO list. Question: Will doing it this way in any manner suggest that my JSR-107 specific operators are being transitively "brokered" by the ISPN config onto my OpenHFT SHM operand? If possible, I want everything to be direct -- no API bridge. Thanks again, Mircea. -Ben and Dmitry -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028977.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
From ben.cotton at ALUMNI.RUTGERS.EDU Fri Mar 7 11:57:37 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 7 Mar 2014 08:57:37 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: References: <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <1394130250706-4028966.post@n3.nabble.com> Message-ID: <5319FA68.70404@alumni.rutgers.edu> > I imagine that you plan to contribute this back to ISPN? (which would be awesome, btw) Of course! (BTW, I do want Pete M. to ship me my Mark Little autographed RH fedora ASAP). > How do you plan to organize your source code, btw? Any way you see fit. Peter just 48 hours ago put the OpenHFT alpha up in Maven Central, and Dmitry and I have both fork'd/clone'd https://github.com/infinispan/infinispan.git into our repos. We also treat the root ISPN7 repo as our remote upstream master. We are building locally right now in that cloned workspace, referencing the OpenHFT alpha as a dependency. Why don't you guys build Dmitry and me a new ISPN module "landing zone" in the root repo? As we develop in our local repos, confirming the off-heap capability, we will eventually send you an upstream pull request? On 03/07/2014 09:48 AM, Mircea Markus-2 [via Infinispan Developer List] wrote: > I imagine that you plan to contribute this back to ISPN? (which would > be awesome, btw) > How do you plan to organize your source code, btw? I think a new mvn > module under infinispan, at the same level as core, would be a good > idea, as we try to keep core's dependencies at a min. > > On Mar 6, 2014, at 6:24 PM, cotton-ben <[hidden email] > > wrote: > > > > > FYI. 
https://github.com/OpenHFT/HugeCollections/issues/13 > > > > > > > > > > -- > > View this message in context: > http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028966.html > > Sent from the Infinispan Developer List mailing list archive at > Nabble.com. > > _______________________________________________ > > infinispan-dev mailing list > > [hidden email] > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > [hidden email] > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > ------------------------------------------------------------------------ > If you reply to this email, your message will be added to the > discussion below: > http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028968.html > > To start a new topic under Infinispan Developer List, email > ml-node+s980875n2085493h0 at n3.nabble.com > To unsubscribe from [infinispan-dev] Infinispan embedded off-heap > cache, click here > . > NAML > > -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028978.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140307/906db0eb/attachment.html From ben.cotton at ALUMNI.RUTGERS.EDU Fri Mar 7 12:11:09 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 7 Mar 2014 09:11:09 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> References: <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> Message-ID: <5319FCA9.30508@alumni.rutgers.edu> Thanks Peter. The plan is for Dmitry and me to first extend VanillaSharedHashMap and groom it into an ISPN7 join via their DataContainer API bridge. That ExtendedVSHM will be morphed into a fully inter-operable JCACHE operand that will - initially be brokered by the ISPN 7 config (JSR107 <----> VSHMExtendedDataContainer <----> VSHM) - eventually, possibly, be rendered with ExtendedVSHM directly implementing javax.cache.Cache (in addition to DataContainer) On 03/07/2014 11:43 AM, Peter Lawrey wrote: > > In the medium term I would see SHM supporting a DataContainer. If a > Cache were supported I would do it as a layered class so those who > don't need the functionality of a Cache don't incur an overhead. > > On 8 Mar 2014 03:35, "Ben Cotton" > wrote: > > Thank you for this insight Mircea ... > > Ultimately ... I want the OpenHFT SHM off-heap operand to behave > *exactly* like a JCACHE ... Amenable to being soundly/completely > operated upon by any/all parts of ISPN7's Impl of the JSR-107 API . > > Musing openly: Won't that (eventually) necessitate me physically > implementing javax.cache.Cache ? 
> / > //> //Another way to do it is to have CacheImpl implement the > DataContainer only, and then configure Infinispan's JCache > implementation to use that custom DataContainer. > > /I see what you mean. OK, for sure, this sounds much simpler than > what I have put on my initial TODO list. > > Question: Will doing it this way in any manner suggest that my > JSR-107 specific operators are being transitively "brokered" by > the ISPN config onto my OpenHFT SHM operand? If possible, I want > everything to be direct -- no API bridge. > > Thanks again, Mircea. > > -Ben and Dmitry > > Sent from my iPhone > On 03/07/2014 09:45 AM, Mircea Markus-2 [via Infinispan Developer > List] wrote: >> Hi Ben, >> >> In the diagram provided, the CacheImpl (your class) extends both >> from javax.cache.Cache and org.infinispan.container.DataContainer. >> The Cache and DataContainer interfaces are quite different and I >> anticipate an single class implementing both to be hard to follow >> and potentially not very efficient. >> Another way to do it is to have CacheImpl implement the >> DataContainer only, and then configure Infinispan's JCache >> implementation to use that custom DataContainer. >> >> On Mar 3, 2014, at 3:46 PM, cotton-ben <[hidden email] >> > wrote: >> >> > Quick Update: >> > >> > It is my understandng that Peter Lawrey will make available an >> OpenHFT HC >> > Alpha Release in Maven Central next weekend. At that time, >> Dmitry Gordeev >> > and I will take the OpenHFT dependency tag and proceed to build >> a branch of >> > Red Hat's ISPN 7 that will treat >> net.openhft.collections.SharedHashMap as a >> > Red Hat Infinispan 7 default impl of a fully JSR-107 >> interoperable off-heap >> > javax.cache.Cache ... >> > >> > A diagram of this build effort can be found here: >> > >> https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as-JCACHE-Impl.jpg >> > ... 
>> > >> > The Red Hat view of his effort will be tracked here: >> > https://issues.jboss.org/browse/ISPN-871 ... >> > >> > The code that defines the Impl will be here >> > https://github.com/Cotton-Ben/infinispan ... >> > >> > >> > >> > >> > >> > -- >> > View this message in context: >> http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028931.html >> > Sent from the Infinispan Developer List mailing list archive at >> Nabble.com. >> > _______________________________________________ >> > infinispan-dev mailing list >> > [hidden email] >> >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org ) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> [hidden email] >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> ------------------------------------------------------------------------ >> If you reply to this email, your message will be added to the >> discussion below: >> http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028967.html >> >> To start a new topic under Infinispan Developer List, email >> ml-node+s980875n2085493h0 at n3.nabble.com >> >> To unsubscribe from [infinispan-dev] Infinispan embedded off-heap >> cache, click here >> . >> NAML >> >> > -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028979.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140307/20848dbd/attachment-0001.html From ben.cotton at ALUMNI.RUTGERS.EDU Sat Mar 8 15:25:19 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Sat, 8 Mar 2014 12:25:19 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <5319FCA9.30508@alumni.rutgers.edu> References: <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> <5319FCA9.30508@alumni.rutgers.edu> Message-ID: <1394310319855-4028980.post@n3.nabble.com> We have established the following at my ISPN 7 fork ( https://github.com/Cotton-Ben/infinispan ): 1. Created a new [jcache-off-heap] module and bound it to master ISPN 7 pom.xml 2. Within [jcache-off-heap] created a pom.xml that successfully joins OpenHFT 3.0a and ISPN 7.0.0 APIs. 3. Created an org.infinispan.jcache.JCacheOffHeapImpl place-holder Class that both extends net.openhft.collections.VanillaSharedHashMap and implements org.infinispan.container.DataContainer 4. Created an org.infinispan.jcache.JCacheOffHeapImplTest place-holder that will drive the tests to demonstrate that we have rendered a sound/complete [3]. 5. Confirmed that we can build [3][4] from the ISPN 7 master maven invoke. Before proceeding further, Dmitry and I need to establish from Mircea and RedHat that this module, in this form, will adapt (without concern) into the existing RedHat ISPN 7 root build environment. Ben D. 
Cotton III J.P.Morgan Liquidity Risk Technology 277 Park Ave Desk 08-GG64 New York, NY 10172-0003 212.622.5010 ben.cotton at jpmorgan.com -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028980.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From ben.cotton at ALUMNI.RUTGERS.EDU Sat Mar 8 19:53:31 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Sat, 8 Mar 2014 16:53:31 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> References: <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> Message-ID: <003601cf3b31$e9ea33b0$bdbe9b10$@alumni.rutgers.edu> "How you might wonder?" You got that right! Let me study this. Truth is, neither my first, second, nor third read set off the "light bulb" in my head. "This works because even though I only have 7.7 GB after the OS, I can create a SHM of 137 GB because this only uses 21 MB of actual disk space/memory." Still wondering "How?", Peter. LOL. I mean, I'm sure it is true. But I have no insight into how it is true. Again, let me study the code for this one. From: Peter Lawrey [mailto:peter.lawrey at higherfrequencytrading.com] Sent: Saturday, March 8, 2014 7:41 PM To: ben.cotton at alumni.rutgers.edu Cc: Ben Cotton; Justin P Dildy; Dmitry Gordeev; ml-node+s980875n4028967h94 at n3.nabble.com Subject: Re: [infinispan-dev] Infinispan embedded off-heap cache You might find this example interesting. While SHM is not expandable, this is not as much of a problem as it might seem. 
SHM uses virtual memory and leaves the OS to map it to real memory as required. This means you can over-allocate extents with a very low cost on Linux. https://github.com/OpenHFT/HugeCollections/blob/master/collections/src/test/java/net/openhft/collections/OSResizesMain.java In the example above I create extents for an SHM which is much larger than main memory, and it takes a fraction of a second to do this. How, you might wonder? It prints: System memory= 7.7 GB, Size of map is 137.5 GB, disk used= 21MB. This works because even though I only have 7.7 GB after the OS, I can create a SHM of 137 GB because this only uses 21 MB of actual disk space/memory. You can freely over-allocate the size on the basis that the system only uses the resources it needs. On 8 March 2014 04:06, Ben Cotton wrote: Thanks Peter. The plan is for Dmitry and me to first extend VanillaSharedHashMap and groom it into an ISPN7 join via their DataContainer API bridge. That ExtendedVSHM will be morphed into a fully inter-operable JCACHE operand that will - initially be brokered by the ISPN 7 config (JSR107 <----> VSHMExtendedDataContainer <----> VSHM) - eventually, possibly, be rendered with ExtendedVSHM directly implementing javax.cache.Cache (in addition to DataContainer) On 03/07/2014 11:43 AM, Peter Lawrey wrote: In the medium term I would see SHM supporting a DataContainer. If a Cache were supported I would do it as a layered class so those who don't need the functionality of a Cache don't incur an overhead. On 8 Mar 2014 03:35, "Ben Cotton" wrote: Thank you for this insight Mircea ... Ultimately ... I want the OpenHFT SHM off-heap operand to behave *exactly* like a JCACHE ... Amenable to being soundly/completely operated upon by any/all parts of ISPN7's Impl of the JSR-107 API . Musing openly: Won't that (eventually) necessitate me physically implementing javax.cache.Cache ? 
> Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. I see what you mean. OK, for sure, this sounds much simpler than what I have put on my initial TODO list. Question: Will doing it this way in any manner suggest that my JSR-107 specific operators are being transitively "brokered" by the ISPN config onto my OpenHFT SHM operand? If possible, I want everything to be direct -- no API bridge. Thanks again, Mircea. -Ben and Dmitry Sent from my iPhone On 03/07/2014 09:45 AM, Mircea Markus-2 [via Infinispan Developer List] wrote: Hi Ben, In the diagram provided, the CacheImpl (your class) extends both from javax.cache.Cache and org.infinispan.container.DataContainer. The Cache and DataContainer interfaces are quite different and I anticipate an single class implementing both to be hard to follow and potentially not very efficient. Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. On Mar 3, 2014, at 3:46 PM, cotton-ben <[hidden email] > wrote: > Quick Update: > > It is my understandng that Peter Lawrey will make available an OpenHFT HC > Alpha Release in Maven Central next weekend. At that time, Dmitry Gordeev > and I will take the OpenHFT dependency tag and proceed to build a branch of > Red Hat's ISPN 7 that will treat net.openhft.collections.SharedHashMap as a > Red Hat Infinispan 7 default impl of a fully JSR-107 interoperable off-heap > javax.cache.Cache ... > > A diagram of this build effort can be found here: > https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as- JCACHE-Impl.jpg > ... > > The Red Hat view of his effort will be tracked here: > https://issues.jboss.org/browse/ISPN-871 ... > > The code that defines the Impl will be here > https://github.com/Cotton-Ben/infinispan ... 
> > > > > > -- > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinis pan-embedded-off-heap-cache-tp4026102p4028931.html > Sent from the Infinispan Developer List mailing list archive at Nabble.com. > _______________________________________________ > infinispan-dev mailing list > [hidden email] > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list [hidden email] https://lists.jboss.org/mailman/listinfo/infinispan-dev _____ If you reply to this email, your message will be added to the discussion below: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinis pan-embedded-off-heap-cache-tp4026102p4028967.html To start a new topic under Infinispan Developer List, email ml-node+s980875n2085493h0 at n3.nabble.com To unsubscribe from [infinispan-dev] Infinispan embedded off-heap cache, click here . NAML -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028981.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140308/8e5b65e4/attachment.html From ben.cotton at ALUMNI.RUTGERS.EDU Sat Mar 8 20:13:19 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Sat, 8 Mar 2014 17:13:19 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> References: <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> Message-ID: <006101cf3b34$a8a09c70$f9e1d550$@alumni.rutgers.edu> Ah, I think I see it. In my old System V days, I think we used to call this "your process is eating up all my 'SWAP' ..." ... which of course was *precious* and the entire basis for demand-paged Virtual Memory process internals. So when you say my SHM can grow to 137GB virtual capacity (even on a 7.7GB physical capacity), it is because you are consuming "SWAP", correct? I see I've got 2 new messages from you in my Inbox. Probably the answers. (Even if I've in fact solved this puzzle, I think I should spin again! Or buy a vowel?) :-) From: Ben Cotton [mailto:bendcotton at gmail.com] On Behalf Of ben.cotton at alumni.rutgers.edu Sent: Saturday, March 8, 2014 7:53 PM To: 'Peter Lawrey' Cc: 'Ben Cotton'; 'Justin P Dildy'; 'Dmitry Gordeev'; ml-node+s980875n4028967h94 at n3.nabble.com; ben.cotton at alumni.rutgers.edu Subject: RE: [infinispan-dev] Infinispan embedded off-heap cache "How you might wonder?" You got that right! Let me study this. Truth is, neither my first, second, nor third read set off the "light bulb" in my head. 
This works because even though I only have 7.7 GB after the OS, I can create a SHM of 137 GB because this only uses 21 MB of actual disk space/memory. Still wondering ?How??, Peter. LOL. I?m mean I?m sure it is true . But I have no insight to how it is true. Again, Let me study the code for this one. From: Peter Lawrey [mailto:peter.lawrey at higherfrequencytrading.com] Sent: Saturday, March 8, 2014 7:41 PM To: ben.cotton at alumni.rutgers.edu Cc: Ben Cotton; Justin P Dildy; Dmitry Gordeev; ml-node+s980875n4028967h94 at n3.nabble.com Subject: Re: [infinispan-dev] Infinispan embedded off-heap cache You might find this example interesting. While SHM is not expandable, this is not as much of a problem as it might seem. SHM uses virtual memory and leave the OS to map it to real memory as required. This means you can over allocate extents with a very low cost on Linux. https://github.com/OpenHFT/HugeCollections/blob/master/collections/src/test/ java/net/openhft/collections/OSResizesMain.java In the example above I create extents for an SHM which much larger than main memory and it takes a fraction of a second to do this. How you might wonder? It prints System memory= 7.7 GB, Size of map is 137.5 GB, disk used= 21MB This works because even though I only have 7.7 GB after the OS, I can create a SHM of 137 GB because this only uses 21 MB of actual disk space/memory. You can freely over allocate the size on the basis that the system only uses the resources it needs. On 8 March 2014 04:06, Ben Cotton wrote: Thanks Peter. The plan is for Dmitry and I to at first extend VanillaSharedHashMap and groom it into an ISPN7 join via their DataContainer API bridge. 
That ExtendedVSHM will be morphed into a fully inter-operable JCACHE operand will - initially be brokered by the ISPN 7 config (JSR107 <----> VSHMExtendedDataContainer <----> VSHM) - eventually, possibly, be rendered with ExtendedVSHM directly implementing javax.cache.Cache (in addititon to DataContainer) On 03/07/2014 11:43 AM, Peter Lawrey wrote: In the medium term I would see SHM supporting a DataContainer. If a Cache were supported I would do it as a layered class so those who don't need the functionality of a Cache don't incur an overhead. On 8 Mar 2014 03:35, "Ben Cotton" wrote: Thank you for this insight Mircea ... Ultimately ... I want the OpenHFT SHM off-heap operand to behave *exactly* like a JCACHE ... Amenable to being soundly/completely operated upon by any/all parts of ISPN7's Impl of the JSR-107 API . Musing openly: Won't that (eventually) necessitate me physically implementing javax.cache.Cache ? > Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. I see what you mean. OK, for sure, this sounds much simpler than what I have put on my initial TODO list. Question: Will doing it this way in any manner suggest that my JSR-107 specific operators are being transitively "brokered" by the ISPN config onto my OpenHFT SHM operand? If possible, I want everything to be direct -- no API bridge. Thanks again, Mircea. -Ben and Dmitry Sent from my iPhone On 03/07/2014 09:45 AM, Mircea Markus-2 [via Infinispan Developer List] wrote: Hi Ben, In the diagram provided, the CacheImpl (your class) extends both from javax.cache.Cache and org.infinispan.container.DataContainer. The Cache and DataContainer interfaces are quite different and I anticipate an single class implementing both to be hard to follow and potentially not very efficient. 
Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. On Mar 3, 2014, at 3:46 PM, cotton-ben <[hidden email] > wrote: > Quick Update: > > It is my understandng that Peter Lawrey will make available an OpenHFT HC > Alpha Release in Maven Central next weekend. At that time, Dmitry Gordeev > and I will take the OpenHFT dependency tag and proceed to build a branch of > Red Hat's ISPN 7 that will treat net.openhft.collections.SharedHashMap as a > Red Hat Infinispan 7 default impl of a fully JSR-107 interoperable off-heap > javax.cache.Cache ... > > A diagram of this build effort can be found here: > https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as- JCACHE-Impl.jpg > ... > > The Red Hat view of his effort will be tracked here: > https://issues.jboss.org/browse/ISPN-871 ... > > The code that defines the Impl will be here > https://github.com/Cotton-Ben/infinispan ... > > > > > > -- > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinis pan-embedded-off-heap-cache-tp4026102p4028931.html > Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
> _______________________________________________ > infinispan-dev mailing list > [hidden email] > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list [hidden email] https://lists.jboss.org/mailman/listinfo/infinispan-dev _____ If you reply to this email, your message will be added to the discussion below: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinis pan-embedded-off-heap-cache-tp4026102p4028967.html To start a new topic under Infinispan Developer List, email ml-node+s980875n2085493h0 at n3.nabble.com To unsubscribe from [infinispan-dev] Infinispan embedded off-heap cache, click here . NAML -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028982.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140308/a6ab8ffe/attachment-0001.html From ben.cotton at ALUMNI.RUTGERS.EDU Sat Mar 8 20:21:16 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Sat, 8 Mar 2014 17:21:16 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> References: <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> Message-ID: <007101cf3b35$d3f7eda0$7be7c8e0$@alumni.rutgers.edu> ? 
"When you ask for 100 GB of mapped memory, Linux doesn't need to allocate any pages to it; you just change the virtual memory table of your process. (This must in turn use some memory, but the overhead is low.)" Got it! Thanks Peter, it's coming back to me (been a while since I last studied System V process subsystem internals). From: Peter Lawrey [mailto:peter.lawrey at higherfrequencytrading.com] Sent: Saturday, March 8, 2014 8:03 PM To: ben.cotton at alumni.rutgers.edu Cc: Ben Cotton; Justin P Dildy; Dmitry Gordeev Subject: Re: [infinispan-dev] Infinispan embedded off-heap cache When you ask for 100 GB of mapped memory, Linux doesn't need to allocate any pages to it; you just change the virtual memory table of your process. (This must in turn use some memory, but the overhead is low.) e.g. even in plain NIO, you can create a 1 GB mapping to a new file. At this point it doesn't need to allocate any memory, nor disk space, as you haven't used any of it yet. Say you write to every 1 MB of this 1 GB file, i.e. 1024 times. What happens? Well, it needs to allocate at least one page, i.e. 4 KB, for each write (even if just a byte). At this point the file will be 4 KB * 1024, i.e. only 4 MB of memory and disk space is used. If you plan for this, you can optimise the structure to only use the pages needed as much as possible. In the example I gave, each entry has up to 64 KB. This means each entry uses 64 KB of *address space*, not memory, not disk space. So if you write only 1 KB, it has to use a page, 4 KB, but not the whole 64 KB. i.e. once you make the maximum size well over 4 KB, you may as well go for broke because it won't make much difference to the memory or disk space used. BTW, this is how Linux/UNIX behaves; Windows is lame this way and over-committing memory doesn't work so well. On 9 March 2014 11:52, wrote: "How you might wonder?" You got that right! Let me study this. Truth is, neither my first, second, nor third read set off the "light bulb" in my head. 
This works because even though I only have 7.7 GB after the OS, I can create a SHM of 137 GB because this only uses 21 MB of actual disk space/memory. Still wondering ?How??, Peter. LOL. I?m mean I?m sure it is true . But I have no insight to how it is true. Again, Let me study the code for this one. From: Peter Lawrey [mailto:peter.lawrey at higherfrequencytrading.com] Sent: Saturday, March 8, 2014 7:41 PM To: ben.cotton at alumni.rutgers.edu Cc: Ben Cotton; Justin P Dildy; Dmitry Gordeev; ml-node+s980875n4028967h94 at n3.nabble.com Subject: Re: [infinispan-dev] Infinispan embedded off-heap cache You might find this example interesting. While SHM is not expandable, this is not as much of a problem as it might seem. SHM uses virtual memory and leave the OS to map it to real memory as required. This means you can over allocate extents with a very low cost on Linux. https://github.com/OpenHFT/HugeCollections/blob/master/collections/src/test/ java/net/openhft/collections/OSResizesMain.java In the example above I create extents for an SHM which much larger than main memory and it takes a fraction of a second to do this. How you might wonder? It prints System memory= 7.7 GB, Size of map is 137.5 GB, disk used= 21MB This works because even though I only have 7.7 GB after the OS, I can create a SHM of 137 GB because this only uses 21 MB of actual disk space/memory. You can freely over allocate the size on the basis that the system only uses the resources it needs. On 8 March 2014 04:06, Ben Cotton wrote: Thanks Peter. The plan is for Dmitry and I to at first extend VanillaSharedHashMap and groom it into an ISPN7 join via their DataContainer API bridge. 
That ExtendedVSHM will be morphed into a fully inter-operable JCACHE operand will - initially be brokered by the ISPN 7 config (JSR107 <----> VSHMExtendedDataContainer <----> VSHM) - eventually, possibly, be rendered with ExtendedVSHM directly implementing javax.cache.Cache (in addititon to DataContainer) On 03/07/2014 11:43 AM, Peter Lawrey wrote: In the medium term I would see SHM supporting a DataContainer. If a Cache were supported I would do it as a layered class so those who don't need the functionality of a Cache don't incur an overhead. On 8 Mar 2014 03:35, "Ben Cotton" wrote: Thank you for this insight Mircea ... Ultimately ... I want the OpenHFT SHM off-heap operand to behave *exactly* like a JCACHE ... Amenable to being soundly/completely operated upon by any/all parts of ISPN7's Impl of the JSR-107 API . Musing openly: Won't that (eventually) necessitate me physically implementing javax.cache.Cache ? > Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. I see what you mean. OK, for sure, this sounds much simpler than what I have put on my initial TODO list. Question: Will doing it this way in any manner suggest that my JSR-107 specific operators are being transitively "brokered" by the ISPN config onto my OpenHFT SHM operand? If possible, I want everything to be direct -- no API bridge. Thanks again, Mircea. -Ben and Dmitry Sent from my iPhone On 03/07/2014 09:45 AM, Mircea Markus-2 [via Infinispan Developer List] wrote: Hi Ben, In the diagram provided, the CacheImpl (your class) extends both from javax.cache.Cache and org.infinispan.container.DataContainer. The Cache and DataContainer interfaces are quite different and I anticipate an single class implementing both to be hard to follow and potentially not very efficient. 
Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. On Mar 3, 2014, at 3:46 PM, cotton-ben <[hidden email] > wrote: > Quick Update: > > It is my understandng that Peter Lawrey will make available an OpenHFT HC > Alpha Release in Maven Central next weekend. At that time, Dmitry Gordeev > and I will take the OpenHFT dependency tag and proceed to build a branch of > Red Hat's ISPN 7 that will treat net.openhft.collections.SharedHashMap as a > Red Hat Infinispan 7 default impl of a fully JSR-107 interoperable off-heap > javax.cache.Cache ... > > A diagram of this build effort can be found here: > https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as- JCACHE-Impl.jpg > ... > > The Red Hat view of his effort will be tracked here: > https://issues.jboss.org/browse/ISPN-871 ... > > The code that defines the Impl will be here > https://github.com/Cotton-Ben/infinispan ... > > > > > > -- > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinis pan-embedded-off-heap-cache-tp4026102p4028931.html > Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
> _______________________________________________ > infinispan-dev mailing list > [hidden email] > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list [hidden email] https://lists.jboss.org/mailman/listinfo/infinispan-dev _____ If you reply to this email, your message will be added to the discussion below: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinis pan-embedded-off-heap-cache-tp4026102p4028967.html To start a new topic under Infinispan Developer List, email ml-node+s980875n2085493h0 at n3.nabble.com To unsubscribe from [infinispan-dev] Infinispan embedded off-heap cache, click here . NAML -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028983.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. -------------- next part -------------- An HTML attachment was scrubbed... 
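[Editor's sketch] Peter's plain-NIO point in the message above — map a region far larger than what you write, touch one byte per MB, and only the touched ~4 KB pages get backing — can be reproduced with the standard `java.nio` API. This is a minimal illustration, not OpenHFT code; the class and method names are mine. Note the mapping grows the file to its full logical length, while on sparse-file-capable filesystems only the faulted pages consume real disk space.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class SparseMappingDemo {
    // Map `size` bytes of a temp file, write one byte per MB, and return the
    // resulting file length. The OS only allocates the ~4 KB pages we touch,
    // so the mapping costs address space, not memory, until pages are written.
    static long touchEveryMb(long size) throws Exception {
        File f = File.createTempFile("sparse", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            // Mapping extends the file to `size` without writing data pages.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (long off = 0; off < size; off += 1 << 20) {
                map.put((int) off, (byte) 1); // faults in one 4 KB page per MB
            }
            return raf.length(); // logical length == full mapped size
        }
    }

    public static void main(String[] args) throws Exception {
        long size = 1L << 30; // ask for 1 GB of address space
        System.out.println("file length = " + touchEveryMb(size));
    }
}
```

Running this writes 1024 bytes in total, yet the file's logical length is the full 1 GB — the same effect, at smaller scale, as the 137.5 GB map with 21 MB of disk used in Peter's OSResizesMain output.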
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140308/70ab25c8/attachment-0001.html From ben.cotton at ALUMNI.RUTGERS.EDU Sun Mar 9 19:58:06 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Sun, 9 Mar 2014 16:58:06 -0700 (PDT) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <007101cf3b35$d3f7eda0$7be7c8e0$@alumni.rutgers.edu> References: <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> <007101cf3b35$d3f7eda0$7be7c8e0$@alumni.rutgers.edu> Message-ID: <1394409486169-4028984.post@n3.nabble.com> Upgraded to OpenHFT 3.0b, stubbed out JCacheOffHeapImpl with 'super.XXX()' invokes onto Peter's VanillaSharedHashMap impl. TODO: morph get()/peek()/remove() to return RedHat ICE. TODO: confirm from Mircea/RedHat that the [jcache-off-heap] build module is an appropriate ISPN 7 join point. Cotton-Ben/infinispan at d9f408d -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028984.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From galder at redhat.com Mon Mar 10 07:00:52 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Mar 2014 12:00:52 +0100 Subject: [infinispan-dev] Permission to list you as our contact In-Reply-To: <53198C06.9020804@oracle.com> References: <53198C06.9020804@oracle.com> Message-ID: Hi Rory, On 07 Mar 2014, at 10:06, Rory O'Donnell Oracle, Dublin Ireland wrote: > Hi Galder, > > The Adopt OpenJDK Group are promoting the testing of FOSS projects with OpenJDK builds, > whether their own, or from someone else.
We want to acknowledge projects who are actively > testing, providing feedback and any issues they have found during their testing etc. > > A draft of the page is now available here > > Is it ok to add your name as our contact, and is there a mailing list I should copy? Yeah, not a problem to add my name/contact. Mailing-list-wise, you can use infinispan-dev at lists.jboss.org, which is the development list of the Infinispan project I work for. Cheers, > > Rgds, Rory > -- > Rgds, Rory O'Donnell > Quality Engineering Manager > Oracle EMEA, Dublin, Ireland > -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Mar 10 08:00:24 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Mar 2014 13:00:24 +0100 Subject: [infinispan-dev] Cluster Listeners 7.0.0.Alpha1 In-Reply-To: References: Message-ID: <6C2A3B09-16FB-4BD1-9686-D33C7EB5C6D0@redhat.com> Great write-up Will :)) On 05 Mar 2014, at 16:02, William Burns wrote: > Hello everyone, > > I am happy to announce that the latest Infinispan 7.0.0.Alpha1 build > has the first pass of Cluster Listeners implemented. > > You can read all about the details at the blog post [1]. > > You can get the latest build of Infinispan from our site [2]. > > Try it out and let us know what you guys think. > > Happy listening!
> > - Will > > > [1] http://blog.infinispan.org/2014/03/embedded-cluster-listeners-in.html > [2] http://infinispan.org/download/ > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Mon Mar 10 08:06:47 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 10 Mar 2014 12:06:47 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? Message-ID: Hi all, we are exposing a nice feature inherited from the Search engine via the "simple" DSL version, the one which is also available via Hot Rod: org.infinispan.query.dsl.Query.getResultSize() To be fair I hadn't noticed we do expose this, I just noticed after a recent PR review and I found it surprising. This method returns the size of the full resultset, disregarding pagination options; you can imagine it fit for situations like: "found 6 million matches, these are the top 20: " A peculiarity of Hibernate Search is that the total number of matches is extremely cheap to figure out as it's generally a side effect of finding the 20 results. Essentially we're just exposing an int value which was already computed: very cheap, and happens to be useful in practice. This is not the case with a SQL statement: there you'd have to craft 2 different SQL statements, often incurring the cost of 2 round trips to the database. So this getResultSize() is not available on the Hibernate ORM Query, only on our FullTextQuery extension. Now my doubt is if it is indeed a wise move to expose this method on the simplified DSL.
Of course some people might find it useful, still I'm wondering how much we'll be swearing at needing to maintain this feature vs its usefulness when we'll implement alternative execution engines to run queries, not least on Map/Reduce based filtering, and ultimately hybrid strategies. In case of Map/Reduce I think we'll need to keep track of possible de-duplication of results, in case of a Teiid integration it might need a second expensive query; so in this case I'd expect this method to be lazily evaluated. Should we rather remove this functionality? Sanne From anistor at redhat.com Mon Mar 10 09:16:18 2014 From: anistor at redhat.com (Adrian Nistor) Date: Mon, 10 Mar 2014 15:16:18 +0200 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: References: Message-ID: <531DBB22.5010303@redhat.com> I'd vote for keeping it, and executing it lazily in environments where it is costly to compute it upfront. And of course, document this properly so users will be aware it can incur a second execution, with significant performance impact and also possibly a data visibility/consistency impact. I'd do this because the API is meant to be first of all user friendly and useful, not just machine friendly and efficient. There's another reason for having it. Say we remove it, how will users be able to know the total number of matching results? Our DSL does not currently have a 'count' function. Maybe we should add such a thing first, and then think about removing Query.getResultSize(). But, if we implement a proper 'count', getResultSize() could be trivially implemented as some kind of syntactic sugar on top of it, so I would still consider it worth being in the API.
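The semantics under discussion, a total-match count that ignores pagination and falls out of executing the query at no extra cost, can be pinned down with a small self-contained model. This is plain Java for illustration only: `ToyQuery`-style names such as `PagedResult` and `run` are invented here and are not part of Infinispan's Query DSL.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ResultSizeSketch {
    // Invented result holder: a page of hits plus the total match count.
    static final class PagedResult<T> {
        final List<T> page;      // the requested window
        final int resultSize;    // total matches, ignoring pagination (what getResultSize() reports)
        PagedResult(List<T> page, int resultSize) { this.page = page; this.resultSize = resultSize; }
    }

    // The total count is a byproduct of evaluating the filter, as with Lucene and
    // Hibernate Search: no second query is needed to obtain it.
    static <T> PagedResult<T> run(List<T> data, Predicate<T> filter, int startOffset, int maxResults) {
        List<T> matches = data.stream().filter(filter).collect(Collectors.toList());
        if (startOffset >= matches.size()) {
            return new PagedResult<>(Collections.emptyList(), matches.size());
        }
        int to = Math.min(matches.size(), startOffset + maxResults);
        return new PagedResult<>(matches.subList(startOffset, to), matches.size());
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        PagedResult<Integer> r = run(data, n -> n % 2 == 0, 0, 20);
        // The "found N matches, these are the top 20" shape from the thread.
        System.out.println("resultSize=" + r.resultSize + ", page=" + r.page.size());
    }
}
```

In a SQL engine the equivalent would be a second `SELECT COUNT(*)` round trip, which is exactly the asymmetry Sanne describes between Hibernate Search's FullTextQuery and plain ORM queries.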
And then it all boils down to the question: /should the DSL provide a count function?/ (+1 from me) Cheers On 03/10/2014 02:23 PM, Sanne Grinovero wrote: > Hi all, > we are exposing a nice feature inherited from the Search engine via > the "simple" DSL version, the one which is also available via Hot Rod: > > org.infinispan.query.dsl.Query.getResultSize() > > To be fair I hadn't noticed we do expose this, I just noticed after a > recent PR review and I found it surprising. > > This method returns the size of the full resultset, disregarding > pagination options; you can imagine it fit for situations like: > > "found 6 million matches, these are the top 20: " > > A peculiarity of Hibernate Search is that the total number of matches > is extremely cheap to figure out as it's generally a side effect of > finding the 20 results. Essentially we're just exposing an int value > which was already computed: very cheap, and happens to be useful in > practice. > > This is not the case with a SQL statement, in this case you'd have to > craft 2 different SQL statements, often incurring the cost of 2 round > trips to the database. So this getResultSize() is not available on the > Hibernate ORM Query, only on our FullTextQuery extension. > > Now my doubt is if it is indeed a wise move to expose this method on > the simplified DSL. Of course some people might find it useful, still > I'm wondering how much we'll be swearing at needing to maintain this > feature vs its usefulness when we'll implement alternative execution > engines to run queries, not least on Map/Reduce based filtering, and > ultimately hybrid strategies. > > In case of Map/Reduce I think we'll need to keep track of possible > de-duplication of results, in case of a Teiid integration it might > need a second expensive query; so in this case I'd expect this method > to be lazily evaluated. > > Should we rather remove this functionality? 
> > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140310/8564cc47/attachment.html From sanne at infinispan.org Mon Mar 10 11:12:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 10 Mar 2014 15:12:35 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <531DBB22.5010303@redhat.com> References: <531DBB22.5010303@redhat.com> Message-ID: Ok you make some good points, and I've no doubts of it being useful. My only concern is that this could slow us down significantly in providing other features which might be even more useful or pressing. You have to pick your battles and be wise on where to spend energy first. Considering that it's easier to add methods than to remove them, what would you think of marking this as experimental for now? I'd prefer to see the non-indexed query engine delivered first; this sounds like a stone on the critical path, so it might be wise to have the option to drop the requirement from a first implementation. Definitely you're right that we should then implement "some" COUNT strategy, I'm just not comfortable in committing on this one yet. Now on a general-purpose COUNT: for sure we need one, but it's a Pandora's box you're opening. In a sense there is a conceptual parallelism with my concerns on the API contract we provide for the clear() method. To keep it short in this context as we're changing subject: I don't think we'll ever be able to provide a solid guarantee of a fully reliable value: indexes are not updated in transaction yet, and M/R does cross boundaries of nodes and datacontainer/cachestore without making a consistent read snapshot.
We should document any such API as providing a best-effort estimate. On 10 March 2014 13:16, Adrian Nistor wrote: > I'd vote for keeping it, and executing it lazily in environments where it is > costly to compute it upfront. > > And of course, document this properly so users will be aware it can incur a > second execution, with significant performance impact and also possibly a > data visibility/consistency impact. I'd do this because the API is meant to > be first of all user friendly and useful, not just machine friendly and > efficient. > > There's another reason for having it. Say we remove it, how will users be > able to know the total number of matching results? Our DSL does not > currently have a 'count' function. Maybe we should add such a thing first, > and then think about removing Query.getResultSize(). > > But, if we implement a proper 'count', getResultSize() could be trivially > implemented as some kind of syntactic sugar on top of it, so I would still > consider it worth being in the API. > > And then it all boils down to the question: should the DSL provide a count > function? (+1 from me) > > Cheers > > > On 03/10/2014 02:23 PM, Sanne Grinovero wrote: > > Hi all, > we are exposing a nice feature inherited from the Search engine via > the "simple" DSL version, the one which is also available via Hot Rod: > > org.infinispan.query.dsl. > Query.getResultSize() > > To be fair I hadn't noticed we do expose this, I just noticed after a > recent PR review and I found it surprising. > > This method returns the size of the full resultset, disregarding > pagination options; you can imagine it fit for situations like: > > "found 6 million matches, these are the top 20: " > > A peculiarity of Hibernate Search is that the total number of matches > is extremely cheap to figure out as it's generally a side effect of > finding the 20 results.
Essentially we're just exposing an int value > which was already computed: very cheap, and happens to be useful in > practice. > > This is not the case with a SQL statement, in this case you'd have to > craft 2 different SQL statements, often incurring the cost of 2 round > trips to the database. So this getResultSize() is not available on the > Hibernate ORM Query, only on our FullTextQuery extension. > > Now my doubt is if it is indeed a wise move to expose this method on > the simplified DSL. Of course some people might find it useful, still > I'm wondering how much we'll be swearing at needing to maintain this > feature vs its usefulness when we'll implement alternative execution > engines to run queries, not least on Map/Reduce based filtering, and > ultimately hybrid strategies. > > In case of Map/Reduce I think we'll need to keep track of possible > de-duplication of results, in case of a Teiid integration it might > need a second expensive query; so in this case I'd expect this method > to be lazily evaluated. > > Should we rather remove this functionality? > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Mon Mar 10 13:06:24 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 10 Mar 2014 17:06:24 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <531DBB22.5010303@redhat.com> References: <531DBB22.5010303@redhat.com> Message-ID: <075346A7-1E34-45D2-970B-912E1142A215@redhat.com> On Mar 10, 2014, at 13:16, Adrian Nistor wrote: > I'd vote for keeping it, and executing it lazily in environments where it is costly to compute it upfront. 
from a user perspective I see a lot of reason to have this in. > > And off course, document this properly so users will be aware it can incur a second execution, with significant performance impact and also possibly a data visibility/consistency impact. I'd do this because the api is meant to be first of all user friendly and useful, not just machine friendly and efficient. > > There's another reason for having it. Say we remove it, how will users be able to know the total number of matching results? Our DSL does not currently have a 'count' function. Maybe we should add such a thing first, and then think about removing Query.getResultsSize(). > > But, if we implement a proper 'count', getResultsSize() could be trivially implemented as some kind of syntactic sugar on top of it, so I would still consider it worth being in the API. > > And then it all boils down to the question: should the DSL provide a count function? (+1 from me) > > Cheers > > On 03/10/2014 02:23 PM, Sanne Grinovero wrote: >> Hi all, >> we are exposing a nice feature inherited from the Search engine via >> the "simple" DSL version, the one which is also available via Hot Rod: >> >> org.infinispan.query.dsl.Query.getResultSize() >> >> To be fair I hadn't noticed we do expose this, I just noticed after a >> recent PR review and I found it surprising. >> >> This method returns the size of the full resultset, disregarding >> pagination options; you can imagine it fit for situations like: >> >> "found 6 million matches, these are the top 20: " >> >> A peculiarity of Hibernate Search is that the total number of matches >> is extremely cheap to figure out as it's generally a side effect of >> finding the 20 results. Essentially we're just exposing an int value >> which was already computed: very cheap, and happens to be useful in >> practice. 
>> >> This is not the case with a SQL statement, in this case you'd have to >> craft 2 different SQL statements, often incurring the cost of 2 round >> trips to the database. So this getResultSize() is not available on the >> Hibernate ORM Query, only on our FullTextQuery extension. >> >> Now my doubt is if it is indeed a wise move to expose this method on >> the simplified DSL. Of course some people might find it useful, still >> I'm wondering how much we'll be swearing at needing to maintain this >> feature vs its usefulness when we'll implement alternative execution >> engines to run queries, not least on Map/Reduce based filtering, and >> ultimately hybrid strategies. >> >> In case of Map/Reduce I think we'll need to keep track of possible >> de-duplication of results, in case of a Teiid integration it might >> need a second expensive query; so in this case I'd expect this method >> to be lazily evaluated. >> >> Should we rather remove this functionality? >> >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Mar 10 13:09:29 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 10 Mar 2014 17:09:29 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: References: <531DBB22.5010303@redhat.com> Message-ID: <527B79F0-5C20-4B98-92FB-4FC15E4EB8C8@redhat.com> On Mar 10, 2014, at 15:12, Sanne Grinovero wrote: > Ok you make some good points, and I've no doubts of it being useful. 
> > My only concern is that this could slow us down significantly in > providing other features which might be even more useful or pressing. > You have to pick your battles and be wise on where to spend energy > first. > > Considering that it's easier to add methods than to remove them, what > would you think of marking this as experimental for now? > I'd prefer to see the non-indexed query engine delivered first; this > sounds like being a stone on the critical path so it might be wise to > have the option to drop the requirement from a first implementation. > Definitely you're right that we should then implement "some" COUNT > strategy, I'm just not comfortable in committing on this one yet. I can imagine a lot of users emulating this by simply iterating over the entries in the result set. Even if we do just that and document it as slow, I think it's still worth exposing this somewhere. > > Now on a general purpose COUNT: for sure we need one but it's a > pandora's box you're opening. In a sense there is a parallelism > conceptually with my concerns on the API contract we provide for the > clear() method. too keep it short in this context as we're changing > subject, I don't think we'll ever be able to provide a solid guarantee > of a fully reliable value: indexes are not updated in transaction yet, > and M/R does cross boundaries of nodes and datacontainer/cachestore > without making a consistent read snapshot. We should document any such > API as to providing a best effort estimate. > > > > On 10 March 2014 13:16, Adrian Nistor wrote: >> I'd vote for keeping it, and executing it lazily in environments where it is >> costly to compute it upfront. >> >> And off course, document this properly so users will be aware it can incur a >> second execution, with significant performance impact and also possibly a >> data visibility/consistency impact. 
I'd do this because the api is meant to >> be first of all user friendly and useful, not just machine friendly and >> efficient. >> >> There's another reason for having it. Say we remove it, how will users be >> able to know the total number of matching results? Our DSL does not >> currently have a 'count' function. Maybe we should add such a thing first, >> and then think about removing Query.getResultsSize(). >> >> But, if we implement a proper 'count', getResultsSize() could be trivially >> implemented as some kind of syntactic sugar on top of it, so I would still >> consider it worth being in the API. >> >> And then it all boils down to the question: should the DSL provide a count >> function? (+1 from me) >> >> Cheers >> >> >> On 03/10/2014 02:23 PM, Sanne Grinovero wrote: >> >> Hi all, >> we are exposing a nice feature inherited from the Search engine via >> the "simple" DSL version, the one which is also available via Hot Rod: >> >> org.infinispan.query.dsl. >> Query.getResultSize() >> >> To be fair I hadn't noticed we do expose this, I just noticed after a >> recent PR review and I found it surprising. >> >> This method returns the size of the full resultset, disregarding >> pagination options; you can imagine it fit for situations like: >> >> "found 6 million matches, these are the top 20: " >> >> A peculiarity of Hibernate Search is that the total number of matches >> is extremely cheap to figure out as it's generally a side effect of >> finding the 20 results. Essentially we're just exposing an int value >> which was already computed: very cheap, and happens to be useful in >> practice. >> >> This is not the case with a SQL statement, in this case you'd have to >> craft 2 different SQL statements, often incurring the cost of 2 round >> trips to the database. So this getResultSize() is not available on the >> Hibernate ORM Query, only on our FullTextQuery extension. 
>> >> Now my doubt is if it is indeed a wise move to expose this method on >> the simplified DSL. Of course some people might find it useful, still >> I'm wondering how much we'll be swearing at needing to maintain this >> feature vs its usefulness when we'll implement alternative execution >> engines to run queries, not least on Map/Reduce based filtering, and >> ultimately hybrid strategies. >> >> In case of Map/Reduce I think we'll need to keep track of possible >> de-duplication of results, in case of a Teiid integration it might >> need a second expensive query; so in this case I'd expect this method >> to be lazily evaluated. >> >> Should we rather remove this functionality? >> >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ben.cotton at ALUMNI.RUTGERS.EDU Mon Mar 10 19:43:02 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Mon, 10 Mar 2014 16:43:02 -0700 (PDT) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1394409486169-4028984.post@n3.nabble.com> References: <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> <007101cf3b35$d3f7eda0$7be7c8e0$@alumni.rutgers.edu> 
<1394409486169-4028984.post@n3.nabble.com> Message-ID: <1394494982252-4028993.post@n3.nabble.com> As per Mircea, I moved my Fork's [jcache-off-heap] ISPN7 module to the new name: [off-heap] ISPN7 module. We will now use the org.infinispan.container.DataContainer API to cross the "Red Hat bridge" to JCACHE. https://github.com/Cotton-Ben/infinispan/commit/1ea7b859fabe181fa453f15ce1f746dd68691d32 -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028993.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From galder at redhat.com Tue Mar 11 04:46:11 2014 From: galder at redhat.com (Galder Zamarreño) Date: Tue, 11 Mar 2014 09:46:11 +0100 Subject: [infinispan-dev] Thirty Percent Feedback Message-ID: Emmanuel posted this on the Hibernate list, some interesting ideas in it :) http://blog.42floors.com/thirty-percent-feedback/ Cheers, -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Tue Mar 11 04:47:07 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 11 Mar 2014 08:47:07 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <527B79F0-5C20-4B98-92FB-4FC15E4EB8C8@redhat.com> References: <531DBB22.5010303@redhat.com> <527B79F0-5C20-4B98-92FB-4FC15E4EB8C8@redhat.com> Message-ID: On 10 March 2014 17:09, Mircea Markus wrote: > > On Mar 10, 2014, at 15:12, Sanne Grinovero wrote: > >> Ok you make some good points, and I've no doubts of it being useful. >> >> My only concern is that this could slow us down significantly in >> providing other features which might be even more useful or pressing. >> You have to pick your battles and be wise on where to spend energy >> first.
>> >> Considering that it's easier to add methods than to remove them, what >> would you think of marking this as experimental for now? >> I'd prefer to see the non-indexed query engine delivered first; this >> sounds like a stone on the critical path, so it might be wise to >> have the option to drop the requirement from a first implementation. >> Definitely you're right that we should then implement "some" COUNT >> strategy, I'm just not comfortable in committing on this one yet. > > I can imagine a lot of users emulating this by simply iterating over the entries in the result set. Even if we do just that and document it as slow, I think it's still worth exposing this somewhere. I'm not questioning that it's useful. But the implementation is tricky: for example, simply iterating would require a global (and distributed) lock to be accurate. Otherwise we can only document it as an approximation, and worse, we can't even estimate an error margin: under steady load it would probably be a reasonable estimation, but there are corner cases in which you can get off by several orders of magnitude. Among others, your assumptions would need to include: - no nodes failing (no state transfers happening) - no write spikes (probably one of the best reasons to deploy Infinispan is to be able to absorb spikes) - no wild churning across datastore/cachestores - expirations happening in a homogeneous pattern (this entirely depends on the use case) Also there is no way to make this work in the context of a transaction, as it would pretty much violate any promise of repeatable-read properties. So the question is whether a user would still consider it useful after (hopefully) understanding all the strings attached. In other contexts we discussed the need for Infinispan to provide something like a snapshot capability based on TOA. If we had that, we could implement a count operation on top of it.
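The consistency caveat above can be demonstrated in miniature: a cursor-style scan that runs while writes land observes a count that matches neither the size before the scan nor the size after it. This is a single-threaded, deterministic simulation in plain Java (no Infinispan APIs, no locks), using TreeMap navigation instead of an iterator so that mid-scan puts are legal:

```java
import java.util.TreeMap;

public class CountDriftSketch {
    public static void main(String[] args) {
        TreeMap<Integer, String> store = new TreeMap<>();
        for (int i = 0; i < 10; i++) store.put(i, "v" + i);   // 10 entries before the scan

        int sizeBefore = store.size();
        int observed = 0;
        Integer k = store.firstKey();
        while (k != null) {
            observed++;
            if (k == 5) {
                // Writes arriving mid-scan: one behind the cursor (never counted),
                // one ahead of it (counted on the way out).
                store.put(-1, "late-behind");
                store.put(100, "late-ahead");
            }
            k = store.higherKey(k);  // cursor-style iteration tolerates concurrent puts
        }
        int sizeAfter = store.size();

        // The scan reports a value that is neither the old size nor the new one.
        System.out.println("before=" + sizeBefore + " observed=" + observed + " after=" + sizeAfter);
    }
}
```

Here the scan reports 11 entries while the map held 10 before the scan and 12 after it; without a consistent snapshot (or a global lock), "the count" is only ever a best-effort estimate.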
Sanne From mmarkus at redhat.com Tue Mar 11 06:44:51 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 11 Mar 2014 10:44:51 +0000 Subject: [infinispan-dev] Thirty Percent Feedback In-Reply-To: References: Message-ID: On Mar 11, 2014, at 8:46, Galder Zamarreño wrote: > Emmanuel posted this on the Hibernate list, some interesting ideas in it :) > http://blog.42floors.com/thirty-percent-feedback/ Very interesting Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From anistor at redhat.com Tue Mar 11 09:11:12 2014 From: anistor at redhat.com (Adrian Nistor) Date: Tue, 11 Mar 2014 15:11:12 +0200 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: References: <531DBB22.5010303@redhat.com> <527B79F0-5C20-4B98-92FB-4FC15E4EB8C8@redhat.com> Message-ID: <531F0B70.7010003@redhat.com> I think the same technical trickiness is also required for implementing pagination, which would probably also suffer from the same limitations/approximations. Should we remove pagination from the API too? On 03/11/2014 10:47 AM, Sanne Grinovero wrote: > On 10 March 2014 17:09, Mircea Markus wrote: >> On Mar 10, 2014, at 15:12, Sanne Grinovero wrote: >> >>> Ok you make some good points, and I've no doubts of it being useful. >>> >>> My only concern is that this could slow us down significantly in >>> providing other features which might be even more useful or pressing. >>> You have to pick your battles and be wise on where to spend energy >>> first. >>> >>> Considering that it's easier to add methods than to remove them, what >>> would you think of marking this as experimental for now? >>> I'd prefer to see the non-indexed query engine delivered first; this >>> sounds like a stone on the critical path, so it might be wise to >>> have the option to drop the requirement from a first implementation.
>>> Definitely you're right that we should then implement "some" COUNT >>> strategy, I'm just not comfortable in committing on this one yet. >> I can imagine a lot of users emulating this by simply iterating over the entries in the result set. Even if we do just that and document it as slow, I think it's still worth exposing this somewhere. > I'm not questioning it to be useful. But the implementation is tricky, > for example simply iterating would require a global (and distributed) > lock to be accurate. > Otherwise we can only document it as an approximation, and worst is we > can't even estimate an error margin: under steady load it would > probably be a reasonable estimation, but there are corner cases in > which you can get off by several orders of magnitude. > Among others, your assumptions would need to include: > - no nodes failing (no state transfers happening) > - no write spikes (probably one of the best reasons to deploy > infinispan is to be able to absorb spikes) > - no wild churning across datastore/cachestores > - expirations happening in a homogeneus pattern (this entirely > depends on the use case) > > Also there is no way to make this work in the context of a > transaction, as it would pretty much violate any promise of repeatable > read properties. > > So the question is, if a user would still consider it useful after > (hopefully) understanding all the attached strings. > > In other contexts we discussed the need for Infinispan to provide > something like a snapshot capability based on TOA. If we had that, we > could implement a count operation on top of it. 
> > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From anistor at redhat.com Tue Mar 11 09:28:35 2014 From: anistor at redhat.com (Adrian Nistor) Date: Tue, 11 Mar 2014 15:28:35 +0200 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <531F0B70.7010003@redhat.com> References: <531DBB22.5010303@redhat.com> <527B79F0-5C20-4B98-92FB-4FC15E4EB8C8@redhat.com> <531F0B70.7010003@redhat.com> Message-ID: <531F0F83.6040308@redhat.com> To be more precise, by pagination I'm referring to methods QueryBuilder.startOffset/maxResults. Since our remote protocols are stateless we would need to re-execute the query to fetch the next page. That leads to the same problem of approximation. On 03/11/2014 03:11 PM, Adrian Nistor wrote: > I think that technical trickiness is also required for implementing > pagination, and would probably also suffer from the same > limitations/approximations. > Should we remove pagination too from the API? > > On 03/11/2014 10:47 AM, Sanne Grinovero wrote: >> On 10 March 2014 17:09, Mircea Markus wrote: >>> On Mar 10, 2014, at 15:12, Sanne Grinovero wrote: >>> >>>> Ok you make some good points, and I've no doubts of it being useful. >>>> >>>> My only concern is that this could slow us down significantly in >>>> providing other features which might be even more useful or pressing. >>>> You have to pick your battles and be wise on where to spend energy >>>> first. >>>> >>>> Considering that it's easier to add methods than to remove them, what >>>> would you think of marking this as experimental for now? >>>> I'd prefer to see the non-indexed query engine delivered first; this >>>> sounds like being a stone on the critical path so it might be wise to >>>> have the option to drop the requirement from a first implementation. 
>>>> Definitely you're right that we should then implement "some" COUNT >>>> strategy, I'm just not comfortable in committing on this one yet. >>> I can imagine a lot of users emulating this by simply iterating over the entries in the result set. Even if we do just that and document it as slow, I think it's still worth exposing this somewhere. >> I'm not questioning it to be useful. But the implementation is tricky, >> for example simply iterating would require a global (and distributed) >> lock to be accurate. >> Otherwise we can only document it as an approximation, and worst is we >> can't even estimate an error margin: under steady load it would >> probably be a reasonable estimation, but there are corner cases in >> which you can get off by several orders of magnitude. >> Among others, your assumptions would need to include: >> - no nodes failing (no state transfers happening) >> - no write spikes (probably one of the best reasons to deploy >> infinispan is to be able to absorb spikes) >> - no wild churning across datastore/cachestores >> - expirations happening in a homogeneus pattern (this entirely >> depends on the use case) >> >> Also there is no way to make this work in the context of a >> transaction, as it would pretty much violate any promise of repeatable >> read properties. >> >> So the question is, if a user would still consider it useful after >> (hopefully) understanding all the attached strings. >> >> In other contexts we discussed the need for Infinispan to provide >> something like a snapshot capability based on TOA. If we had that, we >> could implement a count operation on top of it. 
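Adrian's point about stateless pagination can be made concrete with a toy sketch (a hypothetical class; startOffset/maxResults mirror the QueryBuilder parameter names, and the backing list stands in for a fresh query execution):

```java
import java.util.List;

final class StatelessPager {
    private final List<String> matches; // stand-in for re-running the query

    StatelessPager(List<String> matches) {
        this.matches = matches;
    }

    // Each page request re-executes the whole query and slices one page,
    // so two requests may observe different data if the cache changed
    // in between: the same approximation problem as getResultSize().
    List<String> page(int startOffset, int maxResults) {
        int from = Math.min(startOffset, matches.size());
        int to = Math.min(from + maxResults, matches.size());
        return matches.subList(from, to);
    }
}
```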
>> >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rhauch at redhat.com Tue Mar 11 10:37:57 2014 From: rhauch at redhat.com (Randall Hauch) Date: Tue, 11 Mar 2014 09:37:57 -0500 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: References: Message-ID: I'm struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). Yet this still does not satisfy all cases. Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive to calculate. For example, consider a system that has to take into account access control limitations of the user. My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results). An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). So one option is to expose both methods, but allow the exact size method to return -1 if the system can't determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. 
The approximate size method probably always needs to return at least some usable value. BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That's not ideal - a query with large results could fill up available memory. If you don't keep all results in memory, then if you're going to allow clients to access the results more than once you have to provide a way to buffer the results. On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: > Hi all, > we are exposing a nice feature inherited from the Search engine via > the "simple" DSL version, the one which is also available via Hot Rod: > > org.infinispan.query.dsl.Query.getResultSize() > > To be fair I hadn't noticed we do expose this, I just noticed after a > recent PR review and I found it surprising. > > This method returns the size of the full resultset, disregarding > pagination options; you can imagine it fit for situations like: > > "found 6 million matches, these are the top 20: " > > A peculiarity of Hibernate Search is that the total number of matches > is extremely cheap to figure out as it's generally a side effect of > finding the 20 results. Essentially we're just exposing an int value > which was already computed: very cheap, and happens to be useful in > practice. > > This is not the case with a SQL statement, in this case you'd have to > craft 2 different SQL statements, often incurring the cost of 2 round > trips to the database. So this getResultSize() is not available on the > Hibernate ORM Query, only on our FullTextQuery extension. > > Now my doubt is if it is indeed a wise move to expose this method on > the simplified DSL. Of course some people might find it useful, still > I'm wondering how much we'll be swearing at needing to maintain this > feature vs its usefulness when we'll implement alternative execution > engines to run queries, not least on Map/Reduce based filtering, and > ultimately hybrid strategies. 
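Randall's two-method proposal could take roughly this shape (hypothetical names, not an existing ISPN or JCR API):

```java
interface SizedResults {
    // Exact number of results, or -1 when the engine cannot determine it
    // (or determining it would be prohibitively expensive).
    long exactSize();

    // Best-effort estimate; always returns a usable value, possibly off
    // by an order of magnitude under heavy churn.
    long approximateSize();
}

// Toy implementation: a small in-memory result set, where exact and
// approximate coincide; a remote engine might return -1 from exactSize().
final class InMemoryResults implements SizedResults {
    private final java.util.List<?> results;

    InMemoryResults(java.util.List<?> results) {
        this.results = results;
    }

    public long exactSize()       { return results.size(); }
    public long approximateSize() { return results.size(); }
}
```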
> > In case of Map/Reduce I think we'll need to keep track of possible > de-duplication of results, in case of a Teiid integration it might > need a second expensive query; so in this case I'd expect this method > to be lazily evaluated. > > Should we rather remove this functionality? > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Tue Mar 11 11:30:49 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 11 Mar 2014 15:30:49 +0000 Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <5319FCA9.30508@alumni.rutgers.edu> References: <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> <5319FCA9.30508@alumni.rutgers.edu> Message-ID: <9852CDBC-0EED-4F0A-88C3-3594ADA25984@redhat.com> On Mar 7, 2014, at 17:11, cotton-ben wrote: > > Thanks Peter. The plan is for Dmitry and I to at first extend VanillaSharedHashMap and groom it into an ISPN7 join via their DataContainer API bridge. Ben, at this stage you already have a JCache implementation backed by the DataContainer, by using the JCache implementation provided by ISPN and configuring Infinispan with the new custom data container. 
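The wiring Mircea describes, ISPN's JCache layer on top of a custom container, reduced to a toy sketch: SimpleDataContainer is a deliberately simplified, hypothetical stand-in for org.infinispan.container.DataContainer (the real SPI also carries entry metadata, eviction and iteration concerns), and ConcurrentHashMap stands in for OpenHFT's off-heap SharedHashMap:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the DataContainer storage SPI.
interface SimpleDataContainer<K, V> {
    V get(K key);
    void put(K key, V value);
    V remove(K key);
    int size();
}

// The custom container implements ONLY the storage contract; JCache
// semantics (listeners, expiry, statistics) stay in the ISPN JCache
// layer above it, which is the point of Mircea's suggestion.
final class SharedMemoryContainer<K, V> implements SimpleDataContainer<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>(); // stand-in for SharedHashMap

    public V get(K key)             { return store.get(key); }
    public void put(K key, V value) { store.put(key, value); }
    public V remove(K key)          { return store.remove(key); }
    public int size()               { return store.size(); }
}
```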
> > The morphing of ExtendedVSHM into a fully inter-operable JCACHE operand will > > - initially be brokered by the ISPN 7 config (JSR107 <----> VSHMExtendedDataContainer <----> VSHM) > - eventually, possibly, be rendered with ExtendedVSHM directly implementing javax.cache.Cache (in addition to DataContainer) > > On 03/07/2014 11:43 AM, Peter Lawrey wrote: >> In the medium term I would see SHM supporting a DataContainer. If a Cache were supported I would do it as a layered class so those who don't need the functionality of a Cache don't incur an overhead. >> >> On 8 Mar 2014 03:35, "Ben Cotton" <[hidden email]> wrote: >> Thank you for this insight Mircea ... >> >> Ultimately ... I want the OpenHFT SHM off-heap operand to behave *exactly* like a JCACHE ... Amenable to being soundly/completely operated upon by any/all parts of ISPN7's Impl of the JSR-107 API . >> >> Musing openly: Won't that (eventually) necessitate me physically implementing javax.cache.Cache ? >> >> > Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. >> >> I see what you mean. OK, for sure, this sounds much simpler than what I have put on my initial TODO list. >> >> Question: Will doing it this way in any manner suggest that my JSR-107 specific operators are being transitively "brokered" by the ISPN config onto my OpenHFT SHM operand? If possible, I want everything to be direct -- no API bridge. >> >> Thanks again, Mircea. >> >> -Ben and Dmitry >> >> Sent from my iPhone >> On 03/07/2014 09:45 AM, Mircea Markus-2 [via Infinispan Developer List] wrote: >>> Hi Ben, >>> >>> In the diagram provided, the CacheImpl (your class) extends both from javax.cache.Cache and org.infinispan.container.DataContainer. >>> The Cache and DataContainer interfaces are quite different and I anticipate a single class implementing both to be hard to follow and potentially not very efficient. 
>>> Another way to do it is to have CacheImpl implement the DataContainer only, and then configure Infinispan's JCache implementation to use that custom DataContainer. >>> >>> On Mar 3, 2014, at 3:46 PM, cotton-ben <[hidden email]> wrote: >>> >>> > Quick Update: >>> > >>> > It is my understanding that Peter Lawrey will make available an OpenHFT HC >>> > Alpha Release in Maven Central next weekend. At that time, Dmitry Gordeev >>> > and I will take the OpenHFT dependency tag and proceed to build a branch of >>> > Red Hat's ISPN 7 that will treat net.openhft.collections.SharedHashMap as a >>> > Red Hat Infinispan 7 default impl of a fully JSR-107 interoperable off-heap >>> > javax.cache.Cache ... >>> > >>> > A diagram of this build effort can be found here: >>> > https://raw.github.com/Cotton-Ben/OpenHFT/master/doc/AdaptingOpenHFT-SHM-as-JCACHE-Impl.jpg >>> > ... >>> > >>> > The Red Hat view of this effort will be tracked here: >>> > https://issues.jboss.org/browse/ISPN-871 ... >>> > >>> > The code that defines the Impl will be here: >>> > https://github.com/Cotton-Ben/infinispan ... >>> > >>> > -- >>> > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028931.html >>> > Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
>>> > _______________________________________________ >>> > infinispan-dev mailing list >>> > [hidden email] >>> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >> > > > View this message in context: Re: [infinispan-dev] Infinispan embedded off-heap cache > Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ben.cotton at ALUMNI.RUTGERS.EDU Tue Mar 11 11:52:45 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Tue, 11 Mar 2014 08:52:45 -0700 (PDT) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <9852CDBC-0EED-4F0A-88C3-3594ADA25984@redhat.com> References: <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> <1392670937953-4028836.post@n3.nabble.com> <1393861594878-4028931.post@n3.nabble.com> <31F68D3C-07CF-41FF-A4C9-33EEF9CA2D2A@redhat.com> <5319FCA9.30508@alumni.rutgers.edu> <9852CDBC-0EED-4F0A-88C3-3594ADA25984@redhat.com> Message-ID: <1394553165431-4029001.post@n3.nabble.com> /> Ben, at this stage you already have an JCache implementation backed by the DataContainer, by using the JCache implementation provided by ISPN an configuring Infinispan with the new custom data container. / *Thanks Mircea. We now "see it" and are proceeding accordingly. We have removed all ambition to directly implement JCACHE (in any form) from our Fork. (see newly documented goals at https://github.com/Cotton-Ben/infinispan ) We have also simply re-named the new module [off-heap]. Thanks for your support. We are in full gear to adapt/test/contribute this Fork. Stay tuned.* /> configuring Infinispan with the new custom data container / *Looking forward to learning from you guys exactly how to do this! Thanks!* -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4029001.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
From dereed at redhat.com Tue Mar 11 13:16:35 2014 From: dereed at redhat.com (Dennis Reed) Date: Tue, 11 Mar 2014 12:16:35 -0500 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: References: Message-ID: <531F44F3.2050309@redhat.com> Providing methods that work sometimes and don't work other times is generally a bad idea. No matter how much you document it, users *will* try to use it and expect it to always work (either because they didn't read the docs that say otherwise, they think they'll stick to a configuration where it does work, etc.) And then when it doesn't work (because they pushed something to production which has a different configuration than dev, etc) it's a frustrating experience. -Dennis On 03/11/2014 09:37 AM, Randall Hauch wrote: > I?m struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). Yet this still does not satisfy all cases. > > Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive to calculate. For example, consider a system that has to take into account access control limitations of the user. My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results). > > An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). 
> > So one option is to expose both methods, but allow the exact size method to return -1 if the system can?t determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value. > > BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That?s not ideal - a query with large results could fill up available memory. If you don?t keep all results in memory, then if you?re going to allow clients to access the results more than once you have to provide a way to buffer the results. > > > On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: > >> Hi all, >> we are exposing a nice feature inherited from the Search engine via >> the "simple" DSL version, the one which is also available via Hot Rod: >> >> org.infinispan.query.dsl.Query.getResultSize() >> >> To be fair I hadn't noticed we do expose this, I just noticed after a >> recent PR review and I found it surprising. >> >> This method returns the size of the full resultset, disregarding >> pagination options; you can imagine it fit for situations like: >> >> "found 6 million matches, these are the top 20: " >> >> A peculiarity of Hibernate Search is that the total number of matches >> is extremely cheap to figure out as it's generally a side effect of >> finding the 20 results. Essentially we're just exposing an int value >> which was already computed: very cheap, and happens to be useful in >> practice. >> >> This is not the case with a SQL statement, in this case you'd have to >> craft 2 different SQL statements, often incurring the cost of 2 round >> trips to the database. So this getResultSize() is not available on the >> Hibernate ORM Query, only on our FullTextQuery extension. >> >> Now my doubt is if it is indeed a wise move to expose this method on >> the simplified DSL. 
Of course some people might find it useful, still >> I'm wondering how much we'll be swearing at needing to maintain this >> feature vs its usefulness when we'll implement alternative execution >> engines to run queries, not least on Map/Reduce based filtering, and >> ultimately hybrid strategies. >> >> In case of Map/Reduce I think we'll need to keep track of possible >> de-duplication of results, in case of a Teiid integration it might >> need a second expensive query; so in this case I'd expect this method >> to be lazily evaluated. >> >> Should we rather remove this functionality? >> >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rhauch at redhat.com Tue Mar 11 14:23:46 2014 From: rhauch at redhat.com (Randall Hauch) Date: Tue, 11 Mar 2014 13:23:46 -0500 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <531F44F3.2050309@redhat.com> References: <531F44F3.2050309@redhat.com> Message-ID: <87025135-6CAC-4211-AF94-0759F92AE415@redhat.com> I disagree. Most developers have access to the JavaDoc, and if even moderately well-written, they will find out what the method returns and when. It?s no different than a method sometimes returning null rather than an object reference. On Mar 11, 2014, at 12:16 PM, Dennis Reed wrote: > Providing methods that work sometimes and don't work other times is > generally a bad idea. > > No matter how much you document it, users *will* try to use it and > expect it to always work > (either because they didn't read the docs that say otherwise, they think > they'll stick to a configuration where it does work, etc.) 
> > And then when it doesn't work (because they pushed something to > production which has a different configuration than dev, etc) > it's a frustrating experience. > > -Dennis > > On 03/11/2014 09:37 AM, Randall Hauch wrote: >> I?m struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). Yet this still does not satisfy all cases. >> >> Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive to calculate. For example, consider a system that has to take into account access control limitations of the user. My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results). >> >> An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). >> >> So one option is to expose both methods, but allow the exact size method to return -1 if the system can?t determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value. >> >> BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That?s not ideal - a query with large results could fill up available memory. 
If you don?t keep all results in memory, then if you?re going to allow clients to access the results more than once you have to provide a way to buffer the results. >> >> >> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: >> >>> Hi all, >>> we are exposing a nice feature inherited from the Search engine via >>> the "simple" DSL version, the one which is also available via Hot Rod: >>> >>> org.infinispan.query.dsl.Query.getResultSize() >>> >>> To be fair I hadn't noticed we do expose this, I just noticed after a >>> recent PR review and I found it surprising. >>> >>> This method returns the size of the full resultset, disregarding >>> pagination options; you can imagine it fit for situations like: >>> >>> "found 6 million matches, these are the top 20: " >>> >>> A peculiarity of Hibernate Search is that the total number of matches >>> is extremely cheap to figure out as it's generally a side effect of >>> finding the 20 results. Essentially we're just exposing an int value >>> which was already computed: very cheap, and happens to be useful in >>> practice. >>> >>> This is not the case with a SQL statement, in this case you'd have to >>> craft 2 different SQL statements, often incurring the cost of 2 round >>> trips to the database. So this getResultSize() is not available on the >>> Hibernate ORM Query, only on our FullTextQuery extension. >>> >>> Now my doubt is if it is indeed a wise move to expose this method on >>> the simplified DSL. Of course some people might find it useful, still >>> I'm wondering how much we'll be swearing at needing to maintain this >>> feature vs its usefulness when we'll implement alternative execution >>> engines to run queries, not least on Map/Reduce based filtering, and >>> ultimately hybrid strategies. 
>>> >>> In case of Map/Reduce I think we'll need to keep track of possible >>> de-duplication of results, in case of a Teiid integration it might >>> need a second expensive query; so in this case I'd expect this method >>> to be lazily evaluated. >>> >>> Should we rather remove this functionality? >>> >>> Sanne >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Tue Mar 11 15:08:39 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Tue, 11 Mar 2014 19:08:39 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <87025135-6CAC-4211-AF94-0759F92AE415@redhat.com> References: <531F44F3.2050309@redhat.com> <87025135-6CAC-4211-AF94-0759F92AE415@redhat.com> Message-ID: <0A31A119-D0A3-4C54-8ADF-87B41018CD66@hibernate.org> I agree with Randall. I tend to be very conservative about my public APIs. And offering an API that I think will block me in the future is something I tend to avoid. Something like .guessNbrOfMatchingElements() / .guessResultSize() would provide a better clue about the gamble the user takes. Note that the size is irrespective of the pagination applied which renders this result quite cool even if approximate. I'd be tempted not to put getResultSize() with an exact value in the public contract as iterating is probably going to be as 'fast'. An alternative is something like that (needs to be refined though):

/**
 * Get the result size.
 * Approximate results are to be preferred as it is usually very cheap to compute.
 * If the computation is too expensive, the approximate accuracy returns null.
 *
 * Exact results are likely to be costly and require two queries.
 */
Integer getResultSize(Accuracy);
enum Accuracy { EXACT, APPROXIMATE_OR_NULL }

Emmanuel On 11 Mar 2014, at 18:23, Randall Hauch wrote: > I disagree. Most developers have access to the JavaDoc, and if even moderately well-written, they will find out what the method returns and when. It's no different than a method sometimes returning null rather than an object reference. > > On Mar 11, 2014, at 12:16 PM, Dennis Reed wrote: > >> Providing methods that work sometimes and don't work other times is >> generally a bad idea. >> >> No matter how much you document it, users *will* try to use it and >> expect it to always work >> (either because they didn't read the docs that say otherwise, they think >> they'll stick to a configuration where it does work, etc.)
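Emmanuel's fragment, fleshed out into a compilable toy (the enum and method signature are his; the trivial implementation below is illustrative only):

```java
enum Accuracy { EXACT, APPROXIMATE_OR_NULL }

interface SizableQuery {
    // Result size under the requested accuracy: APPROXIMATE_OR_NULL is
    // usually cheap but may return null when even an estimate is too
    // expensive; EXACT may cost a second query.
    Integer getResultSize(Accuracy accuracy);
}

// Toy engine that already knows an estimate but would need a second
// pass to produce the exact figure.
final class EstimatingQuery implements SizableQuery {
    private final Integer estimate;
    private final Integer exact;

    EstimatingQuery(Integer estimate, Integer exact) {
        this.estimate = estimate;
        this.exact = exact;
    }

    public Integer getResultSize(Accuracy accuracy) {
        return accuracy == Accuracy.EXACT ? exact : estimate;
    }
}
```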
>>> >>> An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). >>> >>> So one option is to expose both methods, but allow the exact size method to return -1 if the system can?t determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value. >>> >>> BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That?s not ideal - a query with large results could fill up available memory. If you don?t keep all results in memory, then if you?re going to allow clients to access the results more than once you have to provide a way to buffer the results. >>> >>> >>> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: >>> >>>> Hi all, >>>> we are exposing a nice feature inherited from the Search engine via >>>> the "simple" DSL version, the one which is also available via Hot Rod: >>>> >>>> org.infinispan.query.dsl.Query.getResultSize() >>>> >>>> To be fair I hadn't noticed we do expose this, I just noticed after a >>>> recent PR review and I found it surprising. >>>> >>>> This method returns the size of the full resultset, disregarding >>>> pagination options; you can imagine it fit for situations like: >>>> >>>> "found 6 million matches, these are the top 20: " >>>> >>>> A peculiarity of Hibernate Search is that the total number of matches >>>> is extremely cheap to figure out as it's generally a side effect of >>>> finding the 20 results. 
Essentially we're just exposing an int value >>>> which was already computed: very cheap, and happens to be useful in >>>> practice. >>>> >>>> This is not the case with a SQL statement, in this case you'd have to >>>> craft 2 different SQL statements, often incurring the cost of 2 round >>>> trips to the database. So this getResultSize() is not available on the >>>> Hibernate ORM Query, only on our FullTextQuery extension. >>>> >>>> Now my doubt is if it is indeed a wise move to expose this method on >>>> the simplified DSL. Of course some people might find it useful, still >>>> I'm wondering how much we'll be swearing at needing to maintain this >>>> feature vs its usefulness when we'll implement alternative execution >>>> engines to run queries, not least on Map/Reduce based filtering, and >>>> ultimately hybrid strategies. >>>> >>>> In case of Map/Reduce I think we'll need to keep track of possible >>>> de-duplication of results, in case of a Teiid integration it might >>>> need a second expensive query; so in this case I'd expect this method >>>> to be lazily evaluated. >>>> >>>> Should we rather remove this functionality? 
>>>> >>>> Sanne >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Tue Mar 11 15:18:28 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 11 Mar 2014 19:18:28 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <0A31A119-D0A3-4C54-8ADF-87B41018CD66@hibernate.org> References: <531F44F3.2050309@redhat.com> <87025135-6CAC-4211-AF94-0759F92AE415@redhat.com> <0A31A119-D0A3-4C54-8ADF-87B41018CD66@hibernate.org> Message-ID: What about we call it int getEstimatedResultSize()? Having such a method occasionally return null looks very bad to me, I'd rather remove the functionality. -- Sanne On 11 March 2014 19:08, Emmanuel Bernard wrote: > I agree with Randall. > > I tend to be very conservative about my public APIs. And offering an API that I think will block me in the future is something I tend to avoid. > > Something like .guessNbrOfMatchingElements() / .guessResultSize() would provide a better clue about the gamble the user takes. Note that the size is irrespective of the pagination applied which renders this result quite cool even if approximate. > > I'd be tempted not to put getResultSize() with an exact value in the public contract as iterating is probably going to be as 'fast'. 
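Sanne's counter-proposal avoids the null case by committing to a primitive int (again a hypothetical sketch, not the shipped DSL):

```java
interface EstimatedQuery {
    // Always an int, never null: exact when the engine happens to know it
    // (the Lucene path computes the total as a side effect of fetching
    // the top hits), a best-effort estimate otherwise. The name warns
    // callers not to treat the value as authoritative.
    int getEstimatedResultSize();
}

final class LuceneBackedQuery implements EstimatedQuery {
    private final int totalHits; // already computed while ranking top-N

    LuceneBackedQuery(int totalHits) {
        this.totalHits = totalHits;
    }

    public int getEstimatedResultSize() {
        return totalHits;
    }
}
```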
> > An alternative is something like that (needs to be refined though) > > /** > * Get the result size. > * Approximate results are to be preferred as it is usually very cheap to compute. > * If the computation is too expensive, the approximate accuracy returns null. > * > * Exact results are likely to be costly and require two queries. > */ > Integer getResultSize(Accuracy); > enum Accuracy { EXACT, APPROXIMATE_OR_NULL } > > Emmanuel > > On 11 Mar 2014, at 18:23, Randall Hauch wrote: > >> I disagree. Most developers have access to the JavaDoc, and if even moderately well-written, they will find out what the method returns and when. It's no different than a method sometimes returning null rather than an object reference. >> >> On Mar 11, 2014, at 12:16 PM, Dennis Reed wrote: >> >>> Providing methods that work sometimes and don't work other times is >>> generally a bad idea. >>> >>> No matter how much you document it, users *will* try to use it and >>> expect it to always work >>> (either because they didn't read the docs that say otherwise, they think >>> they'll stick to a configuration where it does work, etc.) >>> >>> And then when it doesn't work (because they pushed something to >>> production which has a different configuration than dev, etc) >>> it's a frustrating experience. >>> >>> -Dennis >>> >>> On 03/11/2014 09:37 AM, Randall Hauch wrote: >>>> I'm struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). Yet this still does not satisfy all cases. >>>> >>>> Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive to calculate. For example, consider a system that has to take into account access control limitations of the user. 
My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results). >>>> >>>> An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). >>>> >>>> So one option is to expose both methods, but allow the exact size method to return -1 if the system can't determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value. >>>> >>>> BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That's not ideal - a query with large results could fill up available memory. If you don't keep all results in memory, then if you're going to allow clients to access the results more than once you have to provide a way to buffer the results. >>>> >>>> >>>> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: >>>> >>>>> Hi all, >>>>> we are exposing a nice feature inherited from the Search engine via >>>>> the "simple" DSL version, the one which is also available via Hot Rod: >>>>> >>>>> org.infinispan.query.dsl.Query.getResultSize() >>>>> >>>>> To be fair I hadn't noticed we do expose this, I just noticed after a >>>>> recent PR review and I found it surprising. 
>>>>> >>>>> This method returns the size of the full resultset, disregarding >>>>> pagination options; you can imagine it fit for situations like: >>>>> >>>>> "found 6 million matches, these are the top 20: " >>>>> >>>>> A peculiarity of Hibernate Search is that the total number of matches >>>>> is extremely cheap to figure out as it's generally a side effect of >>>>> finding the 20 results. Essentially we're just exposing an int value >>>>> which was already computed: very cheap, and happens to be useful in >>>>> practice. >>>>> >>>>> This is not the case with a SQL statement, in this case you'd have to >>>>> craft 2 different SQL statements, often incurring the cost of 2 round >>>>> trips to the database. So this getResultSize() is not available on the >>>>> Hibernate ORM Query, only on our FullTextQuery extension. >>>>> >>>>> Now my doubt is if it is indeed a wise move to expose this method on >>>>> the simplified DSL. Of course some people might find it useful, still >>>>> I'm wondering how much we'll be swearing at needing to maintain this >>>>> feature vs its usefulness when we'll implement alternative execution >>>>> engines to run queries, not least on Map/Reduce based filtering, and >>>>> ultimately hybrid strategies. >>>>> >>>>> In case of Map/Reduce I think we'll need to keep track of possible >>>>> de-duplication of results, in case of a Teiid integration it might >>>>> need a second expensive query; so in this case I'd expect this method >>>>> to be lazily evaluated. >>>>> >>>>> Should we rather remove this functionality? 
>>>>> >>>>> Sanne >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Tue Mar 11 15:21:24 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Tue, 11 Mar 2014 19:21:24 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: References: <531F44F3.2050309@redhat.com> <87025135-6CAC-4211-AF94-0759F92AE415@redhat.com> <0A31A119-D0A3-4C54-8ADF-87B41018CD66@hibernate.org> Message-ID: <2AA17AF5-2C1A-45EA-AD9C-1FF3C3D9CE05@hibernate.org> It does not work, I think, because if you implement your query via some map reduce and you do pagination, it will be costly to compute the size and you might not want to return it. Hence my Accuracy idea to clarify the intent to the API user. On 11 Mar 2014, at 19:18, Sanne Grinovero wrote: > what about we call it > > int getEstimatedResultSize() ? > > Having such a method occasionally return null looks very bad to me, > I'd rather remove the functionality. > > -- Sanne > > On 11 March 2014 19:08, Emmanuel Bernard wrote: >> I agree with Randall. 
>> >> I tend to be very conservative about my public APIs. And offering an API that I think will block me in the future is something I tend to avoid. >> >> Something like .guessNbrOfMatchingElements() / .guessResultSize() would provide a better clue about the gamble the user takes. Note that the size is irrespective of the pagination applied which renders this result quite cool even if approximate. >> >> I?d be tempted not to put getResultSize() with an exact value in the public contract as iterating is probably going to as ?fast?. >> >> An alternative is something like that (needs to be refined though) >> >> /** >> * Get the result size. >> * Approximate results are to be preferred as it is usually very cheap to compute. >> * If the computation is too expensive, the approximate accuracy returns null. >> * >> * Exact results are likely to be costly and require two queries. >> */ >> Integer getResultSize(Accuracy); >> enum Accuracy { EXACT, APPROXIMATE_OR_NULL } >> >> Emmanuel >> >> On 11 Mar 2014, at 18:23, Randall Hauch wrote: >> >>> I disagree. Most developers have access to the JavaDoc, and if even moderately well-written, they will find out what the method returns and when. It?s no different than a method sometimes returning null rather than an object reference. >>> >>> On Mar 11, 2014, at 12:16 PM, Dennis Reed wrote: >>> >>>> Providing methods that work sometimes and don't work other times is >>>> generally a bad idea. >>>> >>>> No matter how much you document it, users *will* try to use it and >>>> expect it to always work >>>> (either because they didn't read the docs that say otherwise, they think >>>> they'll stick to a configuration where it does work, etc.) >>>> >>>> And then when it doesn't work (because they pushed something to >>>> production which has a different configuration than dev, etc) >>>> it's a frustrating experience. 
>>>> >>>> -Dennis >>>> >>>> On 03/11/2014 09:37 AM, Randall Hauch wrote: >>>>> I?m struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). Yet this still does not satisfy all cases. >>>>> >>>>> Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive to calculate. For example, consider a system that has to take into account access control limitations of the user. My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results). >>>>> >>>>> An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). >>>>> >>>>> So one option is to expose both methods, but allow the exact size method to return -1 if the system can?t determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value. >>>>> >>>>> BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That?s not ideal - a query with large results could fill up available memory. If you don?t keep all results in memory, then if you?re going to allow clients to access the results more than once you have to provide a way to buffer the results. 
>>>>> >>>>> >>>>> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: >>>>> >>>>>> Hi all, >>>>>> we are exposing a nice feature inherited from the Search engine via >>>>>> the "simple" DSL version, the one which is also available via Hot Rod: >>>>>> >>>>>> org.infinispan.query.dsl.Query.getResultSize() >>>>>> >>>>>> To be fair I hadn't noticed we do expose this, I just noticed after a >>>>>> recent PR review and I found it surprising. >>>>>> >>>>>> This method returns the size of the full resultset, disregarding >>>>>> pagination options; you can imagine it fit for situations like: >>>>>> >>>>>> "found 6 million matches, these are the top 20: " >>>>>> >>>>>> A peculiarity of Hibernate Search is that the total number of matches >>>>>> is extremely cheap to figure out as it's generally a side effect of >>>>>> finding the 20 results. Essentially we're just exposing an int value >>>>>> which was already computed: very cheap, and happens to be useful in >>>>>> practice. >>>>>> >>>>>> This is not the case with a SQL statement, in this case you'd have to >>>>>> craft 2 different SQL statements, often incurring the cost of 2 round >>>>>> trips to the database. So this getResultSize() is not available on the >>>>>> Hibernate ORM Query, only on our FullTextQuery extension. >>>>>> >>>>>> Now my doubt is if it is indeed a wise move to expose this method on >>>>>> the simplified DSL. Of course some people might find it useful, still >>>>>> I'm wondering how much we'll be swearing at needing to maintain this >>>>>> feature vs its usefulness when we'll implement alternative execution >>>>>> engines to run queries, not least on Map/Reduce based filtering, and >>>>>> ultimately hybrid strategies. >>>>>> >>>>>> In case of Map/Reduce I think we'll need to keep track of possible >>>>>> de-duplication of results, in case of a Teiid integration it might >>>>>> need a second expensive query; so in this case I'd expect this method >>>>>> to be lazily evaluated. 
>>>>>> >>>>>> Should we rather remove this functionality? >>>>>> >>>>>> Sanne >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Tue Mar 11 16:20:14 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 11 Mar 2014 22:20:14 +0200 Subject: [infinispan-dev] Update on the testsuite state In-Reply-To: References: Message-ID: I got the OOM in the hotrod client testsuite as well today, and I have a fix: https://github.com/infinispan/infinispan/pull/2433 On Fri, Mar 7, 2014 at 6:35 PM, Sanne Grinovero wrote: > today core passed, but: the Infinispan Hot Rod Client seems to fail > because of an OutOfMemory Java heap space. 
> > On 6 March 2014 18:08, Sanne Grinovero wrote: > > Results : > > > > Failed tests: > > > NotifyingFutureTest.testExceptionOtherThread2:51->testExceptionOtherThread:68->testException:151 > > expected [true] but found [false] > > > VersionedDistStateTransferTest.testStateTransfer:96->MultipleCacheManagersTest.waitForClusterToForm:232->MultipleCacheManagersTest.waitForClusterToForm:225 > > ? IllegalState > > > > Tests run: 4233, Failures: 2, Errors: 0, Skipped: 0 > > > > [INFO] > ------------------------------------------------------------------------ > > [INFO] Reactor Summary: > > [INFO] > > [INFO] Infinispan BOM .................................... SUCCESS > [0.135s] > > [INFO] Infinispan Common Parent .......................... SUCCESS > [1.700s] > > [INFO] Infinispan Checkstyle Rules ....................... SUCCESS > [2.395s] > > [INFO] Infinispan Commons ................................ SUCCESS > [5.411s] > > [INFO] Infinispan Core ................................... FAILURE > [9:51.344s] > > > > > > Pretty good, but no jackpot yet! > > I'll try again next week? > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140311/a10ec2a0/attachment-0001.html From rhauch at redhat.com Tue Mar 11 17:59:04 2014 From: rhauch at redhat.com (Randall Hauch) Date: Tue, 11 Mar 2014 16:59:04 -0500 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? 
In-Reply-To: <2AA17AF5-2C1A-45EA-AD9C-1FF3C3D9CE05@hibernate.org> References: <531F44F3.2050309@redhat.com> <87025135-6CAC-4211-AF94-0759F92AE415@redhat.com> <0A31A119-D0A3-4C54-8ADF-87B41018CD66@hibernate.org> <2AA17AF5-2C1A-45EA-AD9C-1FF3C3D9CE05@hibernate.org> Message-ID: <758930A0-F409-4C8D-8AEE-C50E038729AC@redhat.com> Maybe a Long rather than an Integer? Ints are so last year. :-) And, what about using a primitive that returns -1 when the method cannot determine the size (if allowed by the parameter). Just as easy to check -1 than it is to check null, IMO. On Mar 11, 2014, at 2:21 PM, Emmanuel Bernard wrote: > It does not work, I think, because if you implement your query via some map reduce and you do pagination, it will be costly to compute the size and you might want not to return it. > Hence my Accuracy idea to clarify the intend to the API user. > > On 11 Mar 2014, at 19:18, Sanne Grinovero wrote: > >> what about we call it >> >> int getEstimatedResultSize() ? >> >> Having such a method occasionally return null looks very bad to me, >> I'd rather remove the functionality. >> >> -- Sanne >> >> On 11 March 2014 19:08, Emmanuel Bernard wrote: >>> I agree with Randall. >>> >>> I tend to be very conservative about my public APIs. And offering an API that I think will block me in the future is something I tend to avoid. >>> >>> Something like .guessNbrOfMatchingElements() / .guessResultSize() would provide a better clue about the gamble the user takes. Note that the size is irrespective of the pagination applied which renders this result quite cool even if approximate. >>> >>> I?d be tempted not to put getResultSize() with an exact value in the public contract as iterating is probably going to as ?fast?. >>> >>> An alternative is something like that (needs to be refined though) >>> >>> /** >>> * Get the result size. >>> * Approximate results are to be preferred as it is usually very cheap to compute. 
>>> * If the computation is too expensive, the approximate accuracy returns null. >>> * >>> * Exact results are likely to be costly and require two queries. >>> */ >>> Integer getResultSize(Accuracy); >>> enum Accuracy { EXACT, APPROXIMATE_OR_NULL } >>> >>> Emmanuel >>> >>> On 11 Mar 2014, at 18:23, Randall Hauch wrote: >>> >>>> I disagree. Most developers have access to the JavaDoc, and if even moderately well-written, they will find out what the method returns and when. It?s no different than a method sometimes returning null rather than an object reference. >>>> >>>> On Mar 11, 2014, at 12:16 PM, Dennis Reed wrote: >>>> >>>>> Providing methods that work sometimes and don't work other times is >>>>> generally a bad idea. >>>>> >>>>> No matter how much you document it, users *will* try to use it and >>>>> expect it to always work >>>>> (either because they didn't read the docs that say otherwise, they think >>>>> they'll stick to a configuration where it does work, etc.) >>>>> >>>>> And then when it doesn't work (because they pushed something to >>>>> production which has a different configuration than dev, etc) >>>>> it's a frustrating experience. >>>>> >>>>> -Dennis >>>>> >>>>> On 03/11/2014 09:37 AM, Randall Hauch wrote: >>>>>> I?m struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). Yet this still does not satisfy all cases. >>>>>> >>>>>> Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive to calculate. For example, consider a system that has to take into account access control limitations of the user. My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results). 
>>>>>> >>>>>> An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). >>>>>> >>>>>> So one option is to expose both methods, but allow the exact size method to return -1 if the system can?t determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value. >>>>>> >>>>>> BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That?s not ideal - a query with large results could fill up available memory. If you don?t keep all results in memory, then if you?re going to allow clients to access the results more than once you have to provide a way to buffer the results. >>>>>> >>>>>> >>>>>> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: >>>>>> >>>>>>> Hi all, >>>>>>> we are exposing a nice feature inherited from the Search engine via >>>>>>> the "simple" DSL version, the one which is also available via Hot Rod: >>>>>>> >>>>>>> org.infinispan.query.dsl.Query.getResultSize() >>>>>>> >>>>>>> To be fair I hadn't noticed we do expose this, I just noticed after a >>>>>>> recent PR review and I found it surprising. >>>>>>> >>>>>>> This method returns the size of the full resultset, disregarding >>>>>>> pagination options; you can imagine it fit for situations like: >>>>>>> >>>>>>> "found 6 million matches, these are the top 20: " >>>>>>> >>>>>>> A peculiarity of Hibernate Search is that the total number of matches >>>>>>> is extremely cheap to figure out as it's generally a side effect of >>>>>>> finding the 20 results. 
Essentially we're just exposing an int value >>>>>>> which was already computed: very cheap, and happens to be useful in >>>>>>> practice. >>>>>>> >>>>>>> This is not the case with a SQL statement, in this case you'd have to >>>>>>> craft 2 different SQL statements, often incurring the cost of 2 round >>>>>>> trips to the database. So this getResultSize() is not available on the >>>>>>> Hibernate ORM Query, only on our FullTextQuery extension. >>>>>>> >>>>>>> Now my doubt is if it is indeed a wise move to expose this method on >>>>>>> the simplified DSL. Of course some people might find it useful, still >>>>>>> I'm wondering how much we'll be swearing at needing to maintain this >>>>>>> feature vs its usefulness when we'll implement alternative execution >>>>>>> engines to run queries, not least on Map/Reduce based filtering, and >>>>>>> ultimately hybrid strategies. >>>>>>> >>>>>>> In case of Map/Reduce I think we'll need to keep track of possible >>>>>>> de-duplication of results, in case of a Teiid integration it might >>>>>>> need a second expensive query; so in this case I'd expect this method >>>>>>> to be lazily evaluated. >>>>>>> >>>>>>> Should we rather remove this functionality? 
>>>>>>> >>>>>>> Sanne >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Tue Mar 11 20:30:47 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 12 Mar 2014 00:30:47 +0000 Subject: [infinispan-dev] Update on the testsuite state In-Reply-To: References: Message-ID: On 11 March 2014 20:20, Dan Berindei wrote: > I got the OOM in the hotrod client testsuite as well today, and I have a > fix: https://github.com/infinispan/infinispan/pull/2433 Thanks Dan! integrated that. 
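[Editorial aside] Emmanuel's `getResultSize(Accuracy)` proposal from the thread above can be sketched concretely. This is a hypothetical illustration, not the actual Infinispan API: the `SketchQuery` class, its fields, and the way the cheap estimate is obtained are all invented for the example; only the `Accuracy` enum and the method signature come from Emmanuel's mail.

```java
import java.util.Arrays;
import java.util.List;

public class SketchQuery {
    public enum Accuracy { EXACT, APPROXIMATE_OR_NULL }

    private final List<String> results;   // matches the engine can iterate
    private final Integer cheapEstimate;  // e.g. a count produced as a side effect
                                          // of finding the top-N hits; null when
                                          // no cheap figure is available (Map/Reduce)

    public SketchQuery(List<String> results, Integer cheapEstimate) {
        this.results = results;
        this.cheapEstimate = cheapEstimate;
    }

    /**
     * Total number of matches, disregarding pagination.
     * EXACT may force a full count (potentially a second, expensive query);
     * APPROXIMATE_OR_NULL returns a cheap estimate, or null when even
     * estimating would be too costly.
     */
    public Integer getResultSize(Accuracy accuracy) {
        switch (accuracy) {
            case EXACT:
                return results.size();   // stands in for the expensive exact count
            case APPROXIMATE_OR_NULL:
                return cheapEstimate;    // null signals "no cheap answer"
            default:
                throw new AssertionError(accuracy);
        }
    }

    public static void main(String[] args) {
        SketchQuery lucene = new SketchQuery(Arrays.asList("a", "b", "c"), 3);
        System.out.println(lucene.getResultSize(Accuracy.APPROXIMATE_OR_NULL)); // 3

        SketchQuery mapReduce = new SketchQuery(Arrays.asList("a"), null);
        System.out.println(mapReduce.getResultSize(Accuracy.APPROXIMATE_OR_NULL)); // null
        System.out.println(mapReduce.getResultSize(Accuracy.EXACT));            // 1
    }
}
```

The enum makes the caller state the gamble explicitly, which is the point of Emmanuel's design: a Hibernate Search backend can answer `APPROXIMATE_OR_NULL` for free, while a Map/Reduce backend returns null instead of silently running a second expensive pass. Randall's variant would return a primitive `-1` in place of the null.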
From emmanuel at hibernate.org Wed Mar 12 03:43:08 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 12 Mar 2014 07:43:08 +0000 Subject: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL? In-Reply-To: <758930A0-F409-4C8D-8AEE-C50E038729AC@redhat.com> References: <531F44F3.2050309@redhat.com> <87025135-6CAC-4211-AF94-0759F92AE415@redhat.com> <0A31A119-D0A3-4C54-8ADF-87B41018CD66@hibernate.org> <2AA17AF5-2C1A-45EA-AD9C-1FF3C3D9CE05@hibernate.org> <758930A0-F409-4C8D-8AEE-C50E038729AC@redhat.com> Message-ID: I find checking null a bit cleaner. -1 is sooooo C :) but that's relatively minor and we can go either way. > On 11 mars 2014, at 21:59, Randall Hauch wrote: > > Maybe a Long rather than an Integer? Ints are so last year. :-) > > And, what about using a primitive that returns -1 when the method cannot determine the size (if allowed by the parameter). Just as easy to check -1 than it is to check null, IMO. > >> On Mar 11, 2014, at 2:21 PM, Emmanuel Bernard wrote: >> >> It does not work, I think, because if you implement your query via some map reduce and you do pagination, it will be costly to compute the size and you might want not to return it. >> Hence my Accuracy idea to clarify the intend to the API user. >> >>> On 11 Mar 2014, at 19:18, Sanne Grinovero wrote: >>> >>> what about we call it >>> >>> int getEstimatedResultSize() ? >>> >>> Having such a method occasionally return null looks very bad to me, >>> I'd rather remove the functionality. >>> >>> -- Sanne >>> >>>> On 11 March 2014 19:08, Emmanuel Bernard wrote: >>>> I agree with Randall. >>>> >>>> I tend to be very conservative about my public APIs. And offering an API that I think will block me in the future is something I tend to avoid. >>>> >>>> Something like .guessNbrOfMatchingElements() / .guessResultSize() would provide a better clue about the gamble the user takes. 
Note that the size is irrespective of the pagination applied which renders this result quite cool even if approximate. >>>> >>>> I?d be tempted not to put getResultSize() with an exact value in the public contract as iterating is probably going to as ?fast?. >>>> >>>> An alternative is something like that (needs to be refined though) >>>> >>>> /** >>>> * Get the result size. >>>> * Approximate results are to be preferred as it is usually very cheap to compute. >>>> * If the computation is too expensive, the approximate accuracy returns null. >>>> * >>>> * Exact results are likely to be costly and require two queries. >>>> */ >>>> Integer getResultSize(Accuracy); >>>> enum Accuracy { EXACT, APPROXIMATE_OR_NULL } >>>> >>>> Emmanuel >>>> >>>>> On 11 Mar 2014, at 18:23, Randall Hauch wrote: >>>>> >>>>> I disagree. Most developers have access to the JavaDoc, and if even moderately well-written, they will find out what the method returns and when. It?s no different than a method sometimes returning null rather than an object reference. >>>>> >>>>>> On Mar 11, 2014, at 12:16 PM, Dennis Reed wrote: >>>>>> >>>>>> Providing methods that work sometimes and don't work other times is >>>>>> generally a bad idea. >>>>>> >>>>>> No matter how much you document it, users *will* try to use it and >>>>>> expect it to always work >>>>>> (either because they didn't read the docs that say otherwise, they think >>>>>> they'll stick to a configuration where it does work, etc.) >>>>>> >>>>>> And then when it doesn't work (because they pushed something to >>>>>> production which has a different configuration than dev, etc) >>>>>> it's a frustrating experience. >>>>>> >>>>>> -Dennis >>>>>> >>>>>>> On 03/11/2014 09:37 AM, Randall Hauch wrote: >>>>>>> I?m struggling with this same question in ModeShape. The JCR API exposes a method that returns the number of results, but at least the spec allows the implementation to return -1 if the size is not known (or very expensive to compute). 
Yet this still does not satisfy all cases. >>>>>>> >>>>>>> Depending upon the technology, computing the **exact size** ranges from very cheap to extremely expensive to calculate. For example, consider a system that has to take into account access control limitations of the user. My current opinion is that few applications actually need an exact size, and if they do there may be alternatives (like counting as they iterate over the results). >>>>>>> >>>>>>> An alternative is to expose an **approximate size**, which is likely to be sufficient for generating display or other pre-computed information such as links or paging details. I think that this is sufficient for most needs, and that even an order of magnitude is sufficient. When the results are known to be small, the system might want to determine the exact size (e.g., by iterating). >>>>>>> >>>>>>> So one option is to expose both methods, but allow the exact size method to return -1 if the system can?t determine the size or if doing so is very expensive. This allows the system a way out for large/complex queries and flexibility in the implementation technology. The approximate size method probably always needs to return at least some usable value. >>>>>>> >>>>>>> BTW, computing an exact size by iterating can be expensive unless you can keep all the results in memory. That?s not ideal - a query with large results could fill up available memory. If you don?t keep all results in memory, then if you?re going to allow clients to access the results more than once you have to provide a way to buffer the results. 
>>>>>>> >>>>>>> >>>>>>>> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> we are exposing a nice feature inherited from the Search engine via >>>>>>>> the "simple" DSL version, the one which is also available via Hot Rod: >>>>>>>> >>>>>>>> org.infinispan.query.dsl.Query.getResultSize() >>>>>>>> >>>>>>>> To be fair I hadn't noticed we do expose this, I just noticed after a >>>>>>>> recent PR review and I found it surprising. >>>>>>>> >>>>>>>> This method returns the size of the full resultset, disregarding >>>>>>>> pagination options; you can imagine it fit for situations like: >>>>>>>> >>>>>>>> "found 6 million matches, these are the top 20: " >>>>>>>> >>>>>>>> A peculiarity of Hibernate Search is that the total number of matches >>>>>>>> is extremely cheap to figure out as it's generally a side effect of >>>>>>>> finding the 20 results. Essentially we're just exposing an int value >>>>>>>> which was already computed: very cheap, and happens to be useful in >>>>>>>> practice. >>>>>>>> >>>>>>>> This is not the case with a SQL statement, in this case you'd have to >>>>>>>> craft 2 different SQL statements, often incurring the cost of 2 round >>>>>>>> trips to the database. So this getResultSize() is not available on the >>>>>>>> Hibernate ORM Query, only on our FullTextQuery extension. >>>>>>>> >>>>>>>> Now my doubt is if it is indeed a wise move to expose this method on >>>>>>>> the simplified DSL. Of course some people might find it useful, still >>>>>>>> I'm wondering how much we'll be swearing at needing to maintain this >>>>>>>> feature vs its usefulness when we'll implement alternative execution >>>>>>>> engines to run queries, not least on Map/Reduce based filtering, and >>>>>>>> ultimately hybrid strategies. 
>>>>>>>> >>>>>>>> In case of Map/Reduce I think we'll need to keep track of possible >>>>>>>> de-duplication of results, in case of a Teiid integration it might >>>>>>>> need a second expensive query; so in this case I'd expect this method >>>>>>>> to be lazily evaluated. >>>>>>>> >>>>>>>> Should we rather remove this functionality? >>>>>>>> >>>>>>>> Sanne >>>>>>>> _______________________________________________ >>>>>>>> infinispan-dev mailing list >>>>>>>> infinispan-dev at lists.jboss.org >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Wed Mar 12 05:27:36 2014 From: sanne at infinispan.org 
(Sanne Grinovero) Date: Wed, 12 Mar 2014 09:27:36 +0000 Subject: [infinispan-dev] Never push with --force Message-ID: Yesterday I pushed a fix from Dan upstream, and this morning the fix wasn't there anymore. Some unrelated fix was merged in the meantime. I only realized this because I was updating my personal origin and git wouldn't allow me to push the non-fast-forward branch, so in a sense I could detect it because of how our workflow works (good). I have no idea of how it happened, but I guess it won't hurt to remind that we should never push with --force, at least not without warning the whole list. I now cherry-picked and fixed master by re-pushing the missing patch, so nothing bad happening :-) Sanne From emmanuel at hibernate.org Wed Mar 12 06:05:11 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 12 Mar 2014 10:05:11 +0000 Subject: [infinispan-dev] Never push with --force In-Reply-To: References: Message-ID: Remove push rights for 2 weeks to the next culprit and put his name on http://infinispan.org/hallofshame/ ? :) On 12 Mar 2014, at 9:27, Sanne Grinovero wrote: > Yesterday I pushed a fix from Dan upstream, and this morning the fix > wasn't there anymore. Some unrelated fix was merged in the meantime. > > I only realized this because I was updating my personal origin and git > wouldn't allow me to push the non-fast-forward branch, so in a sense I > could detect it because of how our workflow works (good). > > I have no idea of how it happened, but I guess it won't hurt to remind > that we should never push with --force, at least not without warning > the whole list. 
> > I now cherry-picked and fixed master by re-pushing the missing patch, > so nothing bad happening :-) > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From an1310 at hotmail.com Wed Mar 12 09:28:37 2014 From: an1310 at hotmail.com (Erik Salter) Date: Wed, 12 Mar 2014 09:28:37 -0400 Subject: [infinispan-dev] Never push with --force In-Reply-To: References: Message-ID: wget -q -O- http://infinispan.org/hallofshame | grep -c "Erik Salter" count: 0 Whew! -----Original Message----- From: infinispan-dev-bounces at lists.jboss.org [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Emmanuel Bernard Sent: Wednesday, March 12, 2014 6:05 AM To: infinispan -Dev List Subject: Re: [infinispan-dev] Never push with --force Remove push rights for 2 weeks to the next culprit and put his name on http://infinispan.org/hallofshame/ ? :) On 12 Mar 2014, at 9:27, Sanne Grinovero wrote: > Yesterday I pushed a fix from Dan upstream, and this morning the fix > wasn't there anymore. Some unrelated fix was merged in the meantime. > > I only realized this because I was updating my personal origin and git > wouldn't allow me to push the non-fast-forward branch, so in a sense I > could detect it because of how our workflow works (good). > > I have no idea of how it happened, but I guess it won't hurt to remind > that we should never push with --force, at least not without warning > the whole list. 
> > I now cherry-picked and fixed master by re-pushing the missing patch, > so nothing bad happening :-) > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev at lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev From anistor at redhat.com Wed Mar 12 12:01:09 2014 From: anistor at redhat.com (Adrian Nistor) Date: Wed, 12 Mar 2014 18:01:09 +0200 Subject: [infinispan-dev] Never push with --force In-Reply-To: References: Message-ID: <532084C5.2030908@redhat.com> Sanne, is it possible that you forgot to push his changes upstream when closing his PR? This is what the github news feed of infinispan repo tells me: 1. Dan opened the PR #2433 yesterday (about 20 hrs ago) 2. you closed his PR after a few hours (15 hrs ago) and commented 'Merged', but I cannot see a 'pushed to master' entry in newsfeed around this time 3. You pushed upstream this morning (7 hrs ago) I would never expect --force being used on the upstream repo. Whoever does it should ask permission first and have a good reason, not just warn the list about it. On 03/12/2014 11:27 AM, Sanne Grinovero wrote: > Yesterday I pushed a fix from Dan upstream, and this morning the fix > wasn't there anymore. Some unrelated fix was merged in the meantime. > > I only realized this because I was updating my personal origin and git > wouldn't allow me to push the non-fast-forward branch, so in a sense I > could detect it because of how our workflow works (good). > > I have no idea of how it happened, but I guess it won't hurt to remind > that we should never push with --force, at least not without warning > the whole list. 
> > I now cherry-picked and fixed master by re-pushing the missing patch, > so nothing bad happening :-) > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Wed Mar 12 12:58:15 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 12 Mar 2014 17:58:15 +0100 Subject: [infinispan-dev] Infinispan HotRod C# Client 7.0.0.Alpha1 In-Reply-To: <5317167F.50204@redhat.com> References: <5317167F.50204@redhat.com> Message-ID: <5FB60B99-BE97-4882-9FF4-FD093F928A61@redhat.com> Nice work Ion :) On 05 Mar 2014, at 13:20, Ion Savin wrote: > Hi all, > > Infinispan HotRod C# Client 7.0.0.Alpha1 is now available. > > This new version is a C# wrapper over the native client and brings > support for L2 and L3 client intelligence levels in addition to L1. As > more features are added to the native client they will make their way > into the C# client as well. > > You can find the .msi installer on the download page [1] and the > source code on GitHub [2]. > > Please give it a try and let us know what you think. 
> > [1] http://infinispan.org/hotrod-clients/ > [2] https://github.com/infinispan/dotnet-client > > Regards, > Ion Savin > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Wed Mar 12 13:00:44 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 12 Mar 2014 17:00:44 +0000 Subject: [infinispan-dev] Never push with --force In-Reply-To: <532084C5.2030908@redhat.com> References: <532084C5.2030908@redhat.com> Message-ID: On 12 March 2014 16:01, Adrian Nistor wrote: > Sanne, is it possible that you forgot to push his changes upstream when > closing his PR? Maybe that's what happened. I use a script though which pushes to both so I think it's unlikely, and my own repo is second in order, it's supposed to fail fast. > This is what the github news feed of infinispan repo tells me: > 1. Dan opened the PR #2433 yesterday (about 20 hrs ago) > 2. you closed his PR after a few hours (15 hrs ago) and commented > 'Merged', but I cannot see a 'pushed to master' entry in newsfeed around > this time > 3. You pushed upstream this morning (7 hrs ago) > > I would never expect --force being used on the upstream repo. Whoever > does it should ask permission first and have a good reason, not just > warn the list about it. +1 Yes I'm not sure what happened, just sharing it as a reminder. > > On 03/12/2014 11:27 AM, Sanne Grinovero wrote: >> Yesterday I pushed a fix from Dan upstream, and this morning the fix >> wasn't there anymore. Some unrelated fix was merged in the meantime. >> >> I only realized this because I was updating my personal origin and git >> wouldn't allow me to push the non-fast-forward branch, so in a sense I >> could detect it because of how our workflow works (good). 
>> >> I have no idea of how it happened, but I guess it won't hurt to remind >> that we should never push with --force, at least not without warning >> the whole list. >> >> I now cherry-picked and fixed master by re-pushing the missing patch, >> so nothing bad happening :-) >> >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Wed Mar 12 13:37:29 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 12 Mar 2014 18:37:29 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> Message-ID: On 04 Mar 2014, at 19:02, Emmanuel Bernard wrote: >> >>> >>> To anecdotally answer your specific example, yes different configs for different entities is an interesting benefit but it has to outweigh the drawbacks. >> >> Using a single cache for all the types is not practical at all :-) Just to expand my idea, people prefer using different caches for many reasons: >> - security: the Account cache has different security requirements than the News cache >> - data consistency: News is a non-transactional cache, Accounts require pessimistic XA transactions >> - expiry: expire last year's news from the system. 
Not the same for Accounts >> - availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache >> - logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though. > > This kind of reasoning reminds me of why, in the RDBMS world, people use different databases. > In fact, I have had experience where literally News was a different database than Accounts. > > But again in this model, in one database, you have many tables. > >> >>> If you have to do a map reduce for tasks so simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs. >> >> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is twofold: >> - performance: you iterate over the data that is not related to your query. @Mircea: when we talked about mixing up data in a cache, we talked that you'd get a view of the cache, say for a particular type, and iterators, map/reduce functions, etc., would only iterate over those. Hence, you'd avoid iterating over stuff not relevant to you. > > If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? > >> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. > > Well it's called type safety, some people find it good ;) > By the way, OGM does abstract a class from its representation in the datastore (including its name). 
But that's another story ;) > >> >>> I think that Dogs and any domestic animal is fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK and even if it was, it's still M/R and all its drawbacks. >>> To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache. >> >> I see where you come from but I don't think requiring people to use a single cache for all the entities is an option. Besides a natural logical separation, different data has different storage requirements: security, access patterns, consistency, durability, availability etc. For most of the non-trivial use cases, using a single cache just won't do. > > Let me rephrase and sum up my position. > If you are storing unrelated data, use different caches if you want, that's fine. > If you are storing related data, store it as one root entity and embeddable objects (i.e. one cache entry for the whole graph) > you can have one root entity per cache, that's fine. > If you are storing related entities and want to do queries on it: you are more or less screwed today with Infinispan and need a higher level abstraction. > > So _recommending_ one entity = one cache to me is wrong. ^ +100 > It's more one entity graph = one cache which is vastly different and has deep consequences (see my wiki page). +1 - it opens up a lot of interesting possibilities, and with cache views you could drill down to subsets of the cache. 
Cheers, -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Mar 12 13:45:32 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 12 Mar 2014 18:45:32 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> Message-ID: On 05 Mar 2014, at 18:16, Mircea Markus wrote: > Sanne came with a good follow up to this email, just some small clarifications: > > On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: > >>>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >>> >>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >>> - performance: you iterate over the data that is not related to your query. >> >> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? > > Here's where cross-site query would have been used. 
As Sanne suggested (next post) these limitations outweigh the advantages. > >> >>> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. >> >> Well it's called type safety, some people find it good ;) > > If anything, this model reduces type safety and reusability. E.g. say you want a M/R task to see how many Persons speak French. With the single cache model (both Dog and Person in the Cache) it would look something like: > > a) > //pseudocode > map (String k, Mammal value) { > if (value instanceof Person) { //this is the ugly part > if (((Person)value).speaks("French")) ... > } else { > //ignore it, it's a Dog > } > } > > Same thing written for a Cache: > > b) > map (String k, Person value) { > if (value.speaks("French")) ... > } > > I don't think people would prefer writing a) instead of b) ;) It's a pity that having discussed this in Mallorca and having referenced it a fair few times already, you could not think of an option based on what Paul suggested in https://issues.jboss.org/browse/ISPN-3640 Here's my attempt: Cache cache = ...; ValueFilter filter = ...; // filter would check those values that are persons... Cache view = cache.filter(filter); view.map(String k, Person value) { ... } Paul referred to per key type filters, but maybe per value type filters, such as the one I did here, might be useful. 
Cheers, > > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Mar 12 13:47:43 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 12 Mar 2014 18:47:43 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> Message-ID: <425EA77B-2C00-4ABD-A84E-6D515E57D238@redhat.com> Just saw Sanne?s follow up reply, it?s pretty much the same I suggest. Cheers, On 12 Mar 2014, at 18:45, Galder Zamarre?o wrote: > > On 05 Mar 2014, at 18:16, Mircea Markus wrote: > >> Sanne came with a good follow up to this email, just some small clarifications: >> >> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >> >>>>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >>>> >>>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. 
Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >>>> - performance: you iterate over the data that is not related to your query. >>> >>> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? >> >> Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations overcome the advantages. >> >>> >>>> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. >>> >>> Well it?s called type safety, some people find it good ;) >> >> If anything, this model reduces type safety and reusability. E.g. say you want a M/R task to see how many Persons speak French. With the single cache model(both Dog and Person int he Cache) it would look something like: >> >> a) >> //pseudocode >> map (String k, Mammal value) { >> if (value instanceof Person)) { //this is the ugly part >> if (((Person)value).speaks("French")) ... >> } else { >> //ignore it, it's an Dog >> } >> } >> >> Same thing written for a Cache: >> >> b) >> map (String k, Person value) { >> if (value.speaks("French")) ... >> } >> >> I don't think people would prefer writing a) instead of b) ;) > > It?s a pity that having discussed in Mallorca and having referenced a fair few times already, you could not think of an option based on what Paul suggsted in https://issues.jboss.org/browse/ISPN-3640 > > Here?s my attempt: > > Cache cache = ?; > ValueFilter filter = ? // filter would check those values that are persons... > Cache view = cache.filter(filter); > view.map(String k, Person value) { > ... > } > > Paul referred to per key type filters, but maybe per value type filters, such as the one I did here, might be useful. 
> > Cheers, > > >> >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From paul.ferraro at redhat.com Wed Mar 12 15:12:23 2014 From: paul.ferraro at redhat.com (Paul Ferraro) Date: Wed, 12 Mar 2014 15:12:23 -0400 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> Message-ID: <1394651543.7427.22.camel@T520> On Wed, 2014-03-12 at 18:45 +0100, Galder Zamarre?o wrote: > On 05 Mar 2014, at 18:16, Mircea Markus wrote: > > > Sanne came with a good follow up to this email, just some small clarifications: > > > > On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: > > > >>>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. > >>> > >>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. 
Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: > >>> - performance: you iterate over the data that is not related to your query. > >> > >> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? > > > > Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations overcome the advantages. > > > >> > >>> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. > >> > >> Well it?s called type safety, some people find it good ;) > > > > If anything, this model reduces type safety and reusability. E.g. say you want a M/R task to see how many Persons speak French. With the single cache model(both Dog and Person int he Cache) it would look something like: > > > > a) > > //pseudocode > > map (String k, Mammal value) { > > if (value instanceof Person)) { //this is the ugly part > > if (((Person)value).speaks("French")) ... > > } else { > > //ignore it, it's an Dog > > } > > } > > > > Same thing written for a Cache: > > > > b) > > map (String k, Person value) { > > if (value.speaks("French")) ... > > } > > > > I don't think people would prefer writing a) instead of b) ;) > > It?s a pity that having discussed in Mallorca and having referenced a fair few times already, you could not think of an option based on what Paul suggsted in https://issues.jboss.org/browse/ISPN-3640 > > Here?s my attempt: > > Cache cache = ?; > ValueFilter filter = ? // filter would check those values that are persons... > Cache view = cache.filter(filter); > view.map(String k, Person value) { > ... 
> } > > Paul referred to per key type filters, but maybe per value type filters, such as the one I did here, might be useful. We could also generalize KeyFilter to something like: public interface Filter { boolean accepts(Object key, Object value); } Infinispan could ship some simple default implementations that filter based on a specific key type, value type, or both. e.g. Cache<Object, Object> cache = ...; Cache<String, Integer> view = cache.filter(new KeyValueTypeFilter(String.class, Integer.class)); view.addListener(...); // Listens only to events accepted by filter for (Map.Entry<String, Integer> entry: view.entrySet()) { // No casting necessary! } where KeyValueTypeFilter looks like: public class KeyValueTypeFilter implements Filter { private Class<?> keyType; private Class<?> valueType; public KeyValueTypeFilter(Class<?> keyType, Class<?> valueType) { this.keyType = keyType; this.valueType = valueType; } @Override public boolean accepts(Object key, Object value) { return this.keyType.isInstance(key) && this.valueType.isInstance(value); } } > Cheers, > > > > > > > Cheers, > > -- > > Mircea Markus > > Infinispan lead (www.infinispan.org) > > > > > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > From mudokonman at gmail.com Wed Mar 12 15:44:18 2014 From: mudokonman at gmail.com (William Burns) Date: Wed, 12 Mar 2014 15:44:18 -0400 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <1394651543.7427.22.camel@T520> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> 
<1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <1394651543.7427.22.camel@T520> Message-ID: On Wed, Mar 12, 2014 at 3:12 PM, Paul Ferraro wrote: > On Wed, 2014-03-12 at 18:45 +0100, Galder Zamarre?o wrote: >> On 05 Mar 2014, at 18:16, Mircea Markus wrote: >> >> > Sanne came with a good follow up to this email, just some small clarifications: >> > >> > On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard wrote: >> > >> >>>> If you have to do a map reduce for tasks so simple as age > 18, I think you system better have to be prepared to run gazillions of M/R jobs. >> >>> >> >>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazzilions of times, and I don't need to index the age filed just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-folded: >> >>> - performance: you iterate over the data that is not related to your query. >> >> >> >> If the data are never related (query wise), then we are in the database split category. Which is fine. But if some of your queries are related, what do you do? Deny the user the ability to do them? >> > >> > Here's where cross-site query would have been used. As Sanne suggested (next post) these limitations overcome the advantages. >> > >> >> >> >>> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice. >> >> >> >> Well it's called type safety, some people find it good ;) >> > >> > If anything, this model reduces type safety and reusability. E.g. say you want a M/R task to see how many Persons speak French. 
With the single cache model (both Dog and Person in the Cache) it would look something like:
>> >
>> > a)
>> > //pseudocode
>> > map (String k, Mammal value) {
>> >   if (value instanceof Person) { //this is the ugly part
>> >     if (((Person) value).speaks("French")) ...
>> >   } else {
>> >     //ignore it, it's a Dog
>> >   }
>> > }
>> >
>> > Same thing written for a Cache<String, Person>:
>> >
>> > b)
>> > map (String k, Person value) {
>> >   if (value.speaks("French")) ...
>> > }
>> >
>> > I don't think people would prefer writing a) instead of b) ;)
>>
>> It's a pity that, having discussed this in Mallorca and having referenced it a fair few times already, you could not think of an option based on what Paul suggested in https://issues.jboss.org/browse/ISPN-3640
>>
>> Here's my attempt:
>>
>> Cache cache = ...;
>> ValueFilter filter = ... // filter would check those values that are persons...
>> Cache view = cache.filter(filter);
>> view.map(String k, Person value) {
>>   ...
>> }
>>
>> Paul referred to per-key-type filters, but maybe per-value-type filters, such as the one I did here, might be useful.
>
> We could also generalize KeyFilter to something like:
>
> public interface Filter {
>   boolean accepts(Object key, Object value);
> }
>
> Infinispan could ship some simple default implementations that filter based on a specific key type, value type, or both.
>
> e.g.
>
> Cache cache = ...;
> Cache view = cache.filter(new KeyValueTypeFilter(String.class, Integer.class));
> view.addListener(...); // Listens only to events accepted by filter
> for (Map.Entry entry : view.entrySet()) {
>   // No casting necessary!
> }
>
> where KeyValueTypeFilter looks like:
>
> public class KeyValueTypeFilter implements Filter {
>   private Class keyType;
>   private Class valueType;
>
>   public KeyValueTypeFilter(Class keyType, Class valueType) {
>     this.keyType = keyType;
>     this.valueType = valueType;
>   }
>
>   @Override
>   public boolean accepts(Object key, Object value) {
>     return this.keyType.isInstance(key) && this.valueType.isInstance(value);
>   }
> }

I am all for a feature like this; however, I want to caution against making this too flexible. Let's say you have two views of the same cache, a Cache<String, Person> and a Cache<String, Dog>. Now let's say you do a put("foo", mircea) in the first and the second does put("foo", dog1). Will these puts collide? How are they stored? Do we need a Map for each key in the DataContainer? What would happen if I use the non-view Cache's get("foo")? Would that return both values, neither, or the last one written? What would happen if I use the non-view Cache's put("foo", mircea)? I assume that also updates the Person view but not the Dog view? Maybe it would be simpler to have these additional views read-only?
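To make the collision question above concrete, here is a toy model (a plain java.util.HashMap standing in for the cache, and TypedView as a hypothetical name, not an Infinispan API): if both typed views delegate to one flat key space, the second put silently replaces the first, and the Person view then sees nothing under that key.

```java
import java.util.HashMap;
import java.util.Map;

public class ViewCollisionDemo {
    // A view that filters reads by value type but writes straight through
    // to the shared backing store.
    static class TypedView<V> {
        private final Map<String, Object> backing;
        private final Class<V> type;

        TypedView(Map<String, Object> backing, Class<V> type) {
            this.backing = backing;
            this.type = type;
        }

        void put(String key, V value) { backing.put(key, value); }

        V get(String key) {
            Object v = backing.get(key);
            // Entries of another type are filtered out, i.e. invisible.
            return type.isInstance(v) ? type.cast(v) : null;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        TypedView<String> people = new TypedView<>(cache, String.class);
        TypedView<Integer> dogs = new TypedView<>(cache, Integer.class);

        people.put("foo", "mircea");
        dogs.put("foo", 42); // collides: both views share one flat key space

        System.out.println(people.get("foo")); // null -- the Person entry was overwritten
        System.out.println(cache.get("foo"));  // 42 -- the unfiltered cache sees the last write
    }
}
```

This is exactly the last-write-wins behavior Will is asking about; avoiding it would require either per-type storage in the DataContainer or read-only views.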
>> Cheers, >> >> >> > >> > >> > Cheers, >> > -- >> > Mircea Markus >> > Infinispan lead (www.infinispan.org) >> > >> > >> > >> > >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarreño >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Thu Mar 13 05:48:04 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 13 Mar 2014 09:48:04 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> Message-ID: <8EEB7263-528A-4F4E-ADCB-7C350C2C23A7@redhat.com> On Mar 12, 2014, at 17:45, Galder Zamarreño wrote: > > On 05 Mar 2014, at 18:16, Mircea Markus wrote: > >> If anything, this model reduces type safety and reusability. E.g. say you want an M/R task to see how many Persons speak French. With the single cache model (both Dog and Person in the Cache) it would look something like: >> >> a) >> //pseudocode >> map (String k, Mammal value) { >> if (value instanceof Person) { //this is the ugly part >> if (((Person)value).speaks("French")) ...
>> } else { >> //ignore it, it's a Dog >> } >> } >> >> Same thing written for a Cache<String, Person>: >> >> b) >> map (String k, Person value) { >> if (value.speaks("French")) ... >> } >> >> I don't think people would prefer writing a) instead of b) ;) > > It's a pity that, having discussed this in Mallorca and having referenced it a fair few times already, you could not think of an option based on what Paul suggested in https://issues.jboss.org/browse/ISPN-3640 Not sure what you want to imply with that, it was an oversight on my side :\ Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Thu Mar 13 06:03:28 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 13 Mar 2014 10:03:28 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> Message-ID: On Mar 12, 2014, at 17:37, Galder Zamarreño wrote: > > On 04 Mar 2014, at 19:02, Emmanuel Bernard wrote: > >>> >>>> >>>> To anecdotally answer your specific example, yes, different configs for different entities is an interesting benefit, but it has to outweigh the drawbacks. >>> >>> Using a single cache for all the types is not practical at all :-) Just to expand my idea, people prefer using different caches for many reasons: >>> - security: the Account cache has different security requirements than the News cache >>> - data consistency: News is a non-transactional cache, Accounts require pessimistic XA transactions >>> - expiry: expire last year's news from the system.
Not the same for Accounts >>> - availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache >>> - logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though. >> >> This kind of reasoning reminds me of why, in the RDBMS world, people use different databases. >> In fact, I have had experience where News was literally a different database than Accounts. >> >> But again, in this model, one database has many tables. >> >>> >>>> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs. >>> >>> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also, when it comes to Map/Reduce, the drawback of holding all the data in a single cache is twofold: >>> - performance: you iterate over data that is not related to your query. > > @Mircea: when we talked about mixing up data in a cache, we said that you'd get a view of the cache, say for a particular type, and iterators, map/reduce functions, etc., would only iterate over those. Hence, you'd avoid iterating over stuff not relevant to you. It depends how you implement the view cache: if it is just a filtering decorator (no state) around the actual cache, then the M/R job will still iterate over all the entries and ignore the non-matching ones. If the view has state (the filtered keys only), the map/reduce iteration would indeed be faster.
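The stateless-vs-stateful distinction above can be sketched with stand-in code (hypothetical helper names, a plain map in place of the cache): a stateless decorator still visits every entry and discards the non-matches, while a view that recorded matching keys at write time only touches the relevant subset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class ViewIterationDemo {
    // a) Stateless filtering decorator: visits every entry and discards
    //    non-matches, so an M/R job still pays for the whole cache.
    static <V> List<V> statelessScan(Map<String, V> cache, Predicate<V> filter, int[] visited) {
        List<V> out = new ArrayList<>();
        for (V v : cache.values()) {
            visited[0]++;                 // cost paid per entry, relevant or not
            if (filter.test(v)) out.add(v);
        }
        return out;
    }

    // b) Stateful view: matching keys were recorded at write time, so
    //    iteration touches only the relevant subset.
    static <V> List<V> statefulScan(Map<String, V> cache, Set<String> matchingKeys, int[] visited) {
        List<V> out = new ArrayList<>();
        for (String k : matchingKeys) {
            visited[0]++;
            out.add(cache.get(k));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new LinkedHashMap<>();
        cache.put("p1", "alice");  // stand-ins for Person entries
        cache.put("d1", 7);        // stand-in for a Dog entry
        cache.put("p2", "bob");

        int[] a = {0}, b = {0};
        statelessScan(cache, v -> v instanceof String, a);
        statefulScan(cache, Set.of("p1", "p2"), b);
        System.out.println(a[0] + " entries visited vs " + b[0]); // 3 entries visited vs 2
    }
}
```

The trade-off is the usual one: the stateful view is faster to iterate but must maintain its key set on every write.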
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mudokonman at gmail.com Thu Mar 13 08:07:26 2014 From: mudokonman at gmail.com (William Burns) Date: Thu, 13 Mar 2014 08:07:26 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 Message-ID: Recently, while working on some ISPN 7 features, I noticed some public API inconsistencies. I wanted to bring these up in case someone has concerns. The first few are pretty trivial, but they can cause compilation errors between versions if user code implements these interfaces and defines types. 1. The CacheWriter interface currently defines a delete(K key) method. To be more in line with the JCache and java.util collection interfaces, I was hoping to change this to delete(Object key) instead. 2. The CacheLoader interface currently defines load(K key) and contains(K key) methods. Similar to the above, I was hoping to change the K type to Object, again in line with the JCache and java.util collection interfaces. This last one is a bit more major: we currently have 2 classes named KeyFilter. One resides in the org.infinispan.notifications package and the other in the org.infinispan.persistence.spi.AdvancedCacheLoader interface. 3. My plan is to consolidate these into a single class in a new core org.infinispan.filter package. I would also move the new KeyValueFilter class that was added for cluster listeners into this package, along with their accompanying implementations. The first two are currently implemented as changes in https://github.com/infinispan/infinispan/pull/2423. The latter I was going to add to the changes for https://issues.jboss.org/browse/ISPN-4068. Let me know what you guys think.
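For context on #1 and #2, java.util.Map made the same choice: remove(Object) and containsKey(Object) deliberately take Object rather than K, so any reference can be passed without a cast or a recompile. A minimal before/after sketch (TypedStore, Store, and MapStore are hypothetical stand-ins, not the actual CacheWriter/CacheLoader SPI):

```java
import java.util.HashMap;
import java.util.Map;

public class ObjectKeyDemo {
    // Before: a typed signature forces callers to hold a K-typed reference,
    // and changing K later breaks implementors at compile time.
    interface TypedStore<K, V> {
        boolean delete(K key);
    }

    // After: Object-typed lookups and removals, matching
    // java.util.Map.remove(Object) -- any reference works without a cast.
    interface Store<K, V> {
        boolean delete(Object key);
        boolean contains(Object key);
    }

    static class MapStore<K, V> implements Store<K, V> {
        private final Map<K, V> map = new HashMap<>();

        void write(K key, V value) { map.put(key, value); }

        @Override public boolean delete(Object key) { return map.remove(key) != null; }
        @Override public boolean contains(Object key) { return map.containsKey(key); }
    }

    public static void main(String[] args) {
        MapStore<Integer, String> store = new MapStore<>();
        store.write(1, "one");
        // e.g. a key recovered from an untyped listener event:
        Object someKey = 1;
        System.out.println(store.contains(someKey)); // true, no cast needed
        System.out.println(store.delete(someKey));   // true
    }
}
```

Only reads and removals are loosened this way; writes stay K-typed, so type safety for stored values is unchanged.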
- Will From pedro at infinispan.org Thu Mar 13 08:31:58 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Thu, 13 Mar 2014 12:31:58 +0000 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: Message-ID: <5321A53E.3040402@infinispan.org> Hi, #1 and #2 are ok to me but, IMO, the filter package should be in commons module Cheers, Pedro On 03/13/2014 12:07 PM, William Burns wrote: > Recently while working on some ISPN 7 features, there were some public > API inconsistencies. I wanted to bring these up just in case if > someone had concerns. > > The first few are pretty trivial, but can cause compilation errors > between versions if user code implements these interfaces and defines > types. > > 1. The CacheWriter interface currently defines a delete(K key) method. > To be more inline with JCache and java.util.collections interfaces I > was hoping to change this to be delete(Object key) instead. > 2. The CacheLoader interface currently defines load(K key) and > contains(K key) methods. Similar to above I was hoping to change the > K type to be Object to be more inline with JCache and > java.util.collections interfaces. > > This last one is a bit more major, but currently we have 2 classes > that are named KeyFilter. One that resides in the > org.infinispan.notifications package and another that resides in the > org.infinispan.persistence.spi.AdvancedCacheLoader interface. > > 3. My plan is instead to consolidate these classes into 1 into a new > core org.infinispan.filter package. I would also move the new > KeyValueFilter class that was added for cluster listeners into this > package and their accompanying implementations. > > The first 2 is currently implemented as changes in > https://github.com/infinispan/infinispan/pull/2423. The latter I was > going to add into changes for > https://issues.jboss.org/browse/ISPN-4068. > > Let me know what you guys think. 
> > - Will > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From mudokonman at gmail.com Thu Mar 13 08:35:15 2014 From: mudokonman at gmail.com (William Burns) Date: Thu, 13 Mar 2014 08:35:15 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: <5321A53E.3040402@infinispan.org> References: <5321A53E.3040402@infinispan.org> Message-ID: On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: > Hi, > > #1 and #2 are ok to me but, IMO, the filter package should be in commons > module Sorry I forgot to detail why I said core. I originally planned for commons package as well, however the KeyValueFilter class needs the Metadata class, which doesn't live in the commons package. I didn't want to separate the 2 filter classes. And unfortunately the Metadata class relies on other classes in core, so that isn't easy to move over either, but doable :( WDYT? > > Cheers, > Pedro > > On 03/13/2014 12:07 PM, William Burns wrote: >> Recently while working on some ISPN 7 features, there were some public >> API inconsistencies. I wanted to bring these up just in case if >> someone had concerns. >> >> The first few are pretty trivial, but can cause compilation errors >> between versions if user code implements these interfaces and defines >> types. >> >> 1. The CacheWriter interface currently defines a delete(K key) method. >> To be more inline with JCache and java.util.collections interfaces I >> was hoping to change this to be delete(Object key) instead. >> 2. The CacheLoader interface currently defines load(K key) and >> contains(K key) methods. Similar to above I was hoping to change the >> K type to be Object to be more inline with JCache and >> java.util.collections interfaces. >> >> This last one is a bit more major, but currently we have 2 classes >> that are named KeyFilter. 
One that resides in the >> org.infinispan.notifications package and another that resides in the >> org.infinispan.persistence.spi.AdvancedCacheLoader interface. >> >> 3. My plan is instead to consolidate these classes into 1 into a new >> core org.infinispan.filter package. I would also move the new >> KeyValueFilter class that was added for cluster listeners into this >> package and their accompanying implementations. >> >> The first 2 is currently implemented as changes in >> https://github.com/infinispan/infinispan/pull/2423. The latter I was >> going to add into changes for >> https://issues.jboss.org/browse/ISPN-4068. >> >> Let me know what you guys think. >> >> - Will >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From pedro at infinispan.org Thu Mar 13 08:37:53 2014 From: pedro at infinispan.org (Pedro Ruivo) Date: Thu, 13 Mar 2014 12:37:53 +0000 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: <5321A53E.3040402@infinispan.org> Message-ID: <5321A6A1.2000707@infinispan.org> On 03/13/2014 12:35 PM, William Burns wrote: > On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: >> Hi, >> >> #1 and #2 are ok to me but, IMO, the filter package should be in commons >> module > > Sorry I forgot to detail why I said core. I originally planned for > commons package as well, however the KeyValueFilter class needs the > Metadata class, which doesn't live in the commons package. I didn't > want to separate the 2 filter classes. And unfortunately the Metadata > class relies on other classes in core, so that isn't easy to move over > either, but doable :( WDYT? can you explain why the metadata is needed? 
I assumed that the key and the value were the only objects needed. > >> >> Cheers, >> Pedro >> >> On 03/13/2014 12:07 PM, William Burns wrote: >>> Recently while working on some ISPN 7 features, there were some public >>> API inconsistencies. I wanted to bring these up just in case if >>> someone had concerns. >>> >>> The first few are pretty trivial, but can cause compilation errors >>> between versions if user code implements these interfaces and defines >>> types. >>> >>> 1. The CacheWriter interface currently defines a delete(K key) method. >>> To be more inline with JCache and java.util.collections interfaces I >>> was hoping to change this to be delete(Object key) instead. >>> 2. The CacheLoader interface currently defines load(K key) and >>> contains(K key) methods. Similar to above I was hoping to change the >>> K type to be Object to be more inline with JCache and >>> java.util.collections interfaces. >>> >>> This last one is a bit more major, but currently we have 2 classes >>> that are named KeyFilter. One that resides in the >>> org.infinispan.notifications package and another that resides in the >>> org.infinispan.persistence.spi.AdvancedCacheLoader interface. >>> >>> 3. My plan is instead to consolidate these classes into 1 into a new >>> core org.infinispan.filter package. I would also move the new >>> KeyValueFilter class that was added for cluster listeners into this >>> package and their accompanying implementations. >>> >>> The first 2 is currently implemented as changes in >>> https://github.com/infinispan/infinispan/pull/2423. The latter I was >>> going to add into changes for >>> https://issues.jboss.org/browse/ISPN-4068. >>> >>> Let me know what you guys think. 
>>> >>> - Will >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From mudokonman at gmail.com Thu Mar 13 08:45:12 2014 From: mudokonman at gmail.com (William Burns) Date: Thu, 13 Mar 2014 08:45:12 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: <5321A6A1.2000707@infinispan.org> References: <5321A53E.3040402@infinispan.org> <5321A6A1.2000707@infinispan.org> Message-ID: On Thu, Mar 13, 2014 at 8:37 AM, Pedro Ruivo wrote: > > > On 03/13/2014 12:35 PM, William Burns wrote: >> On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: >>> Hi, >>> >>> #1 and #2 are ok to me but, IMO, the filter package should be in commons >>> module >> >> Sorry I forgot to detail why I said core. I originally planned for >> commons package as well, however the KeyValueFilter class needs the >> Metadata class, which doesn't live in the commons package. I didn't >> want to separate the 2 filter classes. And unfortunately the Metadata >> class relies on other classes in core, so that isn't easy to move over >> either, but doable :( WDYT? > > can you explain why the metadata is needed? I assumed that the key and > the value were the only objects needed. That is how the design doc was written up :P My guess is so that people if needed can filter out versioned entries or to possibly do some eviction magic since they can try to calculate when the entry would be removed. Maybe Mircea can shed some additional light. 
> >> >>> >>> Cheers, >>> Pedro >>> >>> On 03/13/2014 12:07 PM, William Burns wrote: >>>> Recently while working on some ISPN 7 features, there were some public >>>> API inconsistencies. I wanted to bring these up just in case if >>>> someone had concerns. >>>> >>>> The first few are pretty trivial, but can cause compilation errors >>>> between versions if user code implements these interfaces and defines >>>> types. >>>> >>>> 1. The CacheWriter interface currently defines a delete(K key) method. >>>> To be more inline with JCache and java.util.collections interfaces I >>>> was hoping to change this to be delete(Object key) instead. >>>> 2. The CacheLoader interface currently defines load(K key) and >>>> contains(K key) methods. Similar to above I was hoping to change the >>>> K type to be Object to be more inline with JCache and >>>> java.util.collections interfaces. >>>> >>>> This last one is a bit more major, but currently we have 2 classes >>>> that are named KeyFilter. One that resides in the >>>> org.infinispan.notifications package and another that resides in the >>>> org.infinispan.persistence.spi.AdvancedCacheLoader interface. >>>> >>>> 3. My plan is instead to consolidate these classes into 1 into a new >>>> core org.infinispan.filter package. I would also move the new >>>> KeyValueFilter class that was added for cluster listeners into this >>>> package and their accompanying implementations. >>>> >>>> The first 2 is currently implemented as changes in >>>> https://github.com/infinispan/infinispan/pull/2423. The latter I was >>>> going to add into changes for >>>> https://issues.jboss.org/browse/ISPN-4068. >>>> >>>> Let me know what you guys think. 
>>>> >>>> - Will >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Thu Mar 13 10:25:53 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 13 Mar 2014 14:25:53 +0000 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: <5321A53E.3040402@infinispan.org> <5321A6A1.2000707@infinispan.org> Message-ID: Is there a migration guide? This should be documented in some wiki page similar to - https://community.jboss.org/wiki/HibernateSearchMigrationGuide On 13 March 2014 12:45, William Burns wrote: > On Thu, Mar 13, 2014 at 8:37 AM, Pedro Ruivo wrote: >> >> >> On 03/13/2014 12:35 PM, William Burns wrote: >>> On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: >>>> Hi, >>>> >>>> #1 and #2 are ok to me but, IMO, the filter package should be in commons >>>> module >>> >>> Sorry I forgot to detail why I said core. I originally planned for >>> commons package as well, however the KeyValueFilter class needs the >>> Metadata class, which doesn't live in the commons package. I didn't >>> want to separate the 2 filter classes. And unfortunately the Metadata >>> class relies on other classes in core, so that isn't easy to move over >>> either, but doable :( WDYT? >> >> can you explain why the metadata is needed? 
I assumed that the key and >> the value were the only objects needed. > > That is how the design doc was written up :P My guess is so that > people if needed can filter out versioned entries or to possibly do > some eviction magic since they can try to calculate when the entry > would be removed. Maybe Mircea can shed some additional light. > >> >>> >>>> >>>> Cheers, >>>> Pedro >>>> >>>> On 03/13/2014 12:07 PM, William Burns wrote: >>>>> Recently while working on some ISPN 7 features, there were some public >>>>> API inconsistencies. I wanted to bring these up just in case if >>>>> someone had concerns. >>>>> >>>>> The first few are pretty trivial, but can cause compilation errors >>>>> between versions if user code implements these interfaces and defines >>>>> types. >>>>> >>>>> 1. The CacheWriter interface currently defines a delete(K key) method. >>>>> To be more inline with JCache and java.util.collections interfaces I >>>>> was hoping to change this to be delete(Object key) instead. >>>>> 2. The CacheLoader interface currently defines load(K key) and >>>>> contains(K key) methods. Similar to above I was hoping to change the >>>>> K type to be Object to be more inline with JCache and >>>>> java.util.collections interfaces. >>>>> >>>>> This last one is a bit more major, but currently we have 2 classes >>>>> that are named KeyFilter. One that resides in the >>>>> org.infinispan.notifications package and another that resides in the >>>>> org.infinispan.persistence.spi.AdvancedCacheLoader interface. >>>>> >>>>> 3. My plan is instead to consolidate these classes into 1 into a new >>>>> core org.infinispan.filter package. I would also move the new >>>>> KeyValueFilter class that was added for cluster listeners into this >>>>> package and their accompanying implementations. >>>>> >>>>> The first 2 is currently implemented as changes in >>>>> https://github.com/infinispan/infinispan/pull/2423. 
The latter I was >>>>> going to add into changes for >>>>> https://issues.jboss.org/browse/ISPN-4068. >>>>> >>>>> Let me know what you guys think. >>>>> >>>>> - Will >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Thu Mar 13 10:45:09 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 13 Mar 2014 14:45:09 +0000 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: <5321A53E.3040402@infinispan.org> <5321A6A1.2000707@infinispan.org> Message-ID: <81095E25-46A6-474E-9F1F-9A52DAF05C81@redhat.com> On Mar 13, 2014, at 12:45, William Burns wrote: > On Thu, Mar 13, 2014 at 8:37 AM, Pedro Ruivo wrote: >> >> >> On 03/13/2014 12:35 PM, William Burns wrote: >>> On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: >>>> Hi, >>>> >>>> #1 and #2 are ok to me but, IMO, the filter package should be in commons >>>> module >>> >>> Sorry I forgot to detail why I said core. I originally planned for >>> commons package as well, however the KeyValueFilter class needs the >>> Metadata class, which doesn't live in the commons package. 
I didn't >>> want to separate the 2 filter classes. And unfortunately the Metadata >>> class relies on other classes in core, so that isn't easy to move over >>> either, but doable :( WDYT? >> >> can you explain why the metadata is needed? I assumed that the key and >> the value were the only objects needed. > > That is how the design doc was written up :P My guess is so that > people if needed can filter out versioned entries or to possibly do > some eviction magic since they can try to calculate when the entry > would be removed. Maybe Mircea can shed some additional light. org.infinispan.metadata.Metadata was added in order to group all the information that needs to be associated with a cache entry: timestamp, expiry, idleTime, version, but also custom external data that ISPN extensions might want to associate with it: e.g. the REST server associates MIME information: https://github.com/mmarkus/infinispan/blob/master/server/rest/src/main/scala/org/infinispan/rest/MimeMetadata.scala#L20-20. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vblagoje at redhat.com Thu Mar 13 11:14:23 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Thu, 13 Mar 2014 11:14:23 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: <5321A53E.3040402@infinispan.org> Message-ID: <5321CB4F.80800@redhat.com> Mircea and I wanted to "promote" AdvancedCacheLoader.KeyFilter or merge it with one filter we all choose to use, because we have so many of these internal filter classes it is ridiculous. The filter you wrote is THE one? Vladimir On 3/13/2014, 8:35 AM, William Burns wrote: > On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: >> Hi, >> >> #1 and #2 are ok to me but, IMO, the filter package should be in commons >> module > Sorry I forgot to detail why I said core.
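A sketch of why a filter might want the metadata as well as the key and value (all names below are illustrative stand-ins; the real org.infinispan.metadata.Metadata and KeyValueFilter interfaces differ): with metadata in the callback, a filter can select entries by lifespan or version, which a key/value-only filter cannot express.

```java
public class MetadataFilterDemo {
    // Minimal stand-in for the entry metadata discussed above.
    interface Metadata {
        long lifespanMillis(); // -1 means the entry is immortal
        long version();
    }

    // Stand-in for a filter that sees metadata alongside key and value.
    interface KeyValueFilter<K, V> {
        boolean accept(K key, V value, Metadata metadata);
    }

    static class SimpleMetadata implements Metadata {
        private final long lifespanMillis;
        private final long version;

        SimpleMetadata(long lifespanMillis, long version) {
            this.lifespanMillis = lifespanMillis;
            this.version = version;
        }

        @Override public long lifespanMillis() { return lifespanMillis; }
        @Override public long version() { return version; }
    }

    public static void main(String[] args) {
        // Accept only mortal entries with a lifespan under one hour -- the kind
        // of expiry-aware "eviction magic" a key/value-only callback can't do.
        KeyValueFilter<String, String> shortLived =
                (k, v, m) -> m.lifespanMillis() >= 0 && m.lifespanMillis() < 3_600_000L;

        System.out.println(shortLived.accept("a", "x", new SimpleMetadata(60_000L, 1))); // true
        System.out.println(shortLived.accept("b", "y", new SimpleMetadata(-1L, 1)));     // false
    }
}
```

This is also why moving the filter classes to commons drags Metadata along with them, as discussed in the thread.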
I originally planned for > commons package as well, however the KeyValueFilter class needs the > Metadata class, which doesn't live in the commons package. I didn't > want to separate the 2 filter classes. And unfortunately the Metadata > class relies on other classes in core, so that isn't easy to move over > either, but doable :( WDYT? > > From mudokonman at gmail.com Thu Mar 13 11:39:15 2014 From: mudokonman at gmail.com (William Burns) Date: Thu, 13 Mar 2014 11:39:15 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: <5321CB4F.80800@redhat.com> References: <5321A53E.3040402@infinispan.org> <5321CB4F.80800@redhat.com> Message-ID: On Thu, Mar 13, 2014 at 11:14 AM, Vladimir Blagojevic wrote: > Mircea and I wanted to "promote" AdvancedCacheLoader.KeyFilter or merge > it with one filter we all choose to use because we have so many of these > internal filter classes it is ridiculous. The filter you wrote is THE one? I can't say that for sure. Basically the KeyFilter that exists in the notifications package was what I was going to use as it is more generic since the method is called accept, which I thought was better than shouldLoadKey. Are we thinking we would get rid of the KeyFilter and only use KeyValueFilter though ? - Will > > Vladimir > On 3/13/2014, 8:35 AM, William Burns wrote: >> On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: >>> Hi, >>> >>> #1 and #2 are ok to me but, IMO, the filter package should be in commons >>> module >> Sorry I forgot to detail why I said core. I originally planned for >> commons package as well, however the KeyValueFilter class needs the >> Metadata class, which doesn't live in the commons package. I didn't >> want to separate the 2 filter classes. And unfortunately the Metadata >> class relies on other classes in core, so that isn't easy to move over >> either, but doable :( WDYT? 
>> >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From vblagoje at redhat.com Thu Mar 13 11:54:58 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Thu, 13 Mar 2014 11:54:58 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: <5321A53E.3040402@infinispan.org> <5321CB4F.80800@redhat.com> Message-ID: <5321D4D2.2030003@redhat.com> On 3/13/2014, 11:39 AM, William Burns wrote: > On Thu, Mar 13, 2014 at 11:14 AM, Vladimir Blagojevic > wrote: >> Mircea and I wanted to "promote" AdvancedCacheLoader.KeyFilter or merge >> it with one filter we all choose to use because we have so many of these >> internal filter classes it is ridiculous. The filter you wrote is THE one? > I can't say that for sure. Basically the KeyFilter that exists in the > notifications package was what I was going to use as it is more > generic since the method is called accept, which I thought was better > than shouldLoadKey. > > Are we thinking we would get rid of the KeyFilter and only use > KeyValueFilter though ? > > - Will > Not sure Will. I would move both KeyFilter and KeyValueFilter in some major package so we can all use it throughout the codebase! Guys? 
Vladimir From mudokonman at gmail.com Thu Mar 13 11:58:51 2014 From: mudokonman at gmail.com (William Burns) Date: Thu, 13 Mar 2014 11:58:51 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: <5321D4D2.2030003@redhat.com> References: <5321A53E.3040402@infinispan.org> <5321CB4F.80800@redhat.com> <5321D4D2.2030003@redhat.com> Message-ID: On Thu, Mar 13, 2014 at 11:54 AM, Vladimir Blagojevic wrote: > On 3/13/2014, 11:39 AM, William Burns wrote: >> On Thu, Mar 13, 2014 at 11:14 AM, Vladimir Blagojevic >> wrote: >>> Mircea and I wanted to "promote" AdvancedCacheLoader.KeyFilter or merge >>> it with one filter we all choose to use because we have so many of these >>> internal filter classes it is ridiculous. The filter you wrote is THE one? >> I can't say that for sure. Basically the KeyFilter that exists in the >> notifications package was what I was going to use as it is more >> generic since the method is called accept, which I thought was better >> than shouldLoadKey. >> >> Are we thinking we would get rid of the KeyFilter and only use >> KeyValueFilter though ? >> >> - Will >> > Not sure Will. I would move both KeyFilter and KeyValueFilter in some > major package so we can all use it throughout the codebase! That is exactly what I was hoping to do with the new org.infinispan.filter package in core module. > > Guys? > > Vladimir > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mudokonman at gmail.com Thu Mar 13 12:37:36 2014 From: mudokonman at gmail.com (William Burns) Date: Thu, 13 Mar 2014 12:37:36 -0400 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: <5321A53E.3040402@infinispan.org> <5321A6A1.2000707@infinispan.org> Message-ID: On Thu, Mar 13, 2014 at 10:25 AM, Sanne Grinovero wrote: > Is there a migration guide? 
> This should be documented in some wiki page similar to > - https://community.jboss.org/wiki/HibernateSearchMigrationGuide I have created https://issues.jboss.org/browse/ISPN-4109 to address this after the issues are merged in to update the Infinispan upgrade guide. Thanks for bringing this up. > > On 13 March 2014 12:45, William Burns wrote: >> On Thu, Mar 13, 2014 at 8:37 AM, Pedro Ruivo wrote: >>> >>> >>> On 03/13/2014 12:35 PM, William Burns wrote: >>>> On Thu, Mar 13, 2014 at 8:31 AM, Pedro Ruivo wrote: >>>>> Hi, >>>>> >>>>> #1 and #2 are ok to me but, IMO, the filter package should be in commons >>>>> module >>>> >>>> Sorry I forgot to detail why I said core. I originally planned for >>>> commons package as well, however the KeyValueFilter class needs the >>>> Metadata class, which doesn't live in the commons package. I didn't >>>> want to separate the 2 filter classes. And unfortunately the Metadata >>>> class relies on other classes in core, so that isn't easy to move over >>>> either, but doable :( WDYT? >>> >>> can you explain why the metadata is needed? I assumed that the key and >>> the value were the only objects needed. >> >> That is how the design doc was written up :P My guess is so that >> people if needed can filter out versioned entries or to possibly do >> some eviction magic since they can try to calculate when the entry >> would be removed. Maybe Mircea can shed some additional light. >> >>> >>>> >>>>> >>>>> Cheers, >>>>> Pedro >>>>> >>>>> On 03/13/2014 12:07 PM, William Burns wrote: >>>>>> Recently while working on some ISPN 7 features, there were some public >>>>>> API inconsistencies. I wanted to bring these up just in case if >>>>>> someone had concerns. >>>>>> >>>>>> The first few are pretty trivial, but can cause compilation errors >>>>>> between versions if user code implements these interfaces and defines >>>>>> types. >>>>>> >>>>>> 1. The CacheWriter interface currently defines a delete(K key) method. 
>>>>>> To be more in line with JCache and java.util.collections interfaces I >>>>>> was hoping to change this to be delete(Object key) instead. >>>>>> 2. The CacheLoader interface currently defines load(K key) and >>>>>> contains(K key) methods. Similar to above I was hoping to change the >>>>>> K type to be Object to be more in line with JCache and >>>>>> java.util.collections interfaces. >>>>>> >>>>>> This last one is a bit more major, but currently we have 2 classes >>>>>> that are named KeyFilter. One that resides in the >>>>>> org.infinispan.notifications package and another that resides in the >>>>>> org.infinispan.persistence.spi.AdvancedCacheLoader interface. >>>>>> >>>>>> 3. My plan is instead to consolidate these classes into one, in a new >>>>>> core org.infinispan.filter package. I would also move the new >>>>>> KeyValueFilter class that was added for cluster listeners into this >>>>>> package, along with their accompanying implementations. >>>>>> >>>>>> The first 2 are currently implemented as changes in >>>>>> https://github.com/infinispan/infinispan/pull/2423. The latter I was >>>>>> going to add into the changes for >>>>>> https://issues.jboss.org/browse/ISPN-4068. >>>>>> >>>>>> Let me know what you guys think.
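[Editor's sketch of the three proposals quoted above. The interface and method names follow the thread (accept() rather than shouldLoadKey(), K widened to Object); the bodies and the sample filter are purely illustrative, not actual Infinispan code.]

```java
// Illustrative sketch only: names come from the proposals in this thread,
// everything else is hypothetical and not taken from the Infinispan codebase.
public class FilterSketch {

    // Proposal 3: one filter hierarchy in a shared org.infinispan.filter package.
    interface KeyFilter<K> {
        boolean accept(K key); // "accept" was preferred over "shouldLoadKey"
    }

    interface KeyValueFilter<K, V> {
        boolean accept(K key, V value);
    }

    // Proposals 1 and 2: widen K to Object, mirroring
    // java.util.Map.remove(Object) / get(Object) / containsKey(Object).
    interface CacheWriter<K, V> {
        boolean delete(Object key); // was delete(K key)
    }

    interface CacheLoader<K, V> {
        boolean contains(Object key); // was contains(K key)
        Object load(Object key);      // was load(K key)
    }

    public static void main(String[] args) {
        // A trivial filter implementation to show the shape of the API.
        KeyFilter<String> startsWithA = key -> key.startsWith("a");
        System.out.println(startsWithA.accept("alpha"));
        System.out.println(startsWithA.accept("beta"));
    }
}
```

Widening K to Object is exactly what java.util.Map does with remove(Object) and containsKey(Object): it avoids a source-incompatible signature when user code implements the interface with concrete types.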
>>>>>> >>>>>> - Will >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Thu Mar 13 12:43:35 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Thu, 13 Mar 2014 18:43:35 +0200 Subject: [infinispan-dev] Proposed ISPN 7 compilation incompatibilities with ISPN 6 In-Reply-To: References: <5321A53E.3040402@infinispan.org> <5321CB4F.80800@redhat.com> <5321D4D2.2030003@redhat.com> Message-ID: On Thu, Mar 13, 2014 at 5:58 PM, William Burns wrote: > On Thu, Mar 13, 2014 at 11:54 AM, Vladimir Blagojevic > wrote: > > On 3/13/2014, 11:39 AM, William Burns wrote: > >> On Thu, Mar 13, 2014 at 11:14 AM, Vladimir Blagojevic > >> wrote: > >>> Mircea and I wanted to "promote" AdvancedCacheLoader.KeyFilter or merge > >>> it with one filter we all choose to use because we have so many of > these > >>> internal filter classes it is ridiculous. The filter you wrote is THE > one? 
> >> I can't say that for sure. Basically the KeyFilter that exists in the > >> notifications package was what I was going to use as it is more > >> generic since the method is called accept, which I thought was better > >> than shouldLoadKey. > >> > >> Are we thinking we would get rid of the KeyFilter and only use > >> KeyValueFilter though ? > >> > >> - Will > >> > > Not sure Will. I would move both KeyFilter and KeyValueFilter in some > > major package so we can all use it throughout the codebase! > > That is exactly what I was hoping to do with the new > org.infinispan.filter package in core module. > I would definitely keep KeyFilter, I imagine there are places where we don't have the value yet and we'd have to load it from a store to pass it to the filter. Even for clustered listeners, it might make a difference for the initial transfer - although deciding which is optimal, loading all the keys and then loading accepted values one by one, or loading all the values upfront, isn't really doable without a hint from the user. In a way I also see an overlap between KeyValueFilter and Converter. At least with SingleFileStore, KeyFilter can be used to filter keys before doing any real work, whereas KeyValueFilter/Converter can only be used to trim the amount of information to send over the network after the entry has been fully loaded. Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140313/b988902a/attachment.html From mmarkus at redhat.com Thu Mar 13 15:01:35 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 13 Mar 2014 19:01:35 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration Message-ID: Hi, I had a very good conversation with Jonathan Halliday, Sanne and Emmanuel around the integration between Infinispan and Hadoop. Just to recap, the goal is to be able to run Hadoop M/R tasks on data that is stored in Infinispan in order to gain speed. 
(once we have a prototype in place, one of the first tasks will be to validate these speed assumptions). In previous discussions we explored the idea of providing an HDFS implementation for Infinispan, which, whilst doable, might not be the best integration point: - in order to run M/R jobs, Hadoop interacts with two interfaces: InputFormat[1] and OutputFormat[2] - it's the specific InputFormat and OutputFormat implementations that work on top of HDFS - instead of implementing HDFS, we could provide implementations for the InputFormat and OutputFormat interfaces, which would give us more flexibility - this seems to be the preferred integration point for other systems, such as Cassandra - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat) in order to get the data to be processed. - Assumptions that we'll need to validate: this approach doesn't impose any constraint on how data is stored in Infinispan and should allow data to be read through the Map interface. Also InputFormat and OutputFormat implementations would only use get(k) and keySet() methods, and no native Infinispan M/R, which means that C/S access should also be possible. - very important: Hadoop HDFS is an append-only file system, and the M/R tasks operate on a snapshot of the data. From a task's perspective, all the data in the storage doesn't change after the task is started. More data can be appended whilst the task runs, but this won't be visible to the task. Infinispan doesn't have such an append structure, nor MVCC. The closest thing we have is the snapshot isolation transactions implemented by the cloudTM project (this is not integrated yet). I assume that the M/R tasks are built with this snapshot-isolation requirement from the storage - this is something that we should investigate as well.
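[Editor's sketch of the access pattern described in the assumptions above: splits derived from keySet(), values fetched one at a time with get(k). A TreeMap stands in for the Infinispan cache, and Hadoop's real InputFormat/RecordReader types are deliberately left out, so this only illustrates the idea, not an implementation.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation: partition the key set into "splits" (one per map
// task) and read each value lazily via get(k). No Hadoop or Infinispan
// classes are used; the point is that keySet() and get(k) are the only
// read operations the proposed integration would need.
public class SplitSketch {

    // Round-robin the key set into n splits.
    static <K> List<List<K>> splits(Set<K> keys, int n) {
        List<List<K>> result = new ArrayList<>();
        for (int i = 0; i < n; i++) result.add(new ArrayList<>());
        int i = 0;
        for (K key : keys) result.get(i++ % n).add(key);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new TreeMap<>(); // stand-in for the cache
        cache.put("k1", "v1");
        cache.put("k2", "v2");
        cache.put("k3", "v3");

        // Each split plays the role of a record reader: iterate its keys
        // and fetch the value with get(k).
        int records = 0;
        for (List<String> split : splits(cache.keySet(), 2)) {
            for (String key : split) {
                if (cache.get(key) != null) records++;
            }
        }
        System.out.println(records); // every entry visited exactly once
    }
}
```

Note the snapshot caveat from the email still applies: unlike HDFS, nothing here guarantees the key set is stable while the splits are being processed.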
It is possible that, in the first stages of this integration, we would require data stored in ISPN to be read only. [1] http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/InputFormat.html [2] http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/OutputFormat.html Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ales.justin at gmail.com Thu Mar 13 16:59:35 2014 From: ales.justin at gmail.com (Ales Justin) Date: Thu, 13 Mar 2014 13:59:35 -0700 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: References: Message-ID: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> > - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM?
-Ales From mmarkus at redhat.com Thu Mar 13 18:05:45 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 13 Mar 2014 22:05:45 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> Message-ID: <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> On Mar 13, 2014, at 20:59, Ales Justin wrote: >> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. > > Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Thu Mar 13 18:17:01 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 13 Mar 2014 22:17:01 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> Message-ID: On 13 March 2014 22:05, Mircea Markus wrote: > > On Mar 13, 2014, at 20:59, Ales Justin wrote: > >>> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. >> >> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? 
> > good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. ? Don't try confusing us again on that :-) I think we agreed that the job would *always* run in strict locality with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client would be connecting from somewhere else but that's unrelated. > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Thu Mar 13 18:19:14 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 13 Mar 2014 22:19:14 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> Message-ID: <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: > On 13 March 2014 22:05, Mircea Markus wrote: >> >> On Mar 13, 2014, at 20:59, Ales Justin wrote: >> >>>> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. >>> >>> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? >> >> good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. > > ? > Don't try confusing us again on that :-) > I think we agreed that the job would *always* run in strict locality > with the datacontainer (i.e. in the same JVM). 
Sure, an Hadoop client > would be connecting from somewhere else but that's unrelated. we did discuss the possibility of running it over hotrod though, do you see a problem with that? Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Thu Mar 13 19:39:56 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 13 Mar 2014 23:39:56 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> Message-ID: On 13 March 2014 22:19, Mircea Markus wrote: > > On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: > >> On 13 March 2014 22:05, Mircea Markus wrote: >>> >>> On Mar 13, 2014, at 20:59, Ales Justin wrote: >>> >>>>> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. >>>> >>>> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? >>> >>> good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. >> >> ? >> Don't try confusing us again on that :-) >> I think we agreed that the job would *always* run in strict locality >> with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client >> would be connecting from somewhere else but that's unrelated. > > we did discuss the possibility of running it over hotrod though, do you see a problem with that? No of course not, we discussed that. I just mean I think that needs to be clarified on the list that the Hadoop engine will always run in the same JVM. 
Clients (be it Hot Rod via new custom commands or Hadoop native clients, or Hadoop clients over Hot Rod) can indeed connect remotely, but it's important to clarify that the processing itself will take advantage of locality in all configurations. In other words, to clarify that the serialization cost you mention for clients is just to transfer the job definition and optionally the final processing result. Sanne > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Fri Mar 14 05:06:06 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Fri, 14 Mar 2014 09:06:06 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> Message-ID: <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> > On 13 mars 2014, at 23:39, Sanne Grinovero wrote: > >> On 13 March 2014 22:19, Mircea Markus wrote: >> >>> On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: >>> >>>> On 13 March 2014 22:05, Mircea Markus wrote: >>>> >>>> On Mar 13, 2014, at 20:59, Ales Justin wrote: >>>> >>>>>> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. >>>>> >>>>> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? >>>> >>>> good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. >>> >>> ? 
>>> Don't try confusing us again on that :-) >>> I think we agreed that the job would *always* run in strict locality >>> with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client >>> would be connecting from somewhere else but that's unrelated. >> >> we did discuss the possibility of running it over hotrod though, do you see a problem with that? > > No of course not, we discussed that. I just mean I think that needs to > be clarified on the list that the Hadoop engine will always run in the > same JVM. Clients (be it Hot Rod via new custom commands or Hadoop > native clients, or Hadoop clients over Hot Rod) can indeed connect > remotely, but it's important to clarify that the processing itself > will take advantage of locality in all configurations. In other words, > to clarify that the serialization cost you mention for clients is just > to transfer the job definition and optionally the final processing > result. > Not quite. The serialization cost Mircea mentions I think is between the Hadoop vm and the Infinispan vm on a single node. The serialization does not require network traffic but is still shuffling data between two processes basically. We could eliminate this by starting both Hadoop and Infinispan from the same VM but that requires more work than necessary for a prototype. So to clarify, we will have a cluster of nodes where each node contains two JVM, one running an Hadoop process, one running an Infinispan process. The Hadoop process would only read the data from the Infinispan process in the same node during a normal M/R execution. 
From galder at redhat.com Fri Mar 14 05:30:53 2014 From: galder at redhat.com (Galder Zamarreño) Date: Fri, 14 Mar 2014 10:30:53 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <8EEB7263-528A-4F4E-ADCB-7C350C2C23A7@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> <1A465602-8624-4018-A317-5A9D74CD82AD@redhat.com> <8EEB7263-528A-4F4E-ADCB-7C350C2C23A7@redhat.com> Message-ID: <9C194920-40D8-417D-9783-806FB4AB47C3@redhat.com> On 13 Mar 2014, at 10:48, Mircea Markus wrote: > > On Mar 12, 2014, at 17:45, Galder Zamarreño wrote: > >> >> On 05 Mar 2014, at 18:16, Mircea Markus wrote: >> >>> If anything, this model reduces type safety and reusability. E.g. say you want a M/R task to see how many Persons speak French. With the single cache model (both Dog and Person in the Cache) it would look something like: >>> >>> a) >>> //pseudocode >>> map (String k, Mammal value) { >>> if (value instanceof Person) { //this is the ugly part >>> if (((Person)value).speaks("French")) ... >>> } else { >>> //ignore it, it's a Dog >>> } >>> } >>> >>> Same thing written for a Cache: >>> >>> b) >>> map (String k, Person value) { >>> if (value.speaks("French")) ...
>>> } >>> >>> I don't think people would prefer writing a) instead of b) ;) >> >> It's a pity that, having discussed it in Mallorca and having referenced it a fair few times already, you could not think of an option based on what Paul suggested in https://issues.jboss.org/browse/ISPN-3640 > > Not sure what you want to imply with that, it was an oversight on my side :\ That's ok, happens to all. Sorry :) > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Fri Mar 14 07:34:37 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 14 Mar 2014 11:34:37 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> Message-ID: On 14 March 2014 09:06, Emmanuel Bernard wrote: > > >> On 13 mars 2014, at 23:39, Sanne Grinovero wrote: >> >>> On 13 March 2014 22:19, Mircea Markus wrote: >>> >>>> On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: >>>> >>>>> On 13 March 2014 22:05, Mircea Markus wrote: >>>>> >>>>> On Mar 13, 2014, at 20:59, Ales Justin wrote: >>>>> >>>>>>> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed.
>>>>>> >>>>>> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? >>>>> >>>>> good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. >>>> >>>> ? >>>> Don't try confusing us again on that :-) >>>> I think we agreed that the job would *always* run in strict locality >>>> with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client >>>> would be connecting from somewhere else but that's unrelated. >>> >>> we did discuss the possibility of running it over hotrod though, do you see a problem with that? >> >> No of course not, we discussed that. I just mean I think that needs to >> be clarified on the list that the Hadoop engine will always run in the >> same JVM. Clients (be it Hot Rod via new custom commands or Hadoop >> native clients, or Hadoop clients over Hot Rod) can indeed connect >> remotely, but it's important to clarify that the processing itself >> will take advantage of locality in all configurations. In other words, >> to clarify that the serialization cost you mention for clients is just >> to transfer the job definition and optionally the final processing >> result. >> > > Not quite. The serialization cost Mircea mentions I think is between the Hadoop vm and the Infinispan vm on a single node. The serialization does not require network traffic but is still shuffling data between two processes basically. We could eliminate this by starting both Hadoop and Infinispan from the same VM but that requires more work than necessary for a prototype. Ok so there was indeed confusion on terminology: I don't agree with that design. From an implementor's effort perspective, having to set up a Hot Rod client rather than embedding an Infinispan node is approximately the same work, or slightly more as you have to start both. Also, to test it, embedded mode is easier. Hot Rod is not meant to be used on the same node, especially not if you only want to access data in strict locality; for example it wouldn't be able to iterate on all keys of the current server node (and limit itself to those keys only). I might be wrong as I'm not too familiar with Hot Rod, but I think it might not even be able to iterate on keys at all; maybe today it can actually via some trick, but the point is this is a conceptual mismatch for it. Where you say this doesn't require network traffic, you need to consider that while it's true this might not be using the physical network wire, being localhost, it would still be transferred over a costly network stream, as we don't do off-heap buffer sharing yet. > So to clarify, we will have a cluster of nodes where each node contains two JVM, one running an Hadoop process, one running an Infinispan process. The Hadoop process would only read the data from the Infinispan process in the same node during a normal M/R execution. So we discussed two use cases: - engage Infinispan to accelerate an existing Hadoop deployment - engage Hadoop to run an Hadoop job on existing data in Infinispan In neither case do I see why I'd run them in separate JVMs: it seems less effective and more work to get done, with no benefit unless you're thinking about independent JVM tuning? That might be something to consider, but I doubt tuning independence would ever offset the cost of serialized transfer of each entry. The second use case could be used via Hot Rod too, but that's a different discussion, actually just a nice side effect of Hadoop being language agnostic that we would take advantage of.
Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Fri Mar 14 11:26:03 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 14 Mar 2014 15:26:03 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> Message-ID: <9FE5D6B7-B33D-4566-8DA6-A5A6C5DCEEF4@redhat.com> On Mar 14, 2014, at 9:06, Emmanuel Bernard wrote: > > >> On 13 mars 2014, at 23:39, Sanne Grinovero wrote: >> >>> On 13 March 2014 22:19, Mircea Markus wrote: >>> >>>> On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: >>>> >>>>> On 13 March 2014 22:05, Mircea Markus wrote: >>>>> >>>>> On Mar 13, 2014, at 20:59, Ales Justin wrote: >>>>> >>>>>>> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. >>>>>> >>>>>> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? >>>>> >>>>> good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. >>>> >>>> ? >>>> Don't try confusing us again on that :-) >>>> I think we agreed that the job would *always* run in strict locality >>>> with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client >>>> would be connecting from somewhere else but that's unrelated. 
>>> >>> we did discuss the possibility of running it over hotrod though, do you see a problem with that? >> >> No of course not, we discussed that. I just mean I think that needs to >> be clarified on the list that the Hadoop engine will always run in the >> same JVM. Clients (be it Hot Rod via new custom commands or Hadoop >> native clients, or Hadoop clients over Hot Rod) can indeed connect >> remotely, but it's important to clarify that the processing itself >> will take advantage of locality in all configurations. In other words, >> to clarify that the serialization cost you mention for clients is just >> to transfer the job definition and optionally the final processing >> result. >> > > Not quite. The serialization cost Mircea mentions I think is between the Hadoop vm and the Infinispan vm on a single node. The serialization does not require network traffic but is still shuffling data between two processes basically. We could eliminate this by starting both Hadoop and Infinispan from the same VM but that requires more work than necessary for a prototype. thanks for the clarification, indeed this is the serialization overhead I had in mind. > > So to clarify, we will have a cluster of nodes where each node contains two JVM, one running an Hadoop process, one running an Infinispan process. The Hadoop process would only read the data from the Infinispan process in the same node during a normal M/R execution. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Fri Mar 14 11:55:54 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 14 Mar 2014 15:55:54 +0000 Subject: [infinispan-dev] Deprecating parent dependencies & migrating to JUnit Message-ID: TLDR: I'm banning TestNG from the Query modules, and this will affect the parent poms. -- I needed to run some quick experiment rounds in Query, but *some* tests fail, and I simply want to know which ones to evaluate some approaches. 
But I can't get to know the full list easily, as failing a test in our current testing framework apparently implies you'll be leaking resources, and the suite slowly grinds to a slow death running out of resources. The problem with this is that when I need a quick evaluation which is blocking other people in other teams, embarking on such witch hunting is not an option; while I'm co-responsible for some of the embarrassing ugliness in our tests' resource handling, I primarily blame the tooling rather than ourselves: especially TestNG's ridiculously unreliable handling of shutdown hooks - and I have had enough of it. Compared to that, the resource handling of JUnit is extremely handy and reliable: it requires a bit more coding as it provides fewer features than TestNG, but it can be relied on. I can't really afford spending time to migrate all tests to JUnit today, but I will start doing so in smaller iterations. Now one problem is that TestNG, and several other dependencies, are included by default in the parent classpath. Having a set of default dependencies - even just in provided or test scope - forcefully provided to all modules has created some other trouble too; for example I really don't want the slf4j logger in my classpath, as it has hidden problems in the classpath structure of our own modules. ## -> Would you all be ok if I move the dependencies to the dependency management section? (You can all use TestNG in your own modules if you want, but please allow me to opt out) That implies each module will have to explicitly pick what it needs; I consider that a good thing as explicit is good, and getting utilities on the classpath out of the blue is confusing. For example, by moving just one dependency yesterday I could figure out that the migration to Netty4 was actually incomplete and some tests were still using Netty3. I consider that proof of a more healthy structure.
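[Editorial note: a minimal sketch of what Sanne's proposal could look like in the parent POM. The property name and scope shown are illustrative assumptions, not taken from the actual Infinispan parent.]

```xml
<!-- Parent POM: the version is managed once under dependencyManagement,
     but nothing lands on a module's classpath unless the module asks for it. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.testng</groupId>
      <artifactId>testng</artifactId>
      <version>${version.testng}</version> <!-- illustrative property name -->
      <scope>test</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

A module that wants TestNG then declares the dependency (without a version) in its own dependencies section; a module such as Query simply omits it and thereby opts out.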
Sanne From sanne at infinispan.org Fri Mar 14 13:00:31 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 14 Mar 2014 17:00:31 +0000 Subject: [infinispan-dev] Disabling unstable tests: push on master directly? Message-ID: I'm blocked as usual on some core tests failing. Since I've found evidence that it's not "just me", I'm using the new cute annotation to mark them as unstable. Would you all be OK with me pushing such findings - when I'm sure - directly to master, skipping the Pull Request process? My reasoning is that: - these are urgent as they block other work and raise FUD among other occasional contributors - you seem overwhelmed with PRs which deserve actual attention - worst case you can revert: these commits can't go unnoticed as there are JIRAs created to track these, so essentially it's not skipping peer review but postponing it to a more suitable point in time. Good idea? Cheers, Sanne From afield at redhat.com Fri Mar 14 14:22:04 2014 From: afield at redhat.com (Alan Field) Date: Fri, 14 Mar 2014 14:22:04 -0400 (EDT) Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <9FE5D6B7-B33D-4566-8DA6-A5A6C5DCEEF4@redhat.com> References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> <9FE5D6B7-B33D-4566-8DA6-A5A6C5DCEEF4@redhat.com> Message-ID: <455800889.18555633.1394821324912.JavaMail.zimbra@redhat.com> Hey, First off, I think integrating Infinispan with Hadoop using the InputFormat and OutputFormat interfaces is a really good idea, instead of using the file system interfaces. That's the approach that Gluster is using, but I don't think it's ideal for Infinispan. > So to clarify, we will have a cluster of nodes where each node contains two > JVM, one running an Hadoop process, one running an Infinispan process.
The > Hadoop process would only read the data from the Infinispan process in the > same node during a normal M/R execution. A regular non-master Hadoop node is running a DataNode daemon and a TaskTracker daemon in separate JVMs. Each Map/Reduce Task is also executed in a separate JVM. In a Hadoop 2.0, the TaskTracker daemon is replaced by a NodeManager daemon. This is the minimal number of JVM processes needed by a node, and there will be more if other services are running in the cluster. The first point I am trying to make is that there are many different JVM processes running on a Hadoop node. Then I am trying to understand which JVM process you are talking about when you say the "Hadoop process"? I *think* this is referring to the actual Map/Reduce Task, but I'm not sure. I would also say that trying to figure out how to integrate Infinispan with Apache Spark [1] would also be interesting. I don't know as much about it, but a lot of processing that is currently being performed in Map/Reduce will be migrated to Spark. It's certainly being billed as the replacement, or at least the future of Map/Reduce. Thanks, Alan [1] https://spark.apache.org/ ----- Original Message ----- > From: "Mircea Markus" > To: "infinispan -Dev List" > Sent: Friday, March 14, 2014 11:26:03 AM > Subject: Re: [infinispan-dev] Infinispan - Hadoop integration > > > On Mar 14, 2014, at 9:06, Emmanuel Bernard wrote: > > > > > > >> On 13 mars 2014, at 23:39, Sanne Grinovero wrote: > >> > >>> On 13 March 2014 22:19, Mircea Markus wrote: > >>> > >>>> On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: > >>>> > >>>>> On 13 March 2014 22:05, Mircea Markus wrote: > >>>>> > >>>>> On Mar 13, 2014, at 20:59, Ales Justin wrote: > >>>>> > >>>>>>> - also important to notice that we will have both an Hadoop and an > >>>>>>> Infinispan cluster running in parallel: the user will interact with > >>>>>>> the former in order to run M/R tasks. 
Hadoop will use Infinispan > >>>>>>> (integration achieved through InputFormat and OutputFormat ) in > >>>>>>> order to get the data to be processed. > >>>>>> > >>>>>> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as > >>>>>> well -- hence 1JVM? > >>>>> > >>>>> good point, ideally it should be a single VM: reduced serialization > >>>>> cost (in vm access) and simpler architecture. That's if you're not > >>>>> using C/S mode, of course. > >>>> > >>>> ? > >>>> Don't try confusing us again on that :-) > >>>> I think we agreed that the job would *always* run in strict locality > >>>> with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client > >>>> would be connecting from somewhere else but that's unrelated. > >>> > >>> we did discuss the possibility of running it over hotrod though, do you > >>> see a problem with that? > >> > >> No of course not, we discussed that. I just mean I think that needs to > >> be clarified on the list that the Hadoop engine will always run in the > >> same JVM. Clients (be it Hot Rod via new custom commands or Hadoop > >> native clients, or Hadoop clients over Hot Rod) can indeed connect > >> remotely, but it's important to clarify that the processing itself > >> will take advantage of locality in all configurations. In other words, > >> to clarify that the serialization cost you mention for clients is just > >> to transfer the job definition and optionally the final processing > >> result. > >> > > > > Not quite. The serialization cost Mircea mentions I think is between the > > Hadoop vm and the Infinispan vm on a single node. The serialization does > > not require network traffic but is still shuffling data between two > > processes basically. We could eliminate this by starting both Hadoop and > > Infinispan from the same VM but that requires more work than necessary for > > a prototype. > > thanks for the clarification, indeed this is the serialization overhead I had > in mind. 
> > > > > So to clarify, we will have a cluster of nodes where each node contains two > > JVM, one running an Hadoop process, one running an Infinispan process. The > > Hadoop process would only read the data from the Infinispan process in the > > same node during a normal M/R execution. > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From sanne at infinispan.org Fri Mar 14 14:53:12 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Fri, 14 Mar 2014 18:53:12 +0000 Subject: [infinispan-dev] =?utf-8?q?OpenJDK_and_HashMap_=E2=80=A6=2E_Safel?= =?utf-8?q?y_Teaching_an_Old_Dog_New_=28Off-Heap!=29_Tricks?= Message-ID: http://www.infoq.com/articles/Open-JDK-and-HashMap-Off-Heap Great Article! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140314/879e3208/attachment.html From ben.cotton at ALUMNI.RUTGERS.EDU Fri Mar 14 15:22:48 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 14 Mar 2014 12:22:48 -0700 (PDT) Subject: [infinispan-dev] =?utf-8?q?OpenJDK_and_HashMap_=E2=80=A6=2E_Safel?= =?utf-8?q?y_Teaching_an_Old_Dog_New_=28Off-Heap!=29_Tricks?= In-Reply-To: References: Message-ID: <1394824968708-4029052.post@n3.nabble.com> Thanks Sanne. Interesting how many "penguins" were involved: 1. openJDK (duke) 2. Linux (mascot) 3. Infinispan 5.3 (T.N.P) ;) -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-OpenJDK-and-HashMap-Safely-Teaching-an-Old-Dog-New-Off-Heap-Tricks-tp4029051p4029052.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
From ben.cotton at ALUMNI.RUTGERS.EDU Fri Mar 14 22:11:09 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 14 Mar 2014 19:11:09 -0700 (PDT) Subject: [infinispan-dev] Learning ISPN7 DataContainer internals ... first steps Message-ID: <1394849469300-4029053.post@n3.nabble.com> Hi Mircea and RedHat, Dmitry and I are now taking initial steps to code the integration of OpenHFT SHM as an off-heap ISPN7 DataContainer. We have mused that a possible approach to this may be to use the symmetry of ISPN7's existing DefaultDataContainer.java impl and (copy a/o extend) that existing work into a new org.infinispan.offheap.OffHeapDefaultDataContainer.java impl. The key steps would be for us to 100% soundly and 100% completely replace ConcurrentMap<Object, InternalCacheEntry> entries = CollectionFactory.makeConcurrentParallelMap(128, concurrencyLevel); with ConcurrentMap<Object, InternalCacheEntry> entries = new net.openhft.collections.SharedHashMapBuilder() .generatedValueType(Boolean.TRUE) .entrySize(512) .create( new File("/dev/shm/offHeapSharedHashMap.DataContainer"), Object.class, InternalCacheEntry.class ); We are of course very new to ISPN7 DataContainer internals. Before we get into building and testing compelling exercises and hardening the OffHeapDefaultDataContainer, would you please comment on whether these seem like the correct first steps? https://github.com/Cotton-Ben/infinispan/blob/master/off-heap/src/main/java/org/infinispan/offheap/OffHeapDefaultDataContainer.java -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Learning-ISPN7-DataContainer-internals-first-steps-tp4029053.html Sent from the Infinispan Developer List mailing list archive at Nabble.com.
From bibryam at gmail.com Sun Mar 16 19:57:42 2014 From: bibryam at gmail.com (Bilgin Ibryam) Date: Sun, 16 Mar 2014 23:57:42 +0000 Subject: [infinispan-dev] Infinispan Query API module Message-ID: Hi all, I was working on extending the camel-infinispan component with remote query capability and just realized that org.infinispan/infinispan-query/6.0.1.Final depends on hibernate-hql-parser and hibernate-hql-lucene which are still in Alpha. Am I missing something, or is there a way to not depend on the alpha versions of these artifacts from a final version artifact? Thanks, -- Bilgin Ibryam Apache Camel & Apache OFBiz committer Blog: ofbizian.com Twitter: @bibryam Author of Instant Apache Camel Message Routing http://www.amazon.com/dp/1783283475 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140316/e1414e8d/attachment.html From anistor at redhat.com Mon Mar 17 05:42:33 2014 From: anistor at redhat.com (Adrian Nistor) Date: Mon, 17 Mar 2014 11:42:33 +0200 Subject: [infinispan-dev] Infinispan Query API module In-Reply-To: References: Message-ID: <5326C389.4030006@redhat.com> Hi, hibernate-hql-parser and hibernate-hql-lucene do not have a Final release yet, so at this moment it is not possible to avoid the alpha dependency. Cheers On 03/17/2014 01:57 AM, Bilgin Ibryam wrote: > Hi all, > > I was working on extending the camel-infinispan component with remote > query capability and just realized that > org.infinispan/infinispan-query/6.0.1.Final depends on > hibernate-hql-parser and hibernate-hql-lucene which are still in Alpha. > > Am I missing something, or is there a way to not depend on the alpha versions > of these artifacts from a final version artifact?
> > Thanks, > > -- > Bilgin Ibryam > > Apache Camel & Apache OFBiz committer > Blog: ofbizian.com > Twitter: @bibryam > > Author of Instant Apache Camel Message Routing > http://www.amazon.com/dp/1783283475 > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140317/d770fd3b/attachment-0001.html From ttarrant at redhat.com Mon Mar 17 06:16:42 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 17 Mar 2014 11:16:42 +0100 Subject: [infinispan-dev] Learning ISPN7 DataContainer internals ... first steps In-Reply-To: <1394849469300-4029053.post@n3.nabble.com> References: <1394849469300-4029053.post@n3.nabble.com> Message-ID: <5326CB8A.3060404@redhat.com> On 15/03/2014 03:11, cotton-ben wrote: > We are of course very newbie wrt to ISPN7 DataContainer internals. Before > we get into building and testing compelling exercises and hardening the > OffHeapDefaultDataContainer, would you please comment wrt to your view that > this seems like the correct first steps? > > https://github.com/Cotton-Ben/infinispan/blob/master/off-heap/src/main/java/org/infinispan/offheap/OffHeapDefaultDataContainer.java > Ben, make sure that all possible user-configurable parameters are extracted appropriately. What happens if I start multiple caches on the same /dev/shm/offHeapSharedHashMap.DataContainer ? Do they share the memory ? If this is not possible, I suggest concatenating the name of the cache to the name of the file. 
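[Editorial note: Tristan's per-cache-file suggestion could look roughly like the following. This is a sketch only — it is not runnable without the OpenHFT collections jar, it reuses the builder calls from Ben's snippet above, and the base path and naming scheme are illustrative assumptions.]

```java
import java.io.File;
import java.io.IOException;

import net.openhft.collections.SharedHashMap;
import net.openhft.collections.SharedHashMapBuilder;

public class PerCacheSharedMaps {

    // Illustrative base path; one file per cache name.
    private static final String BASE_PATH = "/dev/shm/offHeapSharedHashMap.";

    /**
     * One backing file per cache name, so two caches started on the same box
     * never accidentally share the same off-heap region.
     */
    static SharedHashMap<Object, Object> forCache(String cacheName) throws IOException {
        return new SharedHashMapBuilder()
                .entrySize(512) // same sizing as in Ben's snippet
                .create(new File(BASE_PATH + cacheName), // e.g. ...offHeapSharedHashMap.cacheA
                        Object.class, Object.class);
    }
}
```

If two caches deliberately want to share memory, they can be given the same name argument; otherwise the distinct files keep their regions isolated.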
Tristan From guillaume.terral at gmail.com Mon Mar 17 06:45:36 2014 From: guillaume.terral at gmail.com (Guillaume terral) Date: Mon, 17 Mar 2014 11:45:36 +0100 Subject: [infinispan-dev] Fwd: Infinispan configuration Distributed Node with distributed index and HotRod usage In-Reply-To: References: Message-ID: Hello Infinispan Dev Team, After several days of testing and research with Sanne Grinovero, he asked me to send an email to this mailing list to explain my issue with Infinispan. I'm actually working on a project to build an Infinispan cluster with the following requirements: - Clustered nodes in Distributed mode - Usage of HotRod to send and query data from the cluster - Low latency for read and write - Distributed Lucene index - JGroups back-end I posted the configuration I have for my cluster and for the Hot Rod client on GitHub under: https://github.com/guillaumeterral/infinispan-test You will see a little readme explaining how to use this Java project. I faced some issues during development. What I noticed is: if I remove the line * .transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL)*, I have good latencies shown in the statistics MBean for averageWritingTime, but I get an unstable cluster with the following issues: - lock issue: one node locks the index and tries to write to it while, at the same time, another node is trying to lock the index; this behavior leads to a lock timeout. - send message timeout: some nodes time out sending messages to others; I was not able to identify the root cause. - compound file not found: this appears only if I use compound files (the default behavior); I think these errors are due to the previous ones. But with a non-transactional cluster I get a pretty stable cluster, though with really bad latencies (~1.5s) shown in the cluster and in the GUI client I wrote; this client is not present on GitHub, but I can push a console client on demand if you want. We tried to investigate the issue with Sanne, but over IRC it's not that easy.
Could you provide me some help/advice? We are really motivated to use Infinispan and to give you some feedback on it, especially the HotRod client/server component. We plan to test Infinispan 7.0 and provide you some feedback once we complete our requirements on the Infinispan 6.0 cluster. You can reach me at this mail address at any time, or contact me on IRC (#infinispan); my nickname is guillaume__ , I'm almost always connected. Thanks in advance, Guillaume -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140317/3761ae96/attachment.html From jonathan.halliday at redhat.com Mon Mar 17 07:00:32 2014 From: jonathan.halliday at redhat.com (Jonathan Halliday) Date: Mon, 17 Mar 2014 11:00:32 +0000 Subject: [infinispan-dev] event processing integration Message-ID: <5326D5D0.4010900@redhat.com> Alongside recent talk of integrating Infinispan with Hadoop batch processing, there has been some discussion of using the data grid alongside an event stream processing system. There are several directions we could consider here. In approximate order of increasing complexity these are: - Allow bi-directional flow of events, such that listeners on the cache can be used to cause events in the processing engine, or events in the processing engine can update the cache. - Allow the cache to be used to hold lookup data for reference from user code running in the processing engine, to speed up joining streamed events to what would otherwise be data tables on disk. - Integrate with the processing engine itself, such that Infinispan can be used to store items that would otherwise occupy precious RAM. This one is probably only viable with the cooperation of the stream processing system, so I'll base further discussion on Drools Fusion. The engine uses memory for a) rules, i.e. processing logic. Some of this is infrequently accessed.
Think of a decision tree in which some branches are traversed more than others. So, opportunities to swap bits out to cache perhaps. b) state, particularly sliding windows. Again some data is infrequently accessed. For many sliding window calculations in particular (e.g. running average), only the head and tail of the window are actually used. The events in-between can be swapped out. Of course these integrations require the stream processing engine to be written to support such operations - careful handling of object references is needed. Currently the engine doesn't work that way - everything is focussed on speed at the expense of memory. - Borrow some ideas from the event processing DSLs, such that the data grid query engine can independently support continuous (standing) queries rather than just one-off queries. Arguably this is reinventing the wheel, but for simple use cases it may be preferable to run the stream processing logic directly in the grid rather than deploying a dedicated event stream processing system. I think it's probably going to require supporting lists as a first class construct alongside maps though. There are various cludges possible here, including the brute force approach of faking continuous query by re-executing a one-off query on each mutation, but they tend to be inefficient. There is also the thorny problem of supporting a (potentially distributed) clock, since a lot of use cases need to reference the passage of time in the query e.g. 'send event to listener if avg in last N minutes > x'. Jonathan Halliday Core developer, JBoss. -- Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham (USA), Paul Hickey (Ireland), Matt Parson (USA), Charlie Peters (USA) From ben.cotton at ALUMNI.RUTGERS.EDU Mon Mar 17 07:23:58 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Mon, 17 Mar 2014 04:23:58 -0700 (PDT) Subject: [infinispan-dev] Learning ISPN7 DataContainer internals ... 
first steps In-Reply-To: <5326CB8A.3060404@redhat.com> References: <1394849469300-4029053.post@n3.nabble.com> <5326CB8A.3060404@redhat.com> Message-ID: <1395055438949-4029059.post@n3.nabble.com> Many thanks for this counsel, Tristan. Dmitry and I are taking our first "baby steps" into learning the DataContainer internals. We will exercise this consideration that you mention - and exercise it explicitly - to ensure our adaptation is accommodating. In general, our approach to adapting this off-heap capability in ISPN7 will be to play with some initial tests - that exercise as many considerations as possible - and then react to those test results accordingly). Such an approach, coupled with the availability of this forum's expertise, we think may be effective. Excited, thanks again, Ben and Dmitry -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Learning-ISPN7-DataContainer-internals-first-steps-tp4029053p4029059.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From ben.cotton at ALUMNI.RUTGERS.EDU Mon Mar 17 08:14:54 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Mon, 17 Mar 2014 05:14:54 -0700 (PDT) Subject: [infinispan-dev] Learning ISPN7 DataContainer internals ... first steps In-Reply-To: <1395055438949-4029059.post@n3.nabble.com> References: <1394849469300-4029053.post@n3.nabble.com> <5326CB8A.3060404@redhat.com> <1395055438949-4029059.post@n3.nabble.com> Message-ID: <1395058494270-4029060.post@n3.nabble.com> Tristan, Does the ISPN7 API (or config) have a FluentBuilder mechanism via which Cache instance A can be bound to DataContainer A and Cache instance B can be bound to DataContainer B? Thx, Ben & Dmitry -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Learning-ISPN7-DataContainer-internals-first-steps-tp4029053p4029060.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
From ttarrant at redhat.com Mon Mar 17 08:30:49 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 17 Mar 2014 13:30:49 +0100 Subject: [infinispan-dev] Learning ISPN7 DataContainer internals ... first steps In-Reply-To: <1395058494270-4029060.post@n3.nabble.com> References: <1394849469300-4029053.post@n3.nabble.com> <5326CB8A.3060404@redhat.com> <1395055438949-4029059.post@n3.nabble.com> <1395058494270-4029060.post@n3.nabble.com> Message-ID: <5326EAF9.9020408@redhat.com> On 17/03/2014 13:14, cotton-ben wrote: > Tristan, Does the ISPN7 API (or config) have a FluentBuilder mechanism via > which Cache instance A can be bound to DataContainer A and Cache instance B > can be bound to DataContainer B? Thx, Ben & Dmitry you would do ConfigurationBuilder a = new ConfigurationBuilder(); a.dataContainer().dataContainer(new OffHeapContainer()); ConfigurationBuilder b = new ConfigurationBuilder(); b.dataContainer().dataContainer(new DefaultDataContainer()); Tristan From mudokonman at gmail.com Mon Mar 17 09:30:00 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 17 Mar 2014 09:30:00 -0400 Subject: [infinispan-dev] New API to iterate over current entries in cache Message-ID: While working on ISPN-4068 to add the current state to listeners that were added I found that what I essentially needed was a way to iterate over the entries of the cache. I am thinking of adding this to the public API available on the AdvancedCache interface. I wanted to get you guys' opinions in case you don't think we should add it, or on any changes you might suggest. My thought was to add 2 overloaded methods: <C> Iterator<CacheEntry<K, C>> entryIterator(KeyValueFilter<? super K, ? super V> filter, Converter<? super K, ? super V, C> converter); and Iterator<CacheEntry<K, V>> entryIterator(KeyValueFilter<? super K, ? super V> filter); The method would return almost immediately after invocation and the iterator would queue entries and block as entries are required to be returned.
The filter and converter are applied on each of the remote nodes and are required to be serializable or have an externalizer registered. Internally the iterator would use chunking to help prevent memory saturation. The max memory usage would be (chunkSize * N) + local entries where N is the number of nodes. These methods would be different than other methods on the Cache/AdvancedCache in the following things: 1. This operation is treated as nontx and thus won't store them into the context and thus repeatable read semantics would not be guaranteed. This doesn't preclude manually adding values to the context. Also prior writes in the current context would be ignored (current data returned), although this could be changed if desired. 2. Values are not activated from loaders and visited listeners would not be notified of access. The latter could be sensibly changed if desired. - Will From rvansa at redhat.com Mon Mar 17 10:45:55 2014 From: rvansa at redhat.com (Radim Vansa) Date: Mon, 17 Mar 2014 15:45:55 +0100 Subject: [infinispan-dev] New API to iterate over current entries in cache In-Reply-To: References: Message-ID: <53270AA3.30702@redhat.com> Why listeners are not invoked? JCache iterator() notifies the listeners. Can the iterator remove entries? I assume there's no ordering guarantee, but behaviour under concurrent changes would be rather tricky. I don't like the idea of iterating over changing structure. If you need to iterate through all entries, I'd rather introduce the snapshot ability and then iterate over the snapshot. Then, you wouldn't have to mess with tx and introduce non-tx operation on tx cache. My 2c Radim On 03/17/2014 02:30 PM, William Burns wrote: > While working on ISPN-4068 to add the current state to listeners that > were added I found that what I essentially needed was a way to iterate > over the entries of the cache. I am thinking of adding this to the > public API available on the AdvancedCache interface. 
> > I wanted to get your guy's opinions if you don't think we should add > it or any changes you might suggest. > > My thought was to add 2 overloaded methods: > > Iterator> entryIterator(KeyValueFilter ? super V> filter, Converter converter); > > and > > Iterator> entryIterator(KeyValueFilter super V> filter); > > The method would return almost immediately after invocation and the > iterator would queue entries and block as entries are required to be > returned. The filter and converter are applied on each of the remote > nodes and are required to be serializable or have an externalizer > registered. > > Internally the iterator would use chunking to help prevent memory > saturation. The max memory usage would be (chunkSize * N) + local > entries where N is the number of nodes. > > These methods would be different than other methods on the > Cache/AdvancedCache in the following things: > > 1. This operation is treated as nontx and thus won't store them into > the context and thus repeatable read semantics would not be > guaranteed. This doesn't preclude manually adding values to the > context. Also prior writes in the current context would be ignored > (current data returned), although this could be changed if desired. > 2. Values are not activated from loaders and visited listeners would > not be notified of access. The latter could be sensibly changed if > desired. > > - Will > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From vblagoje at redhat.com Mon Mar 17 10:58:55 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Mon, 17 Mar 2014 10:58:55 -0400 Subject: [infinispan-dev] Intermediate cache in M/R API Message-ID: <53270DAF.6090702@redhat.com> Guys, We need some input on how to design API regarding use of intermediate caches [1]. 
As you might know, one of the requirements for improving our M/R is allowing applications to use a custom-defined intermediate key/value cache to store the keys/values of the map/combine phase before they are reduced in the reduce phase. Currently we have a constructor where one can specify whether to use a shared or a per-task intermediate cache. And now we wanted to add an additional method: usingIntermediateCache(String cacheName, String cacheConfigurationName); that will enable use of a custom intermediate cache. Now, Dan, and rightly so, thought this was a bit confusing: are we referring to the shared or the per-task intermediate cache when using the above-mentioned method? His proposal is to use a per-task intermediate cache with our default intermediate cache configuration, remove the constructor parameter in MapReduceTask regarding shared or non-shared caches, and add configuration methods for both caches: usingIntermediateCache(String configName) - use a per-task intermediate cache with the given configuration usingSharedIntermediateCache(String cache) - use a shared cache with our default configuration usingSharedIntermediateCache(String cache, String configName) - use a shared cache with the given configuration Note that we need a name for the shared cache because we want to enable applications to easily remove/inspect that cache after all M/R tasks sharing that intermediate cache have been executed. What are your thoughts here? Vladimir [1] https://issues.jboss.org/browse/ISPN-4021 From mudokonman at gmail.com Mon Mar 17 11:02:55 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 17 Mar 2014 11:02:55 -0400 Subject: [infinispan-dev] New API to iterate over current entries in cache In-Reply-To: <53270AA3.30702@redhat.com> References: <53270AA3.30702@redhat.com> Message-ID: On Mon, Mar 17, 2014 at 10:45 AM, Radim Vansa wrote: > Why listeners are not invoked? JCache iterator() notifies the listeners. Like I mentioned this can be changed.
However, I have not seen a cache entry visitor listener in JCache. The only listeners I am aware of are for created, removed, expired and updated which wouldn't be affected by this. > > Can the iterator remove entries? Sorry I forgot to mention this, but no it would not be able to remove with what I was planning. This could be added easily, however this would be problematic for when using repeatable read. Would the remove take part of the transaction and if so it causes an inconsistency since we aren't storing all the values that were read. I figured it would be easier to just not support it and if the user wants they can simply call cache.remove(key) which this would do anyways. > > I assume there's no ordering guarantee, but behaviour under concurrent > changes would be rather tricky. I don't like the idea of iterating over > changing structure. There are no ordering of elements. This would have guarantees more similar to the ConcurrentMap entrySet iteration in that you are guaranteed to see a value that was valid for the given key that may or may not be the most up to date. This would work similarly to how our Cache works now currently though. > > If you need to iterate through all entries, I'd rather introduce the > snapshot ability and then iterate over the snapshot. Then, you wouldn't > have to mess with tx and introduce non-tx operation on tx cache. That would definitely be helpful, but we don't have support for snapshot atm. The way this is implemented currently though it would be pretty simple to add an option for this though, as all you would need to do is pass along the version of the snapshot in the request command. The reason I was saying not to support this for tx right now, is because of repeatable read, there is no way we can hold all the values of the cache in the current context. 
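[Editorial note: to make the chunked-iterator idea concrete, here is a self-contained sketch in plain JDK types — no Infinispan classes; the chunk size, the single-chunk queue bound, and the in-memory map standing in for the per-node data are all illustrative. A background drain fills a bounded queue chunk by chunk, so the consuming iterator blocks for the next chunk instead of materializing everything at once.]

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChunkedIteratorSketch {

    /** Iterator fed in bounded chunks by a background drain of 'source'. */
    static <K, V> Iterator<Map.Entry<K, V>> entryIterator(Map<K, V> source, int chunkSize) {
        // Capacity 1: at most one chunk buffered, bounding memory to roughly chunkSize entries.
        BlockingQueue<List<Map.Entry<K, V>>> queue = new ArrayBlockingQueue<>(1);
        List<Map.Entry<K, V>> poison = Collections.emptyList(); // end-of-stream marker

        Thread producer = new Thread(() -> {
            try {
                List<Map.Entry<K, V>> chunk = new ArrayList<>(chunkSize);
                for (Map.Entry<K, V> e : source.entrySet()) {
                    chunk.add(new AbstractMap.SimpleImmutableEntry<>(e));
                    if (chunk.size() == chunkSize) {
                        queue.put(chunk); // blocks until the consumer catches up
                        chunk = new ArrayList<>(chunkSize);
                    }
                }
                if (!chunk.isEmpty()) {
                    queue.put(chunk);
                }
                queue.put(poison);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        return new Iterator<Map.Entry<K, V>>() {
            private Iterator<Map.Entry<K, V>> current = Collections.emptyIterator();
            private boolean done;

            public boolean hasNext() {
                while (!done && !current.hasNext()) {
                    try {
                        List<Map.Entry<K, V>> next = queue.take(); // blocks for next chunk
                        if (next.isEmpty()) { // real chunks are never empty
                            done = true;
                        } else {
                            current = next.iterator();
                        }
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
                return current.hasNext();
            }

            public Map.Entry<K, V> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

The real implementation would also apply the filter/converter on each owner node before shipping a chunk; the sketch only shows the flow-control side that bounds memory to about (chunkSize * N).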
> > My 2c
> >
> > Radim
> >
> > On 03/17/2014 02:30 PM, William Burns wrote:
>> While working on ISPN-4068 to add the current state to listeners that were added I found that what I essentially needed was a way to iterate over the entries of the cache. I am thinking of adding this to the public API available on the AdvancedCache interface.
>>
>> I wanted to get your guy's opinions if you don't think we should add it or any changes you might suggest.
>>
>> My thought was to add 2 overloaded methods:
>>
>> <C> Iterator<CacheEntry<K, C>> entryIterator(KeyValueFilter<? super K, ? super V> filter, Converter<? super K, ? super V, C> converter);
>>
>> and
>>
>> Iterator<CacheEntry<K, V>> entryIterator(KeyValueFilter<? super K, ? super V> filter);
>>
>> The method would return almost immediately after invocation and the iterator would queue entries and block as entries are required to be returned. The filter and converter are applied on each of the remote nodes and are required to be serializable or have an externalizer registered.
>>
>> Internally the iterator would use chunking to help prevent memory saturation. The max memory usage would be (chunkSize * N) + local entries, where N is the number of nodes.
>>
>> These methods would be different than other methods on the Cache/AdvancedCache in the following things:
>>
>> 1. This operation is treated as nontx and thus won't store them into the context and thus repeatable read semantics would not be guaranteed. This doesn't preclude manually adding values to the context. Also prior writes in the current context would be ignored (current data returned), although this could be changed if desired.
>> 2. Values are not activated from loaders and visited listeners would not be notified of access. The latter could be sensibly changed if desired.
>> >> - Will >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Mon Mar 17 11:31:34 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 17 Mar 2014 16:31:34 +0100 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: References: <2D07BCCF-1230-4EF2-BBCC-17F67B203B86@gmail.com> <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> Message-ID: <31AE8FF0-5ED0-426F-9112-24509AA14CA0@hibernate.org> Got it now. That being said, if Alan is correct (one JVM per M/R task run per node), we will need to implement C/S local key and keyset lookup. Emmanuel On 14 Mar 2014, at 12:34, Sanne Grinovero wrote: > On 14 March 2014 09:06, Emmanuel Bernard wrote: >> >> >>> On 13 mars 2014, at 23:39, Sanne Grinovero wrote: >>> >>>> On 13 March 2014 22:19, Mircea Markus wrote: >>>> >>>>> On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: >>>>> >>>>>> On 13 March 2014 22:05, Mircea Markus wrote: >>>>>> >>>>>> On Mar 13, 2014, at 20:59, Ales Justin wrote: >>>>>> >>>>>>>> - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. >>>>>>> >>>>>>> Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? 
>>>>>> >>>>>> good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. >>>>> >>>>> ? >>>>> Don't try confusing us again on that :-) >>>>> I think we agreed that the job would *always* run in strict locality >>>>> with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client >>>>> would be connecting from somewhere else but that's unrelated. >>>> >>>> we did discuss the possibility of running it over hotrod though, do you see a problem with that? >>> >>> No of course not, we discussed that. I just mean I think that needs to >>> be clarified on the list that the Hadoop engine will always run in the >>> same JVM. Clients (be it Hot Rod via new custom commands or Hadoop >>> native clients, or Hadoop clients over Hot Rod) can indeed connect >>> remotely, but it's important to clarify that the processing itself >>> will take advantage of locality in all configurations. In other words, >>> to clarify that the serialization cost you mention for clients is just >>> to transfer the job definition and optionally the final processing >>> result. >>> >> >> Not quite. The serialization cost Mircea mentions I think is between the Hadoop vm and the Infinispan vm on a single node. The serialization does not require network traffic but is still shuffling data between two processes basically. We could eliminate this by starting both Hadoop and Infinispan from the same VM but that requires more work than necessary for a prototype. > > Ok so there was indeed confusion on terminology: I don't agree with that design. >> From an implementor's effort perspective having to setup an Hot Rod > client rather than embedding an Infinispan node is approximately the > same work, or slightly more as you have to start both. Also to test > it, embedded mode it easier. 
> Hot Rod is not meant to be used on the same node, especially not if you only want to access data in strict locality; for example, it wouldn't be able to iterate over all keys of the current server node (limiting itself to those keys only). I might be wrong as I'm not too familiar with Hot Rod, but I think it might not even be able to iterate over keys at all; maybe today it can via some trick, but the point is that this is a conceptual mismatch for it.
>
> Where you say this doesn't require network traffic, you need to consider that while it's true this might not be using the physical network wire, being localhost, it would still be transferred over a costly network stream, as we don't do off-heap buffer sharing yet.
>
>> So to clarify, we will have a cluster of nodes where each node contains two JVMs, one running a Hadoop process, one running an Infinispan process. The Hadoop process would only read the data from the Infinispan process in the same node during a normal M/R execution.
>
> So we discussed two use cases:
> - engage Infinispan to accelerate an existing Hadoop deployment
> - engage Hadoop to run a Hadoop job on existing data in Infinispan
> In neither case do I see why I'd run them in separate JVMs: it seems less effective and more work to get done, with no benefit unless you're thinking about independent JVM tuning? That might be something to consider, but I doubt tuning independence would ever offset the cost of serialized transfer of each entry.
>
> The second use case could be served via Hot Rod too, but that's a different discussion, actually just a nice side effect of Hadoop being language agnostic that we would take advantage of.
> > Sanne > >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140317/45e4560e/attachment.html From afield at redhat.com Mon Mar 17 12:06:40 2014 From: afield at redhat.com (Alan Field) Date: Mon, 17 Mar 2014 12:06:40 -0400 (EDT) Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <31AE8FF0-5ED0-426F-9112-24509AA14CA0@hibernate.org> References: <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> <31AE8FF0-5ED0-426F-9112-24509AA14CA0@hibernate.org> Message-ID: <913344291.565084.1395072400107.JavaMail.zimbra@redhat.com> For Map/Reduce v1 this is definitely the case: https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Task+Execution+%26+Environment " The TaskTracker executes the Mapper / Reducer task as a child process in a separate jvm. " I believe this is also the case for Map/Reduce v2, but I haven't found a definitive reference in the docs yet. YARN is architected to split resource management and job scheduling/monitoring into different pieces, but I think task execution is the same as MRv1. Thanks, Alan ----- Original Message ----- > From: "Emmanuel Bernard" > To: "infinispan -Dev List" > Sent: Monday, March 17, 2014 11:31:34 AM > Subject: Re: [infinispan-dev] Infinispan - Hadoop integration > Got it now. > That being said, if Alan is correct (one JVM per M/R task run per node), we > will need to implement C/S local key and keyset lookup. 
> > > The second use case could be used via Hot Rod too, but that's a > > > different discussion, actually just a nice side effect of Hadoop being > > > language agnostic that we would take advantage of. > > > Sanne > > > > _______________________________________________ > > > > > > infinispan-dev mailing list > > > > > > infinispan-dev at lists.jboss.org > > > > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > _______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140317/9e280355/attachment-0001.html From gustavonalle at gmail.com Mon Mar 17 12:49:17 2014 From: gustavonalle at gmail.com (Gustavo Fernandes) Date: Mon, 17 Mar 2014 16:49:17 +0000 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: <913344291.565084.1395072400107.JavaMail.zimbra@redhat.com> References: <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> <31AE8FF0-5ED0-426F-9112-24509AA14CA0@hibernate.org> <913344291.565084.1395072400107.JavaMail.zimbra@redhat.com> Message-ID: Yes, the M/R v1 by default launches one *new* JVM per task, so during the execution of a certain Job, at a given moment in a node there could be dozens of JVMs running in parallel, that will be destroyed when the task (map or reduce) finishes. 
It is possible to instruct the MapReduce system to reuse the same JVM for several map or reduce tasks: this is interesting when map tasks execute in a matter of seconds and the overhead of creating, warming up and destroying a JVM becomes significant. But even in this case, there will be 'n' JVMs running, where 'n' is the task capacity of the node. The difference is that they are recycled.

In YARN the behaviour is similar: the YarnChild runs in a separate JVM, and it's possible to get some reuse by setting the property "mapreduce.job.ubertask.enable".

Apart from all those transient task JVMs, there will be more long-running JVMs on each node, such as the TaskTracker (which accepts tasks and sends status to the global JobTracker), and if HDFS is used, there will be one extra JVM per node (DataNode) plus one or two global NameNode processes.

Hadoop is very fond of JVMs.

Cheers,
Gustavo

On Mon, Mar 17, 2014 at 4:06 PM, Alan Field wrote:

> For Map/Reduce v1 this is definitely the case:
>
> https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Task+Execution+%26+Environment
>
> "The TaskTracker executes the Mapper/Reducer *task* as a child process in a separate jvm."
>
> I believe this is also the case for Map/Reduce v2, but I haven't found a definitive reference in the docs yet. YARN is architected to split resource management and job scheduling/monitoring into different pieces, but I think task execution is the same as MRv1.
>
> Thanks,
> Alan
>
> ------------------------------
>
> *From: *"Emmanuel Bernard"
> *To: *"infinispan -Dev List"
> *Sent: *Monday, March 17, 2014 11:31:34 AM
> *Subject: *Re: [infinispan-dev] Infinispan - Hadoop integration
>
> Got it now.
> That being said, if Alan is correct (one JVM per M/R task run per node), we will need to implement C/S local key and keyset lookup.
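The two knobs described above can be sketched in mapred-site.xml. This is an illustration with example values, not a recommended configuration: mapred.job.reuse.jvm.num.tasks is the classic MRv1 JVM-reuse setting, while the ubertask flag applies to YARN/MRv2 small jobs.

```xml
<!-- MRv1: reuse one child JVM for many tasks of the same job.
     Default is 1 (a fresh JVM per task); -1 means no limit on reuse. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>

<!-- YARN/MRv2: run a sufficiently small job's tasks inside the
     ApplicationMaster's JVM instead of spawning new containers. -->
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
```

Either way, the long-running daemon JVMs (TaskTracker/NodeManager, DataNode, NameNode) remain; these settings only reduce the churn of transient task JVMs.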
> > Emmanuel > > On 14 Mar 2014, at 12:34, Sanne Grinovero wrote: > > On 14 March 2014 09:06, Emmanuel Bernard wrote: > > > > On 13 mars 2014, at 23:39, Sanne Grinovero wrote: > > On 13 March 2014 22:19, Mircea Markus wrote: > > On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: > > On 13 March 2014 22:05, Mircea Markus wrote: > > On Mar 13, 2014, at 20:59, Ales Justin wrote: > > - also important to notice that we will have both an Hadoop and an > Infinispan cluster running in parallel: the user will interact with the > former in order to run M/R tasks. Hadoop will use Infinispan (integration > achieved through InputFormat and OutputFormat ) in order to get the data to > be processed. > > > Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well > -- hence 1JVM? > > > good point, ideally it should be a single VM: reduced serialization cost > (in vm access) and simpler architecture. That's if you're not using C/S > mode, of course. > > > ? > Don't try confusing us again on that :-) > I think we agreed that the job would *always* run in strict locality > with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client > would be connecting from somewhere else but that's unrelated. > > > we did discuss the possibility of running it over hotrod though, do you > see a problem with that? > > > No of course not, we discussed that. I just mean I think that needs to > be clarified on the list that the Hadoop engine will always run in the > same JVM. Clients (be it Hot Rod via new custom commands or Hadoop > native clients, or Hadoop clients over Hot Rod) can indeed connect > remotely, but it's important to clarify that the processing itself > will take advantage of locality in all configurations. In other words, > to clarify that the serialization cost you mention for clients is just > to transfer the job definition and optionally the final processing > result. > > > Not quite. 
The serialization cost Mircea mentions I think is between the > Hadoop vm and the Infinispan vm on a single node. The serialization does > not require network traffic but is still shuffling data between two > processes basically. We could eliminate this by starting both Hadoop and > Infinispan from the same VM but that requires more work than necessary for > a prototype. > > > Ok so there was indeed confusion on terminology: I don't agree with that > design. > > From an implementor's effort perspective having to setup an Hot Rod > > client rather than embedding an Infinispan node is approximately the > same work, or slightly more as you have to start both. Also to test > it, embedded mode it easier. > > Hot Rod is not meant to be used on the same node, especially not if > you only want to access data in strict locality; for example it > wouldn't be able to iterated on all keys of the current server node > (and limiting to those keys only). I might be wrong as I'm not too > familiar with Hot Rod, but I think it might not even be able to > iterate on keys at all; maybe today it can actually via some trick, > but the point is this is a conceptual mismatch for it. > > Where you say this doesn't require nework traffic you need to consider > that while it's true this might not be using the physical network wire > being localhost, it would still be transferred over a costly network > stream, as we don't do off-heap buffer sharing yet. > > So to clarify, we will have a cluster of nodes where each node contains > two JVM, one running an Hadoop process, one running an Infinispan process. > The Hadoop process would only read the data from the Infinispan process in > the same node during a normal M/R execution. 
> > > So we discussed two use cases: > - engage Infinispan to accelerate an existing Hadoop deployment > - engage Hadoop to run an Hadoop job on existing data in Infinispan > In neither case I see why I'd run them in separate JVMs: seems less > effective and more work to get done, and no benefit unless you're > thinking about independent JVM tuning? That might be something to > consider, but I doubt tuning independence would ever offset the cost > of serialized transfer of each entry. > > The second use case could be used via Hot Rod too, but that's a > different discussion, actually just a nice side effect of Hadoop being > language agnostic that we would take advantage of. > > Sanne > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140317/06a6c05b/attachment.html

From emmanuel at hibernate.org Mon Mar 17 12:54:24 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 17 Mar 2014 17:54:24 +0100 Subject: [infinispan-dev] Infinispan - Hadoop integration In-Reply-To: References: <75211F34-A043-4F1F-9C48-D9457C20DC04@redhat.com> <3D492D0D-3581-4B53-B272-89F93BD6FAAA@redhat.com> <3F02541E-F385-4B04-8654-054E4483397F@hibernate.org> <31AE8FF0-5ED0-426F-9112-24509AA14CA0@hibernate.org> <913344291.565084.1395072400107.JavaMail.zimbra@redhat.com> Message-ID:

Well that's only a guesstimate, but if I had to put a number on it, this approach is going to be shit slow +- 10% compared to what Infinispan M/R does (despite all of its limitations). We can do the proto, but at some stage we might want to take over and replace some of that JVM spawn logic. Especially for the use case where the grid is used in parallel to the Hadoop M/R.

On 17 Mar 2014, at 17:49, Gustavo Fernandes wrote:

> Yes, the M/R v1 by default launches one *new* JVM per task, so during the execution of a certain Job, at a given moment in a node there could be dozens of JVMs running in parallel, that will be destroyed when the task (map or reduce) finishes. It is possible to instruct the map reduce system to reuse the same JVM for several map or reduce tasks: this is interesting when map tasks executes in a matter of seconds and the overhead of creating, warming up and destroying a JVM becomes significant. But even in this case, there will be 'n' JVM running where 'n' is the task capacity of the node. The difference is that they are recycled.
> > In Yarn the behaviour is similar, the YarnChild runs in a separate JVM and it's possible to cause some reuse setting the property "mapreduce.job.ubertask.enable" > > Apart from all those transient tasks JVMs, there will more long running JVMs in each node which is the TaskTracker (who accepts tasks and send status to the global jobtracker)and If HDFS is used, there will be one extra JVM per node (DataNode) plus one or two Namenode global processes. > > Hadoop is very fond of JVMs. > > Cheers, > Gustavo > > > On Mon, Mar 17, 2014 at 4:06 PM, Alan Field wrote: > For Map/Reduce v1 this is definitely the case: > > https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Task+Execution+%26+Environment > > "The TaskTracker executes the Mapper/ Reducer task as a child process in a separate jvm." > > I believe this is also the case for Map/Reduce v2, but I haven't found a definitive reference in the docs yet. YARN is architected to split resource management and job scheduling/monitoring into different pieces, but I think task execution is the same as MRv1. > > Thanks, > Alan > > From: "Emmanuel Bernard" > To: "infinispan -Dev List" > Sent: Monday, March 17, 2014 11:31:34 AM > Subject: Re: [infinispan-dev] Infinispan - Hadoop integration > > > Got it now. > That being said, if Alan is correct (one JVM per M/R task run per node), we will need to implement C/S local key and keyset lookup. > > Emmanuel > > On 14 Mar 2014, at 12:34, Sanne Grinovero wrote: > > On 14 March 2014 09:06, Emmanuel Bernard wrote: > > > On 13 mars 2014, at 23:39, Sanne Grinovero wrote: > > On 13 March 2014 22:19, Mircea Markus wrote: > > On Mar 13, 2014, at 22:17, Sanne Grinovero wrote: > > On 13 March 2014 22:05, Mircea Markus wrote: > > On Mar 13, 2014, at 20:59, Ales Justin wrote: > > - also important to notice that we will have both an Hadoop and an Infinispan cluster running in parallel: the user will interact with the former in order to run M/R tasks. 
Hadoop will use Infinispan (integration achieved through InputFormat and OutputFormat ) in order to get the data to be processed. > > Would this be 2 JVMs, or you can trick Hadoop to start Infinispan as well -- hence 1JVM? > > good point, ideally it should be a single VM: reduced serialization cost (in vm access) and simpler architecture. That's if you're not using C/S mode, of course. > > ? > Don't try confusing us again on that :-) > I think we agreed that the job would *always* run in strict locality > with the datacontainer (i.e. in the same JVM). Sure, an Hadoop client > would be connecting from somewhere else but that's unrelated. > > we did discuss the possibility of running it over hotrod though, do you see a problem with that? > > No of course not, we discussed that. I just mean I think that needs to > be clarified on the list that the Hadoop engine will always run in the > same JVM. Clients (be it Hot Rod via new custom commands or Hadoop > native clients, or Hadoop clients over Hot Rod) can indeed connect > remotely, but it's important to clarify that the processing itself > will take advantage of locality in all configurations. In other words, > to clarify that the serialization cost you mention for clients is just > to transfer the job definition and optionally the final processing > result. > > > Not quite. The serialization cost Mircea mentions I think is between the Hadoop vm and the Infinispan vm on a single node. The serialization does not require network traffic but is still shuffling data between two processes basically. We could eliminate this by starting both Hadoop and Infinispan from the same VM but that requires more work than necessary for a prototype. > > Ok so there was indeed confusion on terminology: I don't agree with that design. > From an implementor's effort perspective having to setup an Hot Rod > client rather than embedding an Infinispan node is approximately the > same work, or slightly more as you have to start both. 
Also to test > it, embedded mode it easier. > > Hot Rod is not meant to be used on the same node, especially not if > you only want to access data in strict locality; for example it > wouldn't be able to iterated on all keys of the current server node > (and limiting to those keys only). I might be wrong as I'm not too > familiar with Hot Rod, but I think it might not even be able to > iterate on keys at all; maybe today it can actually via some trick, > but the point is this is a conceptual mismatch for it. > > Where you say this doesn't require nework traffic you need to consider > that while it's true this might not be using the physical network wire > being localhost, it would still be transferred over a costly network > stream, as we don't do off-heap buffer sharing yet. > > So to clarify, we will have a cluster of nodes where each node contains two JVM, one running an Hadoop process, one running an Infinispan process. The Hadoop process would only read the data from the Infinispan process in the same node during a normal M/R execution. > > So we discussed two use cases: > - engage Infinispan to accelerate an existing Hadoop deployment > - engage Hadoop to run an Hadoop job on existing data in Infinispan > In neither case I see why I'd run them in separate JVMs: seems less > effective and more work to get done, and no benefit unless you're > thinking about independent JVM tuning? That might be something to > consider, but I doubt tuning independence would ever offset the cost > of serialized transfer of each entry. > > The second use case could be used via Hot Rod too, but that's a > different discussion, actually just a nice side effect of Hadoop being > language agnostic that we would take advantage of. 
> > Sanne
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140317/1ac44d98/attachment-0001.html

From sanne at infinispan.org Mon Mar 17 13:43:37 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 17 Mar 2014 17:43:37 +0000 Subject: [infinispan-dev] New API to iterate over current entries in cache In-Reply-To: References: <53270AA3.30702@redhat.com> Message-ID:

I do realize you need such a feature; still, as I pointed out when we first drafted it, I'm skeptical because of the complexities you mention. It highly depends on what use cases we want to address, but as a general-purpose solution I think this "initial state" received by the clients needs to be accurate, which implies that providing iteration guarantees similar to ConcurrentMap's is not fit for the purpose.
In the context of the Hadoop integration discussions, the need for a fully consistent iteration was also mentioned; in that specific case we're confident that the feature would still be quite useful even if the user has to assume the state "shall not be changed" while the map/reduce jobs make progress. Probably not something that you should be working on in the context of this specific feature, but I'm getting convinced that we should also work on "read consistent iteration" which works on a globally consistent snapshot; it could be implemented on top of TOA, I guess. Another use case which comes to mind is the recent discussion about the need for a consistent and accurate "count" implementation. Note the Snapshot capability would need to take into account CacheStores. -- Sanne On 17 March 2014 15:02, William Burns wrote: > On Mon, Mar 17, 2014 at 10:45 AM, Radim Vansa wrote: >> Why are listeners not invoked? JCache iterator() notifies the listeners. > > Like I mentioned this can be changed. However, I have not seen a > cache entry visitor listener in JCache. The only listeners I am aware > of are for created, removed, expired and updated, which wouldn't be > affected by this. > >> >> Can the iterator remove entries? > > Sorry I forgot to mention this, but no, it would not be able to remove > with what I was planning. This could be added easily, however this > would be problematic when using repeatable read. Would the remove > take part in the transaction? If so, it causes an inconsistency, > since we aren't storing all the values that were read. I figured it > would be easier to just not support it, and if the user wants they can > simply call cache.remove(key), which this would do anyways. > >> >> I assume there's no ordering guarantee, but behaviour under concurrent >> changes would be rather tricky. I don't like the idea of iterating over >> a changing structure. > > There is no ordering of elements. 
This would have guarantees more > similar to the ConcurrentMap entrySet iteration, in that you are > guaranteed to see a value that was valid for the given key, which may or > may not be the most up to date. This is similar to how our > Cache works currently though. > >> >> If you need to iterate through all entries, I'd rather introduce the >> snapshot ability and then iterate over the snapshot. Then, you wouldn't >> have to mess with tx and introduce a non-tx operation on a tx cache. > > That would definitely be helpful, but we don't have support for > snapshots atm. The way this is implemented currently, it would > be pretty simple to add an option for this, as all you would > need to do is pass along the version of the snapshot in the request > command. > > The reason I was saying not to support this for tx right now is > because of repeatable read: there is no way we can hold all the values > of the cache in the current context. > >> >> My 2c >> >> Radim >> >> On 03/17/2014 02:30 PM, William Burns wrote: >>> While working on ISPN-4068 to add the current state to listeners that >>> were added I found that what I essentially needed was a way to iterate >>> over the entries of the cache. I am thinking of adding this to the >>> public API available on the AdvancedCache interface. >>> >>> I wanted to get you guys' opinions on whether we should add >>> it, and any changes you might suggest. >>> >>> My thought was to add 2 overloaded methods: >>> >>> Iterator<CacheEntry<K, C>> entryIterator(KeyValueFilter<? super K, >>> ? super V> filter, Converter<? super K, ? super V, C> converter); >>> >>> and >>> >>> Iterator<CacheEntry<K, V>> entryIterator(KeyValueFilter<? super K, ? >>> super V> filter); >>> >>> The method would return almost immediately after invocation and the >>> iterator would queue entries and block as entries are required to be >>> returned. The filter and converter are applied on each of the remote >>> nodes and are required to be serializable or have an externalizer >>> registered. 
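To give the proposal a concrete shape, here is a hedged, self-contained sketch that simulates the filter/converter iteration semantics with plain JDK types — the Infinispan KeyValueFilter and Converter interfaces are replaced by BiPredicate/BiFunction stand-ins, since the real API is still only a proposal at this point:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

public class EntryIteratorSketch {

    // Simulates entryIterator(filter, converter): walks the map,
    // applying the filter and then the converter per entry. Like
    // ConcurrentMap iteration, it reflects some valid-at-some-point
    // state, not necessarily the latest writes.
    static <K, V, C> Iterator<Map.Entry<K, C>> entryIterator(
            ConcurrentHashMap<K, V> cache,
            BiPredicate<K, V> filter,
            BiFunction<K, V, C> converter) {
        List<Map.Entry<K, C>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : cache.entrySet()) {
            if (filter.test(e.getKey(), e.getValue())) {
                out.add(Map.entry(e.getKey(),
                        converter.apply(e.getKey(), e.getValue())));
            }
        }
        return out.iterator();
    }

    static List<String> collect() {
        ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);
        List<String> result = new ArrayList<>();
        // Keep only odd values, convert each to a "key=value" string.
        entryIterator(cache, (k, v) -> v % 2 == 1, (k, v) -> k + "=" + v)
                .forEachRemaining(e -> result.add(e.getKey() + "->" + e.getValue()));
        result.sort(String::compareTo);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collect());
    }
}
```

Like ConcurrentMap iteration, the sketch makes no freshness promise: an entry the iterator observes may already have been overwritten by a concurrent writer, which is exactly the per-key-only guarantee discussed in this thread.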
>>> >>> Internally the iterator would use chunking to help prevent memory >>> saturation. The max memory usage would be (chunkSize * N) + local >>> entries, where N is the number of nodes. >>> >>> These methods would be different than other methods on the >>> Cache/AdvancedCache in the following ways: >>> >>> 1. This operation is treated as non-tx and thus won't store entries in >>> the context, so repeatable read semantics would not be >>> guaranteed. This doesn't preclude manually adding values to the >>> context. Also prior writes in the current context would be ignored >>> (current data returned), although this could be changed if desired. >>> 2. Values are not activated from loaders and visited listeners would >>> not be notified of access. The latter could be sensibly changed if >>> desired. >>> >>> - Will >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mudokonman at gmail.com Mon Mar 17 13:57:59 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 17 Mar 2014 13:57:59 -0400 Subject: [infinispan-dev] New API to iterate over current entries in cache In-Reply-To: References: <53270AA3.30702@redhat.com> Message-ID: On Mon, Mar 17, 2014 at 1:43 PM, Sanne Grinovero wrote: > I do realize you need such a feature, still as I pointed out when we > first drafted it I'm skeptical because of the complexities you > mention. 
> > It highly depends on what use cases we want to address, but as a > general purpose solution I think this "initial state" received by the > clients needs to be accurate, which implies that providing similar > iteration guarantees of ConcurrentMap is not fit for the purpose. Definitely, this feature alone is not enough for the listeners to be consistent enough. This is why I had to implement additional queueing on the listener node to prevent concurrent events from being raised, and only raise them after the state has been applied for each key respectively. Unfortunately, without some sort of snapshot the only consistency guarantee we can have is by key. > > In the context of the Hadoop integration discussions, the need for a > fully consistent iteration was also mentioned; in that specific case > we're confident that the feature would still be quite useful even if > we assume that the user needs to somehow assume the state "shall not > be changed" while the map/reduce jobs make progress. > > Probably not something that you should be working on in the context of > this specific feature, but I'm getting convinced that we should also > work on "read consistent iteration" which works on a globally > consistent snapshot, could be implemented on TOA I guess. > > Another use case which comes to mind, is the recent discussion about > the need for a consistent and accurate "count" implementation. > > Note the Snapshot capability would need to take into account CacheStores. I agree; unfortunately that would make the stores quite a bit more complex than they are currently. > > -- Sanne > > > > > On 17 March 2014 15:02, William Burns wrote: >> On Mon, Mar 17, 2014 at 10:45 AM, Radim Vansa wrote: >>> Why are listeners not invoked? JCache iterator() notifies the listeners. >> >> Like I mentioned this can be changed. However, I have not seen a >> cache entry visitor listener in JCache. 
The only listeners I am aware >> of are for created, removed, expired and updated which wouldn't be >> affected by this. >> >>> >>> Can the iterator remove entries? >> >> Sorry I forgot to mention this, but no it would not be able to remove >> with what I was planning. This could be added easily, however this >> would be problematic for when using repeatable read. Would the remove >> take part of the transaction and if so it causes an inconsistency >> since we aren't storing all the values that were read. I figured it >> would be easier to just not support it and if the user wants they can >> simply call cache.remove(key) which this would do anyways. >> >>> >>> I assume there's no ordering guarantee, but behaviour under concurrent >>> changes would be rather tricky. I don't like the idea of iterating over >>> changing structure. >> >> There are no ordering of elements. This would have guarantees more >> similar to the ConcurrentMap entrySet iteration in that you are >> guaranteed to see a value that was valid for the given key that may or >> may not be the most up to date. This would work similarly to how our >> Cache works now currently though. >> >>> >>> If you need to iterate through all entries, I'd rather introduce the >>> snapshot ability and then iterate over the snapshot. Then, you wouldn't >>> have to mess with tx and introduce non-tx operation on tx cache. >> >> That would definitely be helpful, but we don't have support for >> snapshot atm. The way this is implemented currently though it would >> be pretty simple to add an option for this though, as all you would >> need to do is pass along the version of the snapshot in the request >> command. >> >> The reason I was saying not to support this for tx right now, is >> because of repeatable read, there is no way we can hold all the values >> of the cache in the current context. 
>> >>> >>> My 2c >>> >>> Radim >>> >>> On 03/17/2014 02:30 PM, William Burns wrote: >>>> While working on ISPN-4068 to add the current state to listeners that >>>> were added I found that what I essentially needed was a way to iterate >>>> over the entries of the cache. I am thinking of adding this to the >>>> public API available on the AdvancedCache interface. >>>> >>>> I wanted to get you guys' opinions on whether we should add >>>> it, and any changes you might suggest. >>>> >>>> My thought was to add 2 overloaded methods: >>>> >>>> Iterator<CacheEntry<K, C>> entryIterator(KeyValueFilter<? super K, >>>> ? super V> filter, Converter<? super K, ? super V, C> converter); >>>> >>>> and >>>> >>>> Iterator<CacheEntry<K, V>> entryIterator(KeyValueFilter<? super K, ? >>>> super V> filter); >>>> >>>> The method would return almost immediately after invocation and the >>>> iterator would queue entries and block as entries are required to be >>>> returned. The filter and converter are applied on each of the remote >>>> nodes and are required to be serializable or have an externalizer >>>> registered. >>>> >>>> Internally the iterator would use chunking to help prevent memory >>>> saturation. The max memory usage would be (chunkSize * N) + local >>>> entries, where N is the number of nodes. >>>> >>>> These methods would be different than other methods on the >>>> Cache/AdvancedCache in the following ways: >>>> >>>> 1. This operation is treated as non-tx and thus won't store entries in >>>> the context, so repeatable read semantics would not be >>>> guaranteed. This doesn't preclude manually adding values to the >>>> context. Also prior writes in the current context would be ignored >>>> (current data returned), although this could be changed if desired. >>>> 2. Values are not activated from loaders and visited listeners would >>>> not be notified of access. The latter could be sensibly changed if >>>> desired. 
>>>> >>>> - Will >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> -- >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Tue Mar 18 03:58:15 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 18 Mar 2014 08:58:15 +0100 Subject: [infinispan-dev] Intermediate cache in M/R API In-Reply-To: <53270DAF.6090702@redhat.com> References: <53270DAF.6090702@redhat.com> Message-ID: <5327FC97.4030701@redhat.com> What exactly does the configName refer to? Some cache name as it appears in configuration, such that the intermediate cache will copy that configuration and create a new custom cache? That means that the configured cache will never be instantiated? Why don't you use Configuration directly instead, if you want to specify configuration? Radim On 03/17/2014 03:58 PM, Vladimir Blagojevic wrote: > Guys, > > We need some input on how to design the API regarding the use of intermediate > caches [1]. As you might know, one of the requirements for improving our > M/R is allowing applications to use a custom defined intermediate > key/value cache used to store keys/values of the map/combine phase before > being reduced in the reduce phase. > > > Currently we have a constructor where one can specify whether to use > a shared or per-task intermediate cache. 
And now we wanted to add an > additional method: > > usingIntermediateCache(String cacheName, String cacheConfigurationName); > > that will enable use of a custom intermediate cache. > > Now, Dan, and rightly so, thought this was a bit confusing. Are we > referring to the shared or the per-task intermediate cache when > using the above-mentioned method? > > His proposal is to use a per-task intermediate cache with our default > specified intermediate cache configuration. Remove the constructor > parameter in MapReduceTask regarding shared or non-shared cache and add > configuration methods for both caches: > > > usingIntermediateCache(String configName) - use a per-task > intermediate cache with the given configuration > usingSharedIntermediateCache(String cache) - use a shared cache > with our default configuration > usingSharedIntermediateCache(String cache, String configName) - use > a shared cache with the given configuration > > > Note that we need a name for the shared cache because we want to enable > the application to easily remove/inspect that cache after all m/r tasks > sharing that intermediate cache have been executed. > > What are your thoughts here? 
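A hedged model of how the three proposed methods could disambiguate per-task versus shared caches — only the method names come from the proposal above; the class and fields below are illustrative stand-ins, not Infinispan API:

```java
// Hypothetical model of the proposed MapReduceTask configuration
// split: per-task vs shared intermediate caches. Illustrative only.
public class IntermediateCacheConfig {
    private boolean shared;
    private String cacheName;   // only meaningful for shared caches
    private String configName;  // null means "use the default config"

    // Per-task cache with a given configuration; no name is needed
    // because the task creates (and destroys) its own private cache.
    public IntermediateCacheConfig usingIntermediateCache(String configName) {
        this.shared = false;
        this.cacheName = null;
        this.configName = configName;
        return this;
    }

    // Shared cache with the default configuration; the name lets the
    // application inspect/remove the cache after all tasks finish.
    public IntermediateCacheConfig usingSharedIntermediateCache(String cacheName) {
        this.shared = true;
        this.cacheName = cacheName;
        this.configName = null;
        return this;
    }

    // Shared cache with an explicit configuration.
    public IntermediateCacheConfig usingSharedIntermediateCache(String cacheName,
                                                                String configName) {
        this.shared = true;
        this.cacheName = cacheName;
        this.configName = configName;
        return this;
    }

    public String describe() {
        return (shared ? "shared:" + cacheName : "per-task")
                + "/config=" + (configName == null ? "default" : configName);
    }

    public static void main(String[] args) {
        System.out.println(new IntermediateCacheConfig()
                .usingSharedIntermediateCache("mr-tmp", "tmp-config").describe());
    }
}
```

The point of the split is that each overload pins down both dimensions (shared vs per-task, named vs default config), so no single method is ambiguous the way usingIntermediateCache(cacheName, cacheConfigurationName) was.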
> > Vladimir > > > > [1] https://issues.jboss.org/browse/ISPN-4021 > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From ygokirmak at gmail.com Tue Mar 18 04:23:00 2014 From: ygokirmak at gmail.com (yavuz gokirmak) Date: Tue, 18 Mar 2014 10:23:00 +0200 Subject: [infinispan-dev] event processing integration In-Reply-To: <5326D5D0.4010900@redhat.com> References: <5326D5D0.4010900@redhat.com> Message-ID: Hi All, it would be interesting to give CEP capabilities to Infinispan caches. I have some comments. On 17 March 2014 13:00, Jonathan Halliday wrote: > > Alongside recent talk of integrating infinispan with hadoop batch > processing, there has been some discussion of using the data grid > alongside an event stream processing system. > > There are several directions we could consider here. In approximate > order of increasing complexity these are: > > - Allow bi-directional flow of events, such that listeners on the cache > can be used to cause events in the processing engine, or events in the > processing engine can update the cache. > To catch events from a cache, I propose to develop a simple infinispanSource for Flume ( http://flume.apache.org ). Using this infinispanSource, one can listen to any cache for updates or inserts and redirect these events to either a CEP engine or another destination. Updating the cache would be similar: we could have an infinispanSink for Flume, and any application that needs to update a cache by sending events could use the infinispanSink. Actually, by developing such Flume components we would have a change data capture tool ( http://en.wikipedia.org/wiki/Change_data_capture ) for Infinispan. CDC tools are vital for complex event processing integrations and I think this will be a good starting point. 
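The infinispanSource/infinispanSink idea reduces to bridging cache write events onto an event channel that a sink can drain; a minimal, Flume-free sketch of that CDC bridge (all types here are local stand-ins, not the actual Flume or Infinispan listener APIs):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class CdcBridgeSketch {
    // A change event as a CDC source would emit it.
    record ChangeEvent(String op, String key, String value) {}

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<ChangeEvent> channel = new LinkedBlockingQueue<>();

    // put() doubles as the "listener": every write publishes an event
    // that a sink (CEP engine, Flume agent, ...) would drain.
    void put(String key, String value) {
        String old = cache.put(key, value);
        channel.add(new ChangeEvent(old == null ? "CREATE" : "UPDATE", key, value));
    }

    // What a polling sink would do: drain whatever accumulated.
    List<ChangeEvent> drain() {
        List<ChangeEvent> out = new java.util.ArrayList<>();
        channel.drainTo(out);
        return out;
    }

    static List<String> demo() {
        CdcBridgeSketch bridge = new CdcBridgeSketch();
        bridge.put("user:1", "alice");
        bridge.put("user:1", "alicia");
        return bridge.drain().stream()
                .map(e -> e.op() + " " + e.key() + "=" + e.value())
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A real Flume source would additionally batch events, handle channel back-pressure, and survive restarts; the sketch only shows the listener-to-channel shape of the integration.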
> > - Allow the cache to be used to hold lookup data for reference from user > code running the processing engine, to speed up joining streamed events > to what would otherwise be data tables on disk. > > Actually it is important in such systems to cache some RDBMS tables in memory and sync this cache periodically from the RDBMS table to stay up to date. I think this requirement can be achieved via Infinispan's "cache loader"s. > - Integrate with the processing engine itself, such that infinispan can > be used to store items that would otherwise occupy precious RAM. This > one is probably only viable with the cooperation of the stream > processing system, so I'll base further discussion on Drools Fusion. > > The engine uses memory for a) rules, i.e. processing logic. Some of this > is infrequently accessed. Think of a decision tree in which some > branches are traversed more than others. So, opportunities to swap bits > out to cache perhaps. b) state, particularly sliding windows. Again > some data is infrequently accessed. For many sliding window calculations > in particular (e.g. running average), only the head and tail of the > window are actually used. The events in-between can be swapped out. > Holding state is the most important case; here an off-heap cache will be a must. (Ben Cotton is implementing Peter Lawrey's hugemaps in Infinispan for off-heap caching, as you may know.) > > Of course these integrations require the stream processing engine to be > written to support such operations - careful handling of object > references is needed. Currently the engine doesn't work that way - > everything is focussed on speed at the expense of memory. > > - Borrow some ideas from the event processing DSLs, such that the data > grid query engine can independently support continuous (standing) > queries rather than just one-off queries. 
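A standing query can be faked, inefficiently, by re-running a one-off query on every mutation and emitting only the delta of the result set; this brute-force approach can be sketched as follows (illustrative only, not an Infinispan API, and deliberately showing the O(size)-work-per-write cost):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

// Brute-force "continuous query": re-evaluate the predicate over the
// whole map on every mutation and report only keys entering (+) or
// leaving (-) the result set.
public class StandingQuerySketch {
    private final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    private final BiPredicate<String, Integer> query;
    private Set<String> lastResult = new HashSet<>();
    final StringBuilder log = new StringBuilder();

    StandingQuerySketch(BiPredicate<String, Integer> query) {
        this.query = query;
    }

    void put(String key, int value) {
        cache.put(key, value);
        // Full re-execution on each write: this is the inefficiency.
        Set<String> now = new HashSet<>();
        for (Map.Entry<String, Integer> e : cache.entrySet())
            if (query.test(e.getKey(), e.getValue())) now.add(e.getKey());
        for (String k : now) if (!lastResult.contains(k)) log.append("+").append(k);
        for (String k : lastResult) if (!now.contains(k)) log.append("-").append(k);
        lastResult = now;
    }

    static String demo() {
        // Standing query: "values above 10".
        StandingQuerySketch q = new StandingQuerySketch((k, v) -> v > 10);
        q.put("a", 5);  // no match yet
        q.put("a", 15); // "a" enters the result set
        q.put("a", 3);  // "a" leaves the result set
        return q.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note the sketch sidesteps the distributed-clock problem entirely: any time-based window ("avg in last N minutes") would additionally need a clock all nodes agree on.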
Arguably this is reinventing the wheel, but for simple use cases it may be preferable to run the stream processing logic directly in the grid rather than deploying a dedicated event stream processing system. I think it's probably going to require supporting lists as a first class construct alongside maps though. There are various kludges possible here, including the brute force approach of faking continuous query by re-executing a one-off query on each mutation, but they tend to be inefficient. There is also the thorny problem of supporting a (potentially distributed) clock, since a lot of use cases need to reference the passage of time in the query e.g. 'send event to listener if avg in last N minutes > x'. > > > > regards Yavuz Gökırmak - tr.linkedin.com/pub/yavuz-gokirmak/20/a11/23b/ Jonathan Halliday > Core developer, JBoss. > > -- > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham (USA), Paul Hickey (Ireland), Matt Parson > (USA), Charlie Peters (USA) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140318/bfc7dc61/attachment-0001.html From dan.berindei at gmail.com Tue Mar 18 06:41:58 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Mar 2014 12:41:58 +0200 Subject: [infinispan-dev] Intermediate cache in M/R API In-Reply-To: <5327FC97.4030701@redhat.com> References: <53270DAF.6090702@redhat.com> <5327FC97.4030701@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 9:58 AM, Radim Vansa wrote: > What does exactly the configName refer to? Some cache name as it appears > in configuration, and then the intermediate cache will copy that > configuration and create new custom cache? 
That means that the configured > cache will never be instantiated? > Yes. Only usingSharedIntermediateCache(String cache) uses the same cache name and config name. > > Why don't you use Configuration directly instead, if you want to specify > configuration? > Two reasons: 1. The Configuration classes are not serializable. 2. Some users might prefer to define the configuration via XML. Dan > > Radim > > On 03/17/2014 03:58 PM, Vladimir Blagojevic wrote: > > Guys, > > > > We need some input on how to design API regarding use of intermediate > > caches [1]. As you might know one of the requirements for improving our > > M/R is allowing applications to use custom defined intermediate > > key/value cache used to store keys/values of map/combine phase before > > being reduced in the reduce phase. > > > > > > Currently we have a constructor where one can specify whether to use > > shared or per-task intermediate cache. And now we wanted to add an > > additional method: > > > > usingIntermediateCache(String cacheName, String cacheConfigurationName); > > > > that will enable use of custom intermediate cache. > > > > Now, Dan, and rightly so, thought this was a bit confusing. Are we > > referring to intermediate shared or per-task intermediate cache when > > using the above mentioned method. > > > > His proposal is to use a per-task > > intermediate cache with our default > > specified intermediate cache configuration. 
Remove the constructor > > parameter in MapReduceTask regarding shared or non-shared cache and add > > configuration methods for both caches: > > > > > > usingIntermediateCache(String configName) - use a per-task > > intermediate cache with the given configuration > > usingSharedIntermediateCache(String cache) - use a shared cache > > with our default configuration > > usingSharedIntermediateCache(String cache, String configName) - use > > a shared cache with the given configuration > > > > > > Note that we need a name for the shared cache because we want to enable > > the application to easily remove/inspect that cache after all m/r tasks > > sharing that intermediate cache have been executed. > > > > What are your thoughts here? > > > > Vladimir > > > > > > > > [1] https://issues.jboss.org/browse/ISPN-4021 > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140318/282660e0/attachment.html From mmarkus at redhat.com Tue Mar 18 11:20:02 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 18 Mar 2014 15:20:02 +0000 Subject: [infinispan-dev] dropping the Avro dependency from the Java HotRod client Message-ID: <396DE169-F88D-4245-AE21-E99F024FE3B3@redhat.com> Hi Galder, Now that we have protobuf/protostream in place, any reason to keep the avro dependency/marshaller around anymore, or shall we just stick to proto? 
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From galder at redhat.com Wed Mar 19 03:07:46 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Mar 2014 08:07:46 +0100 Subject: [infinispan-dev] Infinispan 6.0.2.Final includes ASL2-licensed JBoss Marshalling Message-ID: Hi, Infinispan 6.0.2.Final is out addressing a licensing issue we were having with JBoss Marshalling: http://blog.infinispan.org/2014/03/infinispan-602final-includes-asl2.html Cheers, -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Wed Mar 19 06:02:04 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 19 Mar 2014 10:02:04 +0000 Subject: [infinispan-dev] Infinispan 6.0.2.Final includes ASL2-licensed JBoss Marshalling In-Reply-To: References: Message-ID: Nice, thanks! I wouldn't classify it as an "issue" however ;-) We have plenty of other such dependencies. 
On 19 March 2014 07:07, Galder Zamarreño wrote: > Hi, > > Infinispan 6.0.2.Final is out addressing a licensing issue we were having with JBoss Marshalling: > http://blog.infinispan.org/2014/03/infinispan-602final-includes-asl2.html > > Cheers, > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rory.odonnell at oracle.com Wed Mar 19 06:34:05 2014 From: rory.odonnell at oracle.com (Rory O'Donnell Oracle, Dublin Ireland) Date: Wed, 19 Mar 2014 10:34:05 +0000 Subject: [infinispan-dev] JDK 8 Update 20 build 05 & JDK 7 Update 60 build 10 are available on java.net In-Reply-To: <532971B5.1010207@oracle.com> References: <532971B5.1010207@oracle.com> Message-ID: <5329729D.2090707@oracle.com> Hi Galder, Mark Reinhold announced yesterday JDK 8: General Availability here JDK 8 u20 Build 05 Early Access Build is now available for download & test. JDK 7 u60 Build 10 Early Access Build is also available for download & test. Rgds, Rory -- Rgds, Rory O'Donnell Quality Engineering Manager Oracle EMEA, Dublin, Ireland -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140319/12e18585/attachment.html From galder at redhat.com Wed Mar 19 06:38:10 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Mar 2014 11:38:10 +0100 Subject: [infinispan-dev] Deprecating parent dependencies & migrating to JUnit In-Reply-To: References: Message-ID: <74E2DD2B-AA33-4690-9B42-94AF4B8B8739@redhat.com> I haven't got any objections. This is already the case with the Arquillian tests, which rely on JUnit. 
Cheers, On 14 Mar 2014, at 16:55, Sanne Grinovero wrote: > TLDR: I'm banning TestNG from the Query modules, and this will affect > the parent poms. > -- > > I needed to run some quick experiment rounds in Query, but *some* > tests fail, and I simply want to know which ones to evaluate some > approaches. > > But I can't get to know the full list easily, as failing a test in our > current testing framework apparently implies you'll be leaking > resources, and the suite grinds to a slow death, running out of > resources. > > The problem with this is that when I need a quick evaluation which is > blocking other people in other teams, embarking on such witch hunting > is not an option; while I'm co-responsible for some of the embarrassing > ugliness in our tests' resource handling, I primarily blame the > tooling rather than ourselves: especially TestNG's ridiculously > unreliable handling of shutdown hooks - and I've had enough of it. > Compared to that, the resource handling of JUnit is extremely handy > and reliable: it requires a bit more coding as it provides fewer features > than TestNG, but it can be relied on. > I can't really afford the time to migrate all tests to JUnit > today, but I will start doing so in smaller iterations. > > Now one problem is that TestNG, and several other dependencies, are > included by default in the parent classpath. Having a set of default > dependencies - even just in provided or test mode - forcefully > provided to all modules has created some other trouble too; for > example I really don't want the slf4j logger in my classpath as it has > hidden problems in the classpath structure of our own modules. > > ## -> Would you all be ok if I move the dependencies to the dependency > management section? 
> (You can all use TestNG in your own modules if you want, but please > allow me to opt out) > > That implies each module will have to explicitly pick what it needs; I > consider that a good thing as explicit is good, and getting utilities > on the classpath out of the blue is confusing. For example, moving > just one dependency yesterday I could figure out that the migration to > Netty4 was actually incomplete and some tests were still using > Netty3. I consider that proof of a healthier structure. > > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From rory.odonnell at oracle.com Wed Mar 19 07:08:20 2014 From: rory.odonnell at oracle.com (Rory O'Donnell Oracle, Dublin Ireland) Date: Wed, 19 Mar 2014 11:08:20 +0000 Subject: [infinispan-dev] JDK 8 Update 20 build 05 & JDK 7 Update 60 build 10 are available on java.net In-Reply-To: <5329729D.2090707@oracle.com> References: <532971B5.1010207@oracle.com> <5329729D.2090707@oracle.com> Message-ID: <53297AA4.8050908@oracle.com> Hi Galder, I forgot to mention that the issue you logged has been fixed in b05. https://bugs.openjdk.java.net/browse/JDK-8036554 Can you confirm all is ok? Rgds, Rory On 19/03/2014 10:34, Rory O'Donnell Oracle, Dublin Ireland wrote: > Hi Galder, > > Mark Reinhold announced yesterday JDK 8: General Availability here > > > JDK 8 u20 Build 05 Early Access > Build is now available for download > & test. > JDK 7 u60 Build 10 Early Access Build is also available for download > & test. 
> > Rgds, Rory > -- > Rgds, Rory O'Donnell > Quality Engineering Manager > Oracle EMEA, Dublin, Ireland -- Rgds, Rory O'Donnell Quality Engineering Manager Oracle EMEA, Dublin, Ireland -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140319/163bc89d/attachment-0001.html From galder at redhat.com Wed Mar 19 08:17:07 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Mar 2014 13:17:07 +0100 Subject: [infinispan-dev] OSX users, upgrade to JDK8u20 Message-ID: <439D7BBC-CC29-42EE-A71B-AEA73F2F1295@redhat.com> Hi, Some OSX users had issues running Hot Rod code due to a JDK bug [1]. This has been fixed in JDK8u20, for which there's an early access build in [2]. Cheers, [1] https://issues.jboss.org/browse/ISPN-3896 [2] https://jdk8.java.net/download.html -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Mar 19 09:36:37 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Mar 2014 14:36:37 +0100 Subject: [infinispan-dev] Disabling unstable tests: push on master directly? In-Reply-To: References: Message-ID: <42A020B8-F10A-442B-A4E0-26A1BFB66382@redhat.com> I'd push directly. On 14 Mar 2014, at 18:00, Sanne Grinovero wrote: > I'm blocked as usual on some core tests failing. > Since I've found evidence that it's not "just me", I'm using the new > cute annotation to mark them as unstable. > > Would we all be good for me to push such findings - when I'm sure - > directly to master, skipping the Pull Request process? 
> > My reasoning is that: > - these are urgent as they block other work and raise FUD among other > occasional contributors > - you seem overwhelmed with PRs which deserve actual attention > - worst case you can revert: these commits can't go unnoticed as there > are JIRAs created to track these, so essentially it's not skipping > peer review but postponing it to a more suitable point in time. > > Good idea? > > Cheers, > Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Wed Mar 19 09:46:39 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 19 Mar 2014 13:46:39 +0000 Subject: [infinispan-dev] Disabling unstable tests: push on master directly? In-Reply-To: <42A020B8-F10A-442B-A4E0-26A1BFB66382@redhat.com> References: <42A020B8-F10A-442B-A4E0-26A1BFB66382@redhat.com> Message-ID: On 19 March 2014 13:36, Galder Zamarreño wrote: > I'd push directly. Thanks, good to see some opinion :-) I've been doing it for some days now. 
>> >> My reasoning is that: >> - these are urgent as they block other work and raise FUD among other >> occasional contributors >> - you seem overwhelmed with PRs which deserve actual attention >> - worst case you can revert: these commits can't go unnoticed as there >> are JIRAs created to track these, so essentially it's not skipping >> peer review but postponing it to a more suitable point in time. >> >> Good idea? >> >> Cheers, >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Thu Mar 20 06:56:11 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 20 Mar 2014 10:56:11 +0000 Subject: [infinispan-dev] Cerealization protocols Message-ID: I just heard about "Cerealization". Looks tasty: http://kentonv.github.io/capnproto/ From anistor at redhat.com Thu Mar 20 12:48:13 2014 From: anistor at redhat.com (Adrian Nistor) Date: Thu, 20 Mar 2014 18:48:13 +0200 Subject: [infinispan-dev] Cerealization protocols In-Reply-To: References: Message-ID: <532B1BCD.7040003@redhat.com> "Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap'n Proto is INFINITY TIMES faster than Protocol Buffers." L.O.L. On 03/20/2014 12:56 PM, Sanne Grinovero wrote: > I just heard about "Cerealization". 
Looks tasty: > > http://kentonv.github.io/capnproto/ > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Fri Mar 21 09:46:45 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Fri, 21 Mar 2014 14:46:45 +0100 Subject: [infinispan-dev] dropping the Avro dependency from the Java HotRod client In-Reply-To: <396DE169-F88D-4245-AE21-E99F024FE3B3@redhat.com> References: <396DE169-F88D-4245-AE21-E99F024FE3B3@redhat.com> Message-ID: <02C9E49D-1395-474A-882F-7B0274EB9E94@redhat.com> Avro can be removed. On 18 Mar 2014, at 16:20, Mircea Markus wrote: > Hi Galder, > > Now that we have protobuf/protostream in place, any reason to keep the avro dependency/marshaller around anymore, or shall we just stick to proto? 
> > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From ttarrant at redhat.com Mon Mar 24 12:53:23 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 24 Mar 2014 17:53:23 +0100 Subject: [infinispan-dev] Weekly IRC meeting minutes Message-ID: <53306303.4090304@redhat.com> Hi all, the minutes for this week's IRC meeting are available at: http://transcripts.jboss.org/meeting/irc.freenode.org/infinispan/2014/infinispan.2014-03-24-14.58.html Cheers Tristan From sanne at infinispan.org Mon Mar 24 19:43:19 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 24 Mar 2014 23:43:19 +0000 Subject: [infinispan-dev] Next release? Message-ID: On Hibernate Search we're blocked by the need of Infinispan to publish an Alpha2; I need this to move on with Infinispan Query. It would be great if we could get a release this week? TiA, Sanne From sanne at infinispan.org Mon Mar 24 19:47:04 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 24 Mar 2014 23:47:04 +0000 Subject: [infinispan-dev] Why no JGroups 3.5.0.Beta1 yet? Message-ID: I'm wondering what the plans are around updating JGroups. I'd like to update Search to use the latest JGroups 3.5.0.Beta1, but: - no good for us to strive ahead of Infinispan as we need to test them all aligned - there's an "interesting" situation around JGRP-1808: doesn't work on a Mac unless you reconfigure your system for proper multicast routes. I'm hoping someone who cares about it and works on a Mac will take ownership of it, as it doesn't affect me but it's quite annoying for other contributors. 
There are many interesting performance improvements in this release, so I'm surprised it wasn't eagerly adopted. Sanne From isavin at redhat.com Tue Mar 25 13:26:33 2014 From: isavin at redhat.com (Ion Savin) Date: Tue, 25 Mar 2014 19:26:33 +0200 Subject: [infinispan-dev] ci test failures - MemcachedClusteredStatsTest EmbeddedRestMemcachedHotRodTest Message-ID: <5331BC49.5040808@redhat.com> Hi Galder, Planning to cut Alpha2 tomorrow evening and there are two failing tests on jdk7 (jdk6 seems OK with the same changes). http://ci.infinispan.org/viewLog.html?buildId=7283&tab=buildResultsDiv&buildTypeId=bt8#testNameId-3650806209082815564 http://ci.infinispan.org/viewLog.html?buildId=7283&tab=buildResultsDiv&buildTypeId=bt8#testNameId-7927103100511017767 Can you have a look? Thanks Regards, Ion Savin From ales.justin at gmail.com Wed Mar 26 07:25:41 2014 From: ales.justin at gmail.com (Ales Justin) Date: Wed, 26 Mar 2014 12:25:41 +0100 Subject: [infinispan-dev] ispn + quartz Message-ID: <32807188-6200-4B08-9CAA-D9B9CB2DAB42@gmail.com> Was there ever any attempt to bridge Infinispan and Quartz? * http://quartz-scheduler.org/documentation/faq#FAQ-clustering As I'll probably need it for this: * https://developers.google.com/appengine/docs/java/config/cron If nothing exists, I'll have a crack at it. -Ales From galder at redhat.com Wed Mar 26 06:26:07 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Mar 2014 11:26:07 +0100 Subject: [infinispan-dev] Infinispan configuration Distributed Node with distributed index and HotRod usage In-Reply-To: References: Message-ID: <4D65CE8C-7EF1-4FBF-801F-52E3B8CB8799@redhat.com> Hi Guillaume, I'm looking at your project and not sure I understand what you are trying to do. 
If your queries are limited to put/get calls, indexing the contents in the Hot Rod server is probably pointless because it just stores binary data, unless you have remote querying enabled, in which case it uses protobuf to index data (Adrian Nistor can provide more details). Unless you are doing remote querying (http://infinispan.org/docs/6.0.x/user_guide/user_guide.html#_querying_via_the_java_hot_rod_client) with complex queries, I'd disable any indexing. In either case, the Hot Rod protocol is non-transactional, so normally I'd start by disabling transactions. Now, without transactions, if there's some lag, you should inspect the usual suspects: GC logs, thread dumps, etc. and see whether anything pops up. Cheers, On 17 Mar 2014, at 11:45, Guillaume terral wrote: > Hello Infinispan Dev Team, > > > > After several days of testing and research with Sanne Grinovero, he asked me to send an email to this mailing list to expose my issue with Infinispan. > > > I'm actually working on a project to build an Infinispan cluster with the following requirements: > > > - Clustered nodes in Distributed mode > > - Usage of HotRod to send and query data from the cluster > > - Low latency for read and write > > - Distributed Lucene index > > - JGroups back-end > > > I posted the configuration I have for my cluster and for the Hot Rod client on GitHub under: > > > https://github.com/guillaumeterral/infinispan-test > > > You will see a little readme explaining how to use this Java project. > > > I faced some issues along my development. > > > What I noticed is: > > > If I remove the line .transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL), I have good latencies showed in the statistics MBean for averageWritingTime but I get an unstable cluster with the following issues: > > > > - lock issue: one node locks the index and tries to write to it; at the same time another node is trying to lock the index, and this behavior leads to a lock timeout. 
> > - send message timeout: some nodes time out sending messages to others; I was not able to understand the root cause. > > - compound file not found: this appears only if I use compound files (the default behavior); I think those errors are due to the previous ones. > > > But with a non-transactional cluster I get a pretty stable cluster, but with really bad latencies (~ 1.5s) shown in the cluster and in the GUI client I wrote; this client is not present on GitHub, I can push a console client on demand if you want. > > > We tried to investigate the issue with Sanne, but across IRC it's not that easy. > > > Could you provide me some help/advice? We are really motivated to use Infinispan and to give you some feedback on it, especially the HotRod client/server component. > > > We plan to test Infinispan 7.0 and provide you some feedback when we complete our requirements on the Infinispan 6.0 cluster. > > > You can join me at this mail address at any time, or contact me on IRC (#infinispan); my nickname is guillaume__ , I'm almost always connected on it. > > > Thanks in advance, > > > > Guillaume > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From belaran at gmail.com Wed Mar 26 07:37:01 2014 From: belaran at gmail.com (Romain Pelisse) Date: Wed, 26 Mar 2014 12:37:01 +0100 Subject: [infinispan-dev] ispn + quartz In-Reply-To: <32807188-6200-4B08-9CAA-D9B9CB2DAB42@gmail.com> References: <32807188-6200-4B08-9CAA-D9B9CB2DAB42@gmail.com> Message-ID: Hi Ales, I'm not bridging Quartz with ISPN, I'm currently using ISPN to synchronise Quartz tasks (instead of clustering Quartz), mostly to simplify a customer app (ie just need to tweak the ISPN cluster, nothing else). 
Out of curiosity, can you comment a bit about why/how you want to bridge Quartz with ISPN ? On 26 March 2014 12:25, Ales Justin wrote: > Was there ever any attempt to bridge Infinispan and Quartz? > * http://quartz-scheduler.org/documentation/faq#FAQ-clustering > > As I'll probably need it for this: > * https://developers.google.com/appengine/docs/java/config/cron > > If nothing exists, I'll have a crack at it. > > -Ales > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Romain PELISSE, *"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it" -- Terry Pratchett* Belaran ins Prussia (blog) (... finally up and running !) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140326/769a987d/attachment.html From ales.justin at gmail.com Wed Mar 26 08:35:48 2014 From: ales.justin at gmail.com (Ales Justin) Date: Wed, 26 Mar 2014 13:35:48 +0100 Subject: [infinispan-dev] ispn + quartz In-Reply-To: References: <32807188-6200-4B08-9CAA-D9B9CB2DAB42@gmail.com> Message-ID: > I'm not bridging Quartz with ISPN, I'm currently using ISPN to synchronise Quartz task (instead of clustering Quartz), mostly to simplify a customer app (ie just need to tweak the ISPN cluster, nothing else). How do you do that? By executing a job on entry eviction? > Out of curiosity, can you comment a bit about why/how you want to bridge Quartz with ISPN ? Wrt how, no idea yet. :-) It's been a while since I used Quartz. My requirement is to have a single job per trigger execution across all nodes. e.g. only one node should handle particular job -Ales > On 26 March 2014 12:25, Ales Justin wrote: > Was there ever any attempt to bridge Infinispan and Quartz? 
> * http://quartz-scheduler.org/documentation/faq#FAQ-clustering > > As I'll probably need it for this: > * https://developers.google.com/appengine/docs/java/config/cron > > If nothing exists, I'll have a crack at it. > > -Ales > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > -- > Romain PELISSE, > "The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it" -- Terry Pratchett > Belaran ins Prussia (blog) (... finally up and running !) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From guillaume.terral at gmail.com Wed Mar 26 09:20:52 2014 From: guillaume.terral at gmail.com (Guillaume terral) Date: Wed, 26 Mar 2014 14:20:52 +0100 Subject: [infinispan-dev] Infinispan configuration Distributed Node with distributed index and HotRod usage In-Reply-To: <4D65CE8C-7EF1-4FBF-801F-52E3B8CB8799@redhat.com> References: <4D65CE8C-7EF1-4FBF-801F-52E3B8CB8799@redhat.com> Message-ID: Hello Galder, We are not limited to put/get calls; we are using complex remote queries to retrieve specific subsets of data (sort by date, containing field = %test%), this kind of request. We use protobuf to serialize/deserialize data into the Infinispan cache. I will try to disable transactions and give you some information on what happened. Thanks Guillaume 2014-03-26 11:26 GMT+01:00 Galder Zamarreño : > Hi Guillaume, > > I'm looking at your project and not sure I understand what you are trying > to do. 
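The kind of remote query Guillaume describes (sort by date, field LIKE %test%) would be built over Hot Rod with the remote query DSL roughly as below. This is only a sketch: the `Event` entity and its `description`/`date` fields are hypothetical protobuf-mapped names (nothing from the thread), and the exact DSL method chain should be double-checked against the Infinispan 6.0 javadoc.

```java
import java.util.List;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.Search;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;
import org.infinispan.query.dsl.SortOrder;

public class RemoteQuerySketch {

    // Sketch only: Event is a hypothetical protobuf-marshalled entity
    // registered with the server; this cannot run without a Hot Rod
    // server and the generated protobuf marshallers.
    static List<Event> findRecentTestEvents(RemoteCache<String, Event> cache) {
        QueryFactory qf = Search.getQueryFactory(cache);
        Query query = qf.from(Event.class)
                .having("description").like("%test%")
                .toBuilder()
                .orderBy("date", SortOrder.DESC)
                .build();
        return query.list();
    }
}
```

Whether a query like this performs well is exactly what the indexing configuration discussed in this thread determines, so it is worth testing with and without indexing enabled.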
> > If your queries are limited to put/get calls, indexing the contents in the > Hot Rod server is probably pointless because it just stores binary data, > unless you have remote querying enabled, in which it uses protobuf stuff to > index data (Adrian Nistor can provide more details). Unless you are doing > remote querying ( > http://infinispan.org/docs/6.0.x/user_guide/user_guide.html#_querying_via_the_java_hot_rod_client) > with complex queries, I'd disable any indexing. > > In either case, Hot Rod protocol is non transactional, so normally i'd > start disabling transactions. Now, without transactions, if there's some > lag, you should inspect the usual suspects: GC logs, thread dumps,...etc and > see whether anything pops up. > > Cheers, > > On 17 Mar 2014, at 11:45, Guillaume terral > wrote: > > > Hello Infinispan Dev Team, > > > > > > > > After several days of testing and research with Sanne Grinovero, he > asked me to send an email to this mailing list to expose you my issue with > Infinispan. > > > > > > I'm actually working on a project to build an infinispan cluster with > the following requirement: > > > > > > - Clustered nodes in Distributed mode > > > > - Usage of HotRod to send and query data from the cluster > > > > - Low latency for read and write > > > > - Distributed lucene index > > > > - JGroups back-end > > > > > > I posted the configuration i have for my cluster and for the hot rod > client on a github under: > > > > > > https://github.com/guillaumeterral/infinispan-test > > > > > > You will see a little readme explaining how to use this java project. 
> > > > > > I faced some issues along my development, > > > > > > What i noticed is: > > > > > > If i remove the line > .transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL), I have > good latencies showed in the statistics MBean for averageWritingTime but I > get an unstable cluster with the following issues: > > > > > > > > - lock issue, one node lock the index and try to write in it, at the > same time another node is trying to lock the index, this behavior lead to a > lock timeout. > > > > - send message timeout, some node timeout to send message to others, i > was not able to understand the root code. > > > > - compound file not found, this appears only if I use compound file > (default behavior) I think those errors are due to previous ones. > > > > > > But with a non transactional cluster i get a pretty stable cluster but > with really bad latencies (~ 1.5s) showed in the cluster and in the GUI > client i wrote, this client is not present of GitHub I can push a console > client on demand if you want. > > > > > > We tried to investigate the issue with Sanne, but across IRC it's not > that easy. > > > > > > Could you provide me some help/advice, we are really motivated to use > Infinispan and to give you some feedback on it, especially the HotRod > client/server component. > > > > > > We plan to test Infinispan 7.0 and provide you some feedback when we > will complete our requirements on Infinispan 6.0 cluster. > > > > > > You can join me at this mail adress at any time, or contact me on IRC > (#infinispan) my nickname is guillaume__ , i'm almost always connected on > it. 
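The non-transactional setup suggested in the quoted advice above could be sketched programmatically like this. The transaction line is taken from the thread itself; the indexing call is an assumption, so check the `ConfigurationBuilder` javadoc of the Infinispan version in use.

```java
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.transaction.TransactionMode;

// Sketch: non-transactional cache with indexing switched off.
// indexing().disable() is an assumption about the exact method name.
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL);
builder.indexing().disable();
Configuration config = builder.build();
```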
> > > > > > Thanks in advance, > > > > > > > > Guillaume > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140326/e6c66f53/attachment-0001.html From paul.ferraro at redhat.com Wed Mar 26 09:57:03 2014 From: paul.ferraro at redhat.com (Paul Ferraro) Date: Wed, 26 Mar 2014 09:57:03 -0400 Subject: [infinispan-dev] Feature requests for 7.0 Message-ID: <1395842223.4949.12.camel@T520> Hey guys, I have created a number of requests for features that I'd like to be able to leverage for WildFly 9/10. Can the appropriate component owners (which I think is Dan in all cases) comment on the following issues? 
The following issues prevent WF from leveraging Infinispan expiration: * Expiration events from cache store https://issues.jboss.org/browse/ISPN-3064 * Group-based expiration https://issues.jboss.org/browse/ISPN-2916 Now that Infinispan eviction is safe for use by transactional caches, there remain a few issues complicating the ability for WF to fully leverage the eviction manager for passivation: * Group-based eviction https://issues.jboss.org/browse/ISPN-4132 * Clustered eviction (this one is really only an inconvenience for those of us using manual eviction since I can't use Infinispan eviction) https://issues.jboss.org/browse/ISPN-4134 Optimizations: * Enumerate cache keys for group https://issues.jboss.org/browse/ISPN-3900 * Unloadable Key2StringMapper https://issues.jboss.org/browse/ISPN-3979 Thanks, Paul From belaran at gmail.com Wed Mar 26 11:54:33 2014 From: belaran at gmail.com (Romain Pelisse) Date: Wed, 26 Mar 2014 16:54:33 +0100 Subject: [infinispan-dev] ispn + quartz In-Reply-To: References: <32807188-6200-4B08-9CAA-D9B9CB2DAB42@gmail.com> Message-ID: Well, my requirement is exactly the same! Just that one node (and not all of them) runs a task. So instead of setting up Quartz clustering, we let the Seam/Quartz scheduling fire up the task on each node, but the "first node" to get there locks some cache (which is replicated sync. with pessimistic locking), so that the other ones cannot acquire the lock. It's not that elegant, but this removes the need to cluster Quartz and keeps the configuration needs to a bare minimum (more/less just ISPN inside WildFly needs to be configured). The difference between a test env and a pre-prod and prod env is only that ISPN is distributed (and the mentioned cache replicated). Implementation-wise, I think the customer ended up having a static method being called on all Quartz classes, but I guess this could be placed in some kind of interceptor. 
I can ask the developer involved at the customer to discuss this with you if you want. On 26 March 2014 13:35, Ales Justin wrote: > I'm not bridging Quartz with ISPN, I'm currently using ISPN to synchronise > Quartz task (instead of clustering Quartz), mostly to simplify a customer > app (ie just need to tweak the ISPN cluster, nothing else). > > > How do you do that? > By executing a job on entry eviction? > > Out of curiosity, can you comment a bit about why/how you want to bridge > Quartz with ISPN ? > > > Wrt how, no idea yet. :-) > It's been a while since I used Quartz. > > My requirement is to have a single job per trigger execution across all > nodes. > e.g. only one node should handle particular job > > -Ales > > On 26 March 2014 12:25, Ales Justin wrote: > >> Was there ever any attempt to bridge Infinispan and Quartz? >> * http://quartz-scheduler.org/documentation/faq#FAQ-clustering >> >> As I'll probably need it for this: >> * https://developers.google.com/appengine/docs/java/config/cron >> >> If nothing exists, I'll have a crack at it. >> >> -Ales >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > > > -- > Romain PELISSE, > *"The trouble with having an open mind, of course, is that people will > insist on coming along and trying to put things in it" -- Terry Pratchett* > Belaran ins Prussia (blog) (... > finally up and running !) 
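The pattern Romain describes, letting every node's scheduler fire but allowing only the first node to claim a cluster-wide entry, can be sketched as below. Since Infinispan's `Cache<K,V>` implements `ConcurrentMap<K,V>`, a plain `ConcurrentHashMap` stands in here for the sync-replicated, pessimistically locked cache; all key and method names are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SingleRunnerSketch {

    // Stand-in for the replicated cache from the thread. Infinispan's
    // Cache<K,V> implements ConcurrentMap<K,V>, so the same call would
    // work against the real cache.
    static final ConcurrentMap<String, String> lockCache = new ConcurrentHashMap<>();

    // Every node calls this when its local Quartz trigger fires; only
    // the first node to write the key gets true and runs the job.
    static boolean tryBecomeRunner(String jobKey, String nodeName) {
        return lockCache.putIfAbsent(jobKey, nodeName) == null;
    }

    public static void main(String[] args) {
        // Two "nodes" race for the same trigger execution.
        System.out.println(tryBecomeRunner("nightly-job#2014-03-26", "node-a")); // prints true
        System.out.println(tryBecomeRunner("nightly-job#2014-03-26", "node-b")); // prints false
    }
}
```

In a real deployment the key would presumably embed the trigger's fire time (so the next execution races for a fresh key), or the entry would be written with a lifespan so the claim expires on its own.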
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Romain PELISSE, *"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it" -- Terry Pratchett* Belaran ins Prussia (blog) (... finally up and running !) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140326/0d1036aa/attachment.html From bban at redhat.com Wed Mar 26 12:16:21 2014 From: bban at redhat.com (Bela Ban) Date: Wed, 26 Mar 2014 17:16:21 +0100 Subject: [infinispan-dev] ispn + quartz In-Reply-To: References: <32807188-6200-4B08-9CAA-D9B9CB2DAB42@gmail.com> Message-ID: <5332FD55.1040600@redhat.com> Romain, do you really need Infinispan to do that ? Look at [1], IMO it's much more elegant... :-) [1] http://www.jgroups.org/taskdistribution.html On 26/03/14 16:54, Romain Pelisse wrote: > Well, my requirement is exactly the same ! Just that one node (and not > all of them) runs a task. So instead of setting up Quartz clustering, we > let the Seam/Quartz scheduling fire up task on each node, but the "first > node" to get there locks some cache (which is replicated sync. with > pessimistic locking), so that the other ones cannot acquire the lock. > > It's not that elegant, but this remove the need of clustering quartz and > make the needs for configuration to the bare minimal (more/less just > ispn inside wildfly needs to be configured). The difference between a > test env and a pre prod and prod env is only that ISPN is distributed > (and the mentioned cache replicated). 
> > Implementation wide, I think the customer end up having a static method > being call on all Quartz classes, but I guess this could be placed in > somekind of interceptor. > > I can ask the developer involved at the customer to discuss this with > you if you want. > > > > On 26 March 2014 13:35, Ales Justin > wrote: > >> I'm not bridging Quartz with ISPN, I'm currently using ISPN to >> synchronise Quartz task (instead of clustering Quartz), mostly to >> simplify a customer app (ie just need to tweak the ISPN cluster, >> nothing else). > > How do you do that? > By executing a job on entry eviction? > >> Out of curiosity, can you comment a bit about why/how you want to >> bridge Quartz with ISPN ? > > Wrt how, no idea yet. :-) > It's been a while since I used Quartz. > > My requirement is to have a single job per trigger execution across > all nodes. > e.g. only one node should handle particular job > > -Ales > >> On 26 March 2014 12:25, Ales Justin > > wrote: >> >> Was there ever any attempt to bridge Infinispan and Quartz? >> * http://quartz-scheduler.org/documentation/faq#FAQ-clustering >> >> As I'll probably need it for this: >> * https://developers.google.com/appengine/docs/java/config/cron >> >> If nothing exists, I'll have a crack at it. >> >> -Ales >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> >> -- >> Romain PELISSE, >> /"The trouble with having an open mind, of course, is that people >> will insist on coming along and trying to put things in it" -- >> Terry Pratchett/ >> Belaran ins Prussia (blog) >> (... finally up and running !) 
>> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > -- > Romain PELISSE, > /"The trouble with having an open mind, of course, is that people will > insist on coming along and trying to put things in it" -- Terry Pratchett/ > Belaran ins Prussia (blog) (... > finally up and running !) > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From sanne at infinispan.org Wed Mar 26 21:34:39 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Mar 2014 01:34:39 +0000 Subject: [infinispan-dev] Next release? In-Reply-To: References: Message-ID: Anybody? On 24 March 2014 23:43, Sanne Grinovero wrote: > On Hibernate Search we're blocked by the need of Infinispan to publish > an Alpha2; I need this to move on with Infinispan Query. > It would be great if we could get a release this week? > > TiA, > Sanne From isavin at redhat.com Thu Mar 27 03:17:52 2014 From: isavin at redhat.com (Ion Savin) Date: Thu, 27 Mar 2014 09:17:52 +0200 Subject: [infinispan-dev] Next release? In-Reply-To: References: Message-ID: <5333D0A0.9080202@redhat.com> Hi Sanne, There was still one test failing yesterday evening. Resuming today. Regards, Ion Savin On 03/27/2014 03:34 AM, Sanne Grinovero wrote: > Anybody? > > On 24 March 2014 23:43, Sanne Grinovero wrote: >> On Hibernate Search we're blocked by the need of Infinispan to publish >> an Alpha2; I need this to move on with Infinispan Query. >> It would be great if we could get a release this week? 
>> >> TiA, >> Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Thu Mar 27 05:28:05 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Thu, 27 Mar 2014 10:28:05 +0100 Subject: [infinispan-dev] OSGi Message-ID: <3AB8C609-A110-442B-B4E5-733745AFD2DD@hibernate.org> Hey guys, Sanne and Hardy are working on the OSGi-ification of Hibernate Search and it does not come without trouble. Who is leading this effort on the Infinispan side? I recommend you start early in a release cycle because you will have to butcher APIs and packages to do it properly. Worse, you will suffer from your dependencies. Brett has offered to help you but I know for sure he won't lead it. He would be more like a good expert to talk to. Bilgin has shown a Camel integration prototype but he also seemed to imply that he had some significant problems that needed Infinispan improvements. Also, I'm not quite sure, but looking at these Camel routes, they seem to be very URI driven. If we want to support queries over a Camel route and express them via a URI, we will need a string-based query language. I might be talking nonsense and somehow the query is written in Java. But better anticipate. 
Bilgin would know more, he has written in his demo CamelInfinispanOperationQuery after all :) Emmanuel From gmeo at cisco.com Thu Mar 27 06:32:37 2014 From: gmeo at cisco.com (Giovanni Meo) Date: Thu, 27 Mar 2014 11:32:37 +0100 Subject: [infinispan-dev] OSGi In-Reply-To: <3AB8C609-A110-442B-B4E5-733745AFD2DD@hibernate.org> References: <3AB8C609-A110-442B-B4E5-733745AFD2DD@hibernate.org> Message-ID: <5333FE45.3080909@cisco.com> Hi Emmanuel and infinispan folks, we have been using Infinispan in an OSGi environment in a project called OpenDayLight, if interested you can look at: > https://git.opendaylight.org/gerrit/gitweb?p=controller.git;a=blob;f=opendaylight/clustering/services_implementation/pom.xml;h=d7a3db3841888f3c08e5cf8795aa42cf9cd9b4bc;hb=HEAD Granted we are using a tiny part of the Infinispan capabilities, but we found it very helpful to first of all define the contract the applications would have with Infinispan. For other issues like the classloading, we just made sure to provide a ClassResolver that always enforces the lookup in the OSGi class loader, and in spite of some initial unreliability things have been doing OK for us. My 2 cents, Giovanni On 27-Mar-14 10:28, Emmanuel Bernard wrote: > Hey guys, > > Sanne and Hardy are working on the OSGi-ification of Hibernate Search and it > does not come without trouble. > > Who is leading this effort on the Infinispan side? I recommend you start > early in a release cycle because you will have to butcher APIs and packages > to do it properly. Worse, you will suffer from your dependencies. > > Brett has offered to help you but I know for sure he won't lead it. He would > be more like a good expert to talk to. Bilgin has shown a Camel integration > prototype but he also seemed to imply that he had some significant problems > that needed Infinispan improvements. > > Also, I'm not quite sure but looking at these Camel routes, they seem to be > very URI driven. 
If we want to support query over a Camel route and express > them via a URI, we will need a string based query language. I might be > talking nonsense and somehow the query is written in Java. But better > anticipate. Bilgin would know more, he has written in his demo > CamelInfinispanOperationQuery after all :) > > Emmanuel _______________________________________________ infinispan-dev > mailing list infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Giovanni Meo Via del Serafico, 200 Telephone: +390651644000 00142, Roma Mobile: +393480700958 Italia Fax: +390651645917 VOIP: 8-3964000 "The pessimist complains about the wind; the optimist expects it to change; the realist adjusts the sails." -- Wm. Arthur Ward IETF credo: "Rough consensus and running code" From sanne at infinispan.org Thu Mar 27 08:14:57 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Mar 2014 12:14:57 +0000 Subject: [infinispan-dev] Next release? In-Reply-To: <5333D0A0.9080202@redhat.com> References: <5333D0A0.9080202@redhat.com> Message-ID: Thanks Ion, I was wondering :) On 27 March 2014 07:17, Ion Savin wrote: > Hi Sanne, > > There was still one test failing yesterday evening. Resuming today. > > Regards, > Ion Savin > > On 03/27/2014 03:34 AM, Sanne Grinovero wrote: >> Anybody? >> >> On 24 March 2014 23:43, Sanne Grinovero wrote: >>> On Hibernate Search we're blocked by the need of Infinispan to publish >>> an Alpha2; I need this to move on with Infinispan Query. >>> It would be great if we could get a release this week? 
>>> TiA, >>> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Thu Mar 27 08:23:48 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 27 Mar 2014 12:23:48 +0000 Subject: [infinispan-dev] OSGi In-Reply-To: <3AB8C609-A110-442B-B4E5-733745AFD2DD@hibernate.org> References: <3AB8C609-A110-442B-B4E5-733745AFD2DD@hibernate.org> Message-ID: <050EFA02-456F-4B9D-8401-4D1698CA9CCD@redhat.com> It is Ion on our side, thank you for the heads up. On Mar 27, 2014, at 9:28, Emmanuel Bernard wrote: > Hey guys, > > Sanne and Hardy are working on the OSGi-ification of Hibernate Search and it does not come without trouble. > > Who is leading this effort on the Infinispan side? I recommend you start early in a release cycle because you will have to butcher APIs and packages to do it properly. Worse, you will suffer from your dependencies. > > Brett has offered to help you but I know for sure he won't lead it. He would be more like a good expert to talk to. > Bilgin has shown a Camel integration prototype but he also seemed to imply that he had some significant problems that needed Infinispan improvements. > > Also, I'm not quite sure but looking at these Camel routes, they seem to be very URI driven. If we want to support query over a Camel route and express them via a URI, we will need a string-based query language. I might be talking nonsense and somehow the query is written in Java. But better anticipate. 
Bilgin would know more, he has written in his demo CamelInfinispanOperationQuery after all :) > > Emmanuel > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From belaran at gmail.com Thu Mar 27 08:39:00 2014 From: belaran at gmail.com (Romain Pelisse) Date: Thu, 27 Mar 2014 13:39:00 +0100 Subject: [infinispan-dev] ispn + quartz In-Reply-To: <5332FD55.1040600@redhat.com> References: <32807188-6200-4B08-9CAA-D9B9CB2DAB42@gmail.com> <5332FD55.1040600@redhat.com> Message-ID: Hi Bela, As I said, the idea is to limit the configuration stuff to ISPN. Before this change, clustering the application required the JMS queue to be clustered, along with Quartz. The JMS queue was misused to trigger a cache invalidation, so I removed this complexity by simply replicating the cache. Once this was done, I realized I could misuse ISPN a bit to also remove the need for Quartz sync. I'm pretty sure ISPN is not the best tool for the job (task sync) but it is the best to reduce the complexity (in this use case) :) That being said, this task distribution feature of JGroups is pretty awesome, I'm quite happy to have heard about it :) PS: I thought of you last Tuesday - I, again, went to a customer and removed clustering from their EAP configuration :) On 26 March 2014 17:16, Bela Ban wrote: > Romain, > > do you really need Infinispan to do that? Look at [1], IMO it's much > more elegant... :-) > > [1] http://www.jgroups.org/taskdistribution.html > > On 26/03/14 16:54, Romain Pelisse wrote: > > Well, my requirement is exactly the same! Just that one node (and not > > all of them) runs a task. So instead of setting up Quartz clustering, we > > let the Seam/Quartz scheduling fire up the task on each node, but the "first > > node" to get there locks some cache (which is replicated sync. 
with > > pessimistic locking), so that the other ones cannot acquire the lock. > > > > It's not that elegant, but this removes the need for clustering Quartz and > > keeps the configuration needs to the bare minimum (more/less just > > ispn inside wildfly needs to be configured). The difference between a > > test env and a pre-prod and prod env is only that ISPN is distributed > > (and the mentioned cache replicated). > > > > Implementation-wise, I think the customer ended up having a static method > > being called on all Quartz classes, but I guess this could be placed in > > some kind of interceptor. > > > > I can ask the developer involved at the customer to discuss this with > > you if you want. > > > > > > > > On 26 March 2014 13:35, Ales Justin > > wrote: > > > >> I'm not bridging Quartz with ISPN, I'm currently using ISPN to > >> synchronise Quartz tasks (instead of clustering Quartz), mostly to > >> simplify a customer app (ie just need to tweak the ISPN cluster, > >> nothing else). > > > > How do you do that? > > By executing a job on entry eviction? > > > >> Out of curiosity, can you comment a bit about why/how you want to > >> bridge Quartz with ISPN? > > > > Wrt how, no idea yet. :-) > > It's been a while since I used Quartz. > > > > My requirement is to have a single job per trigger execution across > > all nodes. > > e.g. only one node should handle a particular job > > > > -Ales > > > >> On 26 March 2014 12:25, Ales Justin >> > wrote: > >> > >> Was there ever any attempt to bridge Infinispan and Quartz? > >> * http://quartz-scheduler.org/documentation/faq#FAQ-clustering > >> > >> As I'll probably need it for this: > >> * https://developers.google.com/appengine/docs/java/config/cron > >> > >> If nothing exists, I'll have a crack at it. 
> >> > >> -Ales > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> > >> > >> -- > >> Romain PELISSE, > >> /"The trouble with having an open mind, of course, is that people > >> will insist on coming along and trying to put things in it" -- > >> Terry Pratchett/ > >> Belaran ins Prussia (blog) > >> (... finally up and running !) > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org infinispan-dev at lists.jboss.org> > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org infinispan-dev at lists.jboss.org> > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > > > -- > > Romain PELISSE, > > /"The trouble with having an open mind, of course, is that people will > > insist on coming along and trying to put things in it" -- Terry > Pratchett/ > > Belaran ins Prussia (blog) (... > > finally up and running !) > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > -- > Bela Ban, JGroups lead (http://www.jgroups.org) > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Romain PELISSE, *"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it" -- Terry Pratchett* Belaran ins Prussia (blog) (... finally up and running !) -------------- next part -------------- An HTML attachment was scrubbed... 
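The coordination pattern in the Quartz thread above (every node's scheduler fires, but only the first node to lock a shared cache entry actually runs the job) can be sketched with an atomic putIfAbsent. This is an illustrative sketch, not code from the thread: a local ConcurrentMap stands in for the synchronously replicated, pessimistically locked Infinispan cache so that it is self-contained, and the class and method names are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the "first node to grab the lock runs the job" pattern.
// In the setup discussed above, `locks` would be a synchronously
// replicated Infinispan cache with pessimistic locking; a local
// ConcurrentMap stands in here only so the sketch is self-contained.
class SingleNodeJobGuard {
    private final ConcurrentMap<String, String> locks;
    private final String nodeName;

    SingleNodeJobGuard(ConcurrentMap<String, String> locks, String nodeName) {
        this.locks = locks;
        this.nodeName = nodeName;
    }

    // Runs the job only if this node wins the lock for this trigger firing.
    // putIfAbsent is atomic, so exactly one caller sees null and wins.
    boolean runIfLockAcquired(String triggerId, Runnable job) {
        boolean won = locks.putIfAbsent(triggerId, nodeName) == null;
        if (won) {
            job.run();
        }
        return won;
    }
}
```

With a replicated cache in place of the map, the putIfAbsent (or an explicit pessimistic lock) is cluster-wide, so "the first node to get there" wins exactly as described; the winning entry would also need to expire or be removed after the run so the next trigger firing can be claimed again.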
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140327/f7c1b990/attachment.html From brmeyer at redhat.com Thu Mar 27 10:15:39 2014 From: brmeyer at redhat.com (Brett Meyer) Date: Thu, 27 Mar 2014 10:15:39 -0400 (EDT) Subject: [infinispan-dev] OSGi In-Reply-To: <050EFA02-456F-4B9D-8401-4D1698CA9CCD@redhat.com> References: <3AB8C609-A110-442B-B4E5-733745AFD2DD@hibernate.org> <050EFA02-456F-4B9D-8401-4D1698CA9CCD@redhat.com> Message-ID: <1196159034.2848265.1395929739559.JavaMail.zimbra@redhat.com> https://github.com/infinispan/infinispan/pull/2457 I'm still more than willing to continue helping, but need some eyes on that first. There's one architectural question on which I can't really proceed until I have all of your inputs. If you have a moment, I'd appreciate it! Brett Meyer Red Hat, Hibernate ORM ----- Original Message ----- From: "Mircea Markus" To: "infinispan -Dev List" Sent: Thursday, March 27, 2014 8:23:48 AM Subject: Re: [infinispan-dev] OSGi It is Ion on our side, thank you for the heads up. On Mar 27, 2014, at 9:28, Emmanuel Bernard wrote: > Hey guys, > > Sanne and Hardy are working on the OSGi-ification of Hibernate Search and it does not come without trouble. > > Who is leading this effort on the Infinispan side? I recommend you start early in a release cycle because you will have to butcher APIs and packages to do it properly. Worse, you will suffer from your dependencies. > > Brett has offered to help you but I know for sure he won't lead it. He would be more like a good expert to talk to. > Bilgin has shown a Camel integration prototype but he also seemed to imply that he had some significant problems that needed Infinispan improvements. > > Also, I'm not quite sure but looking at these Camel routes, they seem to be very URI driven. If we want to support query over a Camel route and express them via a URI, we will need a string-based query language. I might be talking nonsense and somehow the query is written in Java. 
But better anticipate. Bilgin would know more, he has written in his demo CamelInfinispanOperationQuery after all :) > > Emmanuel > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list infinispan-dev at lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Thu Mar 27 11:32:54 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Thu, 27 Mar 2014 16:32:54 +0100 Subject: [infinispan-dev] ci test failures - MemcachedClusteredStatsTest EmbeddedRestMemcachedHotRodTest In-Reply-To: <5331BC49.5040808@redhat.com> References: <5331BC49.5040808@redhat.com> Message-ID: The 500 errors from the REST server have appeared randomly in some REST tests. I've not been able to replicate them locally. However, I've marked one of them as unstable so that at least we can get some TRACE on them. The other test, MemcachedClusteredStatsTest.testSingleConnectionPerServer, has also been marked as unstable. Let's see logs there. Cheers, On 25 Mar 2014, at 18:26, Ion Savin wrote: > Hi Galder, > > Planning to cut Alpha2 tomorrow evening and there are two failing tests on jdk7 (jdk6 seems OK with the same changes). > > http://ci.infinispan.org/viewLog.html?buildId=7283&tab=buildResultsDiv&buildTypeId=bt8#testNameId-3650806209082815564 > > http://ci.infinispan.org/viewLog.html?buildId=7283&tab=buildResultsDiv&buildTypeId=bt8#testNameId-7927103100511017767 > > Can you have a look? 
> Thanks > > Regards, > Ion Savin -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From isavin at redhat.com Thu Mar 27 18:42:41 2014 From: isavin at redhat.com (Ion Savin) Date: Fri, 28 Mar 2014 00:42:41 +0200 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha2 Message-ID: <5334A961.8040605@redhat.com> Dear Infinispan community, The Alpha2 release of Infinispan 7.0.0 is now available. This release brings Cross-Site Replication State Transfer in a stable topology (i.e. no nodes leaving or joining during the state transfer) and several bug fixes. For a complete list of features and bug fixes included in this release please refer to the release notes [1]. Visit our downloads [2] section to find the latest release. Note: The LevelDB and Rest Cache Store artifacts published to the Maven repo for this version are broken. Please wait for the next Alpha release if you plan to use them. Sorry for the inconvenience. If you have any questions please check our forums [3], our mailing lists [4] or ping us directly on IRC [5]. [1] https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12324205 [2] http://www.jboss.org/infinispan/downloads [3] http://www.jboss.org/infinispan/forums [4] https://lists.jboss.org/mailman/listinfo/infinispan-dev [5] irc://irc.freenode.org/infinispan Regards, Ion Savin From brmeyer at redhat.com Thu Mar 27 22:25:11 2014 From: brmeyer at redhat.com (Brett Meyer) Date: Thu, 27 Mar 2014 22:25:11 -0400 (EDT) Subject: [infinispan-dev] OSGi In-Reply-To: <1196159034.2848265.1395929739559.JavaMail.zimbra@redhat.com> References: <3AB8C609-A110-442B-B4E5-733745AFD2DD@hibernate.org> <050EFA02-456F-4B9D-8401-4D1698CA9CCD@redhat.com> <1196159034.2848265.1395929739559.JavaMail.zimbra@redhat.com> Message-ID: <719721308.3330602.1395973511947.JavaMail.zimbra@redhat.com> Success! 
Paired with a work-in-progress branch [1] for ORM, I'm now able to use hibernate-infinispan, infinispan-core, and infinispan-commons in OSGi (Karaf). Obviously, we still need unit tests (especially for the other modules), but hibernate-infinispan working is very indicative of successful class loading, service loading, resource loading, and client bundles by at least core. Thanks for the discussion on IRC today! The OSGi pull request is now "complete" [2], but needs reviewing. [1] https://github.com/brmeyer/hibernate-orm/tree/hibernate-infinispan-osgi [2] https://github.com/infinispan/infinispan/pull/2457 Brett Meyer Red Hat, Hibernate ORM ----- Original Message ----- From: "Brett Meyer" To: "infinispan -Dev List" Sent: Thursday, March 27, 2014 10:15:39 AM Subject: Re: [infinispan-dev] OSGi https://github.com/infinispan/infinispan/pull/2457 I'm still more than willing to continue helping, but need some eyes on that first. There's one architectural question on which I can't really proceed until I have all of your inputs. If you have a moment, I'd appreciate it! Brett Meyer Red Hat, Hibernate ORM ----- Original Message ----- From: "Mircea Markus" To: "infinispan -Dev List" Sent: Thursday, March 27, 2014 8:23:48 AM Subject: Re: [infinispan-dev] OSGi It is Ion on our side, thank you for the heads up. On Mar 27, 2014, at 9:28, Emmanuel Bernard wrote: > Hey guys, > > Sanne and Hardy are working on the OSGi-ification of Hibernate Search and it does not come without trouble. > > Who is leading this effort on the Infinispan side? I recommend you start early in a release cycle because you will have to butcher APIs and packages to do it properly. Worse, you will suffer from your dependencies. > > Brett has offered to help you but I know for sure he won't lead it. He would be more like a good expert to talk to. > Bilgin has shown a Camel integration prototype but he also seemed to imply that he had some significant problems that needed Infinispan improvements. 
> > Also, I'm not quite sure but looking at these Camel routes, they seem to be very URI driven. If we want to support query over a Camel route and express them via a URI, we will need a string-based query language. I might be talking nonsense and somehow the query is written in Java. But better anticipate. Bilgin would know more, he has written in his demo CamelInfinispanOperationQuery after all :) > > Emmanuel > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list infinispan-dev at lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev From ben.cotton at ALUMNI.RUTGERS.EDU Fri Mar 28 17:35:49 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 28 Mar 2014 14:35:49 -0700 (PDT) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1391992923651-4028800.post@n3.nabble.com> References: <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> Message-ID: <1396042548850-4029112.post@n3.nabble.com> Minor update: Dmitry and I have established an initial bridge connecting OpenHFT's off-heap SharedHashMap to an ISPN7 DataContainer API view. https://github.com/Cotton-Ben/infinispan (details found in [offheap] module) TODO: tests. -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4029112.html Sent from the Infinispan Developer List mailing list archive at Nabble.com.
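The SharedHashMap bridge above essentially adapts an external map to Infinispan's data-container SPI. The delegation shape can be sketched as follows; this is a hypothetical simplification, not the bridge's actual code: a heap map stands in for the off-heap store, and the real org.infinispan.container.DataContainer interface additionally deals with entry metadata, eviction and iteration.

```java
import java.util.Map;

// Sketch of a DataContainer-style view over an external map, in the
// spirit of the OpenHFT SharedHashMap bridge mentioned above. The
// backing map would be the off-heap SharedHashMap in the real bridge;
// here any Map works so the sketch is self-contained. The class name
// MapBackedContainer is illustrative only.
class MapBackedContainer<K, V> {
    private final Map<K, V> backing; // stand-in for the off-heap store

    MapBackedContainer(Map<K, V> backing) {
        this.backing = backing;
    }

    V get(K key)               { return backing.get(key); }
    void put(K key, V value)   { backing.put(key, value); }
    V remove(K key)            { return backing.remove(key); }
    boolean containsKey(K key) { return backing.containsKey(key); }
    int size()                 { return backing.size(); }
    void clear()               { backing.clear(); }
}
```

The point of the thin view is that the container's storage strategy (heap, off-heap, persistent) is swappable behind one interface, which is exactly what makes an off-heap embedded cache pluggable.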