From bban at redhat.com Sun Feb 2 05:33:51 2014 From: bban at redhat.com (Bela Ban) Date: Sun, 02 Feb 2014 11:33:51 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> Message-ID: <52EE1F0F.40103@redhat.com> At the JGroups level, ASYNC generates *less* traffic than SYNC. So if you do sync under the cover and use a future to make it async at the API level, you're incurring more overhead, namely the messages sent back as responses. Not sure about the Infinispan async API, but I'd assume this would also use more threads. On 31/01/14 08:08, Galder Zamarreño wrote: > Hi all, > > The following came to my mind yesterday: I think we should ditch > ASYNC modes for DIST/REPL/INV and our async cache store > functionality. > > Instead, whoever wants to store something asynchronously should use > asynchronous methods, i.e. call putAsync. So, this would mean that > when you call put(), it's always sync. This would reduce the > complexity and configuration of our code base, without affecting our > functionality, and it would make things more logical IMO. > > WDYT? > > Cheers, -- Galder Zamarreño galder at redhat.com twitter.com/galderz > > Project Lead, Escalante http://escalante.io > > Engineer, Infinispan http://infinispan.org > > > _______________________________________________ infinispan-dev > mailing list infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Sun Feb 2 05:35:15 2014 From: bban at redhat.com (Bela Ban) Date: Sun, 02 Feb 2014 11:35:15 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? 
In-Reply-To: <52EB9889.9070800@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> Message-ID: <52EE1F63.5050606@redhat.com> On 31/01/14 13:35, Radim Vansa wrote: > Worth to note that Infinispan does not have true async operation - > executing synchronous request in another threadpool is rather simplistic > solution that has serious drawbacks (I can imagine a situation where I'd > do 100 async gets in parallel, but this would drain the whole threadpool). +1000, I should have read the entire thread before replying... :-) > > Implementing that would require serious changes in all interceptors, > because you wouldn't be able to call > > visitWhateverCommand(command) { > /* do something */ > try { > invokeNextInterceptor(command); > } finally { > /* do another stuff */ > } > } > > - you'd have to put all local state prior to invoking next interceptor > to context. And you'd need twice as many methods, because now the code > would explicitly traverse interceptor stack in both directions. > > Still, I believe that this may be something to consider/plan for future. > > And then, yes, you'd need just > > put(key, value) { > future = putAsync(key, value); > return sync ? future.get() : null; > } > > Radim > > On 01/31/2014 11:48 AM, Tristan Tarrant wrote: >> Couldn't this be handled higher up in our implementation then ? >> >> If I enable an async mode, all puts / gets become putAsync/getAsync >> transparently to both the application and to the state transfer. >> >> Tristan >> >> On 01/31/2014 08:32 AM, Dennis Reed wrote: >>> It would be a loss of functionality. >>> >>> As a common example, the AS web session replication cache is configured >>> for ASYNC by default, for performance reasons. >>> But it can be changed to SYNC to guarantee that when the request >>> finishes that the session was replicated. 
>>> >>> That wouldn't be possible if you could no longer switch between >>> ASYNC/SYNC with just a configuration change. >>> >>> -Dennis >>> >>> On 01/31/2014 01:08 AM, Galder Zamarreño wrote: >>>> Hi all, >>>> >>>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >>>> >>>> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>>> >>>> WDYT? >>>> >>>> Cheers, >>>> -- >>>> Galder Zamarreño >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Sun Feb 2 05:42:40 2014 From: bban at redhat.com (Bela Ban) Date: Sun, 02 Feb 2014 11:42:40 +0100 Subject: [infinispan-dev] Kryo performance (Was: reusing infinispan's marshalling) In-Reply-To: References: Message-ID: <52EE2120.2090806@redhat.com> I recently had a very bad experience with Kryo. A JGroups user used Kryo to marshal data types into byte buffers which he then broadcast around using JGroups (details in [1]). 
Turns out the culprit was Kryo temporarily flipping bits in an already marshalled buffer passed to JGroups. Of course retransmission would then cause a corrupted buffer to be sent. The solution was to copy the buffer, which forfeits the advantages of using Kryo. Not being an expert on Kryo, perhaps this could be done differently, e.g. by synchronizing around a buffer... [1] https://issues.jboss.org/browse/JGRP-1718 On 31/01/14 17:59, Sanne Grinovero wrote: > Changing the subject, as Adrian will need a reply to his (more > important) question. > > I don't think we should go shopping for different marshaller > implementations, especially given other priorities. > > I've been keeping an eye on Kryo since a while and it looks very good > indeed, but JBMarshaller is serving us pretty well and I'm loving its > reliability. > > If we need more speed in this area, I'd rather see us perform some > very accurate benchmark development and try to understand why Kryo is > faster than JBM (if it really is), and potentially improve JBM. > For example as I've already suggested, it's using an internal > identityMap to detect graphs, and often we might not need that, or > also it would be nice to refactor it to write to an existing byte > stream rather than having it allocate internal buffers, and finally we > might want a "stateless edition" so to get rid of need for pooling of > JBMar instances. > > -- Sanne > > > > On 31 January 2014 16:29, Vladimir Blagojevic wrote: >> Not 100% related to what you are asking about but have a look at this >> post and the discussion that "erupted": >> >> http://gridgain.blogspot.ca/2012/12/java-serialization-good-fast-and-faster.html >> >> Vladimir >> On 1/30/2014, 7:13 AM, Adrian Nistor wrote: >>> Hi list! 
>>> >>> I've been pondering about re-using the marshalling machinery of >>> Infinispan in another project, specifically in ProtoStream, where I'm >>> planning to add it as a test scoped dependency so I can create a >>> benchmark to compare marshalling performance. I'm basically interested >>> in comparing ProtoStream and Infinispan's JBoss Marshalling based >>> mechanism. Comparing against plain JBMAR, without using the >>> ExternalizerTable and Externalizers introduced by Infinispan is not >>> going to get me accurate results. >>> >>> But how? I see the marshalling is spread across infinispan-commons and >>> infinispan-core modules. >>> >>> Thanks! >>> Adrian >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From rvansa at redhat.com Mon Feb 3 09:10:29 2014 From: rvansa at redhat.com (Radim Vansa) Date: Mon, 03 Feb 2014 15:10:29 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> Message-ID: <52EFA355.2070203@redhat.com> See below... 
On Fri, Jan 31, 2014 at 7:35 AM, Radim Vansa wrote: >> Worth to note that Infinispan does not have true async operation - >> executing synchronous request in another threadpool is rather simplistic >> solution that has serious drawbacks (I can imagine a situation where I'd >> do 100 async gets in parallel, but this would drain the whole threadpool). > I agree if we could optimize this with batching it would make it better. > >> Implementing that would require serious changes in all interceptors, >> because you wouldn't be able to call >> >> visitWhateverCommand(command) { >> /* do something */ >> try { >> invokeNextInterceptor(command); >> } finally { >> /* do another stuff */ >> } >> } >> >> - you'd have to put all local state prior to invoking next interceptor >> to context. And you'd need twice as many methods, because now the code >> would explicitly traverse interceptor stack in both directions. > I am not quite sure what you mean here. Async transport currently > traverses the interceptors for originator and receiver (albeit > originator goes back up without a response). > >> Still, I believe that this may be something to consider/plan for future. >> >> And then, yes, you'd need just >> >> put(key, value) { >> future = putAsync(key, value); >> return sync ? future.get() : null; >> } > For sync we would want to invoke directly to avoid context switching. I think you haven't properly understood what I was talking about: the putAsync should not switch context at all in the ideal design. It should traverse through the interceptors all the way down (logically, in current behaviour), invoke JGroups async API and jump out. Then, as soon as the response is received, the thread which delivered it should traverse the interceptor stack up (again, logically), and fire the future. 
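For illustration, a rough sketch of the shape of that design in plain Java — hypothetical names throughout, with CompletableFuture standing in for the actual future type and a plain thread standing in for the JGroups async API; this is not the real Infinispan interceptor stack:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the design described above: putAsync walks "down" the
// interceptor stack on the caller thread, hands the request to the
// transport, and the transport's delivery thread later walks back "up"
// and fires the future. No extra threadpool hop on the caller side.
class AsyncPutSketch {

    // Stand-in for the JGroups async API: the response callback runs
    // on the thread that delivers the response, not the caller thread.
    static void invokeRemotelyAsync(String key, String value, Runnable onResponse) {
        new Thread(onResponse).start();
    }

    static CompletableFuture<Void> putAsync(String key, String value) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        // ... interceptor stack "down" happens here, on the caller thread ...
        invokeRemotelyAsync(key, value, () -> {
            // ... interceptor stack "up" happens here, on the delivery
            // thread, which finally completes the future.
            future.complete(null);
        });
        return future; // caller returns immediately, nothing blocks
    }

    // Sync put is then just the blocking wrapper suggested earlier in the thread.
    static Object put(String key, String value, boolean sync) throws Exception {
        CompletableFuture<Void> f = putAsync(key, value);
        return sync ? f.get() : null;
    }

    public static void main(String[] args) throws Exception {
        put("k", "v", true); // blocks until the delivery thread fires the future
        System.out.println("done");
    }
}
```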
Radim From sanne at infinispan.org Mon Feb 3 09:54:52 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 3 Feb 2014 14:54:52 +0000 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <52EFA355.2070203@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> Message-ID: On 3 February 2014 14:10, Radim Vansa wrote: > See below... > > On Fri, Jan 31, 2014 at 7:35 AM, Radim Vansa wrote: >>> Worth to note that Infinispan does not have true async operation - >>> executing synchronous request in another threadpool is rather simplistic >>> solution that has serious drawbacks (I can imagine a situation where I'd >>> do 100 async gets in parallel, but this would drain the whole threadpool). >> I agree if we could optimize this with batching it would make it better. >> >>> Implementing that would require serious changes in all interceptors, >>> because you wouldn't be able to call >>> >>> visitWhateverCommand(command) { >>> /* do something */ >>> try { >>> invokeNextInterceptor(command); >>> } finally { >>> /* do another stuff */ >>> } >>> } >>> >>> - you'd have to put all local state prior to invoking next interceptor >>> to context. And you'd need twice as many methods, because now the code >>> would explicitly traverse interceptor stack in both directions. >> I am not quite sure what you mean here. Async transport currently >> traverses the interceptors for originator and receiver (albeit >> originator goes back up without a response). >> >>> Still, I believe that this may be something to consider/plan for future. >>> >>> And then, yes, you'd need just >>> >>> put(key, value) { >>> future = putAsync(key, value); >>> return sync ? future.get() : null; >>> } >> For sync we would want to invoke directly to avoid context switching. 
> > I think you haven't properly understood what I was talking about: the > putAsync should not switch context at all in the ideal design. It should > traverse through the interceptors all the way down (logically, in > current behaviour), invoke JGroups async API and jump out. Then, as soon > as the response is received, the thread which delivered it should > traverse the interceptor stack up (again, logically), and fire the future. +1 much cleaner, I love it. Actually wasn't aware the current code didn't do this :-( Sanne > > Radim > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mudokonman at gmail.com Mon Feb 3 10:02:41 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 3 Feb 2014 10:02:41 -0500 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> Message-ID: On Mon, Feb 3, 2014 at 9:54 AM, Sanne Grinovero wrote: > On 3 February 2014 14:10, Radim Vansa wrote: >> See below... >> >> On Fri, Jan 31, 2014 at 7:35 AM, Radim Vansa wrote: >>>> Worth to note that Infinispan does not have true async operation - >>>> executing synchronous request in another threadpool is rather simplistic >>>> solution that has serious drawbacks (I can imagine a situation where I'd >>>> do 100 async gets in parallel, but this would drain the whole threadpool). >>> I agree if we could optimize this with batching it would make it better. 
>>> >>>> Implementing that would require serious changes in all interceptors, >>>> because you wouldn't be able to call >>>> >>>> visitWhateverCommand(command) { >>>> /* do something */ >>>> try { >>>> invokeNextInterceptor(command); >>>> } finally { >>>> /* do another stuff */ >>>> } >>>> } >>>> >>>> - you'd have to put all local state prior to invoking next interceptor >>>> to context. And you'd need twice as many methods, because now the code >>>> would explicitly traverse interceptor stack in both directions. >>> I am not quite sure what you mean here. Async transport currently >>> traverses the interceptors for originator and receiver (albeit >>> originator goes back up without a response). >>> >>>> Still, I believe that this may be something to consider/plan for future. >>>> >>>> And then, yes, you'd need just >>>> >>>> put(key, value) { >>>> future = putAsync(key, value); >>>> return sync ? future.get() : null; >>>> } >>> For sync we would want to invoke directly to avoid context switching. >> >> I think you haven't properly understood what I was talking about: the >> putAsync should not switch context at all in the ideal design. It should >> traverse through the interceptors all the way down (logically, in >> current behaviour), invoke JGroups async API and jump out. Then, as soon >> as the response is received, the thread which delivered it should >> traverse the interceptor stack up (again, logically), and fire the future. A Future doesn't make much sense with an async transport. The problem is with an async transport you never get back a response so you never know when the actual command is completed and thus a Future is worthless. The caller wouldn't know if they could rely on the use of the Future or not. Also it depends what you are trying to do with async. Currently async transport is only for sending messages to another node, we never think of when we are the owning node. 
In this case the calling thread would have to go down the interceptor stack and acquire any locks if it is the owner, thus causing this "async" to block if you have any contention on the given key. The use of another thread would allow the calling thread to be able to return immediately no matter what else is occurring. Also I don't see what is so wrong about having a context switch to run something asynchronously, we shouldn't have a context switch to block the user thread imo, which is very possible with locking. > > +1 much cleaner, I love it. Actually wasn't aware the current code > didn't do this :-( This is what the current async transport does, but it does nothing with Futures. > > Sanne > >> >> Radim >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Mon Feb 3 10:49:02 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 03 Feb 2014 16:49:02 +0100 Subject: [infinispan-dev] Weekly IRC meeting minutes Message-ID: <52EFBA6E.2060603@redhat.com> Dear all, you can read the transcript of this week's IRC meeting at: http://transcripts.jboss.org/meeting/irc.freenode.org/infinispan/2014/infinispan.2014-02-03-15.12.html Tristan From galder at redhat.com Mon Feb 3 11:07:52 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 3 Feb 2014 17:07:52 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: On 23 Jan 2014, at 18:54, Mircea Markus wrote: > > On Jan 23, 2014, at 5:48 PM, William Burns wrote: > >> Hello all, >> >> I have been working with notifications and most 
recently I have come >> to look into events generated when a new entry is created. Now >> normally I would just expect a CacheEntryCreatedEvent to be raised. >> However we currently raise a CacheEntryModifiedEvent event and then a >> CacheEntryCreatedEvent. I notice that there are comments around the >> code saying that tests require both to be fired. > > it doesn't sound right to me: modified is different than created. I've lost count of the number of times I've raised this up in the dev mailing list :| And, if CacheEntryModifiedEvent has a method called isCreated(), it's because I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p > >> >> I am wondering if anyone has an objection to only raising a >> CacheEntryCreatedEvent on a new cache entry being created. It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. >> Does >> anyone know why we raise both currently? Legacy really. >> Was it just so the >> PutKeyValueCommand could more ignorantly just raise the >> CacheEntryModified pre Event? >> >> Any input would be appreciated, Thanks. 
> Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From rvansa at redhat.com Mon Feb 3 11:28:31 2014 From: rvansa at redhat.com (Radim Vansa) Date: Mon, 03 Feb 2014 17:28:31 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> Message-ID: <52EFC3AF.5060201@redhat.com> >>>> For sync we would want to invoke directly to avoid context switching. >>> I think you haven't properly understood what I was talking about: the >>> putAsync should not switch context at all in the ideal design. It should >>> traverse through the interceptors all the way down (logically, in >>> current behaviour), invoke JGroups async API and jump out. Then, as soon >>> as the response is received, the thread which delivered it should >>> traverse the interceptor stack up (again, logically), and fire the future. > A Future doesn't make much sense with an async transport. The problem > is with an async transport you never get back a response so you never > know when the actual command is completed and thus a Future is > worthless. The caller wouldn't know if they could rely on the use of > the Future or not. You're right, there's one important difference between putAsync and put with async transport: in the first case you can find out when the request is completed while you cannot with the latter. Not requiring the ack can be an important optimization. 
I think that both versions are very valid: first mostly for bulk operations = reduction of latency, second for modifications that are acceptable to fail without handling that. I had the first case in my mind when talking about async operations, and there the futures are necessary. > > Also it depends what you are trying to do with async. Currently async > transport is only for sending messages to another node, we never think > of when we are the owning node. In this case the calling thread would > have to go down the interceptor stack and acquire any locks if it is > the owner, thus causing this "async" to block if you have any > contention on the given key. The use of another thread would allow > the calling thread to be able to return immediately no matter what > else is occurring. Also I don't see what is so wrong about having a > context switch to run something asynchronously, we shouldn't have a > context switch to block the user thread imo, which is very possible > with locking. This is an important notice! Locking would complicate the design a lot, because the thread in "async" mode should do only tryLocks - if this fails, further processing should be dispatched to another thread. Not sure if this could be implemented at all, because the thread may be blocked inside JGroups as well (async API is about receiving the response asynchronously, not about sending the message asynchronously). I don't say that the context switch is that bad. My concern is that you have a very limited amount of requests that can be processed in parallel. I consider a "request" something pretty lightweight in concept - but one thread per request makes this rather heavyweight stuff. > >> +1 much cleaner, I love it. Actually wasn't aware the current code >> didn't do this :-( > This is what the current async transport does, but it does nothing with Futures. Nevermind the futures, this is not the important part. It's not about async transport neither, it's about async executors. 
(okay, the thread was about dropping async transport, I have hijacked it) Radim -- Radim Vansa JBoss DataGrid QA From mudokonman at gmail.com Mon Feb 3 11:29:56 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 3 Feb 2014 11:29:56 -0500 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: On Mon, Feb 3, 2014 at 11:07 AM, Galder Zamarreño wrote: > > On 23 Jan 2014, at 18:54, Mircea Markus wrote: > >> >> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >> >>> Hello all, >>> >>> I have been working with notifications and most recently I have come >>> to look into events generated when a new entry is created. Now >>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>> However we currently raise a CacheEntryModifiedEvent event and then a >>> CacheEntryCreatedEvent. I notice that there are comments around the >>> code saying that tests require both to be fired. >> >> it doesn't sound right to me: modified is different than created. > > I've lost count of the number of times I've raised this up in the dev mailing list :| > > And, if CacheEntryModifiedEvent has a method called isCreated(), it's because I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Ah nice I didn't even notice the method until you pointed it out. > >> >>> >>> I am wondering if anyone has an objection to only raising a >>> CacheEntryCreatedEvent on a new cache entry being created. > > It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. I agree. Maybe I should instead ask whether anyone minds if Cluster Listeners only raise the CacheEntryModifiedEvent on an entry creation? 
This wouldn't break existing assumptions since we don't currently support Cluster Listeners. The only thing is it wouldn't be consistent with regular listeners... > > Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. Just to be clear, you are saying that JCache only raises a single event for a change and a create, right? > >>> Does >>> anyone know why we raise both currently? > > Legacy really. > >>> Was it just so the >>> PutKeyValueCommand could more ignorantly just raise the >>> CacheEntryModified pre Event? >>> >>> Any input would be appreciated, Thanks. >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Mon Feb 3 13:01:43 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 3 Feb 2014 20:01:43 +0200 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <52EFC3AF.5060201@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >>>> For sync we would want to invoke directly to avoid context switching. 
> >>> I think you haven't properly understood what I was talking about: the > >>> putAsync should not switch context at all in the ideal design. It > should > >>> traverse through the interceptors all the way down (logically, in > >>> current behaviour), invoke JGroups async API and jump out. Then, as > soon > >>> as the response is received, the thread which delivered it should > >>> traverse the interceptor stack up (again, logically), and fire the > future. > > A Future doesn't make much sense with an async transport. The problem > > is with an async transport you never get back a response so you never > > know when the actual command is completed and thus a Future is > > worthless. The caller wouldn't know if they could rely on the use of > > the Future or not. > > You're right, there's one important difference between putAsync and put > with async transport: in the first case you can find out when the > request is completed while you cannot with the latter. Not requiring the > ack can be an important optimization. I think that both versions are > very valid: first mostly for bulk operations = reduction of latency, > second for modifications that are acceptable to fail without handling that. > I had the first case in my mind when talking about async operations, and > there the futures are necessary. > A couple more differences: 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. > > > > Also it depends what you are trying to do with async. Currently async > > transport is only for sending messages to another node, we never think > > of when we are the owning node. 
In this case the calling thread would > > have to go down the interceptor stack and acquire any locks if it is > > the owner, thus causing this "async" to block if you have any > > contention on the given key. The use of another thread would allow > > the calling thread to be able to return immediately no matter what > > else is occurring. Also I don't see what is so wrong about having a > > context switch to run something asynchronously, we shouldn't have a > > context switch to block the user thread imo, which is very possible > > with locking. > > This is an important notice! Locking would complicate the design a lot, > because the thread in "async" mode should do only tryLocks - if this > fails, further processing should be dispatched to another thread. Not > sure if this could be implemented at all, because the thread may be > blocked inside JGroups as well (async API is about receiving the > response asynchronously, not about sending the message asynchronously). > > I don't say that the context switch is that bad. My concern is that you > have a very limited amount of requests that can be processed in > parallel. I consider a "request" something pretty lightweight in concept > - but one thread per request makes this rather heavyweight stuff. > We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. But the feeling I got was that neither is going to make it into 7.0. > > > > >> +1 much cleaner, I love it. Actually wasn't aware the current code > >> didn't do this :-( > > This is what the current async transport does, but it does nothing with > Futures. > > Nevermind the futures, this is not the important part. It's not about > async transport neither, it's about async executors. 
> (okay, the thread was about dropping async transport, I have hijacked it) > > Radim > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From galder at redhat.com Mon Feb 3 13:24:52 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 3 Feb 2014 19:24:52 +0100 Subject: [infinispan-dev] reusing infinispan's marshalling In-Reply-To: <52EA41E0.2010505@redhat.com> References: <52EA41E0.2010505@redhat.com> Message-ID: <1824630C-1D48-480A-8687-E563A54E7E6A@redhat.com> Not sure I understand the need to compare this. JBMAR and ProtoStream are solving different problems. The former is focused on getting the best out of Java persistence. The latter is focused on serializing stuff in a platform independent way. IMO, it's not an apples to apples comparison. Cheers, On 30 Jan 2014, at 13:13, Adrian Nistor wrote: > Hi list! > > I've been pondering about re-using the marshalling machinery of > Infinispan in another project, specifically in ProtoStream, where I'm > planning to add it as a test scoped dependency so I can create a > benchmark to compare marshalling performance. I'm basically interested > in comparing ProtoStream and Infinispan's JBoss Marshalling based > mechanism. Comparing against plain JBMAR, without using the > ExternalizerTable and Externalizers introduced by Infinispan is not > going to get me accurate results. > > But how? I see the marshalling is spread across infinispan-commons and > infinispan-core modules. > > Thanks! 
> Adrian > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 02:14:23 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 08:14:23 +0100 Subject: [infinispan-dev] Store as binary In-Reply-To: <68B26C2A-389B-4C0A-A3C6-DBE3B0526DAC@redhat.com> References: <52D92AC4.7080701@redhat.com> <52DCF101.3020903@infinispan.org> <87020416-72D3-412E-818B-A7F9161355CC@redhat.com> <52DCF70C.4090404@infinispan.org> <52DD4534.7080209@redhat.com> <68B26C2A-389B-4C0A-A3C6-DBE3B0526DAC@redhat.com> Message-ID: On 21 Jan 2014, at 17:45, Mircea Markus wrote: > > On Jan 21, 2014, at 2:13 PM, Sanne Grinovero wrote: > >> On 21 January 2014 13:37, Mircea Markus wrote: >>> >>> On Jan 21, 2014, at 1:21 PM, Galder Zamarreño wrote: >>> >>>>> What's the point for these tests? >>>> >>>> +1 >>> >>> To validate whether storing the data in binary format yields better performance than storing it as a POJO. >> >> That will highly depend on the scenarios you want to test for. AFAIK >> this started after Paul described how session replication works in >> WildFly, and we already know that both strategies are suboptimal with >> the current options available: in his case the active node will always >> write on the POJO, while the backup node will essentially only need to >> store the buffer "just in case" he might need to take over. > > Indeed, as it is today, it doesn't make sense for WildFly's session replication. > >> >> Sure, one will be slower, but if you want to make a suggestion to him >> about which configuration he should be using, we should measure his >> use case, not a different one.
>> >> Even then, as discussed in Palma, an in-memory String representation >> might be way more compact because of pooling of strings and a very >> high likelihood for repeated headers (as common in web frameworks), > > pooling like in String.intern()? > Even so, if most of your access to the String is to serialize it and send it remotely, then you have a serialization cost (CPU) to pay for the reduced size. Serialization has a cost, but nothing compared with the transport itself, and you don't have to go very far to see the impact of transport. Just recently we were chasing some performance regression and even though there were some changes in serialization, the impact of my improvements was minimal, max 2-3%. Optimal network and transport configuration is more important IMO, and once again, misconfiguration in that layer is what was causing us to be ~20% slower. > >> so >> you might want to measure the CPU vs storage cost on the receiving >> side.. but then again your results will definitely depend on the input >> data and assumptions on likelihood of failover, how often data is being >> written on the owner node vs on the other node (since he uses >> locality), etc.. many factors I'm not seeing being considered here and >> which could make a significant difference. > > I'm looking for the default setting of storeAsBinary in the configurations we ship. I think the default configs should be optimized for distribution with random key access (reads/writes for any key execute on every node of the cluster with the same probability) for both reads and writes. I'm with Sanne on this. I still think this is not a useful exercise really, since serialization is not a huge cost in total time spent. Our latency is driven by waiting for others to reply to our requests, and that's the driver in sync mode. In async, you can forget about the serialization cost if you use putAsync().
I find it way more useful to look at Infinispan all the time and consider what things we should be ditching to make our configuration smaller, our memory consumption smaller, and a smaller code base. > >> >>> As of now, it doesn't so I need to check why. >> >> You could play with the test parameters until it produces an output >> you like better, but I still see no point? > > the point is to provide the best defaults params for the default config, and see what's the usefulness of storeAsBinary. > >> This is not a realistic >> scenario, at best it could help us document suggestions about which >> scenarios you'd want to keep the option enabled vs disabled, but then >> again I think we're wasting time as we could implement a better >> strategy for Paul's use case: one which never deserializes a value >> received from a remote node until it's been requested as a POJO, but >> keeps the POJO as-is when it's stored locally. > > I disagree: Paul's scenario, whilst very important, is quite specific. For what I consider the general case (random key access, see above), your approach is suboptimal. > > >> I believe that would >> make sense also for OGM and probably most other users of Embedded. >> Basically, that would re-implement something similar to the previous >> design but simplifying it a bit so that it doesn't allow for a >> back-and-forth conversion between storage types but rather dynamically >> favors a specific storage strategy. > > It all boils down to what we want to optimize for: random key access or some degree of affinity. I think the former is the default. 
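Sanne's proposed strategy — keep a value received from a remote node serialized until it is requested as a POJO, and keep a locally stored value as a POJO until it must go on the wire — can be sketched as a small dual-representation holder. This is a hypothetical illustration, not the actual storeAsBinary/MarshalledValue implementation; "serialization" here is just String-to-UTF-8 bytes to keep it self-contained:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the lazy storage strategy described in the thread: each value
// holds whichever representation it was created with, and converts only on
// demand, so a backup copy that is never read locally is never deserialized.
class LazyValue {
    private String pojo;   // set when stored locally, or after first deserialization
    private byte[] bytes;  // set when received off the wire, or after first serialization

    static LazyValue fromLocal(String pojo)   { LazyValue v = new LazyValue(); v.pojo = pojo; return v; }
    static LazyValue fromRemote(byte[] bytes) { LazyValue v = new LazyValue(); v.bytes = bytes; return v; }

    // Deserialize lazily, only on first POJO access.
    synchronized String asPojo() {
        if (pojo == null) {
            pojo = new String(bytes, StandardCharsets.UTF_8);
        }
        return pojo;
    }

    // Serialize lazily, only when the value must be sent remotely.
    synchronized byte[] asBytes() {
        if (bytes == null) {
            bytes = pojo.getBytes(StandardCharsets.UTF_8);
        }
        return bytes;
    }

    boolean isDeserialized() { return pojo != null; }
}

public class LazyValueDemo {
    public static void main(String[] args) {
        LazyValue backup = LazyValue.fromRemote("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(backup.isDeserialized()); // false: backup copy stays binary
        System.out.println(backup.asPojo());         // deserialized only on demand
    }
}
```

Under this sketch, Paul's backup node pays no deserialization cost unless it actually takes over, while the active node never pays a serialization cost for local reads — which is the trade-off the thread is debating against the random-key-access case.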
> One way or the other, from the test Radim ran with random key access, storeAsBinary doesn't bring any benefit, though it should: http://lists.jboss.org/pipermail/infinispan-dev/2009-October/004299.html > >> >> Cheers, >> Sanne >> >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 03:07:09 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 09:07:09 +0100 Subject: [infinispan-dev] L1OnRehash Discussion In-Reply-To: References: Message-ID: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> On 28 Jan 2014, at 15:29, William Burns wrote: > Hello everyone, > > I wanted to discuss the dubious benefit of L1OnRehash, > especially compared to the complexity it brings. > > L1OnRehash is used to retain a value by moving a previously owned > value into the L1 when a rehash occurs and this node no longer owns > that value. Also, any current L1 values are removed when a rehash > occurs. Therefore it can only save a single remote get for only a few > keys when a rehash occurs.
> > This by itself is fine; however, L1OnRehash has many edge cases to > guarantee consistency, as can be seen from > https://issues.jboss.org/browse/ISPN-3838. This can get quite > complicated for a feature that gives marginal performance increases > (especially given that this value may never have been read recently - > at least normal L1 usage guarantees this). > > My first suggestion is instead to deprecate the L1OnRehash > configuration option and to remove this logic. +1 > My second suggestion is a new implementation of L1OnRehash that is > always enabled when the L1 threshold is configured to 0. For those not > familiar, the L1 threshold controls whether invalidations are broadcast > instead of sent as individual messages. A value of 0 means to always > broadcast. This would allow for some benefits that we can't currently > get: > > 1. L1 values would never have to be invalidated on a rehash event > (guaranteeing locality of reads under rehash) > 2. L1 requestors would not have to be tracked any longer > > However, every write would be required to send an invalidation, which > could slow write performance in additional cases (since we currently > only send invalidations when requestors are found). The difference > would be lessened with UDP, which is the transport I would assume > someone would use when configuring the L1 threshold to 0. Sounds good to me, but I think you could go even beyond this and maybe get rid of the threshold configuration option too? If the transport is UDP and multicast is configured, invalidations are broadcast (and the two benefits you mention apply). If UDP w/ unicast or TCP is used, track requestors and send invalidations as unicasts. Do we really need to expose these configuration options to the user? > What do you guys think? I am thinking that no one minds the removal > of the L1OnRehash we have currently (if so, let me know). I am quite > curious what others think about the changes for an L1 threshold value of > 0; maybe this configuration value is never used?
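The threshold rule being discussed — 0 means "always broadcast, never track requestors", any other value means unicast below the threshold and broadcast above it — can be sketched as a small decision function. The names below are illustrative only, not the actual L1Manager code:

```java
import java.util.Set;

// Sketch of the invalidation-routing rule from the thread: a threshold of 0
// means invalidations are always broadcast (so requestors never need to be
// tracked at all); otherwise requestors are tracked, and a broadcast is used
// only once their number exceeds the threshold, with unicasts below it.
class L1InvalidationPolicy {
    private final int threshold;

    L1InvalidationPolicy(int threshold) { this.threshold = threshold; }

    boolean shouldBroadcast(Set<String> requestors) {
        if (threshold == 0) {
            return true; // always multicast, no tracking required
        }
        return requestors.size() > threshold;
    }
}

public class L1PolicyDemo {
    public static void main(String[] args) {
        L1InvalidationPolicy alwaysBroadcast = new L1InvalidationPolicy(0);
        L1InvalidationPolicy selective = new L1InvalidationPolicy(2);
        System.out.println(alwaysBroadcast.shouldBroadcast(Set.of()));        // true
        System.out.println(selective.shouldBroadcast(Set.of("a")));           // false -> unicast to "a"
        System.out.println(selective.shouldBroadcast(Set.of("a", "b", "c"))); // true
    }
}
```

Will's proposal amounts to making the threshold-0 branch the one that also enables keeping L1 entries across a rehash, since in that branch no requestor state exists to invalidate.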
> > Thanks, > > - Will > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 03:21:13 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 09:21:13 +0100 Subject: [infinispan-dev] Module jars dissapearing leaving empty classes/ folders and errors Message-ID: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> Hi all, We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). Quite often some of the runs fail with error message [1]. Having looked at the build environment when a run fails, you see this: -- $ ls modules/system/layers/base/org/infinispan/server/rest/main drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( ... ... This is completely different to what happens with a successful run: -- $ ls modules/system/layers/base/org/infinispan/server/rest/main -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( ... > ... > > This is completely different to what happens with a successful run: > > -- > $ ls modules/system/layers/base/org/infinispan/server/rest/main > -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > org/infinispan/rest/configuration/ExtendedHeaders.class > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > ? > > Anyone can explain what is going on here? Does it ring a bell to anyone? 
Is this a known Wildfly issue by any chance? > > [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > Cheers, > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > jboss-as7-dev mailing list > jboss-as7-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 04:14:34 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 10:14:34 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> Message-ID: On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > > > > On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > Yes, there is nothing in the server code that modified the modules directory. > > Well, except for the new patching stuff, but that is not really relevant here. The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. Finally, do you have any suggestions on changes we could make to these files to further debug the issue? Thanks a lot for your help! 
[1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > Stuart > > > Stuart > > > On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > > > This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > > Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > > Cheers, > > > > > Stuart > > > > > > On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > > Hi all, > > > > We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > > > > Quite often some of the runs fail with error message [1]. > > > > Having looked at the build environment when a run fails, you see this: > > > > -- > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( > -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > > drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > ... 
> > > > This is completely different to what happens with a successful run: > > > > -- > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( > -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > > -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > > > $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > > org/infinispan/rest/configuration/ExtendedHeaders.class > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > ? > > > > Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > > > > [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > > > Cheers, > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > jboss-as7-dev mailing list > > jboss-as7-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Tue Feb 4 06:04:22 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 4 Feb 2014 13:04:22 +0200 Subject: [infinispan-dev] L1OnRehash Discussion In-Reply-To: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> Message-ID: On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarre?o wrote: > > On 28 Jan 2014, at 15:29, William Burns wrote: > > > Hello everyone, > > > > I 
wanted to discuss what I would say as dubious benefit of L1OnRehash > > especially compared to the benefits it provide. > > > > L1OnRehash is used to retain a value by moving a previously owned > > value into the L1 when a rehash occurs and this node no longer owns > > that value Also any current L1 values are removed when a rehash > > occurs. Therefore it can only save a single remote get for only a few > > keys when a rehash occurs. > > > > This by itself is fine however L1OnRehash has many edge cases to > > guarantee consistency as can be seen from > > https://issues.jboss.org/browse/ISPN-3838. This can get quite > > complicated for a feature that gives marginal performance increases > > (especially given that this value may never have been read recently - > > at least normal L1 usage guarantees this). > > > > My first suggestion is instead to deprecate the L1OnRehash > > configuration option and to remove this logic. > > +1 > +1 from me as well > > > My second suggestion is a new implementation of L1OnRehash that is > > always enabled when L1 threshold is configured to 0. For those not > > familiar L1 threshold controls whether invalidations are broadcasted > > instead of individual messages. A value of 0 means to always > > broadcast. This would allow for some benefits that we can't currently > > do: > > > > 1. L1 values would never have to be invalidated on a rehash event > > (guarantee locality reads under rehash) > > 2. L1 requestors would not have to be tracked any longer > > > > However every write would be required to send an invalidation which > > could slow write performance in additional cases (since we currently > > only send invalidations when requestors are found). The difference > > would be lessened with udp, which is the transport I would assume > > someone would use when configuring L1 threshold to 0. > > Sounds good to me, but I think you could go even beyond this and maybe get > rid of threshold configuration option too? 
> > If the transport is UDP and multicast is configured, invalidations are > broadcasted (and apply the two benefits you mention). > If UDP w/ unicast or TCP used, track invalidations and send them as > unicasts. > > Do we really need to expose these configuration options to the user? > I think the idea was that even with UDP, sending 2 unicasts and waiting for only 2 responses may be faster than sending a multicast and waiting for 10 responses. However, I'm not sure that's the case if we send 1 unicast invalidation from each owner instead of a single multicast invalidation from the primary owner/originator [1]. Maybe if each owner would return a list of requestors and the originator would do the invalidation at the end... One tangible benefit of having the setting is that we can run the test suite with TCP only, and still cover every path in L1Manager. If removed it completely, it would still be possible to change the toggle in L1ManagerImpl via reflection, but it would be a little hacky. > > What do you guys think? I am thinking that no one minds the removal > > of L1OnRehash that we have currently (if so let me know). I am quite > > curious what others think about the changes for L1 threshold value of > > 0, maybe this configuration value is never used? > > > Since we don't give any guidance as to what a good threshold value would be, I doubt many people use it. My alternative proposal would be to replace the invalidationThreshold=-1|0|>0 setting with a traceRequestors=true|false setting. 1. If traceRequestors == false, don't keep track of requestors, only send the invalidation from the originator, and enable l1OnRehash. This means we can keep the entries that are in L1 after a rehash as well. 2. If traceRequestors == true, track requestors, send unicast/multicast invalidations depending on the transport, and disable l1OnRehash. [1] https://issues.jboss.org/browse/ISPN-186 Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/cd16e2df/attachment-0001.html From galder at redhat.com Tue Feb 4 07:30:35 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 13:30:35 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> Message-ID: <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > It is almost certainly something to do with this: > > > > > > > > I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. I?ve traced back and this might be due to build failures that are not producing the right jars [3]. @Stuart, this is really our problem. Sorry for the inconvenience! [1] https://gist.github.com/galderz/b9286f385aad1316df51 [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > Stuart > > > On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > > > > > > > > > On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > > Yes, there is nothing in the server code that modified the modules directory. > > > > Well, except for the new patching stuff, but that is not really relevant here. > > The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. 
The distribution we build uses the scripts we got from AS [1]. > > Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > > Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > > Thanks a lot for your help! > > [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > > > > Stuart > > > > > > Stuart > > > > > > On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > > > > On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > > > > > This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > > > > Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > > > > Cheers, > > > > > > > > Stuart > > > > > > > > > On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > > > Hi all, > > > > > > We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > > > > > > Quite often some of the runs fail with error message [1]. > > > > > > Having looked at the build environment when a run fails, you see this: > > > > > > -- > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( > > -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > > -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > > > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > > > drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. 
> > > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > > > ... > > > > > > This is completely different to what happens with a successful run: > > > > > > -- > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > > -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( > > -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > > > -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > > > > > $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > > > org/infinispan/rest/configuration/ExtendedHeaders.class > > > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > > > ? > > > > > > Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > > > > > > [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > > > > > Cheers, > > > -- > > > Galder Zamarre?o > > > galder at redhat.com > > > twitter.com/galderz > > > > > > Project Lead, Escalante > > > http://escalante.io > > > > > > Engineer, Infinispan > > > http://infinispan.org > > > > > > > > > _______________________________________________ > > > jboss-as7-dev mailing list > > > jboss-as7-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > > > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 07:36:54 2014 From: galder at redhat.com 
(=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 13:36:54 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: Narrowing down the list now, since this is a problem of how our CI is doing builds. These logs are retrieved from [1]. Dunno how our CI is configured but this is odd. Seems like the build is halt due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. It?s about time we did the following: 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. 2) Any tests that fail randomly should be disabled. Cheers, [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > >> It is almost certainly something to do with this: >> >> >> >> >> >> >> >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. > > I?ve traced back and this might be due to build failures that are not producing the right jars [3]. > > @Stuart, this is really our problem. Sorry for the inconvenience! 
> > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > >> >> Stuart >> >> >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: >> >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: >> >>> >>> >>> >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: >>> Yes, there is nothing in the server code that modified the modules directory. >>> >>> Well, except for the new patching stuff, but that is not really relevant here. >> >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. >> >> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. >> >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? >> >> Thanks a lot for your help! >> >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml >> >>> >>> Stuart >>> >>> >>> Stuart >>> >>> >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: >>> >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: >>> >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. >>> >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? 
>>> >>> Cheers, >>> >>>> >>>> Stuart >>>> >>>> >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: >>>> Hi all, >>>> >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). >>>> >>>> Quite often some of the runs fail with error message [1]. >>>> >>>> Having looked at the build environment when a run fails, you see this: >>>> >>>> -- >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml >>>> >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. >>>> >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>> >>>> ... >>>> >>>> This is completely different to what happens with a successful run: >>>> >>>> -- >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml >>>> >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders >>>> org/infinispan/rest/configuration/ExtendedHeaders.class >>>> >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>> >>>> ? >>>> >>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? 
>>>> >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 >>>> >>>> Cheers, >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> jboss-as7-dev mailing list >>>> jboss-as7-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev >>>> >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >>> >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Tue Feb 4 07:50:55 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 4 Feb 2014 14:50:55 +0200 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarre?o wrote: > Narrowing down the list now, since this is a problem of how our CI is > doing builds. > > These logs are retrieved from [1]. > > Dunno how our CI is configured but this is odd. 
Seems like the build is > halted due to test failures, but it continues somehow? I mean, the jars are > not being produced properly, but the build is not halting. > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not > continue the build at all. > Will having 100 tests in one run and 2000 tests in another really help? > 2) Any tests that fail randomly should be disabled. > Let's go ahead and disable all the server tests then? ;) > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarreño wrote: > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas > wrote: > > > >> It is almost certainly something to do with this: > >> > >> src="${infinispan.server.modules.dir}"> > >> > >> artifact="infinispan-server-rest" classifier="classes" /> > >> > >> > >> > >> I guess sometimes the classes artefact is being attached as a reference > to the classes directory, rather than a reference to a jar, which causes > the issue. > > > > Here's a gist with a subset of the build log [1]. When it works fine, > it's copying a jar, when it's not, it's copying an empty folder. > > > > However, this is not only happening for the org.infinispan.server.rest > module, others show the same issue [2]. What seems to be a pattern is that > it only happens with modules that are built by us, it's not happening for > modules coming with the base AS/WF instance. > > > > I've traced back and this might be due to build failures that are not > producing the right jars [3]. > > > > @Stuart, this is really our problem. Sorry for the inconvenience!
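[For anyone else chasing this symptom: the broken layout quoted earlier in the thread (an empty classes/ directory where infinispan-classes.jar should be) is easy to test for mechanically. A minimal sketch of such a check; the /tmp scratch directory and the mkdir/touch lines merely simulate a failing run's output, and only the module subpath itself comes from the thread.]

```shell
# Simulate the broken layout from the failing runs: an empty classes/
# directory plus module.xml, but no infinispan-classes.jar.
mod=/tmp/modcheck/system/layers/base/org/infinispan/server/rest/main
rm -rf /tmp/modcheck
mkdir -p "$mod/classes"
touch "$mod/module.xml"

# A healthy module dir contains a *-classes.jar; a broken one contains a
# classes/ directory instead, so the glob below fails to match anything.
if ls "$mod"/*-classes.jar >/dev/null 2>&1; then
  echo "OK: classes jar present in $mod"
else
  echo "BROKEN: no classes jar in $mod"
fi
```

[Pointed at a healthy build, where the glob matches infinispan-classes.jar, the same check takes the OK branch instead.]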
> > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > >> > >> Stuart > >> > >> > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o > wrote: > >> > >> On 04 Feb 2014, at 10:01, Stuart Douglas > wrote: > >> > >>> > >>> > >>> > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas < > stuart.w.douglas at gmail.com> wrote: > >>> Yes, there is nothing in the server code that modified the modules > directory. > >>> > >>> Well, except for the new patching stuff, but that is not really > relevant here. > >> > >> The testsuite AS/WF builds are built out of the distribution build, > which shows the same problem. The distribution we build uses the scripts we > got from AS [1]. > >> > >> Do you see anything in there that could be causing this? We are using > maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > >> > >> Finally, do you have any suggestions on changes we could make to these > files to further debug the issue? > >> > >> Thanks a lot for your help! > >> > >> [1] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > >> [2] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > >> > >>> > >>> Stuart > >>> > >>> > >>> Stuart > >>> > >>> > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o > wrote: > >>> > >>> On 04 Feb 2014, at 09:37, Stuart Douglas > wrote: > >>> > >>>> This looks like an issue with your environment. The modules directory > is static. Wildfly does not contain any code that messes with it. I would > say the culprit is probably something in either your build process or your > test suite. > >>> > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used > somewhere else). I guess your answer still applies? 
> >>> > >>> Cheers, > >>> > >>>> > >>>> Stuart > >>>> > >>>> > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o > wrote: > >>>> Hi all, > >>>> > >>>> We're having issues with our Infinispan Server integration tests, > which run within Wildfly 8.0.0.Beta1 (as I'm typing I'm wondering if we > should just upgrade it to see if this goes away...?). > >>>> > >>>> Quite often some of the runs fail with error message [1]. > >>>> > >>>> Having looked at the build environment when a run fails, you see this: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (<-- a directory??) > >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > >>>> > >>>> $ ls > modules/system/layers/base/org/infinispan/server/rest/main/classes > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > >>>> > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> name="org.infinispan.server.rest"> > >>>> ... > >>>> > >>>> ... > >>>> > >>>> This is completely different to what happens with a successful run: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (<-- > a jar file!) > >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 > infinispan-classes.jar.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > >>>> > >>>> $ jar tf > modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar > | grep ExtendedHeaders > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > >>>> > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> name="org.infinispan.server.rest"> > >>>> ... > >>>> > >>>> -- > >>>> > >>>> Anyone can explain what is going on here? Does it ring a bell to > anyone? 
Is this a known Wildfly issue by any chance? > >>>> > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > >>>> > >>>> Cheers, > >>>> -- > >>>> Galder Zamarre?o > >>>> galder at redhat.com > >>>> twitter.com/galderz > >>>> > >>>> Project Lead, Escalante > >>>> http://escalante.io > >>>> > >>>> Engineer, Infinispan > >>>> http://infinispan.org > >>>> > >>>> > >>>> _______________________________________________ > >>>> jboss-as7-dev mailing list > >>>> jboss-as7-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > >>>> > >>> > >>> > >>> -- > >>> Galder Zamarre?o > >>> galder at redhat.com > >>> twitter.com/galderz > >>> > >>> Project Lead, Escalante > >>> http://escalante.io > >>> > >>> Engineer, Infinispan > >>> http://infinispan.org > >>> > >>> > >>> > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/a15edc6a/attachment-0001.html From galder at redhat.com Tue Feb 4 07:52:09 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 13:52:09 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: <7D6D9D4B-8023-4A04-B946-8CB90640319F@redhat.com> On 04 Feb 2014, at 13:36, Galder Zamarreño wrote: > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > These logs are retrieved from [1]. > > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > 2) Any tests that fail randomly should be disabled. Having had to debug through this, I can certainly understand Sanne's frustration, and as server component lead, I'm not going to bother looking at any CI builds until all modules that server modules depend on are green and their testsuites are passing. And I'm gonna do the same. I'm going to disable all tests that are failing for which I'm lead, and try to solve them in the next few days. I won't do any further Infinispan development until then.
Cheers, > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > >> >> On 04 Feb 2014, at 10:38, Stuart Douglas wrote: >> >>> It is almost certainly something to do with this: >>> >>> >>> >>> >>> >>> >>> >>> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. >> >> Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. >> >> However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. >> >> I?ve traced back and this might be due to build failures that are not producing the right jars [3]. >> >> @Stuart, this is really our problem. Sorry for the inconvenience! >> >> [1] https://gist.github.com/galderz/b9286f385aad1316df51 >> [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 >> [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c >> >>> >>> Stuart >>> >>> >>> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: >>> >>> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: >>> >>>> >>>> >>>> >>>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: >>>> Yes, there is nothing in the server code that modified the modules directory. >>>> >>>> Well, except for the new patching stuff, but that is not really relevant here. >>> >>> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. >>> >>> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. 
>>> >>> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? >>> >>> Thanks a lot for your help! >>> >>> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml >>> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml >>> >>>> >>>> Stuart >>>> >>>> >>>> Stuart >>>> >>>> >>>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: >>>> >>>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: >>>> >>>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. >>>> >>>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? >>>> >>>> Cheers, >>>> >>>>> >>>>> Stuart >>>>> >>>>> >>>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: >>>>> Hi all, >>>>> >>>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). >>>>> >>>>> Quite often some of the runs fail with error message [1]. >>>>> >>>>> Having looked at the build environment when a run fails, you see this: >>>>> >>>>> -- >>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (>>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index >>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml >>>>> >>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes >>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . >>>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. >>>>> >>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>> >>>>> ... 
>>>>> >>>>> This is completely different to what happens with a successful run: >>>>> >>>>> -- >>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (>>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index >>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml >>>>> >>>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders >>>>> org/infinispan/rest/configuration/ExtendedHeaders.class >>>>> >>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>> >>>>> ? >>>>> >>>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? >>>>> >>>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 >>>>> >>>>> Cheers, >>>>> -- >>>>> Galder Zamarre?o >>>>> galder at redhat.com >>>>> twitter.com/galderz >>>>> >>>>> Project Lead, Escalante >>>>> http://escalante.io >>>>> >>>>> Engineer, Infinispan >>>>> http://infinispan.org >>>>> >>>>> >>>>> _______________________________________________ >>>>> jboss-as7-dev mailing list >>>>> jboss-as7-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev >>>>> >>>> >>>> >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev 
at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 08:03:16 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 14:03:16 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote: > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > These logs are retrieved from [1]. > > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 ^ That's not working as expected, see the build log, my snippets, etc. > > > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > > Will having 100 tests in one run and 2000 tests in another really help?
As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these numbers should even out. > > > 2) Any tests that fail randomly should be disabled. > > Let's go ahead and disable all the server tests then? ;) Those server tests that are randomly failing should be disabled and looked at. Those tests that are failing as a result of container not starting are side effects of things not working properly, and these should not be disabled. > > > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarreño wrote: > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > > > >> It is almost certainly something to do with this: > >> > >> > >> > >> > >> > >> > >> > >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > > > Here's a gist with a subset of the build log [1]. When it works fine, it's copying a jar, when it's not, it's copying an empty folder. > > > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it's not happening for modules coming with the base AS/WF instance. > > > > I've traced back and this might be due to build failures that are not producing the right jars [3]. > > > > @Stuart, this is really our problem. Sorry for the inconvenience!
> > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > >> > >> Stuart > >> > >> > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: > >> > >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > >> > >>> > >>> > >>> > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > >>> Yes, there is nothing in the server code that modified the modules directory. > >>> > >>> Well, except for the new patching stuff, but that is not really relevant here. > >> > >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. > >> > >> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > >> > >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > >> > >> Thanks a lot for your help! > >> > >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > >> > >>> > >>> Stuart > >>> > >>> > >>> Stuart > >>> > >>> > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > >>> > >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > >>> > >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > >>> > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? 
> >>> > >>> Cheers, > >>> > >>>> > >>>> Stuart > >>>> > >>>> > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > >>>> Hi all, > >>>> > >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > >>>> > >>>> Quite often some of the runs fail with error message [1]. > >>>> > >>>> Having looked at the build environment when a run fails, you see this: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > >>>> > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ... > >>>> > >>>> This is completely different to what happens with a successful run: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > >>>> > >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ? > >>>> > >>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? 
> >>>> > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > >>>> > >>>> Cheers, > >>>> -- > >>>> Galder Zamarre?o > >>>> galder at redhat.com > >>>> twitter.com/galderz > >>>> > >>>> Project Lead, Escalante > >>>> http://escalante.io > >>>> > >>>> Engineer, Infinispan > >>>> http://infinispan.org > >>>> > >>>> > >>>> _______________________________________________ > >>>> jboss-as7-dev mailing list > >>>> jboss-as7-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > >>>> > >>> > >>> > >>> -- > >>> Galder Zamarre?o > >>> galder at redhat.com > >>> twitter.com/galderz > >>> > >>> Project Lead, Escalante > >>> http://escalante.io > >>> > >>> Engineer, Infinispan > >>> http://infinispan.org > >>> > >>> > >>> > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io 
Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Tue Feb 4 08:47:54 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 4 Feb 2014 15:47:54 +0200 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> Message-ID: On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > > > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarre?o > wrote: > > Narrowing down the list now, since this is a problem of how our CI is > doing builds. > > > > These logs are retrieved from [1]. > > > > Dunno how our CI is configured but this is odd. Seems like the build is > halt due to test failures, but it continues somehow? I mean, the jars are > not being produced properly, but the build is not halting. > > > > We run the build with -fn (fail-never), so the build should never be > halted because of a test failure. The configuration is here: > http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 > > ^ That's not working as expected, see the build log, my snippets...etc. > Sorry, I didn't understand what's happening in those snippets. All I saw was an Ant script that doesn't do what it's supposed to do :) I did see some differences in the configuration between the JDK6 and the JDK7 builds: * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't I've changed both builds to use -Dmaven.test.failure.ignore and -nsu, let's see how it goes. > > > > > > > > > It's about time we did the following: > > 1) Any test failures should halt the build there and then. IOW, do not > continue the build at all. 
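[A side note on the flags compared above, for readers who do not live in Maven every day; the command lines below are illustrative sketches, not lifted from the actual CI configuration.]

```shell
# --fail-never (-fn): the reactor records a module's failure (compile,
# packaging or tests) but keeps building downstream modules, which may then
# be assembled against missing or stale artifacts -- consistent with the
# empty classes/ directories seen in the failing runs.
mvn clean install -fn

# -Dmaven.test.failure.ignore=true: only Surefire/Failsafe ignore failing
# tests; compile and packaging errors still fail the module, so a broken
# jar is much less likely to be propagated downstream.
mvn clean install -Dmaven.test.failure.ignore=true

# --no-snapshot-updates (-nsu): do not re-check remote repositories for
# updated SNAPSHOT dependencies during this build.
mvn clean install -nsu
```

[In short, -fn relaxes failure handling for the whole reactor, while -Dmaven.test.failure.ignore confines the leniency to test results.]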
> > > > Will having 100 tests in one run and 2000 tests in another really help? > > As you disable randomly failing tests, and do not integrate commits making > the testsuite fail, these number should even out. > Not integrating commits that fail every time is easy, not integrating commits that fail randomly (maybe only in some environments) is trickier. > > > > > > > 2) Any tests that fail randomly should be disabled. > > > > Let's go ahead and disable all the server tests then? ;) > > Those server tests that are randomly failing should be disabled and looked > at. Those tests that are failing as a result of container not starting are > side effects of things not working properly, and these should not be > disabled. > Why treat the tests that are failing because of a build problem differently? What about the tests that fail only on IBM JDK6? > > > > > > > > Cheers, > > > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > > > On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > > > > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas > wrote: > > > > > >> It is almost certainly something to do with this: > > >> > > >> src="${infinispan.server.modules.dir}"> > > >> > > >> artifact="infinispan-server-rest" classifier="classes" /> > > >> > > >> > > >> > > >> I guess sometimes the classes artefact is being attached as a > reference to the classes directory, rather than a reference to a jar, which > causes the issue. > > > > > > Here's a gist with a subset of the build log [1]. When it works fine, > it's copying a jar, when it's not, it's copying an empty folder. > > > > > > However, this is not only happening for the org.infinispan.server.rest > module, others show the same issue [2]. What seems to be a pattern is that > it only happens with modules that are built by us, it's not happening for > modules coming with the base AS/WF instance. > > > > > > I've traced back and this might be due to build failures that are not > producing the right jars [3]. 
> > > > > > @Stuart, this is really our problem. Sorry for the inconvenience! > > > > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > > > >> > > >> Stuart > > >> > > >> > > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o > wrote: > > >> > > >> On 04 Feb 2014, at 10:01, Stuart Douglas > wrote: > > >> > > >>> > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas < > stuart.w.douglas at gmail.com> wrote: > > >>> Yes, there is nothing in the server code that modified the modules > directory. > > >>> > > >>> Well, except for the new patching stuff, but that is not really > relevant here. > > >> > > >> The testsuite AS/WF builds are built out of the distribution build, > which shows the same problem. The distribution we build uses the scripts we > got from AS [1]. > > >> > > >> Do you see anything in there that could be causing this? We are using > maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > > >> > > >> Finally, do you have any suggestions on changes we could make to > these files to further debug the issue? > > >> > > >> Thanks a lot for your help! > > >> > > >> [1] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > > >> [2] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > >> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o > wrote: > > >>> > > >>> On 04 Feb 2014, at 09:37, Stuart Douglas > wrote: > > >>> > > >>>> This looks like an issue with your environment. The modules > directory is static. Wildfly does not contain any code that messes with it. > I would say the culprit is probably something in either your build process > or your test suite. 
> > >>> > > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used > somewhere else). I guess your answer still applies? > > >>> > > >>> Cheers, > > >>> > > >>>> > > >>>> Stuart > > >>>> > > >>>> > > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o < > galder at redhat.com> wrote: > > >>>> Hi all, > > >>>> > > >>>> We're having issues with our Infinispan Server integration tests, > which run within Wildfly 8.0.0.Beta1 (as I'm typing I'm wondering if we > should just upgrade it to see if this goes away...?). > > >>>> > > >>>> Quite often some of the runs fail with error message [1]. > > >>>> > > >>>> Having looked at the build environment when a run fails, you see > this: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (<-- a > directory??) > > >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > >>>> > > >>>> $ ls > modules/system/layers/base/org/infinispan/server/rest/main/classes > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > > >>>> > > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> name="org.infinispan.server.rest"> > > >>>> ... > > >>>> > > >>>> ... > > >>>> > > >>>> This is completely different to what happens with a successful run: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar > (<-- a jar file!) 
> > >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 > infinispan-classes.jar.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > >>>> > > >>>> $ jar tf > modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar > | grep ExtendedHeaders > > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > > >>>> > > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> name="org.infinispan.server.rest"> > > >>>> ... > > >>>> > > >>>> -- > > >>>> > > >>>> Anyone can explain what is going on here? Does it ring a bell to > anyone? Is this a known Wildfly issue by any chance? > > >>>> > > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > >>>> > > >>>> Cheers, > > >>>> -- > > >>>> Galder Zamarre?o > > >>>> galder at redhat.com > > >>>> twitter.com/galderz > > >>>> > > >>>> Project Lead, Escalante > > >>>> http://escalante.io > > >>>> > > >>>> Engineer, Infinispan > > >>>> http://infinispan.org > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> jboss-as7-dev mailing list > > >>>> jboss-as7-dev at lists.jboss.org > > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > >>>> > > >>> > > >>> > > >>> -- > > >>> Galder Zamarre?o > > >>> galder at redhat.com > > >>> twitter.com/galderz > > >>> > > >>> Project Lead, Escalante > > >>> http://escalante.io > > >>> > > >>> Engineer, Infinispan > > >>> http://infinispan.org > > >>> > > >>> > > >>> > > >> > > >> > > >> -- > > >> Galder Zamarre?o > > >> galder at redhat.com > > >> twitter.com/galderz > > >> > > >> Project Lead, Escalante > > >> http://escalante.io > > >> > > >> Engineer, Infinispan > > >> http://infinispan.org > > >> > > >> > > > > > > > > > -- > > > Galder Zamarre?o > > > galder at redhat.com > > > twitter.com/galderz > > > > > > Project Lead, Escalante > > > http://escalante.io > > > > > > Engineer, Infinispan > > > http://infinispan.org > > > > > > > > > 
_______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/4cb657c3/attachment-0001.html From galder at redhat.com Tue Feb 4 08:03:16 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 14:03:16 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarre?o wrote: > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > These logs are retrieved from [1]. 
> > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 ^ That's not working as expected, see the build log, my snippets, etc. > > > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > > Will having 100 tests in one run and 2000 tests in another really help? As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these numbers should even out. > > > 2) Any tests that fail randomly should be disabled. > > Let's go ahead and disable all the server tests then? ;) Those server tests that are randomly failing should be disabled and looked at. Those tests that are failing as a result of the container not starting are side effects of things not working properly, and these should not be disabled. > > > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarreño wrote: > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > > > >> It is almost certainly something to do with this: > >> > >> > >> > >> > >> > >> > >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > > > Here's a gist with a subset of the build log [1]. When it works fine, it's copying a jar; when it's not, it's copying an empty folder. > > > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2].
What seems to be a pattern is that it only happens with modules that are built by us; it's not happening for modules coming with the base AS/WF instance. > > > > I've traced back and this might be due to build failures that are not producing the right jars [3]. > > > > @Stuart, this is really our problem. Sorry for the inconvenience! > > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > >> > >> Stuart > >> > >> > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarreño wrote: > >> > >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > >> > >>> > >>> > >>> > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > >>> Yes, there is nothing in the server code that modifies the modules directory. > >>> > >>> Well, except for the new patching stuff, but that is not really relevant here. > >> > >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. > >> > >> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > >> > >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > >> > >> Thanks a lot for your help! > >> > >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > >> > >>> > >>> Stuart > >>> > >>> > >>> Stuart > >>> > >>> > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarreño wrote: > >>> > >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > >>> > >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it.
I would say the culprit is probably something in either your build process or your test suite. > >>> > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > >>> > >>> Cheers, > >>> > >>>> > >>>> Stuart > >>>> > >>>> > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > >>>> Hi all, > >>>> > >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > >>>> > >>>> Quite often some of the runs fail with error message [1]. > >>>> > >>>> Having looked at the build environment when a run fails, you see this: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > >>>> > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ... > >>>> > >>>> This is completely different to what happens with a successful run: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > >>>> > >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ? > >>>> > >>>> Anyone can explain what is going on here? 
Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > >>>> > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > >>>> > >>>> Cheers, > >>>> -- > >>>> Galder Zamarre?o > >>>> galder at redhat.com > >>>> twitter.com/galderz > >>>> > >>>> Project Lead, Escalante > >>>> http://escalante.io > >>>> > >>>> Engineer, Infinispan > >>>> http://infinispan.org > >>>> > >>>> > >>>> _______________________________________________ > >>>> jboss-as7-dev mailing list > >>>> jboss-as7-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > >>>> > >>> > >>> > >>> -- > >>> Galder Zamarre?o > >>> galder at redhat.com > >>> twitter.com/galderz > >>> > >>> Project Lead, Escalante > >>> http://escalante.io > >>> > >>> Engineer, Infinispan > >>> http://infinispan.org > >>> > >>> > >>> > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at 
redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 09:10:35 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 15:10:35 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> Message-ID: <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com> On 04 Feb 2014, at 14:47, Dan Berindei wrote: > > > > On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarreño wrote: > > On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > > > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote: > > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > > > These logs are retrieved from [1]. > > > > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > > > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 > > ^ That's not working as expected, see the build log, my snippets, etc. > > Sorry, I didn't understand what's happening in those snippets. The log shows it quite clearly that after those tests fail, nothing else runs in that module, including producing the jar. It halts. That's > All I saw was an Ant script that doesn't do what it's supposed to do :) The Ant script isn't doing its job because these modules are not completing the build. There's a direct correlation between the three modules that fail with tests and the 3 modules that are copying an empty folder instead of the jar.
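For context on the flags being compared: -fn only keeps the Maven *reactor* going after a module's test phase fails, so that module never reaches the package phase and its jar is never produced, while Surefire's testFailureIgnore (what -Dmaven.test.failure.ignore sets) records the failures and lets the module's own build continue through packaging. A hypothetical POM fragment pinning the latter, rather than passing it on the command line:

```xml
<!-- Equivalent of passing -Dmaven.test.failure.ignore=true on the CLI.
     Tests still run and are reported, but a failure no longer aborts the
     module before its jar is packaged, so downstream modules copy a real
     jar instead of an empty classes/ directory. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <testFailureIgnore>true</testFailureIgnore>
  </configuration>
</plugin>
```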
> > I did see some differences in the configuration between the JDK6 and the JDK7 builds: > * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore > * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't > > I've changed both builds to use -Dmaven.test.failure.ignore and -nsu, let's see how it goes. You are solving the wrong problem. > > > > > > > > > > > It's about time we did the following: > > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > > > > Will having 100 tests in one run and 2000 tests in another really help? > > As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these numbers should even out. > > Not integrating commits that fail every time is easy, not integrating commits that fail randomly (maybe only in some environments) is trickier. I know it's tricky, but the only thing we can do is disable those really. I don't see how keeping them enabled is helping at all. > > > > > > > > 2) Any tests that fail randomly should be disabled. > > > > Let's go ahead and disable all the server tests then? ;) > > Those server tests that are randomly failing should be disabled and looked at. Those tests that are failing as a result of the container not starting are side effects of things not working properly, and these should not be disabled. > > Why treat the tests that are failing because of a build problem differently? What about the tests that fail only on IBM JDK6? Disable and indicate that the test fails on IBM JDK6. Once the issue is fixed, reenable it.
> > > > > > > > > > Cheers, > > > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > > > On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > > > > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > > > > > >> It is almost certainly something to do with this: > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > > > > > Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. > > > > > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. > > > > > > I?ve traced back and this might be due to build failures that are not producing the right jars [3]. > > > > > > @Stuart, this is really our problem. Sorry for the inconvenience! > > > > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > > > >> > > >> Stuart > > >> > > >> > > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: > > >> > > >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > > >> > > >>> > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > > >>> Yes, there is nothing in the server code that modified the modules directory. > > >>> > > >>> Well, except for the new patching stuff, but that is not really relevant here. > > >> > > >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. > > >> > > >> Do you see anything in there that could be causing this? 
We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > > >> > > >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > > >> > > >> Thanks a lot for your help! > > >> > > >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > > >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > >> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > > >>> > > >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > > >>> > > >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > > >>> > > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > > >>> > > >>> Cheers, > > >>> > > >>>> > > >>>> Stuart > > >>>> > > >>>> > > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > > >>>> Hi all, > > >>>> > > >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > > >>>> > > >>>> Quite often some of the runs fail with error message [1]. > > >>>> > > >>>> Having looked at the build environment when a run fails, you see this: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( > >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > >>>> > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . 
> > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > > >>>> > > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> > > >>>> ... > > >>>> > > >>>> This is completely different to what happens with a successful run: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( > >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > >>>> > > >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > > >>>> > > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> > > >>>> ? > > >>>> > > >>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > > >>>> > > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > >>>> > > >>>> Cheers, > > >>>> -- > > >>>> Galder Zamarre?o > > >>>> galder at redhat.com > > >>>> twitter.com/galderz > > >>>> > > >>>> Project Lead, Escalante > > >>>> http://escalante.io > > >>>> > > >>>> Engineer, Infinispan > > >>>> http://infinispan.org > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> jboss-as7-dev mailing list > > >>>> jboss-as7-dev at lists.jboss.org > > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > >>>> > > >>> > > >>> > > >>> -- > > >>> Galder Zamarre?o > > >>> galder at redhat.com > > >>> twitter.com/galderz > > >>> > > >>> Project Lead, Escalante > > >>> http://escalante.io > > >>> > > >>> Engineer, Infinispan > > >>> http://infinispan.org > > >>> > > >>> > > >>> > > >> > > >> > > >> -- > > >> Galder Zamarre?o > > >> galder at redhat.com > > >> twitter.com/galderz > > >> > > >> Project Lead, Escalante > > >> http://escalante.io 
> > >> > > >> Engineer, Infinispan > > >> http://infinispan.org > > >> > > >> > > > > > > > > > -- > > > Galder Zamarre?o > > > galder at redhat.com > > > twitter.com/galderz > > > > > > Project Lead, Escalante > > > http://escalante.io > > > > > > Engineer, Infinispan > > > http://infinispan.org > > > > > > > > > _______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 09:36:23 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 15:36:23 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders 
and errors In-Reply-To: <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com> References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com> Message-ID: All, Sanne, Pedro, Dan and I had a very productive discussion on IRC on this topic [1]. We've decided that instead of disabling tests, we need them to run in order to get recent stacktraces, logs, etc. So, we've decided to create a new test group called "unstable". This test group would only be run in CI once a day and it'd be run in a different build. This build would also enable TRACE logging for standalone and server tests. For server, I need to create a task to do this selectively. The rest of the builds, masters and PRs, would not run the "unstable" group, and would not have TRACE enabled. The responsibility for unstable tests lies with the component owners. They need to handle them and decide what to do with them. Cheers, [1] https://gist.github.com/galderz/3563d1b23b5d50f80d82 On 04 Feb 2014, at 15:10, Galder Zamarreño wrote: > > On 04 Feb 2014, at 14:47, Dan Berindei wrote: > >> >> >> >> On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarreño wrote: >> >> On 04 Feb 2014, at 13:50, Dan Berindei wrote: >> >>> >>> On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote: >>> Narrowing down the list now, since this is a problem of how our CI is doing builds. >>> >>> These logs are retrieved from [1]. >>> >>> Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. >>> >>> We run the build with -fn (fail-never), so the build should never be halted because of a test failure.
The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 >> >> ^ That?s not working as expected, see the build log, my snippets?etc. >> >> Sorry, I didn't understand what's happening in those snippets. > > The log shows it quite clearly that after those tests fail, nothing else runs in that module, including producing the jar. It halts. That?s > >> All I saw was an Ant script that doesn't do what it's supposed to do :) > > The ant script not doing it?s job is because there modules are not completing the build. There?s a direct correlation between the three modules that fail with tests and the 3 modules that are copying an empty folder instead of the jar. > >> >> I did see some differences in the configuration between the JDK6 and the JDK7 builds: >> * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore >> * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't >> >> I've changed both builds to use -Dmaven.test.failure.ignore and -nsu, let's see how it goes. > > You are solving the wrong problem. > >> >> >> >>> >>> >>> >>> It?s about time we did the following: >>> 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. >>> >>> Will having 100 tests in one run and 2000 tests in another really help? >> >> As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these number should even out. >> >> Not integrating commits that fail every time is easy, not integrating commits that fail randomly (maybe only in some environments) is trickier. > > I know it?s tricky, but the only thing we can do is disable those really. I don?t see how keeping them enabled is helping at all. > >> >> >>> >>> >>> 2) Any tests that fail randomly should be disabled. >>> >>> Let's go ahead and disable all the server tests then? ;) >> >> Those server tests that are randomly failing should be disabled and looked at. 
Those tests that are failing as a result of container not starting are side effects of things not working properly, and these should not be disabled. >> >> Why treat the tests that are failing because of a build problem differently? What about the tests that fail only on IBM JDK6? > > Disable and indicate that the test fails on IBM JDK6. Once the issue is fixed, reenable it. > >> >> >>> >>> >>> >>> Cheers, >>> >>> [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log >>> >>> On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: >>> >>>> >>>> On 04 Feb 2014, at 10:38, Stuart Douglas wrote: >>>> >>>>> It is almost certainly something to do with this: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. >>>> >>>> Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. >>>> >>>> However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. >>>> >>>> I?ve traced back and this might be due to build failures that are not producing the right jars [3]. >>>> >>>> @Stuart, this is really our problem. Sorry for the inconvenience! >>>> >>>> [1] https://gist.github.com/galderz/b9286f385aad1316df51 >>>> [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 >>>> [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c >>>> >>>>> >>>>> Stuart >>>>> >>>>> >>>>> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: >>>>> >>>>> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: >>>>>> Yes, there is nothing in the server code that modified the modules directory. 
>>>>>> >>>>>> Well, except for the new patching stuff, but that is not really relevant here. >>>>> >>>>> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. >>>>> >>>>> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. >>>>> >>>>> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? >>>>> >>>>> Thanks a lot for your help! >>>>> >>>>> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml >>>>> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml >>>>> >>>>>> >>>>>> Stuart >>>>>> >>>>>> >>>>>> Stuart >>>>>> >>>>>> >>>>>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: >>>>>> >>>>>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: >>>>>> >>>>>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. >>>>>> >>>>>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? >>>>>> >>>>>> Cheers, >>>>>> >>>>>>> >>>>>>> Stuart >>>>>>> >>>>>>> >>>>>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). >>>>>>> >>>>>>> Quite often some of the runs fail with error message [1]. 
>>>>>>> >>>>>>> Having looked at the build environment when a run fails, you see this: >>>>>>> >>>>>>> -- >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (>>>>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml >>>>>>> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . >>>>>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. >>>>>>> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>>>> >>>>>>> ... >>>>>>> >>>>>>> This is completely different to what happens with a successful run: >>>>>>> >>>>>>> -- >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (>>>>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml >>>>>>> >>>>>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders >>>>>>> org/infinispan/rest/configuration/ExtendedHeaders.class >>>>>>> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>>>> >>>>>>> ? >>>>>>> >>>>>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? 
>>>>>>> >>>>>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 >>>>>>> >>>>>>> Cheers, >>>>>>> -- >>>>>>> Galder Zamarre?o >>>>>>> galder at redhat.com >>>>>>> twitter.com/galderz >>>>>>> >>>>>>> Project Lead, Escalante >>>>>>> http://escalante.io >>>>>>> >>>>>>> Engineer, Infinispan >>>>>>> http://infinispan.org >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> jboss-as7-dev mailing list >>>>>>> jboss-as7-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Galder Zamarre?o >>>>>> galder at redhat.com >>>>>> twitter.com/galderz >>>>>> >>>>>> Project Lead, Escalante >>>>>> http://escalante.io >>>>>> >>>>>> Engineer, Infinispan >>>>>> http://infinispan.org >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Galder Zamarre?o >>>>> galder at redhat.com >>>>> twitter.com/galderz >>>>> >>>>> Project Lead, Escalante >>>>> http://escalante.io >>>>> >>>>> Engineer, Infinispan >>>>> http://infinispan.org >>>>> >>>>> >>>> >>>> >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> 
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Galder Zamarreño
>> galder at redhat.com
>> twitter.com/galderz
>>
>> Project Lead, Escalante
>> http://escalante.io
>>
>> Engineer, Infinispan
>> http://infinispan.org
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Galder Zamarreño
> galder at redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org

From dan.berindei at gmail.com Tue Feb 4 13:13:38 2014
From: dan.berindei at gmail.com (Dan Berindei)
Date: Tue, 4 Feb 2014 20:13:38 +0200
Subject: [infinispan-dev] [jboss-as7-dev] Module jars disappearing leaving empty classes/ folders and errors
In-Reply-To:
References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com>
 <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com>
 <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com>
 <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com>
Message-ID:

For the record, -Dmaven.test.failure.ignore seems to do the right thing, and
the JDK7 build now only has 7 test failures (+ 4 ignored):
http://ci.infinispan.org/viewLog.html?buildId=5912&tab=buildResultsDiv&buildTypeId=bt8

On Tue, Feb 4, 2014 at 4:36 PM, Galder Zamarreño wrote:
> All,
>
> Sanne, Pedro, Dan and I had a very productive discussion on IRC on this
> topic [1].
> We've decided that instead of disabling tests, we need them to run in
> order to get recent stacktraces, logs, etc. So, we've decided to create a
> new test group called "unstable". This test group would only be run in CI
> once a day, and it'd be run in a different build. This build would also
> enable TRACE logging for standalone and server tests. For server, I need
> to create a task to do this selectively.
>
> The rest of the builds, masters and PRs, would not run the "unstable"
> group, and would not have TRACE enabled.
>
> Responsibility for unstable tests lies with the component owners. They
> need to handle them and decide what to do with them.
>
> Cheers,
>
> [1] https://gist.github.com/galderz/3563d1b23b5d50f80d82
>
> On 04 Feb 2014, at 15:10, Galder Zamarreño wrote:
>
> > On 04 Feb 2014, at 14:47, Dan Berindei wrote:
> >
> >> On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarreño wrote:
> >>
> >> On 04 Feb 2014, at 13:50, Dan Berindei wrote:
> >>
> >>> On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote:
> >>> Narrowing down the list now, since this is a problem of how our CI is doing builds.
> >>>
> >>> These logs are retrieved from [1].
> >>>
> >>> Dunno how our CI is configured, but this is odd. Seems like the build is
> >>> halted due to test failures, but it continues somehow? I mean, the jars
> >>> are not being produced properly, but the build is not halting.
> >>>
> >>> We run the build with -fn (fail-never), so the build should never be halted
> >>> because of a test failure. The configuration is here:
> >>> http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1
> >>
> >> ^ That's not working as expected, see the build log, my snippets... etc.
> >>
> >> Sorry, I didn't understand what's happening in those snippets.
> >
> > The log shows it quite clearly that after those tests fail, nothing else
> > runs in that module, including producing the jar. It halts.
That's
>
> >> All I saw was an Ant script that doesn't do what it's supposed to do :)
>
> > The Ant script not doing its job is because these modules are not completing
> > the build. There's a direct correlation between the three modules that fail
> > with tests and the 3 modules that are copying an empty folder instead of the jar.
>
> >> I did see some differences in the configuration between the JDK6 and the JDK7 builds:
> >> * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore
> >> * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't
> >>
> >> I've changed both builds to use -Dmaven.test.failure.ignore and -nsu,
> >> let's see how it goes.
>
> > You are solving the wrong problem.
>
> >>> It's about time we did the following:
> >>> 1) Any test failures should halt the build there and then. IOW, do not
> >>> continue the build at all.
> >>>
> >>> Will having 100 tests in one run and 2000 tests in another really help?
>
> >> As you disable randomly failing tests, and do not integrate commits making
> >> the testsuite fail, these numbers should even out.
> >>
> >> Not integrating commits that fail every time is easy; not integrating
> >> commits that fail randomly (maybe only in some environments) is trickier.
>
> > I know it's tricky, but the only thing we can do is disable those, really.
> > I don't see how keeping them enabled is helping at all.
>
> >>> 2) Any tests that fail randomly should be disabled.
> >>>
> >>> Let's go ahead and disable all the server tests then? ;)
>
> >> Those server tests that are randomly failing should be disabled and looked at.
> >> Those tests that are failing as a result of the container not starting are
> >> side effects of things not working properly, and these should not be disabled.
> >>
> >> Why treat the tests that are failing because of a build problem differently?
> >> What about the tests that fail only on IBM JDK6?
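[Editorial aside for readers following the -fn vs. -Dmaven.test.failure.ignore exchange above: the two switches tolerate different classes of failure. A hedged sketch of the invocations (flag semantics per the Maven and Surefire documentation; the surrounding project setup is assumed):

```shell
# --fail-never (-fn): the reactor keeps going after *any* module failure,
# including compile and packaging errors, so downstream modules can end up
# assembling against a half-built module (e.g. an empty classes/ directory
# where a jar was expected).
mvn clean install -fn

# maven.test.failure.ignore only tolerates Surefire *test* failures;
# compile and packaging errors still halt the build.
# -nsu / --no-snapshot-updates skips re-resolving remote SNAPSHOT dependencies.
mvn clean install -Dmaven.test.failure.ignore=true -nsu
```

That difference is consistent with what the thread observes: under -fn the failing modules also fail to produce their jars, while -Dmaven.test.failure.ignore leaves packaging intact.]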
> > Disable and indicate that the test fails on IBM JDK6. Once the issue is
> > fixed, reenable it.
>
> >>> Cheers,
> >>>
> >>> [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log
> >>>
> >>> On 04 Feb 2014, at 13:30, Galder Zamarreño wrote:
> >>>
> >>>> On 04 Feb 2014, at 10:38, Stuart Douglas wrote:
> >>>>
> >>>>> It is almost certainly something to do with this:
> >>>>>
> >>>>> src="${infinispan.server.modules.dir}">
> >>>>> artifact="infinispan-server-rest" classifier="classes" />
> >>>>>
> >>>>> I guess sometimes the classes artefact is being attached as a
> >>>>> reference to the classes directory, rather than a reference to a jar,
> >>>>> which causes the issue.
> >>>>
> >>>> Here's a gist with a subset of the build log [1]. When it works fine,
> >>>> it's copying a jar; when it's not, it's copying an empty folder.
> >>>>
> >>>> However, this is not only happening for the org.infinispan.server.rest
> >>>> module, others show the same issue [2]. What seems to be a pattern is
> >>>> that it only happens with modules that are built by us; it's not
> >>>> happening for modules coming with the base AS/WF instance.
> >>>>
> >>>> I've traced back and this might be due to build failures that are not
> >>>> producing the right jars [3].
> >>>>
> >>>> @Stuart, this is really our problem. Sorry for the inconvenience!
> >>>>
> >>>> [1] https://gist.github.com/galderz/b9286f385aad1316df51
> >>>> [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323
> >>>> [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c
> >>>>
> >>>>> Stuart
> >>>>>
> >>>>> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarreño wrote:
> >>>>>
> >>>>> On 04 Feb 2014, at 10:01, Stuart Douglas wrote:
> >>>>>
> >>>>>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas <stuart.w.douglas at gmail.com> wrote:
> >>>>>> Yes, there is nothing in the server code that modifies the modules directory.
> >>>>>>
> >>>>>> Well, except for the new patching stuff, but that is not really relevant here.
> >>>>>
> >>>>> The testsuite AS/WF builds are built out of the distribution build,
> >>>>> which shows the same problem. The distribution we build uses the
> >>>>> scripts we got from AS [1].
> >>>>>
> >>>>> Do you see anything in there that could be causing this? We are using
> >>>>> maven-antrun-plugin version 1.3, and take into account the lib.xml in [2].
> >>>>>
> >>>>> Finally, do you have any suggestions on changes we could make to
> >>>>> these files to further debug the issue?
> >>>>>
> >>>>> Thanks a lot for your help!
> >>>>>
> >>>>> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml
> >>>>> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml
> >>>>>
> >>>>>> Stuart
> >>>>>>
> >>>>>> Stuart
> >>>>>>
> >>>>>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarreño <galder at redhat.com> wrote:
> >>>>>>
> >>>>>> On 04 Feb 2014, at 09:37, Stuart Douglas <stuart.w.douglas at gmail.com> wrote:
> >>>>>>
> >>>>>>> This looks like an issue with your environment. The modules directory
> >>>>>>> is static. Wildfly does not contain any code that messes with it.
> >>>>>>> I would say the culprit is probably something in either your build
> >>>>>>> process or your test suite.
> >>>>>>
> >>>>>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used
> >>>>>> somewhere else). I guess your answer still applies?
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>>> Stuart
> >>>>>>>
> >>>>>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarreño <galder at redhat.com> wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> We're having issues with our Infinispan Server integration tests,
> >>>>>>> which run within Wildfly 8.0.0.Beta1 (as I'm typing I'm wondering if
> >>>>>>> we should just upgrade it to see if this goes away...?).
> >>>>>>>
> >>>>>>> Quite often some of the runs fail with error message [1].
> >>>>>>>
> >>>>>>> Having looked at the build environment when a run fails, you see this:
> >>>>>>>
> >>>>>>> --
> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main
> >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (<-- a directory??)
> >>>>>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index
> >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml
> >>>>>>>
> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes
> >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 .
> >>>>>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 ..
> >>>>>>>
> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml
> >>>>>>> name="org.infinispan.server.rest">
> >>>>>>> ...
> >>>>>>>
> >>>>>>> ...
> >>>>>>>
> >>>>>>> This is completely different to what happens with a successful run:
> >>>>>>>
> >>>>>>> --
> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main
> >>>>>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (<-- a jar file!)
> >>>>>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index
> >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml
> >>>>>>>
> >>>>>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders
> >>>>>>> org/infinispan/rest/configuration/ExtendedHeaders.class
> >>>>>>>
> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml
> >>>>>>> name="org.infinispan.server.rest">
> >>>>>>> ...
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance?
> >>>>>>>
> >>>>>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> --
> >>>>>>> Galder Zamarreño
> >>>>>>> galder at redhat.com
> >>>>>>> twitter.com/galderz
> >>>>>>>
> >>>>>>> Project Lead, Escalante
> >>>>>>> http://escalante.io
> >>>>>>>
> >>>>>>> Engineer, Infinispan
> >>>>>>> http://infinispan.org
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> jboss-as7-dev mailing list
> >>>>>>> jboss-as7-dev at lists.jboss.org
> >>>>>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev
> >>> _______________________________________________
> >>> infinispan-dev mailing list
> >>> infinispan-dev at lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Galder Zamarreño
> galder at redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/2166b5b2/attachment-0001.html

From mmarkus at redhat.com Wed Feb 5 07:42:34 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:42:34 +0000
Subject: [infinispan-dev] [jboss-as7-dev] Module jars disappearing leaving empty classes/ folders and errors
In-Reply-To:
References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com>
 <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com>
Message-ID: <29846662-5D58-474C-9492-A38355BE9D02@redhat.com>

On Feb 4, 2014, at 12:50 PM, Dan Berindei wrote:
>
> On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote:
> Narrowing down the list now, since this is a problem of how our CI is doing builds.
>
> These logs are retrieved from [1].
>
> Dunno how our CI is configured, but this is odd. Seems like the build is halted
> due to test failures, but it continues somehow? I mean, the jars are not being
> produced properly, but the build is not halting.
>
> We run the build with -fn (fail-never), so the build should never be halted
> because of a test failure. The configuration is here:
> http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1
>
> It's about time we did the following:
> 1) Any test failures should halt the build there and then. IOW, do not continue the build at all.
>
> Will having 100 tests in one run and 2000 tests in another really help?
>
> 2) Any tests that fail randomly should be disabled.

Doing this in the past didn't seem to help: tests were disabled and never
re-enabled again. IMO we should fight to get the suite green, and then any
intermittent failure should be considered a blocker and treated as the highest prio.
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 07:43:55 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:43:55 +0000
Subject: [infinispan-dev] [jboss-as7-dev] Module jars disappearing leaving empty classes/ folders and errors
In-Reply-To: <7D6D9D4B-8023-4A04-B946-8CB90640319F@redhat.com>
References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com>
 <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com>
 <7D6D9D4B-8023-4A04-B946-8CB90640319F@redhat.com>
Message-ID:

On Feb 4, 2014, at 12:52 PM, Galder Zamarreño wrote:
>
> On 04 Feb 2014, at 13:36, Galder Zamarreño wrote:
>
>> Narrowing down the list now, since this is a problem of how our CI is doing builds.
>>
>> These logs are retrieved from [1].
>>
>> Dunno how our CI is configured, but this is odd. Seems like the build is halted
>> due to test failures, but it continues somehow? I mean, the jars are not being
>> produced properly, but the build is not halting.
>>
>> It's about time we did the following:
>> 1) Any test failures should halt the build there and then. IOW, do not continue the build at all.
>> 2) Any tests that fail randomly should be disabled.
>
> Having had to debug through this, I can certainly understand Sanne's frustration,
> and as server component lead, I'm not going to bother looking at any CI builds
> until all modules that server modules depend on are green and their testsuites
> are passing.
+1

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 07:44:50 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:44:50 +0000
Subject: [infinispan-dev] L1OnRehash Discussion
In-Reply-To:
References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com>
Message-ID: <9C3EB525-3C12-45B1-B278-702B544BABDF@redhat.com>

On Feb 4, 2014, at 11:04 AM, Dan Berindei wrote:
> On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño wrote:
> > On 28 Jan 2014, at 15:29, William Burns wrote:
> >
> > > Hello everyone,
> > >
> > > I wanted to discuss what I would say is the dubious benefit of L1OnRehash,
> > > especially when compared to the complexity it introduces.
> > >
> > > L1OnRehash is used to retain a value by moving a previously owned value
> > > into the L1 when a rehash occurs and this node no longer owns that value.
> > > Also, any current L1 values are removed when a rehash occurs. Therefore it
> > > can only save a single remote get, for only a few keys, when a rehash occurs.
> > >
> > > This by itself is fine; however, L1OnRehash has many edge cases to guarantee
> > > consistency, as can be seen from https://issues.jboss.org/browse/ISPN-3838.
> > > This can get quite complicated for a feature that gives marginal performance
> > > increases (especially given that this value may never have been read
> > > recently - at least normal L1 usage guarantees this).
> > >
> > > My first suggestion is instead to deprecate the L1OnRehash configuration
> > > option and to remove this logic.
> > +1
>
> +1 from me as well

+1

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 07:55:20 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:55:20 +0000
Subject: [infinispan-dev] Store as binary
In-Reply-To:
References: <52D92AC4.7080701@redhat.com> <52DCF101.3020903@infinispan.org>
 <87020416-72D3-412E-818B-A7F9161355CC@redhat.com> <52DCF70C.4090404@infinispan.org>
 <52DD4534.7080209@redhat.com> <68B26C2A-389B-4C0A-A3C6-DBE3B0526DAC@redhat.com>
Message-ID:

On Feb 4, 2014, at 7:14 AM, Galder Zamarreño wrote:
> On 21 Jan 2014, at 17:45, Mircea Markus wrote:
>
>> On Jan 21, 2014, at 2:13 PM, Sanne Grinovero wrote:
>>
>>> On 21 January 2014 13:37, Mircea Markus wrote:
>>>>
>>>> On Jan 21, 2014, at 1:21 PM, Galder Zamarreño wrote:
>>>>
>>>>>> What's the point for these tests?
>>>>>
>>>>> +1
>>>>
>>>> To validate whether storing the data in binary format yields better
>>>> performance than storing it as a POJO.
>>>
>>> That will highly depend on the scenarios you want to test for. AFAIK
>>> this started after Paul described how session replication works in
>>> WildFly, and we already know that both strategies are suboptimal with
>>> the current options available: in his case the active node will always
>>> write on the POJO, while the backup node will essentially only need to
>>> store the buffer "just in case" it might need to take over.
>>
>> Indeed, as it is today it doesn't make sense for WildFly's session replication.
>>
>>> Sure, one will be slower, but if you want to make a suggestion to him
>>> about which configuration he should be using, we should measure his
>>> use case, not a different one.
>>>
>>> Even then, as discussed in Palma, an in-memory String representation
>>> might be way more compact because of pooling of strings and a very
>>> high likelihood of repeated headers (as common in web frameworks),
>>
>> pooling like in String.intern()?
>> Even so, if most of your access to the String is to serialize it and send it
>> remotely, then you have a serialization cost (CPU) to pay for the reduced size.
>
> Serialization has a cost, but nothing compared with the transport itself, and
> you don't have to go very far to see the impact of transport. Just recently we
> were chasing some performance regression and even though there were some
> changes in serialization, the impact of my improvements was minimal, max 2-3%.
> Optimal network and transport configuration is more important IMO, and once
> again, misconfiguration in that layer is what was causing us to be ~20% slower.

Yes, I didn't expect huge improvements from storeAsBinary, but at least some
improvement caused by the fact that lots of serialization shouldn't happen in
the tested scenario. A 2-3% improvement wouldn't hurt, though :-)

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mudokonman at gmail.com Wed Feb 5 08:19:03 2014
From: mudokonman at gmail.com (William Burns)
Date: Wed, 5 Feb 2014 08:19:03 -0500
Subject: [infinispan-dev] L1OnRehash Discussion
In-Reply-To:
References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com>
Message-ID:

On Tue, Feb 4, 2014 at 6:04 AM, Dan Berindei wrote:
>
> On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño wrote:
>>
>> On 28 Jan 2014, at 15:29, William Burns wrote:
>>
>> > Hello everyone,
>> >
>> > I wanted to discuss what I would say is the dubious benefit of L1OnRehash,
>> > especially when compared to the complexity it introduces.
>> >
>> > L1OnRehash is used to retain a value by moving a previously owned value
>> > into the L1 when a rehash occurs and this node no longer owns that value.
>> > Also, any current L1 values are removed when a rehash occurs. Therefore it
>> > can only save a single remote get, for only a few keys, when a rehash occurs.
>> > This by itself is fine; however, L1OnRehash has many edge cases to
>> > guarantee consistency, as can be seen from
>> > https://issues.jboss.org/browse/ISPN-3838. This can get quite
>> > complicated for a feature that gives marginal performance increases
>> > (especially given that this value may never have been read recently -
>> > at least normal L1 usage guarantees this).
>> >
>> > My first suggestion is instead to deprecate the L1OnRehash
>> > configuration option and to remove this logic.
>>
>> +1
>
> +1 from me as well
>
>> > My second suggestion is a new implementation of L1OnRehash that is
>> > always enabled when the L1 threshold is configured to 0. For those not
>> > familiar, the L1 threshold controls whether invalidations are broadcast
>> > instead of sent as individual messages. A value of 0 means to always
>> > broadcast. This would allow for some benefits that we can't currently get:
>> >
>> > 1. L1 values would never have to be invalidated on a rehash event
>> >    (guaranteeing locality of reads under rehash)
>> > 2. L1 requestors would not have to be tracked any longer
>> >
>> > However, every write would be required to send an invalidation, which
>> > could slow write performance in additional cases (since we currently
>> > only send invalidations when requestors are found). The difference
>> > would be lessened with UDP, which is the transport I would assume
>> > someone would use when configuring the L1 threshold to 0.
>>
>> Sounds good to me, but I think you could go even beyond this and maybe get
>> rid of the threshold configuration option too?
>>
>> If the transport is UDP and multicast is configured, invalidations are
>> broadcast (and the two benefits you mention apply).
>> If UDP w/ unicast or TCP is used, track requestors and send invalidations
>> as unicasts.
>>
>> Do we really need to expose these configuration options to the user?
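[Editorial aside: the two knobs under discussion live on the `<l1>` element of a distributed cache. A hypothetical configuration sketch, with attribute names as in the Infinispan 5.x/6.x XML schema (double-check against the schema of the version in use):

```xml
<namedCache name="dist">
   <clustering mode="distribution">
      <!-- onRehash is the feature proposed for deprecation above.
           invalidationThreshold="0" means invalidations are always
           multicast instead of unicast to tracked requestors. -->
      <l1 enabled="true" lifespan="600000"
          onRehash="false" invalidationThreshold="0"/>
   </clustering>
</namedCache>
```
]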
> I think the idea was that even with UDP, sending 2 unicasts and waiting for
> only 2 responses may be faster than sending a multicast and waiting for 10
> responses. However, I'm not sure that's the case if we send 1 unicast
> invalidation from each owner instead of a single multicast invalidation from
> the primary owner/originator [1]. Maybe if each owner would return a list of
> requestors and the originator would do the invalidation at the end...

I totally agree, since we currently have to send invalidations from the primary
owner and all backup owners to guarantee consistency if we have a response from
the backup owner [2]. By moving to this route we only ever have to send a
single multicast invalidation instead of N unicast invalidations. However, this
also brings up another change where we only L1 cache the primary owner
response [3] :) Actually, that would tilt the performance discussion the other
way. Makes me think deprecating the current L1OnRehash and adding primary owner
L1 caching should come first, and then we can reevaluate whether the new
L1OnRehash support is even needed.

The originator firing the invalidations is interesting, but I don't think it is
feasible. With async transport this is not doable at all. Also, if the
originator goes down and the value is persisted, we will still have invalid L1
values cached. The latter could be fixed with txs, but non-tx would still be
broken.

> One tangible benefit of having the setting is that we can run the test suite
> with TCP only, and still cover every path in L1Manager. If we removed it
> completely, it would still be possible to change the toggle in L1ManagerImpl
> via reflection, but it would be a little hacky.
>
>> > What do you guys think? I am thinking that no one minds the removal of
>> > L1OnRehash that we have currently (if so let me know). I am quite curious
>> > what others think about the changes for the L1 threshold value of 0 -
>> > maybe this configuration value is never used?
> Since we don't give any guidance as to what a good threshold value would be,
> I doubt many people use it.
>
> My alternative proposal would be to replace the invalidationThreshold=-1|0|>0
> setting with a traceRequestors=true|false setting.
> 1. If traceRequestors == false, don't keep track of requestors, only send
>    the invalidation from the originator, and enable l1OnRehash.
>    This means we can keep the entries that are in L1 after a rehash as well.
> 2. If traceRequestors == true, track requestors, send unicast/multicast
>    invalidations depending on the transport, and disable l1OnRehash.

I have to admit I am struggling with whether we even need this configuration
option anymore, or whether we should just enable requestor tracking based
solely on the transport configuration. I do like the option though, especially
if we find out not tracking requestors is faster. The default value, though,
would be based on whether the transport allows for multicast or not.

> [1] https://issues.jboss.org/browse/ISPN-186

[2] https://issues.jboss.org/browse/ISPN-3648
[3] https://issues.jboss.org/browse/ISPN-3684

> Cheers
> Dan
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From mmarkus at redhat.com Wed Feb 5 09:15:41 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 14:15:41 +0000
Subject: [infinispan-dev] reusing infinispan's marshalling
In-Reply-To: <52EA41E0.2010505@redhat.com>
References: <52EA41E0.2010505@redhat.com>
Message-ID: <796469E7-CB0C-4E96-97BC-81D74D48D51E@redhat.com>

One way to do it is to use a distributed cache with two different marshallers:
JBMAR and ProtoStream. Admittedly this won't measure only the serialisation
performance, but will include other stuff as well, such as network time
(I guess you can remove this from the result though).
This way we would get a better understanding of how the two marshallers affect
the performance of the system as a whole. Also, if using RadarGun, you could
get more info around how much CPU time is used by each scenario.

On Jan 30, 2014, at 12:13 PM, Adrian Nistor wrote:
> I've been pondering about re-using the marshalling machinery of
> Infinispan in another project, specifically in ProtoStream, where I'm
> planning to add it as a test scoped dependency so I can create a
> benchmark to compare marshalling performance. I'm basically interested
> in comparing ProtoStream and Infinispan's JBoss Marshalling based
> mechanism. Comparing against plain JBMAR, without using the
> ExternalizerTable and Externalizers introduced by Infinispan, is not
> going to get me accurate results.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 09:28:00 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 14:28:00 +0000
Subject: [infinispan-dev] reusing infinispan's marshalling
In-Reply-To: <1824630C-1D48-480A-8687-E563A54E7E6A@redhat.com>
References: <52EA41E0.2010505@redhat.com> <1824630C-1D48-480A-8687-E563A54E7E6A@redhat.com>
Message-ID: <91EB93BD-0133-44BF-AB64-F57948621CBC@redhat.com>

On Feb 3, 2014, at 6:24 PM, Galder Zamarreño wrote:
> Not sure I understand the need to compare this.
>
> JBMAR and ProtoStream are solving different problems. The former is focused
> on getting the best out of Java persistence. The latter is focused on
> serializing stuff in a platform-independent way.
>
> IMO, it's not an apples to apples comparison.
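[Editorial aside: Mircea's two-marshaller benchmark idea above would be wired up through the global serialization configuration. A hypothetical sketch only - the builder calls follow the Infinispan 6.x configuration API, and the ProtoStream-backed Marshaller implementation is an assumption that would have to be supplied separately:

```java
import org.infinispan.commons.marshall.Marshaller;
import org.infinispan.commons.marshall.jboss.GenericJBossMarshaller;
import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class MarshallerBench {

   // Build a cache manager whose remote payloads go through the given
   // marshaller; run the same workload once per marshaller and compare.
   static DefaultCacheManager managerWith(Marshaller marshaller) {
      GlobalConfiguration global = new GlobalConfigurationBuilder()
            .clusteredDefault()
            .serialization().marshaller(marshaller)
            .build();
      return new DefaultCacheManager(global);
   }

   public static void main(String[] args) {
      // Run 1: JBoss Marshalling (the externalizer-based default machinery).
      DefaultCacheManager jbmar = managerWith(new GenericJBossMarshaller());
      // Run 2: a hypothetical ProtoStream-backed Marshaller would go here.
      jbmar.stop();
   }
}
```

As the thread notes, this measures the whole system (network included), so CPU profiling per run is needed to isolate the marshalling cost itself.]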
My expectation is that ISPN+protostram will be faster than ISPN+JBMAR because: - protostream doesn't track circular references (AFAIK this is something that can be disabled in JBMAR as well) - protostream allows for partial deserialization, that is only deserialize a specific attribute of a class On top of that, it is platform independent, so if you start using it as the default serialization format, it will be easier for you to use ISPN from multiple platforms. The drawback protostream has over JBMAR is that it requires one to define, besides the serialized, a protofile. Las time we discussed, Adrian had some ideas on how that can be circumvented, though. IMO, in certain deployments makes sense to use protostream over JBMAR even when serializing only java objects and this benchmark would be a good tool to validate that. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 5 09:38:38 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 14:38:38 +0000 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: >> >> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >> >>> Hello all, >>> >>> I have been working with notifications and most recently I have come >>> to look into events generated when a new entry is created. Now >>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>> However we currently raise a CacheEntryModifiedEvent event and then a >>> CacheEntryCreatedEvent. I notice that there are comments around the >>> code saying that tests require both to be fired. >> >> it doesn't sound right to me: modified is different than created. 
> > I've lost count the number of times I've raised this up in the dev mailing list :| > > And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. > > >> > >>> > >>> I am wondering if anyone has an objection to only raising a > >>> CacheEntryCreatedEvent on a new cache entry being created. > > It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. we're at a major now, so we should break compatibility if it makes sense. > > Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. Not sure I understand: JCache raises both a "created" and a "modified" event when an entry is created? Or just "created" events?
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From galder at redhat.com Wed Feb 5 09:40:31 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 5 Feb 2014 15:40:31 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: <9289487C-EF37-4A9B-9491-69787F32AC4D@redhat.com> On 03 Feb 2014, at 17:29, William Burns wrote: > On Mon, Feb 3, 2014 at 11:07 AM, Galder Zamarre?o wrote: >> >> On 23 Jan 2014, at 18:54, Mircea Markus wrote: >> >>> >>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>> >>>> Hello all, >>>> >>>> I have been working with notifications and most recently I have come >>>> to look into events generated when a new entry is created. Now >>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>> code saying that tests require both to be fired. >>> >>> it doesn't sound right to me: modified is different than created. >> >> I've lost count the number of times I've raised this up in the dev mailing list :| >> >> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p > > Ah nice I didn't even notice the method until you pointed it out. > >> >>> >>>> >>>> I am wondering if anyone has an objection to only raising a >>>> CacheEntryCreatedEvent on a new cache entry being created. >> >> It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. > > I agree. 
Maybe I should change the question to whether anyone minds if Cluster Listeners > only raise the CacheEntryModifiedEvent on an entry creation for > cluster listeners instead? This wouldn't break existing assumptions > since we don't currently support Cluster Listeners. The only thing is > it wouldn't be consistent with regular listeners? Yeah, it's a tricky one. You don't wanna raise both cos that'd be expensive to ship it around for no extra gain. If you are going to choose one that'd be CacheEntryModifiedEvent indeed. I think we can break off here for clustered listeners specifying it clearly. I don't think there's much point in creating a new set of listeners/event/annotations for the clustered option since eventually we should move towards JCache listeners and only have custom ones for the extra stuff we provide callbacks for. > > >> >> Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. > > Just to be clear you are saying that JCache only raises a single event > for change and create right? Yeah, see JCacheListenerAdapter class. > >> >>>> Does >>>> anyone know why we raise both currently? >> >> Legacy really. >> >>>> Was it just so the >>>> PutKeyValueCommand could more ignorantly just raise the >>>> CacheEntryModified pre Event? >>>> >>>> Any input would be appreciated, Thanks.
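The single-event pattern proposed above for cluster listeners can be illustrated with a small stand-alone model (plain classes standing in for the real Infinispan event interfaces): one "modified" event type carries a created flag, so a single notification serves both cases without shipping two events around the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for CacheEntryModifiedEvent: one event type, with a flag telling
// whether the modification was in fact an insert.
class ModifiedEvent {
    final String key;
    final Object value;
    final boolean created;
    ModifiedEvent(String key, Object value, boolean created) {
        this.key = key; this.value = value; this.created = created;
    }
    boolean isCreated() { return created; }
}

public class ListenerSketch {
    final List<String> log = new ArrayList<>();

    // A listener that distinguishes create vs update from the single event,
    // the pattern proposed for cluster listeners in the thread above.
    void onModified(ModifiedEvent e) {
        log.add((e.isCreated() ? "created " : "updated ") + e.key);
    }

    public static void main(String[] args) {
        ListenerSketch l = new ListenerSketch();
        l.onModified(new ModifiedEvent("k", "v1", true));   // first put -> create
        l.onModified(new ModifiedEvent("k", "v2", false));  // second put -> update
        System.out.println(l.log); // [created k, updated k]
    }
}
```

The class and field names here are illustrative only; the real API surface is whatever the notification SPI ends up exposing.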
>>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Wed Feb 5 09:43:34 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 5 Feb 2014 14:43:34 +0000 Subject: [infinispan-dev] L1OnRehash Discussion In-Reply-To: References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> Message-ID: I'm all for simplification, assuming that this will deliver better reliability and easier maintenance, but let's not forget that some entries might be actually large. Saving a couple of transfers might be a pointless complexity for our usual small-key tests but maybe it's an interesting feature when you store gigabytes per value. Also, performance "hiccups" are not desirable even in small-key scenarios: an often read key should stay where it is rather than needing an occasional RPC. I haven't looked into the details of your problem, so if you think it's too complex I'm not against ditching this, I'm just trying to make sure we evaluate the full picture. 
I think you made a great point when specifying that the entry remaining in place might actually not get any hit - so keeping it would be pointless - but shouldn't that be a decision the eviction strategy is able to handle? Cheers, Sanne On 5 February 2014 13:19, William Burns wrote: > On Tue, Feb 4, 2014 at 6:04 AM, Dan Berindei wrote: >> >> >> >> On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño wrote: >>> >>> >>> On 28 Jan 2014, at 15:29, William Burns wrote: >>> >>> > Hello everyone, >>> > >>> > I wanted to discuss what I would say as dubious benefit of L1OnRehash >>> > especially compared to the benefits it provides. >>> > >>> > L1OnRehash is used to retain a value by moving a previously owned >>> > value into the L1 when a rehash occurs and this node no longer owns >>> > that value. Also, any current L1 values are removed when a rehash >>> > occurs. Therefore it can only save a single remote get for only a few >>> > keys when a rehash occurs. >>> > >>> > This by itself is fine; however, L1OnRehash has many edge cases to >>> > guarantee consistency as can be seen from >>> > https://issues.jboss.org/browse/ISPN-3838. This can get quite >>> > complicated for a feature that gives marginal performance increases >>> > (especially given that this value may never have been read recently - >>> > at least normal L1 usage guarantees this). >>> > >>> > My first suggestion is instead to deprecate the L1OnRehash >>> > configuration option and to remove this logic. >>> >>> +1 >> >> >> +1 from me as well >> >>> >>> >>> > My second suggestion is a new implementation of L1OnRehash that is >>> > always enabled when L1 threshold is configured to 0. For those not >>> > familiar, L1 threshold controls whether invalidations are broadcasted >>> > instead of individual messages. A value of 0 means to always >>> > broadcast. This would allow for some benefits that we can't currently >>> > do: >>> > >>> > 1.
L1 values would never have to be invalidated on a rehash event >>> > (guarantee locality reads under rehash) >>> > 2. L1 requestors would not have to be tracked any longer >>> > >>> > However every write would be required to send an invalidation which >>> > could slow write performance in additional cases (since we currently >>> > only send invalidations when requestors are found). The difference >>> > would be lessened with udp, which is the transport I would assume >>> > someone would use when configuring L1 threshold to 0. >>> >>> Sounds good to me, but I think you could go even beyond this and maybe get >>> rid of threshold configuration option too? >>> >>> If the transport is UDP and multicast is configured, invalidations are >>> broadcasted (and apply the two benefits you mention). >>> If UDP w/ unicast or TCP used, track invalidations and send them as >>> unicasts. >>> >>> Do we really need to expose these configuration options to the user? >> >> >> I think the idea was that even with UDP, sending 2 unicasts and waiting for >> only 2 responses may be faster than sending a multicast and waiting for 10 >> responses. However, I'm not sure that's the case if we send 1 unicast >> invalidation from each owner instead of a single multicast invalidation from >> the primary owner/originator [1]. Maybe if each owner would return a list of >> requestors and the originator would do the invalidation at the end... > > I totally agree since we currently have to send invalidations from the > primary owner and all backup owners to guarantee consistency if we > have a response from the backup owner [2]. By moving to this route we > only ever have to send a single multicast invalidation instead of N > unicast invalidations. However this also brings up another change > where we only L1 cache the primary owner response [3] :) Actually that > would tilt the performance discussion the other way. 
Makes me think > deprecating current L1OnRehash and adding primary owner L1 caching > should be first and then reevaluate if the new L1OnRehash support is > even needed. > > The originator firing the invalidations is interesting, but don't > think it is feasible. With async transport this is not doable at all. > Also if the originator goes down and the value is persisted we will > have invalid L1 values cached still. The latter could be fixed with > txs but non tx would still be broken. > >> >> One tangible benefit of having the setting is that we can run the test suite >> with TCP only, and still cover every path in L1Manager. If removed it >> completely, it would still be possible to change the toggle in L1ManagerImpl >> via reflection, but it would be a little hacky. >> >>> >>> > What do you guys think? I am thinking that no one minds the removal >>> > of L1OnRehash that we have currently (if so let me know). I am quite >>> > curious what others think about the changes for L1 threshold value of >>> > 0, maybe this configuration value is never used? >>> > >> >> >> Since we don't give any guidance as to what a good threshold value would be, >> I doubt many people use it. >> >> My alternative proposal would be to replace the >> invalidationThreshold=-1|0|>0 setting with a traceRequestors=true|false >> setting. >> 1. If traceRequestors == false, don't keep track of requestors, only send >> the invalidation from the originator, and enable l1OnRehash. >> This means we can keep the entries that are in L1 after a rehash as >> well. >> 2. If traceRequestors == true, track requestors, send unicast/multicast >> invalidations depending on the transport, and disable l1OnRehash. > > I have to admit I am struggling with whether we even need this > configuration option anymore and just solely enable requestors based > on the transport configuration. I do like the option though, > especially if we find out not tracking requestors is faster. 
The > default value though would be based on whether the transport allows > for multicast or not. > >> >> >> [1] https://issues.jboss.org/browse/ISPN-186 > > [2] https://issues.jboss.org/browse/ISPN-3648 > [3] https://issues.jboss.org/browse/ISPN-3684 > >> >> Cheers >> Dan >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev. > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Wed Feb 5 10:03:50 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 5 Feb 2014 16:03:50 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> Message-ID: On 05 Feb 2014, at 15:38, Mircea Markus wrote: > > On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: > >>> >>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>> >>>> Hello all, >>>> >>>> I have been working with notifications and most recently I have come >>>> to look into events generated when a new entry is created. Now >>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>> code saying that tests require both to be fired. >>> >>> it doesn't sound right to me: modified is different than created. 
>> >> I've lost count the number of times I've raised this up in the dev mailing list :| >> >> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p > > Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. -1. As already mentioned, the reason why we've never tackled this problem is cos of JCache, which gets listeners right in this area. JCache is about to go final and people should start moving towards that. Redoing our listeners would be a waste of time IMO. You'd be doing some work to fix something people should stop using in the near-to-medium future. > >> >>> >>>> >>>> I am wondering if anyone has an objection to only raising a >>>> CacheEntryCreatedEvent on a new cache entry being created. >> >> It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. > > we're at a major now, so we should break compatibility if it makes sense. > >> >> Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. > > Not sure I understand: JCache raises both a "created" and a "modified" event when an entry is created? Or just "created" events?
> > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 5 10:05:34 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 5 Feb 2014 16:05:34 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> Message-ID: On 05 Feb 2014, at 15:38, Mircea Markus wrote: > > On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: > >>> >>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>> >>>> Hello all, >>>> >>>> I have been working with notifications and most recently I have come >>>> to look into events generated when a new entry is created. Now >>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>> code saying that tests require both to be fired. >>> >>> it doesn't sound right to me: modified is different than created. >> >> I?ve lost count the number of times I?ve raised this up in the dev mailing list :| >> >> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. 
Just need to trace back the jira issue number, and associated forum threads ;) :p > > Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. > >> >>> >>>> >>>> I am wondering if anyone has an objection to only raising a >>>> CacheEntryCreatedEvent on a new cache entry being created. >> >> It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. > > we're at a major now, so we should break compatibility if it makes sense. > >> >> Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. > > Not sure I understand: JCache raises both a "created" and a "modified" event when an entry is created? Or just "created" events? JCache differentiates between an entry being created vs being updated, and hence it sends different events depending on which case it is. See JCacheListenerAdapter and JCacheListenerNotifier classes in our JCache impl.
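The JCache behaviour described above - exactly one event per put, created when no previous value existed and updated otherwise, never both - can be modeled in a few lines of plain Java (a sketch of the adapter's dispatch logic, not the actual javax.cache API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal model of the JCache behaviour described above: a put raises exactly
// one event - CREATED when no previous value existed, UPDATED otherwise -
// never the legacy modified-then-created pair.
public class OneEventPerPut {
    enum Kind { CREATED, UPDATED }

    final Map<String, String> store = new HashMap<>();
    final List<String> events = new ArrayList<>();

    void put(String key, String value) {
        String prev = store.put(key, value);        // null means the entry is new
        Kind kind = (prev == null) ? Kind.CREATED : Kind.UPDATED;
        events.add(kind + ":" + key);
    }

    public static void main(String[] args) {
        OneEventPerPut cache = new OneEventPerPut();
        cache.put("k", "v1");
        cache.put("k", "v2");
        System.out.println(cache.events); // [CREATED:k, UPDATED:k]
    }
}
```

Whether the event arrives as two listener interfaces (JCache style) or one event with an isCreated() flag is then purely a surface question; the dispatch decision is the same.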
> > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From mmarkus at redhat.com Wed Feb 5 10:40:41 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 15:40:41 +0000 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> Message-ID: <49A1AD3E-A626-4FB7-A415-0398E3DC65D7@redhat.com> On Feb 5, 2014, at 3:03 PM, Galder Zamarre?o wrote: > On 05 Feb 2014, at 15:38, Mircea Markus wrote: > >> >> On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: >> >>>> >>>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>>> >>>>> Hello all, >>>>> >>>>> I have been working with notifications and most recently I have come >>>>> to look into events generated when a new entry is created. Now >>>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>>> code saying that tests require both to be fired. >>>> >>>> it doesn't sound right to me: modified is different than created. >>> >>> I?ve lost count the number of times I?ve raised this up in the dev mailing list :| >>> >>> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. 
Just need to trace back the jira issue number, and associated forum threads ;) :p >> >> Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. > > -1. > > As already mentioned, the reason why we've never tackled this problem is cos of JCache, which gets listeners right in this area. JCache is about to go final and people should start moving towards that. Redoing our listeners would be a waste of time IMO. The effort here is minimal, pretty much adding an if statement. The good thing though is that you won't have to raise this on the mailing list again :-) > You'd be doing some work to fix something people should stop using in the near-to-medium future. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 5 10:53:38 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 15:53:38 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> Message-ID: <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. sad because of the increased index size? > I was already unhappy when I had to do it for class names.
Renaming a cache will be a heavy operation too. > Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? > > BTW, this discussion should be in the open. +1 > > On 31 janv. 2014, at 18:04, Adrian Nistor wrote: > >> I think it conceptually makes sense to have one entity type per cache but this should be a good practice rather than an enforced constraint. It would be a bit late and difficult to add such a constraint now. >> >> The design change we are talking about is being able to search across caches. That can easily be implemented regardless of this. We can move the SearchManager from Cache scope to CacheManager scope. Indexes are bound to types not to caches anyway, so same-type entities from multiple caches can end up in the same index, we just need to store an extra hidden field: the name of the originating cache. This move would also allow us to share some lucene/hsearch resources. >> >> We can easily continue to support Search.getSearchManager(cache) so old API usages continue to work. This would return a delegating/decorating SearchManager that creates queries that are automatically restricted to the scope of the given cache. >> >> Piece of cake? :) >> >> >> >> On Thu, Jan 30, 2014 at 9:56 PM, Mircea Markus wrote: >> curious to see your thoughts on this: it is a recurring topic and will affect the way we design things in the future in a significant way. >> E.g. if we think (recommend) that a distinct cache should be used for each entity, then we'll need querying to work between caches. Also some cache stores can be built along these lines (e.g. for the JPA cache store we only need it to support a single entity type).
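Adrian's proposal above - one CacheManager-scoped index carrying a hidden originating-cache field, plus a delegating per-cache view - can be modeled with plain collections. This is a sketch of the scoping idea only (the types and method names are hypothetical, not the real SearchManager API):

```java
import java.util.ArrayList;
import java.util.List;

// Shared "index" at CacheManager scope: every document carries the name of
// its originating cache - the extra hidden field proposed above.
public class SharedIndexSketch {
    static final class Doc {
        final String cacheName, type, value;
        Doc(String cacheName, String type, String value) {
            this.cacheName = cacheName; this.type = type; this.value = value;
        }
    }

    private final List<Doc> index = new ArrayList<>();

    public void add(String cacheName, String type, String value) {
        index.add(new Doc(cacheName, type, value));
    }

    // CacheManager-scoped query: matches entities of a type across all caches.
    public List<Doc> query(String type) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : index) if (d.type.equals(type)) hits.add(d);
        return hits;
    }

    // Decorating, cache-scoped view: same index, but results are restricted
    // to one cache - preserving the old Search.getSearchManager(cache) shape.
    public List<Doc> queryScoped(String cacheName, String type) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : query(type)) if (d.cacheName.equals(cacheName)) hits.add(d);
        return hits;
    }

    public static void main(String[] args) {
        SharedIndexSketch idx = new SharedIndexSketch();
        idx.add("cars", "Car", "green");
        idx.add("archive", "Car", "red");
        System.out.println(idx.query("Car").size());               // 2: cross-cache
        System.out.println(idx.queryScoped("cars", "Car").size()); // 1: cache-scoped
    }
}
```

In the real implementation the cache-name restriction would presumably be a filter clause on the Lucene query rather than a post-filter, but the delegation pattern is the same.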
>> >> Begin forwarded message: >> >> > On Jan 30, 2014, at 9:42 AM, Galder Zamarre?o wrote: >> > >> >> >> >> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >> >> >> >>> >> >>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >> >>> >> >>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches right? Otherwise I am not fully understanding why they ask for a unified query. >> >>>> Do you have written detailed use cases somewhere for me to better understand what is really requested? >> >>> >> >>> IMO from a user perspective, being able to run queries spreading several caches makes the programming simplifies the programming model: each cache corresponding to a single entity type, with potentially different configuration. >> >> >> >> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. >> > >> > Not sure I follow: having a cache that contains both Cars and Persons sound more cluttering to me. I think it's cumbersome to write any kind of querying with an heterogenous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only it is harder to write, but discourages code reuse and makes it hard to maintain (if you'll add Pets in the same cache in future you need to update the M/R code as well). And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expiry etc): mixing everything together in the same cache from the begging is a design decision that might bite you in the future. >> > >> > The way I see it - and very curious to see your opinion on this - following an database analogy, the CacheManager corresponds to an Database and the Cache to a Table. 
Hence my thought that queries spreading multiple caches are both useful and needed (same as query spreading over multiple tables). >> > >> > >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From emmanuel at hibernate.org Wed Feb 5 11:30:32 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 5 Feb 2014 17:30:32 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> Message-ID: <20140205163032.GB93108@hibernate.org> On Wed 2014-02-05 15:53, Mircea Markus wrote: > > On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > > > Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > > Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)

//some unified query giving me entries pointing by fk copy to bar and
//buz objects. So I need to manually load these references.

//happy emmanuel
Cache unifiedCache = cacheManager.getMotherOfAllCaches();
Bar bar = unifiedCache.get(foo);
Buz buz = unifiedCache.get(baz);

//not so happy emmanuel
Cache fooCache = cacheManager.getCache("foo");
Bar bar = fooCache.get(foo);
Cache bazCache = cacheManager.getCache("baz");
Buz buz = bazCache.get(baz);

> > > > > I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. > > sad because of the increased index size? It makes the index non-natural and less reusable using direct Lucene APIs. But that might be less of a concern for Infinispan.
> > > I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. > > Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? From rvansa at redhat.com Wed Feb 5 11:44:59 2014 From: rvansa at redhat.com (Radim Vansa) Date: Wed, 05 Feb 2014 17:44:59 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205163032.GB93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: <52F26A8B.60306@redhat.com> On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: > On Wed 2014-02-05 15:53, Mircea Markus wrote: >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >> >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > //some unified query giving me entries pointing by fk copy to bar and > //buz objects. So I need to manually load these references. > > //happy emmanuel > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > Bar bar = unifiedCache.get(foo); > Buz buz = unifiedCache.get(baz); > > //not so happy emmanuel > Cache fooCache = cacheManager.getCache("foo"); > Bar bar = fooCache.get(foo); > Cache bazCache = cacheManager.getCache("baz"); > Buz buz = bazCache.get(baz);

cacheManager.getCache("foo").put("xxx", "yyy");
cacheManager.getCache("bar").put("xxx", "zzz");

String xxx = cacheManager.getMotherOfAllCaches().get("xxx");
System.out.println(xxx);

What should it print? Should an exception be thrown?
Or should get on mother of all caches return Map<Cache, String>? Radim > > >>> I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. >> sad because of the increased index size? > It makes the index non-natural and less reusable using direct Lucene > APIs. But that might be less of a concern for Infinispan. > >>> I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. >>> Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From mmarkus at redhat.com Wed Feb 5 11:59:41 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 16:59:41 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205163032.GB93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: <066E5871-8FAD-4F89-9705-767B1BC41037@redhat.com> On Feb 5, 2014, at 4:30 PM, Emmanuel Bernard wrote: > On Wed 2014-02-05 15:53, Mircea Markus wrote: >> >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >> >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >> >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > > //some unified query giving me entries pointing by fk copy to bar and > //buz objects. So I need to manually load these references.
> > //happy emmanuel > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > Bar bar = unifiedCache.get(foo); > Buz buz = unifiedCache.get(baz); Can you please elaborate the advantages the mother of all caches would bring? :-) But it feels to me like querying a whole database by a primary key without mentioning the table name :-) Also might get nasty if multiple caches have the same key. > //not so happy emmanuel > Cache fooCache = cacheManager.getCache("foo"); > Bar bar = fooCache.get(foo); > Cache bazCache = cacheManager.getCache("baz"); > Buz buz = bazCache.get(baz); Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From emmanuel at hibernate.org Wed Feb 5 14:34:45 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 5 Feb 2014 20:34:45 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <52F26A8B.60306@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> Message-ID: <20140205193445.GC93108@hibernate.org> On Wed 2014-02-05 17:44, Radim Vansa wrote: > On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: > > On Wed 2014-02-05 15:53, Mircea Markus wrote: > >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > >> > >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > > > //some unified query giving me entries pointing by fk copy to bar and > > //buz objects. So I need to manually load these references.
> > > > //happy emmanuel > > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > > Bar bar = unifiedCache.get(foo); > > Buz buz = unifiedCache.get(baz); > > > > //not so happy emmanuel > > Cache fooCache = cacheManager.getCache("foo"); > > Bar bar = fooCache.get(foo); > > Cache bazCache = cacheManager.getCache("baz"); > > Buz buz = bazCache.put(baz); > > cacheManager.getCache("foo").put("xxx", "yyy"); > cacheManager.getCache("foo").put("xxx", "zzz"); > > String xxx = cacheManager.getMotherOfAllCaches().get("xxx"); > System.out.println(xxx); > > What should it print? Should an exception be thrown? Or should get on > mother of all caches return Map, String>? > Yes I'm aware of that. What I am saying is that the idea of search across caches as appealing as it is is is not the whole story. People search, read, navigate and M/R their data in interleaved ways. You need to project and think about a 100-200 lines of code that would use that feature in combination with other related features to see if that will be useful in the end (or gimmicky) and if the user experience (API mostly in our case) will be good or make people kill themselves. The feeling I have is that we are too feature focused and not enough use case and experience focused. From rhauch at redhat.com Wed Feb 5 14:54:02 2014 From: rhauch at redhat.com (Randall Hauch) Date: Wed, 5 Feb 2014 13:54:02 -0600 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205193445.GC93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> <20140205193445.GC93108@hibernate.org> Message-ID: <9C4DBDF7-EB51-45F3-B9D2-815E9D215C9B@redhat.com> On Feb 5, 2014, at 1:34 PM, Emmanuel Bernard wrote: > What I am saying is that the idea of search across caches as > appealing as it is is is not the whole story. 
> > People search, read, navigate and M/R their data in interleaved ways. > You need to project and think about a 100-200 lines of code that would > use that feature in combination with other related features to see if > that will be useful in the end (or gimmicky) and if the user experience > (API mostly in our case) will be good or make people kill themselves. > What is the plan for supporting joins across entity types? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140205/ab7ccc23/attachment.html From mmarkus at redhat.com Wed Feb 5 16:40:57 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 21:40:57 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205193445.GC93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> <20140205193445.GC93108@hibernate.org> Message-ID: On Feb 5, 2014, at 7:34 PM, Emmanuel Bernard wrote: > On Wed 2014-02-05 17:44, Radim Vansa wrote: >> On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: >>> On Wed 2014-02-05 15:53, Mircea Markus wrote: >>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >>>> >>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>> >>> //some unified query giving me entries pointing by fk copy to bar and >>> //buz objects. So I need to manually load these references. 
>>> >>> //happy emmanuel >>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>> Bar bar = unifiedCache.get(foo); >>> Buz buz = unifiedCache.get(baz); >>> >>> //not so happy emmanuel >>> Cache fooCache = cacheManager.getCache("foo"); >>> Bar bar = fooCache.get(foo); >>> Cache bazCache = cacheManager.getCache("baz"); >>> Buz buz = bazCache.put(baz); >> >> cacheManager.getCache("foo").put("xxx", "yyy"); >> cacheManager.getCache("foo").put("xxx", "zzz"); >> >> String xxx = cacheManager.getMotherOfAllCaches().get("xxx"); >> System.out.println(xxx); >> >> What should it print? Should an exception be thrown? Or should get on >> mother of all caches return Map, String>? >> > > Yes I'm aware of that. > What I am saying is that the idea of search across caches as > appealing as it is is is not the whole story. > > People search, read, navigate and M/R their data in interleaved ways. In all the non-trivial deployments I saw people used multiple caches for different data, instead of one. That's why for me this came as the straight forward way of structuring data and naturally I thought that querying multiple caches makes sense in this context: to allow querying to run over a model that is already in use and not to change the model to accommodate querying. > You need to project and think about a 100-200 lines of code that would > use that feature in combination with other related features to see if > that will be useful in the end (or gimmicky) and if the user experience > (API mostly in our case) will be good or make people kill themselves. > > The feeling I have is that we are too feature focused and not enough use > case and experience focused. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mgencur at redhat.com Thu Feb 6 04:52:37 2014 From: mgencur at redhat.com (Martin Gencur) Date: Thu, 06 Feb 2014 10:52:37 +0100 Subject: [infinispan-dev] infinispan-bom vs. 
infinispan-parent dependencies Message-ID: <52F35B65.1080106@redhat.com> Hi, there are currently two Maven pom files in Infinispan where dependency versions are defined - infinispan-bom and infinispan-parent. For instance, version.protostream is defined in the BOM while version.commons.pool is defined in infinispan-parent. This causes me troubles when I want to do filtering with maven-resources-plugin and substitute versions of dependencies in certain configuration file because properties defined in the BOM are not visible to other modules (I'm currently trying to generate "features" file for HotRod to be easily deployable into Karaf - https://issues.jboss.org/browse/ISPN-3967, and I can't really access versions of some dependencies) We include the BOM file in infinispan-parent as a dependency with scope "import" which causes the properties defined in the BOM to be lost. Questions: Is there a reason why we include it as a dependency and do not have it as a parent of infinispan-parent? (as suggested in [1]) Can someone explain the reason why we have version declarations in two separate files? If you possibly know how to access properties in the BOM, please advise. To me it seems impossible without some nasty hacks. Thanks, Martin [1] http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140206/9fc95081/attachment.html From ttarrant at redhat.com Thu Feb 6 07:19:46 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Thu, 06 Feb 2014 06:19:46 -0600 Subject: [infinispan-dev] infinispan-bom vs. infinispan-parent dependencies In-Reply-To: <52F35B65.1080106@redhat.com> References: <52F35B65.1080106@redhat.com> Message-ID: <52F37DE2.9030603@redhat.com> The idea is that the bom should have dependencies required by applications using Infinispan, whereas -parent includes build-time dependencies. 
Tristan On 02/06/2014 03:52 AM, Martin Gencur wrote: > Hi, > there are currently two Maven pom files in Infinispan where dependency > versions are defined - infinispan-bom and infinispan-parent. For > instance, version.protostream is defined in the BOM while > version.commons.pool is defined in infinispan-parent. > > This causes me troubles when I want to do filtering with > maven-resources-plugin and substitute versions of dependencies in > certain configuration file because properties defined in the BOM are > not visible to other modules (I'm currently trying to generate > "features" file for HotRod to be easily deployable into Karaf - > https://issues.jboss.org/browse/ISPN-3967, and I can't really access > versions of some dependencies) > > We include the BOM file in infinispan-parent as a dependency with > scope "import" which causes the properties defined in the BOM to be lost. > > Questions: > Is there a reason why we include it as a dependency and do not have it > as a parent of infinispan-parent? (as suggested in [1]) > Can someone explain the reason why we have version declarations in two > separate files? > If you possibly know how to access properties in the BOM, please > advise. To me it seems impossible without some nasty hacks. 
> > Thanks, > Martin > > > [1] > http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Thu Feb 6 04:27:45 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Thu, 6 Feb 2014 10:27:45 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> <20140205193445.GC93108@hibernate.org> Message-ID: <20140206092745.GA95590@hibernate.org> On Wed 2014-02-05 21:40, Mircea Markus wrote: > > On Feb 5, 2014, at 7:34 PM, Emmanuel Bernard wrote: > > > On Wed 2014-02-05 17:44, Radim Vansa wrote: > >> On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: > >>> On Wed 2014-02-05 15:53, Mircea Markus wrote: > >>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > >>>> > >>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > >>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > >>> > >>> //some unified query giving me entries pointing by fk copy to bar and > >>> //buz objects. So I need to manually load these references. 
> >>> > >>> //happy emmanuel > >>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > >>> Bar bar = unifiedCache.get(foo); > >>> Buz buz = unifiedCache.get(baz); > >>> > >>> //not so happy emmanuel > >>> Cache fooCache = cacheManager.getCache("foo"); > >>> Bar bar = fooCache.get(foo); > >>> Cache bazCache = cacheManager.getCache("baz"); > >>> Buz buz = bazCache.put(baz); > >> > >> cacheManager.getCache("foo").put("xxx", "yyy"); > >> cacheManager.getCache("foo").put("xxx", "zzz"); > >> > >> String xxx = cacheManager.getMotherOfAllCaches().get("xxx"); > >> System.out.println(xxx); > >> > >> What should it print? Should an exception be thrown? Or should get on > >> mother of all caches return Map, String>? > >> > > > > Yes I'm aware of that. > > What I am saying is that the idea of search across caches as > > appealing as it is is is not the whole story. > > > > People search, read, navigate and M/R their data in interleaved ways. > > In all the non-trivial deployments I saw people used multiple caches for different data, instead of one. That's why for me this came as the straight forward way of structuring data and naturally I thought that querying multiple caches makes sense in this context: to allow querying to run over a model that is already in use and not to change the model to accommodate querying. Maybe it is but what is the right way to address that? What does the API flow look like? Is that one app using 50 or 100 cache and juggling with them or rather 50 apps using the same shared grid and using 1 maybe 2 cache. Just to be clear, I think cross cache querying is something we need. I am just questioning how it will be used in a bigger context and how the over Infinispan API should look like to address the bigger context. BTW the example you saw, Is that one cache per atomic type or rather one cache per family of data. 
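Radim's key-collision objection, quoted several times in this thread, can be made concrete with a plain-Java sketch. Ordinary maps stand in for caches here; `getMotherOfAllCaches()` exists only as a proposal in this thread, and none of this is Infinispan API:

```java
import java.util.HashMap;
import java.util.Map;

public class MotherOfAllCaches {
    // Two independent named "caches" (plain maps standing in for Infinispan caches)
    static final Map<String, Map<String, String>> caches = new HashMap<>();

    static Map<String, String> getCache(String name) {
        return caches.computeIfAbsent(name, k -> new HashMap<>());
    }

    // A naive flat get across all caches: when two caches hold the same key,
    // the winner depends on iteration order -- Radim's ambiguity objection
    static String flatGet(String key) {
        String result = null;
        for (Map<String, String> c : caches.values()) {
            if (c.containsKey(key)) {
                result = c.get(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        getCache("foo").put("xxx", "yyy");
        getCache("bar").put("xxx", "zzz");
        // Either "yyy" or "zzz" -- a real unified API would have to throw,
        // qualify results by cache name, or return a multi-valued result
        System.out.println(flatGet("xxx"));
    }
}
```

This is exactly the design question Radim's "What should it print?" raises: a unified get is only well-defined once keys are qualified by cache, which is Mircea's "primary key without the table name" analogy.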
From ben.cotton at ALUMNI.RUTGERS.EDU Thu Feb 6 14:46:23 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Thu, 6 Feb 2014 11:46:23 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1389893871449-4028653.post@n3.nabble.com> References: <3BE9E09A-6651-45D9-B7F1-891C111F232C@redhat.com> <1389783264288-4028642.post@n3.nabble.com> <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> Message-ID: <1391715983011-4028794.post@n3.nabble.com> Hi everybody. We are getting started with our POC design/build of this post's ambition. Currently at an ISPN build-from-scratch newbie roadblock. I know I should be patient, but if any of you have time could one of you hook me up with the official "How 2 Fork/Clone/Extend/Build your own ISPN Master from GIT " wiki link? ROADBLOCK details here --> https://community.jboss.org/thread/236848 Thx, Ben -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028794.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From mmarkus at redhat.com Thu Feb 6 15:49:01 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 6 Feb 2014 20:49:01 +0000 Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1391715983011-4028794.post@n3.nabble.com> References: <3BE9E09A-6651-45D9-B7F1-891C111F232C@redhat.com> <1389783264288-4028642.post@n3.nabble.com> <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> Message-ID: <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> Replied on the forum ;) On Feb 6, 2014, at 7:46 PM, cotton-ben wrote: > > Hi everybody. 
> > We are getting started with our POC design/build of this post's ambition. > > Currently at an ISPN build-from-scratch newbie roadblock. I know I should > be patient, but if any of you have time could one of you hook me up with the > official "How 2 Fork/Clone/Extend/Build your own ISPN Master from GIT " wiki > link? > > ROADBLOCK details here --> https://community.jboss.org/thread/236848 > > Thx, > Ben > > > > > -- > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028794.html > Sent from the Infinispan Developer List mailing list archive at Nabble.com. > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From galder at redhat.com Fri Feb 7 09:27:11 2014 From: galder at redhat.com (Galder Zamarreño) Date: Fri, 7 Feb 2014 15:27:11 +0100 Subject: [infinispan-dev] Wildfly's build/lib.xml behaves unexpectedly with JDK8 Message-ID: Hi, In JDK8, [1] causes issues, since the replace only happens the first time the character is found. We use this lib.xml in Infinispan as well [2]. I've worked around it by doing this instead: name = name.split(".").join("/"); This seems to work fine, but have not fully tested it.
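The behavioral difference Galder describes can be illustrated from the Java side. This is an aside, not part of the build script: plain `java.lang.String.replace` replaces every occurrence of a literal target, whereas JavaScript's `String.prototype.replace` with a string pattern replaces only the first match (the semantics the script picked up under JDK8), emulated below via `replaceFirst` with a quoted pattern:

```java
import java.util.regex.Pattern;

public class ReplaceSemantics {
    // java.lang.String.replace(CharSequence, CharSequence) replaces ALL
    // occurrences of a literal target
    static String javaReplace(String name) {
        return name.replace(".", "/");
    }

    // JavaScript's String.prototype.replace with a string pattern replaces
    // only the FIRST occurrence; emulated here with replaceFirst on a
    // quoted (i.e. literal, non-regex) pattern
    static String jsLikeReplace(String name) {
        return name.replaceFirst(Pattern.quote("."), "/");
    }

    public static void main(String[] args) {
        System.out.println(javaReplace("org.infinispan.core"));   // org/infinispan/core
        System.out.println(jsLikeReplace("org.infinispan.core")); // org/infinispan.core
    }
}
```

The `split(".").join("/")` workaround sidesteps the first-match problem because the JS `split` with a string separator splits on every occurrence.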
Cheers, [1] https://github.com/wildfly/wildfly/blob/master/build/lib.xml#L75 [2] https://issues.jboss.org/browse/ISPN-3974?focusedCommentId=12942643&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12942643 -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From mmarkus at redhat.com Fri Feb 7 14:44:37 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Feb 2014 19:44:37 +0000 Subject: [infinispan-dev] 7.0.0.Alpha1 Message-ID: Hey guys, I think we have enough stuff to cut a 7.0.0.Alpha1 next week. Besides quite some fixes that came in, we have: - Vladimir's parallel map reduce (ISPN-2284) - Tristan's autorisation for embedded mode (ISPN-3909) - Will's clustered listeners (ISPN-3355) Let's aim for Thu 20 Feb. Next in charge with releasing is Dan (release rotation is defined in the release doc now). Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vblagoje at redhat.com Fri Feb 7 14:54:42 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 07 Feb 2014 14:54:42 -0500 Subject: [infinispan-dev] 7.0.0.Alpha1 In-Reply-To: References: Message-ID: <52F53A02.5050602@redhat.com> Mircea, ISPN-2284 has not been integrated yet. Dan and Will gave me some really good feedback that resulted in additional fixes, further explicit testing of parallel execution. I think we have it ready for integration now. Regards, Vladimir On 2/7/2014, 2:44 PM, Mircea Markus wrote: > Hey guys, > > I think we have enough stuff to cut a 7.0.0.Alpha1 next week. Besides quite some fixes that came in, we have: > - Vladimir's parallel map reduce (ISPN-2284) > - Tristan's autorisation for embedded mode (ISPN-3909) > - Will's clustered listeners (ISPN-3355) > > Let's aim for Thu 20 Feb. Next in charge with releasing is Dan (release rotation is defined in the release doc now). 
> > Cheers, From mmarkus at redhat.com Fri Feb 7 15:18:56 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Feb 2014 15:18:56 -0500 (EST) Subject: [infinispan-dev] 7.0.0.Alpha1 In-Reply-To: <52F53A02.5050602@redhat.com> References: <52F53A02.5050602@redhat.com> Message-ID: <971B535B-95A0-4DDC-9904-242D7DDD8F33@redhat.com> Yep, none of the above has, the plan is to get them in, though. > On 7 Feb 2014, at 19:54, Vladimir Blagojevic wrote: > > Mircea, > > ISPN-2284 has not been integrated yet. Dan and Will gave me some really > good feedback that resulted in additional fixes, further explicit > testing of parallel execution. I think we have it ready for integration now. > > Regards, > Vladimir > > >> On 2/7/2014, 2:44 PM, Mircea Markus wrote: >> Hey guys, >> >> I think we have enough stuff to cut a 7.0.0.Alpha1 next week. Besides quite some fixes that came in, we have: >> - Vladimir's parallel map reduce (ISPN-2284) >> - Tristan's autorisation for embedded mode (ISPN-3909) >> - Will's clustered listeners (ISPN-3355) >> >> Let's aim for Thu 20 Feb. Next in charge with releasing is Dan (release rotation is defined in the release doc now).
>> >> Cheers, > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ben.cotton at ALUMNI.RUTGERS.EDU Sun Feb 9 19:42:03 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Sun, 9 Feb 2014 16:42:03 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> References: <1389783264288-4028642.post@n3.nabble.com> <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> Message-ID: <1391992923651-4028800.post@n3.nabble.com> FYI, we've got all the "can we build this from w/in JPM.com?" plumbing concerns 100% resolved. So now it is "Heap No! Heap No! It's off to work we go ...." https://github.com/Cotton-Ben/infinispan Will share musings/fears/roadblocks/triumphs/etc here and at https://issues.jboss.org/browse/ISPN-871 Thx, Ben -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028800.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From galder at redhat.com Mon Feb 10 03:34:43 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 09:34:43 +0100 Subject: [infinispan-dev] Wildfly's build/lib.xml behaves unexpectedly with JDK8 In-Reply-To: References: Message-ID: <2771A788-3C19-4E11-9668-AE79469EEC36@redhat.com> Actually, split/join does not work with JDK7.
The following code seems to work with both: name = name.split(".").join("/"); if (name) { self.log("Use JDK8 method to build module names"); } else { name = attributes.get("name"); name = name.replace(".", "/"); self.log("Use JDK7 method to build module names"); } Cheers, On 07 Feb 2014, at 15:27, Galder Zamarreño wrote: > Hi, > > In JDK8, [1] causes issues, since the replace only happens the first time the character is found. > > We use this lib.xml in Infinispan as well [2]. I've worked around it by doing this instead: > > name = name.split(".").join("/"); > > This seems to work fine, but have not fully tested it. > > Cheers, > > [1] https://github.com/wildfly/wildfly/blob/master/build/lib.xml#L75 > [2] https://issues.jboss.org/browse/ISPN-3974?focusedCommentId=12942643&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12942643 > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 10 03:51:59 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 09:51:59 +0100 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: <52DD0961.90600@infinispan.org> References: <52DD0961.90600@infinispan.org> Message-ID: <04AA54F3-1FA1-4BFF-9C0B-D8263E6FBBA2@redhat.com> On 20 Jan 2014, at 12:32, Pedro Ruivo wrote: > Hi, > > On 01/20/2014 11:28 AM, Galder Zamarreño wrote: >> Hi all, >> >> Dropping AtomicMap and FineGrainedAtomicMap was discussed last week in the F2F meeting [1].
It's complex and buggy, and we'd recommend people to use the Grouping API instead [2]. Grouping API would allow data to reside together, while the standard map API would apply per-key locking. > > +1. are we going to drop the Delta stuff? The delta would be the k/v pair. Say you are storing HTTP sessions. With AMs, the key would be the session ID and all its attributes would be stored in the atomic map. Once you remove that, each of the session's attributes is a single k/v pair in the cache, so that's your delta. >> >> We don't have a timeline for this yet, but we want to get as much feedback on the topic as possible so that we can evaluate the options. > > before starting with it, I would recommend to add the following method > to cache API: > > /** > * returns all the keys and values associated with the group name. The > Map is immutable (i.e. read-only) > **/ > Map getGroup(String groupName); Yes, I think we'd need this for grouping to be able to have a full replacement for atomic maps. You need a way to retrieve all the data associated with that group without the need to iterate the cache yourself, or the need to add indexing. In the case of HTTP sessions, you'd give the session ID as key and it'd give you a map view of all the attributes associated with that session.
Cheers, > > Cheers, > Pedro > >> >> Cheers, >> >> [1] https://issues.jboss.org/browse/ISPN-3901 >> [2] http://infinispan.org/docs/6.0.x/user_guide/user_guide.html#_the_grouping_api >> -- >> Galder Zamarreño >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 10 03:58:26 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 09:58:26 +0100 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: <52DEDBCF.7030204@redhat.com> References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> Message-ID: The deltas will remain, but they're each a key/value pair. Example: say you want to store a dehydrated list of three elements ("one", "two", "three") in Infinispan. Before you'd do (approx): key=my-list value=AtomicMap(k=1,v="one", k=2,v="two", k3="v3") Internally, we'd track deltas and only send those changes. What I propose we do is: key=1 (group="my-list") value="one" key=2 (group="my-list") value="two" key=3 (group="my-list") value="three" The deltas are still there. Each changed key is sent separately, when it changes. This is not the final product of course. As agreed with Pedro, we'd need a way to have a view map for all key/value pairs associated with a given group, i.e. cache.getGroups("my-list").
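Galder's per-key delta argument can be sketched in plain Java. This is a toy dirty-key tracker, not Infinispan code: with one cache entry per list element, only the touched key is the unit of change, so only that key needs to travel on replication.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class PerKeyDelta {
    final Map<String, String> store = new HashMap<>();   // key -> value
    final Map<String, String> groupOf = new HashMap<>(); // key -> group name
    final Set<String> dirty = new LinkedHashSet<>();     // keys changed since last sync

    void put(String group, String key, String value) {
        store.put(key, value);
        groupOf.put(key, group);
        dirty.add(key); // the delta is exactly this one k/v pair
    }

    // What would be shipped to other nodes: only the touched keys
    Set<String> drainDirty() {
        Set<String> delta = new LinkedHashSet<>(dirty);
        dirty.clear();
        return delta;
    }

    public static void main(String[] args) {
        PerKeyDelta cache = new PerKeyDelta();
        cache.put("my-list", "1", "one");
        cache.put("my-list", "2", "two");
        cache.put("my-list", "3", "three");
        cache.drainDirty(); // pretend the initial state was already replicated

        cache.put("my-list", "2", "TWO"); // touch a single element
        System.out.println(cache.drainDirty()); // [2] -- only key 2 travels
    }
}
```

Under the AtomicMap layout the whole map is one value, so the same guarantee requires explicit delta tracking inside the value, which is the machinery being questioned in this thread.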
I know Sanne et al also need a way to have coarse grained locking on the entire group sometimes, as well as fine grained locking, so we'd need to find a way to accommodate that. Cheers, On 21 Jan 2014, at 21:42, Vladimir Blagojevic wrote: > I agree with Erik here. Deltas are used in M/R and I've never detected > any problems so far. > On 1/21/2014, 1:39 PM, Erik Salter wrote: >> Please don't remove the Delta stuff. That's quite useful, especially for >> large collections. >> >> Erik >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 10 04:02:58 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 10:02:58 +0100 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> <1EB0E9C8-AFD2-4172-874F-25BC2B12C6C4@redhat.com> Message-ID: <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> On 27 Jan 2014, at 11:27, Dan Berindei wrote: > I think it's way too early to discuss removing FineGrainedAtomicMap and AtomicMap, as long as we don't have a concrete alternative with similar properties. You have a point there, but we can't ignore the feedback that says that atomic maps are not being used because they are buggy, and instead they are using grouping. Deeply, I think we have two ways of doing the same thing, which is confusing from my POV, and one of them is not being used enough, or we're not fixing the stuff there.
Regardless of whether it's too early or not, this email is trying to spark a consolidation of the two technologies into a single solution that works for everyone and that we maintain actively :) > Cache.getGroup(groupName) is just a method name at this point, we don't have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a transaction isolation or performance perspective. BTW, do we really need the group name to be a String? > > A good way to prove that the grouping API is a proper replacement for the atomic maps would be to replace the usage of atomic maps in the Tree module with the grouping API. Unless we plan to drop the Tree module completely? Tree was only ever meant as a bridge for JBC users to move to Infinispan. Paul F et al tried to build HTTP sessions on top of that, it didn't work. Then they tried to do it on top of Atomic Maps, and it didn't work either, and finally they're using grouping and seems to work... Cheers, > > Cheers > Dan > > > > On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus wrote: > > On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic wrote: > > > I agree with Erik here. Deltas are used in M/R and I've never detected > > any problems so far. > > On 1/21/2014, 1:39 PM, Erik Salter wrote: > >> Please don't remove the Delta stuff. That's quite useful, especially for > >> large collections. > > +1 to keep DeltaAware.
Thanks for the feedback > > >> > >> Erik > >> > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Mon Feb 10 04:57:51 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 10 Feb 2014 09:57:51 +0000 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> <1EB0E9C8-AFD2-4172-874F-25BC2B12C6C4@redhat.com> <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> Message-ID: On 10 February 2014 09:02, Galder Zamarreño wrote: > > On 27 Jan 2014, at 11:27, Dan Berindei wrote: > >> I think it's way too early to discuss removing FineGrainedAtomicMap and AtomicMap, as long as we don't have a concrete alternative with similar properties. > > You have a point there, but we can't ignore the feedback that says that atomic maps are not being used because they are buggy, and instead they are using grouping. Let's not generalize too much, some are still doing the opposite and have commented on their good reasons ;-) > > Deeply, I think we have two ways of doing the same thing, which is confusing from my POV, and one of them is not being used enough, or we're not fixing the stuff there.
+1 but since as you say there is confusion, I'm not sure if they really are the same thing. I've asked for a detailed comparison but the discussion derailed. It would probably help a lot if someone from the Infinispan core team would reimplement the FGAM API on top of Grouping, making sure to guarantee the same semantics also in terms of concurrency, isolation and acidity. That would provide the implementation cleanup you'd all love, a migration path, and probably some deeper considerations on their differences; I also suspect there would be some roadblocks, potentially subtle differences which could then be better documented... > > Regardless of whether it's too early or not, this email is trying to spark a consolidation of the two technologies into a single solution that works for everyone and that we maintain actively :) > >> Cache.getGroup(groupName) is just a method name at this point, we don't have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a transaction isolation or performance perspective. BTW, do we really need the group name to be a String? >> >> A good way to prove that the grouping API is a proper replacement for the atomic maps would be to replace the usage of atomic maps in the Tree module with the grouping API. Unless we plan to drop the Tree module completely? > > Tree was only ever meant as a bridge for JBC users to move to Infinispan. Paul F et al tried to build HTTP sessions on top of that, it didn't work. Then they tried to do it on top of Atomic Maps, and it didn't work either, and finally they're using grouping and seems to work... I don't think that proves that Atomic Maps were not working, if anything it's a statement that grouping is a better fit for this specific use case... BTW having a use case which matches way better than the other just highlights that this is no duplicate functionality, but rather quite different stuff.
From a Hibernate OGM perspective it would be great to have some more stability in not-so-old APIs, at least until there's a clearly documented migration to grouping. Sanne > > Cheers, > >> >> Cheers >> Dan >> >> >> >> On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus wrote: >> >> On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic wrote: >> >> > I agree with Erik here. Deltas are used in M/R and I've never detected >> > any problems so far. >> > On 1/21/2014, 1:39 PM, Erik Salter wrote: >> >> Please don't remove the Delta stuff. That's quite useful, especially for >> >> large collections. >> >> +1 to keep DeltaAware. Thanks for the feedback >> >> >> >> >> Erik >> >> >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From tomaz.cerar at gmail.com Mon Feb 10 04:59:27 2014 From: tomaz.cerar at gmail.com (=?UTF-8?B?VG9tYcW+IENlcmFy?=) Date: Mon, 10 Feb 2014 10:59:27 +0100 Subject: [infinispan-dev] [wildfly-dev] Wildfly's build/lib.xml behaves unexpectedly with JDK8 In-Reply-To: <2771A788-3C19-4E11-9668-AE79469EEC36@redhat.com> References:
<2771A788-3C19-4E11-9668-AE79469EEC36@redhat.com> Message-ID: Can you send PR with a fix? On Mon, Feb 10, 2014 at 9:34 AM, Galder Zamarre?o wrote: > Actually, split/join does not work with JDK7. The following code seems to > work with both: > > name = name.split(".").join("/"); > if (name) { > self.log("Use JDK8 method to build module names"); > } else { > name = attributes.get("name"); > name = name.replace(".", "/"); > self.log("Use JDK7 method to build module names"); > } > > Cheers, > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140210/f9bdbbe7/attachment-0001.html From mmarkus at redhat.com Mon Feb 10 06:12:06 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 10 Feb 2014 11:12:06 +0000 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> <1EB0E9C8-AFD2-4172-874F-25BC2B12C6C4@redhat.com> <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> Message-ID: <4D0AD0A0-44E8-41DF-8201-AB75DA1D2BAE@redhat.com> Dropping the FGAM API is just an idea and there were valid concerns for not doing it immediately. Indeed the reason it was considered for removal is in order not to keep around two APIs that do the same thing. That is if really do the same thing. As a first step would be good to enhance the grouping API[1] to support group handling methods. Then see if grouping works for the (FG)AM users: if it does, we can drop FGAM. If it doesn't we'll fix FGAM. [1] https://issues.jboss.org/browse/ISPN-3981 On Feb 10, 2014, at 9:57 AM, Sanne Grinovero wrote: > On 10 February 2014 09:02, Galder Zamarre?o wrote: >> >> On 27 Jan 2014, at 11:27, Dan Berindei wrote: >> >>> I think it's way too early to discuss removing FineGrainedAtomicMap and AtomicMap, as long as we don't have a concrete alternative with similar properties. 
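On the JDK7/JDK8 build-script difference quoted earlier in the thread: the split/join idiom presumably fails under JDK7's Rhino because the script ends up calling java.lang.String.split, which treats its argument as a regular expression, so a literal "." matches every character. A minimal pure-Java illustration of that pitfall (the dotted name is just an example):

```java
public class SplitPitfall {
    // String.split takes a REGEX: "." matches every character, so every
    // piece between matches is an empty string; trailing empties are
    // trimmed, leaving an empty array.
    static int parts(String name) { return name.split(".").length; }

    // Escaping the dot -- or sidestepping regexes with replace, as the
    // JDK7 branch of the quoted script does -- gives the intended result.
    static String toPath(String name) { return name.replace('.', '/'); }

    public static void main(String[] args) {
        System.out.println(parts("org.infinispan.core"));   // 0
        System.out.println(toPath("org.infinispan.core"));  // org/infinispan/core
    }
}
```

Using `split("\\.")` would work on the regex path too; `replace` is simply the least surprising fix.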
>> >> You have a point there, but we can?t ignore the feedback that says that atomic maps are not being used because they are buggy, and instead they are using grouping. > > Let's not generalize too much, some are still doing the opposite and > have commented on their good reasons ;-) > >> >> Deeply, I think we have two ways of doing the same thing, which is confusing from my POV, and one of them is not being used enough, or we?re not fixing the stuff there. > > +1 but since as you say there is confusion, I'm not sure if they > really are the same thing. I've asked for a detailed comparison but > the discussion derailed. It would probably help a lot if someone from > the Infinispan core team would reimplement the FGAM API on top of > Grouping, making sure to guarantee the same semantics also in terms of > concurrency, isolation and acidity. > That would provide the implementation cleanup you'd all love, a > migration path, and probably some deeper considerations on their > differences; I also suspect there would be some roadblocks, > potentially subtle differences which could then be better documented? > >> >> Regardless of whether it?s too early or not, this email is trying to spark a consolidation of the two technologies into a single solution that works for everyone and we maintained it actively :) >> >>> Cache.getGroup(groupName) is just a method name at this point, we don't have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a transaction isolation or performance perspective. BTW, do we really need the group name to be a String? >>> >>> A good way to prove that the grouping API is a proper replacement for the atomic maps would be to replace the usage of atomic maps in the Tree module with the grouping API. Unless we plan to drop the Tree module completely? >> >> Tree was only ever meant as a bridge for JBC users to move to Infinispan. Paul F et al tried to build HTTP sessions on top of that, it didn?t work. 
Then they tried to do it on top of Atomic Maps, and it didn?t work either, and finally they?re using grouping and seems to work? > > I don't think that proves that Atomic Maps where not working, if any > it's a statement that grouping is a better fit for this specific use > case? > BTW having a use case which matches way better that the other just > highlights that this is no duplicate functionality, but rather quite > different stuff. > > From an Hibernate OGM perspective it would be great to have some more > stability in not so old APIs, at least until there's a clearly > documented migration to grouping. > > Sanne > >> >> Cheers, >> >>> >>> Cheers >>> Dan >>> >>> >>> >>> On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus wrote: >>> >>> On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic wrote: >>> >>>> I agree with Erik here. Deltas are used in M/R and I've never detected >>>> any problems so far. >>>> On 1/21/2014, 1:39 PM, Erik Salter wrote: >>>>> Please don't remove the Delta stuff. That's quite useful, especially for >>>>> large collections. >>> >>> +1 to keep DeltaAware. 
Thanks for the feedbak >>> >>>>> >>>>> Erik >>>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ttarrant at redhat.com Mon Feb 10 08:24:16 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 10 Feb 2014 14:24:16 +0100 Subject: [infinispan-dev] UI-Portlet-Plugins In-Reply-To: <2D562768-0730-44F8-B937-23DED5E26557@redhat.com> References: <2D562768-0730-44F8-B937-23DED5E26557@redhat.com> Message-ID: <52F8D300.6030401@redhat.com> Hi Heiko, adding infinispan-dev. Thanks for taking the time to investigate this. One of the things that would need to be "exposed" to such portlets is the ability to link to RHQ views/portlets (e.g. go to a specific service view) so that "drilling-down" would show the appropriate detailed node. 
Additionally we would like to provide RHQ-specific configuration when installing our "server" plugin, such as cache/containers dynagroups, maybe even a custom initial dashboard. Can it be done ? Tristan On 02/08/2014 07:43 PM, Heiko W.Rupp wrote: > Hey, > > after talking with Tristan Tarrant from Infinispan I got the idea, that we could create a generic Portlet, that > gets its content data as HTML from a server plugin. The server plugin then has access to all the server logic > to do its task and can e.g. compute various stats of an Infinispan cluster. > > The following drawing illustrates that idea: > > > > > Instances of the portlet will call to the selected server plugin and invoking a well known "interface" like "getMessage". > This message will then do the processing and return a HTML-snippet (not a full page), which is then displayed > inside the portlet window. > > Attached are two screen shots from such a portlet + some PoC code. > > > > > This is created in the backend via (abbreviated) > > complexResults.put(new PropertySimple("results", "

Hello World
Welcome to RHQ
Have FUN
Current date: " + date)); > > This is the "generic" config screen: > > > > > The drop down shows the list of plugins available. > > In this PoC, the plugin writer is responsible for creating sane HTML, > if we decided to put that into RHQ, we may want to do some additional > sanitation. I also have no idea about styling the inner content. > > While this is probably not the way for the (long term) future, at least > the backend plugins can be re-used if we move to an Angular-based UI, > so this investment would not be lost. > > Heiko > > > > > From ttarrant at redhat.com Mon Feb 10 11:54:02 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 10 Feb 2014 17:54:02 +0100 Subject: [infinispan-dev] Remote Query improvements Message-ID: <52F9042A.3010205@redhat.com> Hi everybody, last week I developed a simple application using Remote Query, and ran into a few issues. Some of them are just technical hurdles, while others have to do with the complexity of the developer experience. Here they are for open discussion: - the schemas registry should be persistent. Alternatively being able to either specify the ProtoBuf schema from the configuration in the server subsystem or use server's deployment processor to "deploy" schemas. - the server should store the single protobuf source schemas to allow for easy inspection/update of each using our management tools. The server itself should then compile the protobuf schemas into the binary representation when any of the source schemas changes. This would require a Java implementation of the ProtoBuf schema compiler, which wouldn't probably be too hard to do with Antlr. 
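As a rough illustration of how little machinery a first cut of such a schema scanner needs (a real compiler should be grammar-driven, e.g. with Antlr, and handle nested messages, options, imports, etc.), here is a toy regex-based extractor; the `/* @Indexed */` marker is a hypothetical doclet-style comment annotation, not an existing ProtoBuf or Infinispan convention:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch only: scans proto2-style field declarations and reports which
// ones carry a hypothetical "/* @Indexed */" doclet-style comment.
public class ProtoFieldScanner {

    static final Pattern FIELD = Pattern.compile(
        "(/\\*\\s*@Indexed\\s*\\*/\\s*)?(?:required|optional|repeated)\\s+\\w+\\s+(\\w+)\\s*=");

    // Returns the names of fields marked for indexing.
    static List<String> indexedFields(String protoSource) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(protoSource);
        while (m.find()) {
            if (m.group(1) != null) {   // the @Indexed comment was present
                out.add(m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String schema =
            "message User {\n" +
            "  /* @Indexed */ required string name = 1;\n" +
            "  optional string bio = 2;\n" +
            "}\n";
        System.out.println(indexedFields(schema)); // [name]
    }
}
```

Keeping the marker inside a comment has the nice property that the schema stays valid for any stock protoc, which matches the doclet analogy in the list above.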
- we need to be able to annotate single protobuf fields for indexing (probably by using specially-formatted comments, a la doclets) to avoid indexing all of the fields - since remote query is already imbued with JPA in some form, an interesting project would be to implement a JPA annotation processor which can produce a set of ProtoBuf schemas from JPA-annotated classes. - on top of the above, a ProtoBuf marshaller/unmarshaller which can use the JPA entities directly. Tristan From mmarkus at redhat.com Mon Feb 10 12:34:29 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 10 Feb 2014 17:34:29 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F9042A.3010205@redhat.com> References: <52F9042A.3010205@redhat.com> Message-ID: <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: > - since remote query is already imbued with JPA in some form, an > interesting project would be to implement a JPA annotation processor > which can produce a set of ProtoBuf schemas from JPA-annotated classes. > - on top of the above, a ProtoBuf marshaller/unmarshaller which can use > the JPA entities directly. I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From emmanuel at hibernate.org Mon Feb 10 13:14:18 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 10 Feb 2014 18:14:18 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F9042A.3010205@redhat.com> References: <52F9042A.3010205@redhat.com> Message-ID: <20140210181418.GC84404@hibernate.org> On Mon 2014-02-10 17:54, Tristan Tarrant wrote: > Hi everybody, > > last week I developed a simple application using Remote Query, and ran > into a few issues. 
Some of them are just technical hurdles, while others > have to do with the complexity of the developer experience. Here they > are for open discussion: > > - the schemas registry should be persistent. Alternatively being able to > either specify the ProtoBuf schema from the configuration > in the server subsystem or use server's deployment processor to "deploy" > schemas. > - the server should store the single protobuf source schemas to allow > for easy inspection/update of each using our management tools. The > server itself should then compile the protobuf schemas into the binary > representation when any of the source schemas changes. This would > require a Java implementation of the ProtoBuf schema compiler, which > wouldn't probably be too hard to do with Antlr. > - we need to be able to annotate single protobuf fields for indexing > (probably by using specially-formatted comments, a la doclets) to avoid > indexing all of the fields > - since remote query is already imbued with JPA in some form, an > interesting project would be to implement a JPA annotation processor > which can produce a set of ProtoBuf schemas from JPA-annotated classes. > - on top of the above, a ProtoBuf marshaller/unmarshaller which can use > the JPA entities directly. I already argued in the last few weeks in the same vein but to me reusing JPA's metadata or API and support 15% of it is going to be misleading and confusing for the user. Plus it's Java only. 
But I agree that by making things use a hand written hard schema we make things suck equally for all client platforms :) > > Tristan > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From anistor at redhat.com Mon Feb 10 13:43:38 2014 From: anistor at redhat.com (Adrian Nistor) Date: Mon, 10 Feb 2014 20:43:38 +0200 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> Message-ID: <52F91DDA.4070800@redhat.com> The idea of auto-generating protobuf schemas based on the marshaller code was briefly mentioned last time we met in Palma. I would not qualify it as impossible to implement, but it would certainly be hacky and leads to more trouble than it's worth. A lot of info is missing from the marshaller code (API calls) precisely because it is not normally needed, being provided by the schema already. Now trying to go backwards means we'll have to 'invent' that metadata using some common sense (examples: which field is required vs optional, which field is indexable, indexing options, etc). Too many options. I bet the notion of 'common sense' would quickly need to be configured somehow, for uncommon use cases :). But that's why we have protobuf schemas for. Plus, to run a marshaller for inferring the schema you'll first need a prototypical instance of your entity. Where from? So no, -1, now I have serious concerns about this, even though I initially nodded in approval. And that would work only for Java anyway, because the marshaller and the schema-infering-process needs to run on the server side. 
On 02/10/2014 07:34 PM, Mircea Markus wrote: > On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: > >> - since remote query is already imbued with JPA in some form, an >> interesting project would be to implement a JPA annotation processor >> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >> the JPA entities directly. > I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. > > Cheers, From anistor at redhat.com Mon Feb 10 13:49:54 2014 From: anistor at redhat.com (Adrian Nistor) Date: Mon, 10 Feb 2014 20:49:54 +0200 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F9042A.3010205@redhat.com> References: <52F9042A.3010205@redhat.com> Message-ID: <52F91F52.1010805@redhat.com> Most of this is in jira already, so it would be good to comment there. #1 = ISPN-3747 & ISPN-3926 #2 = ISPN-3480 (wording is not the same, but it's the same issue) #3 = ISPN-3718 #4 = ???? On 02/10/2014 06:54 PM, Tristan Tarrant wrote: > Hi everybody, > > last week I developed a simple application using Remote Query, and ran > into a few issues. Some of them are just technical hurdles, while others > have to do with the complexity of the developer experience. Here they > are for open discussion: > > - the schemas registry should be persistent. Alternatively being able to > either specify the ProtoBuf schema from the configuration > in the server subsystem or use server's deployment processor to "deploy" > schemas. > - the server should store the single protobuf source schemas to allow > for easy inspection/update of each using our management tools. The > server itself should then compile the protobuf schemas into the binary > representation when any of the source schemas changes. 
This would > require a Java implementation of the ProtoBuf schema compiler, which > wouldn't probably be too hard to do with Antlr. > - we need to be able to annotate single protobuf fields for indexing > (probably by using specially-formatted comments, a la doclets) to avoid > indexing all of the fields > - since remote query is already imbued with JPA in some form, an > interesting project would be to implement a JPA annotation processor > which can produce a set of ProtoBuf schemas from JPA-annotated classes. > - on top of the above, a ProtoBuf marshaller/unmarshaller which can use > the JPA entities directly. > > Tristan > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Tue Feb 11 02:55:30 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Tue, 11 Feb 2014 08:55:30 +0100 Subject: [infinispan-dev] HotRod near caches Message-ID: <52F9D772.8080006@redhat.com> Hi people, this is a bit of a dump of ideas for getting our HotRod client in shape for supporting near caches: - RemoteCaches should have an optional internal cache. This cache should probably be some form of bounded expiration-aware hashmap which would serve as a local copy of data retrieved over the wire. In the past we have advocated the use of combining an EmbeddedCacheManager with a RemoteCacheStore to achieve this, but this is only applicable to Java clients, while we need to think of a solution for our other clients too. - Once remote listeners are in place, a RemoteCache would automatically invalidate entries in the near-cache. - Remote Query should "pass-through" the near-cache, so that entries retrieved from a query would essentially be cached locally following the same semantics. 
This can be achieved by having the QUERY verb return just the set of matching keys instead of the whole entries - Optionally we can even think about a query cache which would hash the query DSL and store the resulting keys locally so that successive invocations of a cached query wouldn't go through the wire. Matching this with invalidation is probably a tad more complex, and I'd probably avoid going down that path. Tristan From mmarkus at redhat.com Tue Feb 11 04:18:50 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 11 Feb 2014 09:18:50 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F91DDA.4070800@redhat.com> References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> <52F91DDA.4070800@redhat.com> Message-ID: I guess I put the solution before the problem, but basically where I want to get to is to allow people to write protostream marshallers without requiring them to write the proto file. This would mean the same effort for Java users to write either JBMAR marshallers or proto marshallers. If that's possible, and protostream is as fast as JBMAR (do you have any perf numbers on that BTW?), then we can suggest people use proto marshallers by default. On Feb 10, 2014, at 6:43 PM, Adrian Nistor wrote: > The idea of auto-generating protobuf schemas based on the marshaller > code was briefly mentioned last time we met in Palma. I would not > qualify it as impossible to implement, but it would certainly be hacky > and leads to more trouble than it's worth. > > A lot of info is missing from the marshaller code (API calls) precisely > because it is not normally needed, being provided by the schema already. > Now trying to go backwards means we'll have to 'invent' that metadata > using some common sense (examples: which field is required vs optional, > which field is indexable, indexing options, etc). Too many options.
I > bet the notion of 'common sense' would quickly need to be configured > somehow, for uncommon use cases :). But that's why we have protobuf > schemas for. Plus, to run a marshaller for inferring the schema you'll > first need a prototypical instance of your entity. Where from? So no, > -1, now I have serious concerns about this, even though I initially > nodded in approval. > > And that would work only for Java anyway, because the marshaller and the > schema-infering-process needs to run on the server side. > > > On 02/10/2014 07:34 PM, Mircea Markus wrote: >> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: >> >>> - since remote query is already imbued with JPA in some form, an >>> interesting project would be to implement a JPA annotation processor >>> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >>> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >>> the JPA entities directly. >> I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. >> >> Cheers, > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Tue Feb 11 08:53:42 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 11 Feb 2014 13:53:42 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> <52F91DDA.4070800@redhat.com> Message-ID: In my experience people express a bit of confusion on their first impact with our query/indexing technology as there is a strong conceptual difference compared to the more familiar relational databases. 
The primary WTF effect is usually on the fact that when a field included in a query is not indexed the query is not just "slower" but it won't work at all; we have plans to compensate for that in the scope of the simplified DSL (and remote queries) to fall-back to an ad-hoc crafted map/reduce task which essentially implements a table scan, but I'm thinking now that we should take it a step beyond and do better. Another source of trouble is that the fields need not just be indexed, but also need to be indexed with the correct attributes depending on the kind of query you mean to run: essentially this leads in practice for people to need to have a very clear idea of which queries they will be running.. and over the lifecycle of a complex application this might become a complex to maintain, especially if you want to keep peak performance you need to regularly cleanup indexing flags which are related to queries no longer in use. Nowadays we do some kind of validation of the queries to catch situations in which these can't possibly match the metadata we have about indexed fields, but this validation needs to be quite permissive to not prevent rare and unusual advanced queries which are technically valid, although potential candidates for a strong misunderstanding. This all leads to a single clean solution: if we start from a declarative set of query definitions, in which each query has the specific extra metadata needed about their runtime execution (e.g. using a specific Analyzer on a specific field, query time boosting, hints about good candidates for filters), then we can actually get rid of the need to define the indexing attributes at the schema level. 
It would still be useful to maintain the current explicit control of the indexing process: for example you might be building an index which is consumed by a different application, or you simply know about an advanced data mining feature that you're building on a custom Query/Filtering/Collector which bypasses our helpful but constraining query definition strategy. Following this proposal, we wouldn't need to bother with extending the document metadata with indexing annotations (annotations as a non-Java term) but we'd need to focus on a way to pre-declare all queries users intend to use. I admit that this might sound limiting, but consider: - serialization of queries and all their potential advanced options (not many in the remote case so far) needs to be done anyway, and needs to be language agnostic anyway. - we'd be able to better validate complex query structures - when a user registers/unregisters "query definitions" from the server we have a better opportunity to: -- cache parsing -- cache execution plans -- track metrics to improve on the execution plans -- adapt the indexes automatically (immediatelly or warn that it needs to be done before the query is runnable) -- I suspect it would be easier to match queries with security ACLs, both in terms of execution permission but also in terms of scoping on a subset of the visible data (essentially I'm thinking that the execution plans could be more advanced and prepare/hint about filter caching and even adapt the indexing structure to better match the security constraints). # Essentially We need to expose a standard, cross language and declarative form of the queries the user intends to run from remote, and provide a way to register these queries on the server, where registration/deregistration triggers certain actions. This would not be mandatory as you can still ask for ad-hoc queries, but these will only take advantage of indexes which happen to exist because of some registered query, or of no index at all. 
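A minimal sketch of the registration idea above, showing where the parse-once/cache-the-plan opportunity sits; all names here are hypothetical and parsing/validation is reduced to a placeholder:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a server-side query registry: registration is the moment where
// the server could parse once, validate against index metadata, cache the
// execution plan, and adapt indexes -- here reduced to caching a stub.
public class QueryRegistry {

    // Stand-in for a parsed/validated execution plan.
    static final class ParsedQuery {
        final String hql;
        ParsedQuery(String hql) { this.hql = hql; }   // real parsing omitted
    }

    private final Map<String, ParsedQuery> registered = new ConcurrentHashMap<>();

    // Parsing happens once; repeated registrations reuse the cached plan.
    public ParsedQuery register(String name, String hql) {
        return registered.computeIfAbsent(name, n -> new ParsedQuery(hql));
    }

    // Execution looks up the cached plan; an unregistered ad-hoc query
    // would fall back to the slower, index-less path (not modeled here).
    public ParsedQuery lookup(String name) { return registered.get(name); }

    // Deregistration is the hook for dropping now-unneeded index flags.
    public void unregister(String name) { registered.remove(name); }
}
```

The interesting work all hangs off `register`/`unregister`: metrics, ACL checks, and automatic index (re)configuration would be triggered there rather than on every execution.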
I'm proposing for the format to be - initially to support only the simple functions exposed by the remote DSL - a simple query String, essentially the HQL we already use but obviously limited to the base constraints we need. This language will probably evolve in future *if* we ever want to expose also fulltext over this.. For the embedded query world - less of a priority - we could start experimenting with more richer and typesafe query definitions, to also provide the benefit listed above. -- Sanne On 11 February 2014 09:18, Mircea Markus wrote: > I guess I put the solution before the problem, but basically where I want to get to is to allow people to write protostream marshallers without requiring them to write the proto file. This would mean the same effort for java users to write either JBMAR marshallers or proto marshallers. If that's possible and people and protostream is as fast as JBMAR (do you have any perf numbers on that BTW?) then we can suggest people use proto marshallers by default. > > On Feb 10, 2014, at 6:43 PM, Adrian Nistor wrote: > >> The idea of auto-generating protobuf schemas based on the marshaller >> code was briefly mentioned last time we met in Palma. I would not >> qualify it as impossible to implement, but it would certainly be hacky >> and leads to more trouble than it's worth. >> >> A lot of info is missing from the marshaller code (API calls) precisely >> because it is not normally needed, being provided by the schema already. >> Now trying to go backwards means we'll have to 'invent' that metadata >> using some common sense (examples: which field is required vs optional, >> which field is indexable, indexing options, etc). Too many options. I >> bet the notion of 'common sense' would quickly need to be configured >> somehow, for uncommon use cases :). But that's why we have protobuf >> schemas for. Plus, to run a marshaller for inferring the schema you'll >> first need a prototypical instance of your entity. Where from? 
So no, >> -1, now I have serious concerns about this, even though I initially >> nodded in approval. >> >> And that would work only for Java anyway, because the marshaller and the >> schema-infering-process needs to run on the server side. >> >> >> On 02/10/2014 07:34 PM, Mircea Markus wrote: >>> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: >>> >>>> - since remote query is already imbued with JPA in some form, an >>>> interesting project would be to implement a JPA annotation processor >>>> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >>>> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >>>> the JPA entities directly. >>> I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. >>> >>> Cheers, >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Tue Feb 11 09:43:09 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 11 Feb 2014 14:43:09 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> <52F91DDA.4070800@redhat.com> Message-ID: <54B74EFB-1AE2-4E8A-BE21-63F2003CE6AD@redhat.com> On Feb 11, 2014, at 1:53 PM, Sanne Grinovero wrote: > In my experience people express a bit of confusion on their first > impact with our query/indexing technology as there is a strong > conceptual difference compared to the more familiar 
relational > databases. > > The primary WTF effect is usually on the fact that when a field > included in a query is not indexed the query is not just "slower" but > it won't work at all; we have plans to compensate for that in the > scope of the simplified DSL (and remote queries) to fall-back to an > ad-hoc crafted map/reduce task which essentially implements a table > scan, but I'm thinking now that we should take it a step beyond and do > better. > > Another source of trouble is that the fields need not just be indexed, > but also need to be indexed with the correct attributes depending on > the kind of query you mean to run: Can you please elaborate a bit on this? An example in which an indexed field would not be caught by a running entry would be nice :-) > essentially this leads in practice > for people to need to have a very clear idea of which queries they > will be running.. and over the lifecycle of a complex application this > might become a complex to maintain, especially if you want to keep > peak performance you need to regularly cleanup indexing flags which > are related to queries no longer in use. > > Nowadays we do some kind of validation of the queries to catch > situations in which these can't possibly match the metadata we have > about indexed fields, but this validation needs to be quite permissive > to not prevent rare and unusual advanced queries which are technically > valid, although potential candidates for a strong misunderstanding. > > This all leads to a single clean solution: if we start from a > declarative set of query definitions, in which each query has the > specific extra metadata needed about their runtime execution (e.g. > using a specific Analyzer on a specific field, query time boosting, > hints about good candidates for filters), then we can actually get rid > of the need to define the indexing attributes at the schema level. 
So the application would need to define the type of queries it will run statically for the entire grid? Doesn't look like a familiar model for DB users either. > > It would still be useful to maintain the current explicit control of > the indexing process: for example you might be building an index which > is consumed by a different application, or you simply know about an > advanced data mining feature that you're building on a custom > Query/Filtering/Collector which bypasses our helpful but constraining > query definition strategy. > > Following this proposal, we wouldn't need to bother with extending the > document metadata with indexing annotations (annotations as a non-Java > term) but we'd need to focus on a way to pre-declare all queries users > intend to use. > > I admit that this might sound limiting, but consider: > - serialization of queries and all their potential advanced options > (not many in the remote case so far) needs to be done anyway, and > needs to be language agnostic anyway. - would this be better than using HQL strings? > - we'd be able to better validate complex query structures > - when a user registers/unregisters "query definitions" from the > server we have a better opportunity to: > -- cache parsing > -- cache execution plans > -- track metrics to improve on the execution plans > -- adapt the indexes automatically (immediately or warn that it > needs to be done before the query is runnable) > -- I suspect it would be easier to match queries with security ACLs, > both in terms of execution permission but also in terms of scoping on > a subset of the visible data (essentially I'm thinking that the > execution plans could be more advanced and prepare/hint about filter > caching and even adapt the indexing structure to better match the > security constraints).
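Sanne's "registered query definitions" idea can be sketched as a small registry keyed by query name: registration parses (and could validate/plan) the definition once, so every later execution reuses the cached parse. This is a plain-Java illustration only; the names (`QueryRegistry`, `ParsedQuery`) are hypothetical and not Infinispan API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: parsing (and, on a real server, execution planning,
// index adaptation, ACL matching) happens once at registration time,
// not on every execution.
public class QueryRegistry {

    // Stand-in for a parsed execution plan; a real one would hold the AST,
    // index hints, security scoping, collected metrics, etc.
    public record ParsedQuery(String name, String hql, long parsedAt) {}

    private final Map<String, ParsedQuery> registered = new ConcurrentHashMap<>();

    public ParsedQuery register(String name, String hql) {
        // Registration is the natural point to validate the query against
        // index metadata and to (re)build any indexes it needs.
        return registered.computeIfAbsent(name,
                n -> new ParsedQuery(n, hql, System.nanoTime()));
    }

    public ParsedQuery lookup(String name) {
        ParsedQuery q = registered.get(name);
        if (q == null) {
            throw new IllegalArgumentException("Unregistered query: " + name);
        }
        return q; // cached parse is reused on every execution
    }
}
```

A deregistration hook would be the symmetric place to drop indexing flags that no remaining query needs.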
> > # Essentially > > We need to expose a standard, cross language and declarative form of > the queries the user intends to run from remote, and provide a way to > register these queries on the server, where > registration/deregistration triggers certain actions. > > This would not be mandatory as you can still ask for ad-hoc queries, > but these will only take advantage of indexes which happen to exist > because of some registered query, or of no index at all. > > I'm proposing for the format to be - initially to support only the > simple functions exposed by the remote DSL - a simple query String, > essentially the HQL we already use but obviously limited to the base > constraints we need. This language will probably evolve in future *if* > we ever want to expose also fulltext over this.. > > For the embedded query world - less of a priority - we could start > experimenting with more richer and typesafe query definitions, to also > provide the benefit listed above. I think specifying the querying instead of requiring the user to index specific fields would result in a better user experience. OTOH I think it's not a common model for indexing data, as people are more accustomed to indexing specific fields. For me would also be good to know the limitations that current indexing model has, as at this stage I'm not that familiar with that. > > -- Sanne > > > > > > > On 11 February 2014 09:18, Mircea Markus wrote: >> I guess I put the solution before the problem, but basically where I want to get to is to allow people to write protostream marshallers without requiring them to write the proto file. This would mean the same effort for java users to write either JBMAR marshallers or proto marshallers. If that's possible and people and protostream is as fast as JBMAR (do you have any perf numbers on that BTW?) then we can suggest people use proto marshallers by default. 
>> >> On Feb 10, 2014, at 6:43 PM, Adrian Nistor wrote: >> >>> The idea of auto-generating protobuf schemas based on the marshaller >>> code was briefly mentioned last time we met in Palma. I would not >>> qualify it as impossible to implement, but it would certainly be hacky >>> and leads to more trouble than it's worth. >>> >>> A lot of info is missing from the marshaller code (API calls) precisely >>> because it is not normally needed, being provided by the schema already. >>> Now trying to go backwards means we'll have to 'invent' that metadata >>> using some common sense (examples: which field is required vs optional, >>> which field is indexable, indexing options, etc). Too many options. I >>> bet the notion of 'common sense' would quickly need to be configured >>> somehow, for uncommon use cases :). But that's why we have protobuf >>> schemas for. Plus, to run a marshaller for inferring the schema you'll >>> first need a prototypical instance of your entity. Where from? So no, >>> -1, now I have serious concerns about this, even though I initially >>> nodded in approval. >>> >>> And that would work only for Java anyway, because the marshaller and the >>> schema-infering-process needs to run on the server side. >>> >>> >>> On 02/10/2014 07:34 PM, Mircea Markus wrote: >>>> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: >>>> >>>>> - since remote query is already imbued with JPA in some form, an >>>>> interesting project would be to implement a JPA annotation processor >>>>> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >>>>> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >>>>> the JPA entities directly. >>>> I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. 
>>>> >>>> Cheers, >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 12 05:40:43 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 12 Feb 2014 10:40:43 +0000 Subject: [infinispan-dev] ClusteredListeners: message delivered twice Message-ID: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Hey Will, With the current design, during a topology change, an event might be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state if an event is redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of? 
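A consumer of such a "potentiallyDuplicate" flag could look like the sketch below: the listener keeps a record of applied events and skips a flagged event it has already seen. Everything here (the event shape, the key+version identity) is a hypothetical illustration, not the actual cluster-listener API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a listener tolerating at-least-once delivery across a topology
// change: a redelivered event is ignored only when it is flagged as a
// potential duplicate and has already been applied. Types are hypothetical.
public class IdempotentListener {

    public record CacheEvent(String key, long version, boolean potentiallyDuplicate) {}

    private final Set<String> applied = ConcurrentHashMap.newKeySet();

    /** @return true if the event was applied, false if skipped as a duplicate */
    public boolean onEvent(CacheEvent e) {
        String id = e.key() + "@" + e.version();
        if (e.potentiallyDuplicate() && !applied.add(id)) {
            return false; // already processed before the topology change
        }
        applied.add(id);
        // ... apply the event to local state ...
        return true;
    }
}
```

The flag keeps the common path cheap: the dedup lookup only has to happen for events emitted around a topology change.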
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Thu Feb 13 13:24:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 13 Feb 2014 18:24:35 +0000 Subject: [infinispan-dev] HotRod near caches In-Reply-To: <52F9D772.8080006@redhat.com> References: <52F9D772.8080006@redhat.com> Message-ID: On 11 February 2014 07:55, Tristan Tarrant wrote: > Hi people, > > this is a bit of a dump of ideas for getting our HotRod client in shape > for supporting near caches: > > - RemoteCaches should have an optional internal cache. This cache should > probably be some form of bounded expiration-aware hashmap which would > serve as a local copy of data retrieved over the wire. In the past we > have advocated the use of combining an EmbeddedCacheManager with a > RemoteCacheStore to achieve this, but this is only applicable to Java > clients, while we need to think of a solution for our other clients too. True we need a general solution, but only as a design: we can still think of using an EmbeddedCacheManager as an implementation detail for the JVM based clients right? For other languages, I'd probably pick a mature and well known cache from each language. We'd probably want to mask Flag usage: for example SKIP_CACHE_LOAD should only apply on the server nodes. Also we'd probably want to verify that a failure of an operation on our "cachestore" is not going to provide misleading messages, or being ignored altogether when running in independent threads. > - Once remote listeners are in place, a RemoteCache would automatically > invalidate entries in the near-cache. This is the point concerning me the most: I suspect there are so many different ways in which this could get out of synch! Essentially let's consider that a client requiring this level of consistency is becoming part of the distributed system. I'm not against doing it, just that I'm having the impression its complexity is being underestimated. 
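The "bounded expiration-aware hashmap" Tristan mentions can be sketched with a plain LRU-bounded `LinkedHashMap` plus a per-entry deadline and an `invalidate()` hook for the remote-listener path. This is only a single-threaded illustration under assumed semantics, not a proposed implementation; a real near cache needs concurrency and smarter eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal near-cache sketch: bounded (LRU via access-ordered LinkedHashMap),
// expiration-aware (per-entry TTL deadline), and invalidatable from a
// remote-event listener. Not thread-safe.
public class NearCache<K, V> {

    private record Entry<V>(V value, long expiresAtMillis) {}

    private final LinkedHashMap<K, Entry<V>> map;
    private final long ttlMillis;

    public NearCache(int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxEntries; // bounded: drop the LRU entry
            }
        };
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    /** @return the cached value, or null on miss/expiry (caller goes remote) */
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAtMillis()) {
            map.remove(key); // expired: treat as a miss
            return null;
        }
        return e.value();
    }

    /** Called from a remote listener when the server reports a modification. */
    public void invalidate(K key) {
        map.remove(key);
    }
}
```

Sanne's consistency concern lives in `invalidate()`: between the server-side write and the client receiving the event, `get()` can still return the stale value, so this design is only ever eventually consistent.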
> - Remote Query should "pass-through" the near-cache, so that entries > retrieved from a query would essentially be cached locally following the > same semantics. This can be achieved by having the QUERY verb return > just the set of matching keys instead of the whole entries +1, or even better - to avoid multiple roundtrips - we just store the indivual results in the local cache. The downside is that the gathering phase of query results might not be taking advantage of the locally stored individual entries (when they match); the good news is we have a similar case with Hibernate Search/ORM dealing with 2nd level cache, for which we expose an option to get a hint from the user: we could do the same. > - Optionally we can even think about a query cache which would hash the > query DSL and store the resulting keys locally so that successive > invocations of a cached query wouldn't go through the wire. Matching > this with invalidation is probably a tad more complex, and I'd probably > avoid going down that path. I'd agree especially in the first phase, but if needed that is essentially just a continuous query so we can build on top of that. Thanks for starting this! Sanne > > Tristan > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Fri Feb 14 08:07:30 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 14 Feb 2014 13:07:30 +0000 Subject: [infinispan-dev] HotRod near caches In-Reply-To: References: <52F9D772.8080006@redhat.com> Message-ID: On Feb 13, 2014, at 6:24 PM, Sanne Grinovero wrote: > On 11 February 2014 07:55, Tristan Tarrant wrote: >> Hi people, >> >> this is a bit of a dump of ideas for getting our HotRod client in shape >> for supporting near caches: >> >> - RemoteCaches should have an optional internal cache. 
This cache should >> probably be some form of bounded expiration-aware hashmap which would >> serve as a local copy of data retrieved over the wire. In the past we >> have advocated the use of combining an EmbeddedCacheManager with a >> RemoteCacheStore to achieve this, but this is only applicable to Java >> clients, while we need to think of a solution for our other clients too. > > True we need a general solution, but only as a design: we can still > think of using an EmbeddedCacheManager as an implementation detail for > the JVM based clients right? > For other languages, I'd probably pick a mature and well known cache > from each language. +1 > > We'd probably want to mask Flag usage: for example SKIP_CACHE_LOAD > should only apply on the server nodes. > > Also we'd probably want to verify that a failure of an operation on > our "cachestore" is not going to provide misleading messages, or being > ignored altogether when running in independent threads. Having the RemoteCacheManager and EmbeddedCacheManager following a common ancestry has caused a lot of confusion in the community, with people trying to replace one with the other and not succeeding. Might be worth splitting them, and then add/keep the relevant flags for HotRod java client only. > >> - Once remote listeners are in place, a RemoteCache would automatically >> invalidate entries in the near-cache. > > This is the point concerning me the most: I suspect there are so many > different ways in which this could get out of synch! > Essentially let's consider that a client requiring this level of > consistency is becoming part of the distributed system. > I'm not against doing it, just that I'm having the impression its > complexity is being underestimated. > >> - Remote Query should "pass-through" the near-cache, so that entries >> retrieved from a query would essentially be cached locally following the >> same semantics. 
This can be achieved by having the QUERY verb return >> just the set of matching keys instead of the whole entries > > +1, or even better - to avoid multiple roundtrips - we just store the > indivual results in the local cache. > The downside is that the gathering phase of query results might not be > taking advantage of the locally stored individual entries (when they > match); the good news is we have a similar case with Hibernate > Search/ORM dealing with 2nd level cache, for which we expose an option > to get a hint from the user: we could do the same. also this would not work if the queries project data, instead of returning fully fledged entries. > >> - Optionally we can even think about a query cache which would hash the >> query DSL and store the resulting keys locally so that successive >> invocations of a cached query wouldn't go through the wire. Matching >> this with invalidation is probably a tad more complex, and I'd probably >> avoid going down that path. > > I'd agree especially in the first phase, but if needed that is > essentially just a continuous query so we can build on top of that. > > Thanks for starting this! > Sanne > >> >> Tristan >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vagvaz at gmail.com Fri Feb 14 10:10:55 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Fri, 14 Feb 2014 17:10:55 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. Message-ID: <52FE31FF.5050507@gmail.com> Hello everyone, I started using the MapReduce implementation of Infinispan and I came across some possible limitations. 
Thus, I want to make some suggestions about the MapReduce (MR) implementation of Infinispan. Depending on the algorithm, there might be some memory problems, especially for intermediate results. An example of such a case is group by. Suppose that we have a cluster of 2 nodes with 2 GB available. Let a distributed cache, where simple car objects (id,brand,colour) are stored and the total size of data is 3.5GB. If all objects have the same colour , then all 3.5 GB would go to only one reducer, as a result an OutOfMemoryException will be thrown. To overcome these limitations, I propose to add as parameter the name of the intermediate cache to be used. This will enable the creation of a custom configured cache that deals with the memory limitations. Another feature that I would like to have is to set the name of the output cache. The reasoning behind this is similar to the one mentioned above. I wait for your thoughts on these two suggestions. Regards, Evangelos From ttarrant at redhat.com Fri Feb 14 10:16:05 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 14 Feb 2014 16:16:05 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <52FE31FF.5050507@gmail.com> References: <52FE31FF.5050507@gmail.com> Message-ID: <52FE3335.1070806@redhat.com> Hi Evangelos, you might be interested in looking into a current pull request which addresses some (all?) of these issues https://github.com/infinispan/infinispan/pull/2300 Tristan On 14/02/2014 16:10, Evangelos Vazaios wrote: > Hello everyone, > > I started using the MapReduce implementation of Infinispan and I came > across some possible limitations. Thus, I want to make some suggestions > about the MapReduce (MR) implementation of Infinispan. > Depending on the algorithm, there might be some memory problems, > especially for intermediate results. > An example of such a case is group by. Suppose that we have a cluster > of 2 nodes with 2 GB available. 
Let a distributed cache, where simple > car objects (id,brand,colour) are stored and the total size of data is > 3.5GB. If all objects have the same colour , then all 3.5 GB would go to > only one reducer, as a result an OutOfMemoryException will be thrown. > > To overcome these limitations, I propose to add as parameter the name of > the intermediate cache to be used. This will enable the creation of a > custom configured cache that deals with the memory limitations. > > Another feature that I would like to have is to set the name of the > output cache. The reasoning behind this is similar to the one mentioned > above. > > I wait for your thoughts on these two suggestions. > > Regards, > Evangelos > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > From vblagoje at redhat.com Fri Feb 14 10:54:27 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 14 Feb 2014 10:54:27 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <52FE3335.1070806@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> Message-ID: <52FE3C33.3070107@redhat.com> Tristan, Actually they are not addressed in this pull request but the feature where custom output cache is used instead of results being returned is next in the implementation pipeline. Evangelos, indeed, depending on a reducer function all intermediate KOut/VOut pairs might be moved to a single node. How would custom cache help in this case? Regards, Vladimir On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: > Hi Evangelos, > > you might be interested in looking into a current pull request which > addresses some (all?) 
of these issues > > https://github.com/infinispan/infinispan/pull/2300 > > Tristan > > On 14/02/2014 16:10, Evangelos Vazaios wrote: >> Hello everyone, >> >> I started using the MapReduce implementation of Infinispan and I came >> across some possible limitations. Thus, I want to make some suggestions >> about the MapReduce (MR) implementation of Infinispan. >> Depending on the algorithm, there might be some memory problems, >> especially for intermediate results. >> An example of such a case is group by. Suppose that we have a cluster >> of 2 nodes with 2 GB available. Let a distributed cache, where simple >> car objects (id,brand,colour) are stored and the total size of data is >> 3.5GB. If all objects have the same colour , then all 3.5 GB would go to >> only one reducer, as a result an OutOfMemoryException will be thrown. >> >> To overcome these limitations, I propose to add as parameter the name of >> the intermediate cache to be used. This will enable the creation of a >> custom configured cache that deals with the memory limitations. >> >> Another feature that I would like to have is to set the name of the >> output cache. The reasoning behind this is similar to the one mentioned >> above. >> >> I wait for your thoughts on these two suggestions. >> >> Regards, >> Evangelos >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Mon Feb 17 02:48:28 2014 From: rvansa at redhat.com (Radim Vansa) Date: Mon, 17 Feb 2014 08:48:28 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <52FE3C33.3070107@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> Message-ID: <5301BECC.7010901@redhat.com> I think that the intermediate cache is not required at all. The M/R algorithm itself can (and should!) run with memory occupied by the result of reduction. The current implementation with Map first and Reduce after that will always have these problems, using a cache for temporary caching the result is only a workaround. The only situation when temporary cache could be useful is when the result grows linearly (or close to that or even more) with the amount of reduced entries. This would be the case for groupBy producing Map<Colour, List<Car>> from all entries in cache. Then the task does not scale and should be redesigned anyway, but flushing the results into cache backed by cache store could help. Radim On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: > Tristan, > > Actually they are not addressed in this pull request but the feature > where custom output cache is used instead of results being returned is > next in the implementation pipeline. > > Evangelos, indeed, depending on a reducer function all intermediate > KOut/VOut pairs might be moved to a single node. How would custom cache > help in this case? > > Regards, > Vladimir > > > On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >> Hi Evangelos, >> >> you might be interested in looking into a current pull request which >> addresses some (all?) of these issues >> >> https://github.com/infinispan/infinispan/pull/2300 >> >> Tristan >> >> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>> Hello everyone, >>> >>> I started using the MapReduce implementation of Infinispan and I came >>> across some possible limitations. Thus, I want to make some suggestions >>> about the MapReduce (MR) implementation of Infinispan. >>> Depending on the algorithm, there might be some memory problems, >>> especially for intermediate results.
>>> An example of such a case is group by. Suppose that we have a cluster >>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>> car objects (id,brand,colour) are stored and the total size of data is >>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go to >>> only one reducer, as a result an OutOfMemoryException will be thrown. >>> >>> To overcome these limitations, I propose to add as parameter the name of >>> the intermediate cache to be used. This will enable the creation of a >>> custom configured cache that deals with the memory limitations. >>> >>> Another feature that I would like to have is to set the name of the >>> output cache. The reasoning behind this is similar to the one mentioned >>> above. >>> >>> I wait for your thoughts on these two suggestions. >>> >>> Regards, >>> Evangelos >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From etienne.riviere at unine.ch Mon Feb 17 03:18:38 2014 From: etienne.riviere at unine.ch (Etienne Riviere) Date: Mon, 17 Feb 2014 09:18:38 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <5301BECC.7010901@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: Hi Radim, I might misunderstand your suggestion but many M/R jobs actually require to run the two phases one after the other, and henceforth to store the intermediate results somewhere. While some may slightly reduce intermediate memory usage by using a combiner function (e.g., the word-count example), I don't see how we can avoid intermediate storage altogether. Thanks, Etienne (leads project -- as Evangelos who initiated the thread) On 17 Feb 2014, at 08:48, Radim Vansa wrote: > I think that the intermediate cache is not required at all. The M/R > algorithm itself can (and should!) run with memory occupied by the > result of reduction. The current implementation with Map first and > Reduce after that will always have these problems, using a cache for > temporary caching the result is only a workaround. > > The only situation when temporary cache could be useful is when the > result grows linearly (or close to that or even more) with the amount of > reduced entries. This would be the case for groupBy producing Map<Colour, > List<Car>> from all entries in cache. Then the task does not scale and > should be redesigned anyway, but flushing the results into cache backed > by cache store could help. > > Radim > > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >> Tristan, >> >> Actually they are not addressed in this pull request but the feature >> where custom output cache is used instead of results being returned is >> next in the implementation pipeline. >> >> Evangelos, indeed, depending on a reducer function all intermediate >> KOut/VOut pairs might be moved to a single node. How would custom cache >> help in this case?
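The skew Evangelos describes can be made concrete with a toy partitioning model: intermediate values are routed by key, so when every car maps to the same colour, one reducer receives the whole data set no matter how many nodes exist. The hash-owner scheme below is an illustrative assumption, not Infinispan's actual consistent hashing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of reducer skew in a group-by: all intermediate values for
// one key land on a single owner node, so a single hot key concentrates
// the entire intermediate result on one reducer.
public class GroupBySkew {

    record Car(int id, String brand, String colour) {}

    /** Counts how many intermediate values each of `nodes` reducers receives. */
    public static int[] reducerLoad(List<Car> cars, int nodes) {
        Map<String, Integer> perColour = new HashMap<>();
        for (Car c : cars) { // map phase: emit (colour, car)
            perColour.merge(c.colour(), 1, Integer::sum);
        }
        int[] load = new int[nodes];
        for (Map.Entry<String, Integer> e : perColour.entrySet()) {
            // all values for one key go to that key's owner (toy hash routing)
            int owner = Math.floorMod(e.getKey().hashCode(), nodes);
            load[owner] += e.getValue();
        }
        return load;
    }
}
```

With 100 red cars and 4 nodes, one slot of the load array holds all 100 values; a differently configured intermediate cache changes where that memory lives, not the skew itself.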
>> >> Regards, >> Vladimir >> >> >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>> Hi Evangelos, >>> >>> you might be interested in looking into a current pull request which >>> addresses some (all?) of these issues >>> >>> https://github.com/infinispan/infinispan/pull/2300 >>> >>> Tristan >>> >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>> Hello everyone, >>>> >>>> I started using the MapReduce implementation of Infinispan and I came >>>> across some possible limitations. Thus, I want to make some suggestions >>>> about the MapReduce (MR) implementation of Infinispan. >>>> Depending on the algorithm, there might be some memory problems, >>>> especially for intermediate results. >>>> An example of such a case is group by. Suppose that we have a cluster >>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>>> car objects (id,brand,colour) are stored and the total size of data is >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go to >>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>> >>>> To overcome these limitations, I propose to add as parameter the name of >>>> the intermediate cache to be used. This will enable the creation of a >>>> custom configured cache that deals with the memory limitations. >>>> >>>> Another feature that I would like to have is to set the name of the >>>> output cache. The reasoning behind this is similar to the one mentioned >>>> above. >>>> >>>> I wait for your thoughts on these two suggestions. 
>>>> >>>> Regards, >>>> Evangelos >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Mon Feb 17 03:42:22 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 17 Feb 2014 10:42:22 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: Hi Etienne I was going to suggest using a combiner - the combiner would process the mapper results from just one node, so you should need at most double the memory on that node. I guess we could reduce the memory requirements even more if the combiner could run concurrently with the mapper... Vladimir, does it sound like a reasonable feature request? I'm afraid in your situation using a cache store wouldn't help, as the intermediate values for the same key are stored as a list in a single entry. So if all cars are red, there would be just one intermediate key in the intermediate cache, and there would be nothing to evict to the cache store. Vladimir, do you think we could somehow "chunk" the intermediary values into multiple entries grouped by the intermediary key, to support this scenario? 
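The "chunking" Dan asks about could look roughly like the sketch below: instead of one intermediate entry per key holding an unbounded value list, values are spread over bounded entries keyed "key#0", "key#1", ..., so a cache store could passivate individual chunks even when one key is hot. The class and key-naming scheme are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of chunked intermediate storage: each key's values are split into
// entries of at most chunkSize values, so eviction/passivation can operate
// per chunk rather than on one giant per-key list.
public class ChunkedIntermediate<V> {

    private final Map<String, List<V>> cache = new HashMap<>();
    private final Map<String, Integer> chunkCount = new HashMap<>();
    private final int chunkSize;

    public ChunkedIntermediate(int chunkSize) { this.chunkSize = chunkSize; }

    public void emit(String key, V value) {
        int chunks = chunkCount.getOrDefault(key, 0);
        String last = key + "#" + Math.max(0, chunks - 1);
        List<V> values = cache.get(last);
        if (values == null || values.size() >= chunkSize) {
            values = new ArrayList<>(); // start a new bounded chunk entry
            cache.put(key + "#" + chunks, values);
            chunkCount.put(key, chunks + 1);
        }
        values.add(value);
    }

    /** The reducer streams over all chunks of a key, one bounded entry at a time. */
    public List<V> valuesOf(String key) {
        List<V> all = new ArrayList<>();
        for (int i = 0; i < chunkCount.getOrDefault(key, 0); i++) {
            all.addAll(cache.get(key + "#" + i));
        }
        return all;
    }

    public int chunks(String key) { return chunkCount.getOrDefault(key, 0); }
}
```

A real reducer would consume chunks incrementally instead of materializing `valuesOf`, which is the whole point of the exercise.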
For reference, though, a limited version of what you're asking for is already available. You can change the configuration of the intermediary cache by defining a "__tmpMapReduce" cache in your configuration. That configuration will be used for all M/R tasks, whether they use the shared intermediate cache or they create their own. Cheers Dan On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere wrote: > Hi Radim, > > I might misunderstand your suggestion but many M/R jobs actually require > to run the two phases one after the other, and henceforth to store the > intermediate results somewhere. While some may slightly reduce intermediate > memory usage by using a combiner function (e.g., the word-count example), I > don't see how we can avoid intermediate storage altogether. > > Thanks, > Etienne (leads project -- as Evangelos who initiated the thread) > > On 17 Feb 2014, at 08:48, Radim Vansa wrote: > > > I think that the intermediate cache is not required at all. The M/R > > algorithm itself can (and should!) run with memory occupied by the > > result of reduction. The current implementation with Map first and > > Reduce after that will always have these problems, using a cache for > > temporary caching the result is only a workaround. > > > > The only situation when temporary cache could be useful is when the > > result grows linearly (or close to that or even more) with the amount of > > reduced entries. This would be the case for groupBy producing Map > List> from all entries in cache. Then the task does not scale and > > should be redesigned anyway, but flushing the results into cache backed > > by cache store could help. > > > > Radim > > > > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: > >> Tristan, > >> > >> Actually they are not addressed in this pull request but the feature > >> where custom output cache is used instead of results being returned is > >> next in the implementation pipeline. 
> >> > >> Evangelos, indeed, depending on a reducer function all intermediate > >> KOut/VOut pairs might be moved to a single node. How would custom cache > >> help in this case? > >> > >> Regards, > >> Vladimir > >> > >> > >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: > >>> Hi Evangelos, > >>> > >>> you might be interested in looking into a current pull request which > >>> addresses some (all?) of these issues > >>> > >>> https://github.com/infinispan/infinispan/pull/2300 > >>> > >>> Tristan > >>> > >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: > >>>> Hello everyone, > >>>> > >>>> I started using the MapReduce implementation of Infinispan and I came > >>>> across some possible limitations. Thus, I want to make some > suggestions > >>>> about the MapReduce (MR) implementation of Infinispan. > >>>> Depending on the algorithm, there might be some memory problems, > >>>> especially for intermediate results. > >>>> An example of such a case is group by. Suppose that we have a cluster > >>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple > >>>> car objects (id,brand,colour) are stored and the total size of data is > >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go > to > >>>> only one reducer, as a result an OutOfMemoryException will be thrown. > >>>> > >>>> To overcome these limitations, I propose to add as parameter the name > of > >>>> the intermediate cache to be used. This will enable the creation of a > >>>> custom configured cache that deals with the memory limitations. > >>>> > >>>> Another feature that I would like to have is to set the name of the > >>>> output cache. The reasoning behind this is similar to the one > mentioned > >>>> above. > >>>> > >>>> I wait for your thoughts on these two suggestions. 
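The skew Evangelos describes is easy to reproduce in miniature. The sketch below uses plain Java collections instead of Infinispan's M/R machinery (all names are illustrative): when every car is red, the map phase produces exactly one intermediate key, so a single reducer - and a single node - receives the entire data set no matter how many nodes held the input.

```java
import java.util.*;

// Minimal illustration of reducer skew: grouping cars by colour when every
// car is red leaves ALL intermediate values under one key. In the scenario
// above, that one entry is what grows to 3.5 GB on a 2 GB node.
public class GroupBySkew {
    record Car(int id, String brand, String colour) {}

    static Map<String, List<Car>> mapPhase(List<Car> cars) {
        Map<String, List<Car>> grouped = new HashMap<>();
        // the "mapper": emit (colour, car) for every input entry
        for (Car c : cars)
            grouped.computeIfAbsent(c.colour(), k -> new ArrayList<>()).add(c);
        return grouped;
    }

    static Map<String, Integer> reducePhase(Map<String, List<Car>> grouped) {
        Map<String, Integer> counts = new HashMap<>();
        // the "reducer": one invocation per intermediate key
        grouped.forEach((colour, cars) -> counts.put(colour, cars.size()));
        return counts;
    }
}
```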
From vagvaz at gmail.com Mon Feb 17 06:48:29 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Mon, 17 Feb 2014 13:48:29 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: Message-ID: <5301F70D.6050605@gmail.com> On 02/17/2014 10:42 AM, infinispan-dev-request at lists.jboss.org wrote: > Hi Etienne > > I was going to suggest using a combiner - the combiner would process the > mapper results from just one node, so you should need at most double the > memory on that node. I guess we could reduce the memory requirements even > more if the combiner could run concurrently with the mapper... Vladimir, > does it sound like a reasonable feature request? > There are algorithms where combiners cannot be applied.
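To make that last point concrete: a combiner is a node-local pre-reduce, which is only correct when the reduction is associative and commutative, so that partial results can themselves be reduced. Counting satisfies this; an exact median does not. A minimal sketch (plain Java, hypothetical names):

```java
import java.util.*;

// A combiner is a node-local pre-reduce. It is only safe when reducing is
// associative and commutative, so partial results can themselves be reduced.
public class CombinerDemo {
    // Combinable: counting values per key. Each node can collapse its local
    // values to one partial count, and the reducer sums the partials.
    static int reduceCounts(List<Integer> partialCounts) {
        int total = 0;
        for (int c : partialCounts) total += c;
        return total;
    }

    // NOT combinable: an exact median is "holistic" - the median of local
    // medians is generally not the global median, so the reducer needs the
    // full value set and no combiner can shrink it.
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2)
                          : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }
}
```

For example, if node A holds [1, 2, 9] and node B holds [3, 4], the local medians are 2 and 3.5 while the global median is 3 - no combination of the partial medians recovers it, which is why such jobs must ship every value to the reducer.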
> I'm afraid in your situation using a cache store wouldn't help, as the > intermediate values for the same key are stored as a list in a single > entry. So if all cars are red, there would be just one intermediate key in > the intermediate cache, and there would be nothing to evict to the cache > store. Vladimir, do you think we could somehow "chunk" the intermediary > values into multiple entries grouped by the intermediary key, to support > this scenario? > I was thinking of a custom cache implementation that maintains the overall size of the cache and the size of each key individually, and when a threshold is reached it spills entries to disk. Note that I am not familiar with the internals of Infinispan, but I think it is doable. Such a cache solves the problem in both cases (when one key is too large to fit in memory, as in my example, and when the keys assigned to one reducer exceed its memory). > For reference, though, a limited version of what you're asking for is > already available. You can change the configuration of the intermediary > cache by defining a "__tmpMapReduce" cache in your configuration. That > configuration will be used for all M/R tasks, whether they use the shared > intermediate cache or they create their own. > I have one question about this. If I start two MR tasks at once, will these tasks use the same cache, and thus will the intermediate results be mixed? If so, this shared cache could at least be used as a test case. Regards, Evangelos > Cheers > Dan > > > > On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere > wrote: >> > Hi Radim, >> > >> > I might misunderstand your suggestion but many M/R jobs actually require >> > to run the two phases one after the other, and henceforth to store the >> > intermediate results somewhere. While some may slightly reduce intermediate >> > memory usage by using a combiner function (e.g., the word-count example), I >> > don't see how we can avoid intermediate storage altogether.
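A rough sketch of the spill-to-disk cache Evangelos proposes, in plain Java with illustrative names only: a real implementation would track bytes rather than value counts, and would hook into Infinispan's eviction/passivation machinery rather than writing raw temp files.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Track the number of buffered values overall and per key; when a threshold
// is crossed, move the largest key's values to a file. Count-based "size" is
// a stand-in for real byte accounting.
public class SpillingStore<K, V extends Serializable> {
    private final int maxInMemory;
    private final Map<K, List<V>> memory = new HashMap<>();
    private final Map<K, Path> spilled = new HashMap<>();
    private int inMemoryCount = 0;

    public SpillingStore(int maxInMemory) { this.maxInMemory = maxInMemory; }

    public void add(K key, V value) {
        memory.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        if (++inMemoryCount > maxInMemory) spillLargest();
    }

    private void spillLargest() {
        K largest = memory.keySet().stream()
                .max(Comparator.comparingInt((K k) -> memory.get(k).size())).orElseThrow();
        try {
            Path file = spilled.computeIfAbsent(largest, k -> tempFile());
            try (ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(
                    Files.newOutputStream(file, StandardOpenOption.APPEND)))) {
                for (V v : memory.get(largest)) out.writeObject(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        inMemoryCount -= memory.remove(largest).size();
    }

    private static Path tempFile() {
        try { return Files.createTempFile("spill", ".bin"); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public int inMemoryCount() { return inMemoryCount; }
    public boolean isSpilled(K key) { return spilled.containsKey(key); }
}
```

Evicting the largest key first matches the "all cars are red" case: the one oversized key is the first thing pushed out of memory, while small keys stay cheap to reduce.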
>> > >> > Thanks, >> > Etienne (leads project -- as Evangelos who initiated the thread) >> > >> > On 17 Feb 2014, at 08:48, Radim Vansa wrote: >> > >>> > > I think that the intermediate cache is not required at all. The M/R >>> > > algorithm itself can (and should!) run with memory occupied by the >>> > > result of reduction. The current implementation with Map first and >>> > > Reduce after that will always have these problems, using a cache for >>> > > temporary caching the result is only a workaround. >>> > > >>> > > The only situation when temporary cache could be useful is when the >>> > > result grows linearly (or close to that or even more) with the amount of >>> > > reduced entries. This would be the case for groupBy producing Map>> > > List> from all entries in cache. Then the task does not scale and >>> > > should be redesigned anyway, but flushing the results into cache backed >>> > > by cache store could help. >>> > > >>> > > Radim >>> > > >>> > > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>>> > >> Tristan, >>>> > >> >>>> > >> Actually they are not addressed in this pull request but the feature >>>> > >> where custom output cache is used instead of results being returned is >>>> > >> next in the implementation pipeline. >>>> > >> >>>> > >> Evangelos, indeed, depending on a reducer function all intermediate >>>> > >> KOut/VOut pairs might be moved to a single node. How would custom cache >>>> > >> help in this case? >>>> > >> >>>> > >> Regards, >>>> > >> Vladimir >>>> > >> >>>> > >> >>>> > >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>>> > >>> Hi Evangelos, >>>>> > >>> >>>>> > >>> you might be interested in looking into a current pull request which >>>>> > >>> addresses some (all?) 
of these issues >>>>> > >>> >>>>> > >>> https://github.com/infinispan/infinispan/pull/2300 >>>>> > >>> >>>>> > >>> Tristan >>>>> > >>> >>>>> > >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>>> > >>>> Hello everyone, >>>>>> > >>>> >>>>>> > >>>> I started using the MapReduce implementation of Infinispan and I came >>>>>> > >>>> across some possible limitations. Thus, I want to make some >> > suggestions >>>>>> > >>>> about the MapReduce (MR) implementation of Infinispan. >>>>>> > >>>> Depending on the algorithm, there might be some memory problems, >>>>>> > >>>> especially for intermediate results. >>>>>> > >>>> An example of such a case is group by. Suppose that we have a cluster >>>>>> > >>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>>>>> > >>>> car objects (id,brand,colour) are stored and the total size of data is >>>>>> > >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go >> > to >>>>>> > >>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>>>> > >>>> >>>>>> > >>>> To overcome these limitations, I propose to add as parameter the name >> > of >>>>>> > >>>> the intermediate cache to be used. This will enable the creation of a >>>>>> > >>>> custom configured cache that deals with the memory limitations. >>>>>> > >>>> >>>>>> > >>>> Another feature that I would like to have is to set the name of the >>>>>> > >>>> output cache. The reasoning behind this is similar to the one >> > mentioned >>>>>> > >>>> above. >>>>>> > >>>> >>>>>> > >>>> I wait for your thoughts on these two suggestions. 
>>>>>> > >>>> >>>>>> > >>>> Regards, >>>>>> > >>>> Evangelos >>>>>> > >>>> _______________________________________________ >>>>>> > >>>> infinispan-dev mailing list >>>>>> > >>>> infinispan-dev at lists.jboss.org >>>>>> > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> > >>>> >>>>>> > >>>> >>>>> > >>> >>>>> > >>> >>>>> > >>> _______________________________________________ >>>>> > >>> infinispan-dev mailing list >>>>> > >>> infinispan-dev at lists.jboss.org >>>>> > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> > >> _______________________________________________ >>>> > >> infinispan-dev mailing list >>>> > >> infinispan-dev at lists.jboss.org >>>> > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> > > >>> > > >>> > > -- >>> > > Radim Vansa >>> > > JBoss DataGrid QA >>> > > >>> > > _______________________________________________ >>> > > infinispan-dev mailing list >>> > > infinispan-dev at lists.jboss.org >>> > > https://lists.jboss.org/mailman/listinfo/infinispan-d From sanne at infinispan.org Mon Feb 17 07:25:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 17 Feb 2014 12:25:35 +0000 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: On 17 February 2014 08:42, Dan Berindei wrote: > Hi Etienne > > I was going to suggest using a combiner - the combiner would process the > mapper results from just one node, so you should need at most double the > memory on that node. I guess we could reduce the memory requirements even > more if the combiner could run concurrently with the mapper... Vladimir, > does it sound like a reasonable feature request? Yes that's something I've discussed with Vladimir before. 
The problem is - as the LEADS experts of M/R like Evangelios explained to us in London - is that in many practical use cases you can't apply a combiner, and the Reducer needs to be run on the full set. I think Evangelios also mentioned that the actual the set processed by the Reducer is also expected to be sorted, apparently it's "interesting" that we don't do such things. This can be taken as a negative point as not all problems are solvable, but is also making it interesting for being able to resolve some other problems with a higher level of efficiency so it's not necessarily something that we might want to throw away. Might be interesting to keep our design with the current limitations, and to also pursue a second mode of operation in which we make a good Hadoop integration, to not reinvent the wheel in the area of the more complex tasks, also providing the benefit of API compatibility to allow other systems such as Apache Nutch and Mahout to run on Infinispan without significant changes. > > I'm afraid in your situation using a cache store wouldn't help, as the > intermediate values for the same key are stored as a list in a single entry. > So if all cars are red, there would be just one intermediate key in the > intermediate cache, and there would be nothing to evict to the cache store. > Vladimir, do you think we could somehow "chunk" the intermediary values into > multiple entries grouped by the intermediary key, to support this scenario? > > For reference, though, a limited version of what you're asking for is > already available. You can change the configuration of the intermediary > cache by defining a "__tmpMapReduce" cache in your configuration. That > configuration will be used for all M/R tasks, whether they use the shared > intermediate cache or they create their own. 
I really hope we can get rid of temporary caches, but if need be please make sure each task has an isolated execution context: names of temporary caches - or their keys - need to avoid collisions with other jobs. Also, if we start spawning additional caches automagically I have no idea how people will be able to define boundaries of the heap size we're allowed to use: such matters cannot be left to the user's responsibility to figure out. Sanne > > Cheers > Dan > > > > On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere > wrote: >> >> Hi Radim, >> >> I might misunderstand your suggestion but many M/R jobs actually require >> to run the two phases one after the other, and henceforth to store the >> intermediate results somewhere. While some may slightly reduce intermediate >> memory usage by using a combiner function (e.g., the word-count example), I >> don't see how we can avoid intermediate storage altogether. >> >> Thanks, >> Etienne (leads project -- as Evangelos who initiated the thread) >> >> On 17 Feb 2014, at 08:48, Radim Vansa wrote: >> >> > I think that the intermediate cache is not required at all. The M/R >> > algorithm itself can (and should!) run with memory occupied by the >> > result of reduction. The current implementation with Map first and >> > Reduce after that will always have these problems, using a cache for >> > temporary caching the result is only a workaround. >> > >> > The only situation when temporary cache could be useful is when the >> > result grows linearly (or close to that or even more) with the amount of >> > reduced entries. This would be the case for groupBy producing Map> > List> from all entries in cache. Then the task does not scale and >> > should be redesigned anyway, but flushing the results into cache backed >> > by cache store could help.
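Sanne's isolation point is cheap to address at the naming level. A sketch (hypothetical helper, not Infinispan API) that derives a collision-free intermediate cache name per task from the "__tmpMapReduce" base name mentioned earlier in the thread:

```java
import java.util.UUID;

// Every M/R task gets its own intermediate cache name, derived from a base
// name plus a unique task id, so two concurrent jobs can never read or
// overwrite each other's intermediate entries. Helper names are illustrative.
public final class IntermediateCacheNames {
    private static final String BASE = "__tmpMapReduce";

    public static String newTaskId() {
        return UUID.randomUUID().toString();
    }

    public static String forTask(String taskId) {
        return BASE + "-" + taskId;
    }
}
```

Alternatively a single shared cache could scope the keys by task id instead - fewer caches to manage, but per-task cleanup then means scanning the shared cache, and heap bounds would still be shared across jobs.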
>> > >> > Radim >> > >> > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >> >> Tristan, >> >> >> >> Actually they are not addressed in this pull request but the feature >> >> where custom output cache is used instead of results being returned is >> >> next in the implementation pipeline. >> >> >> >> Evangelos, indeed, depending on a reducer function all intermediate >> >> KOut/VOut pairs might be moved to a single node. How would custom cache >> >> help in this case? >> >> >> >> Regards, >> >> Vladimir >> >> >> >> >> >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >> >>> Hi Evangelos, >> >>> >> >>> you might be interested in looking into a current pull request which >> >>> addresses some (all?) of these issues >> >>> >> >>> https://github.com/infinispan/infinispan/pull/2300 >> >>> >> >>> Tristan >> >>> >> >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >> >>>> Hello everyone, >> >>>> >> >>>> I started using the MapReduce implementation of Infinispan and I came >> >>>> across some possible limitations. Thus, I want to make some >> >>>> suggestions >> >>>> about the MapReduce (MR) implementation of Infinispan. >> >>>> Depending on the algorithm, there might be some memory problems, >> >>>> especially for intermediate results. >> >>>> An example of such a case is group by. Suppose that we have a >> >>>> cluster >> >>>> of 2 nodes with 2 GB available. Let a distributed cache, where >> >>>> simple >> >>>> car objects (id,brand,colour) are stored and the total size of data >> >>>> is >> >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go >> >>>> to >> >>>> only one reducer, as a result an OutOfMemoryException will be thrown. >> >>>> >> >>>> To overcome these limitations, I propose to add as parameter the name >> >>>> of >> >>>> the intermediate cache to be used. This will enable the creation of a >> >>>> custom configured cache that deals with the memory limitations. 
>> >>>> >> >>>> Another feature that I would like to have is to set the name of the >> >>>> output cache. The reasoning behind this is similar to the one >> >>>> mentioned >> >>>> above. >> >>>> >> >>>> I wait for your thoughts on these two suggestions. >> >>>> >> >>>> Regards, >> >>>> Evangelos >> >>>> _______________________________________________ >> >>>> infinispan-dev mailing list >> >>>> infinispan-dev at lists.jboss.org >> >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >>>> >> >>>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> infinispan-dev mailing list >> >>> infinispan-dev at lists.jboss.org >> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > >> > >> > -- >> > Radim Vansa >> > JBoss DataGrid QA >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Mon Feb 17 07:53:16 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 17 Feb 2014 12:53:16 +0000 Subject: [infinispan-dev] ClusteredListeners: message delivered twice In-Reply-To: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> References: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Message-ID: On 12 February 2014 10:40, Mircea Markus wrote: > Hey Will, > > With the current design, during a topology change, an event might 
be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state if an event is redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of? I would really wish we would not push such a burden to the API consumer. If we at least had a modification counter associated with each entry this could help to identify duplicate triggers as well (on top of ordering of modification events as already discussed many times). Sanne > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From etienne.riviere at unine.ch Mon Feb 17 09:57:00 2014 From: etienne.riviere at unine.ch (Etienne Riviere) Date: Mon, 17 Feb 2014 15:57:00 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: <8B043BD7-8331-4720-B242-D8FC24B42004@unine.ch> Hi Sanne, As Evangelos pointed out in London, it is not possible to run a mapper and a combiner concurrently in the general case (there are exceptions where the combiner can run on the stream of tuples generated by the Mapper). The proposal to tighter integrate with Hadoop would make sense also for the support of Nutch that we need in the project. How complex do you think this would be? 
Etienne On 17 Feb 2014, at 13:25, Sanne Grinovero wrote: > On 17 February 2014 08:42, Dan Berindei wrote: >> Hi Etienne >> >> I was going to suggest using a combiner - the combiner would process the >> mapper results from just one node, so you should need at most double the >> memory on that node. I guess we could reduce the memory requirements even >> more if the combiner could run concurrently with the mapper... Vladimir, >> does it sound like a reasonable feature request? > > Yes that's something I've discussed with Vladimir before. > The problem is - as the LEADS experts of M/R like Evangelios explained > to us in London - is that in many practical use cases you can't apply > a combiner, and the Reducer needs to be run on the full set. > > I think Evangelios also mentioned that the actual the set processed by > the Reducer is also expected to be sorted, apparently it's > "interesting" that we don't do such things. This can be taken as a > negative point as not all problems are solvable, but is also making it > interesting for being able to resolve some other problems with a > higher level of efficiency so it's not necessarily something that we > might want to throw away. > > Might be interesting to keep our design with the current limitations, > and to also pursue a second mode of operation in which we make a good > Hadoop integration, to not reinvent the wheel in the area of the more > complex tasks, also providing the benefit of API compatibility to > allow other systems such as Apache Nutch and Mahout to run on > Infinispan without significant changes. > >> >> I'm afraid in your situation using a cache store wouldn't help, as the >> intermediate values for the same key are stored as a list in a single entry. >> So if all cars are red, there would be just one intermediate key in the >> intermediate cache, and there would be nothing to evict to the cache store. 
>> Vladimir, do you think we could somehow "chunk" the intermediary values into >> multiple entries grouped by the intermediary key, to support this scenario? >> >> For reference, though, a limited version of what you're asking for is >> already available. You can change the configuration of the intermediary >> cache by defining a "__tmpMapReduce" cache in your configuration. That >> configuration will be used for all M/R tasks, whether they use the shared >> intermediate cache or they create their own. > > I really hope we can get rid of temporary caches, but if need be > please make sure each task has an isolated execution context: names of > temporary caches - or their keys - need to avoid collisions with other > jobs. > Also, if we start spawning additional caches automagically I have no > idea how people will be able to define boundaries of heap size we're > allowed to use: such matters can not be left to the user's > responsibility to figure out. > > Sanne > >> >> Cheers >> Dan >> >> >> >> On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere >> wrote: >>> >>> Hi Radim, >>> >>> I might misunderstand your suggestion but many M/R jobs actually require >>> to run the two phases one after the other, and henceforth to store the >>> intermediate results somewhere. While some may slightly reduce intermediate >>> memory usage by using a combiner function (e.g., the word-count example), I >>> don?t see how we can avoid intermediate storage altogether. >>> >>> Thanks, >>> Etienne (leads project ? as Evangelos who initiated the thread) >>> >>> On 17 Feb 2014, at 08:48, Radim Vansa wrote: >>> >>>> I think that the intermediate cache is not required at all. The M/R >>>> algorithm itself can (and should!) run with memory occupied by the >>>> result of reduction. The current implementation with Map first and >>>> Reduce after that will always have these problems, using a cache for >>>> temporary caching the result is only a workaround. 
>>>> >>>> The only situation when temporary cache could be useful is when the >>>> result grows linearly (or close to that or even more) with the amount of >>>> reduced entries. This would be the case for groupBy producing Map>>> List> from all entries in cache. Then the task does not scale and >>>> should be redesigned anyway, but flushing the results into cache backed >>>> by cache store could help. >>>> >>>> Radim >>>> >>>> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>>>> Tristan, >>>>> >>>>> Actually they are not addressed in this pull request but the feature >>>>> where custom output cache is used instead of results being returned is >>>>> next in the implementation pipeline. >>>>> >>>>> Evangelos, indeed, depending on a reducer function all intermediate >>>>> KOut/VOut pairs might be moved to a single node. How would custom cache >>>>> help in this case? >>>>> >>>>> Regards, >>>>> Vladimir >>>>> >>>>> >>>>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>>>> Hi Evangelos, >>>>>> >>>>>> you might be interested in looking into a current pull request which >>>>>> addresses some (all?) of these issues >>>>>> >>>>>> https://github.com/infinispan/infinispan/pull/2300 >>>>>> >>>>>> Tristan >>>>>> >>>>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>>>> Hello everyone, >>>>>>> >>>>>>> I started using the MapReduce implementation of Infinispan and I came >>>>>>> across some possible limitations. Thus, I want to make some >>>>>>> suggestions >>>>>>> about the MapReduce (MR) implementation of Infinispan. >>>>>>> Depending on the algorithm, there might be some memory problems, >>>>>>> especially for intermediate results. >>>>>>> An example of such a case is group by. Suppose that we have a >>>>>>> cluster >>>>>>> of 2 nodes with 2 GB available. Let a distributed cache, where >>>>>>> simple >>>>>>> car objects (id,brand,colour) are stored and the total size of data >>>>>>> is >>>>>>> 3.5GB. 
If all objects have the same colour , then all 3.5 GB would go >>>>>>> to >>>>>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>>>>> >>>>>>> To overcome these limitations, I propose to add as parameter the name >>>>>>> of >>>>>>> the intermediate cache to be used. This will enable the creation of a >>>>>>> custom configured cache that deals with the memory limitations. >>>>>>> >>>>>>> Another feature that I would like to have is to set the name of the >>>>>>> output cache. The reasoning behind this is similar to the one >>>>>>> mentioned >>>>>>> above. >>>>>>> >>>>>>> I wait for your thoughts on these two suggestions. >>>>>>> >>>>>>> Regards, >>>>>>> Evangelos >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> -- >>>> Radim Vansa >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > 
infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Mon Feb 17 12:35:25 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 17 Feb 2014 18:35:25 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> Message-ID: <8955F382-8A6E-43AA-864E-1EC0C190654E@redhat.com> On 30 Jan 2014, at 20:51, Mircea Markus wrote: > > On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote: >> >> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >>> >>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >>>> >>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches, right? Otherwise I am not fully understanding why they ask for a unified query. >>>> Do you have written detailed use cases somewhere for me to better understand what is really requested? >>> >>> IMO, from a user perspective, being able to run queries spreading several caches simplifies the programming model: each cache corresponds to a single entity type, with potentially different configuration. >> >> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. > > Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttering to me. I think it's cumbersome to write any kind of querying with a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, but it discourages code reuse and makes it hard to maintain (if you'll add Pets in the same cache in the future you need to update the M/R code as well).
And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expire, etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future. > > The way I see it - and I'm very curious to see your opinion on this - following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spreading multiple caches are both useful and needed (same as queries spreading over multiple tables). My opinion is that seeing it this way is limiting. A key/value store is schemaless. Your view is forcing a particular schema on how to structure things. I don't expect everyone to store everything in a single cache, and of course there will be situations where it's not ideal or the best solution, such as in cases like the ones you mention above, but if you want to do it, for any of the reasons I or Paul mentioned in [1], it'd be nice to be able to do so. Cheers, [1] https://issues.jboss.org/browse/ISPN-3640 > >> >> Just yesterday I discovered this gem in Scala's Shapeless extensions [1]. This is experimental stuff, but essentially it allows you to define what key/value type pairs a map will contain, and it does type checking at compile time. I almost wet my pants when I saw that ;) :p. In the example, it defines a map as containing Int -> String and String -> Int key/value pairs. If you try to add an Int -> Int, it fails compilation. > Agreed, the compile-time check is pretty awesome :-) Still, mixing and matching types in a Map doesn't look great to me for ISPN. > >> Java's type checking is not powerful enough to do this, and its compilation logic is not extensible the way Scala macros are, but I think the fact that other languages are looking into this validates Paul's suggestion in [2], on top of all the benefits listed there.
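For what it's worth, plain Java can get partway there with the "typesafe heterogeneous container" pattern (known from Effective Java): keys carry their value type, so each get/put is compile-time checked per key. It doesn't reach Shapeless - the allowed key/value pairings are not part of the map's type, and this sketch holds one value per type - but it shows what the current language can already express:

```java
import java.util.HashMap;
import java.util.Map;

// Typesafe heterogeneous container: each key carries its value type, so
// lookups are compile-time checked per key. Class.cast() also guards against
// raw-type abuse at runtime.
public final class TypedCache {
    private final Map<Class<?>, Object> values = new HashMap<>();

    public <T> void put(Class<T> type, T value) {
        values.put(type, type.cast(value));
    }

    public <T> T get(Class<T> type) {
        return type.cast(values.get(type));
    }
}
```

Here `cache.put(String.class, "green")` compiles, while `cache.put(String.class, 42)` is rejected at compile time, which is roughly the guarantee the Shapeless example gives per key.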
>> >> Cheers, >> >> [1] https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#heterogenous-maps >> [2] https://issues.jboss.org/browse/ISPN-3640 >> >>> Besides the query API that would need to be extended to support accessing multiple caches, not sure what other APIs would need to be extended to take advantage of this? >>> >>>> >>>> Emmanuel >>>> >>>> On 14 Jan 2014, at 12:59, Sanne Grinovero wrote: >>>> >>>>> Up this: it was proposed again today ad a face to face meeting. >>>>> Apparently multiple parties have been asking to be able to run >>>>> cross-cache queries. >>>>> >>>>> Sanne >>>>> >>>>> On 11 April 2012 12:47, Emmanuel Bernard wrote: >>>>>> >>>>>> On 10 avr. 2012, at 19:10, Sanne Grinovero wrote: >>>>>> >>>>>>> Hello all, >>>>>>> currently Infinispan Query is an interceptor registering on the >>>>>>> specific Cache instance which has indexing enabled; one such >>>>>>> interceptor is doing all what it needs to do in the sole scope of the >>>>>>> cache it was registered in. >>>>>>> >>>>>>> If you enable indexing - for example - on 3 different caches, there >>>>>>> will be 3 different Hibernate Search engines started in background, >>>>>>> and they are all unaware of each other. >>>>>>> >>>>>>> After some design discussions with Ales for CapeDwarf, but also >>>>>>> calling attention on something that bothered me since some time, I'd >>>>>>> evaluate the option to have a single Hibernate Search Engine >>>>>>> registered in the CacheManager, and have it shared across indexed >>>>>>> caches. >>>>>>> >>>>>>> Current design limitations: >>>>>>> >>>>>>> A- If they are all configured to use the same base directory to >>>>>>> store indexes, and happen to have same-named indexes, they'll share >>>>>>> the index without being aware of each other. 
This is going to break >>>>>>> unless the user configures some tricky parameters, and even so >>>>>>> performance won't be great: instances will lock each other out, or at >>>>>>> best write in alternate turns. >>>>>>> B- The search engine isn't particularly "heavy", still it would be >>>>>>> nice to share some components and internal services. >>>>>>> C- Configuration details which need some care - like injecting a >>>>>>> JGroups channel for clustering - needs to be done right isolating each >>>>>>> instance (so large parts of configuration would be quite similar but >>>>>>> not totally equal) >>>>>>> D- Incoming messages into a JGroups Receiver need to be routed not >>>>>>> only among indexes, but also among Engine instances. This prevents >>>>>>> Query to reuse code from Hibernate Search. >>>>>>> >>>>>>> Problems with a unified Hibernate Search Engine: >>>>>>> >>>>>>> 1#- Isolation of types / indexes. If the same indexed class is >>>>>>> stored in different (indexed) caches, they'll share the same index. Is >>>>>>> it a problem? I'm tempted to consider this a good thing, but wonder if >>>>>>> it would surprise some users. Would you expect that? >>>>>> >>>>>> I would not expect that. Unicity in Hibernate Search is not defined per identity but per class + provided id. >>>>>> I can see people reusing the same class as partial DTO and willing to index that. I can even see people >>>>>> using the Hibernate Search programmatic API to index the "DTO" stored in cache 2 differently than the >>>>>> domain class stored in cache 1. >>>>>> I can concede that I am pushing a bit the use case towards bad-ish design approaches. >>>>>> >>>>>>> 2#- configuration format overhaul: indexing options won't be set on >>>>>>> the cache section but in the global section. I'm looking forward to >>>>>>> use the schema extensions anyway to provide a better configuration >>>>>>> experience than the current . 
>>>>>>> 3#- Assuming 1# is fine, when a search hit is found I'd need to be >>>>>>> able to figure out from which cache the value should be loaded. >>>>>>> 3#A we could have the cache name encoded in the index, as part >>>>>>> of the identifier: {PK,cacheName} >>>>>>> 3#B we actually shard the index, keeping a physically separate >>>>>>> index per cache. This would mean searching on the joint index view but >>>>>>> extracting hits from specific indexes to keep track of "which index".. >>>>>>> I think we can do that but it's definitely tricky. >>>>>>> >>>>>>> It's likely easier to keep indexed values from different caches in >>>>>>> different indexes. that would mean to reject #1 and mess with the user >>>>>>> defined index name, to add for example the cache name to the user >>>>>>> defined string. >>>>>>> >>>>>>> Any comment? >>>>>>> >>>>>>> Cheers, >>>>>>> Sanne >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at 
redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 17 12:36:39 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Mon, 17 Feb 2014 18:36:39 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <2C233AC3-BEFC-4FD5-A297-A854FEA8165D@hibernate.org> References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> <2C233AC3-BEFC-4FD5-A297-A854FEA8165D@hibernate.org> Message-ID: <2D1C63B2-7313-4FE4-93D2-D50B91565FF2@redhat.com> On 31 Jan 2014, at 09:28, Emmanuel Bernard wrote: > > >> On 30 janv. 2014, at 20:51, Mircea Markus wrote: >> >> >>> On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote: >>> >>> >>>> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >>>> >>>> >>>>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >>>>> >>>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches right? Otherwise I am not fully understanding why they ask for a unified query. >>>>> Have you written up detailed use cases somewhere for me to better understand what is really requested?
>>>> >>>> IMO from a user perspective, being able to run queries spanning several caches simplifies the programming model: each cache corresponding to a single entity type, with potentially different configuration. >>> >>> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. >> >> Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of querying with a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, it also discourages code reuse and makes it hard to maintain (if you add Pets to the same cache in the future you need to update the M/R code as well). And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expire, etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future. >> >> The way I see it - and very curious to see your opinion on this - following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spanning multiple caches are both useful and needed (just as queries span multiple tables). > > I know Sanne and you are keen to have one entity type per cache to be able to fine-tune the configuration. I am a little more skeptical but I don't have strong opinions on the subject.
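[Editor's note] Mircea's green-cars example can be made concrete. The sketch below uses a plain Map and a hand-rolled count in place of Infinispan's Map/Reduce API, purely to show the shape of the problem: with a heterogeneous cache, every task must enumerate the types it wants to ignore.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the maintenance cost of a heterogeneous cache. A plain Map
// stands in for the cache; the "mapper" must know about Person (and any
// Pet added later) just to skip it. With one cache per entity type, the
// instanceof checks disappear.
class GreenCarCount {
    static class Car {
        final String colour;
        Car(String colour) { this.colour = colour; }
    }

    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static int countGreen(Map<String, Object> mixedCache) {
        int green = 0;
        for (Object value : mixedCache.values()) {
            if (!(value instanceof Car)) {
                continue; // must explicitly ignore Persons, Pets, ...
            }
            if ("green".equals(((Car) value).colour)) {
                green++;
            }
        }
        return green;
    }

    static int demo() {
        Map<String, Object> cache = new HashMap<>();
        cache.put("car-1", new Car("green"));
        cache.put("car-2", new Car("red"));
        cache.put("person-1", new Person("Ann"));
        return countGreen(cache);
    }
}
```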
> > However, I don't think you can forbid the case where people want to store heterogeneous types in the same cache: > > - it's easy to start with > - configuration is indeed simpler > - when you work in the same service with cats, dogs, owners, addresses and refuges, juggling between these n Cache instances begins to be fugly I suspect - should write some application code to confirm > - people will add to the grid types unknown at configuration time. They might want a single bucket. +100 > > Btw with the distributed execution engine, it looks reasonably simple to migrate data from one cache to another. I imagine you can also focus only on the keys whose node is primary which should limit data transfers. Am I missing something? > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 17 12:43:44 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Mon, 17 Feb 2014 18:43:44 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205163032.GB93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote: > On Wed 2014-02-05 15:53, Mircea Markus wrote: >> >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >> >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >> >> Not sure what you mean by CRUD over multiple caches?
ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)
>
> //some unified query giving me entries pointing by fk copy to bar and
> //buz objects. So I need to manually load these references.
>
> //happy emmanuel
> Cache unifiedCache = cacheManager.getMotherOfAllCaches();
> Bar bar = unifiedCache.get(foo);
> Buz buz = unifiedCache.get(baz);
>
> //not so happy emmanuel
> Cache fooCache = cacheManager.getCache("foo");
> Bar bar = fooCache.get(foo);
> Cache bazCache = cacheManager.getCache("baz");
> Buz buz = bazCache.get(baz);

Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. Cheers, > > >> >>> >>> I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. >> >> sad because of the increased index size? > > It makes the index unnatural and less reusable via direct Lucene APIs. But that might be less of a concern for Infinispan. > >> >>> I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. >>> Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document?
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From mudokonman at gmail.com Mon Feb 17 12:44:32 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 17 Feb 2014 12:44:32 -0500 Subject: [infinispan-dev] ClusteredListeners: message delivered twice In-Reply-To: References: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Message-ID: On Mon, Feb 17, 2014 at 7:53 AM, Sanne Grinovero wrote: > On 12 February 2014 10:40, Mircea Markus wrote: >> Hey Will, >> >> With the current design, during a topology change, an event might be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state to see whether an event was redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of? I agree, this would be important to track. I have thus added a new flag to listener events that is set to true when a modification, removal, or creation is done on behalf of a command that was retried due to a topology change in the middle of it. This also benefits not just cluster listeners but regular listeners, since we can already get double notifications today. > > I really wish we would not push such a burden onto the API > consumer. If we at least had a modification counter associated with > each entry this could help to identify duplicate triggers as well (on > top of ordering of modification events as already discussed many > times).
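[Editor's note] The "potentiallyDuplicate"/retry flag Will describes could be consumed roughly as follows. The event type and flag name here are stand-ins for illustration, not the actual Infinispan listener API; the point is that a retry-aware handler can be made idempotent by consulting its own state.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an idempotent listener built on the proposed retry flag.
// Event and its commandRetried field are hypothetical stand-ins.
class RetryAwareListener {
    static class Event {
        final String key;
        final String value;
        final boolean commandRetried; // true if the command was retried after a topology change
        Event(String key, String value, boolean commandRetried) {
            this.key = key;
            this.value = value;
            this.commandRetried = commandRetried;
        }
    }

    private final Map<String, String> lastApplied = new HashMap<>();
    int applied; // counts state changes actually applied

    void onModified(Event e) {
        // On a retried command the same event may arrive twice; check our
        // own state before applying it again.
        if (e.commandRetried && e.value.equals(lastApplied.get(e.key))) {
            return; // already saw this modification
        }
        lastApplied.put(e.key, e.value);
        applied++;
    }

    static int demo() {
        RetryAwareListener l = new RetryAwareListener();
        l.onModified(new Event("k", "v1", false));
        l.onModified(new Event("k", "v1", true)); // duplicate after retry: ignored
        l.onModified(new Event("k", "v2", false));
        return l.applied;
    }
}
```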
The particular issue we have with listeners is when the primary owner replicates the update to backup owners and then crashes before the notification is sent. In this case we have no idea from the originator's perspective if the backup owner has the update. When the topology changes, if the entry was updated it will be persisted to the new owners (possibly without notification). We could add a counter, however the backup owner then has no idea if the primary owner has sent the notification or not. Without adding some kind of 2PC to the primary owner to tell the backup that it occurred, it won't know. However this doesn't reliably tell the backup owner if the notification was fired even if the node goes down during this period. Without seriously rewriting our nontx dist code I don't see a viable way to do this without the API consumer having to be alerted. > > Sanne > >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Mon Feb 17 12:51:15 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 17 Feb 2014 18:51:15 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: <20140217175115.GC639@hibernate.org> On Mon 2014-02-17 18:43, Galder Zamarreño wrote: > > On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote: > > > On Wed 2014-02-05 15:53, Mircea Markus wrote: > >> > >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > >> > >>> Sure searching for any cache is
useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > >> > >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > > > > > //some unified query giving me entries pointing by fk copy to bar and > > //buz objects. So I need to manually load these references. > > > > //happy emmanuel > > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > > Bar bar = unifiedCache.get(foo); > > Buz buz = unifiedCache.get(baz); > > > > //not so happy emmanuel > > Cache fooCache = cacheManager.getCache("foo"); > > Bar bar = fooCache.get(foo); > > Cache bazCache = cacheManager.getCache("baz"); > > Buz buz = bazCache.get(baz); > > Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. Not really. What makes me unhappy is having to keep in my app all the references to these specific cache instances. The filtering approach only moves the problem.
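[Editor's note] The "mother of all caches" Emmanuel sketches could be faked today with a thin facade that routes by type; this does not remove the per-type caches, it only hides the juggling. A sketch with plain Maps standing in for Cache instances (`getMotherOfAllCaches()` is hypothetical, and nothing below is Infinispan API):

```java
import java.util.HashMap;
import java.util.Map;

// A facade that routes get/put to a per-type backing map, so application
// code stops carrying references to individual Cache instances.
class UnifiedCache {
    private final Map<Class<?>, Map<Object, Object>> cachesByType = new HashMap<>();

    private Map<Object, Object> cacheFor(Class<?> type) {
        return cachesByType.computeIfAbsent(type, t -> new HashMap<>());
    }

    <T> void put(Class<T> type, Object key, T value) {
        cacheFor(type).put(key, value);
    }

    <T> T get(Class<T> type, Object key) {
        return type.cast(cacheFor(type).get(key));
    }

    static String demo() {
        UnifiedCache u = new UnifiedCache();
        u.put(String.class, "greeting", "hello");
        u.put(Integer.class, "count", 42);
        return u.get(String.class, "greeting");
    }
}
```

As Emmanuel notes, a facade like this only moves the problem: the per-type stores still exist underneath, and the routing key (here the Class object) still has to come from somewhere.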
From ben.cotton at ALUMNI.RUTGERS.EDU Mon Feb 17 16:02:17 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Mon, 17 Feb 2014 13:02:17 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1391992923651-4028800.post@n3.nabble.com> References: <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> Message-ID: <1392670937953-4028836.post@n3.nabble.com> Hi Tristan, We are still waiting for an OpenHFT HugeCollections update before we start keystroking its adaptation as an Off-Heap Impl of javax.cache.Cache (via ISPN DataContainer API bridge). We envision our openHFT<-->ISPN adaptation effort to look something like the attached slide. Question for you: is there anywhere w/in the ISPN 7 master tree a class that simultaneously implements both javax.cache.Cache /and/ org.infinispan.container.Container? Thx, Ben -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028836.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From emmanuel at hibernate.org Mon Feb 17 17:13:56 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 17 Feb 2014 23:13:56 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140217175115.GC639@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> Message-ID: <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity.
It turns out that the right (as in easy) solution does involve a higher-level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. > On 17 févr. 2014, at 18:51, Emmanuel Bernard wrote: > >> On Mon 2014-02-17 18:43, Galder Zamarreño wrote: >> >>> On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote: >>> >>>> On Wed 2014-02-05 15:53, Mircea Markus wrote: >>>> >>>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >>>>> >>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >>>> >>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>> >>> >>> //some unified query giving me entries pointing by fk copy to bar and >>> //buz objects. So I need to manually load these references. >>> >>> //happy emmanuel >>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>> Bar bar = unifiedCache.get(foo); >>> Buz buz = unifiedCache.get(baz); >>> >>> //not so happy emmanuel >>> Cache fooCache = cacheManager.getCache("foo"); >>> Bar bar = fooCache.get(foo); >>> Cache bazCache = cacheManager.getCache("baz"); >>> Buz buz = bazCache.get(baz); >> >> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. > > Not really. > What makes me unhappy is to have to keep in my app all the > references to these specific cache store instances. The filtering > approach only moves the problem.
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Tue Feb 18 03:59:37 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 18 Feb 2014 09:59:37 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: <530320F9.300@redhat.com> Hi Etienne, how does the requirement for all data provided to the Reducer as a whole work for distributed caches? There you'd get only a subset of the whole mapped set on each node (afaik each node maps the entries locally and performs a reduction before executing the "global" reduction). Or are these M/R jobs applicable only to local caches? I have to admit I have only a limited knowledge of M/R, could you give me an example where the algorithm works in a distributed environment and still cannot be parallelized? Thanks Radim On 02/17/2014 09:18 AM, Etienne Riviere wrote: > Hi Radim, > > I might misunderstand your suggestion but many M/R jobs actually require to run the two phases one after the other, and henceforth to store the intermediate results somewhere. While some may slightly reduce intermediate memory usage by using a combiner function (e.g., the word-count example), I don't see how we can avoid intermediate storage altogether. > > Thanks, > Etienne (leads project - as Evangelos who initiated the thread) > > On 17 Feb 2014, at 08:48, Radim Vansa wrote: > >> I think that the intermediate cache is not required at all. The M/R >> algorithm itself can (and should!) run with memory occupied by the >> result of reduction. The current implementation with Map first and >> Reduce after that will always have these problems, using a cache for >> temporary caching the result is only a workaround.
>> >> The only situation when temporary cache could be useful is when the >> result grows linearly (or close to that or even more) with the amount of >> reduced entries. This would be the case for groupBy producing Map<Key, List<Value>> from all entries in cache. Then the task does not scale and >> should be redesigned anyway, but flushing the results into cache backed >> by cache store could help. >> >> Radim >> >> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>> Tristan, >>> >>> Actually they are not addressed in this pull request but the feature >>> where custom output cache is used instead of results being returned is >>> next in the implementation pipeline. >>> >>> Evangelos, indeed, depending on a reducer function all intermediate >>> KOut/VOut pairs might be moved to a single node. How would custom cache >>> help in this case? >>> >>> Regards, >>> Vladimir >>> >>> >>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>> Hi Evangelos, >>>> >>>> you might be interested in looking into a current pull request which >>>> addresses some (all?) of these issues >>>> >>>> https://github.com/infinispan/infinispan/pull/2300 >>>> >>>> Tristan >>>> >>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>> Hello everyone, >>>>> >>>>> I started using the MapReduce implementation of Infinispan and I came >>>>> across some possible limitations. Thus, I want to make some suggestions >>>>> about the MapReduce (MR) implementation of Infinispan. >>>>> Depending on the algorithm, there might be some memory problems, >>>>> especially for intermediate results. >>>>> An example of such a case is group by. Suppose that we have a cluster >>>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>>>> car objects (id,brand,colour) are stored and the total size of data is >>>>> 3.5GB. If all objects have the same colour, then all 3.5 GB would go to >>>>> only one reducer, as a result an OutOfMemoryException will be thrown.
>>>>> >>>>> To overcome these limitations, I propose to add as parameter the name of >>>>> the intermediate cache to be used. This will enable the creation of a >>>>> custom configured cache that deals with the memory limitations. >>>>> >>>>> Another feature that I would like to have is to set the name of the >>>>> output cache. The reasoning behind this is similar to the one mentioned >>>>> above. >>>>> >>>>> I wait for your thoughts on these two suggestions. >>>>> >>>>> Regards, >>>>> Evangelos >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From dan.berindei at gmail.com Tue Feb 18 04:59:34 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 11:59:34 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <530320F9.300@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: Radim, this is how our M/R algorithm works (Hadoop may do it differently):

* The mapping phase generates a Map<IntKey, Collection<IntValue>> on each node (Int meaning intermediate).
* In the combine (local reduce) phase, a combine operation takes as input an IntKey and a Collection<IntValue> with only the values that were produced on that node.
* In the (global) reduce phase, all the intermediate values for each key are merged, and a reduce operation takes an intermediate key and a sequence of *all* the intermediate values generated for that key. These reduce operations are completely independent, so each intermediate key can be mapped to a different node (distributed reduce), while still having access to all the intermediate values at once.
* In the end, the collator takes the Map from the reduce phase and produces a single value.

If a combiner can be used, then I believe it can also be run in parallel with a LinkedBlockingQueue between the mapper and the combiner. But sometimes the reduce algorithm can only be run on the entire collection of values (e.g. if you want to find the median, or a percentile). The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself. Cheers Dan On Tue, Feb 18, 2014 at 10:59 AM, Radim Vansa wrote: > Hi Etienne, > > how does the requirement for all data provided to Reducer as a whole > work for distributed caches?
There you'd get only a subset of the whole > mapped set on each node (afaik each node maps the entries locally and > performs a reduction before executing the "global" reduction). Or are > these M/R jobs applicable only to local caches? > I have to admit I have only a limited knowledge of M/R, could you give > me an example where the algorithm works in a distributed environment and > still cannot be parallelized? > > Thanks > > Radim > > On 02/17/2014 09:18 AM, Etienne Riviere wrote: > > Hi Radim, > > > > I might misunderstand your suggestion but many M/R jobs actually require > to run the two phases one after the other, and henceforth to store the > intermediate results somewhere. While some may slightly reduce intermediate > memory usage by using a combiner function (e.g., the word-count example), I > don't see how we can avoid intermediate storage altogether. > > > > Thanks, > > Etienne (leads project -- as Evangelos who initiated the thread) > > > > On 17 Feb 2014, at 08:48, Radim Vansa wrote: > > > >> I think that the intermediate cache is not required at all. The M/R > >> algorithm itself can (and should!) run with memory occupied by the > >> result of reduction. The current implementation with Map first and > >> Reduce after that will always have these problems, using a cache for > >> temporary caching the result is only a workaround.
> >> > >> Radim > >> > >> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: > >>> Tristan, > >>> > >>> Actually they are not addressed in this pull request but the feature > >>> where custom output cache is used instead of results being returned is > >>> next in the implementation pipeline. > >>> > >>> Evangelos, indeed, depending on a reducer function all intermediate > >>> KOut/VOut pairs might be moved to a single node. How would custom cache > >>> help in this case? > >>> > >>> Regards, > >>> Vladimir > >>> > >>> > >>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: > >>>> Hi Evangelos, > >>>> > >>>> you might be interested in looking into a current pull request which > >>>> addresses some (all?) of these issues > >>>> > >>>> https://github.com/infinispan/infinispan/pull/2300 > >>>> > >>>> Tristan > >>>> > >>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: > >>>>> Hello everyone, > >>>>> > >>>>> I started using the MapReduce implementation of Infinispan and I came > >>>>> across some possible limitations. Thus, I want to make some > suggestions > >>>>> about the MapReduce (MR) implementation of Infinispan. > >>>>> Depending on the algorithm, there might be some memory problems, > >>>>> especially for intermediate results. > >>>>> An example of such a case is group by. Suppose that we have a > cluster > >>>>> of 2 nodes with 2 GB available. Let a distributed cache, where > simple > >>>>> car objects (id,brand,colour) are stored and the total size of data > is > >>>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would > go to > >>>>> only one reducer, as a result an OutOfMemoryException will be thrown. > >>>>> > >>>>> To overcome these limitations, I propose to add as parameter the > name of > >>>>> the intermediate cache to be used. This will enable the creation of a > >>>>> custom configured cache that deals with the memory limitations. > >>>>> > >>>>> Another feature that I would like to have is to set the name of the > >>>>> output cache. 
The reasoning behind this is similar to the one > mentioned > >>>>> above. > >>>>> > >>>>> I wait for your thoughts on these two suggestions. > >>>>> > >>>>> Regards, > >>>>> Evangelos > >>>>> _______________________________________________ > >>>>> infinispan-dev mailing list > >>>>> infinispan-dev at lists.jboss.org > >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>> > >>>>> > >>>> > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> infinispan-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> -- > >> Radim Vansa > >> JBoss DataGrid QA > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140218/5732287e/attachment-0001.html From marcelo.pasin at unine.ch Tue Feb 18 05:19:55 2014 From: marcelo.pasin at unine.ch (Marcelo Pasin) Date: Tue, 18 Feb 2014 11:19:55 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: On 18/Feb/2014, at 10:59, Dan Berindei wrote:

> I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself.

Actually, Hadoop sorts in the map node, the last two steps being sort and combine. Reduce nodes fetch partitions from the map nodes and just merge them. The partitions are fetched incrementally, and whenever a given key is complete in all partially fetched partitions, reduce() is called.

Cheers, MP

--
Marcelo Pasin
Université de Neuchâtel · Institut d'informatique
rue Emile-Argand 11 · Case postale 158 · 2000 Neuchâtel · Switzerland

From vagvaz at gmail.com Tue Feb 18 05:21:30 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 12:21:30 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: <5303342A.1050800@gmail.com>

Hi Radim,

Since Hadoop is the most popular implementation of MapReduce I will give a brief overview of how it works and then I'll provide an example where the reducers must run over the whole list of values with the same key.

Hadoop MR overview.

MAP
1) Input file(s) are split into pieces of 64MB
2) For each split hadoop creates one map task and then assigns the task to a cluster node
3) The splits are read as key,value pairs and the map function of the Mapper is called. The mapper can output an arbitrary number of intermediate key,value pairs
4) the output from the mapper is stored in a buffer in memory.
After a certain threshold is reached the pairs are sorted by key and if there is a combiner it is run on the pairs that have the same key. Then, the output is flushed to the node's local disk (map output is not written to HDFS).

SHUFFLE
hadoop decides the Reducer that should process each key by running a partitioner. The default partitioner decides in the following way:
reducer = intermediateKey.hashCode() % numberOfReducers
Finally, the intermediate key,value pairs are sent to the reducers.

REDUCE
1) Reducer sorts all key,value pairs by key and then groups the values with the same key. As a result reducers receive their keys sorted.
2) for each (Key, List<Value>) pair the reduce function of the reducer is called. The Reducer can also emit an arbitrary number of key,value pairs

Additionally, hadoop lets you customize almost every aspect of the code run, from how the input is split and read as key,value pairs to how it is partitioned and sorted.

A simple example is group by and computing an average over the grouped values. Let the dataset be webpages (url,domain,sentiment) and suppose we want to compute the average sentiment for each domain in the dataset. Then for each webpage wp the mapper will run:

map(wp.url, wp):
    emit(wp.domain, wp.sentiment)

and in the reducer:

reduce(domain, Iterable values):
    counter = 0
    sum = 0
    while (values.hasNext())
        counter++
        sum += values.next()
    emit(domain, sum/counter)

I know that this approach is not optimized. But, I wanted to give a simple example.

Dan, do only the values for one intermediate key have to be in memory? Or must all the intermediate key,value pairs that are assigned to one reducer be in memory?

Cheers,
Evangelos

On 02/18/2014 11:59 AM, Dan Berindei wrote:
> Radim, this is how our M/R algorithm works (Hadoop may do it differently):
>
> * The mapping phase generates a Map<IntKey, List<IntValue>> on each node (Int meaning intermediate).
> * In the combine (local reduce) phase, a combine operation takes as input an IntKey and a Collection<IntValue> with only the values that were produced on that node.
> * In the (global) reduce phase, all the intermediate values for each key are merged, and a reduce operation takes an intermediate key and a sequence of *all* the intermediate values generated for that key. These reduce operations are completely independent, so each intermediate key can be mapped to a different node (distributed reduce), while still having access to all the intermediate values at once.
> * In the end, the collator takes the Map from the reduce phase and produces a single value.
>
> If a combiner can be used, then I believe it can also be run in parallel with a LinkedBlockingQueue between the mapper and the combiner. But sometimes the reduce algorithm can only be run on the entire collection of values (e.g. if you want to find the median, or a percentile).
>
> The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself.
>
> Cheers
> Dan
>
> On Tue, Feb 18, 2014 at 10:59 AM, Radim Vansa wrote:
>
>> Hi Etienne,
>>
>> how does the requirement for all data provided to the Reducer as a whole work for distributed caches? There you'd get only a subset of the whole mapped set on each node (afaik each node maps its entries locally and performs a reduction before executing the "global" reduction). Or are these M/R jobs applicable only to local caches?
>> I have to admit I have only a limited knowledge of M/R, could you give me an example where the algorithm works in a distributed environment and still cannot be parallelized?
>> >> Thanks >> >> Radim >> >> On 02/17/2014 09:18 AM, Etienne Riviere wrote: >>> Hi Radim, >>> >>> I might misunderstand your suggestion but many M/R jobs actually require >> to run the two phases one after the other, and henceforth to store the >> intermediate results somewhere. While some may slightly reduce intermediate >> memory usage by using a combiner function (e.g., the word-count example), I >> don't see how we can avoid intermediate storage altogether. >>> >>> Thanks, >>> Etienne (leads project -- as Evangelos who initiated the thread) >>> >>> On 17 Feb 2014, at 08:48, Radim Vansa wrote: >>> >>>> I think that the intermediate cache is not required at all. The M/R >>>> algorithm itself can (and should!) run with memory occupied by the >>>> result of reduction. The current implementation with Map first and >>>> Reduce after that will always have these problems, using a cache for >>>> temporary caching the result is only a workaround. >>>> >>>> The only situation when temporary cache could be useful is when the >>>> result grows linearly (or close to that or even more) with the amount of >>>> reduced entries. This would be the case for groupBy producing Map>>> List> from all entries in cache. Then the task does not scale and >>>> should be redesigned anyway, but flushing the results into cache backed >>>> by cache store could help. >>>> >>>> Radim >>>> >>>> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>>>> Tristan, >>>>> >>>>> Actually they are not addressed in this pull request but the feature >>>>> where custom output cache is used instead of results being returned is >>>>> next in the implementation pipeline. >>>>> >>>>> Evangelos, indeed, depending on a reducer function all intermediate >>>>> KOut/VOut pairs might be moved to a single node. How would custom cache >>>>> help in this case? 
>>>>> >>>>> Regards, >>>>> Vladimir >>>>> >>>>> >>>>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>>>> Hi Evangelos, >>>>>> >>>>>> you might be interested in looking into a current pull request which >>>>>> addresses some (all?) of these issues >>>>>> >>>>>> https://github.com/infinispan/infinispan/pull/2300 >>>>>> >>>>>> Tristan >>>>>> >>>>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>>>> Hello everyone, >>>>>>> >>>>>>> I started using the MapReduce implementation of Infinispan and I came >>>>>>> across some possible limitations. Thus, I want to make some >> suggestions >>>>>>> about the MapReduce (MR) implementation of Infinispan. >>>>>>> Depending on the algorithm, there might be some memory problems, >>>>>>> especially for intermediate results. >>>>>>> An example of such a case is group by. Suppose that we have a >> cluster >>>>>>> of 2 nodes with 2 GB available. Let a distributed cache, where >> simple >>>>>>> car objects (id,brand,colour) are stored and the total size of data >> is >>>>>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would >> go to >>>>>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>>>>> >>>>>>> To overcome these limitations, I propose to add as parameter the >> name of >>>>>>> the intermediate cache to be used. This will enable the creation of a >>>>>>> custom configured cache that deals with the memory limitations. >>>>>>> >>>>>>> Another feature that I would like to have is to set the name of the >>>>>>> output cache. The reasoning behind this is similar to the one >> mentioned >>>>>>> above. >>>>>>> >>>>>>> I wait for your thoughts on these two suggestions. 
>>>>>>> >>>>>>> Regards, >>>>>>> Evangelos >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> -- >>>> Radim Vansa >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Tue Feb 18 06:40:49 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 13:40:49 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <5303342A.1050800@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios wrote:

> Hi Radim,
>
> Since Hadoop is the most popular implementation of MapReduce I will give a brief overview of how it works and then I'll provide an example where the reducers must run over the whole list of values with the same key.
>
> Hadoop MR overview.
>
> MAP
> 1) Input file(s) are split into pieces of 64MB
> 2) For each split hadoop creates one map task and then assigns the task to a cluster node
> 3) The splits are read as key,value pairs and the map function of the Mapper is called. The mapper can output an arbitrary number of intermediate key,value pairs
> 4) the output from the mapper is stored in a buffer in memory. After a certain threshold is reached the pairs are sorted by key and if there is a combiner it is run on the pairs that have the same key. Then, the output is flushed to the local disk.

Ok, so Hadoop runs the combiner more or less concurrently with the mappers.

I'm curious if there are any M/R tasks that benefit from sorting the keys here; we just put the intermediate values in a Map<IntKey, List<IntValue>>. We could do about the same by passing this map (or rather each entry in the map) to the combiner when it reaches a certain threshold, but I'm not convinced about the need to sort it.

> SHUFFLE
>
> hadoop decides the Reducer that should process each key by running a partitioner. The default partitioner decides in the following way:
> reducer = intermediateKey.hashCode() % numberOfReducers
> Finally, the intermediate key,value pairs are sent to the reducers

Is this algorithm set in stone, in that some M/R tasks rely on it?
In our impl, the user could use grouping to direct a set of intermediate keys to the same node for reducing, but otherwise the reducing node is more or less random. > REDUCE > > 1) Reducer sorts all key,value pairs by key and then groups the values > with the same key. As a result reducers receive their keys sorted. > I guess this sorting is only relevant if the reduce phase happens on a single thread, on a single node? If the reduce happens in parallel, the ordering is going to be lost anyway. > 2) for each Key,List the reduce function of the reducer is > called. Reducer can also emit arbitrary number of key,value pairs > We limit the reducer (and the combiner) to emit a single value, which is paired with the input key. We may need to lift this restriction, if only to make porting/adapting tasks easier. > > Additionally, hadoop lets you customize almost every aspect of the code > run from how the input is split and read as key value pairs to how it is > partitioned and sorted. > Does that mean you can sort the values as well? I was thinking of each reduce() call as independent, and then only the order of values for one intermediate key would be relevant. I guess some tasks may require keeping state across all the reduce() calls and then the order of key matters, but then the reduce phase can't be parallelized, either across the cluster or on a single node. > A simple example is group by and computing an average over the grouped > values. Let the dataset be webpages (url,domain,sentiment) and we want > to compute the average sentiment for each domain in the dataset then the > mapper for each webpages wp. will run > map(wp.url,wp): > emit(wp.domain,wp.sentiment) > > and in reducer: > reduce(domain,Iterable values): > counter = 0 > sum = 0 > while(values.hasNext()) > counter++; > sum += values.next() > emit(domain,sum/counter) > > I know that this approach is not optimized. But, I wanted give a simple > example. 
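The quoted group-by-average pseudocode above can be turned into a runnable sketch. Below is a plain-Java simulation of the map, shuffle-by-key, and reduce steps for the domain-sentiment average; ordinary collections stand in for the framework's machinery, and the class and method names are invented for illustration (this is neither the Hadoop nor the Infinispan API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupByAverage {

    // Mirrors the (url, domain, sentiment) webpage tuples from the example.
    static final class Page {
        final String url;
        final String domain;
        final double sentiment;

        Page(String url, String domain, double sentiment) {
            this.url = url;
            this.domain = domain;
            this.sentiment = sentiment;
        }
    }

    // Map + shuffle: emit (domain, sentiment) for each page and group the
    // emitted pairs by key, as the shuffle phase would before the reducers run.
    static Map<String, List<Double>> mapAndShuffle(List<Page> pages) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (Page wp : pages) {
            grouped.computeIfAbsent(wp.domain, k -> new ArrayList<>()).add(wp.sentiment);
        }
        return grouped;
    }

    // Reduce: one call per key, iterating the complete value list, i.e. the
    // step that needs all values for a single key in memory at once.
    static Map<String, Double> reduce(Map<String, List<Double>> grouped) {
        Map<String, Double> averages = new HashMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0;
            int counter = 0;
            for (double v : e.getValue()) {
                counter++;
                sum += v;
            }
            averages.put(e.getKey(), sum / counter);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
                new Page("http://a.com/1", "a.com", 0.2),
                new Page("http://a.com/2", "a.com", 0.6),
                new Page("http://b.org/1", "b.org", 1.0));
        System.out.println(reduce(mapAndShuffle(pages)));
    }
}
```

The reduce method is exactly the step under discussion: it iterates the whole value list for each key, so all values for one intermediate key must fit in memory.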
> I think it can also be optimized to use a combiner, if we emit a (domain, counter, sum) tuple :) > Dan, only the the values for one intermediate key must be in memory? or > all the intermediate key,value pairs that are assigned to one reducer > must be in memory? > With the default configuration, all the key/value pairs assigned to one reducer must be in memory. But one can define the __tmpMapReduce cache in the configuration and configure eviction with a cache store (note that because of how our eviction works, the actual container size is at least concurrencyLevel rounded up to the next power of 2). The problem is that there is only one configuration for all the M/R tasks [1]. Note that because we only run the combiner after the mapping phase is complete, we do need to keep in memory all the results of the mapping phase from that node (those are not stored in a cache). I've created an issue in JIRA for this [2]. Cheers Dan [1] https://issues.jboss.org/browse/ISPN-4021 [2] https://issues.jboss.org/browse/ISPN-4022 > Cheers, > Evangelos > > On 02/18/2014 11:59 AM, Dan Berindei wrote: > > Radim, this is how our M/R algorithm works (Hadoop may do it > differently): > > > > * The mapping phase generates a Map> on each > > node (Int meaning intermediate). > > * In the combine (local reduce) phase, a combine operation takes as input > > an IntKey and a Collection with only the values that were > > produced on that node. > > * In the (global) reduce phase, all the intermediate values for each key > > are merged, and a reduce operation takes an intermediate key and a > sequence > > of *all* the intermediate values generated for that key. These reduce > > operations are completely independent, so each intermediate key can be > > mapped to a different node (distributed reduce), while still having > access > > to all the intermediate values at once. > > * In the end, the collator takes the Map from the > reduce > > phase and produces a single value. 
> >
> > If a combiner can be used, then I believe it can also be run in parallel with a LinkedBlockingQueue between the mapper and the combiner. But sometimes the reduce algorithm can only be run on the entire collection of values (e.g. if you want to find the median, or a percentile).
> >
> > The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself.
> >
> > Cheers
> > Dan

From anistor at redhat.com Tue Feb 18 07:02:03 2014 From: anistor at redhat.com (Adrian Nistor) Date: Tue, 18 Feb 2014 14:02:03 +0200 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> Message-ID: <53034BBB.1030809@redhat.com>

Well, OGM and Infinispan are different species :) So, Infinispan being what it is today - a non-homogeneous, schema-less KV store, without support for entity associations (except embedding) - which simplifies the whole thing a lot, should we or should we not provide transparent cross-cacheManager search capabilities, in this exact context? Vote?
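The question above ("should we provide transparent cross-cacheManager search?") can be made concrete with a toy model. In the sketch below, plain maps stand in for caches, every name is invented, and nothing here is Infinispan API; each hit carries the originating cache name and key next to the matched value, over a heterogeneous set of stored types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class CrossCacheSearch {

    // A search hit that remembers which "cache" it came from and the key
    // under which the matching value was stored.
    static final class Hit {
        final String cacheName;
        final Object key;
        final Object value;

        Hit(String cacheName, Object key, Object value) {
            this.cacheName = cacheName;
            this.key = key;
            this.value = value;
        }
    }

    // Scan every cache in the (name -> cache) map and collect the entries
    // whose value matches the predicate, regardless of the value's type.
    static List<Hit> search(Map<String, Map<Object, Object>> caches,
                            Predicate<Object> matches) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Map<Object, Object>> cache : caches.entrySet()) {
            for (Map.Entry<Object, Object> e : cache.getValue().entrySet()) {
                if (matches.test(e.getValue())) {
                    hits.add(new Hit(cache.getKey(), e.getKey(), e.getValue()));
                }
            }
        }
        return hits;
    }
}
```

A real implementation would query indexes rather than scan, but whether hits should carry the cache name (or a reference to the cache itself) is exactly the kind of result-API decision such a feature would force.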
There were some points raised previously like /"if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well"/. In the SQL world you would also probably CRUD against a table or set of tables and then query against a view - a bit like what we're doing here. I don't see any problem with this in principle. There is however something currently missing in the query result set API - it currently does not provide you the keys of the matching entities. People work around this by storing the key in the entity. Now with the addition of the cross-cacheManager search we'll probably need to fix the result API and also provide a reference to the cache (or just the name?) where the entity is stored.

The (enforced) one entity type per cache rule is not conceptually or technically required for implementing this, so I won't start raving against it :) Sane users should apply it however.

On 02/18/2014 12:13 AM, Emmanuel Bernard wrote:
> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>
>> On 17 févr. 2014, at 18:51, Emmanuel Bernard wrote:
>>
>>> On Mon 2014-02-17 18:43, Galder Zamarreño wrote:
>>>
>>>> On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote:
>>>>
>>>>> On Wed 2014-02-05 15:53, Mircea Markus wrote:
>>>>>
>>>>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote:
>>>>>>
>>>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed.
>>>>> Not sure what you mean by CRUD over multiple caches?
ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)

>>>> //some unified query giving me entries pointing by fk copy to bar and
>>>> //buz objects. So I need to manually load these references.
>>>>
>>>> //happy emmanuel
>>>> Cache unifiedCache = cacheManager.getMotherOfAllCaches();
>>>> Bar bar = unifiedCache.get(foo);
>>>> Buz buz = unifiedCache.get(baz);
>>>>
>>>> //not so happy emmanuel
>>>> Cache fooCache = cacheManager.getCache("foo");
>>>> Bar bar = fooCache.get(foo);
>>>> Cache bazCache = cacheManager.getCache("baz");
>>>> Buz buz = bazCache.get(baz);

>>> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not.

>> Not really.
>> What makes me unhappy is to have to keep in my app all the references to these specific cache store instances. The filtering approach only moves the problem.
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From vagvaz at gmail.com Tue Feb 18 07:17:34 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 14:17:34 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions.
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> Message-ID: <53034F5E.6060706@gmail.com> On 02/18/2014 01:40 PM, Dan Berindei wrote: > On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios wrote: > >> Hi Radim, >> >> Since Hadoop is the most popular implementation of MapReduce I will give >> a brief overview of how it works and then I'll provide with an example >> where the reducers must run over the whole list of values with the same >> key. >> >> Hadoop MR overview. >> >> MAP >> >> 1) Input file(s) are split into pieces of 64MB >> 2) For each split hadoop creates one map task and then assign the task >> to a cluster node >> 3) The splits are read as key,value pairs and the map function of Mapper >> is called. The mapper can output arbitrary number of intermediate >> key,value pairs >> 4) the output from the mapper is stored in a buffer in memory. After a >> certain threshold is reached the pairs are sorted by key and if there is >> a combiner it is run on the pairs that have the same key. Then, the >> output is flushed on the HDFS. >> > > Ok, so Hadoop runs the combiner more or less concurrently with the mappers. > > I'm curious if there are any M/R tasks that benefit from the sorting the > keys here, we just put the intermediate values in a Map>. We > could do about the same by passing this map (or rather each entry in the > map) to the combiner when it reaches a certain threshold, but I'm not > convinced about the need to sort it. > Well there are algorithms that make use of it. Implementing a graph algorithm can take use of it.Where the graph is split into k partitions and each partition is assigned to one Mapper and Reducer. Mappers compute the outgoing messages and output them to reducers. Then, reducers can read the partition file sequentially to update the vertices. 
This is just one use case that came to my mind.

>
>> SHUFFLE
>>
>> hadoop decides the Reducer that should process each key by running a partitioner. The default partitioner decides in the following way:
>> reducer = intermediateKey.hashCode() % numberOfReducers
>> Finally, the intermediate key,value pairs are sent to the reducers
>
> Is this algorithm set in stone, in that some M/R tasks rely on it? In our impl, the user could use grouping to direct a set of intermediate keys to the same node for reducing, but otherwise the reducing node is more or less random.

The default partitioner does exactly that; check the actual code for Hadoop 1.2.1 here: http://goo.gl/he9yHO

>
>> REDUCE
>>
>> 1) Reducer sorts all key,value pairs by key and then groups the values with the same key. As a result reducers receive their keys sorted.
>
> I guess this sorting is only relevant if the reduce phase happens on a single thread, on a single node? If the reduce happens in parallel, the ordering is going to be lost anyway.

Each reduce task is run on a single thread, but you can run more than one reduce task on a given node. The key ordering will not be lost. The values are not ordered in any way. Moreover, the call to the reducer is reduce(Key key, Iterable values). I cannot think of a way that the order is lost.

>
>> 2) for each (Key, List<Value>) pair the reduce function of the reducer is called. The Reducer can also emit an arbitrary number of key,value pairs
>
> We limit the reducer (and the combiner) to emit a single value, which is paired with the input key. We may need to lift this restriction, if only to make porting/adapting tasks easier.
>
>> Additionally, hadoop lets you customize almost every aspect of the code run, from how the input is split and read as key,value pairs to how it is partitioned and sorted.
>
> Does that mean you can sort the values as well?
I was thinking of each > reduce() call as independent, and then only the order of values for one > intermediate key would be relevant. I guess some tasks may require keeping > state across all the reduce() calls and then the order of key matters, but > then the reduce phase can't be parallelized, either across the cluster or > on a single node. I was not very clear here. You can set the partitioner for a specific job. You may also set the key comparator, as a result change the way that intermediate keys are sorted. Additionally, one can change how keys are grouped into one reduce call by setting the GroupComparator class. A simple example would be to have sales(date,amount) and you want to create totals for each month of the year. so for the key: (year,month) and value: amount. by overriding the keyClass hashCode function you can send all the intermediate pairs with the same year to the same reducer and then you can set the groupComparator to group together all the values with the same year. Cheers, Evangelos > >> A simple example is group by and computing an average over the grouped >> values. Let the dataset be webpages (url,domain,sentiment) and we want >> to compute the average sentiment for each domain in the dataset then the >> mapper for each webpages wp. will run >> map(wp.url,wp): >> emit(wp.domain,wp.sentiment) >> >> and in reducer: >> reduce(domain,Iterable values): >> counter = 0 >> sum = 0 >> while(values.hasNext()) >> counter++; >> sum += values.next() >> emit(domain,sum/counter) >> >> I know that this approach is not optimized. But, I wanted give a simple >> example. >> > > I think it can also be optimized to use a combiner, if we emit a (domain, > counter, sum) tuple :) > > > >> Dan, only the the values for one intermediate key must be in memory? or >> all the intermediate key,value pairs that are assigned to one reducer >> must be in memory? >> > > With the default configuration, all the key/value pairs assigned to one > reducer must be in memory. 
But one can define the __tmpMapReduce cache in > the configuration and configure eviction with a cache store (note that > because of how our eviction works, the actual container size is at least > concurrencyLevel rounded up to the next power of 2). The problem is that > there is only one configuration for all the M/R tasks [1]. > > Note that because we only run the combiner after the mapping phase is > complete, we do need to keep in memory all the results of the mapping phase > from that node (those are not stored in a cache). I've created an issue in > JIRA for this [2]. > > Cheers > Dan > > [1] https://issues.jboss.org/browse/ISPN-4021 > [2] https://issues.jboss.org/browse/ISPN-4022 > > > >> Cheers, >> Evangelos >> >> On 02/18/2014 11:59 AM, Dan Berindei wrote: >>> Radim, this is how our M/R algorithm works (Hadoop may do it >> differently): >>> >>> * The mapping phase generates a Map> on each >>> node (Int meaning intermediate). >>> * In the combine (local reduce) phase, a combine operation takes as input >>> an IntKey and a Collection with only the values that were >>> produced on that node. >>> * In the (global) reduce phase, all the intermediate values for each key >>> are merged, and a reduce operation takes an intermediate key and a >> sequence >>> of *all* the intermediate values generated for that key. These reduce >>> operations are completely independent, so each intermediate key can be >>> mapped to a different node (distributed reduce), while still having >> access >>> to all the intermediate values at once. >>> * In the end, the collator takes the Map from the >> reduce >>> phase and produces a single value. >>> >>> If a combiner can be used, then I believe it can also be run in parallel >>> with a LinkedBlockingQueue between the mapper and the combiner. But >>> sometimes the reduce algorithm can only be run on the entire collection >> of >>> values (e.g if you want to find the median, or a percentile). 
>>> >>> The limitation we have now is that in the reduce phase, the entire list >> of >>> values for one intermediate key must be in memory at once. I think Hadoop >>> only loads a block of intermediate values in memory at once, and can even >>> sort the intermediate values (with a user-supplied comparison function) >> so >>> that the reduce function can work on a sorted list without loading the >>> values in memory itself. >>> >>> Cheers >>> Dan >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From tsykora at redhat.com Tue Feb 18 07:35:21 2014 From: tsykora at redhat.com (Tomas Sykora) Date: Tue, 18 Feb 2014 07:35:21 -0500 (EST) Subject: [infinispan-dev] Introducing Infinispan OData server: Remote JSON documents querying In-Reply-To: <1418358555.4942381.1392725971149.JavaMail.zimbra@redhat.com> Message-ID: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com> Hello all! :) It's the right time to make it a little bit more public and share some results of work on Infinispan OData server, finally! This solution can serve as a proof of concept where we are able to remotely query JSON documents stored in Infinispan caches and using industrial standard and platform independent way of communication with the server (OData). There is still much to do/implement/improve in the server, but it is working as it is now. Check a blog post if you are interested: http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-server.html Any feedback is more than welcome. + I'd like to say a big THANK YOU to all who supported me! Mainly: JDG QE guys, Manik, Mircea, Sanne and Adrian. 
It wouldn't be done without your patience and willingness to help me :-) Tomas From emmanuel at hibernate.org Tue Feb 18 08:01:22 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Tue, 18 Feb 2014 14:01:22 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <53034BBB.1030809@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> Message-ID: <20140218130122.GA11962@hibernate.org> On Tue 2014-02-18 14:02, Adrian Nistor wrote: > Well, OGM and Infinispan are different species :) So, Infinispan being what > it is today - a non-homogenous, schema-less KV store, without support for > entity associations (except embedding) - which simplifies the whole thing a > lot, should we or should we not provide transparent cross-cacheManager > search capabilities, in this exact context? Vote? Yes it makes sense to do queries like where name or title = "foo" AND description or content contains "bar" over a heterogeneous set (say books and DVDs) But if you had in mind to do joins between different entries in the cache, then this would require some cross-cache map reduce and be inefficient so that's not a good use case. > > There were some points raised previously like /"if you search for more than > one cache transparently, then you probably need to CRUD for more than one > cache transparently as well"/. In the SQL world you would also probably CRUD > against a table or set of tables and then query against a view - a bit like > what we're doing here. I don't see any problem with this in principle. There > is however something currently missing in the query result set API - it > currently does not provide you the keys of the matching entities. People Really? 
I think we have the info in the index at least when the "ProvidedId" and the keys are the same. > work around this by storing the key in the entity. Now with the addition of > the cross-cacheManager search we'll probably need to fix the result api and > also provide a reference to the cache (or just the name?) where the entity > is stored. Right, I'm not sure Sanne agrees with me yet but you need to store the cache name in the index. Hibernate Search can reason at query time to see if it can avoid using this term to speed things up (massively). That will depend whether or no indexes are shared between caches. From sanne at infinispan.org Tue Feb 18 08:27:03 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 18 Feb 2014 13:27:03 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140218130122.GA11962@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> <20140218130122.GA11962@hibernate.org> Message-ID: On 18 February 2014 13:01, Emmanuel Bernard wrote: > On Tue 2014-02-18 14:02, Adrian Nistor wrote: >> Well, OGM and Infinispan are different species :) So, Infinispan being what >> it is today - a non-homogenous, schema-less KV store, without support for >> entity associations (except embedding) - which simplifies the whole thing a >> lot, should we or should we not provide transparent cross-cacheManager >> search capabilities, in this exact context? Vote? 
> > Yes it makes sense to do queries like > > where name or title = "foo" AND description or content contains "bar" > > over a heterogeneous set (say books and DVDs) Right > > But if you had in mind to do joins between different entries in the > cache, then this would require some cross-cache map reduce and be > inefficient so that's not a good use case. +1 > >> >> There were some points raised previously like /"if you search for more than >> one cache transparently, then you probably need to CRUD for more than one >> cache transparently as well"/. In the SQL world you would also probably CRUD >> against a table or set of tables and then query against a view - a bit like >> what we're doing here. I don't see any problem with this in principle. There >> is however something currently missing in the query result set API - it >> currently does not provide you the keys of the matching entities. People > > Really? I think we have the info in the index at least when the > "ProvidedId" and the keys are the same. We have this info in the engine, but the results to the user don't usually include the keys. For some this is a bit unnatural: a different perspective would be to return _only_ the keys and avoid doing the lookup. We provide a "LazyIterator" on the results which fetches only each matching entry on demand, which I think covers a good deal of use cases but there might be other usages for these keys. I would be great if we had Lambda support to allow users to say what they want us to do with the resultset, rather than fetching it. > >> work around this by storing the key in the entity. Now with the addition of >> the cross-cacheManager search we'll probably need to fix the result api and >> also provide a reference to the cache (or just the name?) where the entity >> is stored. > > Right, I'm not sure Sanne agrees with me yet but you need to store the > cache name in the index. 
Hibernate Search can reason at query time to > see if it can avoid using this term to speed things up (massively). That > will depend whether or no indexes are shared between caches. I do agree that this would be required, but I'm sad on the implications this has. To allow those not familiar with Lucene to understand the consequences: deleting a single entry from the index by using a single term - like the key could be - is many orders of magnitude more efficient than deleting from an index by "composite keys", like it would be if we need to delete by tuples { cachename, typename, id }. Considering that in Infinispan I can never be sure if a key already existed or not (which is a fundamental difference when comparing to Search/ORM), ANY WRITE on Infinispan triggers a delete operation first. Not least, such a delete requires an index flush, while we normally just flush at the end of the batch (transaction). In other words if we could avoid needing to discriminate an index entry by Cache Name, each and every operation would be many orders of magniture more efficient. To be noted that even today we aren't achieving this higher efficiency mode because we're using the tuple { typename, id}, but that's a legacy mapping related to how Search could handle multi-table structures and I was planning to finally enable this very interesting optimization in the next few weeks in the scope of Search5. I do agree that supporting Queries on multiple Caches (cross-cache but no joins) makes sense, but if only we could figure out a way to move away from "dynamically defined indexed types" we could apply many of these optimizations transparently, when we know there is no risk of key ambiguity. We've been through a lot of trouble just to allow the user to not register his indexed types upfront, but I don't think it's worth it. After all, the user still has to annotate or provide a schema: listing the types would be the lesser pain. 
- Sanne

From rvansa at redhat.com  Tue Feb 18 08:36:01 2014
From: rvansa at redhat.com (Radim Vansa)
Date: Tue, 18 Feb 2014 14:36:01 +0100
Subject: [infinispan-dev] MapReduce limitations and suggestions.
In-Reply-To: <53034F5E.6060706@gmail.com>
References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com>
	<52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com>
	<530320F9.300@redhat.com> <5303342A.1050800@gmail.com>
	<53034F5E.6060706@gmail.com>
Message-ID: <530361C1.3030404@redhat.com>

Thanks a lot for these explanations, guys (Dan and Evangelos), I was
confused by the nomenclature in Hadoop/Infinispan vs. wiki/something I
learned in the past. I was considering M/R to be

 node1         | node2         |
---------------|---------------|
 K1,V1 | K2,V2 | K3,V3 | K4,V4 |
   |   |   |   |   |   |   |   |
   v   |   v   |   v   |   v   | MAP
  Foo  | null  |  Bar  |  Goo  |
-------------------------------|
    \  |       |   \     /     | LOCAL
     Foo       |    BarGoo     | REDUCE
      |        |      |        |
-------------------------------|
          \         /          | GLOBAL
           FooBarGoo           | REDUCE
-------------------------------|

But now I understand that the model introduced here is somewhat
different. I had proposed parallel Map-Combine, but I understand that
now you're trying to solve the problem in the reduce phase.

Thanks again

Radim

On 02/18/2014 01:17 PM, Evangelos Vazaios wrote:
> On 02/18/2014 01:40 PM, Dan Berindei wrote:
>> On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios wrote:
>>
>>> Hi Radim,
>>>
>>> Since Hadoop is the most popular implementation of MapReduce I will give
>>> a brief overview of how it works and then I'll provide an example
>>> where the reducers must run over the whole list of values with the same
>>> key.
>>>
>>> Hadoop MR overview.
>>> >>> MAP >>> >>> 1) Input file(s) are split into pieces of 64MB >>> 2) For each split hadoop creates one map task and then assign the task >>> to a cluster node >>> 3) The splits are read as key,value pairs and the map function of Mapper >>> is called. The mapper can output arbitrary number of intermediate >>> key,value pairs >>> 4) the output from the mapper is stored in a buffer in memory. After a >>> certain threshold is reached the pairs are sorted by key and if there is >>> a combiner it is run on the pairs that have the same key. Then, the >>> output is flushed on the HDFS. >>> >> Ok, so Hadoop runs the combiner more or less concurrently with the mappers. >> >> I'm curious if there are any M/R tasks that benefit from the sorting the >> keys here, we just put the intermediate values in a Map>. We >> could do about the same by passing this map (or rather each entry in the >> map) to the combiner when it reaches a certain threshold, but I'm not >> convinced about the need to sort it. >> > Well there are algorithms that make use of it. Implementing a graph > algorithm can take use of it.Where the graph is split into k partitions > and each partition is assigned to one Mapper and Reducer. Mappers > compute the outgoing messages and output them to reducers. Then, > reducers can read the partition file sequentially to update the > vertices. This is just one use case that came to my mind. >>> SHUFFLE >>> >>> hadoop decides the Reducer that should process each key by running a >>> partitioner. The default partitioner decides with the following way: >>> reducer = intermidKey.hashCode() % numberOfReducer >>> Finally, the intermediate key,value pairs are sent to the reducers >>> >> Is this algorithm set in stone, in that some M/R tasks rely on it? In our >> impl, the user could use grouping to direct a set of intermediate keys to >> the same node for reducing, but otherwise the reducing node is more or less >> random. 
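The default shuffle rule quoted above (reducer = intermidKey.hashCode() % numberOfReducer) is essentially Hadoop's HashPartitioner, with one detail the pseudocode hides: a plain % on a negative hashCode() yields a negative index, so the real rule masks the sign bit first. A minimal sketch (illustrative, not Hadoop's actual class):

```java
// Sketch of the default hash-partitioning rule discussed above. The sign-bit
// mask matters: hashCode() can be negative, and a plain "% numReducers"
// would then produce an invalid (negative) reducer index.
public class HashPartitionSketch {
    static int partition(Object intermediateKey, int numReducers) {
        return (intermediateKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        for (String key : new String[]{"domain-a", "domain-b", "sentiment"}) {
            int r = partition(key, 4);
            // every key lands in a valid reducer index [0, 4)
            System.out.println(key + " -> reducer " + r);
        }
    }
}
```

Because the mapping depends only on the key's hash code and the reducer count, any node can compute it independently during the shuffle, with no coordination.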
>> > The default partitioner does exactly that check the actual code for > hadoop 1.2.1 here > http://goo.gl/he9yHO >>> REDUCE >>> >>> 1) Reducer sorts all key,value pairs by key and then groups the values >>> with the same key. As a result reducers receive their keys sorted. >>> >> I guess this sorting is only relevant if the reduce phase happens on a >> single thread, on a single node? If the reduce happens in parallel, the >> ordering is going to be lost anyway. > Each reduce task is run on a single thread, but you can run more than > one reduce tasks on a given node. The key ordering will not be lost. The > values are not ordered in any way. Moreover, the call to the reducer is > reduce(Key key, Iterable values) I cannot think of a way that the > order is lost. >> >>> 2) for each Key,List the reduce function of the reducer is >>> called. Reducer can also emit arbitrary number of key,value pairs >>> >> We limit the reducer (and the combiner) to emit a single value, which is >> paired with the input key. We may need to lift this restriction, if only to >> make porting/adapting tasks easier. >> >> >>> Additionally, hadoop lets you customize almost every aspect of the code >>> run from how the input is split and read as key value pairs to how it is >>> partitioned and sorted. >>> >> Does that mean you can sort the values as well? I was thinking of each >> reduce() call as independent, and then only the order of values for one >> intermediate key would be relevant. I guess some tasks may require keeping >> state across all the reduce() calls and then the order of key matters, but >> then the reduce phase can't be parallelized, either across the cluster or >> on a single node. > I was not very clear here. You can set the partitioner for a specific > job. You may also set the key comparator, as a result change the way > that intermediate keys are sorted. Additionally, one can change how keys > are grouped into one reduce call by setting the GroupComparator class. 
A > simple example would be to have sales(date,amount) and you want to > create totals for each month of the year. > so for the key: (year,month) and value: amount. > by overriding the keyClass hashCode function you can send all the > intermediate pairs with the same year to the same reducer > > and then you can set the groupComparator to group together all the > values with the same year. > > Cheers, > Evangelos > > >>> A simple example is group by and computing an average over the grouped >>> values. Let the dataset be webpages (url,domain,sentiment) and we want >>> to compute the average sentiment for each domain in the dataset then the >>> mapper for each webpages wp. will run >>> map(wp.url,wp): >>> emit(wp.domain,wp.sentiment) >>> >>> and in reducer: >>> reduce(domain,Iterable values): >>> counter = 0 >>> sum = 0 >>> while(values.hasNext()) >>> counter++; >>> sum += values.next() >>> emit(domain,sum/counter) >>> >>> I know that this approach is not optimized. But, I wanted give a simple >>> example. >>> >> I think it can also be optimized to use a combiner, if we emit a (domain, >> counter, sum) tuple :) >> >> >>> Dan, only the the values for one intermediate key must be in memory? or >>> all the intermediate key,value pairs that are assigned to one reducer >>> must be in memory? >>> >> With the default configuration, all the key/value pairs assigned to one >> reducer must be in memory. But one can define the __tmpMapReduce cache in >> the configuration and configure eviction with a cache store (note that >> because of how our eviction works, the actual container size is at least >> concurrencyLevel rounded up to the next power of 2). The problem is that >> there is only one configuration for all the M/R tasks [1]. >> >> Note that because we only run the combiner after the mapping phase is >> complete, we do need to keep in memory all the results of the mapping phase >> from that node (those are not stored in a cache). 
I've created an issue in >> JIRA for this [2]. >> >> Cheers >> Dan >> >> [1] https://issues.jboss.org/browse/ISPN-4021 >> [2] https://issues.jboss.org/browse/ISPN-4022 >> >> >> >>> Cheers, >>> Evangelos >>> >>> On 02/18/2014 11:59 AM, Dan Berindei wrote: >>>> Radim, this is how our M/R algorithm works (Hadoop may do it >>> differently): >>>> * The mapping phase generates a Map> on each >>>> node (Int meaning intermediate). >>>> * In the combine (local reduce) phase, a combine operation takes as input >>>> an IntKey and a Collection with only the values that were >>>> produced on that node. >>>> * In the (global) reduce phase, all the intermediate values for each key >>>> are merged, and a reduce operation takes an intermediate key and a >>> sequence >>>> of *all* the intermediate values generated for that key. These reduce >>>> operations are completely independent, so each intermediate key can be >>>> mapped to a different node (distributed reduce), while still having >>> access >>>> to all the intermediate values at once. >>>> * In the end, the collator takes the Map from the >>> reduce >>>> phase and produces a single value. >>>> >>>> If a combiner can be used, then I believe it can also be run in parallel >>>> with a LinkedBlockingQueue between the mapper and the combiner. But >>>> sometimes the reduce algorithm can only be run on the entire collection >>> of >>>> values (e.g if you want to find the median, or a percentile). >>>> >>>> The limitation we have now is that in the reduce phase, the entire list >>> of >>>> values for one intermediate key must be in memory at once. I think Hadoop >>>> only loads a block of intermediate values in memory at once, and can even >>>> sort the intermediate values (with a user-supplied comparison function) >>> so >>>> that the reduce function can work on a sorted list without loading the >>>> values in memory itself. 
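The median/percentile point above can be made concrete: some reduce functions can consume their values as a stream with O(1) state, while others inherently need the whole value list materialized. A sketch (illustrative only, not Infinispan's Reducer interface):

```java
import java.util.*;

// Contrast between a reduce that streams over its values (average: O(1)
// state, never materializes the list) and one that cannot (median: must
// collect and sort every value first).
public class ReduceMemorySketch {
    // Streaming-friendly: works directly on an Iterator.
    public static double average(Iterator<Double> values) {
        double sum = 0;
        long count = 0;
        while (values.hasNext()) { sum += values.next(); count++; }
        return sum / count;
    }

    // Needs the entire collection in memory: collect, sort, pick the middle.
    public static double median(Iterator<Double> values) {
        List<Double> all = new ArrayList<>();
        values.forEachRemaining(all::add);
        Collections.sort(all);
        int n = all.size();
        return n % 2 == 1 ? all.get(n / 2)
                          : (all.get(n / 2 - 1) + all.get(n / 2)) / 2;
    }

    public static void main(String[] args) {
        List<Double> v = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        System.out.println(average(v.iterator())); // 2.5
        System.out.println(median(v.iterator()));  // 2.5
    }
}
```

Only the first kind can benefit from loading intermediate values block by block; the second is exactly the case where the full list must fit in memory.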
>>>> >>>> Cheers >>>> Dan >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From dan.berindei at gmail.com Tue Feb 18 09:39:04 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 16:39:04 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <53034F5E.6060706@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios wrote: > On 02/18/2014 01:40 PM, Dan Berindei wrote: > > On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios >wrote: > > > >> Hi Radim, > >> > >> Since Hadoop is the most popular implementation of MapReduce I will give > >> a brief overview of how it works and then I'll provide with an example > >> where the reducers must run over the whole list of values with the same > >> key. > >> > >> Hadoop MR overview. > >> > >> MAP > >> > >> 1) Input file(s) are split into pieces of 64MB > >> 2) For each split hadoop creates one map task and then assign the task > >> to a cluster node > >> 3) The splits are read as key,value pairs and the map function of Mapper > >> is called. The mapper can output arbitrary number of intermediate > >> key,value pairs > >> 4) the output from the mapper is stored in a buffer in memory. After a > >> certain threshold is reached the pairs are sorted by key and if there is > >> a combiner it is run on the pairs that have the same key. Then, the > >> output is flushed on the HDFS. 
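The buffer-then-combine mechanism in step 4 above can be sketched as follows. This is a simplified model, not Hadoop's spill code: summation stands in for an arbitrary combine function, and a list of maps stands in for the spill files written to disk.

```java
import java.util.*;

// Sketch of map-side buffering (step 4): mapper output accumulates in a
// sorted in-memory buffer; once a threshold is reached, a combiner collapses
// the values of each key and the result is "spilled" (here into a list
// rather than onto disk/HDFS). Sum is a stand-in combine function.
public class MapSideBufferSketch {
    private final int threshold;
    private final TreeMap<String, List<Long>> buffer = new TreeMap<>(); // keys kept sorted
    final List<Map<String, Long>> spills = new ArrayList<>();

    MapSideBufferSketch(int threshold) { this.threshold = threshold; }

    void emit(String key, long value) {
        buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        if (buffer.size() >= threshold) spill();
    }

    void spill() {
        if (buffer.isEmpty()) return;
        Map<String, Long> combined = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : buffer.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) sum += v; // combine same-key values
            combined.put(e.getKey(), sum);
        }
        spills.add(combined);
        buffer.clear();
    }

    public static void main(String[] args) {
        MapSideBufferSketch m = new MapSideBufferSketch(2);
        m.emit("foo", 1);
        m.emit("foo", 2);
        m.emit("bar", 3); // second distinct key hits the threshold -> spill
        m.spill();        // final flush of anything left over
        System.out.println(m.spills);
    }
}
```

Running the combiner at spill time is what lets Hadoop overlap combining with mapping, instead of waiting for the whole map phase to finish.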
> >> > > > > Ok, so Hadoop runs the combiner more or less concurrently with the > mappers. > > > > I'm curious if there are any M/R tasks that benefit from the sorting the > > keys here, we just put the intermediate values in a Map>. We > > could do about the same by passing this map (or rather each entry in the > > map) to the combiner when it reaches a certain threshold, but I'm not > > convinced about the need to sort it. > > > Well there are algorithms that make use of it. Implementing a graph > algorithm can take use of it.Where the graph is split into k partitions > and each partition is assigned to one Mapper and Reducer. Mappers > compute the outgoing messages and output them to reducers. Then, > reducers can read the partition file sequentially to update the > vertices. This is just one use case that came to my mind. > I thought the partitioning only happens during the shuffle phase, and mappers/combiners don't know about partitions at all? I understand that reducers may need the intermediary keys to be sorted, I'm asking about the combiners, since even if the keys from one block are sorted, the complete list of keys they receive is not sorted (unless a new combiner is created for each input block). > > > > >> SHUFFLE > >> > >> hadoop decides the Reducer that should process each key by running a > >> partitioner. The default partitioner decides with the following way: > >> reducer = intermidKey.hashCode() % numberOfReducer > >> Finally, the intermediate key,value pairs are sent to the reducers > >> > > > > Is this algorithm set in stone, in that some M/R tasks rely on it? In our > > impl, the user could use grouping to direct a set of intermediate keys to > > the same node for reducing, but otherwise the reducing node is more or > less > > random. > > > The default partitioner does exactly that check the actual code for > hadoop 1.2.1 here > http://goo.gl/he9yHO > So API documentation doesn't specify it, but users still rely on this particular behaviour? 
BTW, is there always one reducer one each node, or can there be multiple reducers on each node? If it's the latter, it should be relatively easy to model this in Infinispan using grouping. If it's the former, I'm not so sure... > > > >> REDUCE > >> > >> 1) Reducer sorts all key,value pairs by key and then groups the values > >> with the same key. As a result reducers receive their keys sorted. > >> > > > > I guess this sorting is only relevant if the reduce phase happens on a > > single thread, on a single node? If the reduce happens in parallel, the > > ordering is going to be lost anyway. > Each reduce task is run on a single thread, but you can run more than > one reduce tasks on a given node. The key ordering will not be lost. The > values are not ordered in any way. Moreover, the call to the reducer is > reduce(Key key, Iterable values) I cannot think of a way that the > order is lost. > > > Right, the call to the reducer is with a single key, but I'm assuming the order of the calls matters (e.g. because the reduces keeps some internal state across reduce() calls), otherwise there's no point in sorting the keys. Calling the same reducer from multiple threads (like we do) would definitely mess up the order of the calls. ATM we only have one reducer per node, which can be called from multiple threads, but it shouldn't be too hard to allow multiple reducers per node and to run each of them in a single thread. > > > >> 2) for each Key,List the reduce function of the reducer is > >> called. Reducer can also emit arbitrary number of key,value pairs > >> > > > > We limit the reducer (and the combiner) to emit a single value, which is > > paired with the input key. We may need to lift this restriction, if only > to > > make porting/adapting tasks easier. > > > > > >> > >> Additionally, hadoop lets you customize almost every aspect of the code > >> run from how the input is split and read as key value pairs to how it is > >> partitioned and sorted. 
> >> > > > > Does that mean you can sort the values as well? I was thinking of each > > reduce() call as independent, and then only the order of values for one > > intermediate key would be relevant. I guess some tasks may require > keeping > > state across all the reduce() calls and then the order of key matters, > but > > then the reduce phase can't be parallelized, either across the cluster or > > on a single node. > > I was not very clear here. You can set the partitioner for a specific > job. You may also set the key comparator, as a result change the way > that intermediate keys are sorted. Additionally, one can change how keys > are grouped into one reduce call by setting the GroupComparator class. A > simple example would be to have sales(date,amount) and you want to > create totals for each month of the year. > so for the key: (year,month) and value: amount. > by overriding the keyClass hashCode function you can send all the > intermediate pairs with the same year to the same reducer > > and then you can set the groupComparator to group together all the > values with the same year. > You mean set the groupComparator to group together all the values with the same month? I don't think so, because the key is already (year, month). But if you wanted to collect the totals for each year you could just use the year as the intermediary key. So I don't quite understand how your example is supposed to work. Besides, each reduce() call receives just one key, if you have keys (2013, 1) and (2013, 2) and the groupComparator decides they should map to the same group, which key does the reducer see? I think a regular equals() should be good enough for us here, since we already need equals() in order to put the intermediary keys in the intermediary cache. Cheers Dan > > Cheers, > Evangelos > > > > > >> A simple example is group by and computing an average over the grouped > >> values. 
Let the dataset be webpages (url,domain,sentiment) and we want > >> to compute the average sentiment for each domain in the dataset then the > >> mapper for each webpages wp. will run > >> map(wp.url,wp): > >> emit(wp.domain,wp.sentiment) > >> > >> and in reducer: > >> reduce(domain,Iterable values): > >> counter = 0 > >> sum = 0 > >> while(values.hasNext()) > >> counter++; > >> sum += values.next() > >> emit(domain,sum/counter) > >> > >> I know that this approach is not optimized. But, I wanted give a simple > >> example. > >> > > > > I think it can also be optimized to use a combiner, if we emit a (domain, > > counter, sum) tuple :) > > > > > > > > >> Dan, only the the values for one intermediate key must be in memory? or > >> all the intermediate key,value pairs that are assigned to one reducer > >> must be in memory? > >> > > > > With the default configuration, all the key/value pairs assigned to one > > reducer must be in memory. But one can define the __tmpMapReduce cache in > > the configuration and configure eviction with a cache store (note that > > because of how our eviction works, the actual container size is at least > > concurrencyLevel rounded up to the next power of 2). The problem is that > > there is only one configuration for all the M/R tasks [1]. > > > > Note that because we only run the combiner after the mapping phase is > > complete, we do need to keep in memory all the results of the mapping > phase > > from that node (those are not stored in a cache). I've created an issue > in > > JIRA for this [2]. > > > > Cheers > > Dan > > > > [1] https://issues.jboss.org/browse/ISPN-4021 > > [2] https://issues.jboss.org/browse/ISPN-4022 > > > > > > > >> Cheers, > >> Evangelos > >> > >> On 02/18/2014 11:59 AM, Dan Berindei wrote: > >>> Radim, this is how our M/R algorithm works (Hadoop may do it > >> differently): > >>> > >>> * The mapping phase generates a Map> on > each > >>> node (Int meaning intermediate). 
> >>> * In the combine (local reduce) phase, a combine operation takes as > input > >>> an IntKey and a Collection with only the values that were > >>> produced on that node. > >>> * In the (global) reduce phase, all the intermediate values for each > key > >>> are merged, and a reduce operation takes an intermediate key and a > >> sequence > >>> of *all* the intermediate values generated for that key. These reduce > >>> operations are completely independent, so each intermediate key can be > >>> mapped to a different node (distributed reduce), while still having > >> access > >>> to all the intermediate values at once. > >>> * In the end, the collator takes the Map from the > >> reduce > >>> phase and produces a single value. > >>> > >>> If a combiner can be used, then I believe it can also be run in > parallel > >>> with a LinkedBlockingQueue between the mapper and the combiner. But > >>> sometimes the reduce algorithm can only be run on the entire collection > >> of > >>> values (e.g if you want to find the median, or a percentile). > >>> > >>> The limitation we have now is that in the reduce phase, the entire list > >> of > >>> values for one intermediate key must be in memory at once. I think > Hadoop > >>> only loads a block of intermediate values in memory at once, and can > even > >>> sort the intermediate values (with a user-supplied comparison function) > >> so > >>> that the reduce function can work on a sorted list without loading the > >>> values in memory itself. > >>> > >>> Cheers > >>> Dan > >> > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140218/a1c60195/attachment-0001.html From emmanuel at hibernate.org Tue Feb 18 09:47:42 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Tue, 18 Feb 2014 15:47:42 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> <20140218130122.GA11962@hibernate.org> Message-ID: <20140218144742.GD11962@hibernate.org> On Tue 2014-02-18 13:27, Sanne Grinovero wrote: > On 18 February 2014 13:01, Emmanuel Bernard wrote: > > On Tue 2014-02-18 14:02, Adrian Nistor wrote: > >> There were some points raised previously like /"if you search for more than > >> one cache transparently, then you probably need to CRUD for more than one > >> cache transparently as well"/. In the SQL world you would also probably CRUD > >> against a table or set of tables and then query against a view - a bit like > >> what we're doing here. I don't see any problem with this in principle. There > >> is however something currently missing in the query result set API - it > >> currently does not provide you the keys of the matching entities. People > > > > Really? I think we have the info in the index at least when the > > "ProvidedId" and the keys are the same. > > We have this info in the engine, but the results to the user don't > usually include the keys. > For some this is a bit unnatural: a different perspective would be to > return _only_ the keys and avoid doing the lookup. > > We provide a "LazyIterator" on the results which fetches only each > matching entry on demand, which I think covers a good deal of use > cases but there might be other usages for these keys. 
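The "LazyIterator" idea above - return the matching keys and fetch each value only on demand - fits in a few lines. This is a toy version, not Hibernate Search's actual class; the class and parameter names are invented for the example.

```java
import java.util.*;
import java.util.function.Function;

// Toy version of the lazy result iteration described above: walk the matching
// keys and look each value up only when next() is called, instead of
// materializing the whole result set eagerly.
public class LazyResultIterator<K, V> implements Iterator<V> {
    private final Iterator<K> matchingKeys;
    private final Function<K, V> fetch; // e.g. cache::get

    public LazyResultIterator(Iterator<K> matchingKeys, Function<K, V> fetch) {
        this.matchingKeys = matchingKeys;
        this.fetch = fetch;
    }

    public boolean hasNext() { return matchingKeys.hasNext(); }

    public V next() { return fetch.apply(matchingKeys.next()); } // lookup on demand

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("k1", "v1");
        cache.put("k2", "v2");
        // pretend a query matched k1 and k2; values are fetched one at a time
        Iterator<String> results =
            new LazyResultIterator<>(Arrays.asList("k1", "k2").iterator(), cache::get);
        while (results.hasNext()) System.out.println(results.next());
    }
}
```

Exposing the key iterator directly (skipping the fetch function) would cover the "return _only_ the keys and avoid doing the lookup" perspective mentioned above.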
> > I would be great if we had Lambda support to allow users to say what > they want us to do with the resultset, rather than fetching it. I was thinking of offering a way to project the key / id select key(user) from User user where user.email = "emmanuel at hibernate.org" select key(user), user from User user where user.email = "emmanuel at hibernate.org" If you guys really want, you can add a cache(user) function as well to project the Cache instance. Looks wrong at first sight though. From vagvaz at gmail.com Tue Feb 18 10:33:20 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 17:33:20 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> Message-ID: <53037D40.6060101@gmail.com> On 02/18/2014 04:39 PM, Dan Berindei wrote: > On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios wrote: > >> On 02/18/2014 01:40 PM, Dan Berindei wrote: >>> On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios >> wrote: >>> >>>> Hi Radim, >>>> >>>> Since Hadoop is the most popular implementation of MapReduce I will give >>>> a brief overview of how it works and then I'll provide with an example >>>> where the reducers must run over the whole list of values with the same >>>> key. >>>> >>>> Hadoop MR overview. >>>> >>>> MAP >>>> >>>> 1) Input file(s) are split into pieces of 64MB >>>> 2) For each split hadoop creates one map task and then assign the task >>>> to a cluster node >>>> 3) The splits are read as key,value pairs and the map function of Mapper >>>> is called. The mapper can output arbitrary number of intermediate >>>> key,value pairs >>>> 4) the output from the mapper is stored in a buffer in memory. 
After a >>>> certain threshold is reached the pairs are sorted by key and if there is >>>> a combiner it is run on the pairs that have the same key. Then, the >>>> output is flushed on the HDFS. >>>> >>> >>> Ok, so Hadoop runs the combiner more or less concurrently with the >> mappers. >>> >>> I'm curious if there are any M/R tasks that benefit from the sorting the >>> keys here, we just put the intermediate values in a Map>. We >>> could do about the same by passing this map (or rather each entry in the >>> map) to the combiner when it reaches a certain threshold, but I'm not >>> convinced about the need to sort it. >>> >> Well there are algorithms that make use of it. Implementing a graph >> algorithm can take use of it.Where the graph is split into k partitions >> and each partition is assigned to one Mapper and Reducer. Mappers >> compute the outgoing messages and output them to reducers. Then, >> reducers can read the partition file sequentially to update the >> vertices. This is just one use case that came to my mind. >> > > I thought the partitioning only happens during the shuffle phase, and > mappers/combiners don't know about partitions at all? > I understand that reducers may need the intermediary keys to be sorted, I'm > asking about the combiners, since even if the keys from one block are > sorted, the complete list of keys they receive is not sorted (unless a new > combiner is created for each input block). You are absolutely right partitioning happens during the shuffle phase and mappers/combiners do not know about partitions. Did I say something different? > >> >>> >>>> SHUFFLE >>>> >>>> hadoop decides the Reducer that should process each key by running a >>>> partitioner. The default partitioner decides with the following way: >>>> reducer = intermidKey.hashCode() % numberOfReducer >>>> Finally, the intermediate key,value pairs are sent to the reducers >>>> >>> >>> Is this algorithm set in stone, in that some M/R tasks rely on it? 
In our >>> impl, the user could use grouping to direct a set of intermediate keys to >>> the same node for reducing, but otherwise the reducing node is more or >> less >>> random. >>> >> The default partitioner does exactly that; check the actual code for >> Hadoop 1.2.1 here >> http://goo.gl/he9yHO >> > > So the API documentation doesn't specify it, but users still rely on this > particular behaviour? > > BTW, is there always one reducer on each node, or can there be multiple > reducers on each node? If it's the latter, it should be relatively easy to > model this in Infinispan using grouping. If it's the former, I'm not so > sure... > Actually, the configuration of the MapReduce job (MapReduce task in infinispan) defines the number of reducers and is programmatically configurable. The short answer to your question is the latter: multiple Reduce tasks are assigned to nodes almost equally. > >>> >>>> REDUCE >>>> >>>> 1) Reducer sorts all key,value pairs by key and then groups the values >>>> with the same key. As a result reducers receive their keys sorted. >>>> >>> >>> I guess this sorting is only relevant if the reduce phase happens on a >>> single thread, on a single node? If the reduce happens in parallel, the >>> ordering is going to be lost anyway. >> Each reduce task is run on a single thread, but you can run more than >> one reduce task on a given node. The key ordering will not be lost. The >> values are not ordered in any way. Moreover, the call to the reducer is >> reduce(Key key, Iterable<Value> values); I cannot think of a way that the >> order is lost. >>> >> > > Right, the call to the reducer is with a single key, but I'm assuming the > order of the calls matters (e.g. because the reducer keeps some internal > state across reduce() calls), otherwise there's no point in sorting the > keys. Calling the same reducer from multiple threads (like we do) would > definitely mess up the order of the calls.
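The default partitioning rule quoted above (reducer = key.hashCode() % numberOfReducers) can be sketched in a few lines. Note that Hadoop's HashPartitioner additionally masks off the sign bit so that keys with negative hash codes still map to a valid reducer index — this is an illustrative sketch, not Hadoop's or Infinispan's actual source:

```java
public class Main {
    // Mimics Hadoop's default HashPartitioner: clear the sign bit, then take
    // the remainder, so the result is always in [0, numReducers).
    static int partition(Object intermediateKey, int numReducers) {
        return (intermediateKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // The same key always maps to the same reducer index...
        System.out.println(partition("word", 4));
        // ...and a key with a negative hashCode still yields a valid index.
        System.out.println(partition(-123, 7));
    }
}
```

Because the mapping is deterministic given the key and the reducer count, tasks can (and, per the discussion above, in practice do) rely on co-locating related keys by overriding hashCode() — even though the API documentation does not promise it.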
> > ATM we only have one reducer per node, which can be called from multiple > threads, but it shouldn't be too hard to allow multiple reducers per node > and to run each of them in a single thread. > I believe the sorting is done in order to group the values with the same key: since large amounts of data are stored in files, the easiest way to group is to sort and then group values with the same keys. > >>> >>>> 2) for each Key,List<Value> the reduce function of the reducer is >>>> called. Reducer can also emit an arbitrary number of key,value pairs >>>> >>> >>> We limit the reducer (and the combiner) to emit a single value, which is >>> paired with the input key. We may need to lift this restriction, if only >> to >>> make porting/adapting tasks easier. >>> >>> >>>> >>>> Additionally, hadoop lets you customize almost every aspect of the code >>>> run from how the input is split and read as key value pairs to how it is >>>> partitioned and sorted. >>>> >>> >>> Does that mean you can sort the values as well? I was thinking of each >>> reduce() call as independent, and then only the order of values for one >>> intermediate key would be relevant. I guess some tasks may require >> keeping >>> state across all the reduce() calls and then the order of keys matters, >> but >>> then the reduce phase can't be parallelized, either across the cluster or >>> on a single node. >> >> I was not very clear here. You can set the partitioner for a specific >> job. You may also set the key comparator and, as a result, change the way >> that intermediate keys are sorted. Additionally, one can change how keys >> are grouped into one reduce call by setting the GroupComparator class. A >> simple example would be to have sales(date,amount) and you want to >> create totals for each month of the year. >> so for the key: (year,month) and value: amount.
>> by overriding the keyClass hashCode function you can send all the >> intermediate pairs with the same year to the same reducer >> >> and then you can set the groupComparator to group together all the >> values with the same year. >> > > You mean set the groupComparator to group together all the values with the > same month? I don't think so, because the key is already (year, month). But > if you wanted to collect the totals for each year you could just use the > year as the intermediary key. So I don't quite understand how your example > is supposed to work. Well you can do that as well, but I meant to group all the months of the same year in one reduce call. The idea is that you want to receive in one reduce call the values for one year, and the values for that year to be sorted by month. > > Besides, each reduce() call receives just one key, if you have keys (2013, > 1) and (2013, 2) and the groupComparator decides they should map to the > same group, which key does the reducer see? I think a regular equals() > should be good enough for us here, since we already need equals() in order > to put the intermediary keys in the intermediary cache. > I may not be very good with examples; you can check this https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-8/sorting. It is more or less the same problem in a different setting. > Cheers > Dan > > Cheers Evangelos > >> >> Cheers, >> Evangelos >> >> >>> >>>> A simple example is group by and computing an average over the grouped >>>> values. Let the dataset be webpages (url,domain,sentiment) and we want >>>> to compute the average sentiment for each domain in the dataset then the >>>> mapper for each webpage wp.
will run >>>> map(wp.url,wp): >>>> emit(wp.domain,wp.sentiment) >>>> >>>> and in reducer: >>>> reduce(domain,Iterable values): >>>> counter = 0 >>>> sum = 0 >>>> while(values.hasNext()) >>>> counter++; >>>> sum += values.next() >>>> emit(domain,sum/counter) >>>> >>>> I know that this approach is not optimized. But, I wanted give a simple >>>> example. >>>> >>> >>> I think it can also be optimized to use a combiner, if we emit a (domain, >>> counter, sum) tuple :) >> >>> >>> >>> >>>> Dan, only the the values for one intermediate key must be in memory? or >>>> all the intermediate key,value pairs that are assigned to one reducer >>>> must be in memory? >>>> >>> >>> With the default configuration, all the key/value pairs assigned to one >>> reducer must be in memory. But one can define the __tmpMapReduce cache in >>> the configuration and configure eviction with a cache store (note that >>> because of how our eviction works, the actual container size is at least >>> concurrencyLevel rounded up to the next power of 2). The problem is that >>> there is only one configuration for all the M/R tasks [1]. >>> >>> Note that because we only run the combiner after the mapping phase is >>> complete, we do need to keep in memory all the results of the mapping >> phase >>> from that node (those are not stored in a cache). I've created an issue >> in >>> JIRA for this [2]. >>> >>> Cheers >>> Dan >>> >>> [1] https://issues.jboss.org/browse/ISPN-4021 >>> [2] https://issues.jboss.org/browse/ISPN-4022 >>> >>> >>> >>>> Cheers, >>>> Evangelos >>>> >>>> On 02/18/2014 11:59 AM, Dan Berindei wrote: >>>>> Radim, this is how our M/R algorithm works (Hadoop may do it >>>> differently): >>>>> >>>>> * The mapping phase generates a Map> on >> each >>>>> node (Int meaning intermediate). >>>>> * In the combine (local reduce) phase, a combine operation takes as >> input >>>>> an IntKey and a Collection with only the values that were >>>>> produced on that node. 
>>>>> * In the (global) reduce phase, all the intermediate values for each >> key >>>>> are merged, and a reduce operation takes an intermediate key and a >>>> sequence >>>>> of *all* the intermediate values generated for that key. These reduce >>>>> operations are completely independent, so each intermediate key can be >>>>> mapped to a different node (distributed reduce), while still having >>>> access >>>>> to all the intermediate values at once. >>>>> * In the end, the collator takes the Map from the >>>> reduce >>>>> phase and produces a single value. >>>>> >>>>> If a combiner can be used, then I believe it can also be run in >> parallel >>>>> with a LinkedBlockingQueue between the mapper and the combiner. But >>>>> sometimes the reduce algorithm can only be run on the entire collection >>>> of >>>>> values (e.g if you want to find the median, or a percentile). >>>>> >>>>> The limitation we have now is that in the reduce phase, the entire list >>>> of >>>>> values for one intermediate key must be in memory at once. I think >> Hadoop >>>>> only loads a block of intermediate values in memory at once, and can >> even >>>>> sort the intermediate values (with a user-supplied comparison function) >>>> so >>>>> that the reduce function can work on a sorted list without loading the >>>>> values in memory itself. 
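The contrast drawn above — between reductions that fold incrementally and those that can only run on the entire collection of values — can be made concrete. A sum can be combined value by value, so a combiner can pre-aggregate it; a median cannot be computed from partial results, which is why the whole value list for a key must be materialized (or sorted externally). A minimal sketch, not tied to any Infinispan or Hadoop interface:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
    // A sum folds value by value, so a combiner can pre-aggregate it.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    // A median needs every value for the key in hand before it can answer;
    // no partial result computed from a subset of the values helps.
    // (For even-sized lists this returns the upper median.)
    static long median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        List<Long> values = java.util.Arrays.asList(5L, 1L, 9L, 3L, 7L);
        System.out.println(sum(values));    // folds incrementally: 25
        System.out.println(median(values)); // needs the full list: 5
    }
}
```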
>>>>> >>>>> Cheers >>>>> Dan >>>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From vblagoje at redhat.com Tue Feb 18 10:36:35 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Tue, 18 Feb 2014 10:36:35 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: <53037E03.9020502@redhat.com> On 2/18/2014, 4:59 AM, Dan Berindei wrote: > > The limitation we have now is that in the reduce phase, the entire > list of values for one intermediate key must be in memory at once. I > think Hadoop only loads a block of intermediate values in memory at > once, and can even sort the intermediate values (with a user-supplied > comparison function) so that the reduce function can work on a sorted > list without loading the values in memory itself. > > Dan and others, This is where Sanne's idea comes into play. Why collect entire list of intermediate values for each intermediate key and then invoke reduce on those values when we can invoke reduce each time new intermediate value gets inserted? https://issues.jboss.org/browse/ISPN-3999 Cheers, Vladimir From vagvaz at gmail.com Tue Feb 18 10:46:05 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 17:46:05 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <53037E03.9020502@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <53037E03.9020502@redhat.com> Message-ID: <5303803D.9080204@gmail.com> On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote: > On 2/18/2014, 4:59 AM, Dan Berindei wrote: >> >> The limitation we have now is that in the reduce phase, the entire >> list of values for one intermediate key must be in memory at once. I >> think Hadoop only loads a block of intermediate values in memory at >> once, and can even sort the intermediate values (with a user-supplied >> comparison function) so that the reduce function can work on a sorted >> list without loading the values in memory itself. >> >> > Dan and others, > > This is where Sanne's idea comes into play. Why collect the entire list of > intermediate values for each intermediate key and then invoke reduce on > those values when we can invoke reduce each time a new intermediate value > gets inserted? > Because you can't. What you are saying is more like combining than reducing. If there is a combiner in the MapReduceTask you can execute the combiner on a subset (in your case, 2) of the values with the same key and output one. But this is not always possible. > https://issues.jboss.org/browse/ISPN-3999 > > Cheers, > Vladimir > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Tue Feb 18 13:41:52 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 20:41:52 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions.
In-Reply-To: <5303803D.9080204@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <53037E03.9020502@redhat.com> <5303803D.9080204@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 5:46 PM, Evangelos Vazaios wrote: > On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote: > > On 2/18/2014, 4:59 AM, Dan Berindei wrote: > >> > >> The limitation we have now is that in the reduce phase, the entire > >> list of values for one intermediate key must be in memory at once. I > >> think Hadoop only loads a block of intermediate values in memory at > >> once, and can even sort the intermediate values (with a user-supplied > >> comparison function) so that the reduce function can work on a sorted > >> list without loading the values in memory itself. > >> > >> > > Dan and others, > > > > This is where Sanne's idea comes into play. Why collect entire list of > > intermediate values for each intermediate key and then invoke reduce on > > those values when we can invoke reduce each time new intermediate value > > gets inserted? > > > Because you cant. What you are saying is more like combining than > reducing. If there is a combiner in the MapReduceTask you can execute > the combiner on a subset (in your case 2) values with the same key and > output one. But, this is not possible always. > In theory we could stream each intermediate value independently to the combiner and then to the node of the reducer, and the reducer could start up immediately on the reducer node instead of waiting for the mapping phase to finish on all the mapping nodes (blocking when it doesn't have any more values to process). But I imagine that would be kind of tricky to implement. 
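Dan's streaming idea — feed each intermediate value through a bounded queue so the combiner folds it as soon as it arrives, blocking when no more values are ready yet — is essentially the producer/consumer pipeline he mentioned earlier in the thread (a LinkedBlockingQueue between mapper and combiner). A toy single-key sketch, with all names and shapes being assumptions rather than the Infinispan implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Main {
    // Poison pill marking the end of the mapped stream.
    static final Integer END = Integer.MIN_VALUE;

    // Streams 1..n through a bounded queue from a "mapper" thread to the
    // calling "combiner" thread, which folds values as soon as they arrive.
    static long streamSum(int n) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(16);
        Thread mapper = new Thread(() -> {
            try {
                for (int v = 1; v <= n; v++) queue.put(v); // blocks when the queue is full
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mapper.start();
        long sum = 0;
        Integer v;
        while (!(v = queue.take()).equals(END)) { // blocks until a value is ready
            sum += v;
        }
        mapper.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(streamSum(100)); // 5050
    }
}
```

The combiner here never holds more than the queue's capacity of pending values, which is the memory win; the tricky part Dan alludes to is doing this per intermediate key, across nodes, with failure handling.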
> > https://issues.jboss.org/browse/ISPN-3999 > > > > Cheers, > > Vladimir > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140218/b6a52c75/attachment-0001.html From galder at redhat.com Wed Feb 19 01:57:14 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Feb 2014 07:57:14 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <52EB5197.4050801@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> Message-ID: On 31 Jan 2014, at 08:32, Dennis Reed wrote: > It would be a loss of functionality. > > As a common example, the AS web session replication cache is configured > for ASYNC by default, for performance reasons. > But it can be changed to SYNC to guarantee that when the request > finishes that the session was replicated. > > That wouldn't be possible if you could no longer switch between > ASYNC/SYNC with just a configuration change. I disagree :). AS could abstract that configuration detail. IOW, if all Infinispan returned was Futures, AS, or any other client application, has the choice in its hands: does it wait for the future to complete or not? If it does, it's SYNC; if not, ASYNC. AS can still expose this and no functionality is lost. What happens is that the SYNC/ASYNC decision stops being a configuration option (bad, bad, bad) and becomes an actual programming decision Infinispan clients must address (good, good, good).
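Galder's "futures everywhere" proposal can be illustrated with a toy cache facade in which the async write is the only primitive and the sync flavour is just "call async, then block on the returned future" — the shape Radim sketched earlier as put = putAsync + get(). CompletableFuture stands in for Infinispan's NotifyingFuture, and every name here is hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Main {
    // Hypothetical cache facade: the only write primitive is asynchronous.
    static class AsyncFirstCache<K, V> {
        private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();

        // Stand-in for a replicated write; runs off the caller's thread.
        CompletableFuture<V> putAsync(K key, V value) {
            return CompletableFuture.supplyAsync(() -> store.put(key, value));
        }

        // SYNC is a client-side decision: issue the async write, then wait.
        V put(K key, V value) {
            return putAsync(key, value).join();
        }

        V get(K key) { return store.get(key); }
    }

    public static void main(String[] args) {
        AsyncFirstCache<String, String> cache = new AsyncFirstCache<>();
        cache.put("k", "v");                // this caller chose to wait
        System.out.println(cache.get("k")); // v
    }
}
```

As Bela notes at the top of the thread, this only removes the SYNC/ASYNC *configuration* split; the sync path still pays for response messages at the transport level, which a true fire-and-forget mode avoids.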
Cheers, > > -Dennis > > On 01/31/2014 01:08 AM, Galder Zamarreño wrote: >> Hi all, >> >> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >> >> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >> >> WDYT? >> >> Cheers, >> -- >> Galder Zamarreño >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 19 02:05:02 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Feb 2014 08:05:02 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> Message-ID: <34AB5973-4745-4720-8D43-6AF8D32121B1@redhat.com> On 31 Jan 2014, at 11:59, Sanne Grinovero wrote: > Generally I like the systems designed with SYNC_DIST + async shared cachestore.
> > It's probably the best setup we can offer: > - you need a shared cachestore for persistence consistency > - using SYNC distribution to other replicas provides a fairly decent resilience > - if your cachestore needs to be updated in sync, your write > performance will be limited by the cachestore performance: this > prevents you to use Infinispan to buffer, absorbing write spikes, and > reducing write latency Ok, this a limitation of my approach. For such scenarios, you could maybe leave the async store option around, with a note on when the future completes based on this option. > But I agree we should investigate on removing duplicate > "asynchronizations" where they are not needed, there might be some > opportunities to remove thread switching and blocking. > > > On 31 January 2014 10:48, Tristan Tarrant wrote: >> Couldn't this be handled higher up in our implementatoin then ? >> >> If I enable an async mode, all puts / gets become putAsync/getAsync >> transparently to both the application and to the state transfer. >> >> Tristan >> >> On 01/31/2014 08:32 AM, Dennis Reed wrote: >>> It would be a loss of functionality. >>> >>> As a common example, the AS web session replication cache is configured >>> for ASYNC by default, for performance reasons. >>> But it can be changed to SYNC to guarantee that when the request >>> finishes that the session was replicated. >>> >>> That wouldn't be possible if you could no longer switch between >>> ASYNC/SYNC with just a configuration change. >>> >>> -Dennis >>> >>> On 01/31/2014 01:08 AM, Galder Zamarre?o wrote: >>>> Hi all, >>>> >>>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >>>> >>>> Instead, whoever wants to store something asyncronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. 
This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>>> >>>> WDYT? >>>> >>>> Cheers, >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 19 02:12:08 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Feb 2014 08:12:08 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On 03 Feb 2014, at 19:01, Dan Berindei wrote: > > > > On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >>>> For sync we would want to invoke directly to avoid context switching. 
> >>> I think you haven't properly understood what I was talking about: the > >>> putAsync should not switch context at all in the ideal design. It should > >>> traverse through the interceptors all the way down (logically, in > >>> current behaviour), invoke JGroups async API and jump out. Then, as soon > >>> as the response is received, the thread which delivered it should > >>> traverse the interceptor stack up (again, logically), and fire the future. > > A Future doesn't make much sense with an async transport. The problem > > is with an async transport you never get back a response so you never > > know when the actual command is completed and thus a Future is > > worthless. The caller wouldn't know if they could rely on the use of > > the Future or not. > > You're right, there's one important difference between putAsync and put > with async transport: in the first case you can find out when the > request is completed while you cannot with the latter. Not requiring the > ack can be an important optimization. I think that both versions are > very valid: first mostly for bulk operations = reduction of latency, > second for modifications that are acceptable to fail without handling that. > I had the first case in my mind when talking about async operations, and > there the futures are necessary. > > A couple more differences: > 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... > 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. If there?s any relationship between both puts for the caller thread, the caller must make sure that the second put is only called after the first has completed. If there?s separate threads calling it and it relies on this, it should call replace the second time, i.e. replaceAsync(k, v1, v2) to get the guarantees it wants. 
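Galder's rule — if the second put must win, the caller sequences it after the first completes — can be sketched with java.util.concurrent.CompletableFuture standing in for the future returned by putAsync. The cache here is a plain ConcurrentHashMap and the names are purely illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Main {
    static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Stand-in for putAsync: the write completes at some arbitrary later time.
    static CompletableFuture<Void> putAsync(String k, String v) {
        return CompletableFuture.runAsync(() -> cache.put(k, v));
    }

    // Two bare calls `putAsync(k, v1); putAsync(k, v2);` may complete in
    // either order, so the cache could end up holding v1. Composing the
    // futures only issues the second put once the first has completed.
    static String orderedPuts() {
        putAsync("k", "v1")
            .thenCompose(ignored -> putAsync("k", "v2"))
            .join();
        return cache.get("k");
    }

    public static void main(String[] args) {
        System.out.println(orderedPuts()); // v2
    }
}
```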
What is really important is that the order in which they are executed in one node/replica is the same order in which they?re executed in all other nodes. This was something that was not maintained when async marshalling was enabled. > > > > > > Also it depends what you are trying to do with async. Currently async > > transport is only for sending messages to another node, we never think > > of when we are the owning node. In this case the calling thread would > > have to go down the interceptor stack and acquire any locks if it is > > the owner, thus causing this "async" to block if you have any > > contention on the given key. The use of another thread would allow > > the calling thread to be able to return immediately no matter what > > else is occurring. Also I don't see what is so wrong about having a > > context switch to run something asynchronously, we shouldn't have a > > context switch to block the user thread imo, which is very possible > > with locking. > > This is an important notice! Locking would complicate the design a lot, > because the thread in "async" mode should do only tryLocks - if this > fails, further processing should be dispatched to another thread. Not > sure if this could be implemented at all, because the thread may be > blocked inside JGroups as well (async API is about receiving the > response asynchronously, not about sending the message asynchronously). > > I don't say that the context switch is that bad. My concern is that you > have a very limited amount of requests that can be processed in > parallel. I consider a "request" something pretty lightweight in concept > - but one thread per request makes this rather heavyweight stuff. > > We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. 
But the feeling I got was that neither is going to make it into 7.0. > > > > > >> +1 much cleaner, I love it. Actually wasn't aware the current code > >> didn't do this :-( > > This is what the current async transport does, but it does nothing with Futures. > > Nevermind the futures, this is not the important part. It's not about > async transport neither, it's about async executors. > (okay, the thread was about dropping async transport, I have hijacked it) > > Radim > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From lthon at redhat.com Wed Feb 19 04:20:09 2014 From: lthon at redhat.com (Ladislav Thon) Date: Wed, 19 Feb 2014 10:20:09 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <53037E03.9020502@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <53037E03.9020502@redhat.com> Message-ID: <53047749.5090602@redhat.com> On 18.2.2014 16:36, Vladimir Blagojevic wrote: > On 2/18/2014, 4:59 AM, Dan Berindei wrote: >> >> The limitation we have now is that in the reduce phase, the entire >> list of values for one intermediate key must be in memory at once. I >> think Hadoop only loads a block of intermediate values in memory at >> once, and can even sort the intermediate values (with a user-supplied >> comparison function) so that the reduce function can work on a sorted >> list without loading the values in memory itself. 
>> >> > Dan and others, > > This is where Sanne's idea comes into play. Why collect entire list of > intermediate values for each intermediate key and then invoke reduce on > those values when we can invoke reduce each time new intermediate value > gets inserted? I don't know about MR in Infinispan, but MR in CouchDB is doing a very similar thing to what you describe. In order to actually get a final result, they have to do an entire tree of reductions, and the reduce function has to distinguish between a "first-level" reduce (on bare values) and rereduce (on intermediate results from previous reductions). They are _not_ always the same, and it's fairly confusing. LT > > https://issues.jboss.org/browse/ISPN-3999 > > Cheers, > Vladimir > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From sanne at infinispan.org Wed Feb 19 06:03:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 19 Feb 2014 11:03:35 +0000 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On 19 February 2014 07:12, Galder Zamarre?o wrote: > > On 03 Feb 2014, at 19:01, Dan Berindei wrote: > >> >> >> >> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: >> >>>> For sync we would want to invoke directly to avoid context switching. >> >>> I think you haven't properly understood what I was talking about: the >> >>> putAsync should not switch context at all in the ideal design. It should >> >>> traverse through the interceptors all the way down (logically, in >> >>> current behaviour), invoke JGroups async API and jump out. 
Then, as soon >> >>> as the response is received, the thread which delivered it should >> >>> traverse the interceptor stack up (again, logically), and fire the future. >> > A Future doesn't make much sense with an async transport. The problem >> > is with an async transport you never get back a response so you never >> > know when the actual command is completed and thus a Future is >> > worthless. The caller wouldn't know if they could rely on the use of >> > the Future or not. >> >> You're right, there's one important difference between putAsync and put >> with async transport: in the first case you can find out when the >> request is completed while you cannot with the latter. Not requiring the >> ack can be an important optimization. I think that both versions are >> very valid: first mostly for bulk operations = reduction of latency, >> second for modifications that are acceptable to fail without handling that. >> I had the first case in my mind when talking about async operations, and >> there the futures are necessary. >> >> A couple more differences: >> 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... >> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. > > If there?s any relationship between both puts for the caller thread, the caller must make sure that the second put is only called after the first has completed. Actually in such a case I would strongly expect Infinispan to keep the two operations in order. This is not to be pushed on user's responsibility. > > If there?s separate threads calling it and it relies on this, it should call replace the second time, i.e. replaceAsync(k, v1, v2) to get the guarantees it wants. 
> > What is really important is that the order in which they are executed in one node/replica is the same order in which they're executed in all other nodes. This was something that was not maintained when async marshalling was enabled. +1000 But also I'd stress that any sync operation should have a Future returned; someone in this long thread suggested having an option to drop it, for example to speed up bulk imports, but I really can't see a scenario in which I wouldn't want to know about a failure. Let's not make the same mistake that made MongoDB so "popular" ;-) Bulk imports can still be made efficient without strictly needing to go to these lengths. Sanne > >> >> >> > >> > Also it depends what you are trying to do with async. Currently async >> > transport is only for sending messages to another node, we never think >> > of when we are the owning node. In this case the calling thread would >> > have to go down the interceptor stack and acquire any locks if it is >> > the owner, thus causing this "async" to block if you have any >> > contention on the given key. The use of another thread would allow >> > the calling thread to be able to return immediately no matter what >> > else is occurring. Also I don't see what is so wrong about having a >> > context switch to run something asynchronously, we shouldn't have a >> > context switch to block the user thread imo, which is very possible >> > with locking.
My concern is that you >> have a very limited amount of requests that can be processed in >> parallel. I consider a "request" something pretty lightweight in concept >> - but one thread per request makes this rather heavyweight stuff. >> >> We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. But the feeling I got was that neither is going to make it into 7.0. >> >> >> > >> >> +1 much cleaner, I love it. Actually wasn't aware the current code >> >> didn't do this :-( >> > This is what the current async transport does, but it does nothing with Futures. >> >> Nevermind the futures, this is not the important part. It's not about >> async transport neither, it's about async executors. >> (okay, the thread was about dropping async transport, I have hijacked it) >> >> Radim >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Wed Feb 19 08:22:23 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 19 Feb 2014 15:22:23 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <53037D40.6060101@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 5:33 PM, Evangelos Vazaios wrote: > > On 02/18/2014 04:39 PM, Dan Berindei wrote: > > On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios > wrote: > > > >> On 02/18/2014 01:40 PM, Dan Berindei wrote: > >>> On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios >>> wrote: > >>> > >>>> Hi Radim, > >>>> > >>>> Since Hadoop is the most popular implementation of MapReduce I will > give > >>>> a brief overview of how it works and then I'll provide with an example > >>>> where the reducers must run over the whole list of values with the > same > >>>> key. > >>>> > >>>> Hadoop MR overview. > >>>> > >>>> MAP > >>>> > >>>> 1) Input file(s) are split into pieces of 64MB > >>>> 2) For each split hadoop creates one map task and then assign the task > >>>> to a cluster node > >>>> 3) The splits are read as key,value pairs and the map function of > Mapper > >>>> is called. The mapper can output arbitrary number of intermediate > >>>> key,value pairs > I forgot to ask about this... we already have the entries stored as key,value pairs, so we expect the data to be already in the cache. That means there is no ordering in the inputs, and the mapper can't rely on sequential inputs to be related. Would you consider that to be a reasonable expectation? > >>>> 4) the output from the mapper is stored in a buffer in memory. After a > >>>> certain threshold is reached the pairs are sorted by key and if there > is > >>>> a combiner it is run on the pairs that have the same key. Then, the > >>>> output is flushed on the HDFS. > >>>> > >>> > >>> Ok, so Hadoop runs the combiner more or less concurrently with the > >> mappers. 
> >>> > >>> I'm curious if there are any M/R tasks that benefit from sorting > the > >>> keys here, we just put the intermediate values in a Map>. > We > >>> could do about the same by passing this map (or rather each entry in > the > >>> map) to the combiner when it reaches a certain threshold, but I'm not > >>> convinced about the need to sort it. > >>> > >> Well there are algorithms that make use of it. A graph > >> algorithm can take advantage of it, where the graph is split into k partitions > >> and each partition is assigned to one Mapper and Reducer. Mappers > >> compute the outgoing messages and output them to reducers. Then, > >> reducers can read the partition file sequentially to update the > >> vertices. This is just one use case that came to my mind. > >> > > > > I thought the partitioning only happens during the shuffle phase, and > > mappers/combiners don't know about partitions at all? > > I understand that reducers may need the intermediary keys to be sorted, > I'm > > asking about the combiners, since even if the keys from one block are > > sorted, the complete list of keys they receive is not sorted (unless a > new > > combiner is created for each input block). > You are absolutely right, partitioning happens during the shuffle phase > and mappers/combiners do not know about partitions. Did I say something > different? > > > My initial question was whether there is a real need to sort the keys before calling the combiner. So when you presented the example with the graph being split in k partitions, I got a bit confused and I thought combiners might know about partitions, too. > >> > >>> > >>>> SHUFFLE > >>>> > >>>> Hadoop decides the Reducer that should process each key by running a > >>>> partitioner. 
The default partitioner decides in the following way: > >>>> reducer = intermediateKey.hashCode() % numberOfReducers > >>>> Finally, the intermediate key,value pairs are sent to the reducers > >>>> > >>> > >>> Is this algorithm set in stone, in that some M/R tasks rely on it? In > our > >>> impl, the user could use grouping to direct a set of intermediate keys > to > >>> the same node for reducing, but otherwise the reducing node is more or > >> less > >>> random. > >>> > >> The default partitioner does exactly that; check the actual code for > >> Hadoop 1.2.1 here > >> http://goo.gl/he9yHO > >> > > > > So API documentation doesn't specify it, but users still rely on this > > particular behaviour? > > > > BTW, is there always one reducer on each node, or can there be multiple > > reducers on each node? If it's the latter, it should be relatively easy > to > > model this in Infinispan using grouping. If it's the former, I'm not so > > sure... > > > Actually, the configuration of the MapReduce job (MapReduce task in > infinispan) defines the number of reducers and is programmatically > configurable. The short answer to your question is the latter: multiple > Reduce tasks are assigned to nodes almost equally. > Ok, partitioning sounds like something we could do in Infinispan. Partitioning seems like a pretty big deal in Hadoop M/R descriptions, so implementing it should be quite useful. 
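The default partitioning rule quoted above can be written out as a tiny self-contained method. One detail the formula in the mail glosses over: Hadoop's actual HashPartitioner masks off the sign bit first, so a negative hashCode() cannot yield a negative partition index — the sketch below includes that mask.

```java
public class DefaultPartitioner {
    // Mirrors the rule "reducer = key.hashCode() % numberOfReducers",
    // with the sign bit masked so the result is always in range.
    public static int reducerFor(Object intermediateKey, int numberOfReducers) {
        return (intermediateKey.hashCode() & Integer.MAX_VALUE) % numberOfReducers;
    }

    public static void main(String[] args) {
        // Every key maps to a stable reducer index in [0, numberOfReducers).
        for (String key : new String[] {"alpha", "beta", "gamma"}) {
            System.out.println(key + " -> reducer " + reducerFor(key, 4));
        }
    }
}
```

Because the mapping depends only on the key's hashCode, the same key always lands on the same reducer, which is what makes the shuffle deterministic.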
Moreover, the call to the reducer is > >> reduce(Key key, Iterable values); I cannot think of a way that the > >> order is lost. > >>> > >> > > > > Right, the call to the reducer is with a single key, but I'm assuming the > > order of the calls matters (e.g. because the reducer keeps some internal > > state across reduce() calls), otherwise there's no point in sorting the > > keys. Calling the same reducer from multiple threads (like we do) would > > definitely mess up the order of the calls. > > > > ATM we only have one reducer per node, which can be called from multiple > > threads, but it shouldn't be too hard to allow multiple reducers per node > > and to run each of them in a single thread. > > > I believe the sorting is done in order to group the values with the same key: > since large amounts of data are stored in files, the easiest way to group is > to sort and then group values with the same keys. > Yeah, I realized that my idea of keeping state between reduce() calls is kind of tricky to use, because you'd have to insert a sentinel value in each partition, and make sure that after the sorting the sentinel value will come last, in order to flush the final results to the output. I see Hadoop does offer some stuff to keep global state, like counters, so perhaps it's not even necessary. > > > >>> > >>>> 2) for each Key,List the reduce function of the reducer is > >>>> called. Reducer can also emit an arbitrary number of key,value pairs > >>>> > >>> > >>> We limit the reducer (and the combiner) to emit a single value, which > is > >>> paired with the input key. We may need to lift this restriction, if > only > >> to > >>> make porting/adapting tasks easier. > >>> > >>> > >>>> > >>>> Additionally, Hadoop lets you customize almost every aspect of the > code > >>>> run from how the input is split and read as key value pairs to how it > is > >>>> partitioned and sorted. > >>>> > >>> > >>> Does that mean you can sort the values as well? 
I was thinking of each > >>> reduce() call as independent, and then only the order of values for one > >>> intermediate key would be relevant. I guess some tasks may require > >> keeping > >>> state across all the reduce() calls and then the order of key matters, > >> but > >>> then the reduce phase can't be parallelized, either across the cluster > or > >>> on a single node. > >> > >> I was not very clear here. You can set the partitioner for a specific > >> job. You may also set the key comparator, as a result change the way > >> that intermediate keys are sorted. Additionally, one can change how keys > >> are grouped into one reduce call by setting the GroupComparator class. A > >> simple example would be to have sales(date,amount) and you want to > >> create totals for each month of the year. > >> so for the key: (year,month) and value: amount. > >> by overriding the keyClass hashCode function you can send all the > >> intermediate pairs with the same year to the same reducer > >> > >> and then you can set the groupComparator to group together all the > >> values with the same year. > >> > > > > You mean set the groupComparator to group together all the values with > the > > same month? I don't think so, because the key is already (year, month). > But > > if you wanted to collect the totals for each year you could just use the > > year as the intermediary key. So I don't quite understand how your > example > > is supposed to work. > Well you can do that as well, but I meant to group all the months of the > same year in one reduce call. The idea is that you want to receive in > one reduce the values for one year and the values for that year to be > sorted by month. > Ok, I didn't get it because I was looking at the problem from the other way around: if I'd want the values to be sorted, I'd include the month in the value and configure sorting for the values. But with Hadoop's streaming model it's probably easier to always sort by the keys. 
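The (year, month) sales example above can be simulated without Hadoop. This sketch uses made-up data and plain TreeMaps to show the net effect of partitioning by year while keeping months sorted — what the custom partitioner plus GroupComparator achieve in Hadoop — and assumes a recent JDK for records.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class YearMonthGrouping {
    // One intermediate key per sale: (year, month); the value is the amount.
    public record YearMonth(int year, int month) {}

    // Group by year only (what overriding hashCode achieves in the mail),
    // then keep months sorted so one "reduce call" per year sees its
    // months in order.
    public static SortedMap<Integer, SortedMap<Integer, Integer>> totals(
            Map<YearMonth, List<Integer>> sales) {
        SortedMap<Integer, SortedMap<Integer, Integer>> byYear = new TreeMap<>();
        for (Map.Entry<YearMonth, List<Integer>> e : sales.entrySet()) {
            int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
            byYear.computeIfAbsent(e.getKey().year(), y -> new TreeMap<>())
                  .merge(e.getKey().month(), sum, Integer::sum);
        }
        return byYear;
    }

    public static void main(String[] args) {
        Map<YearMonth, List<Integer>> sales = Map.of(
            new YearMonth(2013, 2), List.of(10, 5),
            new YearMonth(2013, 1), List.of(7),
            new YearMonth(2014, 1), List.of(3));
        System.out.println(totals(sales)); // {2013={1=7, 2=15}, 2014={1=3}}
    }
}
```

In real Hadoop the per-year grouping would come from a partitioner/GroupComparator pair rather than nested maps; the sketch only shows the resulting data shape.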
> > > Besides, each reduce() call receives just one key, if you have keys > (2013, > > 1) and (2013, 2) and the groupComparator decides they should map to the > > same group, which key does the reducer see? I think a regular equals() > > should be good enough for us here, since we already need equals() in > order > > to put the intermediary keys in the intermediary cache. > > > > I am not very good with examples; you can check this > > https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-8/sorting > . > It is more or less the same problem with a different setting. > Sorry, I didn't get too much from that example either, I gave up after the second "registering is fun" popup :) One last question: with Hadoop I imagine it's quite easy to leave the results of the M/R job on the distributed FS and start a new job to M/R from that. Do you think it would be important to offer something similar in Infinispan (i.e. put the result of the reducers in a cache instead of returning it to the user)? > Cheers > > Dan > > > > > Cheers > Evangelos > > > >> > >> Cheers, > >> Evangelos > >> > >> > >>> > >>>> A simple example is group by and computing an average over the grouped > >>>> values. Let the dataset be webpages (url,domain,sentiment) and we want > >>>> to compute the average sentiment for each domain in the dataset. Then > the > >>>> mapper for each webpage wp will run > >>>> map(wp.url,wp): > >>>> emit(wp.domain,wp.sentiment) > >>>> > >>>> and in reducer: > >>>> reduce(domain,Iterable values): > >>>> counter = 0 > >>>> sum = 0 > >>>> while(values.hasNext()) > >>>> counter++; > >>>> sum += values.next() > >>>> emit(domain,sum/counter) > >>>> > >>>> I know that this approach is not optimized. But, I wanted to give a simple > >>>> example. 
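The pseudocode above translates into self-contained Java roughly as follows. The `Webpage` record and the sample data are made up for illustration, and plain in-memory maps stand in for the cache and the shuffle.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AverageSentiment {
    public record Webpage(String url, String domain, double sentiment) {}

    // map(wp.url, wp): emit(wp.domain, wp.sentiment) -- grouped by key,
    // as the shuffle would do.
    public static Map<String, List<Double>> mapPhase(Collection<Webpage> pages) {
        Map<String, List<Double>> intermediate = new HashMap<>();
        for (Webpage wp : pages) {
            intermediate.computeIfAbsent(wp.domain(), d -> new ArrayList<>())
                        .add(wp.sentiment());
        }
        return intermediate;
    }

    // reduce(domain, values): emit(domain, sum / counter)
    public static Map<String, Double> reducePhase(Map<String, List<Double>> intermediate) {
        Map<String, Double> averages = new HashMap<>();
        intermediate.forEach((domain, values) -> {
            double sum = 0;
            for (double v : values) sum += v;
            averages.put(domain, sum / values.size());
        });
        return averages;
    }

    public static void main(String[] args) {
        List<Webpage> pages = List.of(
            new Webpage("a.example.com/1", "example.com", 0.8),
            new Webpage("a.example.com/2", "example.com", 0.4),
            new Webpage("other.org/1", "other.org", -0.2));
        // Average sentiment per domain.
        System.out.println(reducePhase(mapPhase(pages)));
    }
}
```

As the thread notes, this keeps every raw value around until the reduce phase, which is exactly what a combiner avoids.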
> >>>> > >>> > >>> I think it can also be optimized to use a combiner, if we emit a > (domain, > >>> counter, sum) tuple :) > >> > >>> > >>> > >>> > >>>> Dan, only the the values for one intermediate key must be in memory? > or > >>>> all the intermediate key,value pairs that are assigned to one reducer > >>>> must be in memory? > >>>> > >>> > >>> With the default configuration, all the key/value pairs assigned to one > >>> reducer must be in memory. But one can define the __tmpMapReduce cache > in > >>> the configuration and configure eviction with a cache store (note that > >>> because of how our eviction works, the actual container size is at > least > >>> concurrencyLevel rounded up to the next power of 2). The problem is > that > >>> there is only one configuration for all the M/R tasks [1]. > >>> > >>> Note that because we only run the combiner after the mapping phase is > >>> complete, we do need to keep in memory all the results of the mapping > >> phase > >>> from that node (those are not stored in a cache). I've created an issue > >> in > >>> JIRA for this [2]. > >>> > >>> Cheers > >>> Dan > >>> > >>> [1] https://issues.jboss.org/browse/ISPN-4021 > >>> [2] https://issues.jboss.org/browse/ISPN-4022 > >>> > >>> > >>> > >>>> Cheers, > >>>> Evangelos > >>>> > >>>> On 02/18/2014 11:59 AM, Dan Berindei wrote: > >>>>> Radim, this is how our M/R algorithm works (Hadoop may do it > >>>> differently): > >>>>> > >>>>> * The mapping phase generates a Map> on > >> each > >>>>> node (Int meaning intermediate). > >>>>> * In the combine (local reduce) phase, a combine operation takes as > >> input > >>>>> an IntKey and a Collection with only the values that were > >>>>> produced on that node. > >>>>> * In the (global) reduce phase, all the intermediate values for each > >> key > >>>>> are merged, and a reduce operation takes an intermediate key and a > >>>> sequence > >>>>> of *all* the intermediate values generated for that key. 
These reduce > >>>>> operations are completely independent, so each intermediate key can > be > >>>>> mapped to a different node (distributed reduce), while still having > >>>> access > >>>>> to all the intermediate values at once. > >>>>> * In the end, the collator takes the Map from the > >>>> reduce > >>>>> phase and produces a single value. > >>>>> > >>>>> If a combiner can be used, then I believe it can also be run in > >> parallel > >>>>> with a LinkedBlockingQueue between the mapper and the combiner. But > >>>>> sometimes the reduce algorithm can only be run on the entire > collection > >>>> of > >>>>> values (e.g if you want to find the median, or a percentile). > >>>>> > >>>>> The limitation we have now is that in the reduce phase, the entire > list > >>>> of > >>>>> values for one intermediate key must be in memory at once. I think > >> Hadoop > >>>>> only loads a block of intermediate values in memory at once, and can > >> even > >>>>> sort the intermediate values (with a user-supplied comparison > function) > >>>> so > >>>>> that the reduce function can work on a sorted list without loading > the > >>>>> values in memory itself. 
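The combiner optimisation mentioned earlier — emitting a (counter, sum) partial instead of the raw values — can be sketched like this. Because the merge is associative, per-node combine results can be merged globally without shipping every value; the two "nodes" below are simulated, not actual cluster members.

```java
import java.util.List;

public class CombinerAverage {
    // Partial aggregate emitted by the combiner: mergeable and tiny,
    // unlike the raw list of values.
    public record CountSum(long count, double sum) {
        public CountSum merge(CountSum other) {
            return new CountSum(count + other.count, sum + other.sum);
        }
        public double average() { return sum / count; }
    }

    // Combine phase: collapse one node's local values for a key into a partial.
    public static CountSum combine(List<Double> localValues) {
        CountSum acc = new CountSum(0, 0);
        for (double v : localValues) acc = acc.merge(new CountSum(1, v));
        return acc;
    }

    // Reduce phase: merge the per-node partials for the same key.
    public static double reduce(List<CountSum> partials) {
        CountSum acc = new CountSum(0, 0);
        for (CountSum p : partials) acc = acc.merge(p);
        return acc.average();
    }

    public static void main(String[] args) {
        // Two "nodes", each combining its local values for the same key.
        CountSum node1 = combine(List.of(1.0, 2.0, 3.0));
        CountSum node2 = combine(List.of(5.0));
        System.out.println(reduce(List.of(node1, node2))); // 2.75
    }
}
```

Note that this trick only works for algebraic aggregates like averages; as Dan points out, a median or percentile still needs all the values at once.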
> >>>>> > >>>>> Cheers > >>>>> Dan > >>>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/6690d65d/attachment-0001.html From dan.berindei at gmail.com Wed Feb 19 08:43:36 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 19 Feb 2014 15:43:36 +0200 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On Wed, Feb 19, 2014 at 1:03 PM, Sanne Grinovero wrote: > On 19 February 2014 07:12, Galder Zamarre?o wrote: > > > > On 03 Feb 2014, at 19:01, Dan Berindei wrote: > > > >> > >> > >> > >> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >> >>>> For sync we would want to invoke directly to avoid context > switching. > >> >>> I think you haven't properly understood what I was talking about: > the > >> >>> putAsync should not switch context at all in the ideal design. 
It > should > >> >>> traverse through the interceptors all the way down (logically, in > >> >>> current behaviour), invoke JGroups async API and jump out. Then, as > soon > >> >>> as the response is received, the thread which delivered it should > >> >>> traverse the interceptor stack up (again, logically), and fire the > future. > >> > A Future doesn't make much sense with an async transport. The problem > >> > is with an async transport you never get back a response so you never > >> > know when the actual command is completed and thus a Future is > >> > worthless. The caller wouldn't know if they could rely on the use of > >> > the Future or not. > >> > >> You're right, there's one important difference between putAsync and put > >> with async transport: in the first case you can find out when the > >> request is completed while you cannot with the latter. Not requiring the > >> ack can be an important optimization. I think that both versions are > >> very valid: first mostly for bulk operations = reduction of latency, > >> second for modifications that are acceptable to fail without handling > that. > >> I had the first case in my mind when talking about async operations, and > >> there the futures are necessary. > >> > >> A couple more differences: > >> 1. You can't do commitAsync(), but you can configure the commit to be > replicated asynchronously (1PC). Although we did talk about removing that > option... > >> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering > between the two and you might end up with k=v1 in the cache. > > > > If there's any relationship between both puts for the caller thread, the > caller must make sure that the second put is only called after the first > has completed. > > Actually in such a case I would strongly expect Infinispan to keep the > two operations in order. This is not to be pushed on user's > responsibility. > I think you're talking about some other kind of putAsync(k, v) than we have now... 
all the work in putAsync happens on a separate thread, so there is no ordering between two separate putAsync calls whatsoever. > > > > If there's separate threads calling it and it relies on this, it should > call replace the second time, i.e. replaceAsync(k, v1, v2) to get the > guarantees it wants. > > > > What is really important is that the order in which they are executed in > one node/replica is the same order in which they're executed in all other > nodes. This was something that was not maintained when async marshalling > was enabled. > > +1000 > > But also I'd stress that any sync operation should have a Future > returned, someone in this long thread suggested to have an option to > drop it for example to speedup bulk imports, but I really can't see a > scenario in which I wouldn't want to know about a failure. Let's not > do the same mistake that made MongoDB so "popular" ;-) > Bulk imports can still be mad efficient without strictly needing to go > these lenghts. > You mean if the operation is synchronous, but the cache store/replication is async? I don't see how sync operations could return a Future, since most of them already have a return value. Bulk imports could certainly use putAsync(k, v), and that would indeed return a Future. > > Sanne > > > > > >> > >> > >> > > >> > Also it depends what you are trying to do with async. Currently async > >> > transport is only for sending messages to another node, we never think > >> > of when we are the owning node. In this case the calling thread would > >> > have to go down the interceptor stack and acquire any locks if it is > >> > the owner, thus causing this "async" to block if you have any > >> > contention on the given key. The use of another thread would allow > >> > the calling thread to be able to return immediately no matter what > >> > else is occurring. 
Also I don't see what is so wrong about having a > >> > context switch to run something asynchronously, we shouldn't have a > >> > context switch to block the user thread imo, which is very possible > >> > with locking. > >> > >> This is an important notice! Locking would complicate the design a lot, > >> because the thread in "async" mode should do only tryLocks - if this > >> fails, further processing should be dispatched to another thread. Not > >> sure if this could be implemented at all, because the thread may be > >> blocked inside JGroups as well (async API is about receiving the > >> response asynchronously, not about sending the message asynchronously). > >> > >> I don't say that the context switch is that bad. My concern is that you > >> have a very limited amount of requests that can be processed in > >> parallel. I consider a "request" something pretty lightweight in concept > >> - but one thread per request makes this rather heavyweight stuff. > >> > >> We did talk in Farnborough/Palma about removing the current LockManager > with a queue-based structure like the one used for ordering total-order > transactions. And about removing the implicit stack in the current > interceptor stack with an explicit stack, to allow resuming a command > mid-execution. But the feeling I got was that neither is going to make it > into 7.0. > >> > >> > >> > > >> >> +1 much cleaner, I love it. Actually wasn't aware the current code > >> >> didn't do this :-( > >> > This is what the current async transport does, but it does nothing > with Futures. > >> > >> Nevermind the futures, this is not the important part. It's not about > >> async transport neither, it's about async executors. 
> >> (okay, the thread was about dropping async transport, I have hijacked > it) > >> > >> Radim > >> > >> -- > >> Radim Vansa > >> JBoss DataGrid QA > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/297a4821/attachment.html From vblagoje at redhat.com Wed Feb 19 10:26:37 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 19 Feb 2014 10:26:37 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> Message-ID: <5304CD2D.1040909@redhat.com> On 2/19/2014, 8:22 AM, Dan Berindei wrote: > > > Sorry, I didn't get too much from that example either, I gave up after > the second "registering is fun" popup :) > > One last question: with Hadoop I imagine it's quite easy to leave the > results of the M/R job on the distributed FS and start a new job to > M/R from that. Do you think it would be important to offer something > similar in Infinispan (i.e. put the result of the reducers in a cache > instead of returning it to the user)? > > This is on our todo list https://issues.jboss.org/browse/ISPN-4002 From dereed at redhat.com Wed Feb 19 11:44:32 2014 From: dereed at redhat.com (Dennis Reed) Date: Wed, 19 Feb 2014 10:44:32 -0600 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> Message-ID: <5304DF70.2050603@redhat.com> On 02/19/2014 12:57 AM, Galder Zamarre?o wrote: > On 31 Jan 2014, at 08:32, Dennis Reed wrote: > >> It would be a loss of functionality. >> >> As a common example, the AS web session replication cache is configured >> for ASYNC by default, for performance reasons. >> But it can be changed to SYNC to guarantee that when the request >> finishes that the session was replicated. >> >> That wouldn't be possible if you could no longer switch between >> ASYNC/SYNC with just a configuration change. > I disagree :). > > AS could abstract that configuration detail. IOW, if all Infinispan returned was Futures, AS or any other client application, has the choice in their hands: do they wait for the future to complete or not? If they do, they?re SYNC, if not ASYNC. 
AS can still expose this and no functionality is lost. Yes, the functionality is still lost. Your suggestion is just to re-implement the functionality over and over in each ISPN caller. :) > What happens is that SYNC/ASYNC decision stops being a configuration option (bad, bad, bad) and becomes an actual programming decision Infinispan clients must address (good, good, good). This really depends on the client. For the AS session replication use case, a config option is good, good, good. But re-implementing the same functionality in every caller that may want it to be a config option is bad, bad, bad. -Dennis >> -Dennis >> >> On 01/31/2014 01:08 AM, Galder Zamarreño wrote: >>> Hi all, >>> >>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >>> >>> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>> >>> WDYT? >>> >>> From dan.berindei at gmail.com Wed Feb 19 12:43:36 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 19 Feb 2014 19:43:36 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <5304CD2D.1040909@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> <5304CD2D.1040909@redhat.com> Message-ID: On Wed, Feb 19, 2014 at 5:26 PM, Vladimir Blagojevic wrote: > On 2/19/2014, 8:22 AM, Dan Berindei wrote: > > > > > > Sorry, I didn't get too much from that example either, I gave up after > > the second "registering is fun" popup :) > > > > One last question: with Hadoop I imagine it's quite easy to leave the > > results of the M/R job on the distributed FS and start a new job to > > M/R from that. Do you think it would be important to offer something > > similar in Infinispan (i.e. put the result of the reducers in a cache > > instead of returning it to the user)? > > > > > > This is on our todo list https://issues.jboss.org/browse/ISPN-4002 > Cool, I thought I saw it somewhere but I didn't get to actually search in JIRA for it :) Vladimir, what do you think about the partitioning/sorting/grouping stuff? I'm not sure if it should be a priority for us: there are certainly Hadoop jobs that use those and would be pretty tricky to translate to our API, but on the other hand I'm sure most jobs are ok with an unordered Map as the output. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/2918017c/attachment-0001.html From vblagoje at redhat.com Wed Feb 19 14:08:20 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 19 Feb 2014 14:08:20 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> <5304CD2D.1040909@redhat.com> Message-ID: <53050124.2070002@redhat.com> On 2/19/2014, 12:43 PM, Dan Berindei wrote: > > > This is on our todo list https://issues.jboss.org/browse/ISPN-4002 > > > Cool, I thought I saw it somewhere but I didn't get to actually search > in JIRA for it :) > > Vladimir, what do you think about the partitioning/sorting/grouping > stuff? I'm not sure if it should be a priority for us: there are > certainly Hadoop jobs that use those and would be pretty tricky to > translate to our API, but on the other hand I'm sure most jobs are ok > with an unordered Map as the output. > > Cheers > Dan > Dan, I have to focus on the pending tasks in JIRA and in the meantime I'll read up on this subject of partitioning/sorting/grouping. If anyone else has some extra cycles then they are more than welcome to help out. Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/94e3ac45/attachment.html From vblagoje at redhat.com Wed Feb 19 15:45:45 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 19 Feb 2014 15:45:45 -0500 Subject: [infinispan-dev] Further dist.exec and M/R API improvements Message-ID: <530517F9.3060008@redhat.com> Hey guys, As some of you might know we have received additional requirements from the community and internally to add a few things to the dist.executors and map/reduce APIs. On the distributed executors front we need to enable distributed executors to store results into a cache directly rather than returning them to the invoker [1]. As soon as we introduce this API we also need an async mechanism to allow notifications of subtask completion/failure. 
I was thinking we add a concept of DistributedTaskExecutionListener which can be specified in DistributedTaskBuilder: DistributedTaskBuilder<T> executionListener(DistributedTaskExecutionListener<K, T> listener); We needed DistributedTaskExecutionListener anyway. All distributed tasks might use some feedback about task progress, completion/failure and so on. My proposal is roughly: public interface DistributedTaskExecutionListener<K, T> { void subtaskSent(Address node, Set<K> inputKeys); void subtaskFailed(Address node, Set<K> inputKeys, Exception e); void subtaskSucceeded(Address node, Set<K> inputKeys, T result); void allSubtasksCompleted(); } So much for that. If tasks do not use input keys these parameters would be empty sets. Now for [1] we need to add additional methods to DistributedExecutorService. We cannot specify the result cache in DistributedTaskBuilder as we are still bound to only submit methods in DistributedExecutorService that return futures and we don't want that. We need two new void methods: void submitEverywhere(DistributedTask<T> task, Cache<DistExecResultKey<K>, T> result); void submitEverywhere(DistributedTask<T> task, Cache<DistExecResultKey<K>, T> result, K... input); Now, why bother with DistExecResultKey? Well we have tasks that use input keys and tasks that don't. So the results cache could only be keyed by either keys or execution address, or a combination of those two. Therefore, DistExecResultKey could be something like: public interface DistExecResultKey<K> { Address getExecutionAddress(); K getKey(); } If you have a better idea how to address this aspect let us know. So much for distributed executors. For map/reduce we also have to enable storing of map/reduce task results into a cache [2] and allow users to specify a custom cache for intermediate results [3]. Part of task [2] is to allow notification about map/reduce task progress and completion.
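A rough, self-contained sketch of how such a listener might be consumed (method names follow the proposal above, with String standing in for Address so the sketch compiles on its own; the ProgressListener itself is invented for illustration):

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// The listener interface as proposed, with java.lang.String standing in for
// org.infinispan.remoting.transport.Address to keep the sketch self-contained.
interface DistributedTaskExecutionListener<K, T> {
    void subtaskSent(String node, Set<K> inputKeys);
    void subtaskFailed(String node, Set<K> inputKeys, Exception e);
    void subtaskSucceeded(String node, Set<K> inputKeys, T result);
    void allSubtasksCompleted();
}

// One plausible consumer: a progress tracker counting outstanding subtasks.
public class ProgressListener<K, T> implements DistributedTaskExecutionListener<K, T> {
    private final AtomicInteger sent = new AtomicInteger();
    private final AtomicInteger finished = new AtomicInteger();
    private volatile boolean done;

    public void subtaskSent(String node, Set<K> inputKeys) {
        sent.incrementAndGet();           // one more subtask in flight
    }
    public void subtaskFailed(String node, Set<K> inputKeys, Exception e) {
        finished.incrementAndGet();       // failures still count as finished
    }
    public void subtaskSucceeded(String node, Set<K> inputKeys, T result) {
        finished.incrementAndGet();
    }
    public void allSubtasksCompleted() {
        done = true;
    }

    public int outstanding() { return sent.get() - finished.get(); }
    public boolean isDone()  { return done; }
}
```

A task invoker would register such a listener via the proposed DistributedTaskBuilder.executionListener(...) and poll outstanding() for progress reporting.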
Just as in dist.executor I would add a MapReduceTaskExecutionListener interface: public interface MapReduceTaskExecutionListener { void mapTaskInitialized(Address executionAddress); void mapTaskSucceeded(Address executionAddress); void mapTaskFailed(Address executionTarget, Exception cause); void mapPhaseCompleted(); void reduceTaskInitialized(Address executionAddress); void reduceTaskSucceeded(Address executionAddress); void reduceTaskFailed(Address address, Exception cause); void reducePhaseCompleted(); } while MapReduceTask would have an additional method: public void execute(Cache<KOut, VOut> resultsCache); MapReduceTaskExecutionListener could be specified using the fluent MapReduceTask API, just as the intermediate cache would be: public MapReduceTask usingIntermediateCache(Cache<KOut, List<VOut>> tmpCache); thus addressing issue [3]. Let me know what you think, Vladimir [1] https://issues.jboss.org/browse/ISPN-4030 [2] https://issues.jboss.org/browse/ISPN-4002 [3] https://issues.jboss.org/browse/ISPN-4021 From vagvaz at gmail.com Thu Feb 20 03:37:04 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Thu, 20 Feb 2014 10:37:04 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> Message-ID: <5305BEB0.9000900@gmail.com> On 02/19/2014 03:22 PM, Dan Berindei wrote: > I forgot to ask about this... we already have the entries stored as > key,value pairs, so we expect the data to be already in the cache. That > means there is no ordering in the inputs, and the mapper can't rely on > sequential inputs to be related. Would you consider that to be a reasonable > expectation? > Yes, I have not encountered an algorithm in M/R that assumes such relations during the map phase.
Cheers, Evangelos From faseela.k at ericsson.com Thu Feb 20 04:11:29 2014 From: faseela.k at ericsson.com (Faseela K) Date: Thu, 20 Feb 2014 09:11:29 +0000 Subject: [infinispan-dev] How to add programmatic config to an exisitng xml configured cache Message-ID: Hi, I have some infinispan configurations available in "config.xml". After loading this configuration, I want to append some more configurations programmatically, using Configuration Builder. I am doing something like this : Configuration template = null; ConfigurationBuilder builder = new ConfigurationBuilder(); DefaultCacheManager manager = new DefaultCacheManager( "config.xml"); template = manager.getCacheConfiguration("evictionCache"); builder.read(template); builder.loaders().passivation(false).shared(false).preload(true) .addFileCacheStore().fetchPersistentState(true) .purgerThreads(3).purgeSynchronously(true) .ignoreModifications(false).purgeOnStartup(false) .location("tmp").async() .enabled(true).flushLockTimeout(15000).threadPoolSize(5) .singletonStore().enabled(true).pushStateWhenCoordinator(true) .pushStateTimeout(20000); manager.defineConfiguration("abcd", builder.build()); The problem with this code is, it's overwriting the evictionCache configuration. Can somebody help me to fix this issue? Thanks, Faseela -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140220/31a82d43/attachment.html From galder at redhat.com Thu Feb 20 06:37:13 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Thu, 20 Feb 2014 12:37:13 +0100 Subject: [infinispan-dev] [infinispan-internal] Introducing Infinispan OData server: Remote JSON documents querying In-Reply-To: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com> References: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com> Message-ID: Great work Tomas!! :) On 18 Feb 2014, at 13:35, Tomas Sykora wrote: > Hello all! 
:) > > It's the right time to make it a little bit more public and share some results of work on Infinispan OData server, finally! > This solution can serve as a proof of concept where we are able to remotely query JSON documents stored in Infinispan caches and using industrial standard and platform independent way of communication with the server (OData). > > There is still much to do/implement/improve in the server, but it is working as it is now. > > Check a blog post if you are interested: > http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-server.html > > Any feedback is more than welcome. > > + I'd like to say a big THANK YOU to all who supported me! > Mainly: JDG QE guys, Manik, Mircea, Sanne and Adrian. > It wouldn't be done without your patience and willingness to help me :-) > > Tomas > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From rvansa at redhat.com Thu Feb 20 09:40:41 2014 From: rvansa at redhat.com (Radim Vansa) Date: Thu, 20 Feb 2014 15:40:41 +0100 Subject: [infinispan-dev] RadarGun 1.1.0.Final released Message-ID: <530613E9.3020602@redhat.com> Hi all, it has been a long time since last release of RadarGun. We have been using it intensively and developed many new features - 1.0.0 had 7,340 lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become multi-purpose tool, used for checking both performance and functionality of caches under stress. During 1.1.0 development, most parts of code changed beyond the beyonds, but we tried to keep the old configuration compatible. However, the design started to be rather limiting, and therefore, we have decided to make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x branch we will provide bugfixes, but all new features should go to 2.0.0. 
Some decoys for features expected for RadarGun 2.0.0: * non-homogenous clusters: client/server setups, cooperation of different versions of products, or easy setup of cross-site deployment with different configurations * abstracting from cache wrapper: you will be able to use RadarGun for more than just caches without any hacks ** current CacheWrapper interface will be designed to match JSR-107 javax.cache.Cache rather than java.util.Map * pluggable reporting: statistics will be directly multiplexed to configured reporters (again, without cheating on directories), reporters will provide the output formatted as CSV, HTML or even can deploy the results to external repository * merging local and distributed benchmark -> master + single slave within one JVM * better property parsing: evaluation of expressions, property replacement executed on slaves I hope you will like it! And enjoy 1.1.0.Final release now. Radim ------ Radim Vansa JBoss DataGrid QA From afield at redhat.com Thu Feb 20 09:49:28 2014 From: afield at redhat.com (Alan Field) Date: Thu, 20 Feb 2014 09:49:28 -0500 (EST) Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <530613E9.3020602@redhat.com> References: <530613E9.3020602@redhat.com> Message-ID: <2023235320.6731410.1392907768166.JavaMail.zimbra@redhat.com> Yes! Congratulations Radim on defeating Maven's Release Plugin! ----- Original Message ----- > From: "Radim Vansa" > To: "infinispan -Dev List" > Sent: Thursday, February 20, 2014 9:40:41 AM > Subject: [infinispan-dev] RadarGun 1.1.0.Final released > > Hi all, > > it has been a long time since last release of RadarGun. We have been > using it intensively and developed many new features - 1.0.0 had 7,340 > lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become > multi-purpose tool, used for checking both performance and functionality > of caches under stress. 
> > During 1.1.0 development, most parts of code changed beyond the beyonds, > but we tried to keep the old configuration compatible. However, the > design started to be rather limiting, and therefore, we have decided to > make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x > branch we will provide bugfixes, but all new features should go to 2.0.0. > > Some decoys for features expected for RadarGun 2.0.0: > > * non-homogenous clusters: client/server setups, cooperation of > different versions of products, or easy setup of cross-site deployment > with different configurations > * abstracting from cache wrapper: you will be able to use RadarGun for > more than just caches without any hacks > ** current CacheWrapper interface will be designed to match JSR-107 > javax.cache.Cache rather than java.util.Map > * pluggable reporting: statistics will be directly multiplexed to > configured reporters (again, without cheating on directories), reporters > will provide the output formatted as CSV, HTML or even can deploy the > results to external repository > * merging local and distributed benchmark -> master + single slave > within one JVM > * better property parsing: evaluation of expressions, property > replacement executed on slaves > > I hope you will like it! And enjoy 1.1.0.Final release now. 
> > Radim > > ------ > Radim Vansa JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Fri Feb 21 11:03:11 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 21 Feb 2014 18:03:11 +0200 Subject: [infinispan-dev] ClusteredListeners: message delivered twice In-Reply-To: References: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Message-ID: On Mon, Feb 17, 2014 at 7:44 PM, William Burns wrote: > On Mon, Feb 17, 2014 at 7:53 AM, Sanne Grinovero > wrote: > > On 12 February 2014 10:40, Mircea Markus wrote: > >> Hey Will, > >> > >> With the current design, during a topology change, an event might be > delivered twice to a cluster listener. I think we might be able to identify > such situations (a node becomes a key owner as a result of the topology > change) and add this information to the event we send, e.g. a flag > "potentiallyDuplicate" or something like that. Event implementors might be > able to make good use of this, e.g. checking their internal state if an > event is redelivered or not. What do you think? Are there any other > more-than-once delivery situations we can't keep track of? > > I agree, this would be important to track. I have thus added a new > flag to listeners that is set to true when a modification, removal, or > create that is done on behalf of a command that was retried due to a > topology change during the middle of it. Also this gives the benefit > not just for cluster listeners but regular listeners, since we could > have double notification currently even. > > > > > I would really wish we would not push such a burden to the API > > consumer. If we at least had a modification counter associated with > > each entry this could help to identify duplicate triggers as well (on > > top of ordering of modification events as already discussed many > > times). 
> > The particular issue we have with listeners is when the > primary owner replicates the update to backup owners and then crashes > before the notification is sent. In this case we have no idea from > the originator's perspective whether the backup owner has the update. When > the topology changes the update will be persisted to the new owners > (possibly without notification). We could add a counter, however the > backup owner then has no idea if the primary owner has sent the > notification or not. Without adding some kind of 2PC to the primary > owner to tell the backup that it occurred, it won't know. However > this doesn't reliably tell the backup owner if the notification was > fired even if the node goes down during this period. Without > seriously rewriting our nontx dist code I don't see a viable way to do > this without the API consumer having to be alerted. > There's always going to be the possibility that a replication to one of the backup owners fails and the command is aborted after the listener was notified (but not on the successful backup owners). And even in tx mode, the listeners are notified during the prepare phase and not during the commit. So I don't think we'll ever be able to make listeners 100% reliable, but the "potentially duplicate" flag should be good enough. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140221/6eff2b4d/attachment.html From mmarkus at redhat.com Mon Feb 24 11:32:24 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 16:32:24 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <8955F382-8A6E-43AA-864E-1EC0C190654E@redhat.com> References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> <8955F382-8A6E-43AA-864E-1EC0C190654E@redhat.com> Message-ID: <34A01AED-0DDF-4171-9B83-BB3B6C9DF0E8@redhat.com> On Feb 17, 2014, at 5:35 PM, Galder Zamarreño wrote: > > On 30 Jan 2014, at 20:51, Mircea Markus wrote: > >> >> On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote: >> >>> >>> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >>> >>>> >>>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >>>> >>>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches right? Otherwise I am not fully understanding why they ask for a unified query. >>>>> Do you have written detailed use cases somewhere for me to better understand what is really requested? >>>> >>>> IMO from a user perspective, being able to run queries spanning several caches simplifies the programming model: each cache corresponding to a single entity type, with potentially different configuration. >>> >>> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. >> >> Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of querying with a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them.
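The type-filtering burden described above, sketched as a self-contained toy (plain Java streams stand in for a Mapper over cache entries; the classes are invented for illustration, not the Infinispan API):

```java
import java.util.Map;

// Minimal stand-ins for the entity types in the example.
class Car { final String colour; Car(String colour) { this.colour = colour; } }
class Person { final String name; Person(String name) { this.name = name; } }

public class GreenCarCounter {
    // With Cars and Persons in the same cache, the "mapper" has to know about
    // every other type stored there just to skip it -- and must be revisited
    // whenever a new type (Pets, ...) is added to the cache.
    public static long countGreen(Map<String, Object> heterogeneousCache) {
        return heterogeneousCache.values().stream()
                .filter(v -> v instanceof Car)        // type filtering forced on the task
                .map(v -> (Car) v)
                .filter(c -> "green".equals(c.colour))
                .count();
    }
}
```

With one cache per entity type, the iteration would be over a Cache<String, Car> directly and the instanceof guard disappears.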
Not only is it harder to write, it also discourages code reuse and makes it hard to maintain (if you add Pets to the same cache in the future you need to update the M/R code as well). And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expiry etc): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future. >> >> The way I see it - and very curious to see your opinion on this - following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spanning multiple caches are both useful and needed (same as a query spanning multiple tables). > > My opinion is that seeing it this way is limiting. A key/value store is schemaless. Your view is forcing a particular schema on how to structure things. I'm not forcing anything at all, people can still use a Cache if they want to. What I'm saying is that, especially for larger applications, grouping the types into caches makes a lot of sense for the users. > > I don't expect everyone to store everything in a single cache and of course there will be situations where it's not ideal or the best solution, such as in cases like the ones you mention above, but if you want to do it, for any of the reasons I or Paul mentioned in [1], it'd be nice to be able to do so. Of course, I don't plan to enforce this model at all, as it is useful. Just pondering on the way the domain model is split between caches. > > Cheers, > > [1] https://issues.jboss.org/browse/ISPN-3640 > >> >>> >>> Just yesterday I discovered this gem in Scala's Shapeless extensions [1]. This is experimental stuff but essentially it allows you to define which key/value type pairs a map will contain, and it does type checking at compile time. I almost wet my pants when I saw that ;) :p.
In the example, it defines a map as containing Int -> String, and String -> Int key/value pairs. If you try to add an Int -> Int, it fails compilation. >> >> Agreed the compile time check is pretty awesome :-) Still mix and matching types in a Map doesn't look great to me for ISPN. >> >>> >>> Java's type checking is not powerful enough to do this, and it's compilation logic is not extendable in the same way Scala macros does, but I think the fact that other languages are looking into this validates Paul's suggestion in [2], on top of all the benefits listed there. >>> >>> Cheers, >>> >>> [1] https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#heterogenous-maps >>> [2] https://issues.jboss.org/browse/ISPN-3640 >>> >>>> Besides the query API that would need to be extended to support accessing multiple caches, not sure what other APIs would need to be extended to take advantage of this? >>>> >>>>> >>>>> Emmanuel >>>>> >>>>> On 14 Jan 2014, at 12:59, Sanne Grinovero wrote: >>>>> >>>>>> Up this: it was proposed again today ad a face to face meeting. >>>>>> Apparently multiple parties have been asking to be able to run >>>>>> cross-cache queries. >>>>>> >>>>>> Sanne >>>>>> >>>>>> On 11 April 2012 12:47, Emmanuel Bernard wrote: >>>>>>> >>>>>>> On 10 avr. 2012, at 19:10, Sanne Grinovero wrote: >>>>>>> >>>>>>>> Hello all, >>>>>>>> currently Infinispan Query is an interceptor registering on the >>>>>>>> specific Cache instance which has indexing enabled; one such >>>>>>>> interceptor is doing all what it needs to do in the sole scope of the >>>>>>>> cache it was registered in. >>>>>>>> >>>>>>>> If you enable indexing - for example - on 3 different caches, there >>>>>>>> will be 3 different Hibernate Search engines started in background, >>>>>>>> and they are all unaware of each other. 
>>>>>>>> >>>>>>>> After some design discussions with Ales for CapeDwarf, but also >>>>>>>> calling attention on something that bothered me since some time, I'd >>>>>>>> evaluate the option to have a single Hibernate Search Engine >>>>>>>> registered in the CacheManager, and have it shared across indexed >>>>>>>> caches. >>>>>>>> >>>>>>>> Current design limitations: >>>>>>>> >>>>>>>> A- If they are all configured to use the same base directory to >>>>>>>> store indexes, and happen to have same-named indexes, they'll share >>>>>>>> the index without being aware of each other. This is going to break >>>>>>>> unless the user configures some tricky parameters, and even so >>>>>>>> performance won't be great: instances will lock each other out, or at >>>>>>>> best write in alternate turns. >>>>>>>> B- The search engine isn't particularly "heavy", still it would be >>>>>>>> nice to share some components and internal services. >>>>>>>> C- Configuration details which need some care - like injecting a >>>>>>>> JGroups channel for clustering - needs to be done right isolating each >>>>>>>> instance (so large parts of configuration would be quite similar but >>>>>>>> not totally equal) >>>>>>>> D- Incoming messages into a JGroups Receiver need to be routed not >>>>>>>> only among indexes, but also among Engine instances. This prevents >>>>>>>> Query to reuse code from Hibernate Search. >>>>>>>> >>>>>>>> Problems with a unified Hibernate Search Engine: >>>>>>>> >>>>>>>> 1#- Isolation of types / indexes. If the same indexed class is >>>>>>>> stored in different (indexed) caches, they'll share the same index. Is >>>>>>>> it a problem? I'm tempted to consider this a good thing, but wonder if >>>>>>>> it would surprise some users. Would you expect that? >>>>>>> >>>>>>> I would not expect that. Unicity in Hibernate Search is not defined per identity but per class + provided id. >>>>>>> I can see people reusing the same class as partial DTO and willing to index that. 
I can even see people >>>>>>> using the Hibernate Search programmatic API to index the "DTO" stored in cache 2 differently than the >>>>>>> domain class stored in cache 1. >>>>>>> I can concede that I am pushing a bit the use case towards bad-ish design approaches. >>>>>>> >>>>>>>> 2#- configuration format overhaul: indexing options won't be set on >>>>>>>> the cache section but in the global section. I'm looking forward to >>>>>>>> use the schema extensions anyway to provide a better configuration >>>>>>>> experience than the current . >>>>>>>> 3#- Assuming 1# is fine, when a search hit is found I'd need to be >>>>>>>> able to figure out from which cache the value should be loaded. >>>>>>>> 3#A we could have the cache name encoded in the index, as part >>>>>>>> of the identifier: {PK,cacheName} >>>>>>>> 3#B we actually shard the index, keeping a physically separate >>>>>>>> index per cache. This would mean searching on the joint index view but >>>>>>>> extracting hits from specific indexes to keep track of "which index".. >>>>>>>> I think we can do that but it's definitely tricky. >>>>>>>> >>>>>>>> It's likely easier to keep indexed values from different caches in >>>>>>>> different indexes. that would mean to reject #1 and mess with the user >>>>>>>> defined index name, to add for example the cache name to the user >>>>>>>> defined string. >>>>>>>> >>>>>>>> Any comment? 
>>>>>>>> >>>>>>>> Cheers, >>>>>>>> Sanne >>>>>>>> _______________________________________________ >>>>>>>> infinispan-dev mailing list >>>>>>>> infinispan-dev at lists.jboss.org >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> Cheers, >>>> -- >>>> Mircea Markus >>>> Infinispan lead (www.infinispan.org) >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > 
_______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Feb 24 11:39:05 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 16:39:05 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> Message-ID: <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: > By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. Curious to hear the whole story :-) We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod).
ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>>> >>>> >>>> //some unified query giving me entries pointing by fk copy to bar and >>>> //buz objects. So I need to manually load these references. >>>> >>>> //happy emmanuel >>>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>>> Bar bar = unifiedCache.get(foo); >>>> Buz buz = unifiedCache.get(baz); >>>> >>>> //not so happy emmanuel >>>> Cache fooCache = cacheManager.getCache("foo"); >>>> Bar bar = fooCache.get(foo); >>>> Cache bazCache = cacheManager.getCache("baz"); >>>> Buz buz = bazCache.put(baz); >>> >>> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. >> >> Not really. >> What makes me unhappy is to have to keep in my app all the >> references to these specific cache store instances. The filtering >> approach only moves the problem. 
>> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Feb 24 11:47:45 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 16:47:45 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <53034BBB.1030809@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> Message-ID: On Feb 18, 2014, at 12:02 PM, Adrian Nistor wrote: > Well, OGM and Infinispan are different species :) So, Infinispan being what it is today - a non-homogenous, schema-less KV store, without support for entity associations (except embedding) - which simplifies the whole thing a lot, should we or should we not provide transparent cross-cacheManager search capabilities, in this exact context? Vote? TBH I think users should push us for this if they need it. -1 to do it right now. > > There were some points raised previously like "if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well". In the SQL world you would also probably CRUD against a table or set of tables and then query against a view - a bit like what we're doing here. > I don't see any problem with this in principle. There is however something currently missing in the query result set API - it currently does not provide you the keys of the matching entities. would be nice to have an option for that, indeed. 
> People work around this by storing the key in the entity. Now with the addition of the cross-cacheManager search we'll probably need to fix the result api and also provide a reference to the cache (or just the name?) where the entity is stored. > > The (enforced) one entity type per cache rule is not conceptually or technically required for implementing this, so I won't start raving against it :) Sane users should apply it however. > > > On 02/18/2014 12:13 AM, Emmanuel Bernard wrote: >> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >> >> >>> On 17 f?vr. 2014, at 18:51, Emmanuel Bernard >>> wrote: >>> >>> >>>> On Mon 2014-02-17 18:43, Galder Zamarre?o wrote: >>>> >>>> >>>>> On 05 Feb 2014, at 17:30, Emmanuel Bernard >>>>> wrote: >>>>> >>>>> >>>>>> On Wed 2014-02-05 15:53, Mircea Markus wrote: >>>>>> >>>>>> >>>>>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard >>>>>>> wrote: >>>>>>> >>>>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >>>>>>> >>>>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>>>>> >>>>> >>>>> //some unified query giving me entries pointing by fk copy to bar and >>>>> //buz objects. So I need to manually load these references. 
>>>>> >>>>> //happy emmanuel >>>>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>>>> Bar bar = unifiedCache.get(foo); >>>>> Buz buz = unifiedCache.get(baz); >>>>> >>>>> //not so happy emmanuel >>>>> Cache fooCache = cacheManager.getCache("foo"); >>>>> Bar bar = fooCache.get(foo); >>>>> Cache bazCache = cacheManager.getCache("baz"); >>>>> Buz buz = bazCache.put(baz); >>>>> >>>> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 >>>> help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. >>>> >>> Not really. >>> What makes me unhappy is to have to keep in my app all the >>> references to these specific cache store instances. The filtering >>> approach only moves the problem. >>> _______________________________________________ >>> infinispan-dev mailing list >>> >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Feb 24 12:27:07 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 17:27:07 +0000 Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <530613E9.3020602@redhat.com> References: <530613E9.3020602@redhat.com> Message-ID: <4EC95CDC-9146-4E27-96D6-FCEAD6B76F27@redhat.com> Nice work, Radim! And the roadmap looks very good. On Feb 20, 2014, at 2:40 PM, Radim Vansa wrote: > Hi all, > > it has been a long time since last release of RadarGun. We have been > using it intensively and developed many new features - 1.0.0 had 7,340 > lines of Java code, 1.1.0 has 32,978 lines. 
RadarGun has become
> a multi-purpose tool, used for checking both the performance and the functionality
> of caches under stress.
>
> During 1.1.0 development, most parts of the code changed almost beyond
> recognition, but we tried to keep the old configuration compatible. However, the
> design started to be rather limiting, and therefore, we have decided to
> make the last release for 1.1.0 and move on to RadarGun 2.0.0. In the 1.1.x
> branch we will provide bugfixes, but all new features should go to 2.0.0.
>
> A few teasers of features expected in RadarGun 2.0.0:
>
> * non-homogeneous clusters: client/server setups, cooperation of
> different versions of products, or easy setup of cross-site deployment
> with different configurations
> * abstracting from the cache wrapper: you will be able to use RadarGun for
> more than just caches without any hacks
> ** the current CacheWrapper interface will be designed to match JSR-107
> javax.cache.Cache rather than java.util.Map
> * pluggable reporting: statistics will be directly multiplexed to
> configured reporters (again, without cheating on directories), reporters
> will provide the output formatted as CSV or HTML, or can even deploy the
> results to an external repository
> * merging local and distributed benchmarks -> master + single slave
> within one JVM
> * better property parsing: evaluation of expressions, property
> replacement executed on slaves
>
> I hope you will like it! And enjoy the 1.1.0.Final release now.
>
> Radim
>
> ------
> Radim Vansa
> JBoss DataGrid QA
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 12:28:56 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 17:28:56 +0000
Subject: [infinispan-dev] [infinispan-internal] Introducing Infinispan OData server: Remote JSON documents querying
In-Reply-To: 
References: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com>
Message-ID: <288CCCC7-A08B-4852-A405-57D7E2D640A1@redhat.com>

Great work! You might want to add a blog entry to the infinispan blog as well? That would gain you more visibility.

On Feb 20, 2014, at 11:37 AM, Galder Zamarreño wrote:

> Great work Tomas!! :)
>
> On 18 Feb 2014, at 13:35, Tomas Sykora wrote:
>
>> Hello all! :)
>>
>> It's the right time to make it a little bit more public and share some results of the work on the Infinispan OData server, finally!
>> This solution can serve as a proof of concept where we are able to remotely query JSON documents stored in Infinispan caches, using an industry-standard and platform-independent way of communicating with the server (OData).
>>
>> There is still much to do/implement/improve in the server, but it is working as it is now.
>>
>> Check out the blog post if you are interested:
>> http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-server.html
>>
>> Any feedback is more than welcome.
>>
>> + I'd like to say a big THANK YOU to all who supported me!
>> Mainly: JDG QE guys, Manik, Mircea, Sanne and Adrian.
>> It wouldn't be done without your patience and willingness to help me :-)
>>
>> Tomas
>>
>
>
> --
> Galder Zamarreño
> galder at redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 12:47:59 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 17:47:59 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: 
References: <20140218130345.GB11962@hibernate.org> <53035D8F.4000604@infinispan.org> <9B723B39-E9F4-4C53-8AF3-1367925BBC2B@redhat.com>
Message-ID: 

On Feb 24, 2014, at 5:39 PM, Sanne Grinovero wrote:

> On 24 February 2014 16:51, Mircea Markus wrote:
>> Just to recap: the main reason for the JPA cache store is to be a replacement for the JDBC CacheStore, nothing more than that.
>> And it certainly has advantages compared with the JDBC cache stores:
>> - JPA offers database independence/portability
>> - doesn't put that many restrictions on the schema
>> - it's easier to write/read from an existing database table
>
> Don't you dare hijack my nice 2-year-old thread :-D :-D
> BTW why is this discussion not public anymore? I missed the switch to undercover.

I don't know at which point it switched to private; make it public again ;)
>>>
>>> Tristan
>>>
>>> On 18/02/2014 14:03, Emmanuel Bernard wrote:
>>>> On Tue 2014-02-18 13:16, Adrian Nistor wrote:
>>>>>> JPA cache store is a waste of time IMO :)
>>>>> +1 :)
>>>> My understanding is that the JPACacheStore discussion is revived because
>>>> users want to map an existing database, load the data in the grid and
>>>> keep both synchronized.
>>>> At least that's the use case I was told needed to be covered.
>>>
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>>
>>

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 11:36:30 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 16:36:30 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <2D1C63B2-7313-4FE4-93D2-D50B91565FF2@redhat.com>
References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> <2C233AC3-BEFC-4FD5-A297-A854FEA8165D@hibernate.org> <2D1C63B2-7313-4FE4-93D2-D50B91565FF2@redhat.com>
Message-ID: <92D9B688-7285-4406-9DAE-B120452C1655@redhat.com>

On Feb 17, 2014, at 5:36 PM, Galder Zamarreño wrote:

>
> On 31 Jan 2014, at 09:28, Emmanuel Bernard wrote:
>
>>
>>
>>> On 30 janv. 2014, at 20:51, Mircea Markus wrote:
>>>
>>>
>>>> On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote:
>>>>
>>>>
>>>>> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote:
>>>>>
>>>>>
>>>>>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote:
>>>>>>
>>>>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches, right? Otherwise I am not fully understanding why they ask for a unified query.
>>>>>> Have you written detailed use cases somewhere for me to better understand what is really requested?
>>>>>
>>>>> IMO from a user perspective, being able to run queries spanning several caches simplifies the programming model: each cache corresponds to a single entity type, with potentially different configuration.
>>>>
>>>> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter.
>>>
>>> Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of query against a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, it also discourages code reuse and makes the code hard to maintain (if you add Pets to the same cache in the future you need to update the M/R code as well). And of course there are also different per-cache configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expire etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future.
>>>
>>> The way I see it - and I'm very curious to hear your opinion on this - following a database analogy, the CacheManager corresponds to a database and the Cache to a table. Hence my thought that queries spanning multiple caches are both useful and needed (same as queries spanning multiple tables).
>>
>> I know Sanne and you are keen to have one entity type per cache to be able to fine-tune the configuration. I am a little more skeptical but I don't have strong opinions on the subject.
>>
>> However, I don't think you can forbid the case where people want to store heterogeneous types in the same cache:
>>
>> - it's easy to start with
>> - configuration is indeed simpler
>> - when you work in the same service with cats, dogs, owners, addresses and refuges, juggling between these n Cache instances begins to be fugly I suspect - should write some application code to confirm
>> - people will add to the grid types unknown at configuration time. They might want a single bucket.
>
> +100

Totally agreed, there's no plan to forbid people storing heterogeneous values in the same cache. The discussion at hand was actually the other way around: do we want to allow people to store data in multiple caches? If so, querying across multiple caches makes sense, hence this email.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 12:57:17 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 17:57:17 +0000
Subject: [infinispan-dev] Further dist.exec and M/R API improvements
In-Reply-To: <530517F9.3060008@redhat.com>
References: <530517F9.3060008@redhat.com>
Message-ID: <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com>

On Feb 19, 2014, at 8:45 PM, Vladimir Blagojevic wrote:

> Hey guys,
>
> As some of you might know, we have received additional requirements from
> the community and internally to add a few things to dist.executors and
> the map/reduce API. On the distributed executors front we need to enable
> distributed executors to store results into a cache directly rather than
> returning them to the invoker [1]. As soon as we introduce this API we also
> need an async mechanism to allow notifications of subtask
> completion/failure.
I think we need both at the same time :-)

> I was thinking we add a concept of
> DistributedTaskExecutionListener which can be specified in
> DistributedTaskBuilder:
>
> DistributedTaskBuilder
> executionListener(DistributedTaskExecutionListener listener);
>
>
> We needed DistributedTaskExecutionListener anyway. All distributed tasks
> might use some feedback about task progress, completion/failure and so on.
> My proposal is roughly:
>
>
> public interface DistributedTaskExecutionListener {
>
> void subtaskSent(Address node, Set inputKeys);
> void subtaskFailed(Address node, Set inputKeys, Exception e);
> void subtaskSucceeded(Address node, Set inputKeys, T result);
> void allSubtasksCompleted();
>
> }
>
> So much for that.

I think it would make sense to add this logic for monitoring, + additional info such as average execution time etc. I'm not sure if this is a generally useful API though, unless there were people asking for it already?

> If tasks do not use input keys these parameters would
> be empty sets. Now for [1] we need to add additional methods to
> DistributedExecutorService. We cannot specify the result cache in
> DistributedTaskBuilder as we are still bound to only submit methods in
> DistributedExecutorService that return futures and we don't want that.
> We need two new void methods:
>
> void submitEverywhere(DistributedTask task,
> Cache, T> result);
> void submitEverywhere(DistributedTask task,
> Cache, T> result, K... input);
>
>
> Now, why bother with DistExecResultKey? Well, we have tasks that use
> input keys and tasks that don't. So the results cache could only be keyed by
> either keys or execution addresses, or a combination of the two.
> Therefore, DistExecResultKey could be something like:
>
> public interface DistExecResultKey {
>
> Address getExecutionAddress();
> K getKey();
>
> }
>
> If you have a better idea how to address this aspect let us know. So
> much for distributed executors.
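For illustration, the listener contract proposed above can be exercised end-to-end with plain stand-ins. This is only a sketch: the Address interface, the log format and the hand-driven fan-out below are invented for the example and are not Infinispan API; it merely shows the callback order a coordinator might produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeoutException;

// Stand-in for Infinispan's Address type; illustrative only.
interface Address { String name(); }

// The listener contract as proposed in the thread (generics and spelling normalized).
interface DistributedTaskExecutionListener<K, T> {
    void subtaskSent(Address node, Set<K> inputKeys);
    void subtaskFailed(Address node, Set<K> inputKeys, Exception e);
    void subtaskSucceeded(Address node, Set<K> inputKeys, T result);
    void allSubtasksCompleted();
}

public class ListenerSketch {
    // Simulates the coordinator firing callbacks as subtasks progress.
    static List<String> runSimulation() {
        List<String> log = new ArrayList<>();
        DistributedTaskExecutionListener<String, Integer> listener =
            new DistributedTaskExecutionListener<String, Integer>() {
                public void subtaskSent(Address n, Set<String> keys) { log.add("sent:" + n.name()); }
                public void subtaskFailed(Address n, Set<String> keys, Exception e) { log.add("failed:" + n.name()); }
                public void subtaskSucceeded(Address n, Set<String> keys, Integer r) { log.add("ok:" + n.name() + "=" + r); }
                public void allSubtasksCompleted() { log.add("done"); }
            };
        Address a = () -> "nodeA";
        Address b = () -> "nodeB";
        // Two subtasks are "sent"; one succeeds, one times out.
        listener.subtaskSent(a, Set.of("k1"));
        listener.subtaskSent(b, Set.of("k2"));
        listener.subtaskSucceeded(a, Set.of("k1"), 10);
        listener.subtaskFailed(b, Set.of("k2"), new TimeoutException("no response"));
        listener.allSubtasksCompleted();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runSimulation());
    }
}
```

A real implementation would fire these callbacks from the transport layer as responses arrive, which is where the threading questions raised earlier in the thread come back in.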
>
>
> For map/reduce we also have to enable storing of map/reduce task results
> into a cache [2] and allow users to specify a custom cache for intermediate
> results [3]. Part of task [2] is to allow notification about map/reduce
> task progress and completion. Just as in dist.executor, I would add a
> MapReduceTaskExecutionListener interface:
>
>
> public interface MapReduceTaskExecutionListener {
>
> void mapTaskInitialized(Address executionAddress);
> void mapTaskSucceeded(Address executionAddress);
> void mapTaskFailed(Address executionTarget, Exception cause);
> void mapPhaseCompleted();
>
> void reduceTaskInitialized(Address executionAddress);
> void reduceTaskSucceeded(Address executionAddress);
> void reduceTaskFailed(Address address, Exception cause);
> void reducePhaseCompleted();
>
> }

IMO - in the first stage at least - I would rather use a simpler (Notifying)Future, on which the user can wait till the computation happens: it's simpler and more aligned with the rest of our async API.

> while MapReduceTask would have an additional method:
>
> public void execute(Cache resultsCache);

you could overload it with a cache-name-only method.
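The cache-name-only overload suggested here can be sketched with plain maps standing in for caches. CacheManagerStub, MapReduceTaskSketch and the hard-coded reduce output are invented for this example; the point is only the delegation pattern: resolve the named cache, then reuse the Cache-based execute.

```java
import java.util.HashMap;
import java.util.Map;

// A map-backed stand-in for a cache manager resolving named "caches".
class CacheManagerStub {
    private final Map<String, Map<String, Integer>> caches = new HashMap<>();
    Map<String, Integer> getCache(String name) {
        return caches.computeIfAbsent(name, k -> new HashMap<>());
    }
}

class MapReduceTaskSketch {
    private final CacheManagerStub cm;
    MapReduceTaskSketch(CacheManagerStub cm) { this.cm = cm; }

    // Proposed API: write reduced results straight into the given cache.
    void execute(Map<String, Integer> resultsCache) {
        resultsCache.put("word", 42); // stand-in for the real reduce output
    }

    // The cache-name-only overload: resolve by name, then delegate.
    void execute(String resultsCacheName) {
        execute(cm.getCache(resultsCacheName));
    }
}

public class OverloadDemo {
    public static void main(String[] args) {
        CacheManagerStub cm = new CacheManagerStub();
        new MapReduceTaskSketch(cm).execute("results");
        System.out.println(cm.getCache("results"));
    }
}
```

The name-based variant also matters for remote callers that hold no Cache reference of their own, which is the argument made later in the thread for not exposing a Cache-typed intermediate-cache method at all.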
> > MapReduceTaskExecutionListener could be specified using fluent > MapReduceTask API just as intermediate cache would be: > > public MapReduceTask > usingIntermediateCache(Cache> tmpCache); > > thus addressing issue [3] +1 > > Let me know what you think, > Vladimir > > > [1] https://issues.jboss.org/browse/ISPN-4030 > [2] https://issues.jboss.org/browse/ISPN-4002 > [3] https://issues.jboss.org/browse/ISPN-4021 > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vblagoje at redhat.com Mon Feb 24 15:55:43 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Mon, 24 Feb 2014 15:55:43 -0500 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> Message-ID: <530BB1CF.1020307@redhat.com> See inline On 2/24/2014, 12:57 PM, Mircea Markus wrote: > On Feb 19, 2014, at 8:45 PM, Vladimir Blagojevic wrote: > >> Hey guys, >> >> As some of you might know we have received additional requirements from >> community and internally to add a few things to dist.executors and >> map/reduce API. On distributed executors front we need to enable >> distributed executors to store results into cache directly rather than >> returning them to invoker [1]. As soon as we introduce this API we also >> need a asyc. mechanism to allow notifications of subtask >> completion/failure. > I think we need both in at the same time :-) Yes, that is what I actually meant. Poor wording. 
> >> I was thinking we add a concept of >> DistributedTaskExecutionListener which can be specified in >> DistributedTaskBuilder: >> >> DistributedTaskBuilder >> executionListener(DistributedTaskExecutionListener listener); >> >> >> We needed DistributedTaskExecutionListener anyway. All distributed tasks >> might use some feedback about task progress, completion/failure and on. >> My proposal is roughly: >> >> >> public interface DistributedTaskExecutionListener { >> >> void subtaskSent(Address node, Set inputKeys); >> void subtaskFailed(Address node, Set inputKeys, Exception e); >> void subtaskSucceded(Address node, Set inputKeys, T result); >> void allSubtasksCompleted(); >> >> } >> >> So much for that. > I think this it would make sense to add this logic for monitoring, + additional info such as average execution time etc. I'm not sure if this is a generally useful API though, unless there were people asking for it already? Ok, noted. If you remember any references about this let me know and I'll incorporate what people actually asked for rather than guess. > >> If tasks do not use input keys these parameters would >> be emply sets. Now for [1] we need to add additional methods to >> DistributedExecutorService. We can not specify result cache in >> DistributedTaskBuilder as we are still bound to only submit methods in >> DistributedExecutorService that return futures and we don't want that. >> We need two new void methods: >> >> void submitEverywhere(DistributedTask task, >> Cache, T> result); >> void submitEverywhere(DistributedTask task, >> Cache, T> result, K... input); >> >> >> Now, why bother with DistExecResultKey? Well we have tasks that use >> input keys and tasks that don't. So results cache could only be keyed by >> either keys or execution address, or combination of those two. 
>> Therefore, DistExecResultKey could be something like: >> >> public interface DistExecResultKey { >> >> Address getExecutionAddress(); >> K getKey(); >> >> } >> >> If you have a better idea how to address this aspect let us know. So >> much for distributed executors. >> >> >> For map/reduce we also have to enable storing of map reduce task results >> into cache [2] and allow users to specify custom cache for intermediate >> results[3]. Part of task [2] is to allow notification about map/reduce >> task progress and completion. Just as in dist.executor I would add >> MapReduceTaskExecutionListener interface: >> >> >> public interface MapReduceTaskExecutionListener { >> >> void mapTaskInitialized(Address executionAddress); >> void mapTaskSucceeded(Address executionAddress); >> void mapTaskFailed(Address executionTarget, Exception cause); >> void mapPhaseCompleted(); >> >> void reduceTaskInitialized(Address executionAddress); >> void reduceTaskSucceeded(Address executionAddress); >> void reduceTaskFailed(Address address, Exception cause); >> void reducePhaseCompleted(); >> >> } > IMO - in the first stage at leas - I would rather use a simpler (Notifying)Future, on which the user can wait till the computation happens: it's simpler and more aligned with the rest of our async API. > What do you mean? We already have futures in MapReduceTask API. This API is more fine grained and allows monitoring/reporting of task progress. Please clarify. >> while MapReduceTask would have an additional method: >> >> public void execute(Cache resultsCache); > you could overload it with cache name only method. Yeah, good idea. Same for usingIntermediateCache? I actually asked you this here https://issues.jboss.org/browse/ISPN-4021 Thanks Mircea! 
Vladimir

From emmanuel at hibernate.org Tue Feb 25 04:28:51 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Tue, 25 Feb 2014 10:28:51 +0100
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com>
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com>
Message-ID: <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org>

> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
>
>
>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>
>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>
> Curious to hear the whole story :-)
> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform-independent (hotrod).

Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
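The "happy vs. not so happy" contrast from earlier in the thread can be made concrete with plain maps standing in for per-entity caches. The unified view here is a toy merge invented for the example, not a proposed Infinispan API; a real cross-cache view would need ownership and conflict rules.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiCacheSketch {
    // The per-entity style: the application juggles one handle per cache.
    static String lookupPerEntity(Map<String, String> personCache, Map<String, String> carCache) {
        return personCache.get("p1") + " owns " + carCache.get("c1");
    }

    // A toy unified read view over several "caches" (a naive merge,
    // assuming globally unique keys across caches).
    static Map<String, String> unifiedView(List<Map<String, String>> caches) {
        Map<String, String> view = new HashMap<>();
        for (Map<String, String> c : caches) view.putAll(c);
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> personCache = new HashMap<>();
        Map<String, String> carCache = new HashMap<>();
        personCache.put("p1", "Alice");
        carCache.put("c1", "a green car");
        System.out.println(lookupPerEntity(personCache, carCache));
        System.out.println(unifiedView(List.of(personCache, carCache)).get("c1"));
    }
}
```

This is exactly the bookkeeping Emmanuel objects to: the first method forces the application to carry a reference per entity cache, while the merged view hides it at the cost of a key-uniqueness assumption.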
From dan.berindei at gmail.com Tue Feb 25 07:33:32 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 25 Feb 2014 14:33:32 +0200 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <530BB1CF.1020307@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> Message-ID: On Mon, Feb 24, 2014 at 10:55 PM, Vladimir Blagojevic wrote: > See inline > On 2/24/2014, 12:57 PM, Mircea Markus wrote: > > On Feb 19, 2014, at 8:45 PM, Vladimir Blagojevic > wrote: > > > >> Hey guys, > >> > >> As some of you might know we have received additional requirements from > >> community and internally to add a few things to dist.executors and > >> map/reduce API. On distributed executors front we need to enable > >> distributed executors to store results into cache directly rather than > >> returning them to invoker [1]. As soon as we introduce this API we also > >> need a asyc. mechanism to allow notifications of subtask > >> completion/failure. > > I think we need both in at the same time :-) > Yes, that is what I actually meant. Poor wording. > Do we really need special support for distributed tasks to write results to another cache? We already allow a task to do cache.getCacheManager().getCache("outputCache").put(k, v) > > > >> I was thinking we add a concept of > >> DistributedTaskExecutionListener which can be specified in > >> DistributedTaskBuilder: > >> > >> DistributedTaskBuilder > >> executionListener(DistributedTaskExecutionListener listener); > >> > >> > >> We needed DistributedTaskExecutionListener anyway. All distributed tasks > >> might use some feedback about task progress, completion/failure and on. 
> >> My proposal is roughly: > >> > >> > >> public interface DistributedTaskExecutionListener { > >> > >> void subtaskSent(Address node, Set inputKeys); > >> void subtaskFailed(Address node, Set inputKeys, Exception e); > >> void subtaskSucceded(Address node, Set inputKeys, T result); > >> void allSubtasksCompleted(); > >> > >> } > >> > >> So much for that. > > I think this it would make sense to add this logic for monitoring, + > additional info such as average execution time etc. I'm not sure if this is > a generally useful API though, unless there were people asking for it > already? > Ok, noted. If you remember any references about this let me know and > I'll incorporate what people actually asked for rather than guess. > Ok, let's wait until we get some actual requests from users then. TBH I don't think distributed tasks with subtasks are something that users care about. E.g. with Map/Reduce the reduce tasks are not subtasks of the map/combine tasks, so this API wouldn't help. Hadoop has a Reporter interface that allows you to report "ticks" and increment counters, maybe we should add something like that instead? > > > > >> If tasks do not use input keys these parameters would > >> be emply sets. Now for [1] we need to add additional methods to > >> DistributedExecutorService. We can not specify result cache in > >> DistributedTaskBuilder as we are still bound to only submit methods in > >> DistributedExecutorService that return futures and we don't want that. > >> We need two new void methods: > >> > >> void submitEverywhere(DistributedTask task, > >> Cache, T> result); > >> void submitEverywhere(DistributedTask task, > >> Cache, T> result, K... input); > >> > >> > >> Now, why bother with DistExecResultKey? Well we have tasks that use > >> input keys and tasks that don't. So results cache could only be keyed by > >> either keys or execution address, or combination of those two. 
> >> Therefore, DistExecResultKey could be something like: > >> > >> public interface DistExecResultKey { > >> > >> Address getExecutionAddress(); > >> K getKey(); > >> > >> } > >> > >> If you have a better idea how to address this aspect let us know. So > >> much for distributed executors. > >> > I think we should allow each distributed task to deal with output in its own way, the existing API should be enough. > >> > >> For map/reduce we also have to enable storing of map reduce task results > >> into cache [2] and allow users to specify custom cache for intermediate > >> results[3]. Part of task [2] is to allow notification about map/reduce > >> task progress and completion. Just as in dist.executor I would add > >> MapReduceTaskExecutionListener interface: > >> > >> > >> public interface MapReduceTaskExecutionListener { > >> > >> void mapTaskInitialized(Address executionAddress); > >> void mapTaskSucceeded(Address executionAddress); > >> void mapTaskFailed(Address executionTarget, Exception cause); > >> void mapPhaseCompleted(); > >> > >> void reduceTaskInitialized(Address executionAddress); > >> void reduceTaskSucceeded(Address executionAddress); > >> void reduceTaskFailed(Address address, Exception cause); > >> void reducePhaseCompleted(); > >> > >> } > > IMO - in the first stage at leas - I would rather use a simpler > (Notifying)Future, on which the user can wait till the computation happens: > it's simpler and more aligned with the rest of our async API. > > > What do you mean? We already have futures in MapReduceTask API. This API > is more fine grained and allows monitoring/reporting of task progress. > Please clarify. > I'm not sure about the usefulness of an API like this either... if the intention is to allow the user to collect statistics about duration of various phases, then I think exposing the durations via MapReduceTasks would be better. 
>
> >> while MapReduceTask would have an additional method:
> >>
> >> public void execute(Cache resultsCache);
> > you could overload it with cache name only method.
> Yeah, good idea. Same for usingIntermediateCache? I actually asked you
> this here https://issues.jboss.org/browse/ISPN-4021

+1 to allow a cache name only. For the intermediate cache I don't think it makes sense to allow a Cache version at all.

From sanne at infinispan.org Tue Feb 25 08:39:00 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 25 Feb 2014 13:39:00 +0000
Subject: [infinispan-dev] Where's the roadmap?
Message-ID: 

I was asked about the Infinispan roadmap on a forum post, my draft reads:

"Sure it's available online, see.."

but then I could actually only find this:
https://community.jboss.org/wiki/InfinispanRoadmap

(which is very outdated).

So, what's the roadmap?

Would be nice if we could have it updated and published on the new website.

Cheers,
Sanne

From mmarkus at redhat.com Tue Feb 25 10:08:06 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Tue, 25 Feb 2014 15:08:06 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org>
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org>
Message-ID: <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com>

On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote:

>> On 24 févr.
2014, at 17:39, Mircea Markus wrote:
>>
>>
>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>>
>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>
>> Curious to hear the whole story :-)
>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform-independent (hotrod).
>
> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.

People are going to use Infinispan with one cache per entity, because it makes sense:
- different config (repl/dist | persistent/non-persistent) for different data types
- have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18

I don't see a reason to forbid this, on the contrary. The way I see it, the relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general-purpose storage engine that is available to different platforms as well.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From vblagoje at redhat.com Tue Feb 25 10:09:29 2014
From: vblagoje at redhat.com (Vladimir Blagojevic)
Date: Tue, 25 Feb 2014 10:09:29 -0500
Subject: [infinispan-dev] Further dist.exec and M/R API improvements
In-Reply-To: 
References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530CB229.4090301@redhat.com>

On 2/25/2014, 7:33 AM, Dan Berindei wrote:
>
>
> Do we really need special support for distributed tasks to write
> results to another cache?
We already allow a task to do
>
> cache.getCacheManager().getCache("outputCache").put(k, v)

Yeah, very good point Dan. Thanks for being a sanity check. Mircea?

>
>
> >> I was thinking we add a concept of
> >> DistributedTaskExecutionListener which can be specified in
> >> DistributedTaskBuilder:
> >>
> >> DistributedTaskBuilder
> >> executionListener(DistributedTaskExecutionListener listener);
> >>
> >>
> >> We needed DistributedTaskExecutionListener anyway. All
> distributed tasks
> >> might use some feedback about task progress, completion/failure
> and on.
> >> My proposal is roughly:
> >>
> >>
> >> public interface DistributedTaskExecutionListener {
> >>
> >> void subtaskSent(Address node, Set inputKeys);
> >> void subtaskFailed(Address node, Set inputKeys,
> Exception e);
> >> void subtaskSucceded(Address node, Set inputKeys, T result);
> >> void allSubtasksCompleted();
> >>
> >> }
> >>
> >> So much for that.
> > I think this it would make sense to add this logic for
> monitoring, + additional info such as average execution time etc.
> I'm not sure if this is a generally useful API though, unless
> there were people asking for it already?
> Ok, noted. If you remember any references about this let me know and
> I'll incorporate what people actually asked for rather than guess.
>
>
> Ok, let's wait until we get some actual requests from users then. TBH
> I don't think distributed tasks with subtasks are something that users
> care about. E.g. with Map/Reduce the reduce tasks are not subtasks of
> the map/combine tasks, so this API wouldn't help.
>
> Hadoop has a Reporter interface that allows you to report "ticks" and
> increment counters, maybe we should add something like that instead?

The subtask I am referring to here is just to denote part of the distributed task initiated using dist.executors.
This interface (maybe extended a bit with ideas from Reporter) could be used for both monitoring and more application specific logic about task re-execution and so on. > > > I think we should allow each distributed task to deal with output in > its own way, the existing API should be enough. Yes, I can see your point. Mircea? > > > >> public interface MapReduceTaskExecutionListener { > >> > >> void mapTaskInitialized(Address executionAddress); > >> void mapTaskSucceeded(Address executionAddress); > >> void mapTaskFailed(Address executionTarget, Exception cause); > >> void mapPhaseCompleted(); > >> > >> void reduceTaskInitialized(Address executionAddress); > >> void reduceTaskSucceeded(Address executionAddress); > >> void reduceTaskFailed(Address address, Exception cause); > >> void reducePhaseCompleted(); > >> > >> } > > IMO - in the first stage at leas - I would rather use a simpler > (Notifying)Future, on which the user can wait till the computation > happens: it's simpler and more aligned with the rest of our async API. > > > What do you mean? We already have futures in MapReduceTask API. > This API > is more fine grained and allows monitoring/reporting of task progress. > Please clarify. > > > I'm not sure about the usefulness of an API like this either... if the > intention is to allow the user to collect statistics about duration of > various phases, then I think exposing the durations via MapReduceTasks > would be better. How would you design that API Dan? Something other than listener/callback interface? > > >> while MapReduceTask would have an additional method: > >> > >> public void execute(Cache resultsCache); > > you could overload it with cache name only method. > Yeah, good idea. Same for usingIntermediateCache? I actually asked you > this here https://issues.jboss.org/browse/ISPN-4021 > > > +1 to allow a cache name only. For the intermediate cache I don't > think it makes sense to allow a Cache version at all. Ok good. Deal. 
Thanks, Vladimir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140225/da649f6f/attachment.html

From mmarkus at redhat.com Tue Feb 25 11:24:03 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Tue, 25 Feb 2014 16:24:03 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To:
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com>
Message-ID:

On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote:

> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story.
>
> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it

Agreed. I actually don't see how we can stop people who declare a Cache from putting whatever they want in it. Sharing a cache between types also makes total sense for smaller caches, as it is easy to set up etc.
The debate in this email, the way I understood it, was: are/should people be using multiple caches for storing data? If yes, we should consider querying functionality spanning multiple caches.

> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote:
>
> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote:
>
> >> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
> >>
> >>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
> >>>
> >>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher-level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
> >>
> >> Curious to hear the whole story :-)
> >> We cannot mandate that all the users use OGM though, one of the reasons being OGM is not platform independent (hotrod).
> >
> > Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>
> People are going to use infinispan with one cache per entity, because it makes sense:
> - different config (repl/dist | persistent/non-persistent) for different data types
> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
> I don't see a reason to forbid this, on the contrary. The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general-purpose storage engine that is available to different platforms as well.
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Tue Feb 25 11:33:58 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Tue, 25 Feb 2014 16:33:58 +0000
Subject: [infinispan-dev] Where's the roadmap?
In-Reply-To:
References:
Message-ID: <7335F27A-7B85-4341-A8A0-35670F8E827C@redhat.com>

I'm working on it right now..

On Feb 25, 2014, at 1:39 PM, Sanne Grinovero wrote:

> I was asked about the Infinispan roadmap on a forum post, my draft reads:
>
> "Sure it's available online, see.."
>
> but then I could actually only find this:
> https://community.jboss.org/wiki/InfinispanRoadmap
>
> (which is very outdated).
>
> So, what's the roadmap?
>
> Would be nice if we could have it updated and published on the new website.
> > Cheers,
> > Sanne
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From sanne at infinispan.org Tue Feb 25 12:08:27 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 25 Feb 2014 17:08:27 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To:
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com>
Message-ID:

There also is the opposite problem to be considered, as Emmanuel suggested on 11/04/2012: you can't forbid the user to store the same object (same type and same id) in two different caches, where each Cache might be using different indexing options.

If the "search service" is a global concept, and you run a query which matches object X, we'll return it to the user but he won't be able to figure out from which cache it's being sourced: is that ok? Ultimately this implies a query might return the same object X in multiple positions in the result list of the query; for example it might be the top result according to some criteria but also be the 5th result because of how it was indexed in a different cache: maybe someone will find good use for this "capability" but I see it primarily as a source of confusion.

Finally, if we move the search service as a global component, there might be an impact in how we explain security: an ACL filter applied on one cache - or the index metadata produced by that cache - might not be applied in the same way by an entity being matched through a second cache.
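The duplicate-hit scenario Sanne describes — the same (type, id) entity surfacing at several ranks because it is indexed in several caches — could be handled defensively on the result list by keeping only the best-ranked hit per identity. A minimal sketch; the Hit record and its fields are invented for illustration, not Infinispan Query API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// De-duplicate query hits by (type, id), ignoring which cache sourced them,
// and keep the best (lowest) rank for each logical entity.
class QueryDedupSketch {
    record Hit(String cache, String type, String id, int rank) {}

    static List<Hit> dedupe(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : hits) {
            String key = h.type() + "#" + h.id(); // identity deliberately excludes the source cache
            best.merge(key, h, (a, b) -> a.rank() <= b.rank() ? a : b);
        }
        return new ArrayList<>(best.values());
    }
}
```

Note this only hides the symptom: which copy "wins" is still arbitrary from the user's point of view, which is Sanne's deeper objection to a global search service.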
Not least a user's permission to access one cache (or not) will affect his results in a rather complex way. I'm wondering if we need to prevent such situations. Sanne On 25 February 2014 16:24, Mircea Markus wrote: > > On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: > >> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. >> >> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it > > Agreed. I actually don't see how we can enforce people that declare Cache not put whatever they want in it. Also makes total sense for smaller caches as it is easy to set up etc. > The debate in this email, the way I understood it, was: are/should people using multiple caches for storing data? If yes we should consider querying functionality spreading over multiple caches. > >> >> >> >> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: >> >> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: >> >> >> On 24 f?vr. 2014, at 17:39, Mircea Markus wrote: >> >> >> >> >> >>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >> >>> >> >>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >> >> >> >> Curious to hear the whole story :-) >> >> We cannot mandate all the suers to use OGM though, one of the reasons being OGM is not platform independent (hotrod). >> > >> > Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. 
>> >> People are going to use infinispan with one cache per entity, because it makes sense: >> - different config (repl/dist | persistent/non-persistent) for different data types >> - have map/reduce tasks running only the Person entires not on Dog as well, when you want to select (Person) where age > 18 >> I don't see a reason to forbid this, on the contrary. The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well. >> >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Tue Feb 25 12:09:45 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 25 Feb 2014 17:09:45 +0000 Subject: [infinispan-dev] Where's the roadmap? In-Reply-To: <7335F27A-7B85-4341-A8A0-35670F8E827C@redhat.com> References: <7335F27A-7B85-4341-A8A0-35670F8E827C@redhat.com> Message-ID: On 25 February 2014 16:33, Mircea Markus wrote: > I'm working on it right now.. Thanks! As soon as you have a draft I'm happy to help with the Query section. Cheers, Sanne > > On Feb 25, 2014, at 1:39 PM, Sanne Grinovero wrote: > >> I was asked about the Infinispan roadmap on a forum post, my draft reads: >> >> "Sure it's available online, see.." >> >> but then I could actually only find this: >> https://community.jboss.org/wiki/InfinispanRoadmap >> >> (which is very outdated). >> >> So, what's the roadmap? >> >> Would be nice if we could have it updated and published on the new website. 
>> >> Cheers, >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Tue Feb 25 11:30:25 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 25 Feb 2014 16:30:25 +0000 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <530CB229.4090301@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> <530CB229.4090301@redhat.com> Message-ID: <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com> On Feb 25, 2014, at 3:09 PM, Vladimir Blagojevic wrote: > On 2/25/2014, 7:33 AM, Dan Berindei wrote: >> >> >> Do we really need special support for distributed tasks to write results to another cache? We already allow a task to do >> >> cache.getCacheManager().getCache("outputCache").put(k, v) > Yeah, very good point Dan. Thanks for being sanity check. Mircea? +1 >> >> >> > >> >> I was thinking we add a concept of >> >> DistributedTaskExecutionListener which can be specified in >> >> DistributedTaskBuilder: >> >> >> >> DistributedTaskBuilder >> >> executionListener(DistributedTaskExecutionListener listener); >> >> >> >> >> >> We needed DistributedTaskExecutionListener anyway. All distributed tasks >> >> might use some feedback about task progress, completion/failure and on. 
>> >> My proposal is roughly: >> >> >> >> >> >> public interface DistributedTaskExecutionListener { >> >> >> >> void subtaskSent(Address node, Set inputKeys); >> >> void subtaskFailed(Address node, Set inputKeys, Exception e); >> >> void subtaskSucceded(Address node, Set inputKeys, T result); >> >> void allSubtasksCompleted(); >> >> >> >> } >> >> >> >> So much for that. >> > I think this it would make sense to add this logic for monitoring, + additional info such as average execution time etc. I'm not sure if this is a generally useful API though, unless there were people asking for it already? >> Ok, noted. If you remember any references about this let me know and >> I'll incorporate what people actually asked for rather than guess. >> >> Ok, let's wait until we get some actual requests from users then. TBH I don't think distributed tasks with subtasks are something that users care about. E.g. with Map/Reduce the reduce tasks are not subtasks of the map/combine tasks, so this API wouldn't help. >> >> Hadoop has a Reporter interface that allows you to report "ticks" and increment counters, maybe we should add something like that instead? > > The subtask I am referring to here is just to denote part of the distributed task initiated using dist.executors. This interface (maybe extended a bit with ideas from Reporter) could be used for both monitoring and more application specific logic about task re-execution and so on. > > >> >> >> I think we should allow each distributed task to deal with output in its own way, the existing API should be enough. > > Yes, I can see your point. Mircea? 
+1 user driven features >> >> >> >> public interface MapReduceTaskExecutionListener { >> >> >> >> void mapTaskInitialized(Address executionAddress); >> >> void mapTaskSucceeded(Address executionAddress); >> >> void mapTaskFailed(Address executionTarget, Exception cause); >> >> void mapPhaseCompleted(); >> >> >> >> void reduceTaskInitialized(Address executionAddress); >> >> void reduceTaskSucceeded(Address executionAddress); >> >> void reduceTaskFailed(Address address, Exception cause); >> >> void reducePhaseCompleted(); >> >> >> >> } >> > IMO - in the first stage at leas - I would rather use a simpler (Notifying)Future, on which the user can wait till the computation happens: it's simpler and more aligned with the rest of our async API. >> > >> What do you mean? We already have futures in MapReduceTask API. This API >> is more fine grained and allows monitoring/reporting of task progress. >> Please clarify. ah right, wasn't aware of MapReduceTask.executeAsynchronously() :-) That's what I was after. >> >> I'm not sure about the usefulness of an API like this either... if the intention is to allow the user to collect statistics about duration of various phases, then I think exposing the durations via MapReduceTasks would be better. > How would you design that API Dan? Something other than listener/callback interface? Functionally, what I was having in mind was JMX stats for the MapReduce tasks in general: like average execution time, count etc. Also the ability to cancel a running task through JMX/JON would be nice. I don't think we need to expose this to the user through the MapReduceTaskExecutionListener above, though. > >> >> >> >> while MapReduceTask would have an additional method: >> >> >> >> public void execute(Cache resultsCache); >> > you could overload it with cache name only method. >> Yeah, good idea. Same for usingIntermediateCache? I actually asked you >> this here https://issues.jboss.org/browse/ISPN-4021 >> >> +1 to allow a cache name only. 
For the intermediate cache I don't think it makes sense to allow a Cache version at all.
> Ok good. Deal.
>
> Thanks,
> Vladimir
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From vblagoje at redhat.com Tue Feb 25 14:31:05 2014
From: vblagoje at redhat.com (Vladimir Blagojevic)
Date: Tue, 25 Feb 2014 14:31:05 -0500
Subject: [infinispan-dev] Further dist.exec and M/R API improvements
In-Reply-To: <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com>
References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> <530CB229.4090301@redhat.com> <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com>
Message-ID: <530CEF79.3000308@redhat.com>

Hey,

I am starting to like this thread more and more :-) In conclusion, for distributed executors we are not adding any new APIs, because Callable implementers can already write to a cache using the existing API. We don't have to add any new elaborate callback/listener API either, as users have not requested one, but we should investigate a Hadoop Reporter-like interface to give users some sense of a task's current execution phase.

For map/reduce we will add a new method:

    public void execute(Cache resultsCache);

Using the fluent MapReduceTask API, users would be able to specify an intermediate cache:

    public MapReduceTask usingIntermediateCache(String cacheName);

We are not adding MapReduceTaskExecutionListener, but rather JMX stats for MapReduce tasks in general: average execution time, count etc. Also the ability to cancel a running task through JMX/JON would be nice.
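The "JMX stats" the summary settles on — invocation count and average execution time per task type — need little more than a thread-safe accumulator behind an MBean. A plain sketch of that accumulator, not the eventual Infinispan MBean:

```java
import java.util.concurrent.atomic.AtomicLong;

// Aggregate statistics of the kind proposed for MapReduce tasks:
// how many executions happened and how long they took on average.
class MapReduceStatsSketch {
    private final AtomicLong invocations = new AtomicLong();
    private final AtomicLong totalMillis = new AtomicLong();

    void recordExecution(long durationMillis) {
        invocations.incrementAndGet();
        totalMillis.addAndGet(durationMillis);
    }

    long getInvocations() { return invocations.get(); }

    double getAverageMillis() {
        long n = invocations.get();
        return n == 0 ? 0.0 : (double) totalMillis.get() / n;
    }
}
```

Exposing this through JMX is then just a matter of registering it with getter attributes; cancellation would additionally require keeping a handle on the running task's future.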
Regards, Vladimir From dan.berindei at gmail.com Tue Feb 25 15:44:18 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 25 Feb 2014 22:44:18 +0200 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <530CEF79.3000308@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> <530CB229.4090301@redhat.com> <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com> <530CEF79.3000308@redhat.com> Message-ID: On Tue, Feb 25, 2014 at 9:31 PM, Vladimir Blagojevic wrote: > Hey, > > I am starting to like this thread more and more :-) In conclusion, for > distributed executors we are not adding any new APIs because Callable > implementers can already write to cache using existing API. We don't > have to add any new elaborate callback/listener API either as users have > not requested but should investigate Hadoop Reporter like interface to > allow users some sense of task current execution phase. > > For map/reduce we will add a new method: > > public void execute(Cache resultsCache); > > Using fluent MapReduceTask API users would be able to specify an > intermediate cache: > > public MapReduceTask usingIntermediateCache(String > cacheName); > > We are not adding MapReduceTaskExecutionListener but more like JMX stats > for the MapReduce tasks in general: like average execution time, count > etc. Also the ability to cancel a running task through JMX/JON would be > nice. > For statistics, I was thinking of adding a getStatistics() method to MapReduceTask that would return an object with the duration of each phase and the number of keys processed on each node, after the M/R task is done. This could probably be extended such that it gives the user in-progress information as well. The in-progress information would also tie in nicely with a progress listener, but I feel the events you proposed are too coarse. 
If the user wanted to display a progress bar in his application, and the cluster only had 2 nodes, the progress bar would hover for half of the time around 0% and for the other half of the time around 50%. So we'd need to keep reporting something while a phase is in progress (e.g. by splitting a node's keys to more than one mapping task, and reporting the end of each subtask), otherwise the listener wouldn't be of much use. Anyway, this would be something nice to have, but I don't think it's very important, so supplying some global statistics via JMX should be enough for now. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140225/2c69f464/attachment.html From galder at redhat.com Wed Feb 26 01:56:08 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 07:56:08 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: <28874E57-C988-448A-99BB-1B65849D408F@redhat.com> On 19 Feb 2014, at 12:03, Sanne Grinovero wrote: > On 19 February 2014 07:12, Galder Zamarre?o wrote: >> >> On 03 Feb 2014, at 19:01, Dan Berindei wrote: >> >>> >>> >>> >>> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: >>>>>>> For sync we would want to invoke directly to avoid context switching. >>>>>> I think you haven't properly understood what I was talking about: the >>>>>> putAsync should not switch context at all in the ideal design. It should >>>>>> traverse through the interceptors all the way down (logically, in >>>>>> current behaviour), invoke JGroups async API and jump out. 
Then, as soon >>>>>> as the response is received, the thread which delivered it should >>>>>> traverse the interceptor stack up (again, logically), and fire the future. >>>> A Future doesn't make much sense with an async transport. The problem >>>> is with an async transport you never get back a response so you never >>>> know when the actual command is completed and thus a Future is >>>> worthless. The caller wouldn't know if they could rely on the use of >>>> the Future or not. >>> >>> You're right, there's one important difference between putAsync and put >>> with async transport: in the first case you can find out when the >>> request is completed while you cannot with the latter. Not requiring the >>> ack can be an important optimization. I think that both versions are >>> very valid: first mostly for bulk operations = reduction of latency, >>> second for modifications that are acceptable to fail without handling that. >>> I had the first case in my mind when talking about async operations, and >>> there the futures are necessary. >>> >>> A couple more differences: >>> 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... >>> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. >> >> If there?s any relationship between both puts for the caller thread, the caller must make sure that the second put is only called after the first has completed. > > Actually in such a case I would strongly expect Infinispan to keep the > two operations in order. This is not to be pushed on user's > responsibility. If the two operations are executed by the same thread, then yes, I agree that it should be applied one after the other: Thread-1: Future f1 = putAsync(k, v1); Thread-1: Future f2 = putAsync(k, v2); I?d expect v1 to be applied and then v2. 
These operations would be added to some queue and you'd expect both insertions to happen one after the other, in Thread-1, so yeah, we can apply them in order. However, if the following happens:

Thread-1: Future f1 = putAsync(k, v1);
Thread-2: Future f2 = putAsync(k, v2);

We can't be enforcing such ordering. Now, if there's a relationship to the eye of the beholder between v1 and v2, and you expect v2 to be the end result, this is how you'd have to do it (JDK8-esque):

Thread-1: Future f1 = putAsync(k, v1);
Thread-2: Future f2 = f1.map.putAsync(k, v2);

or:

Thread-1: Future f1 = putAsync(k, v1);
Thread-2: Future f2 = f1.map.replaceAsync(k, v1, v2);

> >> If there are separate threads calling it and it relies on this, it should call replace the second time, i.e. replaceAsync(k, v1, v2) to get the guarantees it wants.
> >>
> >> What is really important is that the order in which they are executed in one node/replica is the same order in which they're executed in all other nodes. This was something that was not maintained when async marshalling was enabled.
>
> +1000
>
> But also I'd stress that any sync operation should have a Future
> returned,

^ To me, purely sync operations are any operations that return anything other than a Future. IOW:

void put(k, v);

^ That's an implicit sync operation where you have no choice. An async operation can behave both sync and async:

Future put(k, v);

It can be sync or async, depending on whether the user waits or does something once it completes. If it does not wait, or discards the Future, it's async. If it does something with the future, it's sync.

> someone in this long thread suggested to have an option to
> drop it for example to speed up bulk imports, but I really can't see a
> scenario in which I wouldn't want to know about a failure.

+1, I think everything should return a Future.

> Let's not
> do the same mistake that made MongoDB so "popular" ;-)
> Bulk imports can still be made efficient without strictly needing to go
> to these lengths.
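The "JDK8-esque" f1.map.putAsync(k, v2) chaining sketched above is essentially CompletableFuture composition: the second write is only issued once the first has completed, which restores ordering even when the two puts originate on different threads. A self-contained sketch with a ConcurrentHashMap standing in for the cache (putAsync here is an illustrative stand-in, not the Infinispan method):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Chaining the second async put on the first put's future guarantees k=v2
// ends up as the final state, regardless of thread scheduling.
class AsyncOrderingSketch {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final ExecutorService pool = Executors.newFixedThreadPool(2);

    static CompletableFuture<String> putAsync(String k, String v) {
        return CompletableFuture.supplyAsync(() -> cache.put(k, v), pool);
    }

    static String demo() {
        CompletableFuture<String> f1 = putAsync("k", "v1");
        // "f1.map.putAsync(k, v2)": issue the second put only after the first completes
        CompletableFuture<String> f2 = f1.thenCompose(prev -> putAsync("k", "v2"));
        f2.join();
        pool.shutdown();
        return cache.get("k");
    }
}
```

Without the thenCompose step, two independent putAsync calls race and either value may win — which is exactly the k=v1 surprise mentioned earlier in the thread.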
> > Sanne > > >> >>> >>> >>>> >>>> Also it depends what you are trying to do with async. Currently async >>>> transport is only for sending messages to another node, we never think >>>> of when we are the owning node. In this case the calling thread would >>>> have to go down the interceptor stack and acquire any locks if it is >>>> the owner, thus causing this "async" to block if you have any >>>> contention on the given key. The use of another thread would allow >>>> the calling thread to be able to return immediately no matter what >>>> else is occurring. Also I don't see what is so wrong about having a >>>> context switch to run something asynchronously, we shouldn't have a >>>> context switch to block the user thread imo, which is very possible >>>> with locking. >>> >>> This is an important notice! Locking would complicate the design a lot, >>> because the thread in "async" mode should do only tryLocks - if this >>> fails, further processing should be dispatched to another thread. Not >>> sure if this could be implemented at all, because the thread may be >>> blocked inside JGroups as well (async API is about receiving the >>> response asynchronously, not about sending the message asynchronously). >>> >>> I don't say that the context switch is that bad. My concern is that you >>> have a very limited amount of requests that can be processed in >>> parallel. I consider a "request" something pretty lightweight in concept >>> - but one thread per request makes this rather heavyweight stuff. >>> >>> We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. But the feeling I got was that neither is going to make it into 7.0. >>> >>> >>>> >>>>> +1 much cleaner, I love it. 
Actually wasn't aware the current code >>>>> didn't do this :-( >>>> This is what the current async transport does, but it does nothing with Futures. >>> >>> Nevermind the futures, this is not the important part. It's not about >>> async transport neither, it's about async executors. >>> (okay, the thread was about dropping async transport, I have hijacked it) >>> >>> Radim >>> >>> -- >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 26 03:45:07 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 09:45:07 +0100 Subject: [infinispan-dev] How to add programmatic config to an exisitng xml configured cache In-Reply-To: References: Message-ID: <6D966643-6E98-4642-9DA9-38E9DF3CB49A@redhat.com> Hi Faseela, Can you create a unit test demonstrating this (including the config.xml file)? There are plenty of examples in [1]. 
Cheers, [1] https://github.com/infinispan/infinispan/tree/master/core/src/test/java/org/infinispan/configuration On 20 Feb 2014, at 10:11, Faseela K wrote: > Hi, > > I have some infinispan configurations available in "config.xml". > After loading this configuration, I want to append some more configurations programmatically, using Configuration Builder. > I am doing something like this : > > Configuration template = null; > ConfigurationBuilder builder = new ConfigurationBuilder(); > > DefaultCacheManager manager = new DefaultCacheManager( > "config.xml"); > template = manager.getCacheConfiguration("evictionCache"); > builder.read(template); > builder.loaders().passivation(false).shared(false).preload(true) > .addFileCacheStore().fetchPersistentState(true) > .purgerThreads(3).purgeSynchronously(true) > .ignoreModifications(false).purgeOnStartup(false) > .location("tmp").async() > .enabled(true).flushLockTimeout(15000).threadPoolSize(5) > .singletonStore().enabled(true).pushStateWhenCoordinator(true) > .pushStateTimeout(20000); > > manager.defineConfiguration("abcd", builder.build()); > > The problem with this code is, it's overwriting the evictionCache configuration. > Can somebody help me to fix this issue? > > Thanks, > Faseela > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 26 01:57:03 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 07:57:03 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? 
In-Reply-To: <5304DF70.2050603@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <5304DF70.2050603@redhat.com> Message-ID: On 19 Feb 2014, at 17:44, Dennis Reed wrote: > On 02/19/2014 12:57 AM, Galder Zamarre?o wrote: >> On 31 Jan 2014, at 08:32, Dennis Reed wrote: >> >>> It would be a loss of functionality. >>> >>> As a common example, the AS web session replication cache is configured >>> for ASYNC by default, for performance reasons. >>> But it can be changed to SYNC to guarantee that when the request >>> finishes that the session was replicated. >>> >>> That wouldn't be possible if you could no longer switch between >>> ASYNC/SYNC with just a configuration change. >> I disagree :). >> >> AS could abstract that configuration detail. IOW, if all Infinispan returned was Futures, AS or any other client application, has the choice in their hands: do they wait for the future to complete or not? If they do, they?re SYNC, if not ASYNC. AS can still expose this and no functionality is lost. > > Yes, the functionality is still lost. Your suggestion is just to > re-implement the functionality over and over in each ISPN caller. :) Yup, welcome to the non-blocking world. > >> What happens is that SYNC/ASYNC decision stops being a configuration option (bad, bad, bad) and becomes an actual programming decision Infinispan clients must address (good, good, good). > > This really depends on the client. For the AS session replication use > case, a config option is good, good, good. > But re-implementing the same functionality in every caller that may want > it to be a config option is bad, bad, bad. > > -Dennis > >>> -Dennis >>> >>> On 01/31/2014 01:08 AM, Galder Zamarre?o wrote: >>>> Hi all, >>>> >>>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. 
>>>> >>>> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>>> >>>> WDYT? >>>> >>>> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Wed Feb 26 05:30:41 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 12:30:41 +0200 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <28874E57-C988-448A-99BB-1B65849D408F@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> <28874E57-C988-448A-99BB-1B65849D408F@redhat.com> Message-ID: On Wed, Feb 26, 2014 at 8:56 AM, Galder Zamarreño wrote: > > On 19 Feb 2014, at 12:03, Sanne Grinovero wrote: > > > On 19 February 2014 07:12, Galder Zamarreño wrote: > >> > >> On 03 Feb 2014, at 19:01, Dan Berindei wrote: > >> > >>> > >>> > >>> > >>> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >>>>>>> For sync we would want to invoke directly to avoid context > switching. > >>>>>> I think you haven't properly understood what I was talking about: > the > >>>>>> putAsync should not switch context at all in the ideal design. It > should > >>>>>> traverse through the interceptors all the way down (logically, in > >>>>>> current behaviour), invoke JGroups async API and jump out.
Then, as > soon > >>>>>> as the response is received, the thread which delivered it should > >>>>>> traverse the interceptor stack up (again, logically), and fire the > future. > >>>> A Future doesn't make much sense with an async transport. The problem > >>>> is with an async transport you never get back a response so you never > >>>> know when the actual command is completed and thus a Future is > >>>> worthless. The caller wouldn't know if they could rely on the use of > >>>> the Future or not. > >>> > >>> You're right, there's one important difference between putAsync and put > >>> with async transport: in the first case you can find out when the > >>> request is completed while you cannot with the latter. Not requiring > the > >>> ack can be an important optimization. I think that both versions are > >>> very valid: first mostly for bulk operations = reduction of latency, > >>> second for modifications that are acceptable to fail without handling > that. > >>> I had the first case in my mind when talking about async operations, > and > >>> there the futures are necessary. > >>> > >>> A couple more differences: > >>> 1. You can't do commitAsync(), but you can configure the commit to be > replicated asynchronously (1PC). Although we did talk about removing that > option... > >>> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering > between the two and you might end up with k=v1 in the cache. > >> > >> If there's any relationship between both puts for the caller thread, > the caller must make sure that the second put is only called after the > first has completed. > > > > Actually in such a case I would strongly expect Infinispan to keep the > > two operations in order. This is not to be pushed on user's > > responsibility. 
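The contrast drawn in this message between waiting for the ack and not requiring it can be sketched with plain JDK futures. This is purely an illustration: the map-backed putAsync below is an invented stand-in, not Infinispan's actual API (which at this point returns a NotifyingFuture).

```java
import java.util.Map;
import java.util.concurrent.*;

// Illustration only: a map-backed stand-in for an async cache API.
// "Sync" callers block on the returned future (they get the ack);
// "fire-and-forget" callers discard it and give up any completion guarantee.
public class SyncVsFireAndForget {
    static final ExecutorService pool = Executors.newSingleThreadExecutor();
    static final Map<String, String> store = new ConcurrentHashMap<>();

    static Future<String> putAsync(String k, String v) {
        return pool.submit(() -> store.put(k, v)); // returns the previous value
    }

    public static void main(String[] args) throws Exception {
        // Sync usage: wait for the ack; the write is visible when get() returns.
        putAsync("k", "v1").get();

        // Fire-and-forget usage: discard the future; no completion or
        // failure notification unless the caller arranges one.
        putAsync("k", "v2");

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(store.get("k")); // single-threaded pool: prints "v2"
    }
}
```

With a single-threaded pool the two submissions happen to run in order; a larger pool would drop even that guarantee, which is exactly the ordering question discussed next.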
> > If the two operations are executed by the same thread, then yes, I agree > that it should be applied one after the other: > > Thread-1: Future f1 = putAsync(k, v1); > Thread-1: Future f2 = putAsync(k, v2); > > I'd expect v1 to be applied and then v2. These operations would be added to > some queue, so you'd expect both insertions to happen one after the other, > in Thread-1, so yeah, we can apply them in order. > This definitely does not happen at the moment in Infinispan. Each putAsync gets its own asynchronous worker thread (there are 25 async threads by default), and the threads are not synchronized in any way. And I'm not sure it makes sense to order them anyway. I mean, if the order between two sequential putAsync operations were preserved, it would be quite natural to expect the ordering between a putAsync and a regular put to be preserved as well. Thread-1: Future f1 = putAsync(k, v1) Thread-1: put(k, v2) Thread-1: assert f1.isDone() && get(k).equals(v2) This would get quite complicated... an async put always creates a new, implicit, transaction, whereas a regular put can be part of an active transaction. So preserving the ordering between the putAsync and the put might mean delaying not the put, but the transaction commit. I'm not saying this couldn't be done, but I'm not sure it would make the semantics of putAsync any clearer than they are now. > > However, if the following happens: > > Thread-1: Future f1 = putAsync(k, v1); > Thread-2: Future f2 = putAsync(k, v2); > > We can't be enforcing such ordering.
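The race just described (each putAsync running on its own unsynchronized worker thread) and the chaining remedy can be made concrete with JDK 8's CompletableFuture. This is a hypothetical sketch: the putAsync below is a stand-in of my own, and Infinispan's real putAsync does not return a CompletableFuture at this point.

```java
import java.util.Map;
import java.util.concurrent.*;

// Sketch: enforcing order between two async puts on the same key by
// chaining, so the second is only submitted once the first completes.
public class ChainedAsyncPuts {
    static final ExecutorService pool = Executors.newFixedThreadPool(4);
    static final Map<String, String> store = new ConcurrentHashMap<>();

    static CompletableFuture<String> putAsync(String k, String v) {
        return CompletableFuture.supplyAsync(() -> store.put(k, v), pool);
    }

    public static void main(String[] args) throws Exception {
        // Two unchained putAsync calls would race on the 4-thread pool, and
        // k could end up as either value. Chained via thenCompose, the second
        // put starts only after the first completes, so k is deterministically v2.
        CompletableFuture<String> f =
            putAsync("k", "v1").thenCompose(prev -> putAsync("k", "v2"));
        f.get();
        System.out.println(store.get("k")); // prints "v2"
        pool.shutdown();
    }
}
```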
> > Now, if there's a relationship in the eye of the beholder between v1 and > v2, and you expect v2 to be the end result, this is how you'd have to do it > (JDK8-esque): > > Thread-1: Future f1 = putAsync(k, v1); > Thread-2: Future f2 = f1.map.putAsync(k, v2); > > or: > > Thread-1: Future f1 = putAsync(k, v1); > Thread-2: Future f2 = f1.map.replaceAsync(k, v1, v2); > Do you mean here that the 2nd putAsync/the replaceAsync operation would start executing only after f1 is done? Or would you expect them both to start executing at once, but with Infinispan ensuring that the 2nd operation is executed on the primary owner after the 1st? If it's the former, it should be quite easy to implement a Future with a getCache() method that returns a delegating cache, allowing you to submit a put operation immediately, but blocking it until the future is done. If it's the latter, I suspect it's going to be a lot more work. > > > >> > >> If there are separate threads calling it and it relies on this, it should > call replace the second time, i.e. replaceAsync(k, v1, v2) to get the > guarantees it wants. > >> > >> What is really important is that the order in which they are executed > in one node/replica is the same order in which they're executed in all > other nodes. This was something that was not maintained when async > marshalling was enabled. > > > > +1000 > > > > But also I'd stress that any sync operation should have a Future > > returned, > > ^ To me, purely sync operations are any operations that return anything > other than a Future. IOW: > > void put(k, v); > > ^ That's an implicit sync operation where you have no choice. > > An async operation can behave both sync and async: > > Future put(k, v); > > Can be sync or async, depending on whether the user waits or does something > once it completes. If it does not wait, or discards the Future, it's async. > If it does something with the future, it's sync. > I don't agree with this.
If the user can do something else while the operation is executing, then the operation is async. I don't know if there is a specific name in Java-land for starting an async call and discarding the Future, but in .Net this pattern is called "fire-and-forget". > > someone in this long thread suggested having an option to > > drop it for example to speed up bulk imports, but I really can't see a > > scenario in which I wouldn't want to know about a failure. > > +1, I think everything should return a Future. > Even void put(k, v)?? > > > Let's not > > make the same mistake that made MongoDB so "popular" ;-) > > Bulk imports can still be made efficient without strictly needing to go > > these lengths. > > > > Sanne > > > > > >> > >>> > >>> > >>>> > >>>> Also, it depends on what you are trying to do with async. Currently async > >>>> transport is only for sending messages to another node, we never think > >>>> of when we are the owning node. In this case the calling thread would > >>>> have to go down the interceptor stack and acquire any locks if it is > >>>> the owner, thus causing this "async" to block if you have any > >>>> contention on the given key. The use of another thread would allow > >>>> the calling thread to be able to return immediately no matter what > >>>> else is occurring. Also I don't see what is so wrong about having a > >>>> context switch to run something asynchronously, we shouldn't have a > >>>> context switch to block the user thread imo, which is very possible > >>>> with locking. > >>> > >>> This is an important point! Locking would complicate the design a lot, > >>> because the thread in "async" mode should do only tryLocks - if this > >>> fails, further processing should be dispatched to another thread. Not > >>> sure if this could be implemented at all, because the thread may be > >>> blocked inside JGroups as well (async API is about receiving the > >>> response asynchronously, not about sending the message asynchronously).
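Radim's "only tryLocks in async mode, dispatch to another thread on failure" idea can be sketched as follows. The names and structure here are invented for illustration and say nothing about Infinispan's actual LockManager or interceptor stack.

```java
import java.util.concurrent.*;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of "tryLock, else hand off": the caller thread takes the lock only
// if it is free; under contention the work is dispatched to an executor so
// the caller never blocks.
public class TryLockOrHandOff {
    static final ReentrantLock lock = new ReentrantLock();
    static final ExecutorService asyncExecutor = Executors.newSingleThreadExecutor();

    static Future<?> update(Runnable work) {
        if (lock.tryLock()) {                 // uncontended: run inline
            try {
                work.run();
                return CompletableFuture.completedFuture(null);
            } finally {
                lock.unlock();
            }
        }
        return asyncExecutor.submit(() -> {   // contended: a worker blocks, not the caller
            lock.lock();
            try { work.run(); } finally { lock.unlock(); }
        });
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        update(() -> sb.append("done")).get();
        System.out.println(sb);               // prints "done"
        asyncExecutor.shutdown();
    }
}
```

As the message above notes, this only helps if no layer underneath (e.g. JGroups) can block the caller anyway.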
> >>> > >>> I don't say that the context switch is that bad. My concern is that you > >>> have a very limited amount of requests that can be processed in > >>> parallel. I consider a "request" something pretty lightweight in > concept > >>> - but one thread per request makes this rather heavyweight stuff. > >>> > >>> We did talk in Farnborough/Palma about removing the current > LockManager with a queue-based structure like the one used for ordering > total-order transactions. And about removing the implicit stack in the > current interceptor stack with an explicit stack, to allow resuming a > command mid-execution. But the feeling I got was that neither is going to > make it into 7.0. > >>> > >>> > >>>> > >>>>> +1 much cleaner, I love it. Actually wasn't aware the current code > >>>>> didn't do this :-( > >>>> This is what the current async transport does, but it does nothing > with Futures. > >>> > >>> Nevermind the futures, this is not the important part. It's not about > >>> async transport neither, it's about async executors. 
> >>> (okay, the thread was about dropping async transport, I have hijacked > it) > >>> > >>> Radim > >>> > >>> -- > >>> Radim Vansa > >>> JBoss DataGrid QA > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/3228b679/attachment-0001.html From galder at redhat.com Wed Feb 26 05:47:34 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 11:47:34 +0100 Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <530613E9.3020602@redhat.com> References: <530613E9.3020602@redhat.com> Message-ID: <045153C3-EDCA-4D17-A0FF-C8C58134CE10@redhat.com> Great work Radim!!! Awesome job and very interesting roadmap :) On 20 Feb 2014, at 15:40, Radim Vansa wrote: > Hi all, > > it has been a long time since last release of RadarGun. We have been > using it intensively and developed many new features - 1.0.0 had 7,340 > lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become > multi-purpose tool, used for checking both performance and functionality > of caches under stress. > > During 1.1.0 development, most parts of code changed beyond the beyonds, > but we tried to keep the old configuration compatible. However, the > design started to be rather limiting, and therefore, we have decided to > make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x > branch we will provide bugfixes, but all new features should go to 2.0.0. 
> > Some teasers for the features expected in RadarGun 2.0.0: > > * non-homogeneous clusters: client/server setups, cooperation of > different versions of products, or easy setup of cross-site deployment > with different configurations > * abstracting from the cache wrapper: you will be able to use RadarGun for > more than just caches without any hacks > ** the current CacheWrapper interface will be designed to match JSR-107 > javax.cache.Cache rather than java.util.Map > * pluggable reporting: statistics will be directly multiplexed to > configured reporters (again, without cheating on directories), reporters > will provide the output formatted as CSV or HTML, or can even deploy the > results to an external repository > * merging local and distributed benchmarks -> master + single slave > within one JVM > * better property parsing: evaluation of expressions, property > replacement executed on slaves > > I hope you will like it! And enjoy the 1.1.0.Final release now. > > Radim > > ------ > Radim Vansa JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Wed Feb 26 06:22:43 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 13:22:43 +0200 Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <045153C3-EDCA-4D17-A0FF-C8C58134CE10@redhat.com> References: <530613E9.3020602@redhat.com> <045153C3-EDCA-4D17-A0FF-C8C58134CE10@redhat.com> Message-ID: Great job, Radim! Looking forward to RadarGun 2.0! On Wed, Feb 26, 2014 at 12:47 PM, Galder Zamarreño wrote: > Great work Radim!!!
Awesome job and very interesting roadmap :) > > On 20 Feb 2014, at 15:40, Radim Vansa wrote: > > > Hi all, > > > > it has been a long time since last release of RadarGun. We have been > > using it intensively and developed many new features - 1.0.0 had 7,340 > > lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become > > multi-purpose tool, used for checking both performance and functionality > > of caches under stress. > > > > During 1.1.0 development, most parts of code changed beyond the beyonds, > > but we tried to keep the old configuration compatible. However, the > > design started to be rather limiting, and therefore, we have decided to > > make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x > > branch we will provide bugfixes, but all new features should go to 2.0.0. > > > > Some decoys for features expected for RadarGun 2.0.0: > > > > * non-homogenous clusters: client/server setups, cooperation of > > different versions of products, or easy setup of cross-site deployment > > with different configurations > > * abstracting from cache wrapper: you will be able to use RadarGun for > > more than just caches without any hacks > > ** current CacheWrapper interface will be designed to match JSR-107 > > javax.cache.Cache rather than java.util.Map > > * pluggable reporting: statistics will be directly multiplexed to > > configured reporters (again, without cheating on directories), reporters > > will provide the output formatted as CSV, HTML or even can deploy the > > results to external repository > > * merging local and distributed benchmark -> master + single slave > > within one JVM > > * better property parsing: evaluation of expressions, property > > replacement executed on slaves > > > > I hope you will like it! And enjoy 1.1.0.Final release now. 
> > > > Radim > > > > ------ > > Radim Vansa JBoss DataGrid QA > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/f6c18c85/attachment.html From mmarkus at redhat.com Wed Feb 26 08:12:07 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 13:12:07 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> Message-ID: <6036A294-231A-484F-8224-C77372987832@redhat.com> On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: > There also is the opposite problem to be considered, as Emmanuel > suggested on 11/04/2012: > you can't forbid the user to store the same object (same type and same > id) in two different caches, where each Cache might be using different > indexing options. > > If the "search service" is a global concept, and you run a query which > matches object X, we'll return it to the user but he won't be able to > figure out from which cache it's being sourced: is that ok? 
Can't the user figure that out based on the way the query is built? I mean, the problem is similar to databases: if address is both a table and a column in the USER table, then it's the query (select) that determines where the address is returned from. > > Ultimately this implies a query might return the same object X in > multiple positions in the result list of the query; for example it > might be the top result according to some criteria but also be the 5th > result because of how it was indexed in a different case: maybe > someone will find good use for this "capability" but I see it > primarily as a source of confusion. Curious whether the source of the data can/cannot be specified within the query. > Finally, if we move the search service as a global component, there > might be an impact in how we explain security: an ACL filter applied > on one cache - or the index metadata produced by that cache - might > not be applied in the same way by an entity being matched through a > second cache. > Not least a user's permission to access one cache (or not) will affect > his results in a rather complex way. I'll let Tristan comment more on this, but is this really different from an SQL database where you grant access on individual tables and run a query involving multiple of them? > > I'm wondering if we need to prevent such situations. > > Sanne > > On 25 February 2014 16:24, Mircea Markus wrote: >> >> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: >> >>> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. >>> >>> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it >> >> Agreed. I actually don't see how we can enforce that people who declare a Cache don't put whatever they want in it.
Also makes total sense for smaller caches as it is easy to set up etc. >> The debate in this email, the way I understood it, was: are/should people be using multiple caches for storing data? If yes, we should consider querying functionality spreading over multiple caches. >> >>> >>> >>> >>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: >>> >>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: >>> >>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: >>>>> >>>>> >>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >>>>>> >>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >>>>> >>>>> Curious to hear the whole story :-) >>>>> We cannot mandate that all the users use OGM though, one of the reasons being that OGM is not platform independent (hotrod). >>>> >>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. >>> >>> People are going to use Infinispan with one cache per entity, because it makes sense: >>> - different config (repl/dist | persistent/non-persistent) for different data types >>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 >>> I don't see a reason to forbid this, on the contrary. The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>>> >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ttarrant at redhat.com Wed Feb 26 08:05:54 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Wed, 26 Feb 2014 14:05:54 +0100 Subject: [infinispan-dev] JavaDocs and API documentation Message-ID: <530DE6B2.2060405@redhat.com> Dear all, our JavaDocs currently encompass all of our classes, interfaces, etc with no clear distinction between public and private API/SPI. I would like to clearly mark which of our classes/interfaces are public API. 
Should we: - add some decoration / visual cue to such elements to distinguish them from the internal stuff - generate two JavaDoc bundles: one which only contains the public API/SPIs and one with everything Tristan From dan.berindei at gmail.com Wed Feb 26 09:13:36 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 16:13:36 +0200 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <6036A294-231A-484F-8224-C77372987832@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> Message-ID: On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: > > On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: > > > There also is the opposite problem to be considered, as Emmanuel > > suggested on 11/04/2012: > > you can't forbid the user to store the same object (same type and same > > id) in two different caches, where each Cache might be using different > > indexing options. > > > > If the "search service" is a global concept, and you run a query which > > matches object X, we'll return it to the user but he won't be able to > > figure out from which cache it's being sourced: is that ok? > > Can't the user figure that out based on the way the query is built? > I mean the problem is similar with the databases: if address is both a > table and an column in the USER table, then it's the query (select) that > determines where from the address is returned. > You mean the user should specify the cache name(s) when building the query? 
With a database you have to go a bit out of your way to select from more than one table at a time, normally you have just one primary table that you select from and the others are just to help you filter and transform that table. You also have to add some information about the source table yourself if you need it, otherwise the DB won't tell you what table the results are coming from: SELECT "table1" as source, id FROM table1 UNION ALL SELECT "table2" as source, id FROM table2 Adrian tells our current query API doesn't allow us to do projections with synthetic columns. On the other hand, we need to extend the current API to give us the entry key anyway, so it would be easy to extend it to give us the name of the cache as well. > > > > Ultimately this implies a query might return the same object X in > > multiple positions in the result list of the query; for example it > > might be the top result according to some criteria but also be the 5th > > result because of how it was indexed in a different case: maybe > > someone will find good use for this "capability" but I see it > > primarily as a source of confusion. > > Curious if this cannot be source of data can/cannot be specified within > the query. > Right, the user should be able to scope a search to a single cache, or maybe to multiple caches, even if there is only one global index. But I think the same object can already be inserted twice in the same cache, only with a different key, so returning duplicates from a query is something the user already has to cope with. > > Finally, if we move the search service as a global component, there > > might be an impact in how we explain security: an ACL filter applied > > on one cache - or the index metadata produced by that cache - might > > not be applied in the same way by an entity being matched through a > > second cache. > > Not least a user's permission to access one cache (or not) will affect > > his results in a rather complex way. 
> > I'll let Tristan comment more on this, but is this really different from > an SQL database where you grant access on individual tables and run a query > involving multiple of them? > The difference would be that in a DB each table will have its own index(es), so they only have to check the permissions once and not for every row. OTOH, if we plan to support key-level permissions, that would require checking the permissions on each search result anyway, so this wouldn't cost us anything. > > > > > I'm wondering if we need to prevent such situations. > > > > Sanne > > > > On 25 February 2014 16:24, Mircea Markus wrote: > >> > >> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: > >> > >>> They can do what they please. Either put multiple types in one basket > or put them in separate caches (one type per cache). But allowing / > recommending is one thing, mandating it is a different story. > >>> > >>> There's no reason to forbid _any_ of these scenarios / mandate one > over the other! There was previously in this thread some suggestion of > mandating the one type per cache usage. -1 for it > >> > >> Agreed. I actually don't see how we can enforce people that declare > Cache not put whatever they want in it. Also makes total > sense for smaller caches as it is easy to set up etc. > >> The debate in this email, the way I understood it, was: are/should > people using multiple caches for storing data? If yes we should consider > querying functionality spreading over multiple caches. > >> > >>> > >>> > >>> > >>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus > wrote: > >>> > >>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard > wrote: > >>> > >>>>> On 24 f?vr. 2014, at 17:39, Mircea Markus > wrote: > >>>>> > >>>>> > >>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard < > emmanuel at hibernate.org> wrote: > >>>>>> > >>>>>> By the way, Mircea, Sanne and I had quite a long discussion about > this one and the idea of one cache per entity. 
It turns out that the right > (as in easy) solution does involve a higher level programming model like > OGM provides. You can simulate it yourself using the Infinispan APIs but it > is just cumbersome. > >>>>> > >>>>> Curious to hear the whole story :-) > >>>>> We cannot mandate all the suers to use OGM though, one of the > reasons being OGM is not platform independent (hotrod). > >>>> > >>>> Then solve all the issues I have raised with a magic wand and come > back to me when you have done it, I'm interested. > >>> > >>> People are going to use infinispan with one cache per entity, because > it makes sense: > >>> - different config (repl/dist | persistent/non-persistent) for > different data types > >>> - have map/reduce tasks running only the Person entires not on Dog as > well, when you want to select (Person) where age > 18 > >>> I don't see a reason to forbid this, on the contrary. The way I see it > the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be > a better abstraction and should be recommended as such for the Java > clients, but ultimately we're a general purpose storage engine that is > available to different platforms as well. > >>> > >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/d2d2f701/attachment-0001.html From mmarkus at redhat.com Wed Feb 26 09:02:20 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 14:02:20 +0000 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: <530DE6B2.2060405@redhat.com> References: <530DE6B2.2060405@redhat.com> Message-ID: On Feb 26, 2014, at 1:05 PM, Tristan Tarrant wrote: > Dear all, > > our JavaDocs currently encompass all of our classes, interfaces, etc > with no clear distinction between public and private API/SPI. I would > like to clearly mark which of our classes/interfaces are public API. 
> Should we: > > - add some decoration / visual cue to such elements to distinguish them > from the internal stuff I think Sanne mentioned (and I think it was Hibernate) the convention of impl sub-packages for all the non-public API. Sounds sensible to me, as people will see the impl in the class name when importing it, and that should raise question marks. Shall we adopt that? > - generate two JavaDoc bundles: one which only contains the public > API/SPIs and one with everything > > Tristan > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 26 09:20:54 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 14:20:54 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> Message-ID: On Feb 26, 2014, at 2:13 PM, Dan Berindei wrote: > > > > On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: > > On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: > > > There also is the opposite problem to be considered, as Emmanuel > > suggested on 11/04/2012: > > you can't forbid the user to store the same object (same type and same > > id) in two different caches, where each Cache might be using different > > indexing options.
> > If the "search service" is a global concept, and you run a query which > matches object X, we'll return it to the user but he won't be able to > figure out from which cache it's being sourced: is that ok? > Can't the user figure that out based on the way the query is built? > I mean the problem is similar with databases: if address is both a table and a column in the USER table, then it's the query (select) that determines where the address is returned from. > > You mean the user should specify the cache name(s) when building the query? yes > > With a database you have to go a bit out of your way to select from more than one table at a time, normally you have just one primary table that you select from and the others are just to help you filter and transform that table. You also have to add some information about the source table yourself if you need it, otherwise the DB won't tell you what table the results are coming from: > > SELECT "table1" as source, id FROM table1 > UNION ALL > SELECT "table2" as source, id FROM table2 > > Adrian tells me our current query API doesn't allow us to do projections with synthetic columns. On the other hand, we need to extend the current API to give us the entry key anyway, so it would be easy to extend it to give us the name of the cache as well. > > > > > > Ultimately this implies a query might return the same object X in > > multiple positions in the result list of the query; for example it > > might be the top result according to some criteria but also be the 5th > > result because of how it was indexed in a different cache: maybe > > someone will find good use for this "capability" but I see it > > primarily as a source of confusion. > > Curious whether the source of the data can/cannot be specified within the query. > > Right, the user should be able to scope a search to a single cache, or maybe to multiple caches, even if there is only one global index.
> > But I think the same object can already be inserted twice in the same cache, only with a different key, so returning duplicates from a query is something the user already has to cope with. > > > > Finally, if we move the search service as a global component, there > > might be an impact in how we explain security: an ACL filter applied > > on one cache - or the index metadata produced by that cache - might > > not be applied in the same way by an entity being matched through a > > second cache. > > Not least a user's permission to access one cache (or not) will affect > > his results in a rather complex way. > > I'll let Tristan comment more on this, but is this really different from an SQL database where you grant access on individual tables and run a query involving multiple of them? > > The difference would be that in a DB each table will have its own index(es), so they only have to check the permissions once and not for every row. > > OTOH, if we plan to support key-level permissions, that would require checking the permissions on each search result anyway, so this wouldn't cost us anything. > > > > > > I'm wondering if we need to prevent such situations. > > > > Sanne > > > > On 25 February 2014 16:24, Mircea Markus wrote: > >> > >> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: > >> > >>> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. > >>> > >>> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it > >> > >> Agreed. I actually don't see how we can enforce people that declare Cache not put whatever they want in it. Also makes total sense for smaller caches as it is easy to set up etc. 
> >> The debate in this email, the way I understood it, was: are/should people using multiple caches for storing data? If yes we should consider querying functionality spreading over multiple caches. > >> > >>> > >>> > >>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: > >>> > >>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: > >>> > >>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: > >>>>> > >>>>> > >>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: > >>>>>> > >>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. > >>>>> > >>>>> Curious to hear the whole story :-) > >>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod). > >>>> > >>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. > >>> > >>> People are going to use infinispan with one cache per entity, because it makes sense: > >>> - different config (repl/dist | persistent/non-persistent) for different data types > >>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 > >>> I don't see a reason to forbid this, on the contrary. The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
> >>> > >>> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ttarrant at redhat.com Wed Feb 26 09:24:45 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Wed, 26 Feb 2014 15:24:45 +0100 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: References: <530DE6B2.2060405@redhat.com> Message-ID: <530DF92D.1020202@redhat.com> On 26/02/2014 15:02, Mircea Markus wrote: > On Feb 26, 2014, at 1:05 PM, Tristan Tarrant wrote: > >> Dear all, >> >> our JavaDocs currently encompass all of our classes, interfaces, etc >> with no clear distinction between public and private API/SPI. I would >> like to clearly mark which of our classes/interfaces are public API. >> Should we: >> >> - add some decoration / visual cue to such elements to distinguish them >> from the internal stuff > I think Sanne mentioned and i think it was Hibernate that has impl sub-packages for all the non-public API. > Sounds sensible to me, as people will see the impl in the class name when importing it, and that should raise question marks. shall we adopt that? That would help, but we would still end up with a lot of noise in the javadocs, for example the list of classes on the left has no separation by package. 
Tristan From anistor at redhat.com Wed Feb 26 09:33:00 2014 From: anistor at redhat.com (Adrian Nistor) Date: Wed, 26 Feb 2014 16:33:00 +0200 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> Message-ID: <530DFB1C.20109@redhat.com> On 02/26/2014 04:20 PM, Mircea Markus wrote: > On Feb 26, 2014, at 2:13 PM, Dan Berindei wrote: > >> >> >> On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: >> >> On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: >> >>> There also is the opposite problem to be considered, as Emmanuel >>> suggested on 11/04/2012: >>> you can't forbid the user to store the same object (same type and same >>> id) in two different caches, where each Cache might be using different >>> indexing options. >>> >>> If the "search service" is a global concept, and you run a query which >>> matches object X, we'll return it to the user but he won't be able to >>> figure out from which cache it's being sourced: is that ok? >> Can't the user figure that out based on the way the query is built? >> I mean the problem is similar with the databases: if address is both a table and an column in the USER table, then it's the query (select) that determines where from the address is returned. >> >> You mean the user should specify the cache name(s) when building the query? > yes Let's say multiple caches are specified when building the query. How can I tell (with current result api) where does the matching entity come from? I still think we should extend the result api in order to provide: 1. 
the key of the entity, 2. the name of the originating cache. The old result api that just gives you an Iterator over the matches should continue to exist because it's more efficient for the cases when the user does not need #1 and #2. > >> With a database you have to go a bit out of your way to select from more than one table at a time, normally you have just one primary table that you select from and the others are just to help you filter and transform that table. You also have to add some information about the source table yourself if you need it, otherwise the DB won't tell you what table the results are coming from: >> >> SELECT "table1" as source, id FROM table1 >> UNION ALL >> SELECT "table2" as source, id FROM table2 >> >> Adrian tells our current query API doesn't allow us to do projections with synthetic columns. On the other hand, we need to extend the current API to give us the entry key anyway, so it would be easy to extend it to give us the name of the cache as well. >> >> >>> Ultimately this implies a query might return the same object X in >>> multiple positions in the result list of the query; for example it >>> might be the top result according to some criteria but also be the 5th >>> result because of how it was indexed in a different case: maybe >>> someone will find good use for this "capability" but I see it >>> primarily as a source of confusion. >> Curious if this cannot be source of data can/cannot be specified within the query. >> >> Right, the user should be able to scope a search to a single cache, or maybe to multiple caches, even if there is only one global index. >> >> But I think the same object can already be inserted twice in the same cache, only with a different key, so returning duplicates from a query is something the user already has to cope with. 
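Adrian's proposal just above - extending the result API to carry (1) the key of the entity and (2) the originating cache name - could be sketched roughly as below. All names here (ResultEntry, cacheName, etc.) are illustrative assumptions, not the actual Infinispan query API:

```java
import java.util.List;

public class QueryResultSketch {

    // What a richer result entry could expose, per the discussion:
    public interface ResultEntry<E> {
        Object key();        // #1: the key under which the entity is stored
        String cacheName();  // #2: which cache the match came from
        E value();
    }

    // Trivial immutable implementation for illustration only.
    public record Entry<E>(Object key, String cacheName, E value)
            implements ResultEntry<E> {}

    // A query spanning several caches could then return the "same" value
    // twice, disambiguated by (cacheName, key) - the duplicate problem
    // debated in this thread becomes explicit rather than confusing.
    public static String describe(List<ResultEntry<String>> results) {
        StringBuilder sb = new StringBuilder();
        for (ResultEntry<String> r : results) {
            sb.append(r.cacheName()).append('/').append(r.key())
              .append('=').append(r.value()).append(';');
        }
        return sb.toString();
    }

    public static List<ResultEntry<String>> demo() {
        return List.<ResultEntry<String>>of(
                new Entry<String>(42, "persons", "Ana"),
                new Entry<String>(42, "employees", "Ana"));
    }
}
```

The plain Iterator-style API would remain untouched for callers who don't need the extra metadata, as Adrian suggests.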
>> >> >>> Finally, if we move the search service as a global component, there >>> might be an impact in how we explain security: an ACL filter applied >>> on one cache - or the index metadata produced by that cache - might >>> not be applied in the same way by an entity being matched through a >>> second cache. >>> Not least a user's permission to access one cache (or not) will affect >>> his results in a rather complex way. >> I'll let Tristan comment more on this, but is this really different from an SQL database where you grant access on individual tables and run a query involving multiple of them? >> >> The difference would be that in a DB each table will have its own index(es), so they only have to check the permissions once and not for every row. >> >> OTOH, if we plan to support key-level permissions, that would require checking the permissions on each search result anyway, so this wouldn't cost us anything. >> >> >>> I'm wondering if we need to prevent such situations. >>> >>> Sanne >>> >>> On 25 February 2014 16:24, Mircea Markus wrote: >>>> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: >>>> >>>>> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. >>>>> >>>>> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it >>>> Agreed. I actually don't see how we can enforce people that declare Cache not put whatever they want in it. Also makes total sense for smaller caches as it is easy to set up etc. >>>> The debate in this email, the way I understood it, was: are/should people using multiple caches for storing data? If yes we should consider querying functionality spreading over multiple caches. 
>>>> >>>>> >>>>> >>>>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: >>>>> >>>>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: >>>>> >>>>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: >>>>>>> >>>>>>> >>>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >>>>>>>> >>>>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >>>>>>> Curious to hear the whole story :-) >>>>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod). >>>>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. >>>>> People are going to use infinispan with one cache per entity, because it makes sense: >>>>> - different config (repl/dist | persistent/non-persistent) for different data types >>>>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 >>>>> I don't see a reason to forbid this, on the contrary. The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>>>>> >>>>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > Cheers, From dan.berindei at gmail.com Wed Feb 26 10:29:26 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 17:29:26 +0200 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: <530DF92D.1020202@redhat.com> References: <530DE6B2.2060405@redhat.com> <530DF92D.1020202@redhat.com> Message-ID: On Wed, Feb 26, 2014 at 4:24 PM, Tristan Tarrant wrote: > On 26/02/2014 15:02, Mircea Markus wrote: > > On Feb 26, 2014, at 1:05 PM, Tristan Tarrant > wrote: > > > >> Dear all, > >> > >> our JavaDocs currently encompass all of our classes, interfaces, etc > >> with no clear distinction between public and private API/SPI. I would > >> like to clearly mark which of our classes/interfaces are public API. > >> Should we: > >> > >> - add some decoration / visual cue to such elements to distinguish them > >> from the internal stuff > > I think Sanne mentioned and i think it was Hibernate that has impl > sub-packages for all the non-public API. > > Sounds sensible to me, as people will see the impl in the class name > when importing it, and that should raise question marks. shall we adopt > that? > That would help, but we would still end up with a lot of noise in the > javadocs, for example the list of classes on the left has no separation > by package. > > If we move all internal classes to .impl sub-packages, it will be quite easy to exclude the .impl packages from javadocs with a bit of maven-javadoc-plugin configuration. I don't think we need to generate javadocs for the internal classes at all, as the sources are easily accessible from any IDE. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/d831716b/attachment-0001.html From mmarkus at redhat.com Wed Feb 26 11:08:17 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 16:08:17 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <530DFB1C.20109@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> <530DFB1C.20109@! redhat.com> Message-ID: <26819DE2-8557-427B-BA3A-2F5BD121DDF5@redhat.com> On Feb 26, 2014, at 2:33 PM, Adrian Nistor wrote: > On 02/26/2014 04:20 PM, Mircea Markus wrote: >> On Feb 26, 2014, at 2:13 PM, Dan Berindei wrote: >> >>> >>> >>> On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: >>> >>> On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: >>> >>>> There also is the opposite problem to be considered, as Emmanuel >>>> suggested on 11/04/2012: >>>> you can't forbid the user to store the same object (same type and same >>>> id) in two different caches, where each Cache might be using different >>>> indexing options. >>>> >>>> If the "search service" is a global concept, and you run a query which >>>> matches object X, we'll return it to the user but he won't be able to >>>> figure out from which cache it's being sourced: is that ok? >>> Can't the user figure that out based on the way the query is built? >>> I mean the problem is similar with the databases: if address is both a table and an column in the USER table, then it's the query (select) that determines where from the address is returned. 
>>> >>> You mean the user should specify the cache name(s) when building the query? >> yes > Let's say multiple caches are specified when building the query. How can > I tell (with current result api) where does the matching entity come > from? I'm not talking about the current API here, just looking for a way to be able to specify the source cache for an object in the result. We should be able to do that through the query, or if the result is an alternative we can consider it. > I still think we should extend the result api in order to provide: > 1. the key of the entity, 2. the name of the originating cache. The old > result api that just gives you an Iterator over the matches > should continue to exist because it's more efficient for the cases when > the user does not need #1 and #2. I wouldn't mind that, but TBH i think we should add it only if users ask for it. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vblagoje at redhat.com Wed Feb 26 11:08:35 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 26 Feb 2014 11:08:35 -0500 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: References: <530DE6B2.2060405@redhat.com> <530DF92D.1020202@redhat.com> Message-ID: <530E1183.3060700@redhat.com> I agree, sounds like a sensible thing to do. But this needs to be planned carefully and when exactly is the good time to do it, soon and have it ready for 7.0.0.Final? On 2/26/2014, 10:29 AM, Dan Berindei wrote: > > > > > If we move all internal classes to .impl sub-packages, it will be > quite easy to exclude the .impl packages from javadocs with a bit of > maven-javadoc-plugin configuration. I don't think we need to generate > javadocs for the internal classes at all, as the sources are easily > accessible from any IDE. 
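Dan's suggestion above - moving internals under .impl sub-packages and then excluding them from the published JavaDoc - could look something like the following maven-javadoc-plugin snippet. This is a sketch; the exclusion pattern would need to be verified against the actual module layout:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <!-- hide internal classes from the published API docs;
         patterns are colon-separated -->
    <excludePackageNames>*.impl:*.impl.*</excludePackageNames>
  </configuration>
</plugin>
```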
> > Cheers > Dan From emmanuel at hibernate.org Wed Feb 26 12:14:49 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 26 Feb 2014 18:14:49 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> Message-ID: <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> On 25 Feb 2014, at 16:08, Mircea Markus wrote: > > On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: > >>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: >>> >>> >>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >>>> >>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >>> >>> Curious to hear the whole story :-) >>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod). >> >> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. > > People are going to use infinispan with one cache per entity, because it makes sense: > - different config (repl/dist | persistent/non-persistent) for different data types > - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 > I don't see a reason to forbid this, on the contrary.
The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well. > I do disagree on your assessment. I did write a whole essay on why I think your view is problematic - I was getting tired of repeating myself ;P https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity To anecdotally answer your specific example, yes, different configs for different entities is an interesting benefit, but it has to outweigh the drawbacks. If you have to do a map/reduce for tasks as simple as age > 18, your system had better be prepared to run gazillions of M/R jobs. I think that Dogs and any domestic animal are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK, and even if it was, it's still M/R with all its drawbacks. To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache. One cache per entity does make sense for APIs that do support what I call connected entities. Hibernate OGM specifically. But please read the wiki page first before commenting. I did spend a lot of time on it https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity Emmanuel From vblagoje at redhat.com Wed Feb 26 14:31:23 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 26 Feb 2014 14:31:23 -0500 Subject: [infinispan-dev] Distributed executors and Future(s) they return Message-ID: <530E410B.7050101@redhat.com> Hey, There is an interesting request from the community to include an Address along with the Future returned for a subtask being executed [1].
I think what this user wants makes sense. We might create a Future sub-interface that has a getAddress method and return an object implementing that interface instead of a plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return, e.g., TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. Any thoughts? Vladimir [1] https://community.jboss.org/thread/237442 From sanne at infinispan.org Wed Feb 26 16:25:08 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 26 Feb 2014 21:25:08 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: <530E410B.7050101@redhat.com> References: <530E410B.7050101@redhat.com> Message-ID: I'm a bit skeptical. It might sound like a sensible request currently, but if you do so you inherently "promise" that tasks are going to be executed on a specific server; AFAIK we promise execution based on data locality, but by maintaining a good level of flexibility you can evolve your system towards smarter load balancing of tasks, failover operations, etc. If you expose execution details, you won't be able to develop any of that in future. To make an example from the database world - it seems the analogy is common these days - it's like running a SELECT statement but wanting to pick which CPU core is going to be used. That would be really odd, as you would take away the option from the scheduler to make an effective choice. Still, this approach might be desirable for a database which doesn't do any smart scheduling. Some of these concerns might be mitigated if you return the Address of where the task *was* executed, after it's done. I still don't think it should be of the user's interest, but at least you would be able to implement rescheduling or failover policies in future.
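Combining Vladimir's TargetedFuture idea with Sanne's caveat - only expose the Address of where the task *was* executed, after completion - could be sketched as below. Names are illustrative, not the actual Infinispan API, and Address is stubbed with a record:

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TargetedFutureSketch {

    // Stand-in for org.infinispan.remoting.transport.Address.
    public record Address(String name) {}

    // The sub-interface discussed in the thread: per Sanne's caveat, the
    // address is only guaranteed to be meaningful once the future is done.
    public interface TargetedFuture<V> extends Future<V> {
        Address getExecutionAddress();
    }

    // Minimal already-completed implementation for demonstration.
    static final class CompletedTargetedFuture<V> implements TargetedFuture<V> {
        private final V value;
        private final Address executedAt;

        CompletedTargetedFuture(V value, Address executedAt) {
            this.value = value;
            this.executedAt = executedAt;
        }

        public Address getExecutionAddress() { return executedAt; }
        public boolean cancel(boolean mayInterruptIfRunning) { return false; }
        public boolean isCancelled() { return false; }
        public boolean isDone() { return true; }
        public V get() { return value; }
        public V get(long timeout, TimeUnit unit) { return value; }
    }

    public static TargetedFuture<Integer> demo() {
        // Pretend the subtask already ran on node-B and produced 42.
        return new CompletedTargetedFuture<>(42, new Address("node-B"));
    }
}
```

Because TargetedFuture still is-a Future, returning it from the existing DistributedExecutorService methods would not break callers that expect a plain Future, which is the compatibility point Vladimir makes.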
Sanne On 26 February 2014 19:31, Vladimir Blagojevic wrote: > Hey, > > There is an interesting request from the community to include an Address along with the Future returned for a subtask being executed [1]. > > I think what this user wants makes sense. We might create a Future sub-interface that has a getAddress method and return an object implementing that interface instead of a plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return, e.g., TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. > > Any thoughts? > > Vladimir > > [1] https://community.jboss.org/thread/237442 > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Wed Feb 26 16:45:13 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 26 Feb 2014 21:45:13 +0000 Subject: [infinispan-dev] Row based security & Queries (Was: Design change in Infinispan Query) Message-ID: To clarify some points raised on the thread "Design change in Infinispan Query", which I don't wish to derail further: The Query engine can actually apply per-entry user access restrictions in an efficient way which doesn't (necessarily) involve checking each result; currently this needs specific user coding, but: # it's not too hard, as Hibernate Search users do it regularly: we provide various helpers and examples. Especially in the book :-) # we are not including a pre-canned strategy as - at least in the case of our Hibernate integration - the details of how people want it done are often exotic. Essentially a typical solution would work with Filters: a filter is a bitset which masks potential results from queries, so it's applied up front, before actual scoring and other more complex match operations.
These bitsets are very suited for filters, and are split on segments so that pre-computed segments related to parts of an index which didn't change can be reused even if the index as a whole is mutating continually. Such a Filter could even encode the response of some external authorization service on a per-document base (slow but effective), or it simply represents user group tokens which are applied as tags on the indexed documents (more efficient as long as role definitions are stable). That said, I'm not suggesting that this should be a priority, but I expect that sometime in the future we could provide a pre-canned strategy to work out of the box with our security extensions, at least for the benefit of remote protocols. So let's keep this in mind while making other design decisions. -- Sanne From dan.berindei at gmail.com Thu Feb 27 05:54:45 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Thu, 27 Feb 2014 12:54:45 +0200 Subject: [infinispan-dev] Row based security & Queries (Was: Design change in Infinispan Query) In-Reply-To: References: Message-ID: Hi Sanne Reading your reply I realized I was wrong in my "evaluation", we could require the user to specify the secure cache(s) he wants to query when building the query and checking that he has read rights on all of them before executing the query, just like a DB would do. And if he doesn't specify any cache, throw an exception if there is any indexed cache that he doesn't have read access to. So we could implement the cache-level security we need now without any performance hit. 
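Sanne's description of filters as bitsets intersected with candidate matches before any scoring can be illustrated with plain java.util.BitSet - a heavily simplified stand-in for what a Lucene Filter does over index segments:

```java
import java.util.BitSet;

// Simplified illustration of the security-filter idea described above:
// a per-user bitset masks candidate documents before scoring happens.
public class AclBitsetFilter {

    // Documents the current user may see (bit i set => doc i is visible).
    private final BitSet allowed;

    public AclBitsetFilter(BitSet allowed) {
        this.allowed = allowed;
    }

    // Intersect raw query matches with the ACL mask - a cheap AND over
    // bitsets, applied before any scoring or complex match operations.
    public BitSet apply(BitSet queryMatches) {
        BitSet visible = (BitSet) queryMatches.clone();
        visible.and(allowed);
        return visible;
    }

    public static BitSet demo() {
        BitSet acl = new BitSet();
        acl.set(0);
        acl.set(2);                      // user may see docs 0 and 2
        BitSet matches = new BitSet();
        matches.set(1);
        matches.set(2);                  // query matched docs 1 and 2
        return new AclBitsetFilter(acl).apply(matches); // only doc 2 survives
    }
}
```

In the real implementation such masks would be computed per index segment, which is what lets unchanged segments reuse pre-computed filter bits as Sanne notes.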
But thanks for the explanation, it sounds like row-level security isn't quite as far-fetched as I was imagining it ;) Cheers Dan On Wed, Feb 26, 2014 at 11:45 PM, Sanne Grinovero wrote: > To clarify some points rised on the thread "Design change in > Infinispan Query", which I don't wish to derail further: > > The Query engine can actually apply per-entry user restriction access > in an efficient way which doesn't involve (necessarily) to check each > result; currently this needs specific user coding but: > # it's not too hard as Hibernate Search users do it regularly: we > provide various helpers and examples. Especially in the book :-) > # is not including a pre-canned strategy as -at least in case of our > Hibernate integration - the details of how people want it done are > often exotic. > > Essentially a typical solution would work with Filters: a filter is a > bitset which masks potential results from queries, so it's applied > upfront actual scoring and other more complex match operations. > These bitsets are very suited for filters, and are split on segments > so that pre-computed segments related to parts of an index which > didn't change can be reused even if the index as a whole is mutating > continually. > Such a Filter could even encode the response of some external > authorization service on a per-document base (slow but effective), or > it simply represents user group tokens which are applied as tags on > the indexed documents (more efficient as long as role definitions are > stable). > > That said, I'm not suggesting that this should be a priority, but I > expect that sometime in the future we could provide a pre-canned > strategy to work out of the box with our security extensions, at least > for the benefit of remote protocols. So let's keep this in mind while > making other design decisions. 
> > -- Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Thu Feb 27 06:59:08 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Feb 2014 11:59:08 +0000 Subject: [infinispan-dev] Row based security & Queries (Was: Design change in Infinispan Query) In-Reply-To: References: Message-ID: On 27 February 2014 10:54, Dan Berindei wrote: > Hi Sanne > > Reading your reply I realized I was wrong in my "evaluation", we could > require the user to specify the secure cache(s) he wants to query when > building the query and checking that he has read rights on all of them > before executing the query, just like a DB would do. And if he doesn't > specify any cache, throw an exception if there is any indexed cache that he > doesn't have read access to. So we could implement the cache-level security > we need now without any performance hit. Right, but when querying indexes, we target an index, not a cache. A user could have access to one cache and not another, and if we go for a shared query engine, the current implementation allows indexes to be shared. You could have a valid situation in which an entry X stored in a Cache A, to which you have access, but also stored in a Cache B, to which you have no access, is retrieved from Cache A (so no security problem) but with its scoring affected by additional metadata which entered the index via Cache B. This wouldn't be a security violation strictly speaking, but it would be highly confusing, as it's often quite complex to figure out why some result is matching. As you say, we could block the query if the user has no access to one of the related caches.
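The guard both sides converge on - blocking a query when the caller lacks read access to any cache feeding a shared index - is a small check. Sketched here with plain collections; the names and the string-based permission model are hypothetical, not the actual Infinispan security API:

```java
import java.util.Set;

public class SharedIndexGuard {

    // Throws if the caller lacks read access to any cache behind the
    // shared index, mirroring what a DB does before a multi-table query.
    public static void checkQueryAllowed(Set<String> cachesInIndex,
                                         Set<String> readableCaches) {
        for (String cache : cachesInIndex) {
            if (!readableCaches.contains(cache)) {
                throw new SecurityException("no read access to cache: " + cache);
            }
        }
    }

    public static boolean demo() {
        try {
            // Index is fed by cacheA and cacheB, but only cacheA is readable.
            checkQueryAllowed(Set.of("cacheA", "cacheB"), Set.of("cacheA"));
            return false; // not reached: cacheB is not readable
        } catch (SecurityException expected) {
            return true;
        }
    }
}
```

The check is done once per query rather than per result, which is the "no performance hit" property Dan points out.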
It could still be puzzling that someone is prevented from retrieving data from a cache to which he does have access, but that's probably easier to explain and document than that shared indexes require the same access permissions on each involved cache. Cheers, Sanne > > But thanks for the explanation, it sounds like row-level security isn't > quite as far-fetched as I was imagining it ;) > > Cheers > Dan > > > > On Wed, Feb 26, 2014 at 11:45 PM, Sanne Grinovero > wrote: >> >> To clarify some points raised on the thread "Design change in >> Infinispan Query", which I don't wish to derail further: >> >> The Query engine can actually apply per-entry user access restrictions >> in an efficient way which doesn't (necessarily) involve checking each >> result; currently this needs specific user coding but: >> # it's not too hard, as Hibernate Search users do it regularly: we >> provide various helpers and examples. Especially in the book :-) >> # we're not including a pre-canned strategy as - at least in the case of our >> Hibernate integration - the details of how people want it done are >> often exotic. >> >> Essentially a typical solution would work with Filters: a filter is a >> bitset which masks potential results from queries, so it's applied >> before actual scoring and other more complex match operations. >> These bitsets are very well suited for filters, and are split by segment >> so that pre-computed segments related to parts of an index which >> didn't change can be reused even if the index as a whole is mutating >> continually. >> Such a Filter could even encode the response of some external >> authorization service on a per-document basis (slow but effective), or >> it could simply represent user group tokens which are applied as tags on >> the indexed documents (more efficient as long as role definitions are >> stable).
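Sanne's filter-as-bitset idea can be sketched in plain Java. This is a simplified stand-in for the Lucene filter machinery he refers to (the class and method names below are invented for illustration, and a real Filter operates per index segment rather than on one global bitset):

```java
import java.util.BitSet;

// Simplified stand-in for a Lucene-style security filter: each document
// carries role tags, and the filter pre-computes a bitset of documents
// visible to a user. The bitset is ANDed with the query's matches before
// any scoring happens, so forbidden documents never reach the result set.
class RoleFilterSketch {

    // Build the "visible documents" bitset for one user role.
    static BitSet visibleDocs(String[][] docRoles, String userRole) {
        BitSet visible = new BitSet(docRoles.length);
        for (int doc = 0; doc < docRoles.length; doc++) {
            for (String role : docRoles[doc]) {
                if (role.equals(userRole)) {
                    visible.set(doc); // document is tagged with the user's role
                    break;
                }
            }
        }
        return visible;
    }

    // Mask the raw query matches with the pre-computed filter.
    static BitSet applyFilter(BitSet queryMatches, BitSet filter) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(filter);
        return result;
    }
}
```

In the real engine the equivalent bitsets are kept per index segment, which is what lets unchanged segments reuse their cached filter even while the index as a whole keeps mutating.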
>> >> That said, I'm not suggesting that this should be a priority, but I >> expect that sometime in the future we could provide a pre-canned >> strategy to work out of the box with our security extensions, at least >> for the benefit of remote protocols. So let's keep this in mind while >> making other design decisions. >> >> -- Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Thu Feb 27 09:23:37 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Thu, 27 Feb 2014 15:23:37 +0100 Subject: [infinispan-dev] On the topic of Map/Reduce and Hadoop Message-ID: Hi, Recently we had an email thread on Map/Reduce and Hadoop's API/mechanisms to do Map/Reduce. I've just finished watching [1], which looks at Hadoop's Java API and then looks at the evolutions and improvements that functional programming has enabled. Some food for thought :) Cheers, [1] http://www.infoq.com/presentations/big-data-functional-programming -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From vblagoje at redhat.com Thu Feb 27 10:28:08 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Thu, 27 Feb 2014 10:28:08 -0500 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: References: <530E410B.7050101@redhat.com> Message-ID: <530F5988.6010305@redhat.com> Hmm, very good points Sanne. Yeah, I think we can have a contract that returns an Address where the task was executed. Cheers, Vladimir On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: > I'm a bit skeptical.
> It might sound like a sensible request currently, but if you do so you > inherently "promise" that tasks are going to be executed on a specific > server; AFAIK we promise execution on data locality, but maintaining a > good level of flexibility you can evolve your system to smarter load > balancing of tasks, failover operations, etc. > If you expose execution details, you won't be able to develop any of > that in future. > > To make an example from the database world - seems the analogy is > common these days - it's like you run a SELECT statement but want to > pick which CPU core is going to be used. That would be really odd, as > you would take away the option from the scheduler to make an effective > choice. > Still, this approach might be desirable for a database which doesn't > do any smart scheduling. > > Some of these concerns might be mitigated if you return the Address of > where the task *was* executed, after it's done. I still don't think it > should be of user's interest but at least you would be able to > implement rescheduling or failover policies in future. > > Sanne > > > On 26 February 2014 19:31, Vladimir Blagojevic wrote: >> Hey, >> >> There is an interesting request from the community to include an Address along with a Future returned for a subtask being executed [1]. >> >> I think what this user wants makes sense. We might create a Future sub-interface that has a getAddress method and we can return an object implementing that interface instead of a plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return i.e. TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. >> >> Any thoughts?
>> >> Vladimir >> >> [1] https://community.jboss.org/thread/237442 >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Thu Feb 27 11:58:32 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 27 Feb 2014 16:58:32 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: <530F5988.6010305@redhat.com> References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> Message-ID: <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic wrote: > Hmm very good points Sanne. Yeah I think we can have a contract that > returns an Address were task was executed. > > > Cheers, > Vladimir > On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: >> I'm a bit skeptical. >> It might sound a sensible request currently, but if you do so you >> inherently "promise" that tasks are going to be executed on a specific >> server; AFAIK we promise execution on data locality, We allow execution to be bound on a specific address: http://goo.gl/H5qTJZ I see your point with data locality vs. specific server. >> but maintaining a >> good level of flexibility you can evolve your system to smarter load >> balancing of tasks, failover operations, etc.. >> If you expose execution details, you won't be able to develop any of >> that in future. >> >> To make an example from the database world - seems the analogy is >> common these days - it's like you run a SELECT statement but want to >> pick which CPU core is going to be used. That would be really odd, as >> you would take away the option from the scheduler to make an effective >> choice. 
>> Still, this approach might be desirable for a database which doesn't >> do any smart scheduling. >> >> Some of these concerns might be mitigated if you return the Address of >> where the task *was* executed, after it's done. I still don't think it >> should be of user's interest but at least you would be able to >> implement rescheduling or failover policies in future. We already have failure policies in place, but the user only needs to audit the failure, not to failover. If users are interested on knowing the failures, another way of doing it is the current future, in the Future.get to throw a custom exception (subclass of ExecutionException) containing as information where the execution failed. >> >> Sanne >> >> >> On 26 February 2014 19:31, Vladimir Blagojevic wrote: >>> Hey, >>> >>> There is an interesting request from community to include an Address along with a Future returned for a subtask being executed [1]. >>> >>> I think it makes sense what this user wants. We might create Future sub interface that has getAddress method and we can return an object implementing that interface instead of plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return i.e TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. >>> >>> Any thoughts? 
>>> >>> Vladimir >>> >>> [1] https://community.jboss.org/thread/237442 >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Thu Feb 27 13:13:20 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Feb 2014 18:13:20 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> Message-ID: On 27 February 2014 16:58, Mircea Markus wrote: > > On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic wrote: > >> Hmm very good points Sanne. Yeah I think we can have a contract that >> returns an Address were task was executed. >> >> >> Cheers, >> Vladimir >> On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: >>> I'm a bit skeptical. >>> It might sound a sensible request currently, but if you do so you >>> inherently "promise" that tasks are going to be executed on a specific >>> server; AFAIK we promise execution on data locality, > > We allow execution to be bound on a specific address: http://goo.gl/H5qTJZ I know but I think that smells :) Stuff like _Address_ should be an implementation detail. Maybe one day you'll see why and we'll deprecate it ;-) > I see your point with data locality vs. specific server. 
> > >>> but maintaining a >>> good level of flexibility you can evolve your system to smarter load >>> balancing of tasks, failover operations, etc.. >>> If you expose execution details, you won't be able to develop any of >>> that in future. >>> >>> To make an example from the database world - seems the analogy is >>> common these days - it's like you run a SELECT statement but want to >>> pick which CPU core is going to be used. That would be really odd, as >>> you would take away the option from the scheduler to make an effective >>> choice. >>> Still, this approach might be desirable for a database which doesn't >>> do any smart scheduling. >>> >>> Some of these concerns might be mitigated if you return the Address of >>> where the task *was* executed, after it's done. I still don't think it >>> should be of user's interest but at least you would be able to >>> implement rescheduling or failover policies in future. > > We already have failure policies in place, but the user only needs to audit the failure, not to failover. If users are interested on knowing the failures, another way of doing it is the current future, in the Future.get to throw a custom exception (subclass of ExecutionException) containing as information where the execution failed. Right, but the question is whether the user really wants to know about the intermediate failures. I suspect that if someone asks for this, he's actually wishing to implement his own failover policy & monitoring. From the point of view of someone running a database query, I think the user would love to ignore issues altogether, but the real world forces him to at least consider that the whole operation might fail. Sending him specific notifications or exceptions about something that was successful but was actually run on a different resource set than what was originally planned is, I'd say, an exotic request.
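For reference, the exception-based alternative Mircea floats in the quoted paragraph above (Future.get throwing an ExecutionException subclass that carries the failing node) might look roughly like this. The class name and the stubbed Address type are invented for the sketch, not an existing Infinispan API:

```java
import java.util.concurrent.ExecutionException;

// Stand-in for org.infinispan.remoting.transport.Address, to keep the
// sketch self-contained (the real type is an Infinispan interface).
class NodeAddress {
    final String name;
    NodeAddress(String name) { this.name = name; }
}

// Hypothetical ExecutionException subclass: Future.get() would throw it
// when a distributed task fails, carrying the node where it failed so the
// caller can audit the failure.
class RemoteTaskExecutionException extends ExecutionException {
    private final NodeAddress failedNode;

    RemoteTaskExecutionException(String message, Throwable cause, NodeAddress failedNode) {
        super(message, cause);
        this.failedNode = failedNode;
    }

    NodeAddress getFailedNode() {
        return failedNode;
    }
}
```

A caller would catch this around Future.get() and read getFailedNode() purely for auditing, which is the limited use case Mircea describes.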
I like the idea of providing additional information in a Future subtype, but I don't think you should throw it on a get() operation. You could simply add getters to the FutureExtended to retrieve things like an execution plan history, a trace of intermediate failures, etc. Sanne > >>> >>> Sanne >>> >>> On 26 February 2014 19:31, Vladimir Blagojevic wrote: >>>> Hey, >>>> >>>> There is an interesting request from community to include an Address along with a Future returned for a subtask being executed [1]. >>>> >>>> I think it makes sense what this user wants. We might create Future sub interface that has getAddress method and we can return an object implementing that interface instead of plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return i.e TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. >>>> >>>> Any thoughts?
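The getter-based shape suggested here could look something like the following sketch. TargetedFuture is the name floated in the thread; the particular getters, the stubbed Address type, and the trivial implementation are illustrative assumptions rather than an actual API:

```java
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Stand-in for org.infinispan.remoting.transport.Address.
interface ClusterAddress { String name(); }

// Sketch of the Future subtype discussed in the thread: execution metadata
// is exposed through extra getters instead of being thrown from get().
interface TargetedFuture<V> extends Future<V> {
    ClusterAddress getExecutionAddress();           // node that produced the result
    List<ClusterAddress> getIntermediateFailures(); // nodes tried before success, if any
}

// Minimal already-completed implementation, for illustration only.
class CompletedTargetedFuture<V> implements TargetedFuture<V> {
    private final V value;
    private final ClusterAddress executedOn;
    private final List<ClusterAddress> failures;

    CompletedTargetedFuture(V value, ClusterAddress executedOn, List<ClusterAddress> failures) {
        this.value = value;
        this.executedOn = executedOn;
        this.failures = failures;
    }

    public ClusterAddress getExecutionAddress() { return executedOn; }
    public List<ClusterAddress> getIntermediateFailures() { return failures; }
    public boolean cancel(boolean mayInterruptIfRunning) { return false; }
    public boolean isCancelled() { return false; }
    public boolean isDone() { return true; }
    public V get() { return value; }
    public V get(long timeout, TimeUnit unit) { return value; }
}
```

Since the metadata only travels through getters, a get() call stays a plain result retrieval, and callers who don't care about placement can ignore the extra methods entirely.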
[infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> Message-ID: On Thu, Feb 27, 2014 at 8:13 PM, Sanne Grinovero wrote: > On 27 February 2014 16:58, Mircea Markus wrote: > > > > On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic > wrote: > > > >> Hmm very good points Sanne. Yeah I think we can have a contract that > >> returns an Address were task was executed. > >> > >> > >> Cheers, > >> Vladimir > >> On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: > >>> I'm a bit skeptical. > >>> It might sound a sensible request currently, but if you do so you > >>> inherently "promise" that tasks are going to be executed on a specific > >>> server; AFAIK we promise execution on data locality, > > > > We allow execution to be bound on a specific address: > http://goo.gl/H5qTJZ > > I know but I think that smells :) > Stuff like _Address_ should be an implementation detail. Maybe one day > you'll see why and we'll deprecate it ;-) > > > I see your point with data locality vs. specific server. > > > > > >>> but maintaining a > >>> good level of flexibility you can evolve your system to smarter load > >>> balancing of tasks, failover operations, etc.. > >>> If you expose execution details, you won't be able to develop any of > >>> that in future. > >>> > >>> To make an example from the database world - seems the analogy is > >>> common these days - it's like you run a SELECT statement but want to > >>> pick which CPU core is going to be used. That would be really odd, as > >>> you would take away the option from the scheduler to make an effective > >>> choice. > >>> Still, this approach might be desirable for a database which doesn't > >>> do any smart scheduling. > >>> > >>> Some of these concerns might be mitigated if you return the Address of > >>> where the task *was* executed, after it's done. 
I still don't think it > >>> should be of user's interest but at least you would be able to > >>> implement rescheduling or failover policies in future. > > > > We already have failure policies in place, but the user only needs to > audit the failure, not to failover. If users are interested on knowing the > failures, another way of doing it is the current future, in the Future.get > to throw a custom exception (subclass of ExecutionException) containing as > information where the execution failed. > > Right, but the question is if the user really wants to know the > intermediate failures? I suspect that if someone asks for this, he's > actually wishing to implement his own failower policy & monitoring. > >From the point of view of someone running a database query, I think > the user would love to ignore issues altogether, but the real world > forces him to at least consider that the whole operation might fail. > Sending him specific notifications or exceptions of something that was > succesfull but was actually run on a different resource set than what > was originally planned is I'd say an exotic request. > I don't think the user was after the address of the "real" executing node, I believe he just wanted a way to map each Future to the target address doing a submitEverywhere(task). > > I like the idea of providing additional information in a Future > subtype, but I don't think you should throw it on a get() operation. > You could simply add getters to the FutureExtended to retrieve like an > execution plan history, a trace of intermediate failures, etc. > > That sounds good, but we shouldn't limit that to just the result of one distributed task execution. We could take the opportunity to return something other than List from submitEverywhere(task) as well, doing a foreach to get all the results is a bit tedious. 
And even if we'd like users to treat the results from different nodes as interchangeable, sometimes they're not, so a way of getting the result from one particular node would also be useful. > Sanne > > > > >>> > >>> Sanne > >>> > >>> > >>> On 26 February 2014 19:31, Vladimir Blagojevic > wrote: > >>>> Hey, > >>>> > >>>> There is an interesting request from community to include an Address > along with a Future returned for a subtask being executed [1]. > >>>> > >>>> I think it makes sense what this user wants. We might create Future > sub interface that has getAddress method and we can return an object > implementing that interface instead of plain Future. In some new major > release we can officially change the signature of these > DistributedExecutorService methods to return i.e TargetedFuture - it would > not break existing clients. Maybe even make TargetedFuture extend > NotifyingFuture. > >>>> > >>>> Any thoughts? > >>>> > >>>> Vladimir > >>>> > >>>> [1] https://community.jboss.org/thread/237442 > >>>> > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> infinispan-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > Cheers, > > -- > > Mircea Markus > > Infinispan lead (www.infinispan.org) > > > > > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org 
> https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140227/df2315e6/attachment-0001.html From sanne at infinispan.org Thu Feb 27 14:03:58 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Feb 2014 19:03:58 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> Message-ID: On 27 February 2014 18:40, Dan Berindei wrote: > > > > On Thu, Feb 27, 2014 at 8:13 PM, Sanne Grinovero > wrote: >> >> On 27 February 2014 16:58, Mircea Markus wrote: >> > >> > On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic >> > wrote: >> > >> >> Hmm very good points Sanne. Yeah I think we can have a contract that >> >> returns an Address were task was executed. >> >> >> >> >> >> Cheers, >> >> Vladimir >> >> On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: >> >>> I'm a bit skeptical. >> >>> It might sound a sensible request currently, but if you do so you >> >>> inherently "promise" that tasks are going to be executed on a specific >> >>> server; AFAIK we promise execution on data locality, >> > >> > We allow execution to be bound on a specific address: >> > http://goo.gl/H5qTJZ >> >> I know but I think that smells :) >> Stuff like _Address_ should be an implementation detail. Maybe one day >> you'll see why and we'll deprecate it ;-) >> >> > I see your point with data locality vs. specific server. >> > >> > >> >>> but maintaining a >> >>> good level of flexibility you can evolve your system to smarter load >> >>> balancing of tasks, failover operations, etc.. >> >>> If you expose execution details, you won't be able to develop any of >> >>> that in future. 
>> >>> >> >>> To make an example from the database world - seems the analogy is >> >>> common these days - it's like you run a SELECT statement but want to >> >>> pick which CPU core is going to be used. That would be really odd, as >> >>> you would take away the option from the scheduler to make an effective >> >>> choice. >> >>> Still, this approach might be desirable for a database which doesn't >> >>> do any smart scheduling. >> >>> >> >>> Some of these concerns might be mitigated if you return the Address of >> >>> where the task *was* executed, after it's done. I still don't think it >> >>> should be of user's interest but at least you would be able to >> >>> implement rescheduling or failover policies in future. >> > >> > We already have failure policies in place, but the user only needs to >> > audit the failure, not to failover. If users are interested on knowing the >> > failures, another way of doing it is the current future, in the Future.get >> > to throw a custom exception (subclass of ExecutionException) containing as >> > information where the execution failed. >> >> Right, but the question is if the user really wants to know the >> intermediate failures? I suspect that if someone asks for this, he's >> actually wishing to implement his own failower policy & monitoring. >> >From the point of view of someone running a database query, I think >> the user would love to ignore issues altogether, but the real world >> forces him to at least consider that the whole operation might fail. >> Sending him specific notifications or exceptions of something that was >> succesfull but was actually run on a different resource set than what >> was originally planned is I'd say an exotic request. > > > I don't think the user was after the address of the "real" executing node, I > believe he just wanted a way to map each Future to the target address doing > a submitEverywhere(task). 
> >> >> >> I like the idea of providing additional information in a Future >> subtype, but I don't think you should throw it on a get() operation. >> You could simply add getters to the FutureExtended to retrieve like an >> execution plan history, a trace of intermediate failures, etc. >> > > That sounds good, but we shouldn't limit that to just the result of one > distributed task execution. We could take the opportunity to return > something other than List from submitEverywhere(task) as well, doing > a foreach to get all the results is a bit tedious. And even if we'd like > users to treat the results from different nodes as interchangeable, > sometimes they're not, so a way of getting the result from one particular > node would also be useful. +1 > > >> >> Sanne >> >> > >> >>> >> >>> Sanne >> >>> >> >>> >> >>> On 26 February 2014 19:31, Vladimir Blagojevic >> >>> wrote: >> >>>> Hey, >> >>>> >> >>>> There is an interesting request from community to include an Address >> >>>> along with a Future returned for a subtask being executed [1]. >> >>>> >> >>>> I think it makes sense what this user wants. We might create Future >> >>>> sub interface that has getAddress method and we can return an object >> >>>> implementing that interface instead of plain Future. In some new major >> >>>> release we can officially change the signature of these >> >>>> DistributedExecutorService methods to return i.e TargetedFuture - it would >> >>>> not break existing clients. Maybe even make TargetedFuture extend >> >>>> NotifyingFuture. >> >>>> >> >>>> Any thoughts? 
>> >>>> >> >>>> Vladimir >> >>>> >> >>>> [1] https://community.jboss.org/thread/237442 >> >>>> >> >>>> _______________________________________________ >> >>>> infinispan-dev mailing list >> >>>> infinispan-dev at lists.jboss.org >> >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >>> _______________________________________________ >> >>> infinispan-dev mailing list >> >>> infinispan-dev at lists.jboss.org >> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > >> > Cheers, >> > -- >> > Mircea Markus >> > Infinispan lead (www.infinispan.org) >> > >> > >> > >> > >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From manik at infinispan.org Thu Feb 27 16:45:19 2014 From: manik at infinispan.org (Manik Surtani) Date: Thu, 27 Feb 2014 13:45:19 -0800 Subject: [infinispan-dev] Git repo very large? Message-ID: Hi guys. Why's the git repo over 100MB in size for a fresh checkout? Most of this seems to be consumed by git objects: ~/Code/infinispan/.git GIT_DIR! pwd /Users/manik/Code/infinispan/.git ~/Code/infinispan/.git GIT_DIR! du -hs . 54M . Perhaps we added some large files at some point and then removed them? If that is the case we'd need to clean up history as well. 
Pls have a look at http://rtyley.github.io/bfg-repo-cleaner/ and http://stackoverflow.com/questions/6884331/git-repo-still-huge-after-large-files-removed-from-repository-history... the repo shouldn't be more than 20 or 30 MB. - M -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140227/ab31f5d2/attachment.html From dan.berindei at gmail.com Fri Feb 28 06:57:21 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 28 Feb 2014 13:57:21 +0200 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha1 Message-ID: Dear Infinispan community, We're proud to announce the first Alpha release of Infinispan 7.0.0. This release adds several new features: - Support for clustered listeners. One of the limitations of Infinispan's distributed mode used to be that listeners could only receive events for cache modifications on their own node. That's no longer the case, and it paves the way for a long-requested feature: HotRod listeners. - Map/Reduce tasks can now execute the mapper/combiner/reducer on multiple threads. Stay tuned for more Map/Reduce improvements in the near future. - The first essential component of cache security has been added, which will be the building block for remote protocol authentication and authorization. - Improved OSGi support in the HotRod Java client. The core components are also getting into shape for OSGi; expect more on this front in the next release. As you can see, many of the new features are stepping stones for bigger things yet to come. Feel free to join us and shape the future releases on our forums, our mailing lists, or our #infinispan IRC channel. For a complete list of features and bug fixes included in this release, please refer to the release notes. Visit our downloads section to find the latest release. Thanks to everyone for their involvement and contribution! Happy hacking!
Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140228/fb5993c7/attachment.html From vblagoje at redhat.com Fri Feb 28 10:30:52 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 28 Feb 2014 10:30:52 -0500 Subject: [infinispan-dev] On the topic of Map/Reduce and Hadoop In-Reply-To: References: Message-ID: <5310ABAC.5040806@redhat.com> Scala propaganda! :-) Thanks for sharing! On 2/27/2014, 9:23 AM, Galder Zamarreño wrote: > Hi, > > Recently we had an email thread on Map/Reduce and Hadoop's API/mechanisms to do Map/Reduce. > > I've just finished watching [1], which looks at Hadoop's Java API and then looks at the evolutions and improvements that functional programming has enabled. > > Some food for thought :) > > Cheers, > > [1] http://www.infoq.com/presentations/big-data-functional-programming > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ben.cotton at ALUMNI.RUTGERS.EDU Fri Feb 28 13:16:24 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 28 Feb 2014 10:16:24 -0800 (PST) Subject: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions Message-ID: <1393611384264-4028925.post@n3.nabble.com> Hi Mircea, Manik, Bela, et al. I want to muse more publicly on this SUBJ line. Here now, then maybe in the ISPN /user/ forum, then maybe JSR-347 provider wide. I know we had a semi-private (Bela led) exchange, but I want to be more public with this conversation. Long post again, sorry. This is just an open musing. I realize this musing should not expect to be accommodated by any "oh, we got to do this in ISPN/JGRPs now!" response ...
there is absolutely only the most infrequent use-case that would /today/ be served by addressing this musing ... but tomorrow that /will/ be a different story. Questions: Does the concept of ISPN/JGRPs transport between "Cluster" nodes currently depend on OSI transport layer sockets' participation? In other words, if all the nodes on my "Cluster" have locality=127.0.0.1, is ISPN/JGRPs accommodating enough to use a native OS IPC choice as an intra-node transport? Or, is it true that my transport choices are always limited to just {TCP,UDP} -- independent of the participating nodes' locality (and that I am thus forced to go over an OSI loopback)? If my transport choices are limited to just {TCP,UDP} for all node localities, then I might ask that you consider additional upcoming modern Java transport options. With the ambitions of upcoming OpenJDK JEPs, which will make mainstream an API capability that today is only available via sun.misc.Unsafe, Java will soon have "more complete" transport options that will include all of { TCP, UDP, RDMA/SDP, IPC }. Some examples of upcoming accommodating providers: 1. RDMA/SDP: via Infiniband VERBS (works today in JDK 7 on OSI physical layer IB NICs, does not work over Ethernet) 2. IPC via OpenHFT's SHM as IPC solution (will work this year) Again, I realize that these transport choices are useful today only in a very rare use case. However, should these transports be in your offering to ISPN/JGRPs customers, then ISPN/JGRPs becomes -- like all of Java has become in recent years -- increasingly more attractive to /all/ HPC Linux supercomputing use cases (not just ours). -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925.html Sent from the Infinispan Developer List mailing list archive at Nabble.com.
From mmarkus at redhat.com  Fri Feb 28 16:14:57 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Fri, 28 Feb 2014 21:14:57 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
Message-ID: 

On Feb 26, 2014, at 5:14 PM, Emmanuel Bernard wrote:
>
> On 25 Feb 2014, at 16:08, Mircea Markus wrote:
>>
>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote:
>>
>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
>>>>
>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>>>>
>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>>>
>>>> Curious to hear the whole story :-)
>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod).
>>>
>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>>
>> People are going to use infinispan with one cache per entity, because it makes sense:
>> - different config (repl/dist | persistent/non-persistent) for different data types
>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
>> I don't see a reason to forbid this, on the contrary.
>> The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>
> I do disagree on your assessment.
> I did write a whole essay on why I think your view is problematic - I was getting tired of repeating myself ;P
> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity

Thanks for writing this up, it is a good taxonomy of data storage schemes and querying.

> To anecdotally answer your specific example, yes different configs for different entities is an interesting benefit but it has to outweigh the drawbacks.

Using a single cache for all the types is not practical at all :-) Just to expand my idea, people prefer using different caches for many reasons:
- security: the Accounts cache has different security requirements than the News cache
- data consistency: News is a non-transactional cache, Accounts require pessimistic XA transactions
- expiry: expire last year's news from the system. Not the same for Accounts
- availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache
- logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though.

> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.

I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-fold:
- performance: you iterate over data that is not related to your query.
- programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.

> I think that Dogs and any domestic animals are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK and even if it was, it's still M/R with all its drawbacks.
> To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache.

I see where you come from, but I don't think requiring people to use a single cache for all the entities is an option. Besides a natural logical separation, different data has different storage requirements: security, access patterns, consistency, durability, availability etc. For most of the non-trivial use cases, using a single cache just won't do.

> One cache per entity does make sense for APIs that do support what I call connected entities. Hibernate OGM specifically.

OGM does a great job covering this, but it is very specific: Java only and OOP - our C/S mode, Hot Rod specifically, is language independent and not OOP. Also I would like to comment on the following statements:

"I believe a cache API and Hot Rod are well suited to address up to the self-contained object graph use case with a couple of relations maintained manually by the application but that cannot be queried. For the connected entities use case, only a high level paradigm is suited, like JPA."

I don't think storing object graphs should be under scrutiny here: Infinispan C/S mode (and that's where most of the client focus is BTW) has a schema (protobuf) that does not support object graphs. I also think expecting people to use multiple caches for multiple data types is a solid assumption to start from.
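[To make the single-cache drawback concrete, here is a plain-Java sketch of the "who is over 18" job discussed above. This is not Infinispan's actual Map/Reduce API, and the Person/Dog classes and field names are made up for illustration; the point is only that with mixed types in one cache, the scan has to touch and type-check every entry, including the unrelated Dog ones.]

```java
import java.util.*;

// Plain-Java illustration (NOT Infinispan's MapReduceTask API): scanning a
// mixed-type "cache" for Persons over 18. Person and Dog are hypothetical.
public class AdultsJob {
    static class Person {
        final String name; final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }
    static class Dog {
        final String name;
        Dog(String name) { this.name = name; }
    }

    // "Map" phase: every entry is visited, even entries of unrelated types.
    static List<String> adults(Map<String, Object> cache) {
        List<String> out = new ArrayList<>();
        for (Object v : cache.values()) {
            if (v instanceof Person) {          // wasted type check on each Dog entry
                Person p = (Person) v;
                if (p.age > 18) out.add(p.name); // the actual predicate
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("p1", new Person("alice", 30));
        cache.put("p2", new Person("bob", 10));
        cache.put("d1", new Dog("rex"));        // unrelated type, still scanned
        System.out.println(adults(cache));      // prints [alice]
    }
}
```

With one cache per entity, `adults` would take a `Map<String, Person>` and the `instanceof` filtering (and the compile-time dependency on Dog) disappears.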
And here's me speculating: these data types have logical relations between them, so people will ask for querying. In order to run queries on multiple data types, you can either merge them together (your suggestion) or support some sort of new cross-cache indexing/querying/API. X-cache querying is more flexible and less restraining than merging data, but from what I understand from you it has certain implementation challenges. There's no pressure to take a decision now around supporting queries spanning multiple caches - just something to keep an eye on when dealing with use cases/users. ATM merging data is the only solution available, let's wait and see if people ask for more.

> But please read the wiki page first before commenting. I did spend a lot of time on it
> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity

I do read your comments and I really appreciate your feedback. We come from slightly different worlds and look at things from different angles, but discussions like this raise many good points.
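[The per-cache configuration argument above can be sketched in an Infinispan 6-era XML configuration. Element and attribute names are abridged from that schema and the cache names and values are made up; treat this as a sketch, not a complete or validated configuration.]

```xml
<infinispan xmlns="urn:infinispan:config:6.0">
   <!-- News: non-transactional, entries expire after a day -->
   <namedCache name="news">
      <expiration lifespan="86400000"/>
   </namedCache>
   <!-- Accounts: pessimistic transactions; cross-site backup and security
        would also be configured per cache here (omitted for brevity) -->
   <namedCache name="accounts">
      <transaction transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
   </namedCache>
</infinispan>
```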
>
> Emmanuel
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com  Fri Feb 28 16:17:43 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Fri, 28 Feb 2014 21:17:43 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: 
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
Message-ID: 

Added a correction:

On Feb 28, 2014, at 9:14 PM, Mircea Markus wrote:
>
>>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
>>>>>
>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>>>>>
>>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>>>>
>>>>> Curious to hear the whole story :-)
>>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod).
>>>>
>>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>>>
>>> People are going to use infinispan with one cache per entity, because it makes sense:
>>> - different config (repl/dist | persistent/non-persistent) for different data types
>>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
>>> I don't see a reason to forbid this, on the contrary. The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>>
>> I do disagree on your assessment.
>> I did write a whole essay on why I think your view is problematic - I was getting tired of repeating myself ;P
>> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity
>
> Thanks for writing this up, it is a good taxonomy of data storage schemes and querying.
>
>> To anecdotally answer your specific example, yes different configs for different entities is an interesting benefit but it has to outweigh the drawbacks.
>
> Using a single cache for all the types is practical at all :-) Just to expand my idea, people prefer using different caches for many reasons:
                                           ^NOT
> - security: the Accounts cache has different security requirements than the News cache
> - data consistency: News is a non-transactional cache, Accounts require pessimistic XA transactions
> - expiry: expire last year's news from the system. Not the same for Accounts
> - availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache
> - logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though.
>
>> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.
>
> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-fold:
> - performance: you iterate over data that is not related to your query.
> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.
>
>> I think that Dogs and any domestic animals are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK and even if it was, it's still M/R with all its drawbacks.
>> To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache.
>
> I see where you come from, but I don't think requiring people to use a single cache for all the entities is an option. Besides a natural logical separation, different data has different storage requirements: security, access patterns, consistency, durability, availability etc. For most of the non-trivial use cases, using a single cache just won't do.
>
>> One cache per entity does make sense for APIs that do support what I call connected entities. Hibernate OGM specifically.
>
> OGM does a great job covering this, but it is very specific: Java only and OOP - our C/S mode, Hot Rod specifically, is language independent and not OOP. Also I would like to comment on the following statements:
> "I believe a cache API and Hot Rod are well suited to address up to the self-contained object graph use case with a couple of relations maintained manually by the application but that cannot be queried.
For the connected entities use case, only a high level paradigm is suited, like JPA."
>
> I don't think storing object graphs should be under scrutiny here: Infinispan C/S mode (and that's where most of the client focus is BTW) has a schema (protobuf) that does not support object graphs. I also think expecting people to use multiple caches for multiple data types is a solid assumption to start from. And here's me speculating: these data types have logical relations between them, so people will ask for querying. In order to run queries on multiple data types, you can either merge them together (your suggestion) or support some sort of new cross-cache indexing/querying/API. X-cache querying is more flexible and less restraining than merging data, but from what I understand from you it has certain implementation challenges. There's no pressure to take a decision now around supporting queries spanning multiple caches - just something to keep an eye on when dealing with use cases/users. ATM merging data is the only solution available, let's wait and see if people ask for more.
>
>> But please read the wiki page first before commenting. I did spend a lot of time on it
>> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity
>
> I do read your comments and I really appreciate your feedback. We come from slightly different worlds and look at things from different angles, but discussions like this raise many good points.
>
>> Emmanuel
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)