From bban at redhat.com Sun Feb 2 05:33:51 2014 From: bban at redhat.com (Bela Ban) Date: Sun, 02 Feb 2014 11:33:51 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> Message-ID: <52EE1F0F.40103@redhat.com> At the JGroups level, ASYNC generates *less* traffic than SYNC. So if you do sync under the cover and use a future to make it async at the API level, you're incurring more overhead, namely the messages sent back as responses. Not sure about the Infinispan async API, but I'd assume this would also use more threads. On 31/01/14 08:08, Galder Zamarreño wrote: > Hi all, > > The following came to my mind yesterday: I think we should ditch > ASYNC modes for DIST/REPL/INV and our async cache store > functionality. > > Instead, whoever wants to store something asynchronously should use > asynchronous methods, i.e. call putAsync. So, this would mean that > when you call put(), it's always sync. This would reduce the > complexity and configuration of our code base, without affecting our > functionality, and it would make things more logical IMO. > > WDYT? > > Cheers, -- Galder Zamarreño galder at redhat.com twitter.com/galderz > > Project Lead, Escalante http://escalante.io > > Engineer, Infinispan http://infinispan.org > > > _______________________________________________ infinispan-dev > mailing list infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Sun Feb 2 05:35:15 2014 From: bban at redhat.com (Bela Ban) Date: Sun, 02 Feb 2014 11:35:15 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? 
In-Reply-To: <52EB9889.9070800@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> Message-ID: <52EE1F63.5050606@redhat.com> On 31/01/14 13:35, Radim Vansa wrote: > Worth to note that Infinispan does not have true async operation - > executing synchronous request in another threadpool is rather simplistic > solution that has serious drawbacks (I can imagine a situation where I'd > do 100 async gets in parallel, but this would drain the whole threadpool). +1000, I should have read the entire thread before replying... :-) > > Implementing that would require serious changes in all interceptors, > because you wouldn't be able to call > > visitWhateverCommand(command) { > /* do something */ > try { > invokeNextInterceptor(command); > } finally { > /* do another stuff */ > } > } > > - you'd have to put all local state prior to invoking next interceptor > to context. And you'd need twice as many methods, because now the code > would explicitly traverse interceptor stack in both directions. > > Still, I believe that this may be something to consider/plan for future. > > And then, yes, you'd need just > > put(key, value) { > future = putAsync(key, value); > return sync ? future.get() : null; > } > > Radim > > On 01/31/2014 11:48 AM, Tristan Tarrant wrote: >> Couldn't this be handled higher up in our implementation then ? >> >> If I enable an async mode, all puts / gets become putAsync/getAsync >> transparently to both the application and to the state transfer. >> >> Tristan >> >> On 01/31/2014 08:32 AM, Dennis Reed wrote: >>> It would be a loss of functionality. >>> >>> As a common example, the AS web session replication cache is configured >>> for ASYNC by default, for performance reasons. >>> But it can be changed to SYNC to guarantee that when the request >>> finishes that the session was replicated. 
>>> >>> That wouldn't be possible if you could no longer switch between >>> ASYNC/SYNC with just a configuration change. >>> >>> -Dennis >>> >>> On 01/31/2014 01:08 AM, Galder Zamarreño wrote: >>>> Hi all, >>>> >>>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >>>> >>>> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>>> >>>> WDYT? >>>> >>>> Cheers, >>>> -- >>>> Galder Zamarreño >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > -- Bela Ban, JGroups lead (http://www.jgroups.org) From bban at redhat.com Sun Feb 2 05:42:40 2014 From: bban at redhat.com (Bela Ban) Date: Sun, 02 Feb 2014 11:42:40 +0100 Subject: [infinispan-dev] Kryo performance (Was: reusing infinispan's marshalling) In-Reply-To: References: Message-ID: <52EE2120.2090806@redhat.com> I recently had a very bad experience with Kryo. A JGroups user used Kryo to marshal data types into byte buffers which he then broadcast around using JGroups (details in [1]). 
Turns out the culprit was Kryo temporarily flipping bits in an already marshalled buffer passed to JGroups. Of course retransmission would then cause a corrupted buffer to be sent. The solution was to copy the buffer, which forfeits the advantages of using Kryo. Not being an expert on Kryo, perhaps this could be done differently, e.g. by synchronizing around a buffer... [1] https://issues.jboss.org/browse/JGRP-1718 On 31/01/14 17:59, Sanne Grinovero wrote: > Changing the subject, as Adrian will need a reply to his (more > important) question. > > I don't think we should go shopping for different marshaller > implementations, especially given other priorities. > > I've been keeping an eye on Kryo since a while and it looks very good > indeed, but JBMarshaller is serving us pretty well and I'm loving its > reliability. > > If we need more speed in this area, I'd rather see us perform some > very accurate benchmark development and try to understand why Kryo is > faster than JBM (if it really is), and potentially improve JBM. > For example as I've already suggested, it's using an internal > identityMap to detect graphs, and often we might not need that, or > also it would be nice to refactor it to write to an existing byte > stream rather than having it allocate internal buffers, and finally we > might want a "stateless edition" so to get rid of need for pooling of > JBMar instances. > > -- Sanne > > > > On 31 January 2014 16:29, Vladimir Blagojevic wrote: >> Not 100% related to what you are asking about but have a look at this >> post and the discussion that "erupted": >> >> http://gridgain.blogspot.ca/2012/12/java-serialization-good-fast-and-faster.html >> >> Vladimir >> On 1/30/2014, 7:13 AM, Adrian Nistor wrote: >>> Hi list! 
>>> >>> I've been pondering about re-using the marshalling machinery of >>> Infinispan in another project, specifically in ProtoStream, where I'm >>> planning to add it as a test scoped dependency so I can create a >>> benchmark to compare marshalling performance. I'm basically interested >>> in comparing ProtoStream and Infinispan's JBoss Marshalling based >>> mechanism. Comparing against plain JBMAR, without using the >>> ExternalizerTable and Externalizers introduced by Infinispan is not >>> going to get me accurate results. >>> >>> But how? I see the marshalling is spread across infinispan-commons and >>> infinispan-core modules. >>> >>> Thanks! >>> Adrian >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -- Bela Ban, JGroups lead (http://www.jgroups.org) From rvansa at redhat.com Mon Feb 3 09:10:29 2014 From: rvansa at redhat.com (Radim Vansa) Date: Mon, 03 Feb 2014 15:10:29 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> Message-ID: <52EFA355.2070203@redhat.com> See below... 
On Fri, Jan 31, 2014 at 7:35 AM, Radim Vansa wrote: >> Worth to note that Infinispan does not have true async operation - >> executing synchronous request in another threadpool is rather simplistic >> solution that has serious drawbacks (I can imagine a situation where I'd >> do 100 async gets in parallel, but this would drain the whole threadpool). > I agree if we could optimize this with batching it would make it better. > >> Implementing that would require serious changes in all interceptors, >> because you wouldn't be able to call >> >> visitWhateverCommand(command) { >> /* do something */ >> try { >> invokeNextInterceptor(command); >> } finally { >> /* do another stuff */ >> } >> } >> >> - you'd have to put all local state prior to invoking next interceptor >> to context. And you'd need twice as many methods, because now the code >> would explicitly traverse interceptor stack in both directions. > I am not quite sure what you mean here. Async transport currently > traverses the interceptors for originator and receiver (albeit > originator goes back up without a response). > >> Still, I believe that this may be something to consider/plan for future. >> >> And then, yes, you'd need just >> >> put(key, value) { >> future = putAsync(key, value); >> return sync ? future.get() : null; >> } > For sync we would want to invoke directly to avoid context switching. I think you haven't properly understood what I was talking about: the putAsync should not switch context at all in the ideal design. It should traverse through the interceptors all the way down (logically, in current behaviour), invoke JGroups async API and jump out. Then, as soon as the response is received, the thread which delivered it should traverse the interceptor stack up (again, logically), and fire the future. 
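For illustration, a rough sketch of the shape of that design in plain Java — hypothetical names throughout, with CompletableFuture standing in for the actual future type and a plain thread standing in for the JGroups async API; this is not the real Infinispan interceptor stack:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the design described above: putAsync walks "down" the
// interceptor stack on the caller thread, hands the request to the
// transport, and the transport's delivery thread later walks back "up"
// and fires the future. No extra threadpool hop on the caller side.
class AsyncPutSketch {

    // Stand-in for the JGroups async API: the response callback runs
    // on the thread that delivers the response, not the caller thread.
    static void invokeRemotelyAsync(String key, String value, Runnable onResponse) {
        new Thread(onResponse).start();
    }

    static CompletableFuture<Void> putAsync(String key, String value) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        // ... interceptor stack "down" happens here, on the caller thread ...
        invokeRemotelyAsync(key, value, () -> {
            // ... interceptor stack "up" happens here, on the delivery
            // thread, which finally completes the future.
            future.complete(null);
        });
        return future; // caller returns immediately, nothing blocks
    }

    // Sync put is then just the blocking wrapper suggested earlier in the thread.
    static Object put(String key, String value, boolean sync) throws Exception {
        CompletableFuture<Void> f = putAsync(key, value);
        return sync ? f.get() : null;
    }

    public static void main(String[] args) throws Exception {
        put("k", "v", true); // blocks until the delivery thread fires the future
        System.out.println("done");
    }
}
```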
Radim From sanne at infinispan.org Mon Feb 3 09:54:52 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 3 Feb 2014 14:54:52 +0000 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <52EFA355.2070203@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> Message-ID: On 3 February 2014 14:10, Radim Vansa wrote: > See below... > > On Fri, Jan 31, 2014 at 7:35 AM, Radim Vansa wrote: >>> Worth to note that Infinispan does not have true async operation - >>> executing synchronous request in another threadpool is rather simplistic >>> solution that has serious drawbacks (I can imagine a situation where I'd >>> do 100 async gets in parallel, but this would drain the whole threadpool). >> I agree if we could optimize this with batching it would make it better. >> >>> Implementing that would require serious changes in all interceptors, >>> because you wouldn't be able to call >>> >>> visitWhateverCommand(command) { >>> /* do something */ >>> try { >>> invokeNextInterceptor(command); >>> } finally { >>> /* do another stuff */ >>> } >>> } >>> >>> - you'd have to put all local state prior to invoking next interceptor >>> to context. And you'd need twice as many methods, because now the code >>> would explicitly traverse interceptor stack in both directions. >> I am not quite sure what you mean here. Async transport currently >> traverses the interceptors for originator and receiver (albeit >> originator goes back up without a response). >> >>> Still, I believe that this may be something to consider/plan for future. >>> >>> And then, yes, you'd need just >>> >>> put(key, value) { >>> future = putAsync(key, value); >>> return sync ? future.get() : null; >>> } >> For sync we would want to invoke directly to avoid context switching. 
> > I think you haven't properly understood what I was talking about: the > putAsync should not switch context at all in the ideal design. It should > traverse through the interceptors all the way down (logically, in > current behaviour), invoke JGroups async API and jump out. Then, as soon > as the response is received, the thread which delivered it should > traverse the interceptor stack up (again, logically), and fire the future. +1 much cleaner, I love it. Actually wasn't aware the current code didn't do this :-( Sanne > > Radim > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mudokonman at gmail.com Mon Feb 3 10:02:41 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 3 Feb 2014 10:02:41 -0500 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> Message-ID: On Mon, Feb 3, 2014 at 9:54 AM, Sanne Grinovero wrote: > On 3 February 2014 14:10, Radim Vansa wrote: >> See below... >> >> On Fri, Jan 31, 2014 at 7:35 AM, Radim Vansa wrote: >>>> Worth to note that Infinispan does not have true async operation - >>>> executing synchronous request in another threadpool is rather simplistic >>>> solution that has serious drawbacks (I can imagine a situation where I'd >>>> do 100 async gets in parallel, but this would drain the whole threadpool). >>> I agree if we could optimize this with batching it would make it better. 
>>> >>>> Implementing that would require serious changes in all interceptors, >>>> because you wouldn't be able to call >>>> >>>> visitWhateverCommand(command) { >>>> /* do something */ >>>> try { >>>> invokeNextInterceptor(command); >>>> } finally { >>>> /* do another stuff */ >>>> } >>>> } >>>> >>>> - you'd have to put all local state prior to invoking next interceptor >>>> to context. And you'd need twice as many methods, because now the code >>>> would explicitly traverse interceptor stack in both directions. >>> I am not quite sure what you mean here. Async transport currently >>> traverses the interceptors for originator and receiver (albeit >>> originator goes back up without a response). >>> >>>> Still, I believe that this may be something to consider/plan for future. >>>> >>>> And then, yes, you'd need just >>>> >>>> put(key, value) { >>>> future = putAsync(key, value); >>>> return sync ? future.get() : null; >>>> } >>> For sync we would want to invoke directly to avoid context switching. >> >> I think you haven't properly understood what I was talking about: the >> putAsync should not switch context at all in the ideal design. It should >> traverse through the interceptors all the way down (logically, in >> current behaviour), invoke JGroups async API and jump out. Then, as soon >> as the response is received, the thread which delivered it should >> traverse the interceptor stack up (again, logically), and fire the future. A Future doesn't make much sense with an async transport. The problem is with an async transport you never get back a response so you never know when the actual command is completed and thus a Future is worthless. The caller wouldn't know if they could rely on the use of the Future or not. Also it depends what you are trying to do with async. Currently async transport is only for sending messages to another node, we never think of when we are the owning node. 
In this case the calling thread would have to go down the interceptor stack and acquire any locks if it is the owner, thus causing this "async" to block if you have any contention on the given key. The use of another thread would allow the calling thread to be able to return immediately no matter what else is occurring. Also I don't see what is so wrong about having a context switch to run something asynchronously, we shouldn't have a context switch to block the user thread imo, which is very possible with locking. > > +1 much cleaner, I love it. Actually wasn't aware the current code > didn't do this :-( This is what the current async transport does, but it does nothing with Futures. > > Sanne > >> >> Radim >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Mon Feb 3 10:49:02 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 03 Feb 2014 16:49:02 +0100 Subject: [infinispan-dev] Weekly IRC meeting minutes Message-ID: <52EFBA6E.2060603@redhat.com> Dear all, you can read the transcript of this week's IRC meeting at: http://transcripts.jboss.org/meeting/irc.freenode.org/infinispan/2014/infinispan.2014-02-03-15.12.html Tristan From galder at redhat.com Mon Feb 3 11:07:52 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 3 Feb 2014 17:07:52 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: On 23 Jan 2014, at 18:54, Mircea Markus wrote: > > On Jan 23, 2014, at 5:48 PM, William Burns wrote: > >> Hello all, >> >> I have been working with notifications and most 
recently I have come >> to look into events generated when a new entry is created. Now >> normally I would just expect a CacheEntryCreatedEvent to be raised. >> However we currently raise a CacheEntryModifiedEvent event and then a >> CacheEntryCreatedEvent. I notice that there are comments around the >> code saying that tests require both to be fired. > > it doesn't sound right to me: modified is different than created. I've lost count of the number of times I've raised this up in the dev mailing list :| And, if CacheEntryModifiedEvent has a method called isCreated(), it's because I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p > >> >> I am wondering if anyone has an objection to only raising a >> CacheEntryCreatedEvent on a new cache entry being created. It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. >> Does >> anyone know why we raise both currently? Legacy really. >> Was it just so the >> PutKeyValueCommand could more ignorantly just raise the >> CacheEntryModified pre Event? >> >> Any input would be appreciated, Thanks. 
> Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From rvansa at redhat.com Mon Feb 3 11:28:31 2014 From: rvansa at redhat.com (Radim Vansa) Date: Mon, 03 Feb 2014 17:28:31 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> Message-ID: <52EFC3AF.5060201@redhat.com> >>>> For sync we would want to invoke directly to avoid context switching. >>> I think you haven't properly understood what I was talking about: the >>> putAsync should not switch context at all in the ideal design. It should >>> traverse through the interceptors all the way down (logically, in >>> current behaviour), invoke JGroups async API and jump out. Then, as soon >>> as the response is received, the thread which delivered it should >>> traverse the interceptor stack up (again, logically), and fire the future. > A Future doesn't make much sense with an async transport. The problem > is with an async transport you never get back a response so you never > know when the actual command is completed and thus a Future is > worthless. The caller wouldn't know if they could rely on the use of > the Future or not. You're right, there's one important difference between putAsync and put with async transport: in the first case you can find out when the request is completed while you cannot with the latter. Not requiring the ack can be an important optimization. 
I think that both versions are very valid: first mostly for bulk operations = reduction of latency, second for modifications that are acceptable to fail without handling that. I had the first case in my mind when talking about async operations, and there the futures are necessary. > > Also it depends what you are trying to do with async. Currently async > transport is only for sending messages to another node, we never think > of when we are the owning node. In this case the calling thread would > have to go down the interceptor stack and acquire any locks if it is > the owner, thus causing this "async" to block if you have any > contention on the given key. The use of another thread would allow > the calling thread to be able to return immediately no matter what > else is occurring. Also I don't see what is so wrong about having a > context switch to run something asynchronously, we shouldn't have a > context switch to block the user thread imo, which is very possible > with locking. This is an important notice! Locking would complicate the design a lot, because the thread in "async" mode should do only tryLocks - if this fails, further processing should be dispatched to another thread. Not sure if this could be implemented at all, because the thread may be blocked inside JGroups as well (async API is about receiving the response asynchronously, not about sending the message asynchronously). I don't say that the context switch is that bad. My concern is that you have a very limited amount of requests that can be processed in parallel. I consider a "request" something pretty lightweight in concept - but one thread per request makes this rather heavyweight stuff. > >> +1 much cleaner, I love it. Actually wasn't aware the current code >> didn't do this :-( > This is what the current async transport does, but it does nothing with Futures. Nevermind the futures, this is not the important part. It's not about async transport neither, it's about async executors. 
(okay, the thread was about dropping async transport, I have hijacked it) Radim -- Radim Vansa JBoss DataGrid QA From mudokonman at gmail.com Mon Feb 3 11:29:56 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 3 Feb 2014 11:29:56 -0500 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: On Mon, Feb 3, 2014 at 11:07 AM, Galder Zamarreño wrote: > > On 23 Jan 2014, at 18:54, Mircea Markus wrote: > >> >> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >> >>> Hello all, >>> >>> I have been working with notifications and most recently I have come >>> to look into events generated when a new entry is created. Now >>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>> However we currently raise a CacheEntryModifiedEvent event and then a >>> CacheEntryCreatedEvent. I notice that there are comments around the >>> code saying that tests require both to be fired. >> >> it doesn't sound right to me: modified is different than created. > > I've lost count of the number of times I've raised this up in the dev mailing list :| > > And, if CacheEntryModifiedEvent has a method called isCreated(), it's because I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Ah nice I didn't even notice the method until you pointed it out. > >> >>> >>> I am wondering if anyone has an objection to only raising a >>> CacheEntryCreatedEvent on a new cache entry being created. > > It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. I agree. Maybe I should instead ask whether anyone minds if Cluster Listeners only raise the CacheEntryModifiedEvent on an entry creation? 
This wouldn't break existing assumptions since we don't currently support Cluster Listeners. The only thing is it wouldn't be consistent with regular listeners... > > Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. Just to be clear, you are saying that JCache only raises a single event for a change and a create, right? > >>> Does >>> anyone know why we raise both currently? > > Legacy really. > >>> Was it just so the >>> PutKeyValueCommand could more ignorantly just raise the >>> CacheEntryModified pre Event? >>> >>> Any input would be appreciated, Thanks. >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Mon Feb 3 13:01:43 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 3 Feb 2014 20:01:43 +0200 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <52EFC3AF.5060201@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >>>> For sync we would want to invoke directly to avoid context switching. 
> >>> I think you haven't properly understood what I was talking about: the > >>> putAsync should not switch context at all in the ideal design. It > should > >>> traverse through the interceptors all the way down (logically, in > >>> current behaviour), invoke JGroups async API and jump out. Then, as > soon > >>> as the response is received, the thread which delivered it should > >>> traverse the interceptor stack up (again, logically), and fire the > future. > > A Future doesn't make much sense with an async transport. The problem > > is with an async transport you never get back a response so you never > > know when the actual command is completed and thus a Future is > > worthless. The caller wouldn't know if they could rely on the use of > > the Future or not. > > You're right, there's one important difference between putAsync and put > with async transport: in the first case you can find out when the > request is completed while you cannot with the latter. Not requiring the > ack can be an important optimization. I think that both versions are > very valid: first mostly for bulk operations = reduction of latency, > second for modifications that are acceptable to fail without handling that. > I had the first case in my mind when talking about async operations, and > there the futures are necessary. > A couple more differences: 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. > > > > Also it depends what you are trying to do with async. Currently async > > transport is only for sending messages to another node, we never think > > of when we are the owning node. 
In this case the calling thread would > > have to go down the interceptor stack and acquire any locks if it is > > the owner, thus causing this "async" to block if you have any > > contention on the given key. The use of another thread would allow > > the calling thread to be able to return immediately no matter what > > else is occurring. Also I don't see what is so wrong about having a > > context switch to run something asynchronously, we shouldn't have a > > context switch to block the user thread imo, which is very possible > > with locking. > > This is an important notice! Locking would complicate the design a lot, > because the thread in "async" mode should do only tryLocks - if this > fails, further processing should be dispatched to another thread. Not > sure if this could be implemented at all, because the thread may be > blocked inside JGroups as well (async API is about receiving the > response asynchronously, not about sending the message asynchronously). > > I don't say that the context switch is that bad. My concern is that you > have a very limited amount of requests that can be processed in > parallel. I consider a "request" something pretty lightweight in concept > - but one thread per request makes this rather heavyweight stuff. > We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. But the feeling I got was that neither is going to make it into 7.0. > > > > >> +1 much cleaner, I love it. Actually wasn't aware the current code > >> didn't do this :-( > > This is what the current async transport does, but it does nothing with > Futures. > > Nevermind the futures, this is not the important part. It's not about > async transport neither, it's about async executors. 
> (okay, the thread was about dropping async transport, I have hijacked it) > > Radim > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From galder at redhat.com Mon Feb 3 13:24:52 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 3 Feb 2014 19:24:52 +0100 Subject: [infinispan-dev] reusing infinispan's marshalling In-Reply-To: <52EA41E0.2010505@redhat.com> References: <52EA41E0.2010505@redhat.com> Message-ID: <1824630C-1D48-480A-8687-E563A54E7E6A@redhat.com> Not sure I understand the need to compare this. JBMAR and ProtoStream are solving different problems. The former is focused on getting the best out of Java persistence. The latter is focused on serializing stuff in a platform independent way. IMO, it's not an apples to apples comparison. Cheers, On 30 Jan 2014, at 13:13, Adrian Nistor wrote: > Hi list! > > I've been pondering about re-using the marshalling machinery of > Infinispan in another project, specifically in ProtoStream, where I'm > planning to add it as a test scoped dependency so I can create a > benchmark to compare marshalling performance. I'm basically interested > in comparing ProtoStream and Infinispan's JBoss Marshalling based > mechanism. Comparing against plain JBMAR, without using the > ExternalizerTable and Externalizers introduced by Infinispan is not > going to get me accurate results. > > But how? I see the marshalling is spread across infinispan-commons and > infinispan-core modules. > > Thanks! 
> Adrian > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 02:14:23 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 08:14:23 +0100 Subject: [infinispan-dev] Store as binary In-Reply-To: <68B26C2A-389B-4C0A-A3C6-DBE3B0526DAC@redhat.com> References: <52D92AC4.7080701@redhat.com> <52DCF101.3020903@infinispan.org> <87020416-72D3-412E-818B-A7F9161355CC@redhat.com> <52DCF70C.4090404@infinispan.org> <52DD4534.7080209@redhat.com> <68B26C2A-389B-4C0A-A3C6-DBE3B0526DAC@redhat.com> Message-ID: On 21 Jan 2014, at 17:45, Mircea Markus wrote: > > On Jan 21, 2014, at 2:13 PM, Sanne Grinovero wrote: > >> On 21 January 2014 13:37, Mircea Markus wrote: >>> >>> On Jan 21, 2014, at 1:21 PM, Galder Zamarreño wrote: >>> >>>>> What's the point for these tests? >>>> >>>> +1 >>> >>> To validate whether storing the data in binary format yields better performance than storing it as a POJO. >> >> That will highly depend on the scenarios you want to test for. AFAIK >> this started after Paul described how session replication works in >> WildFly, and we already know that both strategies are suboptimal with >> the current options available: in his case the active node will always >> write on the POJO, while the backup node will essentially only need to >> store the buffer "just in case" he might need to take over. > > Indeed, as it is today, it doesn't make sense for WildFly's session replication. > >> >> Sure, one will be slower, but if you want to make a suggestion to him >> about which configuration he should be using, we should measure his >> use case, not a different one.
>> >> Even then, as discussed in Palma, an in-memory String representation >> might be way more compact because of pooling of strings and a very >> high likelihood for repeated headers (as common in web frameworks), > > pooling like in String.intern()? > Even so, if most of your access to the String is to serialize it and send it remotely, then you have a serialization cost (CPU) to pay for the reduced size. Serialization has a cost, but nothing compared with the transport itself, and you don't have to go very far to see the impact of transport. Just recently we were chasing some performance regression and even though there were some changes in serialization, the impact of my improvements was minimal, max 2-3%. Optimal network and transport configuration is more important IMO, and once again, misconfiguration in that layer is what was causing us to be ~20% slower. > >> so >> you might want to measure the CPU vs storage cost on the receiving >> side.. but then again your results will definitely depend on the input >> data and assumptions on likelihood of failover, how often data is being >> written on the owner node vs on the other node (since he uses >> locality), etc.. many factors I'm not seeing being considered here and >> which could make a significant difference. > > I'm looking for the default setting of storeAsBinary in the configurations we ship. I think the default configs should be optimized for distribution with random key access (reads/writes for any key execute on every node of the cluster with the same probability) for both reads and writes. I'm with Sanne on this. I still think this is not a useful exercise really, since serialization is not a huge cost in total time spent. Our latency is driven by waiting for others to reply to our requests, and that's the driver in sync mode. In async, you can forget about the serialization cost if you use putAsync().
I find it way more useful to look at Infinispan all the time and consider what things we should be ditching to make our configuration smaller, our memory consumption smaller, and a smaller code base. > >> >>> As of now, it doesn't so I need to check why. >> >> You could play with the test parameters until it produces an output >> you like better, but I still see no point? > > the point is to provide the best defaults params for the default config, and see what's the usefulness of storeAsBinary. > >> This is not a realistic >> scenario, at best it could help us document suggestions about which >> scenarios you'd want to keep the option enabled vs disabled, but then >> again I think we're wasting time as we could implement a better >> strategy for Paul's use case: one which never deserializes a value >> received from a remote node until it's been requested as a POJO, but >> keeps the POJO as-is when it's stored locally. > > I disagree: Paul's scenario, whilst very important, is quite specific. For what I consider the general case (random key access, see above), your approach is suboptimal. > > >> I believe that would >> make sense also for OGM and probably most other users of Embedded. >> Basically, that would re-implement something similar to the previous >> design but simplifying it a bit so that it doesn't allow for a >> back-and-forth conversion between storage types but rather dynamically >> favors a specific storage strategy. > > It all boils down to what we want to optimize for: random key access or some degree of affinity. I think the former is the default. 
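Sanne's proposed strategy — keep a value received from a remote node serialized until it is requested as a POJO, and keep a locally stored value as a POJO until it must go on the wire — can be sketched as a small dual-representation holder. This is a hypothetical illustration, not the actual storeAsBinary/MarshalledValue implementation; "serialization" here is just String-to-UTF-8 bytes to keep it self-contained:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the lazy storage strategy described in the thread: each value
// holds whichever representation it was created with, and converts only on
// demand, so a backup copy that is never read locally is never deserialized.
class LazyValue {
    private String pojo;   // set when stored locally, or after first deserialization
    private byte[] bytes;  // set when received off the wire, or after first serialization

    static LazyValue fromLocal(String pojo)   { LazyValue v = new LazyValue(); v.pojo = pojo; return v; }
    static LazyValue fromRemote(byte[] bytes) { LazyValue v = new LazyValue(); v.bytes = bytes; return v; }

    // Deserialize lazily, only on first POJO access.
    synchronized String asPojo() {
        if (pojo == null) {
            pojo = new String(bytes, StandardCharsets.UTF_8);
        }
        return pojo;
    }

    // Serialize lazily, only when the value must be sent remotely.
    synchronized byte[] asBytes() {
        if (bytes == null) {
            bytes = pojo.getBytes(StandardCharsets.UTF_8);
        }
        return bytes;
    }

    boolean isDeserialized() { return pojo != null; }
}

public class LazyValueDemo {
    public static void main(String[] args) {
        LazyValue backup = LazyValue.fromRemote("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(backup.isDeserialized()); // false: backup copy stays binary
        System.out.println(backup.asPojo());         // deserialized only on demand
    }
}
```

Under this sketch, Paul's backup node pays no deserialization cost unless it actually takes over, while the active node never pays a serialization cost for local reads — which is the trade-off the thread is debating against the random-key-access case.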
> One way or the other, from the test Radim ran with random key access, storeAsBinary doesn't bring any benefit, though it should: http://lists.jboss.org/pipermail/infinispan-dev/2009-October/004299.html > >> >> Cheers, >> Sanne >> >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 03:07:09 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 09:07:09 +0100 Subject: [infinispan-dev] L1OnRehash Discussion In-Reply-To: References: Message-ID: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> On 28 Jan 2014, at 15:29, William Burns wrote: > Hello everyone, > > I wanted to discuss the dubious benefit of L1OnRehash, > especially compared to the complexity it brings. > > L1OnRehash is used to retain a value by moving a previously owned > value into the L1 when a rehash occurs and this node no longer owns > that value. Also, any current L1 values are removed when a rehash > occurs. Therefore it can only save a single remote get for only a few > keys when a rehash occurs.
> > This by itself is fine; however, L1OnRehash has many edge cases to > guarantee consistency, as can be seen from > https://issues.jboss.org/browse/ISPN-3838. This can get quite > complicated for a feature that gives marginal performance increases > (especially given that this value may never have been read recently - > at least normal L1 usage guarantees this). > > My first suggestion is instead to deprecate the L1OnRehash > configuration option and to remove this logic. +1 > My second suggestion is a new implementation of L1OnRehash that is > always enabled when the L1 threshold is configured to 0. For those not > familiar, the L1 threshold controls whether invalidations are broadcast > instead of sent as individual messages. A value of 0 means to always > broadcast. This would allow for some benefits that we can't currently > get: > > 1. L1 values would never have to be invalidated on a rehash event > (guaranteeing locality of reads under rehash) > 2. L1 requestors would not have to be tracked any longer > > However, every write would be required to send an invalidation, which > could slow write performance in additional cases (since we currently > only send invalidations when requestors are found). The difference > would be lessened with UDP, which is the transport I would assume > someone would use when configuring the L1 threshold to 0. Sounds good to me, but I think you could go even beyond this and maybe get rid of the threshold configuration option too? If the transport is UDP and multicast is configured, invalidations are broadcast (and the two benefits you mention apply). If UDP w/ unicast or TCP is used, track requestors and send invalidations as unicasts. Do we really need to expose these configuration options to the user? > What do you guys think? I am thinking that no one minds the removal > of the L1OnRehash we have currently (if so, let me know). I am quite > curious what others think about the changes for an L1 threshold value of > 0; maybe this configuration value is never used?
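The threshold rule being discussed — 0 means "always broadcast, never track requestors", any other value means unicast below the threshold and broadcast above it — can be sketched as a small decision function. The names below are illustrative only, not the actual L1Manager code:

```java
import java.util.Set;

// Sketch of the invalidation-routing rule from the thread: a threshold of 0
// means invalidations are always broadcast (so requestors never need to be
// tracked at all); otherwise requestors are tracked, and a broadcast is used
// only once their number exceeds the threshold, with unicasts below it.
class L1InvalidationPolicy {
    private final int threshold;

    L1InvalidationPolicy(int threshold) { this.threshold = threshold; }

    boolean shouldBroadcast(Set<String> requestors) {
        if (threshold == 0) {
            return true; // always multicast, no tracking required
        }
        return requestors.size() > threshold;
    }
}

public class L1PolicyDemo {
    public static void main(String[] args) {
        L1InvalidationPolicy alwaysBroadcast = new L1InvalidationPolicy(0);
        L1InvalidationPolicy selective = new L1InvalidationPolicy(2);
        System.out.println(alwaysBroadcast.shouldBroadcast(Set.of()));        // true
        System.out.println(selective.shouldBroadcast(Set.of("a")));           // false -> unicast to "a"
        System.out.println(selective.shouldBroadcast(Set.of("a", "b", "c"))); // true
    }
}
```

Will's proposal amounts to making the threshold-0 branch the one that also enables keeping L1 entries across a rehash, since in that branch no requestor state exists to invalidate.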
> > Thanks, > > - Will > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 03:21:13 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 09:21:13 +0100 Subject: [infinispan-dev] Module jars dissapearing leaving empty classes/ folders and errors Message-ID: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> Hi all, We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). Quite often some of the runs fail with error message [1]. Having looked at the build environment when a run fails, you see this: -- $ ls modules/system/layers/base/org/infinispan/server/rest/main drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( ... ... This is completely different to what happens with a successful run: -- $ ls modules/system/layers/base/org/infinispan/server/rest/main -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( ... > ... > > This is completely different to what happens with a successful run: > > -- > $ ls modules/system/layers/base/org/infinispan/server/rest/main > -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > org/infinispan/rest/configuration/ExtendedHeaders.class > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > ? > > Anyone can explain what is going on here? Does it ring a bell to anyone? 
Is this a known Wildfly issue by any chance? > > [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > Cheers, > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > jboss-as7-dev mailing list > jboss-as7-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 04:14:34 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 10:14:34 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> Message-ID: On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > > > > On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > Yes, there is nothing in the server code that modified the modules directory. > > Well, except for the new patching stuff, but that is not really relevant here. The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. Finally, do you have any suggestions on changes we could make to these files to further debug the issue? Thanks a lot for your help! 
[1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > Stuart > > > Stuart > > > On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > > > This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > > Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > > Cheers, > > > > > Stuart > > > > > > On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > > Hi all, > > > > We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > > > > Quite often some of the runs fail with error message [1]. > > > > Having looked at the build environment when a run fails, you see this: > > > > -- > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( > -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > > drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > ... 
> > > > This is completely different to what happens with a successful run: > > > > -- > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( > -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > > -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > > > $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > > org/infinispan/rest/configuration/ExtendedHeaders.class > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > ? > > > > Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > > > > [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > > > Cheers, > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > jboss-as7-dev mailing list > > jboss-as7-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Tue Feb 4 06:04:22 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 4 Feb 2014 13:04:22 +0200 Subject: [infinispan-dev] L1OnRehash Discussion In-Reply-To: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> Message-ID: On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarre?o wrote: > > On 28 Jan 2014, at 15:29, William Burns wrote: > > > Hello everyone, > > > > I 
wanted to discuss what I would say as dubious benefit of L1OnRehash > > especially compared to the benefits it provide. > > > > L1OnRehash is used to retain a value by moving a previously owned > > value into the L1 when a rehash occurs and this node no longer owns > > that value Also any current L1 values are removed when a rehash > > occurs. Therefore it can only save a single remote get for only a few > > keys when a rehash occurs. > > > > This by itself is fine however L1OnRehash has many edge cases to > > guarantee consistency as can be seen from > > https://issues.jboss.org/browse/ISPN-3838. This can get quite > > complicated for a feature that gives marginal performance increases > > (especially given that this value may never have been read recently - > > at least normal L1 usage guarantees this). > > > > My first suggestion is instead to deprecate the L1OnRehash > > configuration option and to remove this logic. > > +1 > +1 from me as well > > > My second suggestion is a new implementation of L1OnRehash that is > > always enabled when L1 threshold is configured to 0. For those not > > familiar L1 threshold controls whether invalidations are broadcasted > > instead of individual messages. A value of 0 means to always > > broadcast. This would allow for some benefits that we can't currently > > do: > > > > 1. L1 values would never have to be invalidated on a rehash event > > (guarantee locality reads under rehash) > > 2. L1 requestors would not have to be tracked any longer > > > > However every write would be required to send an invalidation which > > could slow write performance in additional cases (since we currently > > only send invalidations when requestors are found). The difference > > would be lessened with udp, which is the transport I would assume > > someone would use when configuring L1 threshold to 0. > > Sounds good to me, but I think you could go even beyond this and maybe get > rid of threshold configuration option too? 
> > If the transport is UDP and multicast is configured, invalidations are > broadcasted (and apply the two benefits you mention). > If UDP w/ unicast or TCP used, track invalidations and send them as > unicasts. > > Do we really need to expose these configuration options to the user? > I think the idea was that even with UDP, sending 2 unicasts and waiting for only 2 responses may be faster than sending a multicast and waiting for 10 responses. However, I'm not sure that's the case if we send 1 unicast invalidation from each owner instead of a single multicast invalidation from the primary owner/originator [1]. Maybe if each owner would return a list of requestors and the originator would do the invalidation at the end... One tangible benefit of having the setting is that we can run the test suite with TCP only, and still cover every path in L1Manager. If removed it completely, it would still be possible to change the toggle in L1ManagerImpl via reflection, but it would be a little hacky. > > What do you guys think? I am thinking that no one minds the removal > > of L1OnRehash that we have currently (if so let me know). I am quite > > curious what others think about the changes for L1 threshold value of > > 0, maybe this configuration value is never used? > > > Since we don't give any guidance as to what a good threshold value would be, I doubt many people use it. My alternative proposal would be to replace the invalidationThreshold=-1|0|>0 setting with a traceRequestors=true|false setting. 1. If traceRequestors == false, don't keep track of requestors, only send the invalidation from the originator, and enable l1OnRehash. This means we can keep the entries that are in L1 after a rehash as well. 2. If traceRequestors == true, track requestors, send unicast/multicast invalidations depending on the transport, and disable l1OnRehash. [1] https://issues.jboss.org/browse/ISPN-186 Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/cd16e2df/attachment-0001.html From galder at redhat.com Tue Feb 4 07:30:35 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 13:30:35 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> Message-ID: <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > It is almost certainly something to do with this: > > > > > > > > I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. I?ve traced back and this might be due to build failures that are not producing the right jars [3]. @Stuart, this is really our problem. Sorry for the inconvenience! [1] https://gist.github.com/galderz/b9286f385aad1316df51 [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > Stuart > > > On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > > > > > > > > > On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > > Yes, there is nothing in the server code that modified the modules directory. > > > > Well, except for the new patching stuff, but that is not really relevant here. > > The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. 
The distribution we build uses the scripts we got from AS [1]. > > Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > > Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > > Thanks a lot for your help! > > [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > > > > Stuart > > > > > > Stuart > > > > > > On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > > > > On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > > > > > This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > > > > Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > > > > Cheers, > > > > > > > > Stuart > > > > > > > > > On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > > > Hi all, > > > > > > We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > > > > > > Quite often some of the runs fail with error message [1]. > > > > > > Having looked at the build environment when a run fails, you see this: > > > > > > -- > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( > > -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > > -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > > > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > > > drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > > > drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. 
> > > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > > > ... > > > > > > This is completely different to what happens with a successful run: > > > > > > -- > > > $ ls modules/system/layers/base/org/infinispan/server/rest/main > > > -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( > > -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > > > -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > > > > > $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > > > org/infinispan/rest/configuration/ExtendedHeaders.class > > > > > > $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > > > > > ? > > > > > > Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > > > > > > [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > > > > > Cheers, > > > -- > > > Galder Zamarre?o > > > galder at redhat.com > > > twitter.com/galderz > > > > > > Project Lead, Escalante > > > http://escalante.io > > > > > > Engineer, Infinispan > > > http://infinispan.org > > > > > > > > > _______________________________________________ > > > jboss-as7-dev mailing list > > > jboss-as7-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > > > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 07:36:54 2014 From: galder at redhat.com 
(=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 13:36:54 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: Narrowing down the list now, since this is a problem of how our CI is doing builds. These logs are retrieved from [1]. Dunno how our CI is configured but this is odd. Seems like the build is halt due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. It?s about time we did the following: 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. 2) Any tests that fail randomly should be disabled. Cheers, [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > >> It is almost certainly something to do with this: >> >> >> >> >> >> >> >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. > > I?ve traced back and this might be due to build failures that are not producing the right jars [3]. > > @Stuart, this is really our problem. Sorry for the inconvenience! 
> > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > >> >> Stuart >> >> >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: >> >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: >> >>> >>> >>> >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: >>> Yes, there is nothing in the server code that modified the modules directory. >>> >>> Well, except for the new patching stuff, but that is not really relevant here. >> >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. >> >> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. >> >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? >> >> Thanks a lot for your help! >> >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml >> >>> >>> Stuart >>> >>> >>> Stuart >>> >>> >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: >>> >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: >>> >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. >>> >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? 
>>> >>> Cheers, >>> >>>> >>>> Stuart >>>> >>>> >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: >>>> Hi all, >>>> >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). >>>> >>>> Quite often some of the runs fail with error message [1]. >>>> >>>> Having looked at the build environment when a run fails, you see this: >>>> >>>> -- >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml >>>> >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. >>>> >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>> >>>> ... >>>> >>>> This is completely different to what happens with a successful run: >>>> >>>> -- >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml >>>> >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders >>>> org/infinispan/rest/configuration/ExtendedHeaders.class >>>> >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>> >>>> ? >>>> >>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? 
>>>> >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 >>>> >>>> Cheers, >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> jboss-as7-dev mailing list >>>> jboss-as7-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev >>>> >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >>> >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Tue Feb 4 07:50:55 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 4 Feb 2014 14:50:55 +0200 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarre?o wrote: > Narrowing down the list now, since this is a problem of how our CI is > doing builds. > > These logs are retrieved from [1]. > > Dunno how our CI is configured but this is odd. 
Seems like the build is > halted due to test failures, but it continues somehow? I mean, the jars are > not being produced properly, but the build is not halting. > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not > continue the build at all. > Will having 100 tests in one run and 2000 tests in another really help? > 2) Any tests that fail randomly should be disabled. > Let's go ahead and disable all the server tests then? ;) > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarreño wrote: > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas > wrote: > > > >> It is almost certainly something to do with this: > >> > >> src="${infinispan.server.modules.dir}"> > >> > >> artifact="infinispan-server-rest" classifier="classes" /> > >> > >> > >> > >> I guess sometimes the classes artefact is being attached as a reference > to the classes directory, rather than a reference to a jar, which causes > the issue. > > > > Here's a gist with a subset of the build log [1]. When it works fine, > it's copying a jar, when it's not, it's copying an empty folder. > > > > However, this is not only happening for the org.infinispan.server.rest > module, others show the same issue [2]. What seems to be a pattern is that > it only happens with modules that are built by us, it's not happening for > modules coming with the base AS/WF instance. > > > > I've traced back and this might be due to build failures that are not > producing the right jars [3]. > > > > @Stuart, this is really our problem. Sorry for the inconvenience!
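[For anyone else chasing this symptom: the broken layout quoted earlier in the thread (an empty classes/ directory where infinispan-classes.jar should be) is easy to test for mechanically. A minimal sketch of such a check; the /tmp scratch directory and the mkdir/touch lines merely simulate a failing run's output, and only the module subpath itself comes from the thread.]

```shell
# Simulate the broken layout from the failing runs: an empty classes/
# directory plus module.xml, but no infinispan-classes.jar.
mod=/tmp/modcheck/system/layers/base/org/infinispan/server/rest/main
rm -rf /tmp/modcheck
mkdir -p "$mod/classes"
touch "$mod/module.xml"

# A healthy module dir contains a *-classes.jar; a broken one contains a
# classes/ directory instead, so the glob below fails to match anything.
if ls "$mod"/*-classes.jar >/dev/null 2>&1; then
  echo "OK: classes jar present in $mod"
else
  echo "BROKEN: no classes jar in $mod"
fi
```

[Pointed at a healthy build, where the glob matches infinispan-classes.jar, the same check takes the OK branch instead.]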
> > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > >> > >> Stuart > >> > >> > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o > wrote: > >> > >> On 04 Feb 2014, at 10:01, Stuart Douglas > wrote: > >> > >>> > >>> > >>> > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas < > stuart.w.douglas at gmail.com> wrote: > >>> Yes, there is nothing in the server code that modified the modules > directory. > >>> > >>> Well, except for the new patching stuff, but that is not really > relevant here. > >> > >> The testsuite AS/WF builds are built out of the distribution build, > which shows the same problem. The distribution we build uses the scripts we > got from AS [1]. > >> > >> Do you see anything in there that could be causing this? We are using > maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > >> > >> Finally, do you have any suggestions on changes we could make to these > files to further debug the issue? > >> > >> Thanks a lot for your help! > >> > >> [1] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > >> [2] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > >> > >>> > >>> Stuart > >>> > >>> > >>> Stuart > >>> > >>> > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o > wrote: > >>> > >>> On 04 Feb 2014, at 09:37, Stuart Douglas > wrote: > >>> > >>>> This looks like an issue with your environment. The modules directory > is static. Wildfly does not contain any code that messes with it. I would > say the culprit is probably something in either your build process or your > test suite. > >>> > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used > somewhere else). I guess your answer still applies? 
> >>> > >>> Cheers, > >>> > >>>> > >>>> Stuart > >>>> > >>>> > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o > wrote: > >>>> Hi all, > >>>> > >>>> We're having issues with our Infinispan Server integration tests, > which run within Wildfly 8.0.0.Beta1 (as I'm typing I'm wondering if we > should just upgrade it to see if this goes away...?). > >>>> > >>>> Quite often some of the runs fail with error message [1]. > >>>> > >>>> Having looked at the build environment when a run fails, you see this: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (<-- a directory??) > >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > >>>> > >>>> $ ls > modules/system/layers/base/org/infinispan/server/rest/main/classes > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > >>>> > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> name="org.infinispan.server.rest"> > >>>> ... > >>>> > >>>> ... > >>>> > >>>> This is completely different to what happens with a successful run: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (<-- > a jar file!) > >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 > infinispan-classes.jar.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > >>>> > >>>> $ jar tf > modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar > | grep ExtendedHeaders > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > >>>> > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> name="org.infinispan.server.rest"> > >>>> ... > >>>> > >>>> -- > >>>> > >>>> Anyone can explain what is going on here? Does it ring a bell to > anyone? 
Is this a known Wildfly issue by any chance? > >>>> > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > >>>> > >>>> Cheers, > >>>> -- > >>>> Galder Zamarre?o > >>>> galder at redhat.com > >>>> twitter.com/galderz > >>>> > >>>> Project Lead, Escalante > >>>> http://escalante.io > >>>> > >>>> Engineer, Infinispan > >>>> http://infinispan.org > >>>> > >>>> > >>>> _______________________________________________ > >>>> jboss-as7-dev mailing list > >>>> jboss-as7-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > >>>> > >>> > >>> > >>> -- > >>> Galder Zamarre?o > >>> galder at redhat.com > >>> twitter.com/galderz > >>> > >>> Project Lead, Escalante > >>> http://escalante.io > >>> > >>> Engineer, Infinispan > >>> http://infinispan.org > >>> > >>> > >>> > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/a15edc6a/attachment-0001.html From galder at redhat.com Tue Feb 4 07:52:09 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 13:52:09 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: <7D6D9D4B-8023-4A04-B946-8CB90640319F@redhat.com> On 04 Feb 2014, at 13:36, Galder Zamarreño wrote: > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > These logs are retrieved from [1]. > > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > 2) Any tests that fail randomly should be disabled. Having had to debug through this, I can certainly understand Sanne's frustration, and as server component lead, I'm not going to bother looking at any CI builds until all modules that server modules depend on are green and their testsuites are passing. And I'm gonna do the same. I'm going to disable all tests that are failing for which I'm lead, and try to solve them in the next few days. I won't do any further Infinispan development until then.
Cheers, > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > >> >> On 04 Feb 2014, at 10:38, Stuart Douglas wrote: >> >>> It is almost certainly something to do with this: >>> >>> >>> >>> >>> >>> >>> >>> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. >> >> Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. >> >> However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. >> >> I?ve traced back and this might be due to build failures that are not producing the right jars [3]. >> >> @Stuart, this is really our problem. Sorry for the inconvenience! >> >> [1] https://gist.github.com/galderz/b9286f385aad1316df51 >> [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 >> [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c >> >>> >>> Stuart >>> >>> >>> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: >>> >>> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: >>> >>>> >>>> >>>> >>>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: >>>> Yes, there is nothing in the server code that modified the modules directory. >>>> >>>> Well, except for the new patching stuff, but that is not really relevant here. >>> >>> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. >>> >>> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. 
>>> >>> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? >>> >>> Thanks a lot for your help! >>> >>> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml >>> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml >>> >>>> >>>> Stuart >>>> >>>> >>>> Stuart >>>> >>>> >>>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: >>>> >>>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: >>>> >>>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. >>>> >>>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? >>>> >>>> Cheers, >>>> >>>>> >>>>> Stuart >>>>> >>>>> >>>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: >>>>> Hi all, >>>>> >>>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). >>>>> >>>>> Quite often some of the runs fail with error message [1]. >>>>> >>>>> Having looked at the build environment when a run fails, you see this: >>>>> >>>>> -- >>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (>>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index >>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml >>>>> >>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes >>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . >>>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. >>>>> >>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>> >>>>> ... 
>>>>> >>>>> This is completely different to what happens with a successful run: >>>>> >>>>> -- >>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (>>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index >>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml >>>>> >>>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders >>>>> org/infinispan/rest/configuration/ExtendedHeaders.class >>>>> >>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>> >>>>> ? >>>>> >>>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? >>>>> >>>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 >>>>> >>>>> Cheers, >>>>> -- >>>>> Galder Zamarre?o >>>>> galder at redhat.com >>>>> twitter.com/galderz >>>>> >>>>> Project Lead, Escalante >>>>> http://escalante.io >>>>> >>>>> Engineer, Infinispan >>>>> http://infinispan.org >>>>> >>>>> >>>>> _______________________________________________ >>>>> jboss-as7-dev mailing list >>>>> jboss-as7-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev >>>>> >>>> >>>> >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev 
at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 08:03:16 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 14:03:16 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote: > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > These logs are retrieved from [1]. > > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 ^ That's not working as expected, see the build log, my snippets, etc. > > > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > > Will having 100 tests in one run and 2000 tests in another really help?
As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these numbers should even out. > > > 2) Any tests that fail randomly should be disabled. > > Let's go ahead and disable all the server tests then? ;) Those server tests that are randomly failing should be disabled and looked at. Those tests that are failing as a result of container not starting are side effects of things not working properly, and these should not be disabled. > > > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarreño wrote: > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > > > >> It is almost certainly something to do with this: > >> > >> > >> > >> > >> > >> > >> > >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > > > Here's a gist with a subset of the build log [1]. When it works fine, it's copying a jar, when it's not, it's copying an empty folder. > > > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it's not happening for modules coming with the base AS/WF instance. > > > > I've traced back and this might be due to build failures that are not producing the right jars [3]. > > > > @Stuart, this is really our problem. Sorry for the inconvenience!
> > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > >> > >> Stuart > >> > >> > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: > >> > >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > >> > >>> > >>> > >>> > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > >>> Yes, there is nothing in the server code that modified the modules directory. > >>> > >>> Well, except for the new patching stuff, but that is not really relevant here. > >> > >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. > >> > >> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > >> > >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > >> > >> Thanks a lot for your help! > >> > >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > >> > >>> > >>> Stuart > >>> > >>> > >>> Stuart > >>> > >>> > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > >>> > >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > >>> > >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > >>> > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? 
> >>> > >>> Cheers, > >>> > >>>> > >>>> Stuart > >>>> > >>>> > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > >>>> Hi all, > >>>> > >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > >>>> > >>>> Quite often some of the runs fail with error message [1]. > >>>> > >>>> Having looked at the build environment when a run fails, you see this: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > >>>> > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ... > >>>> > >>>> This is completely different to what happens with a successful run: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > >>>> > >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ? > >>>> > >>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? 
> >>>> > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > >>>> > >>>> Cheers, > >>>> -- > >>>> Galder Zamarre?o > >>>> galder at redhat.com > >>>> twitter.com/galderz > >>>> > >>>> Project Lead, Escalante > >>>> http://escalante.io > >>>> > >>>> Engineer, Infinispan > >>>> http://infinispan.org > >>>> > >>>> > >>>> _______________________________________________ > >>>> jboss-as7-dev mailing list > >>>> jboss-as7-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > >>>> > >>> > >>> > >>> -- > >>> Galder Zamarre?o > >>> galder at redhat.com > >>> twitter.com/galderz > >>> > >>> Project Lead, Escalante > >>> http://escalante.io > >>> > >>> Engineer, Infinispan > >>> http://infinispan.org > >>> > >>> > >>> > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io 
Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Tue Feb 4 08:47:54 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 4 Feb 2014 15:47:54 +0200 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> Message-ID: On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarre?o wrote: > > On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > > > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarre?o > wrote: > > Narrowing down the list now, since this is a problem of how our CI is > doing builds. > > > > These logs are retrieved from [1]. > > > > Dunno how our CI is configured but this is odd. Seems like the build is > halt due to test failures, but it continues somehow? I mean, the jars are > not being produced properly, but the build is not halting. > > > > We run the build with -fn (fail-never), so the build should never be > halted because of a test failure. The configuration is here: > http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 > > ^ That's not working as expected, see the build log, my snippets...etc. > Sorry, I didn't understand what's happening in those snippets. All I saw was an Ant script that doesn't do what it's supposed to do :) I did see some differences in the configuration between the JDK6 and the JDK7 builds: * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't I've changed both builds to use -Dmaven.test.failure.ignore and -nsu, let's see how it goes. > > > > > > > > > It's about time we did the following: > > 1) Any test failures should halt the build there and then. IOW, do not > continue the build at all. 
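[A side note on the flags compared above, for readers who do not live in Maven every day; the command lines below are illustrative sketches, not lifted from the actual CI configuration.]

```shell
# --fail-never (-fn): the reactor records a module's failure (compile,
# packaging or tests) but keeps building downstream modules, which may then
# be assembled against missing or stale artifacts -- consistent with the
# empty classes/ directories seen in the failing runs.
mvn clean install -fn

# -Dmaven.test.failure.ignore=true: only Surefire/Failsafe ignore failing
# tests; compile and packaging errors still fail the module, so a broken
# jar is much less likely to be propagated downstream.
mvn clean install -Dmaven.test.failure.ignore=true

# --no-snapshot-updates (-nsu): do not re-check remote repositories for
# updated SNAPSHOT dependencies during this build.
mvn clean install -nsu
```

[In short, -fn relaxes failure handling for the whole reactor, while -Dmaven.test.failure.ignore confines the leniency to test results.]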
> > > > Will having 100 tests in one run and 2000 tests in another really help? > > As you disable randomly failing tests, and do not integrate commits making > the testsuite fail, these number should even out. > Not integrating commits that fail every time is easy, not integrating commits that fail randomly (maybe only in some environments) is trickier. > > > > > > > 2) Any tests that fail randomly should be disabled. > > > > Let's go ahead and disable all the server tests then? ;) > > Those server tests that are randomly failing should be disabled and looked > at. Those tests that are failing as a result of container not starting are > side effects of things not working properly, and these should not be > disabled. > Why treat the tests that are failing because of a build problem differently? What about the tests that fail only on IBM JDK6? > > > > > > > > Cheers, > > > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > > > On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > > > > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas > wrote: > > > > > >> It is almost certainly something to do with this: > > >> > > >> src="${infinispan.server.modules.dir}"> > > >> > > >> artifact="infinispan-server-rest" classifier="classes" /> > > >> > > >> > > >> > > >> I guess sometimes the classes artefact is being attached as a > reference to the classes directory, rather than a reference to a jar, which > causes the issue. > > > > > > Here's a gist with a subset of the build log [1]. When it works fine, > it's copying a jar, when it's not, it's copying an empty folder. > > > > > > However, this is not only happening for the org.infinispan.server.rest > module, others show the same issue [2]. What seems to be a pattern is that > it only happens with modules that are built by us, it's not happening for > modules coming with the base AS/WF instance. > > > > > > I've traced back and this might be due to build failures that are not > producing the right jars [3]. 
> > > > > > @Stuart, this is really our problem. Sorry for the inconvenience! > > > > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > > > >> > > >> Stuart > > >> > > >> > > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o > wrote: > > >> > > >> On 04 Feb 2014, at 10:01, Stuart Douglas > wrote: > > >> > > >>> > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas < > stuart.w.douglas at gmail.com> wrote: > > >>> Yes, there is nothing in the server code that modified the modules > directory. > > >>> > > >>> Well, except for the new patching stuff, but that is not really > relevant here. > > >> > > >> The testsuite AS/WF builds are built out of the distribution build, > which shows the same problem. The distribution we build uses the scripts we > got from AS [1]. > > >> > > >> Do you see anything in there that could be causing this? We are using > maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > > >> > > >> Finally, do you have any suggestions on changes we could make to > these files to further debug the issue? > > >> > > >> Thanks a lot for your help! > > >> > > >> [1] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > > >> [2] > https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > >> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o > wrote: > > >>> > > >>> On 04 Feb 2014, at 09:37, Stuart Douglas > wrote: > > >>> > > >>>> This looks like an issue with your environment. The modules > directory is static. Wildfly does not contain any code that messes with it. > I would say the culprit is probably something in either your build process > or your test suite. 
> > >>> > > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used > somewhere else). I guess your answer still applies? > > >>> > > >>> Cheers, > > >>> > > >>>> > > >>>> Stuart > > >>>> > > >>>> > > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o < > galder at redhat.com> wrote: > > >>>> Hi all, > > >>>> > > >>>> We're having issues with our Infinispan Server integration tests, > which run within Wildfly 8.0.0.Beta1 (as I'm typing I'm wondering if we > should just upgrade it to see if this goes away...?). > > >>>> > > >>>> Quite often some of the runs fail with error message [1]. > > >>>> > > >>>> Having looked at the build environment when a run fails, you see > this: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (<-- a > directory??) > > >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > >>>> > > >>>> $ ls > modules/system/layers/base/org/infinispan/server/rest/main/classes > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > > >>>> > > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> name="org.infinispan.server.rest"> > > >>>> ... > > >>>> > > >>>> ... > > >>>> > > >>>> This is completely different to what happens with a successful run: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar > (<-- a jar file!) 
> > >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 > infinispan-classes.jar.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > >>>> > > >>>> $ jar tf > modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar > | grep ExtendedHeaders > > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > > >>>> > > >>>> $ more > modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> name="org.infinispan.server.rest"> > > >>>> ... > > >>>> > > >>>> -- > > >>>> > > >>>> Anyone can explain what is going on here? Does it ring a bell to > anyone? Is this a known Wildfly issue by any chance? > > >>>> > > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > >>>> > > >>>> Cheers, > > >>>> -- > > >>>> Galder Zamarre?o > > >>>> galder at redhat.com > > >>>> twitter.com/galderz > > >>>> > > >>>> Project Lead, Escalante > > >>>> http://escalante.io > > >>>> > > >>>> Engineer, Infinispan > > >>>> http://infinispan.org > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> jboss-as7-dev mailing list > > >>>> jboss-as7-dev at lists.jboss.org > > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > >>>> > > >>> > > >>> > > >>> -- > > >>> Galder Zamarre?o > > >>> galder at redhat.com > > >>> twitter.com/galderz > > >>> > > >>> Project Lead, Escalante > > >>> http://escalante.io > > >>> > > >>> Engineer, Infinispan > > >>> http://infinispan.org > > >>> > > >>> > > >>> > > >> > > >> > > >> -- > > >> Galder Zamarre?o > > >> galder at redhat.com > > >> twitter.com/galderz > > >> > > >> Project Lead, Escalante > > >> http://escalante.io > > >> > > >> Engineer, Infinispan > > >> http://infinispan.org > > >> > > >> > > > > > > > > > -- > > > Galder Zamarre?o > > > galder at redhat.com > > > twitter.com/galderz > > > > > > Project Lead, Escalante > > > http://escalante.io > > > > > > Engineer, Infinispan > > > http://infinispan.org > > > > > > > > > 
_______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/4cb657c3/attachment-0001.html From galder at redhat.com Tue Feb 4 08:03:16 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 14:03:16 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> Message-ID: <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarre?o wrote: > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > These logs are retrieved from [1]. 
> > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 ^ That's not working as expected, see the build log, my snippets, etc. > > > > It's about time we did the following: > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > > Will having 100 tests in one run and 2000 tests in another really help? As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these numbers should even out. > > > 2) Any tests that fail randomly should be disabled. > > Let's go ahead and disable all the server tests then? ;) Those server tests that are randomly failing should be disabled and looked at. Those tests that are failing as a result of the container not starting are side effects of things not working properly, and these should not be disabled. > > > > Cheers, > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > On 04 Feb 2014, at 13:30, Galder Zamarreño wrote: > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > > > >> It is almost certainly something to do with this: > >> > >> > >> > >> > >> > >> > >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > > > Here's a gist with a subset of the build log [1]. When it works fine, it's copying a jar; when it's not, it's copying an empty folder. > > > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2].
What seems to be a pattern is that it only happens with modules that are built by us; it's not happening for modules coming with the base AS/WF instance. > > > > I've traced back and this might be due to build failures that are not producing the right jars [3]. > > > > @Stuart, this is really our problem. Sorry for the inconvenience! > > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > >> > >> Stuart > >> > >> > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarreño wrote: > >> > >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > >> > >>> > >>> > >>> > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > >>> Yes, there is nothing in the server code that modifies the modules directory. > >>> > >>> Well, except for the new patching stuff, but that is not really relevant here. > >> > >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. > >> > >> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > >> > >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > >> > >> Thanks a lot for your help! > >> > >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > >> > >>> > >>> Stuart > >>> > >>> > >>> Stuart > >>> > >>> > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarreño wrote: > >>> > >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > >>> > >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it.
I would say the culprit is probably something in either your build process or your test suite. > >>> > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > >>> > >>> Cheers, > >>> > >>>> > >>>> Stuart > >>>> > >>>> > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > >>>> Hi all, > >>>> > >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > >>>> > >>>> Quite often some of the runs fail with error message [1]. > >>>> > >>>> Having looked at the build environment when a run fails, you see this: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > >>>> > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ... > >>>> > >>>> This is completely different to what happens with a successful run: > >>>> > >>>> -- > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > >>>> > >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > >>>> > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > >>>> > >>>> ? > >>>> > >>>> Anyone can explain what is going on here? 
Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > >>>> > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > >>>> > >>>> Cheers, > >>>> -- > >>>> Galder Zamarre?o > >>>> galder at redhat.com > >>>> twitter.com/galderz > >>>> > >>>> Project Lead, Escalante > >>>> http://escalante.io > >>>> > >>>> Engineer, Infinispan > >>>> http://infinispan.org > >>>> > >>>> > >>>> _______________________________________________ > >>>> jboss-as7-dev mailing list > >>>> jboss-as7-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > >>>> > >>> > >>> > >>> -- > >>> Galder Zamarre?o > >>> galder at redhat.com > >>> twitter.com/galderz > >>> > >>> Project Lead, Escalante > >>> http://escalante.io > >>> > >>> Engineer, Infinispan > >>> http://infinispan.org > >>> > >>> > >>> > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at 
redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 09:10:35 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 15:10:35 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors In-Reply-To: References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> Message-ID: <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com> On 04 Feb 2014, at 14:47, Dan Berindei wrote: > > > > On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarreño wrote: > > On 04 Feb 2014, at 13:50, Dan Berindei wrote: > > > > > On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote: > > Narrowing down the list now, since this is a problem of how our CI is doing builds. > > > > These logs are retrieved from [1]. > > > > Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. > > > > We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 > > ^ That's not working as expected, see the build log, my snippets, etc. > > Sorry, I didn't understand what's happening in those snippets. The log shows it quite clearly that after those tests fail, nothing else runs in that module, including producing the jar. It halts. That's > All I saw was an Ant script that doesn't do what it's supposed to do :) The Ant script isn't doing its job because these modules are not completing the build. There's a direct correlation between the three modules that fail with tests and the 3 modules that are copying an empty folder instead of the jar.
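For context on the flags being compared: -fn only keeps the Maven *reactor* going after a module's test phase fails, so that module never reaches the package phase and its jar is never produced, while Surefire's testFailureIgnore (what -Dmaven.test.failure.ignore sets) records the failures and lets the module's own build continue through packaging. A hypothetical POM fragment pinning the latter, rather than passing it on the command line:

```xml
<!-- Equivalent of passing -Dmaven.test.failure.ignore=true on the CLI.
     Tests still run and are reported, but a failure no longer aborts the
     module before its jar is packaged, so downstream modules copy a real
     jar instead of an empty classes/ directory. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <testFailureIgnore>true</testFailureIgnore>
  </configuration>
</plugin>
```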
> > I did see some differences in the configuration between the JDK6 and the JDK7 builds: > * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore > * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't > > I've changed both builds to use -Dmaven.test.failure.ignore and -nsu, let's see how it goes. You are solving the wrong problem. > > > > > > > > > > > It's about time we did the following: > > 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. > > > > Will having 100 tests in one run and 2000 tests in another really help? > > As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these numbers should even out. > > Not integrating commits that fail every time is easy, not integrating commits that fail randomly (maybe only in some environments) is trickier. I know it's tricky, but the only thing we can do is disable those really. I don't see how keeping them enabled is helping at all. > > > > > > > > 2) Any tests that fail randomly should be disabled. > > > > Let's go ahead and disable all the server tests then? ;) > > Those server tests that are randomly failing should be disabled and looked at. Those tests that are failing as a result of the container not starting are side effects of things not working properly, and these should not be disabled. > > Why treat the tests that are failing because of a build problem differently? What about the tests that fail only on IBM JDK6? Disable and indicate that the test fails on IBM JDK6. Once the issue is fixed, reenable it.
> > > > > > > > > > Cheers, > > > > [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log > > > > On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: > > > > > > > > On 04 Feb 2014, at 10:38, Stuart Douglas wrote: > > > > > >> It is almost certainly something to do with this: > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. > > > > > > Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. > > > > > > However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. > > > > > > I?ve traced back and this might be due to build failures that are not producing the right jars [3]. > > > > > > @Stuart, this is really our problem. Sorry for the inconvenience! > > > > > > [1] https://gist.github.com/galderz/b9286f385aad1316df51 > > > [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 > > > [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c > > > > > >> > > >> Stuart > > >> > > >> > > >> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: > > >> > > >> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: > > >> > > >>> > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: > > >>> Yes, there is nothing in the server code that modified the modules directory. > > >>> > > >>> Well, except for the new patching stuff, but that is not really relevant here. > > >> > > >> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. > > >> > > >> Do you see anything in there that could be causing this? 
We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. > > >> > > >> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? > > >> > > >> Thanks a lot for your help! > > >> > > >> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml > > >> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml > > >> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> Stuart > > >>> > > >>> > > >>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: > > >>> > > >>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: > > >>> > > >>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. > > >>> > > >>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? > > >>> > > >>> Cheers, > > >>> > > >>>> > > >>>> Stuart > > >>>> > > >>>> > > >>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: > > >>>> Hi all, > > >>>> > > >>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). > > >>>> > > >>>> Quite often some of the runs fail with error message [1]. > > >>>> > > >>>> Having looked at the build environment when a run fails, you see this: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes ( > >>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml > > >>>> > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes > > >>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . 
> > >>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. > > >>>> > > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> > > >>>> ... > > >>>> > > >>>> This is completely different to what happens with a successful run: > > >>>> > > >>>> -- > > >>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main > > >>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar ( > >>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index > > >>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml > > >>>> > > >>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders > > >>>> org/infinispan/rest/configuration/ExtendedHeaders.class > > >>>> > > >>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml > > >>>> > > >>>> ? > > >>>> > > >>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? > > >>>> > > >>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 > > >>>> > > >>>> Cheers, > > >>>> -- > > >>>> Galder Zamarre?o > > >>>> galder at redhat.com > > >>>> twitter.com/galderz > > >>>> > > >>>> Project Lead, Escalante > > >>>> http://escalante.io > > >>>> > > >>>> Engineer, Infinispan > > >>>> http://infinispan.org > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> jboss-as7-dev mailing list > > >>>> jboss-as7-dev at lists.jboss.org > > >>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev > > >>>> > > >>> > > >>> > > >>> -- > > >>> Galder Zamarre?o > > >>> galder at redhat.com > > >>> twitter.com/galderz > > >>> > > >>> Project Lead, Escalante > > >>> http://escalante.io > > >>> > > >>> Engineer, Infinispan > > >>> http://infinispan.org > > >>> > > >>> > > >>> > > >> > > >> > > >> -- > > >> Galder Zamarre?o > > >> galder at redhat.com > > >> twitter.com/galderz > > >> > > >> Project Lead, Escalante > > >> http://escalante.io 
> > >> > > >> Engineer, Infinispan > > >> http://infinispan.org > > >> > > >> > > > > > > > > > -- > > > Galder Zamarre?o > > > galder at redhat.com > > > twitter.com/galderz > > > > > > Project Lead, Escalante > > > http://escalante.io > > > > > > Engineer, Infinispan > > > http://infinispan.org > > > > > > > > > _______________________________________________ > > > infinispan-dev mailing list > > > infinispan-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Tue Feb 4 09:36:23 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Tue, 4 Feb 2014 15:36:23 +0100 Subject: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders 
and errors In-Reply-To: <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com> References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com> <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com> <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com> <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com> Message-ID: All, Sanne, Pedro, Dan and I had a very productive discussion on IRC on this topic [1]. We've decided that instead of disabling tests, we need them to run in order to get recent stacktraces, logs, etc. So, we've decided to create a new test group called "unstable". This test group would only be run in CI once a day and it'd be run in a different build. This build would also enable TRACE logging for standalone and server tests. For server, I need to create a task to do this selectively. The rest of the builds, masters and PRs, would not run the "unstable" group, and would not have TRACE enabled. The responsibility for unstable tests lies with the component owners. They need to handle them and decide what to do with them. Cheers, [1] https://gist.github.com/galderz/3563d1b23b5d50f80d82 On 04 Feb 2014, at 15:10, Galder Zamarreño wrote: > > On 04 Feb 2014, at 14:47, Dan Berindei wrote: > >> >> >> >> On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarreño wrote: >> >> On 04 Feb 2014, at 13:50, Dan Berindei wrote: >> >>> >>> On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote: >>> Narrowing down the list now, since this is a problem of how our CI is doing builds. >>> >>> These logs are retrieved from [1]. >>> >>> Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. >>> >>> We run the build with -fn (fail-never), so the build should never be halted because of a test failure.
The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 >> >> ^ That?s not working as expected, see the build log, my snippets?etc. >> >> Sorry, I didn't understand what's happening in those snippets. > > The log shows it quite clearly that after those tests fail, nothing else runs in that module, including producing the jar. It halts. That?s > >> All I saw was an Ant script that doesn't do what it's supposed to do :) > > The ant script not doing it?s job is because there modules are not completing the build. There?s a direct correlation between the three modules that fail with tests and the 3 modules that are copying an empty folder instead of the jar. > >> >> I did see some differences in the configuration between the JDK6 and the JDK7 builds: >> * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore >> * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't >> >> I've changed both builds to use -Dmaven.test.failure.ignore and -nsu, let's see how it goes. > > You are solving the wrong problem. > >> >> >> >>> >>> >>> >>> It?s about time we did the following: >>> 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. >>> >>> Will having 100 tests in one run and 2000 tests in another really help? >> >> As you disable randomly failing tests, and do not integrate commits making the testsuite fail, these number should even out. >> >> Not integrating commits that fail every time is easy, not integrating commits that fail randomly (maybe only in some environments) is trickier. > > I know it?s tricky, but the only thing we can do is disable those really. I don?t see how keeping them enabled is helping at all. > >> >> >>> >>> >>> 2) Any tests that fail randomly should be disabled. >>> >>> Let's go ahead and disable all the server tests then? ;) >> >> Those server tests that are randomly failing should be disabled and looked at. 
Those tests that are failing as a result of container not starting are side effects of things not working properly, and these should not be disabled. >> >> Why treat the tests that are failing because of a build problem differently? What about the tests that fail only on IBM JDK6? > > Disable and indicate that the test fails on IBM JDK6. Once the issue is fixed, reenable it. > >> >> >>> >>> >>> >>> Cheers, >>> >>> [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log >>> >>> On 04 Feb 2014, at 13:30, Galder Zamarre?o wrote: >>> >>>> >>>> On 04 Feb 2014, at 10:38, Stuart Douglas wrote: >>>> >>>>> It is almost certainly something to do with this: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. >>>> >>>> Here?s a gist with a subset of the build log [1]. When it works fine, it?s copying a jar, when it?s not, it?s copying an empty folder. >>>> >>>> However, this is not only happening for the org.infinispan.server.rest module, others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us, it?s not happening for modules coming with the base AS/WF instance. >>>> >>>> I?ve traced back and this might be due to build failures that are not producing the right jars [3]. >>>> >>>> @Stuart, this is really our problem. Sorry for the inconvenience! >>>> >>>> [1] https://gist.github.com/galderz/b9286f385aad1316df51 >>>> [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 >>>> [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c >>>> >>>>> >>>>> Stuart >>>>> >>>>> >>>>> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarre?o wrote: >>>>> >>>>> On 04 Feb 2014, at 10:01, Stuart Douglas wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas wrote: >>>>>> Yes, there is nothing in the server code that modified the modules directory. 
>>>>>> >>>>>> Well, except for the new patching stuff, but that is not really relevant here. >>>>> >>>>> The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. >>>>> >>>>> Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3, and take into account the lib.xml in [2]. >>>>> >>>>> Finally, do you have any suggestions on changes we could make to these files to further debug the issue? >>>>> >>>>> Thanks a lot for your help! >>>>> >>>>> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml >>>>> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml >>>>> >>>>>> >>>>>> Stuart >>>>>> >>>>>> >>>>>> Stuart >>>>>> >>>>>> >>>>>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarre?o wrote: >>>>>> >>>>>> On 04 Feb 2014, at 09:37, Stuart Douglas wrote: >>>>>> >>>>>>> This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. >>>>>> >>>>>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies? >>>>>> >>>>>> Cheers, >>>>>> >>>>>>> >>>>>>> Stuart >>>>>>> >>>>>>> >>>>>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarre?o wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> We?re having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I?m typing I?m wondering if we should just upgrade it to see if this goes away...?). >>>>>>> >>>>>>> Quite often some of the runs fail with error message [1]. 
>>>>>>> >>>>>>> Having looked at the build environment when a run fails, you see this: >>>>>>> >>>>>>> -- >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (>>>>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml >>>>>>> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 . >>>>>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 .. >>>>>>> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>>>> >>>>>>> ... >>>>>>> >>>>>>> This is completely different to what happens with a successful run: >>>>>>> >>>>>>> -- >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main >>>>>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (>>>>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml >>>>>>> >>>>>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders >>>>>>> org/infinispan/rest/configuration/ExtendedHeaders.class >>>>>>> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml >>>>>>> >>>>>>> ? >>>>>>> >>>>>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance? 
>>>>>>> >>>>>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284 >>>>>>> >>>>>>> Cheers, >>>>>>> -- >>>>>>> Galder Zamarre?o >>>>>>> galder at redhat.com >>>>>>> twitter.com/galderz >>>>>>> >>>>>>> Project Lead, Escalante >>>>>>> http://escalante.io >>>>>>> >>>>>>> Engineer, Infinispan >>>>>>> http://infinispan.org >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> jboss-as7-dev mailing list >>>>>>> jboss-as7-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Galder Zamarre?o >>>>>> galder at redhat.com >>>>>> twitter.com/galderz >>>>>> >>>>>> Project Lead, Escalante >>>>>> http://escalante.io >>>>>> >>>>>> Engineer, Infinispan >>>>>> http://infinispan.org >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Galder Zamarre?o >>>>> galder at redhat.com >>>>> twitter.com/galderz >>>>> >>>>> Project Lead, Escalante >>>>> http://escalante.io >>>>> >>>>> Engineer, Infinispan >>>>> http://infinispan.org >>>>> >>>>> >>>> >>>> >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> 
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Galder Zamarreño
>> galder at redhat.com
>> twitter.com/galderz
>>
>> Project Lead, Escalante
>> http://escalante.io
>>
>> Engineer, Infinispan
>> http://infinispan.org
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Galder Zamarreño
> galder at redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Galder Zamarreño
galder at redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org

From dan.berindei at gmail.com Tue Feb 4 13:13:38 2014
From: dan.berindei at gmail.com (Dan Berindei)
Date: Tue, 4 Feb 2014 20:13:38 +0200
Subject: [infinispan-dev] [jboss-as7-dev] Module jars disappearing leaving empty classes/ folders and errors
In-Reply-To:
References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com>
 <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com>
 <7B023B75-68A3-46D1-A77F-F355F84CABCF@redhat.com>
 <063F6F57-8F25-4622-9BBF-355806DA6294@redhat.com>
Message-ID:

For the record, -Dmaven.test.failure.ignore seems to do the right thing, and
the JDK7 build now only has 7 test failures (+ 4 ignored):
http://ci.infinispan.org/viewLog.html?buildId=5912&tab=buildResultsDiv&buildTypeId=bt8

On Tue, Feb 4, 2014 at 4:36 PM, Galder Zamarreño wrote:
> All,
>
> Sanne, Pedro, Dan and I had a very productive discussion on IRC on this
> topic [1].
> We've decided that instead of disabling tests, we need them to run in
> order to get recent stacktraces, logs, etc. So, we've decided to create a
> new test group called "unstable". This test group would only be run in CI
> once a day, and it'd be run in a different build. This build would also
> enable TRACE logging for standalone and server tests. For server, I need
> to create a task to do this selectively.
>
> The rest of the builds, masters and PRs, would not run the "unstable"
> group, and would not have TRACE enabled.
>
> Responsibility for unstable tests lies with the component owners. They
> need to handle them and decide what to do with them.
>
> Cheers,
>
> [1] https://gist.github.com/galderz/3563d1b23b5d50f80d82
>
> On 04 Feb 2014, at 15:10, Galder Zamarreño wrote:
>
> > On 04 Feb 2014, at 14:47, Dan Berindei wrote:
> >
> >> On Tue, Feb 4, 2014 at 3:03 PM, Galder Zamarreño wrote:
> >>
> >> On 04 Feb 2014, at 13:50, Dan Berindei wrote:
> >>
> >>> On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote:
> >>> Narrowing down the list now, since this is a problem of how our CI is doing builds.
> >>>
> >>> These logs are retrieved from [1].
> >>>
> >>> Dunno how our CI is configured, but this is odd. Seems like the build is
> >>> halted due to test failures, but it continues somehow? I mean, the jars
> >>> are not being produced properly, but the build is not halting.
> >>>
> >>> We run the build with -fn (fail-never), so the build should never be halted
> >>> because of a test failure. The configuration is here:
> >>> http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1
> >>
> >> ^ That's not working as expected, see the build log, my snippets... etc.
> >>
> >> Sorry, I didn't understand what's happening in those snippets.
> >
> > The log shows it quite clearly that after those tests fail, nothing else
> > runs in that module, including producing the jar. It halts.
That's
>
> >> All I saw was an Ant script that doesn't do what it's supposed to do :)
>
> > The Ant script not doing its job is because these modules are not completing
> > the build. There's a direct correlation between the three modules that fail
> > with tests and the 3 modules that are copying an empty folder instead of the jar.
>
> >> I did see some differences in the configuration between the JDK6 and the JDK7 builds:
> >> * JDK7 uses -fn and JDK6 uses -Dmaven.test.failure.ignore
> >> * JDK7 uses -nsu (no snapshot updates), JDK6 doesn't
> >>
> >> I've changed both builds to use -Dmaven.test.failure.ignore and -nsu,
> >> let's see how it goes.
>
> > You are solving the wrong problem.
>
> >>> It's about time we did the following:
> >>> 1) Any test failures should halt the build there and then. IOW, do not
> >>> continue the build at all.
> >>>
> >>> Will having 100 tests in one run and 2000 tests in another really help?
>
> >> As you disable randomly failing tests, and do not integrate commits making
> >> the testsuite fail, these numbers should even out.
> >>
> >> Not integrating commits that fail every time is easy; not integrating
> >> commits that fail randomly (maybe only in some environments) is trickier.
>
> > I know it's tricky, but the only thing we can do is disable those, really.
> > I don't see how keeping them enabled is helping at all.
>
> >>> 2) Any tests that fail randomly should be disabled.
> >>>
> >>> Let's go ahead and disable all the server tests then? ;)
>
> >> Those server tests that are randomly failing should be disabled and looked at.
> >> Those tests that are failing as a result of the container not starting are
> >> side effects of things not working properly, and these should not be disabled.
> >>
> >> Why treat the tests that are failing because of a build problem differently?
> >> What about the tests that fail only on IBM JDK6?
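[Editorial aside for readers following the -fn vs. -Dmaven.test.failure.ignore exchange above: the two switches tolerate different classes of failure. A hedged sketch of the invocations (flag semantics per the Maven and Surefire documentation; the surrounding project setup is assumed):

```shell
# --fail-never (-fn): the reactor keeps going after *any* module failure,
# including compile and packaging errors, so downstream modules can end up
# assembling against a half-built module (e.g. an empty classes/ directory
# where a jar was expected).
mvn clean install -fn

# maven.test.failure.ignore only tolerates Surefire *test* failures;
# compile and packaging errors still halt the build.
# -nsu / --no-snapshot-updates skips re-resolving remote SNAPSHOT dependencies.
mvn clean install -Dmaven.test.failure.ignore=true -nsu
```

That difference is consistent with what the thread observes: under -fn the failing modules also fail to produce their jars, while -Dmaven.test.failure.ignore leaves packaging intact.]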
> > Disable and indicate that the test fails on IBM JDK6. Once the issue is
> > fixed, reenable it.
>
> >>> Cheers,
> >>>
> >>> [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log
> >>>
> >>> On 04 Feb 2014, at 13:30, Galder Zamarreño wrote:
> >>>
> >>>> On 04 Feb 2014, at 10:38, Stuart Douglas wrote:
> >>>>
> >>>>> It is almost certainly something to do with this:
> >>>>>
> >>>>> src="${infinispan.server.modules.dir}">
> >>>>> artifact="infinispan-server-rest" classifier="classes" />
> >>>>>
> >>>>> I guess sometimes the classes artefact is being attached as a
> >>>>> reference to the classes directory, rather than a reference to a jar,
> >>>>> which causes the issue.
> >>>>
> >>>> Here's a gist with a subset of the build log [1]. When it works fine,
> >>>> it's copying a jar; when it's not, it's copying an empty folder.
> >>>>
> >>>> However, this is not only happening for the org.infinispan.server.rest
> >>>> module, others show the same issue [2]. What seems to be a pattern is
> >>>> that it only happens with modules that are built by us; it's not
> >>>> happening for modules coming with the base AS/WF instance.
> >>>>
> >>>> I've traced back and this might be due to build failures that are not
> >>>> producing the right jars [3].
> >>>>
> >>>> @Stuart, this is really our problem. Sorry for the inconvenience!
> >>>>
> >>>> [1] https://gist.github.com/galderz/b9286f385aad1316df51
> >>>> [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323
> >>>> [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c
> >>>>
> >>>>> Stuart
> >>>>>
> >>>>> On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarreño wrote:
> >>>>>
> >>>>> On 04 Feb 2014, at 10:01, Stuart Douglas wrote:
> >>>>>
> >>>>>> On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas <stuart.w.douglas at gmail.com> wrote:
> >>>>>> Yes, there is nothing in the server code that modifies the modules directory.
> >>>>>>
> >>>>>> Well, except for the new patching stuff, but that is not really relevant here.
> >>>>>
> >>>>> The testsuite AS/WF builds are built out of the distribution build,
> >>>>> which shows the same problem. The distribution we build uses the
> >>>>> scripts we got from AS [1].
> >>>>>
> >>>>> Do you see anything in there that could be causing this? We are using
> >>>>> maven-antrun-plugin version 1.3, and take into account the lib.xml in [2].
> >>>>>
> >>>>> Finally, do you have any suggestions on changes we could make to
> >>>>> these files to further debug the issue?
> >>>>>
> >>>>> Thanks a lot for your help!
> >>>>>
> >>>>> [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml
> >>>>> [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml
> >>>>>
> >>>>>> Stuart
> >>>>>>
> >>>>>> Stuart
> >>>>>>
> >>>>>> On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarreño <galder at redhat.com> wrote:
> >>>>>>
> >>>>>> On 04 Feb 2014, at 09:37, Stuart Douglas <stuart.w.douglas at gmail.com> wrote:
> >>>>>>
> >>>>>>> This looks like an issue with your environment. The modules directory
> >>>>>>> is static. Wildfly does not contain any code that messes with it.
> >>>>>>> I would say the culprit is probably something in either your build
> >>>>>>> process or your test suite.
> >>>>>>
> >>>>>> Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used
> >>>>>> somewhere else). I guess your answer still applies?
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>>> Stuart
> >>>>>>>
> >>>>>>> On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarreño <galder at redhat.com> wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> We're having issues with our Infinispan Server integration tests,
> >>>>>>> which run within Wildfly 8.0.0.Beta1 (as I'm typing I'm wondering if
> >>>>>>> we should just upgrade it to see if this goes away...?).
> >>>>>>>
> >>>>>>> Quite often some of the runs fail with error message [1].
> >>>>>>>
> >>>>>>> Having looked at the build environment when a run fails, you see this:
> >>>>>>>
> >>>>>>> --
> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main
> >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 classes (<-- a directory??)
> >>>>>>> -rw-r--r-- 1 g staff 1B Feb 3 18:41 classes.index
> >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 18:41 module.xml
> >>>>>>>
> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main/classes
> >>>>>>> drwxrwxr-x 2 g staff 68B Feb 3 18:41 .
> >>>>>>> drwxrwxr-x 5 g staff 170B Feb 3 18:41 ..
> >>>>>>>
> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml
> >>>>>>> name="org.infinispan.server.rest">
> >>>>>>> ...
> >>>>>>>
> >>>>>>> ...
> >>>>>>>
> >>>>>>> This is completely different to what happens with a successful run:
> >>>>>>>
> >>>>>>> --
> >>>>>>> $ ls modules/system/layers/base/org/infinispan/server/rest/main
> >>>>>>> -rw-r--r-- 1 g staff 103K Feb 3 19:40 infinispan-classes.jar (<-- a jar file!)
> >>>>>>> -rw-r--r-- 1 g staff 278B Feb 3 19:40 infinispan-classes.jar.index
> >>>>>>> -rw-r--r-- 1 g staff 2.1K Feb 3 19:40 module.xml
> >>>>>>>
> >>>>>>> $ jar tf modules/system/layers/base/org/infinispan/server/rest/main/infinispan-classes.jar | grep ExtendedHeaders
> >>>>>>> org/infinispan/rest/configuration/ExtendedHeaders.class
> >>>>>>>
> >>>>>>> $ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml
> >>>>>>> name="org.infinispan.server.rest">
> >>>>>>> ...
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Anyone can explain what is going on here? Does it ring a bell to anyone? Is this a known Wildfly issue by any chance?
> >>>>>>>
> >>>>>>> [1] https://gist.github.com/galderz/bd74cebfc840ef3ae284
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> --
> >>>>>>> Galder Zamarreño
> >>>>>>> galder at redhat.com
> >>>>>>> twitter.com/galderz
> >>>>>>>
> >>>>>>> Project Lead, Escalante
> >>>>>>> http://escalante.io
> >>>>>>>
> >>>>>>> Engineer, Infinispan
> >>>>>>> http://infinispan.org
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> jboss-as7-dev mailing list
> >>>>>>> jboss-as7-dev at lists.jboss.org
> >>>>>>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev
> >>> _______________________________________________
> >>> infinispan-dev mailing list
> >>> infinispan-dev at lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Galder Zamarreño
> galder at redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140204/2166b5b2/attachment-0001.html

From mmarkus at redhat.com Wed Feb 5 07:42:34 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:42:34 +0000
Subject: [infinispan-dev] [jboss-as7-dev] Module jars disappearing leaving empty classes/ folders and errors
In-Reply-To:
References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com>
 <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com>
Message-ID: <29846662-5D58-474C-9492-A38355BE9D02@redhat.com>

On Feb 4, 2014, at 12:50 PM, Dan Berindei wrote:
>
> On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño wrote:
> Narrowing down the list now, since this is a problem of how our CI is doing builds.
>
> These logs are retrieved from [1].
>
> Dunno how our CI is configured, but this is odd. Seems like the build is halted
> due to test failures, but it continues somehow? I mean, the jars are not being
> produced properly, but the build is not halting.
>
> We run the build with -fn (fail-never), so the build should never be halted
> because of a test failure. The configuration is here:
> http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1
>
> It's about time we did the following:
> 1) Any test failures should halt the build there and then. IOW, do not continue the build at all.
>
> Will having 100 tests in one run and 2000 tests in another really help?
>
> 2) Any tests that fail randomly should be disabled.

Doing this in the past didn't seem to help: tests were disabled and never
re-enabled again. IMO we should fight to get the suite green, and then any
intermittent failure should be considered a blocker and treated as the highest prio.
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 07:43:55 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:43:55 +0000
Subject: [infinispan-dev] [jboss-as7-dev] Module jars disappearing leaving empty classes/ folders and errors
In-Reply-To: <7D6D9D4B-8023-4A04-B946-8CB90640319F@redhat.com>
References: <69786B4D-C6DC-49AD-B3ED-2BDED1E7AE28@redhat.com>
 <05187A01-9E89-41D7-9E85-0FE4AC9DACCA@redhat.com>
 <7D6D9D4B-8023-4A04-B946-8CB90640319F@redhat.com>
Message-ID:

On Feb 4, 2014, at 12:52 PM, Galder Zamarreño wrote:
>
> On 04 Feb 2014, at 13:36, Galder Zamarreño wrote:
>
>> Narrowing down the list now, since this is a problem of how our CI is doing builds.
>>
>> These logs are retrieved from [1].
>>
>> Dunno how our CI is configured, but this is odd. Seems like the build is halted
>> due to test failures, but it continues somehow? I mean, the jars are not being
>> produced properly, but the build is not halting.
>>
>> It's about time we did the following:
>> 1) Any test failures should halt the build there and then. IOW, do not continue the build at all.
>> 2) Any tests that fail randomly should be disabled.
>
> Having had to debug through this, I can certainly understand Sanne's frustration,
> and as server component lead, I'm not going to bother looking at any CI builds
> until all modules that server modules depend on are green and their testsuites
> are passing.
+1

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 07:44:50 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:44:50 +0000
Subject: [infinispan-dev] L1OnRehash Discussion
In-Reply-To:
References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com>
Message-ID: <9C3EB525-3C12-45B1-B278-702B544BABDF@redhat.com>

On Feb 4, 2014, at 11:04 AM, Dan Berindei wrote:
> On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño wrote:
> > On 28 Jan 2014, at 15:29, William Burns wrote:
> >
> > > Hello everyone,
> > >
> > > I wanted to discuss what I would say is the dubious benefit of L1OnRehash,
> > > especially when compared to the complexity it introduces.
> > >
> > > L1OnRehash is used to retain a value by moving a previously owned value
> > > into the L1 when a rehash occurs and this node no longer owns that value.
> > > Also, any current L1 values are removed when a rehash occurs. Therefore it
> > > can only save a single remote get, for only a few keys, when a rehash occurs.
> > >
> > > This by itself is fine; however, L1OnRehash has many edge cases to guarantee
> > > consistency, as can be seen from https://issues.jboss.org/browse/ISPN-3838.
> > > This can get quite complicated for a feature that gives marginal performance
> > > increases (especially given that this value may never have been read
> > > recently - at least normal L1 usage guarantees this).
> > >
> > > My first suggestion is instead to deprecate the L1OnRehash configuration
> > > option and to remove this logic.
> > +1
>
> +1 from me as well

+1

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 07:55:20 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 12:55:20 +0000
Subject: [infinispan-dev] Store as binary
In-Reply-To:
References: <52D92AC4.7080701@redhat.com> <52DCF101.3020903@infinispan.org>
 <87020416-72D3-412E-818B-A7F9161355CC@redhat.com> <52DCF70C.4090404@infinispan.org>
 <52DD4534.7080209@redhat.com> <68B26C2A-389B-4C0A-A3C6-DBE3B0526DAC@redhat.com>
Message-ID:

On Feb 4, 2014, at 7:14 AM, Galder Zamarreño wrote:
> On 21 Jan 2014, at 17:45, Mircea Markus wrote:
>
>> On Jan 21, 2014, at 2:13 PM, Sanne Grinovero wrote:
>>
>>> On 21 January 2014 13:37, Mircea Markus wrote:
>>>>
>>>> On Jan 21, 2014, at 1:21 PM, Galder Zamarreño wrote:
>>>>
>>>>>> What's the point for these tests?
>>>>>
>>>>> +1
>>>>
>>>> To validate whether storing the data in binary format yields better
>>>> performance than storing it as a POJO.
>>>
>>> That will highly depend on the scenarios you want to test for. AFAIK
>>> this started after Paul described how session replication works in
>>> WildFly, and we already know that both strategies are suboptimal with
>>> the current options available: in his case the active node will always
>>> write on the POJO, while the backup node will essentially only need to
>>> store the buffer "just in case" it might need to take over.
>>
>> Indeed, as it is today it doesn't make sense for WildFly's session replication.
>>
>>> Sure, one will be slower, but if you want to make a suggestion to him
>>> about which configuration he should be using, we should measure his
>>> use case, not a different one.
>>>
>>> Even then, as discussed in Palma, an in-memory String representation
>>> might be way more compact because of pooling of strings and a very
>>> high likelihood of repeated headers (as common in web frameworks),
>>
>> pooling like in String.intern()?
>> Even so, if most of your access to the String is to serialize it and send it
>> remotely, then you have a serialization cost (CPU) to pay for the reduced size.
>
> Serialization has a cost, but nothing compared with the transport itself, and
> you don't have to go very far to see the impact of transport. Just recently we
> were chasing some performance regression and even though there were some
> changes in serialization, the impact of my improvements was minimal, max 2-3%.
> Optimal network and transport configuration is more important IMO, and once
> again, misconfiguration in that layer is what was causing us to be ~20% slower.

Yes, I didn't expect huge improvements from storeAsBinary, but at least some
improvement caused by the fact that lots of serialization shouldn't happen in
the tested scenario. A 2-3% improvement wouldn't hurt, though :-)

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mudokonman at gmail.com Wed Feb 5 08:19:03 2014
From: mudokonman at gmail.com (William Burns)
Date: Wed, 5 Feb 2014 08:19:03 -0500
Subject: [infinispan-dev] L1OnRehash Discussion
In-Reply-To:
References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com>
Message-ID:

On Tue, Feb 4, 2014 at 6:04 AM, Dan Berindei wrote:
>
> On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño wrote:
>>
>> On 28 Jan 2014, at 15:29, William Burns wrote:
>>
>> > Hello everyone,
>> >
>> > I wanted to discuss what I would say is the dubious benefit of L1OnRehash,
>> > especially when compared to the complexity it introduces.
>> >
>> > L1OnRehash is used to retain a value by moving a previously owned value
>> > into the L1 when a rehash occurs and this node no longer owns that value.
>> > Also, any current L1 values are removed when a rehash occurs. Therefore it
>> > can only save a single remote get, for only a few keys, when a rehash occurs.
>> > This by itself is fine; however, L1OnRehash has many edge cases to
>> > guarantee consistency, as can be seen from
>> > https://issues.jboss.org/browse/ISPN-3838. This can get quite
>> > complicated for a feature that gives marginal performance increases
>> > (especially given that this value may never have been read recently -
>> > at least normal L1 usage guarantees this).
>> >
>> > My first suggestion is instead to deprecate the L1OnRehash
>> > configuration option and to remove this logic.
>>
>> +1
>
> +1 from me as well
>
>> > My second suggestion is a new implementation of L1OnRehash that is
>> > always enabled when the L1 threshold is configured to 0. For those not
>> > familiar, the L1 threshold controls whether invalidations are broadcast
>> > instead of sent as individual messages. A value of 0 means to always
>> > broadcast. This would allow for some benefits that we can't currently get:
>> >
>> > 1. L1 values would never have to be invalidated on a rehash event
>> >    (guaranteeing locality of reads under rehash)
>> > 2. L1 requestors would not have to be tracked any longer
>> >
>> > However, every write would be required to send an invalidation, which
>> > could slow write performance in additional cases (since we currently
>> > only send invalidations when requestors are found). The difference
>> > would be lessened with UDP, which is the transport I would assume
>> > someone would use when configuring the L1 threshold to 0.
>>
>> Sounds good to me, but I think you could go even beyond this and maybe get
>> rid of the threshold configuration option too?
>>
>> If the transport is UDP and multicast is configured, invalidations are
>> broadcast (and the two benefits you mention apply).
>> If UDP w/ unicast or TCP is used, track requestors and send invalidations
>> as unicasts.
>>
>> Do we really need to expose these configuration options to the user?
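[Editorial aside: the two knobs under discussion live on the `<l1>` element of a distributed cache. A hypothetical configuration sketch, with attribute names as in the Infinispan 5.x/6.x XML schema (double-check against the schema of the version in use):

```xml
<namedCache name="dist">
   <clustering mode="distribution">
      <!-- onRehash is the feature proposed for deprecation above.
           invalidationThreshold="0" means invalidations are always
           multicast instead of unicast to tracked requestors. -->
      <l1 enabled="true" lifespan="600000"
          onRehash="false" invalidationThreshold="0"/>
   </clustering>
</namedCache>
```
]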
> I think the idea was that even with UDP, sending 2 unicasts and waiting for
> only 2 responses may be faster than sending a multicast and waiting for 10
> responses. However, I'm not sure that's the case if we send 1 unicast
> invalidation from each owner instead of a single multicast invalidation from
> the primary owner/originator [1]. Maybe if each owner would return a list of
> requestors and the originator would do the invalidation at the end...

I totally agree, since we currently have to send invalidations from the primary
owner and all backup owners to guarantee consistency if we have a response from
the backup owner [2]. By moving to this route we only ever have to send a
single multicast invalidation instead of N unicast invalidations. However, this
also brings up another change where we only L1 cache the primary owner
response [3] :) Actually, that would tilt the performance discussion the other
way. Makes me think deprecating the current L1OnRehash and adding primary owner
L1 caching should come first, and then we can reevaluate whether the new
L1OnRehash support is even needed.

The originator firing the invalidations is interesting, but I don't think it is
feasible. With async transport this is not doable at all. Also, if the
originator goes down and the value is persisted, we will still have invalid L1
values cached. The latter could be fixed with txs, but non-tx would still be
broken.

> One tangible benefit of having the setting is that we can run the test suite
> with TCP only, and still cover every path in L1Manager. If we removed it
> completely, it would still be possible to change the toggle in L1ManagerImpl
> via reflection, but it would be a little hacky.
>
>> > What do you guys think? I am thinking that no one minds the removal of
>> > L1OnRehash that we have currently (if so let me know). I am quite curious
>> > what others think about the changes for the L1 threshold value of 0 -
>> > maybe this configuration value is never used?
> Since we don't give any guidance as to what a good threshold value would be,
> I doubt many people use it.
>
> My alternative proposal would be to replace the invalidationThreshold=-1|0|>0
> setting with a traceRequestors=true|false setting.
> 1. If traceRequestors == false, don't keep track of requestors, only send
>    the invalidation from the originator, and enable l1OnRehash.
>    This means we can keep the entries that are in L1 after a rehash as well.
> 2. If traceRequestors == true, track requestors, send unicast/multicast
>    invalidations depending on the transport, and disable l1OnRehash.

I have to admit I am struggling with whether we even need this configuration
option anymore, or whether we should just enable requestor tracking based
solely on the transport configuration. I do like the option though, especially
if we find out not tracking requestors is faster. The default value, though,
would be based on whether the transport allows for multicast or not.

> [1] https://issues.jboss.org/browse/ISPN-186

[2] https://issues.jboss.org/browse/ISPN-3648
[3] https://issues.jboss.org/browse/ISPN-3684

> Cheers
> Dan
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From mmarkus at redhat.com Wed Feb 5 09:15:41 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 14:15:41 +0000
Subject: [infinispan-dev] reusing infinispan's marshalling
In-Reply-To: <52EA41E0.2010505@redhat.com>
References: <52EA41E0.2010505@redhat.com>
Message-ID: <796469E7-CB0C-4E96-97BC-81D74D48D51E@redhat.com>

One way to do it is to use a distributed cache with two different marshallers:
JBMAR and ProtoStream. Admittedly this won't measure only the serialisation
performance, but will include other stuff as well, such as network time
(I guess you can remove this from the result though).
This way we would get a better understanding of how the two marshallers affect
the performance of the system as a whole. Also, if using RadarGun, you could
get more info around how much CPU time is used by each scenario.

On Jan 30, 2014, at 12:13 PM, Adrian Nistor wrote:
> I've been pondering about re-using the marshalling machinery of
> Infinispan in another project, specifically in ProtoStream, where I'm
> planning to add it as a test scoped dependency so I can create a
> benchmark to compare marshalling performance. I'm basically interested
> in comparing ProtoStream and Infinispan's JBoss Marshalling based
> mechanism. Comparing against plain JBMAR, without using the
> ExternalizerTable and Externalizers introduced by Infinispan, is not
> going to get me accurate results.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Wed Feb 5 09:28:00 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Wed, 5 Feb 2014 14:28:00 +0000
Subject: [infinispan-dev] reusing infinispan's marshalling
In-Reply-To: <1824630C-1D48-480A-8687-E563A54E7E6A@redhat.com>
References: <52EA41E0.2010505@redhat.com> <1824630C-1D48-480A-8687-E563A54E7E6A@redhat.com>
Message-ID: <91EB93BD-0133-44BF-AB64-F57948621CBC@redhat.com>

On Feb 3, 2014, at 6:24 PM, Galder Zamarreño wrote:
> Not sure I understand the need to compare this.
>
> JBMAR and ProtoStream are solving different problems. The former is focused
> on getting the best out of Java persistence. The latter is focused on
> serializing stuff in a platform-independent way.
>
> IMO, it's not an apples to apples comparison.
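[Editorial aside: Mircea's two-marshaller benchmark idea above would be wired up through the global serialization configuration. A hypothetical sketch only - the builder calls follow the Infinispan 6.x configuration API, and the ProtoStream-backed Marshaller implementation is an assumption that would have to be supplied separately:

```java
import org.infinispan.commons.marshall.Marshaller;
import org.infinispan.commons.marshall.jboss.GenericJBossMarshaller;
import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class MarshallerBench {

   // Build a cache manager whose remote payloads go through the given
   // marshaller; run the same workload once per marshaller and compare.
   static DefaultCacheManager managerWith(Marshaller marshaller) {
      GlobalConfiguration global = new GlobalConfigurationBuilder()
            .clusteredDefault()
            .serialization().marshaller(marshaller)
            .build();
      return new DefaultCacheManager(global);
   }

   public static void main(String[] args) {
      // Run 1: JBoss Marshalling (the externalizer-based default machinery).
      DefaultCacheManager jbmar = managerWith(new GenericJBossMarshaller());
      // Run 2: a hypothetical ProtoStream-backed Marshaller would go here.
      jbmar.stop();
   }
}
```

As the thread notes, this measures the whole system (network included), so CPU profiling per run is needed to isolate the marshalling cost itself.]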
My expectation is that ISPN+protostram will be faster than ISPN+JBMAR because: - protostream doesn't track circular references (AFAIK this is something that can be disabled in JBMAR as well) - protostream allows for partial deserialization, that is only deserialize a specific attribute of a class On top of that, it is platform independent, so if you start using it as the default serialization format, it will be easier for you to use ISPN from multiple platforms. The drawback protostream has over JBMAR is that it requires one to define, besides the serialized, a protofile. Las time we discussed, Adrian had some ideas on how that can be circumvented, though. IMO, in certain deployments makes sense to use protostream over JBMAR even when serializing only java objects and this benchmark would be a good tool to validate that. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 5 09:38:38 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 14:38:38 +0000 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: >> >> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >> >>> Hello all, >>> >>> I have been working with notifications and most recently I have come >>> to look into events generated when a new entry is created. Now >>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>> However we currently raise a CacheEntryModifiedEvent event and then a >>> CacheEntryCreatedEvent. I notice that there are comments around the >>> code saying that tests require both to be fired. >> >> it doesn't sound right to me: modified is different than created. 
> > I've lost count the number of times I've raised this up in the dev mailing list :| > > And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. > > >> > >>> > >>> I am wondering if anyone has an objection to only raising a > >>> CacheEntryCreatedEvent on a new cache entry being created. > > It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. we're at a major now, so we should break compatibility if it makes sense. > > Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. Not sure I understand: JCache raises both a "created" and a "modified" event when an entry is created? Or just "created" events?
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From galder at redhat.com Wed Feb 5 09:40:31 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 5 Feb 2014 15:40:31 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> Message-ID: <9289487C-EF37-4A9B-9491-69787F32AC4D@redhat.com> On 03 Feb 2014, at 17:29, William Burns wrote: > On Mon, Feb 3, 2014 at 11:07 AM, Galder Zamarre?o wrote: >> >> On 23 Jan 2014, at 18:54, Mircea Markus wrote: >> >>> >>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>> >>>> Hello all, >>>> >>>> I have been working with notifications and most recently I have come >>>> to look into events generated when a new entry is created. Now >>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>> code saying that tests require both to be fired. >>> >>> it doesn't sound right to me: modified is different than created. >> >> I've lost count the number of times I've raised this up in the dev mailing list :| >> >> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p > > Ah nice I didn't even notice the method until you pointed it out. > >> >>> >>>> >>>> I am wondering if anyone has an objection to only raising a >>>> CacheEntryCreatedEvent on a new cache entry being created. >> >> It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. > > I agree. 
Maybe I should change the question to whether anyone minds if Cluster Listeners > only raise the CacheEntryModifiedEvent on an entry creation for > cluster listeners instead? This wouldn't break existing assumptions > since we don't currently support Cluster Listeners. The only thing is > it wouldn't be consistent with regular listeners? Yeah, it's a tricky one. You don't wanna raise both cos that'd be expensive to ship it around for no extra gain. If you are going to choose one that'd be CacheEntryModifiedEvent indeed. I think we can break off here for clustered listeners specifying it clearly. I don't think there's much point in creating a new set of listeners/event/annotations for the clustered option since eventually we should move towards JCache listeners and only have custom ones for the extra stuff we provide callbacks for. > > >> >> Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. > > Just to be clear you are saying that JCache only raises a single event > for change and create right? Yeah, see JCacheListenerAdapter class. > >> >>>> Does >>>> anyone know why we raise both currently? >> >> Legacy really. >> >>>> Was it just so the >>>> PutKeyValueCommand could more ignorantly just raise the >>>> CacheEntryModified pre Event? >>>> >>>> Any input would be appreciated, Thanks.
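The single-event pattern proposed above for cluster listeners can be illustrated with a small stand-alone model (plain classes standing in for the real Infinispan event interfaces): one "modified" event type carries a created flag, so a single notification serves both cases without shipping two events around the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for CacheEntryModifiedEvent: one event type, with a flag telling
// whether the modification was in fact an insert.
class ModifiedEvent {
    final String key;
    final Object value;
    final boolean created;
    ModifiedEvent(String key, Object value, boolean created) {
        this.key = key; this.value = value; this.created = created;
    }
    boolean isCreated() { return created; }
}

public class ListenerSketch {
    final List<String> log = new ArrayList<>();

    // A listener that distinguishes create vs update from the single event,
    // the pattern proposed for cluster listeners in the thread above.
    void onModified(ModifiedEvent e) {
        log.add((e.isCreated() ? "created " : "updated ") + e.key);
    }

    public static void main(String[] args) {
        ListenerSketch l = new ListenerSketch();
        l.onModified(new ModifiedEvent("k", "v1", true));   // first put -> create
        l.onModified(new ModifiedEvent("k", "v2", false));  // second put -> update
        System.out.println(l.log); // [created k, updated k]
    }
}
```

The class and field names here are illustrative only; the real API surface is whatever the notification SPI ends up exposing.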
>>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Wed Feb 5 09:43:34 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 5 Feb 2014 14:43:34 +0000 Subject: [infinispan-dev] L1OnRehash Discussion In-Reply-To: References: <5853A076-0FEF-4782-A4A3-6C84999DCA2D@redhat.com> Message-ID: I'm all for simplification, assuming that this will deliver better reliability and easier maintenance, but let's not forget that some entries might be actually large. Saving a couple of transfers might be a pointless complexity for our usual small-key tests but maybe it's an interesting feature when you store gigabytes per value. Also, performance "hiccups" are not desirable even in small-key scenarios: an often read key should stay where it is rather than needing an occasional RPC. I haven't looked into the details of your problem, so if you think it's too complex I'm not against ditching this, I'm just trying to make sure we evaluate the full picture. 
I think you made a great point when specifying that the entry remaining in place might actually not get any hit - so keeping it would be pointless - but shouldn't that be a decision the eviction strategy is able to handle? Cheers, Sanne On 5 February 2014 13:19, William Burns wrote: > On Tue, Feb 4, 2014 at 6:04 AM, Dan Berindei wrote: >> >> >> >> On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño wrote: >>> >>> >>> On 28 Jan 2014, at 15:29, William Burns wrote: >>> >>> > Hello everyone, >>> > >>> > I wanted to discuss what I would say as dubious benefit of L1OnRehash >>> > especially compared to the benefits it provides. >>> > >>> > L1OnRehash is used to retain a value by moving a previously owned >>> > value into the L1 when a rehash occurs and this node no longer owns >>> > that value. Also, any current L1 values are removed when a rehash >>> > occurs. Therefore it can only save a single remote get for only a few >>> > keys when a rehash occurs. >>> > >>> > This by itself is fine; however, L1OnRehash has many edge cases to >>> > guarantee consistency as can be seen from >>> > https://issues.jboss.org/browse/ISPN-3838. This can get quite >>> > complicated for a feature that gives marginal performance increases >>> > (especially given that this value may never have been read recently - >>> > at least normal L1 usage guarantees this). >>> > >>> > My first suggestion is instead to deprecate the L1OnRehash >>> > configuration option and to remove this logic. >>> >>> +1 >> >> >> +1 from me as well >> >>> >>> >>> > My second suggestion is a new implementation of L1OnRehash that is >>> > always enabled when L1 threshold is configured to 0. For those not >>> > familiar, L1 threshold controls whether invalidations are broadcasted >>> > instead of individual messages. A value of 0 means to always >>> > broadcast. This would allow for some benefits that we can't currently >>> > do: >>> > >>> > 1.
L1 values would never have to be invalidated on a rehash event >>> > (guarantee locality reads under rehash) >>> > 2. L1 requestors would not have to be tracked any longer >>> > >>> > However every write would be required to send an invalidation which >>> > could slow write performance in additional cases (since we currently >>> > only send invalidations when requestors are found). The difference >>> > would be lessened with udp, which is the transport I would assume >>> > someone would use when configuring L1 threshold to 0. >>> >>> Sounds good to me, but I think you could go even beyond this and maybe get >>> rid of threshold configuration option too? >>> >>> If the transport is UDP and multicast is configured, invalidations are >>> broadcasted (and apply the two benefits you mention). >>> If UDP w/ unicast or TCP used, track invalidations and send them as >>> unicasts. >>> >>> Do we really need to expose these configuration options to the user? >> >> >> I think the idea was that even with UDP, sending 2 unicasts and waiting for >> only 2 responses may be faster than sending a multicast and waiting for 10 >> responses. However, I'm not sure that's the case if we send 1 unicast >> invalidation from each owner instead of a single multicast invalidation from >> the primary owner/originator [1]. Maybe if each owner would return a list of >> requestors and the originator would do the invalidation at the end... > > I totally agree since we currently have to send invalidations from the > primary owner and all backup owners to guarantee consistency if we > have a response from the backup owner [2]. By moving to this route we > only ever have to send a single multicast invalidation instead of N > unicast invalidations. However this also brings up another change > where we only L1 cache the primary owner response [3] :) Actually that > would tilt the performance discussion the other way. 
Makes me think > deprecating current L1OnRehash and adding primary owner L1 caching > should be first and then reevaluate if the new L1OnRehash support is > even needed. > > The originator firing the invalidations is interesting, but don't > think it is feasible. With async transport this is not doable at all. > Also if the originator goes down and the value is persisted we will > have invalid L1 values cached still. The latter could be fixed with > txs but non tx would still be broken. > >> >> One tangible benefit of having the setting is that we can run the test suite >> with TCP only, and still cover every path in L1Manager. If removed it >> completely, it would still be possible to change the toggle in L1ManagerImpl >> via reflection, but it would be a little hacky. >> >>> >>> > What do you guys think? I am thinking that no one minds the removal >>> > of L1OnRehash that we have currently (if so let me know). I am quite >>> > curious what others think about the changes for L1 threshold value of >>> > 0, maybe this configuration value is never used? >>> > >> >> >> Since we don't give any guidance as to what a good threshold value would be, >> I doubt many people use it. >> >> My alternative proposal would be to replace the >> invalidationThreshold=-1|0|>0 setting with a traceRequestors=true|false >> setting. >> 1. If traceRequestors == false, don't keep track of requestors, only send >> the invalidation from the originator, and enable l1OnRehash. >> This means we can keep the entries that are in L1 after a rehash as >> well. >> 2. If traceRequestors == true, track requestors, send unicast/multicast >> invalidations depending on the transport, and disable l1OnRehash. > > I have to admit I am struggling with whether we even need this > configuration option anymore and just solely enable requestors based > on the transport configuration. I do like the option though, > especially if we find out not tracking requestors is faster. 
The > default value though would be based on whether the transport allows > for multicast or not. > >> >> >> [1] https://issues.jboss.org/browse/ISPN-186 > > [2] https://issues.jboss.org/browse/ISPN-3648 > [3] https://issues.jboss.org/browse/ISPN-3684 > >> >> Cheers >> Dan >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev. > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Wed Feb 5 10:03:50 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 5 Feb 2014 16:03:50 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> Message-ID: On 05 Feb 2014, at 15:38, Mircea Markus wrote: > > On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: > >>> >>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>> >>>> Hello all, >>>> >>>> I have been working with notifications and most recently I have come >>>> to look into events generated when a new entry is created. Now >>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>> code saying that tests require both to be fired. >>> >>> it doesn't sound right to me: modified is different than created. 
>> >> I've lost count the number of times I've raised this up in the dev mailing list :| >> >> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p > > Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. -1. As already mentioned, the reason why we've never tackled this problem is cos of JCache, which gets listeners right in this area. JCache is about to go final and people should start moving towards that. Redoing our listeners would be a waste of time IMO. You'd be doing some work to fix something people should stop using in the near-to-medium future. > >> >>> >>>> >>>> I am wondering if anyone has an objection to only raising a >>>> CacheEntryCreatedEvent on a new cache entry being created. >> >> It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. > > we're at a major now, so we should break compatibility if it makes sense. > >> >> Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. > > Not sure I understand: JCache raises both a "created" and a "modified" event when an entry is created? Or just "created" events?
> > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 5 10:05:34 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 5 Feb 2014 16:05:34 +0100 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> Message-ID: On 05 Feb 2014, at 15:38, Mircea Markus wrote: > > On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: > >>> >>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>> >>>> Hello all, >>>> >>>> I have been working with notifications and most recently I have come >>>> to look into events generated when a new entry is created. Now >>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>> code saying that tests require both to be fired. >>> >>> it doesn't sound right to me: modified is different than created. >> >> I?ve lost count the number of times I?ve raised this up in the dev mailing list :| >> >> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. 
Just need to trace back the jira issue number, and associated forum threads ;) :p > > Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. > >> >>> >>>> >>>> I am wondering if anyone has an objection to only raising a >>>> CacheEntryCreatedEvent on a new cache entry being created. >> >> It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. > > we're at a major now, so we should break compatibility if it makes sense. > >> >> Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. > > Not sure I understand: JCache raises both a "created" and a "modified" event when an entry is created? Or just "created" events? JCache differentiates between an entry being created vs being updated, and hence it sends different events depending on which case it is. See JCacheListenerAdapter and JCacheListenerNotifier classes in our JCache impl.
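The JCache behaviour described above - exactly one event per put, created when no previous value existed and updated otherwise, never both - can be modeled in a few lines of plain Java (a sketch of the adapter's dispatch logic, not the actual javax.cache API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal model of the JCache behaviour described above: a put raises exactly
// one event - CREATED when no previous value existed, UPDATED otherwise -
// never the legacy modified-then-created pair.
public class OneEventPerPut {
    enum Kind { CREATED, UPDATED }

    final Map<String, String> store = new HashMap<>();
    final List<String> events = new ArrayList<>();

    void put(String key, String value) {
        String prev = store.put(key, value);        // null means the entry is new
        Kind kind = (prev == null) ? Kind.CREATED : Kind.UPDATED;
        events.add(kind + ":" + key);
    }

    public static void main(String[] args) {
        OneEventPerPut cache = new OneEventPerPut();
        cache.put("k", "v1");
        cache.put("k", "v2");
        System.out.println(cache.events); // [CREATED:k, UPDATED:k]
    }
}
```

Whether the event arrives as two listener interfaces (JCache style) or one event with an isCreated() flag is then purely a surface question; the dispatch decision is the same.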
> > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From mmarkus at redhat.com Wed Feb 5 10:40:41 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 15:40:41 +0000 Subject: [infinispan-dev] New Cache Entry Notifications In-Reply-To: References: <1FAA84A0-3AFE-4DA7-980F-AF1FB5725F5A@redhat.com> <263229C1-72AC-4DDC-B924-B46D26FA1C6C@redhat.com> Message-ID: <49A1AD3E-A626-4FB7-A415-0398E3DC65D7@redhat.com> On Feb 5, 2014, at 3:03 PM, Galder Zamarre?o wrote: > On 05 Feb 2014, at 15:38, Mircea Markus wrote: > >> >> On Feb 3, 2014, at 4:07 PM, Galder Zamarre?o wrote: >> >>>> >>>> On Jan 23, 2014, at 5:48 PM, William Burns wrote: >>>> >>>>> Hello all, >>>>> >>>>> I have been working with notifications and most recently I have come >>>>> to look into events generated when a new entry is created. Now >>>>> normally I would just expect a CacheEntryCreatedEvent to be raised. >>>>> However we currently raise a CacheEntryModifiedEvent event and then a >>>>> CacheEntryCreatedEvent. I notice that there are comments around the >>>>> code saying that tests require both to be fired. >>>> >>>> it doesn't sound right to me: modified is different than created. >>> >>> I?ve lost count the number of times I?ve raised this up in the dev mailing list :| >>> >>> And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. 
Just need to trace back the jira issue number, and associated forum threads ;) :p >> >> Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. > > -1. > > As already mentioned, the reason why we've never tackled this problem is cos of JCache, which gets listeners right in this area. JCache is about to go final and people should start moving towards that. Redoing our listeners would be a waste of time IMO. The effort here is minimal, pretty much adding an if statement. The good thing though is that you won't have to raise this on the mailing list again :-) > You'd be doing some work to fix something people should stop using in the near-to-medium future. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 5 10:53:38 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 15:53:38 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> Message-ID: <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. sad because of the increased index size? > I was already unhappy when I had to do it for class names.
Renaming a cache will be a heavy operation too. > Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? > > BTW, this discussion should be in the open. +1 > > On 31 janv. 2014, at 18:04, Adrian Nistor wrote: > >> I think it conceptually makes sense to have one entity type per cache but this should be a good practice rather than an enforced constraint. It would be a bit late and difficult to add such a constraint now. >> >> The design change we are talking about is being able to search across caches. That can easily be implemented regardless of this. We can move the SearchManager from Cache scope to CacheManager scope. Indexes are bound to types not to caches anyway, so same-type entities from multiple caches can end up in the same index, we just need to store an extra hidden field: the name of the originating cache. This move would also allow us to share some lucene/hsearch resources. >> >> We can easily continue to support Search.getSearchManager(cache) so old API usages continue to work. This would return a delegating/decorating SearchManager that creates queries that are automatically restricted to the scope of the given cache. >> >> Piece of cake? :) >> >> >> >> On Thu, Jan 30, 2014 at 9:56 PM, Mircea Markus wrote: >> curious to see your thoughts on this: it is a recurring topic and will affect the way we design things in the future in a significant way. >> E.g. if we think (recommend) that a distinct cache should be used for each entity, then we'll need querying to work between caches. Also some cache stores can be built along these lines (e.g. for the JPA cache store we only need it to support a single entity type).
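Adrian's proposal above - one CacheManager-scoped index carrying a hidden originating-cache field, plus a delegating per-cache view - can be modeled with plain collections. This is a sketch of the scoping idea only (the types and method names are hypothetical, not the real SearchManager API):

```java
import java.util.ArrayList;
import java.util.List;

// Shared "index" at CacheManager scope: every document carries the name of
// its originating cache - the extra hidden field proposed above.
public class SharedIndexSketch {
    static final class Doc {
        final String cacheName, type, value;
        Doc(String cacheName, String type, String value) {
            this.cacheName = cacheName; this.type = type; this.value = value;
        }
    }

    private final List<Doc> index = new ArrayList<>();

    public void add(String cacheName, String type, String value) {
        index.add(new Doc(cacheName, type, value));
    }

    // CacheManager-scoped query: matches entities of a type across all caches.
    public List<Doc> query(String type) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : index) if (d.type.equals(type)) hits.add(d);
        return hits;
    }

    // Decorating, cache-scoped view: same index, but results are restricted
    // to one cache - preserving the old Search.getSearchManager(cache) shape.
    public List<Doc> queryScoped(String cacheName, String type) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : query(type)) if (d.cacheName.equals(cacheName)) hits.add(d);
        return hits;
    }

    public static void main(String[] args) {
        SharedIndexSketch idx = new SharedIndexSketch();
        idx.add("cars", "Car", "green");
        idx.add("archive", "Car", "red");
        System.out.println(idx.query("Car").size());               // 2: cross-cache
        System.out.println(idx.queryScoped("cars", "Car").size()); // 1: cache-scoped
    }
}
```

In the real implementation the cache-name restriction would presumably be a filter clause on the Lucene query rather than a post-filter, but the delegation pattern is the same.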
>> >> Begin forwarded message: >> >> > On Jan 30, 2014, at 9:42 AM, Galder Zamarre?o wrote: >> > >> >> >> >> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >> >> >> >>> >> >>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >> >>> >> >>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches right? Otherwise I am not fully understanding why they ask for a unified query. >> >>>> Do you have written detailed use cases somewhere for me to better understand what is really requested? >> >>> >> >>> IMO from a user perspective, being able to run queries spreading several caches makes the programming simplifies the programming model: each cache corresponding to a single entity type, with potentially different configuration. >> >> >> >> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. >> > >> > Not sure I follow: having a cache that contains both Cars and Persons sound more cluttering to me. I think it's cumbersome to write any kind of querying with an heterogenous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only it is harder to write, but discourages code reuse and makes it hard to maintain (if you'll add Pets in the same cache in future you need to update the M/R code as well). And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expiry etc): mixing everything together in the same cache from the begging is a design decision that might bite you in the future. >> > >> > The way I see it - and very curious to see your opinion on this - following an database analogy, the CacheManager corresponds to an Database and the Cache to a Table. 
Hence my thought that queries spreading multiple caches are both useful and needed (same as query spreading over multiple tables). >> > >> > >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From emmanuel at hibernate.org Wed Feb 5 11:30:32 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 5 Feb 2014 17:30:32 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> Message-ID: <20140205163032.GB93108@hibernate.org> On Wed 2014-02-05 15:53, Mircea Markus wrote: > > On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > > > Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > > Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)

//some unified query giving me entries pointing by fk copy to bar and
//buz objects. So I need to manually load these references.

//happy emmanuel
Cache unifiedCache = cacheManager.getMotherOfAllCaches();
Bar bar = unifiedCache.get(foo);
Buz buz = unifiedCache.get(baz);

//not so happy emmanuel
Cache fooCache = cacheManager.getCache("foo");
Bar bar = fooCache.get(foo);
Cache bazCache = cacheManager.getCache("baz");
Buz buz = bazCache.get(baz);

> > > > > I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. > > sad because of the increased index size? It makes the index non-natural and less reusable using direct Lucene APIs. But that might be less of a concern for Infinispan.
> > > I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. > > Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? From rvansa at redhat.com Wed Feb 5 11:44:59 2014 From: rvansa at redhat.com (Radim Vansa) Date: Wed, 05 Feb 2014 17:44:59 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205163032.GB93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: <52F26A8B.60306@redhat.com> On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: > On Wed 2014-02-05 15:53, Mircea Markus wrote: >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >> >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > //some unified query giving me entries pointing by fk copy to bar and > //buz objects. So I need to manually load these references. > > //happy emmanuel > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > Bar bar = unifiedCache.get(foo); > Buz buz = unifiedCache.get(baz); > > //not so happy emmanuel > Cache fooCache = cacheManager.getCache("foo"); > Bar bar = fooCache.get(foo); > Cache bazCache = cacheManager.getCache("baz"); > Buz buz = bazCache.get(baz);

cacheManager.getCache("foo").put("xxx", "yyy");
cacheManager.getCache("bar").put("xxx", "zzz");

String xxx = cacheManager.getMotherOfAllCaches().get("xxx");
System.out.println(xxx);

What should it print? Should an exception be thrown?
Or should get on mother of all caches return Map<Cache, String>? Radim > > >>> I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. >> sad because of the increased index size? > It makes the index non-natural and less reusable using direct Lucene > APIs. But that might be less of a concern for Infinispan. > >>> I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. >>> Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From mmarkus at redhat.com Wed Feb 5 11:59:41 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 16:59:41 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205163032.GB93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: <066E5871-8FAD-4F89-9705-767B1BC41037@redhat.com> On Feb 5, 2014, at 4:30 PM, Emmanuel Bernard wrote: > On Wed 2014-02-05 15:53, Mircea Markus wrote: >> >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >> >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >> >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > > //some unified query giving me entries pointing by fk copy to bar and > //buz objects. So I need to manually load these references.
> > //happy emmanuel > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > Bar bar = unifiedCache.get(foo); > Buz buz = unifiedCache.get(baz); Can you please elaborate the advantages the mother of all caches would bring? :-) But it feels to me like querying a whole database by a primary key without mentioning the table name :-) Also might get nasty if multiple caches have the same key. > //not so happy emmanuel > Cache fooCache = cacheManager.getCache("foo"); > Bar bar = fooCache.get(foo); > Cache bazCache = cacheManager.getCache("baz"); > Buz buz = bazCache.get(baz); Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From emmanuel at hibernate.org Wed Feb 5 14:34:45 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 5 Feb 2014 20:34:45 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <52F26A8B.60306@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> Message-ID: <20140205193445.GC93108@hibernate.org> On Wed 2014-02-05 17:44, Radim Vansa wrote: > On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: > > On Wed 2014-02-05 15:53, Mircea Markus wrote: > >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > >> > >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > > > //some unified query giving me entries pointing by fk copy to bar and > > //buz objects. So I need to manually load these references.
> > > > //happy emmanuel > > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > > Bar bar = unifiedCache.get(foo); > > Buz buz = unifiedCache.get(baz); > > > > //not so happy emmanuel > > Cache fooCache = cacheManager.getCache("foo"); > > Bar bar = fooCache.get(foo); > > Cache bazCache = cacheManager.getCache("baz"); > > Buz buz = bazCache.put(baz); > > cacheManager.getCache("foo").put("xxx", "yyy"); > cacheManager.getCache("foo").put("xxx", "zzz"); > > String xxx = cacheManager.getMotherOfAllCaches().get("xxx"); > System.out.println(xxx); > > What should it print? Should an exception be thrown? Or should get on > mother of all caches return Map, String>? > Yes I'm aware of that. What I am saying is that the idea of search across caches as appealing as it is is is not the whole story. People search, read, navigate and M/R their data in interleaved ways. You need to project and think about a 100-200 lines of code that would use that feature in combination with other related features to see if that will be useful in the end (or gimmicky) and if the user experience (API mostly in our case) will be good or make people kill themselves. The feeling I have is that we are too feature focused and not enough use case and experience focused. From rhauch at redhat.com Wed Feb 5 14:54:02 2014 From: rhauch at redhat.com (Randall Hauch) Date: Wed, 5 Feb 2014 13:54:02 -0600 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205193445.GC93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> <20140205193445.GC93108@hibernate.org> Message-ID: <9C4DBDF7-EB51-45F3-B9D2-815E9D215C9B@redhat.com> On Feb 5, 2014, at 1:34 PM, Emmanuel Bernard wrote: > What I am saying is that the idea of search across caches as > appealing as it is is is not the whole story. 
> > People search, read, navigate and M/R their data in interleaved ways. > You need to project and think about a 100-200 lines of code that would > use that feature in combination with other related features to see if > that will be useful in the end (or gimmicky) and if the user experience > (API mostly in our case) will be good or make people kill themselves. > What is the plan for supporting joins across entity types? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140205/ab7ccc23/attachment.html From mmarkus at redhat.com Wed Feb 5 16:40:57 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 5 Feb 2014 21:40:57 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205193445.GC93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> <20140205193445.GC93108@hibernate.org> Message-ID: On Feb 5, 2014, at 7:34 PM, Emmanuel Bernard wrote: > On Wed 2014-02-05 17:44, Radim Vansa wrote: >> On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: >>> On Wed 2014-02-05 15:53, Mircea Markus wrote: >>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >>>> >>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>> >>> //some unified query giving me entries pointing by fk copy to bar and >>> //buz objects. So I need to manually load these references. 
>>> >>> //happy emmanuel >>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>> Bar bar = unifiedCache.get(foo); >>> Buz buz = unifiedCache.get(baz); >>> >>> //not so happy emmanuel >>> Cache fooCache = cacheManager.getCache("foo"); >>> Bar bar = fooCache.get(foo); >>> Cache bazCache = cacheManager.getCache("baz"); >>> Buz buz = bazCache.put(baz); >> >> cacheManager.getCache("foo").put("xxx", "yyy"); >> cacheManager.getCache("foo").put("xxx", "zzz"); >> >> String xxx = cacheManager.getMotherOfAllCaches().get("xxx"); >> System.out.println(xxx); >> >> What should it print? Should an exception be thrown? Or should get on >> mother of all caches return Map, String>? >> > > Yes I'm aware of that. > What I am saying is that the idea of search across caches as > appealing as it is is is not the whole story. > > People search, read, navigate and M/R their data in interleaved ways. In all the non-trivial deployments I saw people used multiple caches for different data, instead of one. That's why for me this came as the straight forward way of structuring data and naturally I thought that querying multiple caches makes sense in this context: to allow querying to run over a model that is already in use and not to change the model to accommodate querying. > You need to project and think about a 100-200 lines of code that would > use that feature in combination with other related features to see if > that will be useful in the end (or gimmicky) and if the user experience > (API mostly in our case) will be good or make people kill themselves. > > The feeling I have is that we are too feature focused and not enough use > case and experience focused. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mgencur at redhat.com Thu Feb 6 04:52:37 2014 From: mgencur at redhat.com (Martin Gencur) Date: Thu, 06 Feb 2014 10:52:37 +0100 Subject: [infinispan-dev] infinispan-bom vs. 
infinispan-parent dependencies Message-ID: <52F35B65.1080106@redhat.com> Hi, there are currently two Maven pom files in Infinispan where dependency versions are defined - infinispan-bom and infinispan-parent. For instance, version.protostream is defined in the BOM while version.commons.pool is defined in infinispan-parent. This causes me troubles when I want to do filtering with maven-resources-plugin and substitute versions of dependencies in certain configuration file because properties defined in the BOM are not visible to other modules (I'm currently trying to generate "features" file for HotRod to be easily deployable into Karaf - https://issues.jboss.org/browse/ISPN-3967, and I can't really access versions of some dependencies) We include the BOM file in infinispan-parent as a dependency with scope "import" which causes the properties defined in the BOM to be lost. Questions: Is there a reason why we include it as a dependency and do not have it as a parent of infinispan-parent? (as suggested in [1]) Can someone explain the reason why we have version declarations in two separate files? If you possibly know how to access properties in the BOM, please advise. To me it seems impossible without some nasty hacks. Thanks, Martin [1] http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140206/9fc95081/attachment.html From ttarrant at redhat.com Thu Feb 6 07:19:46 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Thu, 06 Feb 2014 06:19:46 -0600 Subject: [infinispan-dev] infinispan-bom vs. infinispan-parent dependencies In-Reply-To: <52F35B65.1080106@redhat.com> References: <52F35B65.1080106@redhat.com> Message-ID: <52F37DE2.9030603@redhat.com> The idea is that the bom should have dependencies required by applications using Infinispan, whereas -parent includes build-time dependencies. 
Tristan On 02/06/2014 03:52 AM, Martin Gencur wrote: > Hi, > there are currently two Maven pom files in Infinispan where dependency > versions are defined - infinispan-bom and infinispan-parent. For > instance, version.protostream is defined in the BOM while > version.commons.pool is defined in infinispan-parent. > > This causes me troubles when I want to do filtering with > maven-resources-plugin and substitute versions of dependencies in > certain configuration file because properties defined in the BOM are > not visible to other modules (I'm currently trying to generate > "features" file for HotRod to be easily deployable into Karaf - > https://issues.jboss.org/browse/ISPN-3967, and I can't really access > versions of some dependencies) > > We include the BOM file in infinispan-parent as a dependency with > scope "import" which causes the properties defined in the BOM to be lost. > > Questions: > Is there a reason why we include it as a dependency and do not have it > as a parent of infinispan-parent? (as suggested in [1]) > Can someone explain the reason why we have version declarations in two > separate files? > If you possibly know how to access properties in the BOM, please > advise. To me it seems impossible without some nasty hacks. 
> > Thanks, > Martin > > > [1] > http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Thu Feb 6 04:27:45 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Thu, 6 Feb 2014 10:27:45 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <52F26A8B.60306@redhat.com> <20140205193445.GC93108@hibernate.org> Message-ID: <20140206092745.GA95590@hibernate.org> On Wed 2014-02-05 21:40, Mircea Markus wrote: > > On Feb 5, 2014, at 7:34 PM, Emmanuel Bernard wrote: > > > On Wed 2014-02-05 17:44, Radim Vansa wrote: > >> On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: > >>> On Wed 2014-02-05 15:53, Mircea Markus wrote: > >>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > >>>> > >>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > >>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > >>> > >>> //some unified query giving me entries pointing by fk copy to bar and > >>> //buz objects. So I need to manually load these references. 
> >>> > >>> //happy emmanuel > >>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > >>> Bar bar = unifiedCache.get(foo); > >>> Buz buz = unifiedCache.get(baz); > >>> > >>> //not so happy emmanuel > >>> Cache fooCache = cacheManager.getCache("foo"); > >>> Bar bar = fooCache.get(foo); > >>> Cache bazCache = cacheManager.getCache("baz"); > >>> Buz buz = bazCache.put(baz); > >> > >> cacheManager.getCache("foo").put("xxx", "yyy"); > >> cacheManager.getCache("foo").put("xxx", "zzz"); > >> > >> String xxx = cacheManager.getMotherOfAllCaches().get("xxx"); > >> System.out.println(xxx); > >> > >> What should it print? Should an exception be thrown? Or should get on > >> mother of all caches return Map, String>? > >> > > > > Yes I'm aware of that. > > What I am saying is that the idea of search across caches as > > appealing as it is is is not the whole story. > > > > People search, read, navigate and M/R their data in interleaved ways. > > In all the non-trivial deployments I saw people used multiple caches for different data, instead of one. That's why for me this came as the straight forward way of structuring data and naturally I thought that querying multiple caches makes sense in this context: to allow querying to run over a model that is already in use and not to change the model to accommodate querying. Maybe it is but what is the right way to address that? What does the API flow look like? Is that one app using 50 or 100 cache and juggling with them or rather 50 apps using the same shared grid and using 1 maybe 2 cache. Just to be clear, I think cross cache querying is something we need. I am just questioning how it will be used in a bigger context and how the over Infinispan API should look like to address the bigger context. BTW the example you saw, Is that one cache per atomic type or rather one cache per family of data. 
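Radim's key-collision objection, quoted several times in this thread, can be made concrete with a plain-Java sketch. Ordinary maps stand in for caches here; `getMotherOfAllCaches()` exists only as a proposal in this thread, and none of this is Infinispan API:

```java
import java.util.HashMap;
import java.util.Map;

public class MotherOfAllCaches {
    // Two independent named "caches" (plain maps standing in for Infinispan caches)
    static final Map<String, Map<String, String>> caches = new HashMap<>();

    static Map<String, String> getCache(String name) {
        return caches.computeIfAbsent(name, k -> new HashMap<>());
    }

    // A naive flat get across all caches: when two caches hold the same key,
    // the winner depends on iteration order -- Radim's ambiguity objection
    static String flatGet(String key) {
        String result = null;
        for (Map<String, String> c : caches.values()) {
            if (c.containsKey(key)) {
                result = c.get(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        getCache("foo").put("xxx", "yyy");
        getCache("bar").put("xxx", "zzz");
        // Either "yyy" or "zzz" -- a real unified API would have to throw,
        // qualify results by cache name, or return a multi-valued result
        System.out.println(flatGet("xxx"));
    }
}
```

This is exactly the design question Radim's "What should it print?" raises: a unified get is only well-defined once keys are qualified by cache, which is Mircea's "primary key without the table name" analogy.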
From ben.cotton at ALUMNI.RUTGERS.EDU Thu Feb 6 14:46:23 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Thu, 6 Feb 2014 11:46:23 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1389893871449-4028653.post@n3.nabble.com> References: <3BE9E09A-6651-45D9-B7F1-891C111F232C@redhat.com> <1389783264288-4028642.post@n3.nabble.com> <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> Message-ID: <1391715983011-4028794.post@n3.nabble.com> Hi everybody. We are getting started with our POC design/build of this post's ambition. Currently at an ISPN build-from-scratch newbie roadblock. I know I should be patient, but if any of you have time could one of you hook me up with the official "How 2 Fork/Clone/Extend/Build your own ISPN Master from GIT " wiki link? ROADBLOCK details here --> https://community.jboss.org/thread/236848 Thx, Ben -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028794.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From mmarkus at redhat.com Thu Feb 6 15:49:01 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 6 Feb 2014 20:49:01 +0000 Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1391715983011-4028794.post@n3.nabble.com> References: <3BE9E09A-6651-45D9-B7F1-891C111F232C@redhat.com> <1389783264288-4028642.post@n3.nabble.com> <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> Message-ID: <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> Replied on the forum ;) On Feb 6, 2014, at 7:46 PM, cotton-ben wrote: > > Hi everybody. 
> > We are getting started with our POC design/build of this post's ambition. > > Currently at an ISPN build-from-scratch newbie roadblock. I know I should > be patient, but if any of you have time could one of you hook me up with the > official "How 2 Fork/Clone/Extend/Build your own ISPN Master from GIT " wiki > link? > > ROADBLOCK details here --> https://community.jboss.org/thread/236848 > > Thx, > Ben > > > > > -- > View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028794.html > Sent from the Infinispan Developer List mailing list archive at Nabble.com. > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From galder at redhat.com Fri Feb 7 09:27:11 2014 From: galder at redhat.com (Galder Zamarreño) Date: Fri, 7 Feb 2014 15:27:11 +0100 Subject: [infinispan-dev] Wildfly's build/lib.xml behaves unexpectedly with JDK8 Message-ID: Hi, In JDK8, [1] causes issues, since the replace only happens the first time the character is found. We use this lib.xml in Infinispan as well [2]. I've worked around it by doing this instead: name = name.split(".").join("/"); This seems to work fine, but have not fully tested it.
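The behavioral difference Galder describes can be illustrated from the Java side. This is an aside, not part of the build script: plain `java.lang.String.replace` replaces every occurrence of a literal target, whereas JavaScript's `String.prototype.replace` with a string pattern replaces only the first match (the semantics the script picked up under JDK8), emulated below via `replaceFirst` with a quoted pattern:

```java
import java.util.regex.Pattern;

public class ReplaceSemantics {
    // java.lang.String.replace(CharSequence, CharSequence) replaces ALL
    // occurrences of a literal target
    static String javaReplace(String name) {
        return name.replace(".", "/");
    }

    // JavaScript's String.prototype.replace with a string pattern replaces
    // only the FIRST occurrence; emulated here with replaceFirst on a
    // quoted (i.e. literal, non-regex) pattern
    static String jsLikeReplace(String name) {
        return name.replaceFirst(Pattern.quote("."), "/");
    }

    public static void main(String[] args) {
        System.out.println(javaReplace("org.infinispan.core"));   // org/infinispan/core
        System.out.println(jsLikeReplace("org.infinispan.core")); // org/infinispan.core
    }
}
```

The `split(".").join("/")` workaround sidesteps the first-match problem because the JS `split` with a string separator splits on every occurrence.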
Cheers, [1] https://github.com/wildfly/wildfly/blob/master/build/lib.xml#L75 [2] https://issues.jboss.org/browse/ISPN-3974?focusedCommentId=12942643&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12942643 -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From mmarkus at redhat.com Fri Feb 7 14:44:37 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Feb 2014 19:44:37 +0000 Subject: [infinispan-dev] 7.0.0.Alpha1 Message-ID: Hey guys, I think we have enough stuff to cut a 7.0.0.Alpha1 next week. Besides quite some fixes that came in, we have: - Vladimir's parallel map reduce (ISPN-2284) - Tristan's autorisation for embedded mode (ISPN-3909) - Will's clustered listeners (ISPN-3355) Let's aim for Thu 20 Feb. Next in charge with releasing is Dan (release rotation is defined in the release doc now). Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vblagoje at redhat.com Fri Feb 7 14:54:42 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 07 Feb 2014 14:54:42 -0500 Subject: [infinispan-dev] 7.0.0.Alpha1 In-Reply-To: References: Message-ID: <52F53A02.5050602@redhat.com> Mircea, ISPN-2284 has not been integrated yet. Dan and Will gave me some really good feedback that resulted in additional fixes, further explicit testing of parallel execution. I think we have it ready for integration now. Regards, Vladimir On 2/7/2014, 2:44 PM, Mircea Markus wrote: > Hey guys, > > I think we have enough stuff to cut a 7.0.0.Alpha1 next week. Besides quite some fixes that came in, we have: > - Vladimir's parallel map reduce (ISPN-2284) > - Tristan's autorisation for embedded mode (ISPN-3909) > - Will's clustered listeners (ISPN-3355) > > Let's aim for Thu 20 Feb. Next in charge with releasing is Dan (release rotation is defined in the release doc now). 
> > Cheers, From mmarkus at redhat.com Fri Feb 7 15:18:56 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 7 Feb 2014 15:18:56 -0500 (EST) Subject: [infinispan-dev] 7.0.0.Alpha1 In-Reply-To: <52F53A02.5050602@redhat.com> References: <52F53A02.5050602@redhat.com> Message-ID: <971B535B-95A0-4DDC-9904-242D7DDD8F33@redhat.com> Yep, none of the above has, the plan is to get them in, though. > On 7 Feb 2014, at 19:54, Vladimir Blagojevic wrote: > > Mircea, > > ISPN-2284 has not been integrated yet. Dan and Will gave me some really > good feedback that resulted in additional fixes, further explicit > testing of parallel execution. I think we have it ready for integration now. > > Regards, > Vladimir > > >> On 2/7/2014, 2:44 PM, Mircea Markus wrote: >> Hey guys, >> >> I think we have enough stuff to cut a 7.0.0.Alpha1 next week. Besides quite some fixes that came in, we have: >> - Vladimir's parallel map reduce (ISPN-2284) >> - Tristan's autorisation for embedded mode (ISPN-3909) >> - Will's clustered listeners (ISPN-3355) >> >> Let's aim for Thu 20 Feb. Next in charge with releasing is Dan (release rotation is defined in the release doc now).
>> >> Cheers, > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ben.cotton at ALUMNI.RUTGERS.EDU Sun Feb 9 19:42:03 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Sun, 9 Feb 2014 16:42:03 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> References: <1389783264288-4028642.post@n3.nabble.com> <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> Message-ID: <1391992923651-4028800.post@n3.nabble.com> FYI, we've got all the "can we build this from w/in JPM.com?" plumbing concerns 100% resolved. So now it is "Heap No! Heap No! It's off to work we go ...." https://github.com/Cotton-Ben/infinispan Will share musings/fears/roadblocks/triumphs/etc here and at https://issues.jboss.org/browse/ISPN-871 Thx, Ben -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028800.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From galder at redhat.com Mon Feb 10 03:34:43 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 09:34:43 +0100 Subject: [infinispan-dev] Wildfly's build/lib.xml behaves unexpectedly with JDK8 In-Reply-To: References: Message-ID: <2771A788-3C19-4E11-9668-AE79469EEC36@redhat.com> Actually, split/join does not work with JDK7.
The following code seems to work with both: name = name.split(".").join("/"); if (name) { self.log("Use JDK8 method to build module names"); } else { name = attributes.get("name"); name = name.replace(".", "/"); self.log("Use JDK7 method to build module names"); } Cheers, On 07 Feb 2014, at 15:27, Galder Zamarreño wrote: > Hi, > > In JDK8, [1] causes issues, since the replace only happens the first time the character is found. > > We use this lib.xml in Infinispan as well [2]. I've worked around it by doing this instead: > > name = name.split(".").join("/"); > > This seems to work fine, but have not fully tested it. > > Cheers, > > [1] https://github.com/wildfly/wildfly/blob/master/build/lib.xml#L75 > [2] https://issues.jboss.org/browse/ISPN-3974?focusedCommentId=12942643&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12942643 > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 10 03:51:59 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 09:51:59 +0100 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: <52DD0961.90600@infinispan.org> References: <52DD0961.90600@infinispan.org> Message-ID: <04AA54F3-1FA1-4BFF-9C0B-D8263E6FBBA2@redhat.com> On 20 Jan 2014, at 12:32, Pedro Ruivo wrote: > Hi, > > On 01/20/2014 11:28 AM, Galder Zamarreño wrote: >> Hi all, >> >> Dropping AtomicMap and FineGrainedAtomicMap was discussed last week in the F2F meeting [1].
It's complex and buggy, and we'd recommend people to use the Grouping API instead [2]. Grouping API would allow data to reside together, while the standard map API would apply per-key locking. > > +1. are we going to drop the Delta stuff? The delta would be the k/v pair. Say you are storing HTTP sessions. With AMs, the key would be the session ID and all its attributes would be stored in the atomic map. Once you remove that, each of the session's attributes is a single k/v pair in the cache, so that's your delta. >> >> We don't have a timeline for this yet, but we want to get as much feedback on the topic as possible so that we can evaluate the options. > > before starting with it, I would recommend to add the following method > to cache API: > > /** > * returns all the keys and values associated with the group name. The > Map is immutable (i.e. read-only) > **/ > Map getGroup(String groupName); Yes, I think we'd need this for grouping to be able to have a full replacement for atomic maps. You need a way to retrieve all the data associated with that group without the need to iterate the cache yourself, or the need to add indexing. In the case of HTTP sessions, you'd give the session ID as key and it'd give you a map view of all the attributes associated with that session.
Cheers, > > Cheers, > Pedro > >> >> Cheers, >> >> [1] https://issues.jboss.org/browse/ISPN-3901 >> [2] http://infinispan.org/docs/6.0.x/user_guide/user_guide.html#_the_grouping_api >> -- >> Galder Zamarreño >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 10 03:58:26 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 09:58:26 +0100 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: <52DEDBCF.7030204@redhat.com> References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> Message-ID: The deltas will remain, but they're each a key/value pair. Example: say you want to store a dehydrated list of three elements ("one", "two", "three") in Infinispan. Before you'd do (approx): key=my-list value=AtomicMap(k=1,v="one", k=2,v="two", k3="v3") Internally, we'd track deltas and only send those changes. What I propose we do is: key=1 (group="my-list") value="one" key=2 (group="my-list") value="two" key=3 (group="my-list") value="three" The deltas are still there. Each changed key is sent separately, when it changes. This is not the final product of course. As agreed with Pedro, we'd need a way to have a view map for all key/value pairs associated with a given group, i.e. cache.getGroups("my-list").
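Galder's per-key delta argument can be sketched in plain Java. This is a toy dirty-key tracker, not Infinispan code: with one cache entry per list element, only the touched key is the unit of change, so only that key needs to travel on replication.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class PerKeyDelta {
    final Map<String, String> store = new HashMap<>();   // key -> value
    final Map<String, String> groupOf = new HashMap<>(); // key -> group name
    final Set<String> dirty = new LinkedHashSet<>();     // keys changed since last sync

    void put(String group, String key, String value) {
        store.put(key, value);
        groupOf.put(key, group);
        dirty.add(key); // the delta is exactly this one k/v pair
    }

    // What would be shipped to other nodes: only the touched keys
    Set<String> drainDirty() {
        Set<String> delta = new LinkedHashSet<>(dirty);
        dirty.clear();
        return delta;
    }

    public static void main(String[] args) {
        PerKeyDelta cache = new PerKeyDelta();
        cache.put("my-list", "1", "one");
        cache.put("my-list", "2", "two");
        cache.put("my-list", "3", "three");
        cache.drainDirty(); // pretend the initial state was already replicated

        cache.put("my-list", "2", "TWO"); // touch a single element
        System.out.println(cache.drainDirty()); // [2] -- only key 2 travels
    }
}
```

Under the AtomicMap layout the whole map is one value, so the same guarantee requires explicit delta tracking inside the value, which is the machinery being questioned in this thread.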
I know Sanne et al also need a way to have coarse grained locking on the entire group sometimes, as well as fine grained locking, so we'd need to find a way to accommodate that. Cheers, On 21 Jan 2014, at 21:42, Vladimir Blagojevic wrote: > I agree with Erik here. Deltas are used in M/R and I've never detected > any problems so far. > On 1/21/2014, 1:39 PM, Erik Salter wrote: >> Please don't remove the Delta stuff. That's quite useful, especially for >> large collections. >> >> Erik >> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 10 04:02:58 2014 From: galder at redhat.com (Galder Zamarreño) Date: Mon, 10 Feb 2014 10:02:58 +0100 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> <1EB0E9C8-AFD2-4172-874F-25BC2B12C6C4@redhat.com> Message-ID: <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> On 27 Jan 2014, at 11:27, Dan Berindei wrote: > I think it's way too early to discuss removing FineGrainedAtomicMap and AtomicMap, as long as we don't have a concrete alternative with similar properties. You have a point there, but we can't ignore the feedback that says that atomic maps are not being used because they are buggy, and instead they are using grouping. Deeply, I think we have two ways of doing the same thing, which is confusing from my POV, and one of them is not being used enough, or we're not fixing the stuff there.
Regardless of whether it's too early or not, this email is trying to spark a consolidation of the two technologies into a single solution that works for everyone and that we maintain actively :) > Cache.getGroup(groupName) is just a method name at this point, we don't have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a transaction isolation or performance perspective. BTW, do we really need the group name to be a String? > > A good way to prove that the grouping API is a proper replacement for the atomic maps would be to replace the usage of atomic maps in the Tree module with the grouping API. Unless we plan to drop the Tree module completely? Tree was only ever meant as a bridge for JBC users to move to Infinispan. Paul F et al tried to build HTTP sessions on top of that, it didn't work. Then they tried to do it on top of Atomic Maps, and it didn't work either, and finally they're using grouping and seems to work... Cheers, > > Cheers > Dan > > > > On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus wrote: > > On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic wrote: > > > I agree with Erik here. Deltas are used in M/R and I've never detected > > any problems so far. > > On 1/21/2014, 1:39 PM, Erik Salter wrote: > >> Please don't remove the Delta stuff. That's quite useful, especially for > >> large collections. > > +1 to keep DeltaAware.
Thanks for the feedback > > >> > >> Erik > >> > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From sanne at infinispan.org Mon Feb 10 04:57:51 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 10 Feb 2014 09:57:51 +0000 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> <1EB0E9C8-AFD2-4172-874F-25BC2B12C6C4@redhat.com> <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> Message-ID: On 10 February 2014 09:02, Galder Zamarreño wrote: > > On 27 Jan 2014, at 11:27, Dan Berindei wrote: > >> I think it's way too early to discuss removing FineGrainedAtomicMap and AtomicMap, as long as we don't have a concrete alternative with similar properties. > > You have a point there, but we can't ignore the feedback that says that atomic maps are not being used because they are buggy, and instead they are using grouping. Let's not generalize too much, some are still doing the opposite and have commented on their good reasons ;-) > > Deeply, I think we have two ways of doing the same thing, which is confusing from my POV, and one of them is not being used enough, or we're not fixing the stuff there.
+1 but since as you say there is confusion, I'm not sure if they really are the same thing. I've asked for a detailed comparison but the discussion derailed. It would probably help a lot if someone from the Infinispan core team would reimplement the FGAM API on top of Grouping, making sure to guarantee the same semantics also in terms of concurrency, isolation and acidity. That would provide the implementation cleanup you'd all love, a migration path, and probably some deeper considerations on their differences; I also suspect there would be some roadblocks, potentially subtle differences which could then be better documented... > > Regardless of whether it's too early or not, this email is trying to spark a consolidation of the two technologies into a single solution that works for everyone and that we maintain actively :) > >> Cache.getGroup(groupName) is just a method name at this point, we don't have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a transaction isolation or performance perspective. BTW, do we really need the group name to be a String? >> >> A good way to prove that the grouping API is a proper replacement for the atomic maps would be to replace the usage of atomic maps in the Tree module with the grouping API. Unless we plan to drop the Tree module completely? > > Tree was only ever meant as a bridge for JBC users to move to Infinispan. Paul F et al tried to build HTTP sessions on top of that, it didn't work. Then they tried to do it on top of Atomic Maps, and it didn't work either, and finally they're using grouping and seems to work... I don't think that proves that Atomic Maps were not working, if anything it's a statement that grouping is a better fit for this specific use case... BTW having a use case which matches way better than the other just highlights that this is no duplicate functionality, but rather quite different stuff.
From a Hibernate OGM perspective it would be great to have some more stability in not-so-old APIs, at least until there's a clearly documented migration to grouping. Sanne > > Cheers, > >> >> Cheers >> Dan >> >> >> >> On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus wrote: >> >> On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic wrote: >> >> > I agree with Erik here. Deltas are used in M/R and I've never detected >> > any problems so far. >> > On 1/21/2014, 1:39 PM, Erik Salter wrote: >> >> Please don't remove the Delta stuff. That's quite useful, especially for >> >> large collections. >> >> +1 to keep DeltaAware. Thanks for the feedback >> >> >> >> >> Erik >> >> >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From tomaz.cerar at gmail.com Mon Feb 10 04:59:27 2014 From: tomaz.cerar at gmail.com (=?UTF-8?B?VG9tYcW+IENlcmFy?=) Date: Mon, 10 Feb 2014 10:59:27 +0100 Subject: [infinispan-dev] [wildfly-dev] Wildfly's build/lib.xml behaves unexpectedly with JDK8 In-Reply-To: <2771A788-3C19-4E11-9668-AE79469EEC36@redhat.com> References:
<2771A788-3C19-4E11-9668-AE79469EEC36@redhat.com> Message-ID: Can you send PR with a fix? On Mon, Feb 10, 2014 at 9:34 AM, Galder Zamarre?o wrote: > Actually, split/join does not work with JDK7. The following code seems to > work with both: > > name = name.split(".").join("/"); > if (name) { > self.log("Use JDK8 method to build module names"); > } else { > name = attributes.get("name"); > name = name.replace(".", "/"); > self.log("Use JDK7 method to build module names"); > } > > Cheers, > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140210/f9bdbbe7/attachment-0001.html From mmarkus at redhat.com Mon Feb 10 06:12:06 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 10 Feb 2014 11:12:06 +0000 Subject: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap In-Reply-To: References: <52DD0961.90600@infinispan.org> <52DEDBCF.7030204@redhat.com> <1EB0E9C8-AFD2-4172-874F-25BC2B12C6C4@redhat.com> <14F07DD7-8432-4E3E-B758-8F8AE5956FE8@redhat.com> Message-ID: <4D0AD0A0-44E8-41DF-8201-AB75DA1D2BAE@redhat.com> Dropping the FGAM API is just an idea and there were valid concerns for not doing it immediately. Indeed the reason it was considered for removal is in order not to keep around two APIs that do the same thing. That is if really do the same thing. As a first step would be good to enhance the grouping API[1] to support group handling methods. Then see if grouping works for the (FG)AM users: if it does, we can drop FGAM. If it doesn't we'll fix FGAM. [1] https://issues.jboss.org/browse/ISPN-3981 On Feb 10, 2014, at 9:57 AM, Sanne Grinovero wrote: > On 10 February 2014 09:02, Galder Zamarre?o wrote: >> >> On 27 Jan 2014, at 11:27, Dan Berindei wrote: >> >>> I think it's way too early to discuss removing FineGrainedAtomicMap and AtomicMap, as long as we don't have a concrete alternative with similar properties. 
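On the JDK7/JDK8 build-script difference quoted earlier in the thread: the split/join idiom presumably fails under JDK7's Rhino because the script ends up calling java.lang.String.split, which treats its argument as a regular expression, so a literal "." matches every character. A minimal pure-Java illustration of that pitfall (the dotted name is just an example):

```java
public class SplitPitfall {
    // String.split takes a REGEX: "." matches every character, so every
    // piece between matches is an empty string; trailing empties are
    // trimmed, leaving an empty array.
    static int parts(String name) { return name.split(".").length; }

    // Escaping the dot -- or sidestepping regexes with replace, as the
    // JDK7 branch of the quoted script does -- gives the intended result.
    static String toPath(String name) { return name.replace('.', '/'); }

    public static void main(String[] args) {
        System.out.println(parts("org.infinispan.core"));   // 0
        System.out.println(toPath("org.infinispan.core"));  // org/infinispan/core
    }
}
```

Using `split("\\.")` would work on the regex path too; `replace` is simply the least surprising fix.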
>> >> You have a point there, but we can?t ignore the feedback that says that atomic maps are not being used because they are buggy, and instead they are using grouping. > > Let's not generalize too much, some are still doing the opposite and > have commented on their good reasons ;-) > >> >> Deeply, I think we have two ways of doing the same thing, which is confusing from my POV, and one of them is not being used enough, or we?re not fixing the stuff there. > > +1 but since as you say there is confusion, I'm not sure if they > really are the same thing. I've asked for a detailed comparison but > the discussion derailed. It would probably help a lot if someone from > the Infinispan core team would reimplement the FGAM API on top of > Grouping, making sure to guarantee the same semantics also in terms of > concurrency, isolation and acidity. > That would provide the implementation cleanup you'd all love, a > migration path, and probably some deeper considerations on their > differences; I also suspect there would be some roadblocks, > potentially subtle differences which could then be better documented? > >> >> Regardless of whether it?s too early or not, this email is trying to spark a consolidation of the two technologies into a single solution that works for everyone and we maintained it actively :) >> >>> Cache.getGroup(groupName) is just a method name at this point, we don't have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a transaction isolation or performance perspective. BTW, do we really need the group name to be a String? >>> >>> A good way to prove that the grouping API is a proper replacement for the atomic maps would be to replace the usage of atomic maps in the Tree module with the grouping API. Unless we plan to drop the Tree module completely? >> >> Tree was only ever meant as a bridge for JBC users to move to Infinispan. Paul F et al tried to build HTTP sessions on top of that, it didn?t work. 
Then they tried to do it on top of Atomic Maps, and it didn?t work either, and finally they?re using grouping and seems to work? > > I don't think that proves that Atomic Maps where not working, if any > it's a statement that grouping is a better fit for this specific use > case? > BTW having a use case which matches way better that the other just > highlights that this is no duplicate functionality, but rather quite > different stuff. > > From an Hibernate OGM perspective it would be great to have some more > stability in not so old APIs, at least until there's a clearly > documented migration to grouping. > > Sanne > >> >> Cheers, >> >>> >>> Cheers >>> Dan >>> >>> >>> >>> On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus wrote: >>> >>> On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic wrote: >>> >>>> I agree with Erik here. Deltas are used in M/R and I've never detected >>>> any problems so far. >>>> On 1/21/2014, 1:39 PM, Erik Salter wrote: >>>>> Please don't remove the Delta stuff. That's quite useful, especially for >>>>> large collections. >>> >>> +1 to keep DeltaAware. 
Thanks for the feedbak >>> >>>>> >>>>> Erik >>>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ttarrant at redhat.com Mon Feb 10 08:24:16 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 10 Feb 2014 14:24:16 +0100 Subject: [infinispan-dev] UI-Portlet-Plugins In-Reply-To: <2D562768-0730-44F8-B937-23DED5E26557@redhat.com> References: <2D562768-0730-44F8-B937-23DED5E26557@redhat.com> Message-ID: <52F8D300.6030401@redhat.com> Hi Heiko, adding infinispan-dev. Thanks for taking the time to investigate this. One of the things that would need to be "exposed" to such portlets is the ability to link to RHQ views/portlets (e.g. go to a specific service view) so that "drilling-down" would show the appropriate detailed node. 
Additionally we would like to provide RHQ-specific configuration when installing our "server" plugin, such as cache/containers dynagroups, maybe even a custom initial dashboard. Can it be done ? Tristan On 02/08/2014 07:43 PM, Heiko W.Rupp wrote: > Hey, > > after talking with Tristan Tarrant from Infinispan I got the idea, that we could create a generic Portlet, that > gets its content data as HTML from a server plugin. The server plugin then has access to all the server logic > to do its task and can e.g. compute various stats of an Infinispan cluster. > > The following drawing illustrates that idea: > > > > > Instances of the portlet will call to the selected server plugin and invoking a well known "interface" like "getMessage". > This message will then do the processing and return a HTML-snippet (not a full page), which is then displayed > inside the portlet window. > > Attached are two screen shots from such a portlet + some PoC code. > > > > > This is created in the backend via (abbreviated) > > complexResults.put(new PropertySimple("results", "

Hello World
Welcome to RHQ
Have FUN
Current date: " + date)); > > This is the "generic" config screen: > > > > > The drop down shows the list of plugins available. > > In this PoC, the plugin writer is responsible for creating sane HTML, > if we decided to put that into RHQ, we may want to do some additional > sanitation. I also have no idea about styling the inner content. > > While this is probably not the way for the (long term) future, at least > the backend plugins can be re-used if we move to an Angular-based UI, > so this investment would not be lost. > > Heiko > > > > > From ttarrant at redhat.com Mon Feb 10 11:54:02 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Mon, 10 Feb 2014 17:54:02 +0100 Subject: [infinispan-dev] Remote Query improvements Message-ID: <52F9042A.3010205@redhat.com> Hi everybody, last week I developed a simple application using Remote Query, and ran into a few issues. Some of them are just technical hurdles, while others have to do with the complexity of the developer experience. Here they are for open discussion: - the schemas registry should be persistent. Alternatively being able to either specify the ProtoBuf schema from the configuration in the server subsystem or use server's deployment processor to "deploy" schemas. - the server should store the single protobuf source schemas to allow for easy inspection/update of each using our management tools. The server itself should then compile the protobuf schemas into the binary representation when any of the source schemas changes. This would require a Java implementation of the ProtoBuf schema compiler, which wouldn't probably be too hard to do with Antlr. 
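As a rough illustration of how little machinery a first cut of such a schema scanner needs (a real compiler should be grammar-driven, e.g. with Antlr, and handle nested messages, options, imports, etc.), here is a toy regex-based extractor; the `/* @Indexed */` marker is a hypothetical doclet-style comment annotation, not an existing ProtoBuf or Infinispan convention:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch only: scans proto2-style field declarations and reports which
// ones carry a hypothetical "/* @Indexed */" doclet-style comment.
public class ProtoFieldScanner {

    static final Pattern FIELD = Pattern.compile(
        "(/\\*\\s*@Indexed\\s*\\*/\\s*)?(?:required|optional|repeated)\\s+\\w+\\s+(\\w+)\\s*=");

    // Returns the names of fields marked for indexing.
    static List<String> indexedFields(String protoSource) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(protoSource);
        while (m.find()) {
            if (m.group(1) != null) {   // the @Indexed comment was present
                out.add(m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String schema =
            "message User {\n" +
            "  /* @Indexed */ required string name = 1;\n" +
            "  optional string bio = 2;\n" +
            "}\n";
        System.out.println(indexedFields(schema)); // [name]
    }
}
```

Keeping the marker inside a comment has the nice property that the schema stays valid for any stock protoc, which matches the doclet analogy in the list above.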
- we need to be able to annotate single protobuf fields for indexing (probably by using specially-formatted comments, a la doclets) to avoid indexing all of the fields - since remote query is already imbued with JPA in some form, an interesting project would be to implement a JPA annotation processor which can produce a set of ProtoBuf schemas from JPA-annotated classes. - on top of the above, a ProtoBuf marshaller/unmarshaller which can use the JPA entities directly. Tristan From mmarkus at redhat.com Mon Feb 10 12:34:29 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 10 Feb 2014 17:34:29 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F9042A.3010205@redhat.com> References: <52F9042A.3010205@redhat.com> Message-ID: <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: > - since remote query is already imbued with JPA in some form, an > interesting project would be to implement a JPA annotation processor > which can produce a set of ProtoBuf schemas from JPA-annotated classes. > - on top of the above, a ProtoBuf marshaller/unmarshaller which can use > the JPA entities directly. I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From emmanuel at hibernate.org Mon Feb 10 13:14:18 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 10 Feb 2014 18:14:18 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F9042A.3010205@redhat.com> References: <52F9042A.3010205@redhat.com> Message-ID: <20140210181418.GC84404@hibernate.org> On Mon 2014-02-10 17:54, Tristan Tarrant wrote: > Hi everybody, > > last week I developed a simple application using Remote Query, and ran > into a few issues. 
Some of them are just technical hurdles, while others > have to do with the complexity of the developer experience. Here they > are for open discussion: > > - the schemas registry should be persistent. Alternatively being able to > either specify the ProtoBuf schema from the configuration > in the server subsystem or use server's deployment processor to "deploy" > schemas. > - the server should store the single protobuf source schemas to allow > for easy inspection/update of each using our management tools. The > server itself should then compile the protobuf schemas into the binary > representation when any of the source schemas changes. This would > require a Java implementation of the ProtoBuf schema compiler, which > wouldn't probably be too hard to do with Antlr. > - we need to be able to annotate single protobuf fields for indexing > (probably by using specially-formatted comments, a la doclets) to avoid > indexing all of the fields > - since remote query is already imbued with JPA in some form, an > interesting project would be to implement a JPA annotation processor > which can produce a set of ProtoBuf schemas from JPA-annotated classes. > - on top of the above, a ProtoBuf marshaller/unmarshaller which can use > the JPA entities directly. I already argued in the last few weeks in the same vein but to me reusing JPA's metadata or API and support 15% of it is going to be misleading and confusing for the user. Plus it's Java only. 
But I agree that by making things use a hand written hard schema we make things suck equally for all client platforms :) > > Tristan > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From anistor at redhat.com Mon Feb 10 13:43:38 2014 From: anistor at redhat.com (Adrian Nistor) Date: Mon, 10 Feb 2014 20:43:38 +0200 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> Message-ID: <52F91DDA.4070800@redhat.com> The idea of auto-generating protobuf schemas based on the marshaller code was briefly mentioned last time we met in Palma. I would not qualify it as impossible to implement, but it would certainly be hacky and leads to more trouble than it's worth. A lot of info is missing from the marshaller code (API calls) precisely because it is not normally needed, being provided by the schema already. Now trying to go backwards means we'll have to 'invent' that metadata using some common sense (examples: which field is required vs optional, which field is indexable, indexing options, etc). Too many options. I bet the notion of 'common sense' would quickly need to be configured somehow, for uncommon use cases :). But that's why we have protobuf schemas for. Plus, to run a marshaller for inferring the schema you'll first need a prototypical instance of your entity. Where from? So no, -1, now I have serious concerns about this, even though I initially nodded in approval. And that would work only for Java anyway, because the marshaller and the schema-infering-process needs to run on the server side. 
On 02/10/2014 07:34 PM, Mircea Markus wrote: > On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: > >> - since remote query is already imbued with JPA in some form, an >> interesting project would be to implement a JPA annotation processor >> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >> the JPA entities directly. > I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. > > Cheers, From anistor at redhat.com Mon Feb 10 13:49:54 2014 From: anistor at redhat.com (Adrian Nistor) Date: Mon, 10 Feb 2014 20:49:54 +0200 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F9042A.3010205@redhat.com> References: <52F9042A.3010205@redhat.com> Message-ID: <52F91F52.1010805@redhat.com> Most of this is in jira already, so it would be good to comment there. #1 = ISPN-3747 & ISPN-3926 #2 = ISPN-3480 (wording is not the same, but it's the same issue) #3 = ISPN-3718 #4 = ???? On 02/10/2014 06:54 PM, Tristan Tarrant wrote: > Hi everybody, > > last week I developed a simple application using Remote Query, and ran > into a few issues. Some of them are just technical hurdles, while others > have to do with the complexity of the developer experience. Here they > are for open discussion: > > - the schemas registry should be persistent. Alternatively being able to > either specify the ProtoBuf schema from the configuration > in the server subsystem or use server's deployment processor to "deploy" > schemas. > - the server should store the single protobuf source schemas to allow > for easy inspection/update of each using our management tools. The > server itself should then compile the protobuf schemas into the binary > representation when any of the source schemas changes. 
This would > require a Java implementation of the ProtoBuf schema compiler, which > wouldn't probably be too hard to do with Antlr. > - we need to be able to annotate single protobuf fields for indexing > (probably by using specially-formatted comments, a la doclets) to avoid > indexing all of the fields > - since remote query is already imbued with JPA in some form, an > interesting project would be to implement a JPA annotation processor > which can produce a set of ProtoBuf schemas from JPA-annotated classes. > - on top of the above, a ProtoBuf marshaller/unmarshaller which can use > the JPA entities directly. > > Tristan > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ttarrant at redhat.com Tue Feb 11 02:55:30 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Tue, 11 Feb 2014 08:55:30 +0100 Subject: [infinispan-dev] HotRod near caches Message-ID: <52F9D772.8080006@redhat.com> Hi people, this is a bit of a dump of ideas for getting our HotRod client in shape for supporting near caches: - RemoteCaches should have an optional internal cache. This cache should probably be some form of bounded expiration-aware hashmap which would serve as a local copy of data retrieved over the wire. In the past we have advocated the use of combining an EmbeddedCacheManager with a RemoteCacheStore to achieve this, but this is only applicable to Java clients, while we need to think of a solution for our other clients too. - Once remote listeners are in place, a RemoteCache would automatically invalidate entries in the near-cache. - Remote Query should "pass-through" the near-cache, so that entries retrieved from a query would essentially be cached locally following the same semantics. 
This can be achieved by having the QUERY verb return just the set of matching keys instead of the whole entries - Optionally we can even think about a query cache which would hash the query DSL and store the resulting keys locally so that successive invocations of a cached query wouldn't go through the wire. Matching this with invalidation is probably a tad more complex, and I'd probably avoid going down that path. Tristan From mmarkus at redhat.com Tue Feb 11 04:18:50 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 11 Feb 2014 09:18:50 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: <52F91DDA.4070800@redhat.com> References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> <52F91DDA.4070800@redhat.com> Message-ID: I guess I put the solution before the problem, but basically where I want to get to is to allow people to write protostream marshallers without requiring them to write the proto file. This would mean the same effort for Java users to write either JBMAR marshallers or proto marshallers. If that's possible, and protostream is as fast as JBMAR (do you have any perf numbers on that BTW?), then we can suggest people use proto marshallers by default. On Feb 10, 2014, at 6:43 PM, Adrian Nistor wrote: > The idea of auto-generating protobuf schemas based on the marshaller > code was briefly mentioned last time we met in Palma. I would not > qualify it as impossible to implement, but it would certainly be hacky > and leads to more trouble than it's worth. > > A lot of info is missing from the marshaller code (API calls) precisely > because it is not normally needed, being provided by the schema already. > Now trying to go backwards means we'll have to 'invent' that metadata > using some common sense (examples: which field is required vs optional, > which field is indexable, indexing options, etc). Too many options.
I > bet the notion of 'common sense' would quickly need to be configured > somehow, for uncommon use cases :). But that's why we have protobuf > schemas for. Plus, to run a marshaller for inferring the schema you'll > first need a prototypical instance of your entity. Where from? So no, > -1, now I have serious concerns about this, even though I initially > nodded in approval. > > And that would work only for Java anyway, because the marshaller and the > schema-infering-process needs to run on the server side. > > > On 02/10/2014 07:34 PM, Mircea Markus wrote: >> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: >> >>> - since remote query is already imbued with JPA in some form, an >>> interesting project would be to implement a JPA annotation processor >>> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >>> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >>> the JPA entities directly. >> I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. >> >> Cheers, > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Tue Feb 11 08:53:42 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 11 Feb 2014 13:53:42 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> <52F91DDA.4070800@redhat.com> Message-ID: In my experience people express a bit of confusion on their first impact with our query/indexing technology as there is a strong conceptual difference compared to the more familiar relational databases. 
The primary WTF effect is usually on the fact that when a field included in a query is not indexed the query is not just "slower" but it won't work at all; we have plans to compensate for that in the scope of the simplified DSL (and remote queries) to fall-back to an ad-hoc crafted map/reduce task which essentially implements a table scan, but I'm thinking now that we should take it a step beyond and do better. Another source of trouble is that the fields need not just be indexed, but also need to be indexed with the correct attributes depending on the kind of query you mean to run: essentially this leads in practice for people to need to have a very clear idea of which queries they will be running.. and over the lifecycle of a complex application this might become a complex to maintain, especially if you want to keep peak performance you need to regularly cleanup indexing flags which are related to queries no longer in use. Nowadays we do some kind of validation of the queries to catch situations in which these can't possibly match the metadata we have about indexed fields, but this validation needs to be quite permissive to not prevent rare and unusual advanced queries which are technically valid, although potential candidates for a strong misunderstanding. This all leads to a single clean solution: if we start from a declarative set of query definitions, in which each query has the specific extra metadata needed about their runtime execution (e.g. using a specific Analyzer on a specific field, query time boosting, hints about good candidates for filters), then we can actually get rid of the need to define the indexing attributes at the schema level. 
It would still be useful to maintain the current explicit control of the indexing process: for example you might be building an index which is consumed by a different application, or you simply know about an advanced data mining feature that you're building on a custom Query/Filtering/Collector which bypasses our helpful but constraining query definition strategy. Following this proposal, we wouldn't need to bother with extending the document metadata with indexing annotations (annotations as a non-Java term) but we'd need to focus on a way to pre-declare all queries users intend to use. I admit that this might sound limiting, but consider: - serialization of queries and all their potential advanced options (not many in the remote case so far) needs to be done anyway, and needs to be language agnostic anyway. - we'd be able to better validate complex query structures - when a user registers/unregisters "query definitions" from the server we have a better opportunity to: -- cache parsing -- cache execution plans -- track metrics to improve on the execution plans -- adapt the indexes automatically (immediatelly or warn that it needs to be done before the query is runnable) -- I suspect it would be easier to match queries with security ACLs, both in terms of execution permission but also in terms of scoping on a subset of the visible data (essentially I'm thinking that the execution plans could be more advanced and prepare/hint about filter caching and even adapt the indexing structure to better match the security constraints). # Essentially We need to expose a standard, cross language and declarative form of the queries the user intends to run from remote, and provide a way to register these queries on the server, where registration/deregistration triggers certain actions. This would not be mandatory as you can still ask for ad-hoc queries, but these will only take advantage of indexes which happen to exist because of some registered query, or of no index at all. 
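A minimal sketch of the registration idea above, showing where the parse-once/cache-the-plan opportunity sits; all names here are hypothetical and parsing/validation is reduced to a placeholder:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a server-side query registry: registration is the moment where
// the server could parse once, validate against index metadata, cache the
// execution plan, and adapt indexes -- here reduced to caching a stub.
public class QueryRegistry {

    // Stand-in for a parsed/validated execution plan.
    static final class ParsedQuery {
        final String hql;
        ParsedQuery(String hql) { this.hql = hql; }   // real parsing omitted
    }

    private final Map<String, ParsedQuery> registered = new ConcurrentHashMap<>();

    // Parsing happens once; repeated registrations reuse the cached plan.
    public ParsedQuery register(String name, String hql) {
        return registered.computeIfAbsent(name, n -> new ParsedQuery(hql));
    }

    // Execution looks up the cached plan; an unregistered ad-hoc query
    // would fall back to the slower, index-less path (not modeled here).
    public ParsedQuery lookup(String name) { return registered.get(name); }

    // Deregistration is the hook for dropping now-unneeded index flags.
    public void unregister(String name) { registered.remove(name); }
}
```

The interesting work all hangs off `register`/`unregister`: metrics, ACL checks, and automatic index (re)configuration would be triggered there rather than on every execution.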
I'm proposing for the format to be - initially to support only the simple functions exposed by the remote DSL - a simple query String, essentially the HQL we already use but obviously limited to the base constraints we need. This language will probably evolve in future *if* we ever want to expose also fulltext over this.. For the embedded query world - less of a priority - we could start experimenting with more richer and typesafe query definitions, to also provide the benefit listed above. -- Sanne On 11 February 2014 09:18, Mircea Markus wrote: > I guess I put the solution before the problem, but basically where I want to get to is to allow people to write protostream marshallers without requiring them to write the proto file. This would mean the same effort for java users to write either JBMAR marshallers or proto marshallers. If that's possible and people and protostream is as fast as JBMAR (do you have any perf numbers on that BTW?) then we can suggest people use proto marshallers by default. > > On Feb 10, 2014, at 6:43 PM, Adrian Nistor wrote: > >> The idea of auto-generating protobuf schemas based on the marshaller >> code was briefly mentioned last time we met in Palma. I would not >> qualify it as impossible to implement, but it would certainly be hacky >> and leads to more trouble than it's worth. >> >> A lot of info is missing from the marshaller code (API calls) precisely >> because it is not normally needed, being provided by the schema already. >> Now trying to go backwards means we'll have to 'invent' that metadata >> using some common sense (examples: which field is required vs optional, >> which field is indexable, indexing options, etc). Too many options. I >> bet the notion of 'common sense' would quickly need to be configured >> somehow, for uncommon use cases :). But that's why we have protobuf >> schemas for. Plus, to run a marshaller for inferring the schema you'll >> first need a prototypical instance of your entity. Where from? 
So no, >> -1, now I have serious concerns about this, even though I initially >> nodded in approval. >> >> And that would work only for Java anyway, because the marshaller and the >> schema-infering-process needs to run on the server side. >> >> >> On 02/10/2014 07:34 PM, Mircea Markus wrote: >>> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: >>> >>>> - since remote query is already imbued with JPA in some form, an >>>> interesting project would be to implement a JPA annotation processor >>>> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >>>> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >>>> the JPA entities directly. >>> I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. >>> >>> Cheers, >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Tue Feb 11 09:43:09 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 11 Feb 2014 14:43:09 +0000 Subject: [infinispan-dev] Remote Query improvements In-Reply-To: References: <52F9042A.3010205@redhat.com> <7A6FCCEF-71A9-4C22-A907-A963BF5B4E1F@redhat.com> <52F91DDA.4070800@redhat.com> Message-ID: <54B74EFB-1AE2-4E8A-BE21-63F2003CE6AD@redhat.com> On Feb 11, 2014, at 1:53 PM, Sanne Grinovero wrote: > In my experience people express a bit of confusion on their first > impact with our query/indexing technology as there is a strong > conceptual difference compared to the more familiar 
relational > databases. > > The primary WTF effect is usually on the fact that when a field > included in a query is not indexed the query is not just "slower" but > it won't work at all; we have plans to compensate for that in the > scope of the simplified DSL (and remote queries) to fall-back to an > ad-hoc crafted map/reduce task which essentially implements a table > scan, but I'm thinking now that we should take it a step beyond and do > better. > > Another source of trouble is that the fields need not just be indexed, > but also need to be indexed with the correct attributes depending on > the kind of query you mean to run: Can you please elaborate a bit on this? An example in which an indexed field would not be caught by a running entry would be nice :-) > essentially this leads in practice > for people to need to have a very clear idea of which queries they > will be running.. and over the lifecycle of a complex application this > might become a complex to maintain, especially if you want to keep > peak performance you need to regularly cleanup indexing flags which > are related to queries no longer in use. > > Nowadays we do some kind of validation of the queries to catch > situations in which these can't possibly match the metadata we have > about indexed fields, but this validation needs to be quite permissive > to not prevent rare and unusual advanced queries which are technically > valid, although potential candidates for a strong misunderstanding. > > This all leads to a single clean solution: if we start from a > declarative set of query definitions, in which each query has the > specific extra metadata needed about their runtime execution (e.g. > using a specific Analyzer on a specific field, query time boosting, > hints about good candidates for filters), then we can actually get rid > of the need to define the indexing attributes at the schema level. 
So the application would need to define the type of queries it will run statically for the entire grid? Doesn't look like a familiar model for DB users either. > > It would still be useful to maintain the current explicit control of > the indexing process: for example you might be building an index which > is consumed by a different application, or you simply know about an > advanced data mining feature that you're building on a custom > Query/Filtering/Collector which bypasses our helpful but constraining > query definition strategy. > > Following this proposal, we wouldn't need to bother with extending the > document metadata with indexing annotations (annotations as a non-Java > term) but we'd need to focus on a way to pre-declare all queries users > intend to use. > > I admit that this might sound limiting, but consider: > - serialization of queries and all their potential advanced options > (not many in the remote case so far) needs to be done anyway, and > needs to be language agnostic anyway. - would this be better than using HQL strings? > - we'd be able to better validate complex query structures > - when a user registers/unregisters "query definitions" from the > server we have a better opportunity to: > -- cache parsing > -- cache execution plans > -- track metrics to improve on the execution plans > -- adapt the indexes automatically (immediately or warn that it > needs to be done before the query is runnable) > -- I suspect it would be easier to match queries with security ACLs, > both in terms of execution permission but also in terms of scoping on > a subset of the visible data (essentially I'm thinking that the > execution plans could be more advanced and prepare/hint about filter > caching and even adapt the indexing structure to better match the > security constraints).
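Sanne's "registered query definitions" idea can be sketched as a small registry keyed by query name: registration parses (and could validate/plan) the definition once, so every later execution reuses the cached parse. This is a plain-Java illustration only; the names (`QueryRegistry`, `ParsedQuery`) are hypothetical and not Infinispan API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: parsing (and, on a real server, execution planning,
// index adaptation, ACL matching) happens once at registration time,
// not on every execution.
public class QueryRegistry {

    // Stand-in for a parsed execution plan; a real one would hold the AST,
    // index hints, security scoping, collected metrics, etc.
    public record ParsedQuery(String name, String hql, long parsedAt) {}

    private final Map<String, ParsedQuery> registered = new ConcurrentHashMap<>();

    public ParsedQuery register(String name, String hql) {
        // Registration is the natural point to validate the query against
        // index metadata and to (re)build any indexes it needs.
        return registered.computeIfAbsent(name,
                n -> new ParsedQuery(n, hql, System.nanoTime()));
    }

    public ParsedQuery lookup(String name) {
        ParsedQuery q = registered.get(name);
        if (q == null) {
            throw new IllegalArgumentException("Unregistered query: " + name);
        }
        return q; // cached parse is reused on every execution
    }
}
```

A deregistration hook would be the symmetric place to drop indexing flags that no remaining query needs.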
> > # Essentially > > We need to expose a standard, cross language and declarative form of > the queries the user intends to run from remote, and provide a way to > register these queries on the server, where > registration/deregistration triggers certain actions. > > This would not be mandatory as you can still ask for ad-hoc queries, > but these will only take advantage of indexes which happen to exist > because of some registered query, or of no index at all. > > I'm proposing for the format to be - initially to support only the > simple functions exposed by the remote DSL - a simple query String, > essentially the HQL we already use but obviously limited to the base > constraints we need. This language will probably evolve in future *if* > we ever want to expose also fulltext over this.. > > For the embedded query world - less of a priority - we could start > experimenting with more richer and typesafe query definitions, to also > provide the benefit listed above. I think specifying the querying instead of requiring the user to index specific fields would result in a better user experience. OTOH I think it's not a common model for indexing data, as people are more accustomed to indexing specific fields. For me would also be good to know the limitations that current indexing model has, as at this stage I'm not that familiar with that. > > -- Sanne > > > > > > > On 11 February 2014 09:18, Mircea Markus wrote: >> I guess I put the solution before the problem, but basically where I want to get to is to allow people to write protostream marshallers without requiring them to write the proto file. This would mean the same effort for java users to write either JBMAR marshallers or proto marshallers. If that's possible and people and protostream is as fast as JBMAR (do you have any perf numbers on that BTW?) then we can suggest people use proto marshallers by default. 
>> >> On Feb 10, 2014, at 6:43 PM, Adrian Nistor wrote: >> >>> The idea of auto-generating protobuf schemas based on the marshaller >>> code was briefly mentioned last time we met in Palma. I would not >>> qualify it as impossible to implement, but it would certainly be hacky >>> and leads to more trouble than it's worth. >>> >>> A lot of info is missing from the marshaller code (API calls) precisely >>> because it is not normally needed, being provided by the schema already. >>> Now trying to go backwards means we'll have to 'invent' that metadata >>> using some common sense (examples: which field is required vs optional, >>> which field is indexable, indexing options, etc). Too many options. I >>> bet the notion of 'common sense' would quickly need to be configured >>> somehow, for uncommon use cases :). But that's why we have protobuf >>> schemas for. Plus, to run a marshaller for inferring the schema you'll >>> first need a prototypical instance of your entity. Where from? So no, >>> -1, now I have serious concerns about this, even though I initially >>> nodded in approval. >>> >>> And that would work only for Java anyway, because the marshaller and the >>> schema-infering-process needs to run on the server side. >>> >>> >>> On 02/10/2014 07:34 PM, Mircea Markus wrote: >>>> On Feb 10, 2014, at 4:54 PM, Tristan Tarrant wrote: >>>> >>>>> - since remote query is already imbued with JPA in some form, an >>>>> interesting project would be to implement a JPA annotation processor >>>>> which can produce a set of ProtoBuf schemas from JPA-annotated classes. >>>>> - on top of the above, a ProtoBuf marshaller/unmarshaller which can use >>>>> the JPA entities directly. >>>> I think it would be even more useful to infer the protbuf schema from the protostream marshaller: the marshaller is required in order to serialize object into the proto format and has the advantage that it works even without Java. 
>>>> >>>> Cheers, >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 12 05:40:43 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 12 Feb 2014 10:40:43 +0000 Subject: [infinispan-dev] ClusteredListeners: message delivered twice Message-ID: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Hey Will, With the current design, during a topology change, an event might be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state if an event is redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of? 
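A consumer of such a "potentiallyDuplicate" flag could look like the sketch below: the listener keeps a record of applied events and skips a flagged event it has already seen. Everything here (the event shape, the key+version identity) is a hypothetical illustration, not the actual cluster-listener API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a listener tolerating at-least-once delivery across a topology
// change: a redelivered event is ignored only when it is flagged as a
// potential duplicate and has already been applied. Types are hypothetical.
public class IdempotentListener {

    public record CacheEvent(String key, long version, boolean potentiallyDuplicate) {}

    private final Set<String> applied = ConcurrentHashMap.newKeySet();

    /** @return true if the event was applied, false if skipped as a duplicate */
    public boolean onEvent(CacheEvent e) {
        String id = e.key() + "@" + e.version();
        if (e.potentiallyDuplicate() && !applied.add(id)) {
            return false; // already processed before the topology change
        }
        applied.add(id);
        // ... apply the event to local state ...
        return true;
    }
}
```

The flag keeps the common path cheap: the dedup lookup only has to happen for events emitted around a topology change.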
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Thu Feb 13 13:24:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 13 Feb 2014 18:24:35 +0000 Subject: [infinispan-dev] HotRod near caches In-Reply-To: <52F9D772.8080006@redhat.com> References: <52F9D772.8080006@redhat.com> Message-ID: On 11 February 2014 07:55, Tristan Tarrant wrote: > Hi people, > > this is a bit of a dump of ideas for getting our HotRod client in shape > for supporting near caches: > > - RemoteCaches should have an optional internal cache. This cache should > probably be some form of bounded expiration-aware hashmap which would > serve as a local copy of data retrieved over the wire. In the past we > have advocated the use of combining an EmbeddedCacheManager with a > RemoteCacheStore to achieve this, but this is only applicable to Java > clients, while we need to think of a solution for our other clients too. True we need a general solution, but only as a design: we can still think of using an EmbeddedCacheManager as an implementation detail for the JVM based clients right? For other languages, I'd probably pick a mature and well known cache from each language. We'd probably want to mask Flag usage: for example SKIP_CACHE_LOAD should only apply on the server nodes. Also we'd probably want to verify that a failure of an operation on our "cachestore" is not going to provide misleading messages, or being ignored altogether when running in independent threads. > - Once remote listeners are in place, a RemoteCache would automatically > invalidate entries in the near-cache. This is the point concerning me the most: I suspect there are so many different ways in which this could get out of synch! Essentially let's consider that a client requiring this level of consistency is becoming part of the distributed system. I'm not against doing it, just that I'm having the impression its complexity is being underestimated. 
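The "bounded expiration-aware hashmap" Tristan mentions can be sketched with a plain LRU-bounded `LinkedHashMap` plus a per-entry deadline and an `invalidate()` hook for the remote-listener path. This is only a single-threaded illustration under assumed semantics, not a proposed implementation; a real near cache needs concurrency and smarter eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal near-cache sketch: bounded (LRU via access-ordered LinkedHashMap),
// expiration-aware (per-entry TTL deadline), and invalidatable from a
// remote-event listener. Not thread-safe.
public class NearCache<K, V> {

    private record Entry<V>(V value, long expiresAtMillis) {}

    private final LinkedHashMap<K, Entry<V>> map;
    private final long ttlMillis;

    public NearCache(int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxEntries; // bounded: drop the LRU entry
            }
        };
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    /** @return the cached value, or null on miss/expiry (caller goes remote) */
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAtMillis()) {
            map.remove(key); // expired: treat as a miss
            return null;
        }
        return e.value();
    }

    /** Called from a remote listener when the server reports a modification. */
    public void invalidate(K key) {
        map.remove(key);
    }
}
```

Sanne's consistency concern lives in `invalidate()`: between the server-side write and the client receiving the event, `get()` can still return the stale value, so this design is only ever eventually consistent.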
> - Remote Query should "pass-through" the near-cache, so that entries > retrieved from a query would essentially be cached locally following the > same semantics. This can be achieved by having the QUERY verb return > just the set of matching keys instead of the whole entries +1, or even better - to avoid multiple roundtrips - we just store the indivual results in the local cache. The downside is that the gathering phase of query results might not be taking advantage of the locally stored individual entries (when they match); the good news is we have a similar case with Hibernate Search/ORM dealing with 2nd level cache, for which we expose an option to get a hint from the user: we could do the same. > - Optionally we can even think about a query cache which would hash the > query DSL and store the resulting keys locally so that successive > invocations of a cached query wouldn't go through the wire. Matching > this with invalidation is probably a tad more complex, and I'd probably > avoid going down that path. I'd agree especially in the first phase, but if needed that is essentially just a continuous query so we can build on top of that. Thanks for starting this! Sanne > > Tristan > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Fri Feb 14 08:07:30 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Fri, 14 Feb 2014 13:07:30 +0000 Subject: [infinispan-dev] HotRod near caches In-Reply-To: References: <52F9D772.8080006@redhat.com> Message-ID: On Feb 13, 2014, at 6:24 PM, Sanne Grinovero wrote: > On 11 February 2014 07:55, Tristan Tarrant wrote: >> Hi people, >> >> this is a bit of a dump of ideas for getting our HotRod client in shape >> for supporting near caches: >> >> - RemoteCaches should have an optional internal cache. 
This cache should >> probably be some form of bounded expiration-aware hashmap which would >> serve as a local copy of data retrieved over the wire. In the past we >> have advocated the use of combining an EmbeddedCacheManager with a >> RemoteCacheStore to achieve this, but this is only applicable to Java >> clients, while we need to think of a solution for our other clients too. > > True we need a general solution, but only as a design: we can still > think of using an EmbeddedCacheManager as an implementation detail for > the JVM based clients right? > For other languages, I'd probably pick a mature and well known cache > from each language. +1 > > We'd probably want to mask Flag usage: for example SKIP_CACHE_LOAD > should only apply on the server nodes. > > Also we'd probably want to verify that a failure of an operation on > our "cachestore" is not going to provide misleading messages, or being > ignored altogether when running in independent threads. Having the RemoteCacheManager and EmbeddedCacheManager following a common ancestry has caused a lot of confusion in the community, with people trying to replace one with the other and not succeeding. Might be worth splitting them, and then add/keep the relevant flags for HotRod java client only. > >> - Once remote listeners are in place, a RemoteCache would automatically >> invalidate entries in the near-cache. > > This is the point concerning me the most: I suspect there are so many > different ways in which this could get out of synch! > Essentially let's consider that a client requiring this level of > consistency is becoming part of the distributed system. > I'm not against doing it, just that I'm having the impression its > complexity is being underestimated. > >> - Remote Query should "pass-through" the near-cache, so that entries >> retrieved from a query would essentially be cached locally following the >> same semantics. 
This can be achieved by having the QUERY verb return >> just the set of matching keys instead of the whole entries > > +1, or even better - to avoid multiple roundtrips - we just store the > indivual results in the local cache. > The downside is that the gathering phase of query results might not be > taking advantage of the locally stored individual entries (when they > match); the good news is we have a similar case with Hibernate > Search/ORM dealing with 2nd level cache, for which we expose an option > to get a hint from the user: we could do the same. also this would not work if the queries project data, instead of returning fully fledged entries. > >> - Optionally we can even think about a query cache which would hash the >> query DSL and store the resulting keys locally so that successive >> invocations of a cached query wouldn't go through the wire. Matching >> this with invalidation is probably a tad more complex, and I'd probably >> avoid going down that path. > > I'd agree especially in the first phase, but if needed that is > essentially just a continuous query so we can build on top of that. > > Thanks for starting this! > Sanne > >> >> Tristan >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vagvaz at gmail.com Fri Feb 14 10:10:55 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Fri, 14 Feb 2014 17:10:55 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. Message-ID: <52FE31FF.5050507@gmail.com> Hello everyone, I started using the MapReduce implementation of Infinispan and I came across some possible limitations. 
Thus, I want to make some suggestions about the MapReduce (MR) implementation of Infinispan. Depending on the algorithm, there might be some memory problems, especially for intermediate results. An example of such a case is group by. Suppose that we have a cluster of 2 nodes with 2 GB available. Let a distributed cache, where simple car objects (id,brand,colour) are stored and the total size of data is 3.5GB. If all objects have the same colour , then all 3.5 GB would go to only one reducer, as a result an OutOfMemoryException will be thrown. To overcome these limitations, I propose to add as parameter the name of the intermediate cache to be used. This will enable the creation of a custom configured cache that deals with the memory limitations. Another feature that I would like to have is to set the name of the output cache. The reasoning behind this is similar to the one mentioned above. I wait for your thoughts on these two suggestions. Regards, Evangelos From ttarrant at redhat.com Fri Feb 14 10:16:05 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Fri, 14 Feb 2014 16:16:05 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <52FE31FF.5050507@gmail.com> References: <52FE31FF.5050507@gmail.com> Message-ID: <52FE3335.1070806@redhat.com> Hi Evangelos, you might be interested in looking into a current pull request which addresses some (all?) of these issues https://github.com/infinispan/infinispan/pull/2300 Tristan On 14/02/2014 16:10, Evangelos Vazaios wrote: > Hello everyone, > > I started using the MapReduce implementation of Infinispan and I came > across some possible limitations. Thus, I want to make some suggestions > about the MapReduce (MR) implementation of Infinispan. > Depending on the algorithm, there might be some memory problems, > especially for intermediate results. > An example of such a case is group by. Suppose that we have a cluster > of 2 nodes with 2 GB available. 
Let a distributed cache, where simple > car objects (id,brand,colour) are stored and the total size of data is > 3.5GB. If all objects have the same colour , then all 3.5 GB would go to > only one reducer, as a result an OutOfMemoryException will be thrown. > > To overcome these limitations, I propose to add as parameter the name of > the intermediate cache to be used. This will enable the creation of a > custom configured cache that deals with the memory limitations. > > Another feature that I would like to have is to set the name of the > output cache. The reasoning behind this is similar to the one mentioned > above. > > I wait for your thoughts on these two suggestions. > > Regards, > Evangelos > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > From vblagoje at redhat.com Fri Feb 14 10:54:27 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 14 Feb 2014 10:54:27 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <52FE3335.1070806@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> Message-ID: <52FE3C33.3070107@redhat.com> Tristan, Actually they are not addressed in this pull request but the feature where custom output cache is used instead of results being returned is next in the implementation pipeline. Evangelos, indeed, depending on a reducer function all intermediate KOut/VOut pairs might be moved to a single node. How would custom cache help in this case? Regards, Vladimir On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: > Hi Evangelos, > > you might be interested in looking into a current pull request which > addresses some (all?) 
of these issues > > https://github.com/infinispan/infinispan/pull/2300 > > Tristan > > On 14/02/2014 16:10, Evangelos Vazaios wrote: >> Hello everyone, >> >> I started using the MapReduce implementation of Infinispan and I came >> across some possible limitations. Thus, I want to make some suggestions >> about the MapReduce (MR) implementation of Infinispan. >> Depending on the algorithm, there might be some memory problems, >> especially for intermediate results. >> An example of such a case is group by. Suppose that we have a cluster >> of 2 nodes with 2 GB available. Let a distributed cache, where simple >> car objects (id,brand,colour) are stored and the total size of data is >> 3.5GB. If all objects have the same colour , then all 3.5 GB would go to >> only one reducer, as a result an OutOfMemoryException will be thrown. >> >> To overcome these limitations, I propose to add as parameter the name of >> the intermediate cache to be used. This will enable the creation of a >> custom configured cache that deals with the memory limitations. >> >> Another feature that I would like to have is to set the name of the >> output cache. The reasoning behind this is similar to the one mentioned >> above. >> >> I wait for your thoughts on these two suggestions. >> >> Regards, >> Evangelos >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Mon Feb 17 02:48:28 2014 From: rvansa at redhat.com (Radim Vansa) Date: Mon, 17 Feb 2014 08:48:28 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <52FE3C33.3070107@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> Message-ID: <5301BECC.7010901@redhat.com> I think that the intermediate cache is not required at all. The M/R algorithm itself can (and should!) run with memory occupied by the result of reduction. The current implementation with Map first and Reduce after that will always have these problems, using a cache for temporary caching the result is only a workaround. The only situation when temporary cache could be useful is when the result grows linearly (or close to that or even more) with the amount of reduced entries. This would be the case for groupBy producing Map<Colour, List<Car>> from all entries in cache. Then the task does not scale and should be redesigned anyway, but flushing the results into cache backed by cache store could help. Radim On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: > Tristan, > > Actually they are not addressed in this pull request but the feature > where custom output cache is used instead of results being returned is > next in the implementation pipeline. > > Evangelos, indeed, depending on a reducer function all intermediate > KOut/VOut pairs might be moved to a single node. How would custom cache > help in this case? > > Regards, > Vladimir > > > On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >> Hi Evangelos, >> >> you might be interested in looking into a current pull request which >> addresses some (all?) of these issues >> >> https://github.com/infinispan/infinispan/pull/2300 >> >> Tristan >> >> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>> Hello everyone, >>> >>> I started using the MapReduce implementation of Infinispan and I came >>> across some possible limitations. Thus, I want to make some suggestions >>> about the MapReduce (MR) implementation of Infinispan. >>> Depending on the algorithm, there might be some memory problems, >>> especially for intermediate results.
>>> An example of such a case is group by. Suppose that we have a cluster >>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>> car objects (id,brand,colour) are stored and the total size of data is >>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go to >>> only one reducer, as a result an OutOfMemoryException will be thrown. >>> >>> To overcome these limitations, I propose to add as parameter the name of >>> the intermediate cache to be used. This will enable the creation of a >>> custom configured cache that deals with the memory limitations. >>> >>> Another feature that I would like to have is to set the name of the >>> output cache. The reasoning behind this is similar to the one mentioned >>> above. >>> >>> I wait for your thoughts on these two suggestions. >>> >>> Regards, >>> Evangelos >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From etienne.riviere at unine.ch Mon Feb 17 03:18:38 2014 From: etienne.riviere at unine.ch (Etienne Riviere) Date: Mon, 17 Feb 2014 09:18:38 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <5301BECC.7010901@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: Hi Radim, I might misunderstand your suggestion but many M/R jobs actually require to run the two phases one after the other, and henceforth to store the intermediate results somewhere. While some may slightly reduce intermediate memory usage by using a combiner function (e.g., the word-count example), I don't see how we can avoid intermediate storage altogether. Thanks, Etienne (leads project -- as Evangelos who initiated the thread) On 17 Feb 2014, at 08:48, Radim Vansa wrote: > I think that the intermediate cache is not required at all. The M/R > algorithm itself can (and should!) run with memory occupied by the > result of reduction. The current implementation with Map first and > Reduce after that will always have these problems, using a cache for > temporary caching the result is only a workaround. > > The only situation when temporary cache could be useful is when the > result grows linearly (or close to that or even more) with the amount of > reduced entries. This would be the case for groupBy producing Map<Colour, > List<Car>> from all entries in cache. Then the task does not scale and > should be redesigned anyway, but flushing the results into cache backed > by cache store could help. > > Radim > > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >> Tristan, >> >> Actually they are not addressed in this pull request but the feature >> where custom output cache is used instead of results being returned is >> next in the implementation pipeline. >> >> Evangelos, indeed, depending on a reducer function all intermediate >> KOut/VOut pairs might be moved to a single node. How would custom cache >> help in this case?
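The skew Evangelos describes can be made concrete with a toy partitioning model: intermediate values are routed by key, so when every car maps to the same colour, one reducer receives the whole data set no matter how many nodes exist. The hash-owner scheme below is an illustrative assumption, not Infinispan's actual consistent hashing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of reducer skew in a group-by: all intermediate values for
// one key land on a single owner node, so a single hot key concentrates
// the entire intermediate result on one reducer.
public class GroupBySkew {

    record Car(int id, String brand, String colour) {}

    /** Counts how many intermediate values each of `nodes` reducers receives. */
    public static int[] reducerLoad(List<Car> cars, int nodes) {
        Map<String, Integer> perColour = new HashMap<>();
        for (Car c : cars) { // map phase: emit (colour, car)
            perColour.merge(c.colour(), 1, Integer::sum);
        }
        int[] load = new int[nodes];
        for (Map.Entry<String, Integer> e : perColour.entrySet()) {
            // all values for one key go to that key's owner (toy hash routing)
            int owner = Math.floorMod(e.getKey().hashCode(), nodes);
            load[owner] += e.getValue();
        }
        return load;
    }
}
```

With 100 red cars and 4 nodes, one slot of the load array holds all 100 values; a differently configured intermediate cache changes where that memory lives, not the skew itself.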
>> >> Regards, >> Vladimir >> >> >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>> Hi Evangelos, >>> >>> you might be interested in looking into a current pull request which >>> addresses some (all?) of these issues >>> >>> https://github.com/infinispan/infinispan/pull/2300 >>> >>> Tristan >>> >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>> Hello everyone, >>>> >>>> I started using the MapReduce implementation of Infinispan and I came >>>> across some possible limitations. Thus, I want to make some suggestions >>>> about the MapReduce (MR) implementation of Infinispan. >>>> Depending on the algorithm, there might be some memory problems, >>>> especially for intermediate results. >>>> An example of such a case is group by. Suppose that we have a cluster >>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>>> car objects (id,brand,colour) are stored and the total size of data is >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go to >>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>> >>>> To overcome these limitations, I propose to add as parameter the name of >>>> the intermediate cache to be used. This will enable the creation of a >>>> custom configured cache that deals with the memory limitations. >>>> >>>> Another feature that I would like to have is to set the name of the >>>> output cache. The reasoning behind this is similar to the one mentioned >>>> above. >>>> >>>> I wait for your thoughts on these two suggestions. 
>>>> >>>> Regards, >>>> Evangelos >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Mon Feb 17 03:42:22 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Mon, 17 Feb 2014 10:42:22 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: Hi Etienne I was going to suggest using a combiner - the combiner would process the mapper results from just one node, so you should need at most double the memory on that node. I guess we could reduce the memory requirements even more if the combiner could run concurrently with the mapper... Vladimir, does it sound like a reasonable feature request? I'm afraid in your situation using a cache store wouldn't help, as the intermediate values for the same key are stored as a list in a single entry. So if all cars are red, there would be just one intermediate key in the intermediate cache, and there would be nothing to evict to the cache store. Vladimir, do you think we could somehow "chunk" the intermediary values into multiple entries grouped by the intermediary key, to support this scenario? 
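The "chunking" Dan asks about could look roughly like the sketch below: instead of one intermediate entry per key holding an unbounded value list, values are spread over bounded entries keyed "key#0", "key#1", ..., so a cache store could passivate individual chunks even when one key is hot. The class and key-naming scheme are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of chunked intermediate storage: each key's values are split into
// entries of at most chunkSize values, so eviction/passivation can operate
// per chunk rather than on one giant per-key list.
public class ChunkedIntermediate<V> {

    private final Map<String, List<V>> cache = new HashMap<>();
    private final Map<String, Integer> chunkCount = new HashMap<>();
    private final int chunkSize;

    public ChunkedIntermediate(int chunkSize) { this.chunkSize = chunkSize; }

    public void emit(String key, V value) {
        int chunks = chunkCount.getOrDefault(key, 0);
        String last = key + "#" + Math.max(0, chunks - 1);
        List<V> values = cache.get(last);
        if (values == null || values.size() >= chunkSize) {
            values = new ArrayList<>(); // start a new bounded chunk entry
            cache.put(key + "#" + chunks, values);
            chunkCount.put(key, chunks + 1);
        }
        values.add(value);
    }

    /** The reducer streams over all chunks of a key, one bounded entry at a time. */
    public List<V> valuesOf(String key) {
        List<V> all = new ArrayList<>();
        for (int i = 0; i < chunkCount.getOrDefault(key, 0); i++) {
            all.addAll(cache.get(key + "#" + i));
        }
        return all;
    }

    public int chunks(String key) { return chunkCount.getOrDefault(key, 0); }
}
```

A real reducer would consume chunks incrementally instead of materializing `valuesOf`, which is the whole point of the exercise.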
For reference, though, a limited version of what you're asking for is already available. You can change the configuration of the intermediary cache by defining a "__tmpMapReduce" cache in your configuration. That configuration will be used for all M/R tasks, whether they use the shared intermediate cache or they create their own. Cheers Dan On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere wrote: > Hi Radim, > > I might misunderstand your suggestion but many M/R jobs actually require > to run the two phases one after the other, and henceforth to store the > intermediate results somewhere. While some may slightly reduce intermediate > memory usage by using a combiner function (e.g., the word-count example), I > don't see how we can avoid intermediate storage altogether. > > Thanks, > Etienne (leads project -- as Evangelos who initiated the thread) > > On 17 Feb 2014, at 08:48, Radim Vansa wrote: > > > I think that the intermediate cache is not required at all. The M/R > > algorithm itself can (and should!) run with memory occupied by the > > result of reduction. The current implementation with Map first and > > Reduce after that will always have these problems, using a cache for > > temporary caching the result is only a workaround. > > > > The only situation when temporary cache could be useful is when the > > result grows linearly (or close to that or even more) with the amount of > > reduced entries. This would be the case for groupBy producing Map > List> from all entries in cache. Then the task does not scale and > > should be redesigned anyway, but flushing the results into cache backed > > by cache store could help. > > > > Radim > > > > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: > >> Tristan, > >> > >> Actually they are not addressed in this pull request but the feature > >> where custom output cache is used instead of results being returned is > >> next in the implementation pipeline. 
> >> > >> Evangelos, indeed, depending on a reducer function all intermediate > >> KOut/VOut pairs might be moved to a single node. How would custom cache > >> help in this case? > >> > >> Regards, > >> Vladimir > >> > >> > >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: > >>> Hi Evangelos, > >>> > >>> you might be interested in looking into a current pull request which > >>> addresses some (all?) of these issues > >>> > >>> https://github.com/infinispan/infinispan/pull/2300 > >>> > >>> Tristan > >>> > >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: > >>>> Hello everyone, > >>>> > >>>> I started using the MapReduce implementation of Infinispan and I came > >>>> across some possible limitations. Thus, I want to make some > suggestions > >>>> about the MapReduce (MR) implementation of Infinispan. > >>>> Depending on the algorithm, there might be some memory problems, > >>>> especially for intermediate results. > >>>> An example of such a case is group by. Suppose that we have a cluster > >>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple > >>>> car objects (id,brand,colour) are stored and the total size of data is > >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go > to > >>>> only one reducer, as a result an OutOfMemoryException will be thrown. > >>>> > >>>> To overcome these limitations, I propose to add as parameter the name > of > >>>> the intermediate cache to be used. This will enable the creation of a > >>>> custom configured cache that deals with the memory limitations. > >>>> > >>>> Another feature that I would like to have is to set the name of the > >>>> output cache. The reasoning behind this is similar to the one > mentioned > >>>> above. > >>>> > >>>> I wait for your thoughts on these two suggestions. 
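The skew Evangelos describes is easy to reproduce in miniature. The sketch below uses plain Java collections instead of Infinispan's M/R machinery (all names are illustrative): when every car is red, the map phase produces exactly one intermediate key, so a single reducer - and a single node - receives the entire data set no matter how many nodes held the input.

```java
import java.util.*;

// Minimal illustration of reducer skew: grouping cars by colour when every
// car is red leaves ALL intermediate values under one key. In the scenario
// above, that one entry is what grows to 3.5 GB on a 2 GB node.
public class GroupBySkew {
    record Car(int id, String brand, String colour) {}

    static Map<String, List<Car>> mapPhase(List<Car> cars) {
        Map<String, List<Car>> grouped = new HashMap<>();
        // the "mapper": emit (colour, car) for every input entry
        for (Car c : cars)
            grouped.computeIfAbsent(c.colour(), k -> new ArrayList<>()).add(c);
        return grouped;
    }

    static Map<String, Integer> reducePhase(Map<String, List<Car>> grouped) {
        Map<String, Integer> counts = new HashMap<>();
        // the "reducer": one invocation per intermediate key
        grouped.forEach((colour, cars) -> counts.put(colour, cars.size()));
        return counts;
    }
}
```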
From vagvaz at gmail.com Mon Feb 17 06:48:29 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Mon, 17 Feb 2014 13:48:29 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: Message-ID: <5301F70D.6050605@gmail.com> On 02/17/2014 10:42 AM, infinispan-dev-request at lists.jboss.org wrote: > Hi Etienne > > I was going to suggest using a combiner - the combiner would process the > mapper results from just one node, so you should need at most double the > memory on that node. I guess we could reduce the memory requirements even > more if the combiner could run concurrently with the mapper... Vladimir, > does it sound like a reasonable feature request? > There are algorithms where combiners cannot be applied.
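To make that last point concrete: a combiner is a node-local pre-reduce, which is only correct when the reduction is associative and commutative, so that partial results can themselves be reduced. Counting satisfies this; an exact median does not. A minimal sketch (plain Java, hypothetical names):

```java
import java.util.*;

// A combiner is a node-local pre-reduce. It is only safe when reducing is
// associative and commutative, so partial results can themselves be reduced.
public class CombinerDemo {
    // Combinable: counting values per key. Each node can collapse its local
    // values to one partial count, and the reducer sums the partials.
    static int reduceCounts(List<Integer> partialCounts) {
        int total = 0;
        for (int c : partialCounts) total += c;
        return total;
    }

    // NOT combinable: an exact median is "holistic" - the median of local
    // medians is generally not the global median, so the reducer needs the
    // full value set and no combiner can shrink it.
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2)
                          : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }
}
```

For example, if node A holds [1, 2, 9] and node B holds [3, 4], the local medians are 2 and 3.5 while the global median is 3 - no combination of the partial medians recovers it, which is why such jobs must ship every value to the reducer.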
> I'm afraid in your situation using a cache store wouldn't help, as the > intermediate values for the same key are stored as a list in a single > entry. So if all cars are red, there would be just one intermediate key in > the intermediate cache, and there would be nothing to evict to the cache > store. Vladimir, do you think we could somehow "chunk" the intermediary > values into multiple entries grouped by the intermediary key, to support > this scenario? > I was thinking of a custom cache implementation that maintains the overall size of the cache and the size of each key individually, and when a threshold is reached it spills entries to disk. Note that I am not familiar with the internals of Infinispan, but I think it is doable. Such a cache solves the problem in both cases (when one key is too large to fit in memory, as in my example, and when the keys assigned to one reducer exceed its memory). > For reference, though, a limited version of what you're asking for is > already available. You can change the configuration of the intermediary > cache by defining a "__tmpMapReduce" cache in your configuration. That > configuration will be used for all M/R tasks, whether they use the shared > intermediate cache or they create their own. > I have one question about this. If I start two MR tasks at once, will these tasks use the same cache, and thus will the intermediate results be mixed? If so, this shared cache could at least be used as a test case. Regards, Evangelos > Cheers > Dan > > > > On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere > wrote: >> > Hi Radim, >> > >> > I might misunderstand your suggestion but many M/R jobs actually require >> > to run the two phases one after the other, and henceforth to store the >> > intermediate results somewhere. While some may slightly reduce intermediate >> > memory usage by using a combiner function (e.g., the word-count example), I >> > don't see how we can avoid intermediate storage altogether.
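A rough sketch of the spill-to-disk cache Evangelos proposes, in plain Java with illustrative names only: a real implementation would track bytes rather than value counts, and would hook into Infinispan's eviction/passivation machinery rather than writing raw temp files.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Track the number of buffered values overall and per key; when a threshold
// is crossed, move the largest key's values to a file. Count-based "size" is
// a stand-in for real byte accounting.
public class SpillingStore<K, V extends Serializable> {
    private final int maxInMemory;
    private final Map<K, List<V>> memory = new HashMap<>();
    private final Map<K, Path> spilled = new HashMap<>();
    private int inMemoryCount = 0;

    public SpillingStore(int maxInMemory) { this.maxInMemory = maxInMemory; }

    public void add(K key, V value) {
        memory.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        if (++inMemoryCount > maxInMemory) spillLargest();
    }

    private void spillLargest() {
        K largest = memory.keySet().stream()
                .max(Comparator.comparingInt((K k) -> memory.get(k).size())).orElseThrow();
        try {
            Path file = spilled.computeIfAbsent(largest, k -> tempFile());
            try (ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(
                    Files.newOutputStream(file, StandardOpenOption.APPEND)))) {
                for (V v : memory.get(largest)) out.writeObject(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        inMemoryCount -= memory.remove(largest).size();
    }

    private static Path tempFile() {
        try { return Files.createTempFile("spill", ".bin"); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public int inMemoryCount() { return inMemoryCount; }
    public boolean isSpilled(K key) { return spilled.containsKey(key); }
}
```

Evicting the largest key first matches the "all cars are red" case: the one oversized key is the first thing pushed out of memory, while small keys stay cheap to reduce.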
>> > >> > Thanks, >> > Etienne (leads project -- as Evangelos who initiated the thread) >> > >> > On 17 Feb 2014, at 08:48, Radim Vansa wrote: >> > >>> > > I think that the intermediate cache is not required at all. The M/R >>> > > algorithm itself can (and should!) run with memory occupied by the >>> > > result of reduction. The current implementation with Map first and >>> > > Reduce after that will always have these problems, using a cache for >>> > > temporary caching the result is only a workaround. >>> > > >>> > > The only situation when temporary cache could be useful is when the >>> > > result grows linearly (or close to that or even more) with the amount of >>> > > reduced entries. This would be the case for groupBy producing Map>> > > List> from all entries in cache. Then the task does not scale and >>> > > should be redesigned anyway, but flushing the results into cache backed >>> > > by cache store could help. >>> > > >>> > > Radim >>> > > >>> > > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>>> > >> Tristan, >>>> > >> >>>> > >> Actually they are not addressed in this pull request but the feature >>>> > >> where custom output cache is used instead of results being returned is >>>> > >> next in the implementation pipeline. >>>> > >> >>>> > >> Evangelos, indeed, depending on a reducer function all intermediate >>>> > >> KOut/VOut pairs might be moved to a single node. How would custom cache >>>> > >> help in this case? >>>> > >> >>>> > >> Regards, >>>> > >> Vladimir >>>> > >> >>>> > >> >>>> > >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>>> > >>> Hi Evangelos, >>>>> > >>> >>>>> > >>> you might be interested in looking into a current pull request which >>>>> > >>> addresses some (all?) 
of these issues >>>>> > >>> >>>>> > >>> https://github.com/infinispan/infinispan/pull/2300 >>>>> > >>> >>>>> > >>> Tristan >>>>> > >>> >>>>> > >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>>> > >>>> Hello everyone, >>>>>> > >>>> >>>>>> > >>>> I started using the MapReduce implementation of Infinispan and I came >>>>>> > >>>> across some possible limitations. Thus, I want to make some >> > suggestions >>>>>> > >>>> about the MapReduce (MR) implementation of Infinispan. >>>>>> > >>>> Depending on the algorithm, there might be some memory problems, >>>>>> > >>>> especially for intermediate results. >>>>>> > >>>> An example of such a case is group by. Suppose that we have a cluster >>>>>> > >>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>>>>> > >>>> car objects (id,brand,colour) are stored and the total size of data is >>>>>> > >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go >> > to >>>>>> > >>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>>>> > >>>> >>>>>> > >>>> To overcome these limitations, I propose to add as parameter the name >> > of >>>>>> > >>>> the intermediate cache to be used. This will enable the creation of a >>>>>> > >>>> custom configured cache that deals with the memory limitations. >>>>>> > >>>> >>>>>> > >>>> Another feature that I would like to have is to set the name of the >>>>>> > >>>> output cache. The reasoning behind this is similar to the one >> > mentioned >>>>>> > >>>> above. >>>>>> > >>>> >>>>>> > >>>> I wait for your thoughts on these two suggestions. 
>>>>>> > >>>> >>>>>> > >>>> Regards, >>>>>> > >>>> Evangelos >>>>>> > >>>> _______________________________________________ >>>>>> > >>>> infinispan-dev mailing list >>>>>> > >>>> infinispan-dev at lists.jboss.org >>>>>> > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> > >>>> >>>>>> > >>>> >>>>> > >>> >>>>> > >>> >>>>> > >>> _______________________________________________ >>>>> > >>> infinispan-dev mailing list >>>>> > >>> infinispan-dev at lists.jboss.org >>>>> > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> > >> _______________________________________________ >>>> > >> infinispan-dev mailing list >>>> > >> infinispan-dev at lists.jboss.org >>>> > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> > > >>> > > >>> > > -- >>> > > Radim Vansa >>> > > JBoss DataGrid QA >>> > > >>> > > _______________________________________________ >>> > > infinispan-dev mailing list >>> > > infinispan-dev at lists.jboss.org >>> > > https://lists.jboss.org/mailman/listinfo/infinispan-d From sanne at infinispan.org Mon Feb 17 07:25:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 17 Feb 2014 12:25:35 +0000 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: On 17 February 2014 08:42, Dan Berindei wrote: > Hi Etienne > > I was going to suggest using a combiner - the combiner would process the > mapper results from just one node, so you should need at most double the > memory on that node. I guess we could reduce the memory requirements even > more if the combiner could run concurrently with the mapper... Vladimir, > does it sound like a reasonable feature request? Yes that's something I've discussed with Vladimir before. 
The problem is - as the LEADS experts of M/R like Evangelios explained to us in London - is that in many practical use cases you can't apply a combiner, and the Reducer needs to be run on the full set. I think Evangelios also mentioned that the actual the set processed by the Reducer is also expected to be sorted, apparently it's "interesting" that we don't do such things. This can be taken as a negative point as not all problems are solvable, but is also making it interesting for being able to resolve some other problems with a higher level of efficiency so it's not necessarily something that we might want to throw away. Might be interesting to keep our design with the current limitations, and to also pursue a second mode of operation in which we make a good Hadoop integration, to not reinvent the wheel in the area of the more complex tasks, also providing the benefit of API compatibility to allow other systems such as Apache Nutch and Mahout to run on Infinispan without significant changes. > > I'm afraid in your situation using a cache store wouldn't help, as the > intermediate values for the same key are stored as a list in a single entry. > So if all cars are red, there would be just one intermediate key in the > intermediate cache, and there would be nothing to evict to the cache store. > Vladimir, do you think we could somehow "chunk" the intermediary values into > multiple entries grouped by the intermediary key, to support this scenario? > > For reference, though, a limited version of what you're asking for is > already available. You can change the configuration of the intermediary > cache by defining a "__tmpMapReduce" cache in your configuration. That > configuration will be used for all M/R tasks, whether they use the shared > intermediate cache or they create their own. 
I really hope we can get rid of temporary caches, but if need be please make sure each task has an isolated execution context: names of temporary caches - or their keys - need to avoid collisions with other jobs. Also, if we start spawning additional caches automagically I have no idea how people will be able to define boundaries of the heap size we're allowed to use: such matters cannot be left to the user's responsibility to figure out. Sanne > > Cheers > Dan > > > > On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere > wrote: >> >> Hi Radim, >> >> I might misunderstand your suggestion but many M/R jobs actually require >> to run the two phases one after the other, and henceforth to store the >> intermediate results somewhere. While some may slightly reduce intermediate >> memory usage by using a combiner function (e.g., the word-count example), I >> don't see how we can avoid intermediate storage altogether. >> >> Thanks, >> Etienne (leads project -- as Evangelos who initiated the thread) >> >> On 17 Feb 2014, at 08:48, Radim Vansa wrote: >> >> > I think that the intermediate cache is not required at all. The M/R >> > algorithm itself can (and should!) run with memory occupied by the >> > result of reduction. The current implementation with Map first and >> > Reduce after that will always have these problems, using a cache for >> > temporary caching the result is only a workaround. >> > >> > The only situation when temporary cache could be useful is when the >> > result grows linearly (or close to that or even more) with the amount of >> > reduced entries. This would be the case for groupBy producing Map> > List> from all entries in cache. Then the task does not scale and >> > should be redesigned anyway, but flushing the results into cache backed >> > by cache store could help.
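Sanne's isolation point is cheap to address at the naming level. A sketch (hypothetical helper, not Infinispan API) that derives a collision-free intermediate cache name per task from the "__tmpMapReduce" base name mentioned earlier in the thread:

```java
import java.util.UUID;

// Every M/R task gets its own intermediate cache name, derived from a base
// name plus a unique task id, so two concurrent jobs can never read or
// overwrite each other's intermediate entries. Helper names are illustrative.
public final class IntermediateCacheNames {
    private static final String BASE = "__tmpMapReduce";

    public static String newTaskId() {
        return UUID.randomUUID().toString();
    }

    public static String forTask(String taskId) {
        return BASE + "-" + taskId;
    }
}
```

Alternatively a single shared cache could scope the keys by task id instead - fewer caches to manage, but per-task cleanup then means scanning the shared cache, and heap bounds would still be shared across jobs.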
>> > >> > Radim >> > >> > On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >> >> Tristan, >> >> >> >> Actually they are not addressed in this pull request but the feature >> >> where custom output cache is used instead of results being returned is >> >> next in the implementation pipeline. >> >> >> >> Evangelos, indeed, depending on a reducer function all intermediate >> >> KOut/VOut pairs might be moved to a single node. How would custom cache >> >> help in this case? >> >> >> >> Regards, >> >> Vladimir >> >> >> >> >> >> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >> >>> Hi Evangelos, >> >>> >> >>> you might be interested in looking into a current pull request which >> >>> addresses some (all?) of these issues >> >>> >> >>> https://github.com/infinispan/infinispan/pull/2300 >> >>> >> >>> Tristan >> >>> >> >>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >> >>>> Hello everyone, >> >>>> >> >>>> I started using the MapReduce implementation of Infinispan and I came >> >>>> across some possible limitations. Thus, I want to make some >> >>>> suggestions >> >>>> about the MapReduce (MR) implementation of Infinispan. >> >>>> Depending on the algorithm, there might be some memory problems, >> >>>> especially for intermediate results. >> >>>> An example of such a case is group by. Suppose that we have a >> >>>> cluster >> >>>> of 2 nodes with 2 GB available. Let a distributed cache, where >> >>>> simple >> >>>> car objects (id,brand,colour) are stored and the total size of data >> >>>> is >> >>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would go >> >>>> to >> >>>> only one reducer, as a result an OutOfMemoryException will be thrown. >> >>>> >> >>>> To overcome these limitations, I propose to add as parameter the name >> >>>> of >> >>>> the intermediate cache to be used. This will enable the creation of a >> >>>> custom configured cache that deals with the memory limitations. 
>> >>>> >> >>>> Another feature that I would like to have is to set the name of the >> >>>> output cache. The reasoning behind this is similar to the one >> >>>> mentioned >> >>>> above. >> >>>> >> >>>> I wait for your thoughts on these two suggestions. >> >>>> >> >>>> Regards, >> >>>> Evangelos >> >>>> _______________________________________________ >> >>>> infinispan-dev mailing list >> >>>> infinispan-dev at lists.jboss.org >> >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >>>> >> >>>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> infinispan-dev mailing list >> >>> infinispan-dev at lists.jboss.org >> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > >> > >> > -- >> > Radim Vansa >> > JBoss DataGrid QA >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Mon Feb 17 07:53:16 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Mon, 17 Feb 2014 12:53:16 +0000 Subject: [infinispan-dev] ClusteredListeners: message delivered twice In-Reply-To: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> References: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Message-ID: On 12 February 2014 10:40, Mircea Markus wrote: > Hey Will, > > With the current design, during a topology change, an event might 
be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state if an event is redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of? I would really wish we would not push such a burden to the API consumer. If we at least had a modification counter associated with each entry this could help to identify duplicate triggers as well (on top of ordering of modification events as already discussed many times). Sanne > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From etienne.riviere at unine.ch Mon Feb 17 09:57:00 2014 From: etienne.riviere at unine.ch (Etienne Riviere) Date: Mon, 17 Feb 2014 15:57:00 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: <8B043BD7-8331-4720-B242-D8FC24B42004@unine.ch> Hi Sanne, As Evangelos pointed out in London, it is not possible to run a mapper and a combiner concurrently in the general case (there are exceptions where the combiner can run on the stream of tuples generated by the Mapper). The proposal to tighter integrate with Hadoop would make sense also for the support of Nutch that we need in the project. How complex do you think this would be? 
Etienne On 17 Feb 2014, at 13:25, Sanne Grinovero wrote: > On 17 February 2014 08:42, Dan Berindei wrote: >> Hi Etienne >> >> I was going to suggest using a combiner - the combiner would process the >> mapper results from just one node, so you should need at most double the >> memory on that node. I guess we could reduce the memory requirements even >> more if the combiner could run concurrently with the mapper... Vladimir, >> does it sound like a reasonable feature request? > > Yes that's something I've discussed with Vladimir before. > The problem is - as the LEADS experts of M/R like Evangelios explained > to us in London - is that in many practical use cases you can't apply > a combiner, and the Reducer needs to be run on the full set. > > I think Evangelios also mentioned that the actual the set processed by > the Reducer is also expected to be sorted, apparently it's > "interesting" that we don't do such things. This can be taken as a > negative point as not all problems are solvable, but is also making it > interesting for being able to resolve some other problems with a > higher level of efficiency so it's not necessarily something that we > might want to throw away. > > Might be interesting to keep our design with the current limitations, > and to also pursue a second mode of operation in which we make a good > Hadoop integration, to not reinvent the wheel in the area of the more > complex tasks, also providing the benefit of API compatibility to > allow other systems such as Apache Nutch and Mahout to run on > Infinispan without significant changes. > >> >> I'm afraid in your situation using a cache store wouldn't help, as the >> intermediate values for the same key are stored as a list in a single entry. >> So if all cars are red, there would be just one intermediate key in the >> intermediate cache, and there would be nothing to evict to the cache store. 
>> Vladimir, do you think we could somehow "chunk" the intermediary values into >> multiple entries grouped by the intermediary key, to support this scenario? >> >> For reference, though, a limited version of what you're asking for is >> already available. You can change the configuration of the intermediary >> cache by defining a "__tmpMapReduce" cache in your configuration. That >> configuration will be used for all M/R tasks, whether they use the shared >> intermediate cache or they create their own. > > I really hope we can get rid of temporary caches, but if need be > please make sure each task has an isolated execution context: names of > temporary caches - or their keys - need to avoid collisions with other > jobs. > Also, if we start spawning additional caches automagically I have no > idea how people will be able to define boundaries of heap size we're > allowed to use: such matters can not be left to the user's > responsibility to figure out. > > Sanne > >> >> Cheers >> Dan >> >> >> >> On Mon, Feb 17, 2014 at 10:18 AM, Etienne Riviere >> wrote: >>> >>> Hi Radim, >>> >>> I might misunderstand your suggestion but many M/R jobs actually require >>> to run the two phases one after the other, and henceforth to store the >>> intermediate results somewhere. While some may slightly reduce intermediate >>> memory usage by using a combiner function (e.g., the word-count example), I >>> don?t see how we can avoid intermediate storage altogether. >>> >>> Thanks, >>> Etienne (leads project ? as Evangelos who initiated the thread) >>> >>> On 17 Feb 2014, at 08:48, Radim Vansa wrote: >>> >>>> I think that the intermediate cache is not required at all. The M/R >>>> algorithm itself can (and should!) run with memory occupied by the >>>> result of reduction. The current implementation with Map first and >>>> Reduce after that will always have these problems, using a cache for >>>> temporary caching the result is only a workaround. 
>>>> >>>> The only situation when temporary cache could be useful is when the >>>> result grows linearly (or close to that or even more) with the amount of >>>> reduced entries. This would be the case for groupBy producing Map>>> List> from all entries in cache. Then the task does not scale and >>>> should be redesigned anyway, but flushing the results into cache backed >>>> by cache store could help. >>>> >>>> Radim >>>> >>>> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>>>> Tristan, >>>>> >>>>> Actually they are not addressed in this pull request but the feature >>>>> where custom output cache is used instead of results being returned is >>>>> next in the implementation pipeline. >>>>> >>>>> Evangelos, indeed, depending on a reducer function all intermediate >>>>> KOut/VOut pairs might be moved to a single node. How would custom cache >>>>> help in this case? >>>>> >>>>> Regards, >>>>> Vladimir >>>>> >>>>> >>>>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>>>> Hi Evangelos, >>>>>> >>>>>> you might be interested in looking into a current pull request which >>>>>> addresses some (all?) of these issues >>>>>> >>>>>> https://github.com/infinispan/infinispan/pull/2300 >>>>>> >>>>>> Tristan >>>>>> >>>>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>>>> Hello everyone, >>>>>>> >>>>>>> I started using the MapReduce implementation of Infinispan and I came >>>>>>> across some possible limitations. Thus, I want to make some >>>>>>> suggestions >>>>>>> about the MapReduce (MR) implementation of Infinispan. >>>>>>> Depending on the algorithm, there might be some memory problems, >>>>>>> especially for intermediate results. >>>>>>> An example of such a case is group by. Suppose that we have a >>>>>>> cluster >>>>>>> of 2 nodes with 2 GB available. Let a distributed cache, where >>>>>>> simple >>>>>>> car objects (id,brand,colour) are stored and the total size of data >>>>>>> is >>>>>>> 3.5GB. 
If all objects have the same colour , then all 3.5 GB would go >>>>>>> to >>>>>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>>>>> >>>>>>> To overcome these limitations, I propose to add as parameter the name >>>>>>> of >>>>>>> the intermediate cache to be used. This will enable the creation of a >>>>>>> custom configured cache that deals with the memory limitations. >>>>>>> >>>>>>> Another feature that I would like to have is to set the name of the >>>>>>> output cache. The reasoning behind this is similar to the one >>>>>>> mentioned >>>>>>> above. >>>>>>> >>>>>>> I wait for your thoughts on these two suggestions. >>>>>>> >>>>>>> Regards, >>>>>>> Evangelos >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> -- >>>> Radim Vansa >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > 
infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Mon Feb 17 12:35:25 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Mon, 17 Feb 2014 18:35:25 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> Message-ID: <8955F382-8A6E-43AA-864E-1EC0C190654E@redhat.com> On 30 Jan 2014, at 20:51, Mircea Markus wrote: > > On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote: >> >> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >>> >>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >>>> >>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches, right? Otherwise I am not fully understanding why they ask for a unified query. >>>> Do you have written detailed use cases somewhere for me to better understand what is really requested? >>> >>> IMO, from a user perspective, being able to run queries spreading several caches simplifies the programming model: each cache corresponds to a single entity type, with potentially different configuration. >> >> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. > > Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttering to me. I think it's cumbersome to write any kind of querying with a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, but it discourages code reuse and makes it hard to maintain (if you'll add Pets in the same cache in the future you need to update the M/R code as well).
And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expire, etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future. > > The way I see it - and I'm very curious to see your opinion on this - following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spreading multiple caches are both useful and needed (same as queries spreading over multiple tables). My opinion is that seeing it this way is limiting. A key/value store is schemaless. Your view is forcing a particular schema on how to structure things. I don't expect everyone to store everything in a single cache, and of course there will be situations where it's not ideal or the best solution, such as in cases like the ones you mention above, but if you want to do it, for any of the reasons I or Paul mentioned in [1], it'd be nice to be able to do so. Cheers, [1] https://issues.jboss.org/browse/ISPN-3640 > >> >> Just yesterday I discovered this gem in Scala's Shapeless extensions [1]. This is experimental stuff, but essentially it allows you to define what key/value type pairs a map will contain, and it does type checking at compile time. I almost wet my pants when I saw that ;) :p. In the example, it defines a map as containing Int -> String and String -> Int key/value pairs. If you try to add an Int -> Int, it fails compilation. > Agreed, the compile-time check is pretty awesome :-) Still, mixing and matching types in a Map doesn't look great to me for ISPN. > >> Java's type checking is not powerful enough to do this, and its compilation logic is not extensible the way Scala macros are, but I think the fact that other languages are looking into this validates Paul's suggestion in [2], on top of all the benefits listed there.
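For what it's worth, plain Java can get partway there with the "typesafe heterogeneous container" pattern (known from Effective Java): keys carry their value type, so each get/put is compile-time checked per key. It doesn't reach Shapeless - the allowed key/value pairings are not part of the map's type, and this sketch holds one value per type - but it shows what the current language can already express:

```java
import java.util.HashMap;
import java.util.Map;

// Typesafe heterogeneous container: each key carries its value type, so
// lookups are compile-time checked per key. Class.cast() also guards against
// raw-type abuse at runtime.
public final class TypedCache {
    private final Map<Class<?>, Object> values = new HashMap<>();

    public <T> void put(Class<T> type, T value) {
        values.put(type, type.cast(value));
    }

    public <T> T get(Class<T> type) {
        return type.cast(values.get(type));
    }
}
```

Here `cache.put(String.class, "green")` compiles, while `cache.put(String.class, 42)` is rejected at compile time, which is roughly the guarantee the Shapeless example gives per key.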
>> >> Cheers, >> >> [1] https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#heterogenous-maps >> [2] https://issues.jboss.org/browse/ISPN-3640 >> >>> Besides the query API that would need to be extended to support accessing multiple caches, not sure what other APIs would need to be extended to take advantage of this? >>> >>>> >>>> Emmanuel >>>> >>>> On 14 Jan 2014, at 12:59, Sanne Grinovero wrote: >>>> >>>>> Up this: it was proposed again today ad a face to face meeting. >>>>> Apparently multiple parties have been asking to be able to run >>>>> cross-cache queries. >>>>> >>>>> Sanne >>>>> >>>>> On 11 April 2012 12:47, Emmanuel Bernard wrote: >>>>>> >>>>>> On 10 avr. 2012, at 19:10, Sanne Grinovero wrote: >>>>>> >>>>>>> Hello all, >>>>>>> currently Infinispan Query is an interceptor registering on the >>>>>>> specific Cache instance which has indexing enabled; one such >>>>>>> interceptor is doing all what it needs to do in the sole scope of the >>>>>>> cache it was registered in. >>>>>>> >>>>>>> If you enable indexing - for example - on 3 different caches, there >>>>>>> will be 3 different Hibernate Search engines started in background, >>>>>>> and they are all unaware of each other. >>>>>>> >>>>>>> After some design discussions with Ales for CapeDwarf, but also >>>>>>> calling attention on something that bothered me since some time, I'd >>>>>>> evaluate the option to have a single Hibernate Search Engine >>>>>>> registered in the CacheManager, and have it shared across indexed >>>>>>> caches. >>>>>>> >>>>>>> Current design limitations: >>>>>>> >>>>>>> A- If they are all configured to use the same base directory to >>>>>>> store indexes, and happen to have same-named indexes, they'll share >>>>>>> the index without being aware of each other. 
This is going to break >>>>>>> unless the user configures some tricky parameters, and even so >>>>>>> performance won't be great: instances will lock each other out, or at >>>>>>> best write in alternate turns. >>>>>>> B- The search engine isn't particularly "heavy", still it would be >>>>>>> nice to share some components and internal services. >>>>>>> C- Configuration details which need some care - like injecting a >>>>>>> JGroups channel for clustering - needs to be done right isolating each >>>>>>> instance (so large parts of configuration would be quite similar but >>>>>>> not totally equal) >>>>>>> D- Incoming messages into a JGroups Receiver need to be routed not >>>>>>> only among indexes, but also among Engine instances. This prevents >>>>>>> Query to reuse code from Hibernate Search. >>>>>>> >>>>>>> Problems with a unified Hibernate Search Engine: >>>>>>> >>>>>>> 1#- Isolation of types / indexes. If the same indexed class is >>>>>>> stored in different (indexed) caches, they'll share the same index. Is >>>>>>> it a problem? I'm tempted to consider this a good thing, but wonder if >>>>>>> it would surprise some users. Would you expect that? >>>>>> >>>>>> I would not expect that. Unicity in Hibernate Search is not defined per identity but per class + provided id. >>>>>> I can see people reusing the same class as partial DTO and willing to index that. I can even see people >>>>>> using the Hibernate Search programmatic API to index the "DTO" stored in cache 2 differently than the >>>>>> domain class stored in cache 1. >>>>>> I can concede that I am pushing a bit the use case towards bad-ish design approaches. >>>>>> >>>>>>> 2#- configuration format overhaul: indexing options won't be set on >>>>>>> the cache section but in the global section. I'm looking forward to >>>>>>> use the schema extensions anyway to provide a better configuration >>>>>>> experience than the current . 
>>>>>>> 3#- Assuming 1# is fine, when a search hit is found I'd need to be >>>>>>> able to figure out from which cache the value should be loaded. >>>>>>> 3#A we could have the cache name encoded in the index, as part >>>>>>> of the identifier: {PK,cacheName} >>>>>>> 3#B we actually shard the index, keeping a physically separate >>>>>>> index per cache. This would mean searching on the joint index view but >>>>>>> extracting hits from specific indexes to keep track of "which index".. >>>>>>> I think we can do that but it's definitely tricky. >>>>>>> >>>>>>> It's likely easier to keep indexed values from different caches in >>>>>>> different indexes. that would mean to reject #1 and mess with the user >>>>>>> defined index name, to add for example the cache name to the user >>>>>>> defined string. >>>>>>> >>>>>>> Any comment? >>>>>>> >>>>>>> Cheers, >>>>>>> Sanne >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at 
redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 17 12:36:39 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Mon, 17 Feb 2014 18:36:39 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <2C233AC3-BEFC-4FD5-A297-A854FEA8165D@hibernate.org> References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> <2C233AC3-BEFC-4FD5-A297-A854FEA8165D@hibernate.org> Message-ID: <2D1C63B2-7313-4FE4-93D2-D50B91565FF2@redhat.com> On 31 Jan 2014, at 09:28, Emmanuel Bernard wrote: > > >> On 30 janv. 2014, at 20:51, Mircea Markus wrote: >> >> >>> On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote: >>> >>> >>>> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >>>> >>>> >>>>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >>>>> >>>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches right? Otherwise I am not fully understanding why they ask for a unified query. >>>>> Have you written up detailed use cases somewhere for me to better understand what is really requested?
>>>> >>>> IMO from a user perspective, being able to run queries spanning several caches simplifies the programming model: each cache corresponding to a single entity type, with potentially different configuration. >>> >>> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. >> >> Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of querying with a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, it also discourages code reuse and makes it hard to maintain (if you add Pets to the same cache in the future you need to update the M/R code as well). And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expire, etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future. >> >> The way I see it - and very curious to see your opinion on this - following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spanning multiple caches are both useful and needed (just as queries span multiple tables). > > I know Sanne and you are keen to have one entity type per cache to be able to fine-tune the configuration. I am a little more skeptical but I don't have strong opinions on the subject.
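[Editor's note] Mircea's green-cars example can be made concrete. The sketch below uses a plain Map and a hand-rolled count in place of Infinispan's Map/Reduce API, purely to show the shape of the problem: with a heterogeneous cache, every task must enumerate the types it wants to ignore.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the maintenance cost of a heterogeneous cache. A plain Map
// stands in for the cache; the "mapper" must know about Person (and any
// Pet added later) just to skip it. With one cache per entity type, the
// instanceof checks disappear.
class GreenCarCount {
    static class Car {
        final String colour;
        Car(String colour) { this.colour = colour; }
    }

    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static int countGreen(Map<String, Object> mixedCache) {
        int green = 0;
        for (Object value : mixedCache.values()) {
            if (!(value instanceof Car)) {
                continue; // must explicitly ignore Persons, Pets, ...
            }
            if ("green".equals(((Car) value).colour)) {
                green++;
            }
        }
        return green;
    }

    static int demo() {
        Map<String, Object> cache = new HashMap<>();
        cache.put("car-1", new Car("green"));
        cache.put("car-2", new Car("red"));
        cache.put("person-1", new Person("Ann"));
        return countGreen(cache);
    }
}
```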
> > However, I don't think you can forbid the case where people want to store heterogeneous types in the same cache: > > - it's easy to start with > - configuration is indeed simpler > - when you work in the same service with cats, dogs, owners, addresses and refuges, juggling between these n Cache instances begins to be fugly I suspect - should write some application code to confirm > - people will add to the grid types unknown at configuration time. They might want a single bucket. +100 > > Btw with the distributed execution engine, it looks reasonably simple to migrate data from one cache to another. I imagine you can also focus only on the keys whose node is primary which should limit data transfers. Am I missing something? > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Mon Feb 17 12:43:44 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Mon, 17 Feb 2014 18:43:44 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140205163032.GB93108@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote: > On Wed 2014-02-05 15:53, Mircea Markus wrote: >> >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >> >>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >> >> Not sure what you mean by CRUD over multiple caches?
ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)
>
> //some unified query giving me entries pointing by fk copy to bar and
> //buz objects. So I need to manually load these references.
>
> //happy emmanuel
> Cache unifiedCache = cacheManager.getMotherOfAllCaches();
> Bar bar = unifiedCache.get(foo);
> Buz buz = unifiedCache.get(baz);
>
> //not so happy emmanuel
> Cache fooCache = cacheManager.getCache("foo");
> Bar bar = fooCache.get(foo);
> Cache bazCache = cacheManager.getCache("baz");
> Buz buz = bazCache.get(baz);

Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. Cheers, > > >> >>> >>> I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. >> >> sad because of the increased index size? > > It makes the index unnatural and less reusable via direct Lucene APIs. But that might be less of a concern for Infinispan. > >> >>> I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. >>> Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document?
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From mudokonman at gmail.com Mon Feb 17 12:44:32 2014 From: mudokonman at gmail.com (William Burns) Date: Mon, 17 Feb 2014 12:44:32 -0500 Subject: [infinispan-dev] ClusteredListeners: message delivered twice In-Reply-To: References: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Message-ID: On Mon, Feb 17, 2014 at 7:53 AM, Sanne Grinovero wrote: > On 12 February 2014 10:40, Mircea Markus wrote: >> Hey Will, >> >> With the current design, during a topology change, an event might be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state to see whether an event was redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of? I agree, this would be important to track. I have thus added a new flag to listener events that is set to true when a modification, removal, or creation is done on behalf of a command that was retried due to a topology change in the middle of it. This also benefits not just cluster listeners but regular listeners, since we can already get double notifications today. > > I really wish we would not push such a burden onto the API > consumer. If we at least had a modification counter associated with > each entry this could help to identify duplicate triggers as well (on > top of ordering of modification events as already discussed many > times).
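[Editor's note] The "potentiallyDuplicate"/retry flag Will describes could be consumed roughly as follows. The event type and flag name here are stand-ins for illustration, not the actual Infinispan listener API; the point is that a retry-aware handler can be made idempotent by consulting its own state.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an idempotent listener built on the proposed retry flag.
// Event and its commandRetried field are hypothetical stand-ins.
class RetryAwareListener {
    static class Event {
        final String key;
        final String value;
        final boolean commandRetried; // true if the command was retried after a topology change
        Event(String key, String value, boolean commandRetried) {
            this.key = key;
            this.value = value;
            this.commandRetried = commandRetried;
        }
    }

    private final Map<String, String> lastApplied = new HashMap<>();
    int applied; // counts state changes actually applied

    void onModified(Event e) {
        // On a retried command the same event may arrive twice; check our
        // own state before applying it again.
        if (e.commandRetried && e.value.equals(lastApplied.get(e.key))) {
            return; // already saw this modification
        }
        lastApplied.put(e.key, e.value);
        applied++;
    }

    static int demo() {
        RetryAwareListener l = new RetryAwareListener();
        l.onModified(new Event("k", "v1", false));
        l.onModified(new Event("k", "v1", true)); // duplicate after retry: ignored
        l.onModified(new Event("k", "v2", false));
        return l.applied;
    }
}
```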
The particular issue we have with listeners is when the primary owner replicates the update to backup owners and then crashes before the notification is sent. In this case we have no idea from the originator's perspective if the backup owner has the update. When the topology changes, if the entry was updated it will be persisted to the new owners (possibly without notification). We could add a counter, however the backup owner then has no idea if the primary owner has sent the notification or not. Without adding some kind of 2PC to the primary owner to tell the backup that it occurred, it won't know. However this doesn't reliably tell the backup owner if the notification was fired even if the node goes down during this period. Without seriously rewriting our nontx dist code I don't see a viable way to do this without the API consumer having to be alerted. > > Sanne > >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From emmanuel at hibernate.org Mon Feb 17 12:51:15 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 17 Feb 2014 18:51:15 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> Message-ID: <20140217175115.GC639@hibernate.org> On Mon 2014-02-17 18:43, Galder Zamarreño wrote: > > On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote: > > > On Wed 2014-02-05 15:53, Mircea Markus wrote: > >> > >> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: > >> > >>> Sure searching for any cache is
useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. > >> > >> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) > > > > > > //some unified query giving me entries pointing by fk copy to bar and > > //buz objects. So I need to manually load these references. > > > > //happy emmanuel > > Cache unifiedCache = cacheManager.getMotherOfAllCaches(); > > Bar bar = unifiedCache.get(foo); > > Buz buz = unifiedCache.get(baz); > > > > //not so happy emmanuel > > Cache fooCache = cacheManager.getCache("foo"); > > Bar bar = fooCache.get(foo); > > Cache bazCache = cacheManager.getCache("baz"); > > Buz buz = bazCache.get(baz); > > Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. Not really. What makes me unhappy is having to keep in my app all the references to these specific cache instances. The filtering approach only moves the problem.
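[Editor's note] The "mother of all caches" Emmanuel sketches could be faked today with a thin facade that routes by type; this does not remove the per-type caches, it only hides the juggling. A sketch with plain Maps standing in for Cache instances (`getMotherOfAllCaches()` is hypothetical, and nothing below is Infinispan API):

```java
import java.util.HashMap;
import java.util.Map;

// A facade that routes get/put to a per-type backing map, so application
// code stops carrying references to individual Cache instances.
class UnifiedCache {
    private final Map<Class<?>, Map<Object, Object>> cachesByType = new HashMap<>();

    private Map<Object, Object> cacheFor(Class<?> type) {
        return cachesByType.computeIfAbsent(type, t -> new HashMap<>());
    }

    <T> void put(Class<T> type, Object key, T value) {
        cacheFor(type).put(key, value);
    }

    <T> T get(Class<T> type, Object key) {
        return type.cast(cacheFor(type).get(key));
    }

    static String demo() {
        UnifiedCache u = new UnifiedCache();
        u.put(String.class, "greeting", "hello");
        u.put(Integer.class, "count", 42);
        return u.get(String.class, "greeting");
    }
}
```

As Emmanuel notes, a facade like this only moves the problem: the per-type stores still exist underneath, and the routing key (here the Class object) still has to come from somewhere.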
From ben.cotton at ALUMNI.RUTGERS.EDU Mon Feb 17 16:02:17 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Mon, 17 Feb 2014 13:02:17 -0800 (PST) Subject: [infinispan-dev] Infinispan embedded off-heap cache In-Reply-To: <1391992923651-4028800.post@n3.nabble.com> References: <52D67480.9020908@redhat.com> <52D678BD.5010400@redhat.com> <52D67A1E.2010509@redhat.com> <1389808266330-4028649.post@n3.nabble.com> <1389893871449-4028653.post@n3.nabble.com> <1391715983011-4028794.post@n3.nabble.com> <3560BCBE-3D25-472B-9D8B-C16230FBCB4B@redhat.com> <1391992923651-4028800.post@n3.nabble.com> Message-ID: <1392670937953-4028836.post@n3.nabble.com> Hi Tristan, We are still waiting for an OpenHFT HugeCollections update before we start keystroking its adaptation as an Off-Heap Impl of javax.cache.Cache (via ISPN DataContainer API bridge). We envision our openHFT<-->ISPN adaptation effort to look something like the attached slide. Question for you: is there anywhere w/in the ISPN 7 master tree a class that simultaneously implements both javax.cache.Cache /and/ org.infinispan.container.Container? Thx, Ben -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-Infinispan-embedded-off-heap-cache-tp4026102p4028836.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. From emmanuel at hibernate.org Mon Feb 17 17:13:56 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Mon, 17 Feb 2014 23:13:56 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140217175115.GC639@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> Message-ID: <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity.
It turns out that the right (as in easy) solution does involve a higher-level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. > On 17 févr. 2014, at 18:51, Emmanuel Bernard wrote: > >> On Mon 2014-02-17 18:43, Galder Zamarreño wrote: >> >>> On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote: >>> >>>> On Wed 2014-02-05 15:53, Mircea Markus wrote: >>>> >>>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote: >>>>> >>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >>>> >>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>> >>> >>> //some unified query giving me entries pointing by fk copy to bar and >>> //buz objects. So I need to manually load these references. >>> >>> //happy emmanuel >>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>> Bar bar = unifiedCache.get(foo); >>> Buz buz = unifiedCache.get(baz); >>> >>> //not so happy emmanuel >>> Cache fooCache = cacheManager.getCache("foo"); >>> Bar bar = fooCache.get(foo); >>> Cache bazCache = cacheManager.getCache("baz"); >>> Buz buz = bazCache.get(baz); >> >> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. > > Not really. > What makes me unhappy is to have to keep in my app all the > references to these specific cache store instances. The filtering > approach only moves the problem.
> _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From rvansa at redhat.com Tue Feb 18 03:59:37 2014 From: rvansa at redhat.com (Radim Vansa) Date: Tue, 18 Feb 2014 09:59:37 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> Message-ID: <530320F9.300@redhat.com> Hi Etienne, how does the requirement for all data provided to the Reducer as a whole work for distributed caches? There you'd get only a subset of the whole mapped set on each node (afaik each node maps the entries locally and performs a reduction before executing the "global" reduction). Or are these M/R jobs applicable only to local caches? I have to admit I have only a limited knowledge of M/R, could you give me an example where the algorithm works in a distributed environment and still cannot be parallelized? Thanks Radim On 02/17/2014 09:18 AM, Etienne Riviere wrote: > Hi Radim, > > I might misunderstand your suggestion but many M/R jobs actually require to run the two phases one after the other, and henceforth to store the intermediate results somewhere. While some may slightly reduce intermediate memory usage by using a combiner function (e.g., the word-count example), I don't see how we can avoid intermediate storage altogether. > > Thanks, > Etienne (leads project - as Evangelos who initiated the thread) > > On 17 Feb 2014, at 08:48, Radim Vansa wrote: > >> I think that the intermediate cache is not required at all. The M/R >> algorithm itself can (and should!) run with memory occupied by the >> result of reduction. The current implementation with Map first and >> Reduce after that will always have these problems, using a cache for >> temporary caching the result is only a workaround.
>> >> The only situation when temporary cache could be useful is when the >> result grows linearly (or close to that or even more) with the amount of >> reduced entries. This would be the case for groupBy producing Map<Key, List<Value>> from all entries in cache. Then the task does not scale and >> should be redesigned anyway, but flushing the results into cache backed >> by cache store could help. >> >> Radim >> >> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>> Tristan, >>> >>> Actually they are not addressed in this pull request but the feature >>> where custom output cache is used instead of results being returned is >>> next in the implementation pipeline. >>> >>> Evangelos, indeed, depending on a reducer function all intermediate >>> KOut/VOut pairs might be moved to a single node. How would custom cache >>> help in this case? >>> >>> Regards, >>> Vladimir >>> >>> >>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>> Hi Evangelos, >>>> >>>> you might be interested in looking into a current pull request which >>>> addresses some (all?) of these issues >>>> >>>> https://github.com/infinispan/infinispan/pull/2300 >>>> >>>> Tristan >>>> >>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>> Hello everyone, >>>>> >>>>> I started using the MapReduce implementation of Infinispan and I came >>>>> across some possible limitations. Thus, I want to make some suggestions >>>>> about the MapReduce (MR) implementation of Infinispan. >>>>> Depending on the algorithm, there might be some memory problems, >>>>> especially for intermediate results. >>>>> An example of such a case is group by. Suppose that we have a cluster >>>>> of 2 nodes with 2 GB available. Let a distributed cache, where simple >>>>> car objects (id,brand,colour) are stored and the total size of data is >>>>> 3.5GB. If all objects have the same colour, then all 3.5 GB would go to >>>>> only one reducer, as a result an OutOfMemoryException will be thrown.
>>>>> >>>>> To overcome these limitations, I propose to add as parameter the name of >>>>> the intermediate cache to be used. This will enable the creation of a >>>>> custom configured cache that deals with the memory limitations. >>>>> >>>>> Another feature that I would like to have is to set the name of the >>>>> output cache. The reasoning behind this is similar to the one mentioned >>>>> above. >>>>> >>>>> I wait for your thoughts on these two suggestions. >>>>> >>>>> Regards, >>>>> Evangelos >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From dan.berindei at gmail.com Tue Feb 18 04:59:34 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 11:59:34 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <530320F9.300@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: Radim, this is how our M/R algorithm works (Hadoop may do it differently):

* The mapping phase generates a Map<IntKey, Collection<IntValue>> on each node (Int meaning intermediate).
* In the combine (local reduce) phase, a combine operation takes as input an IntKey and a Collection<IntValue> with only the values that were produced on that node.
* In the (global) reduce phase, all the intermediate values for each key are merged, and a reduce operation takes an intermediate key and a sequence of *all* the intermediate values generated for that key. These reduce operations are completely independent, so each intermediate key can be mapped to a different node (distributed reduce), while still having access to all the intermediate values at once.
* In the end, the collator takes the Map from the reduce phase and produces a single value.

If a combiner can be used, then I believe it can also be run in parallel with a LinkedBlockingQueue between the mapper and the combiner. But sometimes the reduce algorithm can only be run on the entire collection of values (e.g. if you want to find the median, or a percentile). The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself. Cheers Dan On Tue, Feb 18, 2014 at 10:59 AM, Radim Vansa wrote: > Hi Etienne, > > how does the requirement for all data provided to Reducer as a whole > work for distributed caches?
There you'd get only a subset of the whole > mapped set on each node (afaik each node maps the entries locally and > performs a reduction before executing the "global" reduction). Or are > these M/R jobs applicable only to local caches? > I have to admit I have only a limited knowledge of M/R, could you give > me an example where the algorithm works in a distributed environment and > still cannot be parallelized? > > Thanks > > Radim > > On 02/17/2014 09:18 AM, Etienne Riviere wrote: > > Hi Radim, > > > > I might misunderstand your suggestion but many M/R jobs actually require > to run the two phases one after the other, and henceforth to store the > intermediate results somewhere. While some may slightly reduce intermediate > memory usage by using a combiner function (e.g., the word-count example), I > don't see how we can avoid intermediate storage altogether. > > > > Thanks, > > Etienne (leads project -- as Evangelos who initiated the thread) > > > > On 17 Feb 2014, at 08:48, Radim Vansa wrote: > > > >> I think that the intermediate cache is not required at all. The M/R > >> algorithm itself can (and should!) run with memory occupied by the > >> result of reduction. The current implementation with Map first and > >> Reduce after that will always have these problems, using a cache for > >> temporary caching the result is only a workaround.
> >> > >> Radim > >> > >> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: > >>> Tristan, > >>> > >>> Actually they are not addressed in this pull request but the feature > >>> where custom output cache is used instead of results being returned is > >>> next in the implementation pipeline. > >>> > >>> Evangelos, indeed, depending on a reducer function all intermediate > >>> KOut/VOut pairs might be moved to a single node. How would custom cache > >>> help in this case? > >>> > >>> Regards, > >>> Vladimir > >>> > >>> > >>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: > >>>> Hi Evangelos, > >>>> > >>>> you might be interested in looking into a current pull request which > >>>> addresses some (all?) of these issues > >>>> > >>>> https://github.com/infinispan/infinispan/pull/2300 > >>>> > >>>> Tristan > >>>> > >>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: > >>>>> Hello everyone, > >>>>> > >>>>> I started using the MapReduce implementation of Infinispan and I came > >>>>> across some possible limitations. Thus, I want to make some > suggestions > >>>>> about the MapReduce (MR) implementation of Infinispan. > >>>>> Depending on the algorithm, there might be some memory problems, > >>>>> especially for intermediate results. > >>>>> An example of such a case is group by. Suppose that we have a > cluster > >>>>> of 2 nodes with 2 GB available. Let a distributed cache, where > simple > >>>>> car objects (id,brand,colour) are stored and the total size of data > is > >>>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would > go to > >>>>> only one reducer, as a result an OutOfMemoryException will be thrown. > >>>>> > >>>>> To overcome these limitations, I propose to add as parameter the > name of > >>>>> the intermediate cache to be used. This will enable the creation of a > >>>>> custom configured cache that deals with the memory limitations. > >>>>> > >>>>> Another feature that I would like to have is to set the name of the > >>>>> output cache. 
The reasoning behind this is similar to the one > mentioned > >>>>> above. > >>>>> > >>>>> I wait for your thoughts on these two suggestions. > >>>>> > >>>>> Regards, > >>>>> Evangelos > >>>>> _______________________________________________ > >>>>> infinispan-dev mailing list > >>>>> infinispan-dev at lists.jboss.org > >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>> > >>>>> > >>>> > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> infinispan-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> -- > >> Radim Vansa > >> JBoss DataGrid QA > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140218/5732287e/attachment-0001.html From marcelo.pasin at unine.ch Tue Feb 18 05:19:55 2014 From: marcelo.pasin at unine.ch (Marcelo Pasin) Date: Tue, 18 Feb 2014 11:19:55 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: On 18/Feb/2014, at 10:59, Dan Berindei wrote:

> I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself.

Actually, Hadoop sorts in the map node, the last two steps being sort and combine. Reduce nodes fetch partitions from the map nodes and just merge them. The partitions are fetched incrementally, and whenever a given key is complete in all partially fetched partitions, reduce() is called.

Cheers, MP

--
Marcelo Pasin
Université de Neuchâtel · Institut d'informatique
rue Emile-Argand 11 · Case postale 158 · 2000 Neuchâtel · Switzerland

From vagvaz at gmail.com Tue Feb 18 05:21:30 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 12:21:30 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: <5303342A.1050800@gmail.com>

Hi Radim,

Since Hadoop is the most popular implementation of MapReduce I will give a brief overview of how it works and then I'll provide an example where the reducers must run over the whole list of values with the same key.

Hadoop MR overview.

MAP
1) Input file(s) are split into pieces of 64MB
2) For each split hadoop creates one map task and then assigns the task to a cluster node
3) The splits are read as key,value pairs and the map function of the Mapper is called. The mapper can output an arbitrary number of intermediate key,value pairs
4) the output from the mapper is stored in a buffer in memory.
After a certain threshold is reached the pairs are sorted by key and if there is a combiner it is run on the pairs that have the same key. Then, the output is flushed to the node's local disk (map output is not written to HDFS).

SHUFFLE
hadoop decides the Reducer that should process each key by running a partitioner. The default partitioner decides in the following way:
reducer = intermediateKey.hashCode() % numberOfReducers
Finally, the intermediate key,value pairs are sent to the reducers.

REDUCE
1) Reducer sorts all key,value pairs by key and then groups the values with the same key. As a result reducers receive their keys sorted.
2) for each (Key, List<Value>) pair the reduce function of the reducer is called. The Reducer can also emit an arbitrary number of key,value pairs

Additionally, hadoop lets you customize almost every aspect of the code run, from how the input is split and read as key,value pairs to how it is partitioned and sorted.

A simple example is group by and computing an average over the grouped values. Let the dataset be webpages (url,domain,sentiment) and suppose we want to compute the average sentiment for each domain in the dataset. Then for each webpage wp the mapper will run:

map(wp.url, wp):
    emit(wp.domain, wp.sentiment)

and in the reducer:

reduce(domain, Iterable values):
    counter = 0
    sum = 0
    while (values.hasNext())
        counter++
        sum += values.next()
    emit(domain, sum/counter)

I know that this approach is not optimized. But, I wanted to give a simple example.

Dan, do only the values for one intermediate key have to be in memory? Or must all the intermediate key,value pairs that are assigned to one reducer be in memory?

Cheers,
Evangelos

On 02/18/2014 11:59 AM, Dan Berindei wrote:
> Radim, this is how our M/R algorithm works (Hadoop may do it differently):
>
> * The mapping phase generates a Map<IntKey, List<IntValue>> on each node (Int meaning intermediate).
> * In the combine (local reduce) phase, a combine operation takes as input an IntKey and a Collection<IntValue> with only the values that were produced on that node.
> * In the (global) reduce phase, all the intermediate values for each key are merged, and a reduce operation takes an intermediate key and a sequence of *all* the intermediate values generated for that key. These reduce operations are completely independent, so each intermediate key can be mapped to a different node (distributed reduce), while still having access to all the intermediate values at once.
> * In the end, the collator takes the Map from the reduce phase and produces a single value.
>
> If a combiner can be used, then I believe it can also be run in parallel with a LinkedBlockingQueue between the mapper and the combiner. But sometimes the reduce algorithm can only be run on the entire collection of values (e.g. if you want to find the median, or a percentile).
>
> The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself.
>
> Cheers
> Dan
>
> On Tue, Feb 18, 2014 at 10:59 AM, Radim Vansa wrote:
>
>> Hi Etienne,
>>
>> how does the requirement for all data provided to the Reducer as a whole work for distributed caches? There you'd get only a subset of the whole mapped set on each node (afaik each node maps its entries locally and performs a reduction before executing the "global" reduction). Or are these M/R jobs applicable only to local caches?
>> I have to admit I have only a limited knowledge of M/R, could you give me an example where the algorithm works in a distributed environment and still cannot be parallelized?
>> >> Thanks >> >> Radim >> >> On 02/17/2014 09:18 AM, Etienne Riviere wrote: >>> Hi Radim, >>> >>> I might misunderstand your suggestion but many M/R jobs actually require >> to run the two phases one after the other, and henceforth to store the >> intermediate results somewhere. While some may slightly reduce intermediate >> memory usage by using a combiner function (e.g., the word-count example), I >> don't see how we can avoid intermediate storage altogether. >>> >>> Thanks, >>> Etienne (leads project -- as Evangelos who initiated the thread) >>> >>> On 17 Feb 2014, at 08:48, Radim Vansa wrote: >>> >>>> I think that the intermediate cache is not required at all. The M/R >>>> algorithm itself can (and should!) run with memory occupied by the >>>> result of reduction. The current implementation with Map first and >>>> Reduce after that will always have these problems, using a cache for >>>> temporary caching the result is only a workaround. >>>> >>>> The only situation when temporary cache could be useful is when the >>>> result grows linearly (or close to that or even more) with the amount of >>>> reduced entries. This would be the case for groupBy producing Map>>> List> from all entries in cache. Then the task does not scale and >>>> should be redesigned anyway, but flushing the results into cache backed >>>> by cache store could help. >>>> >>>> Radim >>>> >>>> On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote: >>>>> Tristan, >>>>> >>>>> Actually they are not addressed in this pull request but the feature >>>>> where custom output cache is used instead of results being returned is >>>>> next in the implementation pipeline. >>>>> >>>>> Evangelos, indeed, depending on a reducer function all intermediate >>>>> KOut/VOut pairs might be moved to a single node. How would custom cache >>>>> help in this case? 
>>>>> >>>>> Regards, >>>>> Vladimir >>>>> >>>>> >>>>> On 2/14/2014, 10:16 AM, Tristan Tarrant wrote: >>>>>> Hi Evangelos, >>>>>> >>>>>> you might be interested in looking into a current pull request which >>>>>> addresses some (all?) of these issues >>>>>> >>>>>> https://github.com/infinispan/infinispan/pull/2300 >>>>>> >>>>>> Tristan >>>>>> >>>>>> On 14/02/2014 16:10, Evangelos Vazaios wrote: >>>>>>> Hello everyone, >>>>>>> >>>>>>> I started using the MapReduce implementation of Infinispan and I came >>>>>>> across some possible limitations. Thus, I want to make some >> suggestions >>>>>>> about the MapReduce (MR) implementation of Infinispan. >>>>>>> Depending on the algorithm, there might be some memory problems, >>>>>>> especially for intermediate results. >>>>>>> An example of such a case is group by. Suppose that we have a >> cluster >>>>>>> of 2 nodes with 2 GB available. Let a distributed cache, where >> simple >>>>>>> car objects (id,brand,colour) are stored and the total size of data >> is >>>>>>> 3.5GB. If all objects have the same colour , then all 3.5 GB would >> go to >>>>>>> only one reducer, as a result an OutOfMemoryException will be thrown. >>>>>>> >>>>>>> To overcome these limitations, I propose to add as parameter the >> name of >>>>>>> the intermediate cache to be used. This will enable the creation of a >>>>>>> custom configured cache that deals with the memory limitations. >>>>>>> >>>>>>> Another feature that I would like to have is to set the name of the >>>>>>> output cache. The reasoning behind this is similar to the one >> mentioned >>>>>>> above. >>>>>>> >>>>>>> I wait for your thoughts on these two suggestions. 
>>>>>>> >>>>>>> Regards, >>>>>>> Evangelos >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> -- >>>> Radim Vansa >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Tue Feb 18 06:40:49 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 13:40:49 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <5303342A.1050800@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios wrote:

> Hi Radim,
>
> Since Hadoop is the most popular implementation of MapReduce I will give a brief overview of how it works and then I'll provide an example where the reducers must run over the whole list of values with the same key.
>
> Hadoop MR overview.
>
> MAP
> 1) Input file(s) are split into pieces of 64MB
> 2) For each split hadoop creates one map task and then assigns the task to a cluster node
> 3) The splits are read as key,value pairs and the map function of the Mapper is called. The mapper can output an arbitrary number of intermediate key,value pairs
> 4) the output from the mapper is stored in a buffer in memory. After a certain threshold is reached the pairs are sorted by key and if there is a combiner it is run on the pairs that have the same key. Then, the output is flushed to the local disk.

Ok, so Hadoop runs the combiner more or less concurrently with the mappers.

I'm curious if there are any M/R tasks that benefit from sorting the keys here; we just put the intermediate values in a Map<IntKey, List<IntValue>>. We could do about the same by passing this map (or rather each entry in the map) to the combiner when it reaches a certain threshold, but I'm not convinced about the need to sort it.

> SHUFFLE
>
> hadoop decides the Reducer that should process each key by running a partitioner. The default partitioner decides in the following way:
> reducer = intermediateKey.hashCode() % numberOfReducers
> Finally, the intermediate key,value pairs are sent to the reducers

Is this algorithm set in stone, in that some M/R tasks rely on it?
In our impl, the user could use grouping to direct a set of intermediate keys to the same node for reducing, but otherwise the reducing node is more or less random. > REDUCE > > 1) Reducer sorts all key,value pairs by key and then groups the values > with the same key. As a result reducers receive their keys sorted. > I guess this sorting is only relevant if the reduce phase happens on a single thread, on a single node? If the reduce happens in parallel, the ordering is going to be lost anyway. > 2) for each Key,List the reduce function of the reducer is > called. Reducer can also emit arbitrary number of key,value pairs > We limit the reducer (and the combiner) to emit a single value, which is paired with the input key. We may need to lift this restriction, if only to make porting/adapting tasks easier. > > Additionally, hadoop lets you customize almost every aspect of the code > run from how the input is split and read as key value pairs to how it is > partitioned and sorted. > Does that mean you can sort the values as well? I was thinking of each reduce() call as independent, and then only the order of values for one intermediate key would be relevant. I guess some tasks may require keeping state across all the reduce() calls and then the order of key matters, but then the reduce phase can't be parallelized, either across the cluster or on a single node. > A simple example is group by and computing an average over the grouped > values. Let the dataset be webpages (url,domain,sentiment) and we want > to compute the average sentiment for each domain in the dataset then the > mapper for each webpages wp. will run > map(wp.url,wp): > emit(wp.domain,wp.sentiment) > > and in reducer: > reduce(domain,Iterable values): > counter = 0 > sum = 0 > while(values.hasNext()) > counter++; > sum += values.next() > emit(domain,sum/counter) > > I know that this approach is not optimized. But, I wanted give a simple > example. 
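The quoted group-by-average pseudocode above can be turned into a runnable sketch. Below is a plain-Java simulation of the map, shuffle-by-key, and reduce steps for the domain-sentiment average; ordinary collections stand in for the framework's machinery, and the class and method names are invented for illustration (this is neither the Hadoop nor the Infinispan API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupByAverage {

    // Mirrors the (url, domain, sentiment) webpage tuples from the example.
    static final class Page {
        final String url;
        final String domain;
        final double sentiment;

        Page(String url, String domain, double sentiment) {
            this.url = url;
            this.domain = domain;
            this.sentiment = sentiment;
        }
    }

    // Map + shuffle: emit (domain, sentiment) for each page and group the
    // emitted pairs by key, as the shuffle phase would before the reducers run.
    static Map<String, List<Double>> mapAndShuffle(List<Page> pages) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (Page wp : pages) {
            grouped.computeIfAbsent(wp.domain, k -> new ArrayList<>()).add(wp.sentiment);
        }
        return grouped;
    }

    // Reduce: one call per key, iterating the complete value list, i.e. the
    // step that needs all values for a single key in memory at once.
    static Map<String, Double> reduce(Map<String, List<Double>> grouped) {
        Map<String, Double> averages = new HashMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0;
            int counter = 0;
            for (double v : e.getValue()) {
                counter++;
                sum += v;
            }
            averages.put(e.getKey(), sum / counter);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
                new Page("http://a.com/1", "a.com", 0.2),
                new Page("http://a.com/2", "a.com", 0.6),
                new Page("http://b.org/1", "b.org", 1.0));
        System.out.println(reduce(mapAndShuffle(pages)));
    }
}
```

The reduce method is exactly the step under discussion: it iterates the whole value list for each key, so all values for one intermediate key must fit in memory.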
> I think it can also be optimized to use a combiner, if we emit a (domain, counter, sum) tuple :) > Dan, only the the values for one intermediate key must be in memory? or > all the intermediate key,value pairs that are assigned to one reducer > must be in memory? > With the default configuration, all the key/value pairs assigned to one reducer must be in memory. But one can define the __tmpMapReduce cache in the configuration and configure eviction with a cache store (note that because of how our eviction works, the actual container size is at least concurrencyLevel rounded up to the next power of 2). The problem is that there is only one configuration for all the M/R tasks [1]. Note that because we only run the combiner after the mapping phase is complete, we do need to keep in memory all the results of the mapping phase from that node (those are not stored in a cache). I've created an issue in JIRA for this [2]. Cheers Dan [1] https://issues.jboss.org/browse/ISPN-4021 [2] https://issues.jboss.org/browse/ISPN-4022 > Cheers, > Evangelos > > On 02/18/2014 11:59 AM, Dan Berindei wrote: > > Radim, this is how our M/R algorithm works (Hadoop may do it > differently): > > > > * The mapping phase generates a Map> on each > > node (Int meaning intermediate). > > * In the combine (local reduce) phase, a combine operation takes as input > > an IntKey and a Collection with only the values that were > > produced on that node. > > * In the (global) reduce phase, all the intermediate values for each key > > are merged, and a reduce operation takes an intermediate key and a > sequence > > of *all* the intermediate values generated for that key. These reduce > > operations are completely independent, so each intermediate key can be > > mapped to a different node (distributed reduce), while still having > access > > to all the intermediate values at once. > > * In the end, the collator takes the Map from the > reduce > > phase and produces a single value. 
> >
> > If a combiner can be used, then I believe it can also be run in parallel with a LinkedBlockingQueue between the mapper and the combiner. But sometimes the reduce algorithm can only be run on the entire collection of values (e.g. if you want to find the median, or a percentile).
> >
> > The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without loading the values in memory itself.
> >
> > Cheers
> > Dan

From anistor at redhat.com Tue Feb 18 07:02:03 2014 From: anistor at redhat.com (Adrian Nistor) Date: Tue, 18 Feb 2014 14:02:03 +0200 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> Message-ID: <53034BBB.1030809@redhat.com>

Well, OGM and Infinispan are different species :) So, Infinispan being what it is today - a non-homogeneous, schema-less KV store, without support for entity associations (except embedding) - which simplifies the whole thing a lot, should we or should we not provide transparent cross-cacheManager search capabilities, in this exact context? Vote?
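The question above ("should we provide transparent cross-cacheManager search?") can be made concrete with a toy model. In the sketch below, plain maps stand in for caches, every name is invented, and nothing here is Infinispan API; each hit carries the originating cache name and key next to the matched value, over a heterogeneous set of stored types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class CrossCacheSearch {

    // A search hit that remembers which "cache" it came from and the key
    // under which the matching value was stored.
    static final class Hit {
        final String cacheName;
        final Object key;
        final Object value;

        Hit(String cacheName, Object key, Object value) {
            this.cacheName = cacheName;
            this.key = key;
            this.value = value;
        }
    }

    // Scan every cache in the (name -> cache) map and collect the entries
    // whose value matches the predicate, regardless of the value's type.
    static List<Hit> search(Map<String, Map<Object, Object>> caches,
                            Predicate<Object> matches) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Map<Object, Object>> cache : caches.entrySet()) {
            for (Map.Entry<Object, Object> e : cache.getValue().entrySet()) {
                if (matches.test(e.getValue())) {
                    hits.add(new Hit(cache.getKey(), e.getKey(), e.getValue()));
                }
            }
        }
        return hits;
    }
}
```

A real implementation would query indexes rather than scan, but whether hits should carry the cache name (or a reference to the cache itself) is exactly the kind of result-API decision such a feature would force.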
There were some points raised previously like /"if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well"/. In the SQL world you would also probably CRUD against a table or set of tables and then query against a view - a bit like what we're doing here. I don't see any problem with this in principle. There is however something currently missing in the query result set API - it currently does not provide you the keys of the matching entities. People work around this by storing the key in the entity. Now with the addition of the cross-cacheManager search we'll probably need to fix the result API and also provide a reference to the cache (or just the name?) where the entity is stored.

The (enforced) one entity type per cache rule is not conceptually or technically required for implementing this, so I won't start raving against it :) Sane users should apply it however.

On 02/18/2014 12:13 AM, Emmanuel Bernard wrote:
> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>
>> On 17 févr. 2014, at 18:51, Emmanuel Bernard wrote:
>>
>>> On Mon 2014-02-17 18:43, Galder Zamarreño wrote:
>>>
>>>> On 05 Feb 2014, at 17:30, Emmanuel Bernard wrote:
>>>>
>>>>> On Wed 2014-02-05 15:53, Mircea Markus wrote:
>>>>>
>>>>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard wrote:
>>>>>>
>>>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed.
>>>>> Not sure what you mean by CRUD over multiple caches?
ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)

>>>> //some unified query giving me entries pointing by fk copy to bar and
>>>> //buz objects. So I need to manually load these references.
>>>>
>>>> //happy emmanuel
>>>> Cache unifiedCache = cacheManager.getMotherOfAllCaches();
>>>> Bar bar = unifiedCache.get(foo);
>>>> Buz buz = unifiedCache.get(baz);
>>>>
>>>> //not so happy emmanuel
>>>> Cache fooCache = cacheManager.getCache("foo");
>>>> Bar bar = fooCache.get(foo);
>>>> Cache bazCache = cacheManager.getCache("baz");
>>>> Buz buz = bazCache.get(baz);

>>> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not.

>> Not really.
>> What makes me unhappy is to have to keep in my app all the references to these specific cache store instances. The filtering approach only moves the problem.
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

From vagvaz at gmail.com Tue Feb 18 07:17:34 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 14:17:34 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions.
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> Message-ID: <53034F5E.6060706@gmail.com> On 02/18/2014 01:40 PM, Dan Berindei wrote: > On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios wrote: > >> Hi Radim, >> >> Since Hadoop is the most popular implementation of MapReduce I will give >> a brief overview of how it works and then I'll provide with an example >> where the reducers must run over the whole list of values with the same >> key. >> >> Hadoop MR overview. >> >> MAP >> >> 1) Input file(s) are split into pieces of 64MB >> 2) For each split hadoop creates one map task and then assign the task >> to a cluster node >> 3) The splits are read as key,value pairs and the map function of Mapper >> is called. The mapper can output arbitrary number of intermediate >> key,value pairs >> 4) the output from the mapper is stored in a buffer in memory. After a >> certain threshold is reached the pairs are sorted by key and if there is >> a combiner it is run on the pairs that have the same key. Then, the >> output is flushed on the HDFS. >> > > Ok, so Hadoop runs the combiner more or less concurrently with the mappers. > > I'm curious if there are any M/R tasks that benefit from the sorting the > keys here, we just put the intermediate values in a Map>. We > could do about the same by passing this map (or rather each entry in the > map) to the combiner when it reaches a certain threshold, but I'm not > convinced about the need to sort it. > Well there are algorithms that make use of it. Implementing a graph algorithm can take use of it.Where the graph is split into k partitions and each partition is assigned to one Mapper and Reducer. Mappers compute the outgoing messages and output them to reducers. Then, reducers can read the partition file sequentially to update the vertices. 
This is just one use case that came to my mind.

>
>> SHUFFLE
>>
>> hadoop decides the Reducer that should process each key by running a partitioner. The default partitioner decides in the following way:
>> reducer = intermediateKey.hashCode() % numberOfReducers
>> Finally, the intermediate key,value pairs are sent to the reducers
>
> Is this algorithm set in stone, in that some M/R tasks rely on it? In our impl, the user could use grouping to direct a set of intermediate keys to the same node for reducing, but otherwise the reducing node is more or less random.

The default partitioner does exactly that; check the actual code for Hadoop 1.2.1 here: http://goo.gl/he9yHO

>
>> REDUCE
>>
>> 1) Reducer sorts all key,value pairs by key and then groups the values with the same key. As a result reducers receive their keys sorted.
>
> I guess this sorting is only relevant if the reduce phase happens on a single thread, on a single node? If the reduce happens in parallel, the ordering is going to be lost anyway.

Each reduce task is run on a single thread, but you can run more than one reduce task on a given node. The key ordering will not be lost. The values are not ordered in any way. Moreover, the call to the reducer is reduce(Key key, Iterable values). I cannot think of a way that the order is lost.

>
>> 2) for each (Key, List<Value>) pair the reduce function of the reducer is called. The Reducer can also emit an arbitrary number of key,value pairs
>
> We limit the reducer (and the combiner) to emit a single value, which is paired with the input key. We may need to lift this restriction, if only to make porting/adapting tasks easier.
>
>> Additionally, hadoop lets you customize almost every aspect of the code run, from how the input is split and read as key,value pairs to how it is partitioned and sorted.
>
> Does that mean you can sort the values as well?
I was thinking of each > reduce() call as independent, and then only the order of values for one > intermediate key would be relevant. I guess some tasks may require keeping > state across all the reduce() calls and then the order of key matters, but > then the reduce phase can't be parallelized, either across the cluster or > on a single node. I was not very clear here. You can set the partitioner for a specific job. You may also set the key comparator, as a result change the way that intermediate keys are sorted. Additionally, one can change how keys are grouped into one reduce call by setting the GroupComparator class. A simple example would be to have sales(date,amount) and you want to create totals for each month of the year. so for the key: (year,month) and value: amount. by overriding the keyClass hashCode function you can send all the intermediate pairs with the same year to the same reducer and then you can set the groupComparator to group together all the values with the same year. Cheers, Evangelos > >> A simple example is group by and computing an average over the grouped >> values. Let the dataset be webpages (url,domain,sentiment) and we want >> to compute the average sentiment for each domain in the dataset then the >> mapper for each webpages wp. will run >> map(wp.url,wp): >> emit(wp.domain,wp.sentiment) >> >> and in reducer: >> reduce(domain,Iterable values): >> counter = 0 >> sum = 0 >> while(values.hasNext()) >> counter++; >> sum += values.next() >> emit(domain,sum/counter) >> >> I know that this approach is not optimized. But, I wanted give a simple >> example. >> > > I think it can also be optimized to use a combiner, if we emit a (domain, > counter, sum) tuple :) > > > >> Dan, only the the values for one intermediate key must be in memory? or >> all the intermediate key,value pairs that are assigned to one reducer >> must be in memory? >> > > With the default configuration, all the key/value pairs assigned to one > reducer must be in memory. 
But one can define the __tmpMapReduce cache in > the configuration and configure eviction with a cache store (note that > because of how our eviction works, the actual container size is at least > concurrencyLevel rounded up to the next power of 2). The problem is that > there is only one configuration for all the M/R tasks [1]. > > Note that because we only run the combiner after the mapping phase is > complete, we do need to keep in memory all the results of the mapping phase > from that node (those are not stored in a cache). I've created an issue in > JIRA for this [2]. > > Cheers > Dan > > [1] https://issues.jboss.org/browse/ISPN-4021 > [2] https://issues.jboss.org/browse/ISPN-4022 > > > >> Cheers, >> Evangelos >> >> On 02/18/2014 11:59 AM, Dan Berindei wrote: >>> Radim, this is how our M/R algorithm works (Hadoop may do it >> differently): >>> >>> * The mapping phase generates a Map> on each >>> node (Int meaning intermediate). >>> * In the combine (local reduce) phase, a combine operation takes as input >>> an IntKey and a Collection with only the values that were >>> produced on that node. >>> * In the (global) reduce phase, all the intermediate values for each key >>> are merged, and a reduce operation takes an intermediate key and a >> sequence >>> of *all* the intermediate values generated for that key. These reduce >>> operations are completely independent, so each intermediate key can be >>> mapped to a different node (distributed reduce), while still having >> access >>> to all the intermediate values at once. >>> * In the end, the collator takes the Map from the >> reduce >>> phase and produces a single value. >>> >>> If a combiner can be used, then I believe it can also be run in parallel >>> with a LinkedBlockingQueue between the mapper and the combiner. But >>> sometimes the reduce algorithm can only be run on the entire collection >> of >>> values (e.g if you want to find the median, or a percentile). 
>>> >>> The limitation we have now is that in the reduce phase, the entire list >> of >>> values for one intermediate key must be in memory at once. I think Hadoop >>> only loads a block of intermediate values in memory at once, and can even >>> sort the intermediate values (with a user-supplied comparison function) >> so >>> that the reduce function can work on a sorted list without loading the >>> values in memory itself. >>> >>> Cheers >>> Dan >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From tsykora at redhat.com Tue Feb 18 07:35:21 2014 From: tsykora at redhat.com (Tomas Sykora) Date: Tue, 18 Feb 2014 07:35:21 -0500 (EST) Subject: [infinispan-dev] Introducing Infinispan OData server: Remote JSON documents querying In-Reply-To: <1418358555.4942381.1392725971149.JavaMail.zimbra@redhat.com> Message-ID: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com> Hello all! :) It's the right time to make it a little bit more public and share some results of work on Infinispan OData server, finally! This solution can serve as a proof of concept where we are able to remotely query JSON documents stored in Infinispan caches and using industrial standard and platform independent way of communication with the server (OData). There is still much to do/implement/improve in the server, but it is working as it is now. Check a blog post if you are interested: http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-server.html Any feedback is more than welcome. + I'd like to say a big THANK YOU to all who supported me! Mainly: JDG QE guys, Manik, Mircea, Sanne and Adrian. 
It wouldn't be done without your patience and willingness to help me :-) Tomas From emmanuel at hibernate.org Tue Feb 18 08:01:22 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Tue, 18 Feb 2014 14:01:22 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <53034BBB.1030809@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> Message-ID: <20140218130122.GA11962@hibernate.org> On Tue 2014-02-18 14:02, Adrian Nistor wrote: > Well, OGM and Infinispan are different species :) So, Infinispan being what > it is today - a non-homogenous, schema-less KV store, without support for > entity associations (except embedding) - which simplifies the whole thing a > lot, should we or should we not provide transparent cross-cacheManager > search capabilities, in this exact context? Vote? Yes it makes sense to do queries like where name or title = "foo" AND description or content contains "bar" over a heterogeneous set (say books and DVDs) But if you had in mind to do joins between different entries in the cache, then this would require some cross-cache map reduce and be inefficient so that's not a good use case. > > There were some points raised previously like /"if you search for more than > one cache transparently, then you probably need to CRUD for more than one > cache transparently as well"/. In the SQL world you would also probably CRUD > against a table or set of tables and then query against a view - a bit like > what we're doing here. I don't see any problem with this in principle. There > is however something currently missing in the query result set API - it > currently does not provide you the keys of the matching entities. People Really? 
I think we have the info in the index at least when the "ProvidedId" and the keys are the same. > work around this by storing the key in the entity. Now with the addition of > the cross-cacheManager search we'll probably need to fix the result api and > also provide a reference to the cache (or just the name?) where the entity > is stored. Right, I'm not sure Sanne agrees with me yet but you need to store the cache name in the index. Hibernate Search can reason at query time to see if it can avoid using this term to speed things up (massively). That will depend whether or no indexes are shared between caches. From sanne at infinispan.org Tue Feb 18 08:27:03 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 18 Feb 2014 13:27:03 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <20140218130122.GA11962@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> <20140218130122.GA11962@hibernate.org> Message-ID: On 18 February 2014 13:01, Emmanuel Bernard wrote: > On Tue 2014-02-18 14:02, Adrian Nistor wrote: >> Well, OGM and Infinispan are different species :) So, Infinispan being what >> it is today - a non-homogenous, schema-less KV store, without support for >> entity associations (except embedding) - which simplifies the whole thing a >> lot, should we or should we not provide transparent cross-cacheManager >> search capabilities, in this exact context? Vote? 
> > Yes it makes sense to do queries like > > where name or title = "foo" AND description or content contains "bar" > > over a heterogeneous set (say books and DVDs) Right > > But if you had in mind to do joins between different entries in the > cache, then this would require some cross-cache map reduce and be > inefficient so that's not a good use case. +1 > >> >> There were some points raised previously like /"if you search for more than >> one cache transparently, then you probably need to CRUD for more than one >> cache transparently as well"/. In the SQL world you would also probably CRUD >> against a table or set of tables and then query against a view - a bit like >> what we're doing here. I don't see any problem with this in principle. There >> is however something currently missing in the query result set API - it >> currently does not provide you the keys of the matching entities. People > > Really? I think we have the info in the index at least when the > "ProvidedId" and the keys are the same. We have this info in the engine, but the results to the user don't usually include the keys. For some this is a bit unnatural: a different perspective would be to return _only_ the keys and avoid doing the lookup. We provide a "LazyIterator" on the results which fetches only each matching entry on demand, which I think covers a good deal of use cases but there might be other usages for these keys. I would be great if we had Lambda support to allow users to say what they want us to do with the resultset, rather than fetching it. > >> work around this by storing the key in the entity. Now with the addition of >> the cross-cacheManager search we'll probably need to fix the result api and >> also provide a reference to the cache (or just the name?) where the entity >> is stored. > > Right, I'm not sure Sanne agrees with me yet but you need to store the > cache name in the index. 
Hibernate Search can reason at query time to > see if it can avoid using this term to speed things up (massively). That > will depend whether or no indexes are shared between caches. I do agree that this would be required, but I'm sad on the implications this has. To allow those not familiar with Lucene to understand the consequences: deleting a single entry from the index by using a single term - like the key could be - is many orders of magnitude more efficient than deleting from an index by "composite keys", like it would be if we need to delete by tuples { cachename, typename, id }. Considering that in Infinispan I can never be sure if a key already existed or not (which is a fundamental difference when comparing to Search/ORM), ANY WRITE on Infinispan triggers a delete operation first. Not least, such a delete requires an index flush, while we normally just flush at the end of the batch (transaction). In other words if we could avoid needing to discriminate an index entry by Cache Name, each and every operation would be many orders of magniture more efficient. To be noted that even today we aren't achieving this higher efficiency mode because we're using the tuple { typename, id}, but that's a legacy mapping related to how Search could handle multi-table structures and I was planning to finally enable this very interesting optimization in the next few weeks in the scope of Search5. I do agree that supporting Queries on multiple Caches (cross-cache but no joins) makes sense, but if only we could figure out a way to move away from "dynamically defined indexed types" we could apply many of these optimizations transparently, when we know there is no risk of key ambiguity. We've been through a lot of trouble just to allow the user to not register his indexed types upfront, but I don't think it's worth it. After all, the user still has to annotate or provide a schema: listing the types would be the lesser pain. 
- Sanne

From rvansa at redhat.com  Tue Feb 18 08:36:01 2014
From: rvansa at redhat.com (Radim Vansa)
Date: Tue, 18 Feb 2014 14:36:01 +0100
Subject: [infinispan-dev] MapReduce limitations and suggestions.
In-Reply-To: <53034F5E.6060706@gmail.com>
References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com>
	<52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com>
	<530320F9.300@redhat.com> <5303342A.1050800@gmail.com>
	<53034F5E.6060706@gmail.com>
Message-ID: <530361C1.3030404@redhat.com>

Thanks a lot for these explanations, guys (Dan and Evangelos), I was
confused by the nomenclature in Hadoop/Infinispan vs. wiki/something I
learned in the past. I was considering M/R to be

 node1         | node2         |
---------------|---------------|
 K1,V1 | K2,V2 | K3,V3 | K4,V4 |
   |   |   |   |   |   |   |   |
   v   |   v   |   v   |   v   | MAP
  Foo  | null  |  Bar  |  Goo  |
-------------------------------|
    \  |       |   \     /     | LOCAL
     Foo       |    BarGoo     | REDUCE
      |        |      |        |
-------------------------------|
          \         /          | GLOBAL
           FooBarGoo           | REDUCE
-------------------------------|

But now I understand that the model introduced here is somewhat
different. I had proposed parallel Map-Combine, but I understand that
now you're trying to solve the problem in the reduce phase.

Thanks again

Radim

On 02/18/2014 01:17 PM, Evangelos Vazaios wrote:
> On 02/18/2014 01:40 PM, Dan Berindei wrote:
>> On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios wrote:
>>
>>> Hi Radim,
>>>
>>> Since Hadoop is the most popular implementation of MapReduce I will give
>>> a brief overview of how it works and then I'll provide an example
>>> where the reducers must run over the whole list of values with the same
>>> key.
>>>
>>> Hadoop MR overview.
>>> >>> MAP >>> >>> 1) Input file(s) are split into pieces of 64MB >>> 2) For each split hadoop creates one map task and then assign the task >>> to a cluster node >>> 3) The splits are read as key,value pairs and the map function of Mapper >>> is called. The mapper can output arbitrary number of intermediate >>> key,value pairs >>> 4) the output from the mapper is stored in a buffer in memory. After a >>> certain threshold is reached the pairs are sorted by key and if there is >>> a combiner it is run on the pairs that have the same key. Then, the >>> output is flushed on the HDFS. >>> >> Ok, so Hadoop runs the combiner more or less concurrently with the mappers. >> >> I'm curious if there are any M/R tasks that benefit from the sorting the >> keys here, we just put the intermediate values in a Map>. We >> could do about the same by passing this map (or rather each entry in the >> map) to the combiner when it reaches a certain threshold, but I'm not >> convinced about the need to sort it. >> > Well there are algorithms that make use of it. Implementing a graph > algorithm can take use of it.Where the graph is split into k partitions > and each partition is assigned to one Mapper and Reducer. Mappers > compute the outgoing messages and output them to reducers. Then, > reducers can read the partition file sequentially to update the > vertices. This is just one use case that came to my mind. >>> SHUFFLE >>> >>> hadoop decides the Reducer that should process each key by running a >>> partitioner. The default partitioner decides with the following way: >>> reducer = intermidKey.hashCode() % numberOfReducer >>> Finally, the intermediate key,value pairs are sent to the reducers >>> >> Is this algorithm set in stone, in that some M/R tasks rely on it? In our >> impl, the user could use grouping to direct a set of intermediate keys to >> the same node for reducing, but otherwise the reducing node is more or less >> random. 
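The default shuffle rule quoted above (reducer = intermidKey.hashCode() % numberOfReducer) is essentially Hadoop's HashPartitioner, with one detail the pseudocode hides: a plain % on a negative hashCode() yields a negative index, so the real rule masks the sign bit first. A minimal sketch (illustrative, not Hadoop's actual class):

```java
// Sketch of the default hash-partitioning rule discussed above. The sign-bit
// mask matters: hashCode() can be negative, and a plain "% numReducers"
// would then produce an invalid (negative) reducer index.
public class HashPartitionSketch {
    static int partition(Object intermediateKey, int numReducers) {
        return (intermediateKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        for (String key : new String[]{"domain-a", "domain-b", "sentiment"}) {
            int r = partition(key, 4);
            // every key lands in a valid reducer index [0, 4)
            System.out.println(key + " -> reducer " + r);
        }
    }
}
```

Because the mapping depends only on the key's hash code and the reducer count, any node can compute it independently during the shuffle, with no coordination.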
>> > The default partitioner does exactly that check the actual code for > hadoop 1.2.1 here > http://goo.gl/he9yHO >>> REDUCE >>> >>> 1) Reducer sorts all key,value pairs by key and then groups the values >>> with the same key. As a result reducers receive their keys sorted. >>> >> I guess this sorting is only relevant if the reduce phase happens on a >> single thread, on a single node? If the reduce happens in parallel, the >> ordering is going to be lost anyway. > Each reduce task is run on a single thread, but you can run more than > one reduce tasks on a given node. The key ordering will not be lost. The > values are not ordered in any way. Moreover, the call to the reducer is > reduce(Key key, Iterable values) I cannot think of a way that the > order is lost. >> >>> 2) for each Key,List the reduce function of the reducer is >>> called. Reducer can also emit arbitrary number of key,value pairs >>> >> We limit the reducer (and the combiner) to emit a single value, which is >> paired with the input key. We may need to lift this restriction, if only to >> make porting/adapting tasks easier. >> >> >>> Additionally, hadoop lets you customize almost every aspect of the code >>> run from how the input is split and read as key value pairs to how it is >>> partitioned and sorted. >>> >> Does that mean you can sort the values as well? I was thinking of each >> reduce() call as independent, and then only the order of values for one >> intermediate key would be relevant. I guess some tasks may require keeping >> state across all the reduce() calls and then the order of key matters, but >> then the reduce phase can't be parallelized, either across the cluster or >> on a single node. > I was not very clear here. You can set the partitioner for a specific > job. You may also set the key comparator, as a result change the way > that intermediate keys are sorted. Additionally, one can change how keys > are grouped into one reduce call by setting the GroupComparator class. 
A > simple example would be to have sales(date,amount) and you want to > create totals for each month of the year. > so for the key: (year,month) and value: amount. > by overriding the keyClass hashCode function you can send all the > intermediate pairs with the same year to the same reducer > > and then you can set the groupComparator to group together all the > values with the same year. > > Cheers, > Evangelos > > >>> A simple example is group by and computing an average over the grouped >>> values. Let the dataset be webpages (url,domain,sentiment) and we want >>> to compute the average sentiment for each domain in the dataset then the >>> mapper for each webpages wp. will run >>> map(wp.url,wp): >>> emit(wp.domain,wp.sentiment) >>> >>> and in reducer: >>> reduce(domain,Iterable values): >>> counter = 0 >>> sum = 0 >>> while(values.hasNext()) >>> counter++; >>> sum += values.next() >>> emit(domain,sum/counter) >>> >>> I know that this approach is not optimized. But, I wanted give a simple >>> example. >>> >> I think it can also be optimized to use a combiner, if we emit a (domain, >> counter, sum) tuple :) >> >> >>> Dan, only the the values for one intermediate key must be in memory? or >>> all the intermediate key,value pairs that are assigned to one reducer >>> must be in memory? >>> >> With the default configuration, all the key/value pairs assigned to one >> reducer must be in memory. But one can define the __tmpMapReduce cache in >> the configuration and configure eviction with a cache store (note that >> because of how our eviction works, the actual container size is at least >> concurrencyLevel rounded up to the next power of 2). The problem is that >> there is only one configuration for all the M/R tasks [1]. >> >> Note that because we only run the combiner after the mapping phase is >> complete, we do need to keep in memory all the results of the mapping phase >> from that node (those are not stored in a cache). 
I've created an issue in >> JIRA for this [2]. >> >> Cheers >> Dan >> >> [1] https://issues.jboss.org/browse/ISPN-4021 >> [2] https://issues.jboss.org/browse/ISPN-4022 >> >> >> >>> Cheers, >>> Evangelos >>> >>> On 02/18/2014 11:59 AM, Dan Berindei wrote: >>>> Radim, this is how our M/R algorithm works (Hadoop may do it >>> differently): >>>> * The mapping phase generates a Map> on each >>>> node (Int meaning intermediate). >>>> * In the combine (local reduce) phase, a combine operation takes as input >>>> an IntKey and a Collection with only the values that were >>>> produced on that node. >>>> * In the (global) reduce phase, all the intermediate values for each key >>>> are merged, and a reduce operation takes an intermediate key and a >>> sequence >>>> of *all* the intermediate values generated for that key. These reduce >>>> operations are completely independent, so each intermediate key can be >>>> mapped to a different node (distributed reduce), while still having >>> access >>>> to all the intermediate values at once. >>>> * In the end, the collator takes the Map from the >>> reduce >>>> phase and produces a single value. >>>> >>>> If a combiner can be used, then I believe it can also be run in parallel >>>> with a LinkedBlockingQueue between the mapper and the combiner. But >>>> sometimes the reduce algorithm can only be run on the entire collection >>> of >>>> values (e.g if you want to find the median, or a percentile). >>>> >>>> The limitation we have now is that in the reduce phase, the entire list >>> of >>>> values for one intermediate key must be in memory at once. I think Hadoop >>>> only loads a block of intermediate values in memory at once, and can even >>>> sort the intermediate values (with a user-supplied comparison function) >>> so >>>> that the reduce function can work on a sorted list without loading the >>>> values in memory itself. 
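The median/percentile point above can be made concrete: some reduce functions can consume their values as a stream with O(1) state, while others inherently need the whole value list materialized. A sketch (illustrative only, not Infinispan's Reducer interface):

```java
import java.util.*;

// Contrast between a reduce that streams over its values (average: O(1)
// state, never materializes the list) and one that cannot (median: must
// collect and sort every value first).
public class ReduceMemorySketch {
    // Streaming-friendly: works directly on an Iterator.
    public static double average(Iterator<Double> values) {
        double sum = 0;
        long count = 0;
        while (values.hasNext()) { sum += values.next(); count++; }
        return sum / count;
    }

    // Needs the entire collection in memory: collect, sort, pick the middle.
    public static double median(Iterator<Double> values) {
        List<Double> all = new ArrayList<>();
        values.forEachRemaining(all::add);
        Collections.sort(all);
        int n = all.size();
        return n % 2 == 1 ? all.get(n / 2)
                          : (all.get(n / 2 - 1) + all.get(n / 2)) / 2;
    }

    public static void main(String[] args) {
        List<Double> v = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        System.out.println(average(v.iterator())); // 2.5
        System.out.println(median(v.iterator()));  // 2.5
    }
}
```

Only the first kind can benefit from loading intermediate values block by block; the second is exactly the case where the full list must fit in memory.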
>>>> >>>> Cheers >>>> Dan >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa JBoss DataGrid QA From dan.berindei at gmail.com Tue Feb 18 09:39:04 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 16:39:04 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <53034F5E.6060706@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios wrote: > On 02/18/2014 01:40 PM, Dan Berindei wrote: > > On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios >wrote: > > > >> Hi Radim, > >> > >> Since Hadoop is the most popular implementation of MapReduce I will give > >> a brief overview of how it works and then I'll provide with an example > >> where the reducers must run over the whole list of values with the same > >> key. > >> > >> Hadoop MR overview. > >> > >> MAP > >> > >> 1) Input file(s) are split into pieces of 64MB > >> 2) For each split hadoop creates one map task and then assign the task > >> to a cluster node > >> 3) The splits are read as key,value pairs and the map function of Mapper > >> is called. The mapper can output arbitrary number of intermediate > >> key,value pairs > >> 4) the output from the mapper is stored in a buffer in memory. After a > >> certain threshold is reached the pairs are sorted by key and if there is > >> a combiner it is run on the pairs that have the same key. Then, the > >> output is flushed on the HDFS. 
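The buffer-then-combine mechanism in step 4 above can be sketched as follows. This is a simplified model, not Hadoop's spill code: summation stands in for an arbitrary combine function, and a list of maps stands in for the spill files written to disk.

```java
import java.util.*;

// Sketch of map-side buffering (step 4): mapper output accumulates in a
// sorted in-memory buffer; once a threshold is reached, a combiner collapses
// the values of each key and the result is "spilled" (here into a list
// rather than onto disk/HDFS). Sum is a stand-in combine function.
public class MapSideBufferSketch {
    private final int threshold;
    private final TreeMap<String, List<Long>> buffer = new TreeMap<>(); // keys kept sorted
    final List<Map<String, Long>> spills = new ArrayList<>();

    MapSideBufferSketch(int threshold) { this.threshold = threshold; }

    void emit(String key, long value) {
        buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        if (buffer.size() >= threshold) spill();
    }

    void spill() {
        if (buffer.isEmpty()) return;
        Map<String, Long> combined = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : buffer.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) sum += v; // combine same-key values
            combined.put(e.getKey(), sum);
        }
        spills.add(combined);
        buffer.clear();
    }

    public static void main(String[] args) {
        MapSideBufferSketch m = new MapSideBufferSketch(2);
        m.emit("foo", 1);
        m.emit("foo", 2);
        m.emit("bar", 3); // second distinct key hits the threshold -> spill
        m.spill();        // final flush of anything left over
        System.out.println(m.spills);
    }
}
```

Running the combiner at spill time is what lets Hadoop overlap combining with mapping, instead of waiting for the whole map phase to finish.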
> >> > > > > Ok, so Hadoop runs the combiner more or less concurrently with the > mappers. > > > > I'm curious if there are any M/R tasks that benefit from the sorting the > > keys here, we just put the intermediate values in a Map>. We > > could do about the same by passing this map (or rather each entry in the > > map) to the combiner when it reaches a certain threshold, but I'm not > > convinced about the need to sort it. > > > Well there are algorithms that make use of it. Implementing a graph > algorithm can take use of it.Where the graph is split into k partitions > and each partition is assigned to one Mapper and Reducer. Mappers > compute the outgoing messages and output them to reducers. Then, > reducers can read the partition file sequentially to update the > vertices. This is just one use case that came to my mind. > I thought the partitioning only happens during the shuffle phase, and mappers/combiners don't know about partitions at all? I understand that reducers may need the intermediary keys to be sorted, I'm asking about the combiners, since even if the keys from one block are sorted, the complete list of keys they receive is not sorted (unless a new combiner is created for each input block). > > > > >> SHUFFLE > >> > >> hadoop decides the Reducer that should process each key by running a > >> partitioner. The default partitioner decides with the following way: > >> reducer = intermidKey.hashCode() % numberOfReducer > >> Finally, the intermediate key,value pairs are sent to the reducers > >> > > > > Is this algorithm set in stone, in that some M/R tasks rely on it? In our > > impl, the user could use grouping to direct a set of intermediate keys to > > the same node for reducing, but otherwise the reducing node is more or > less > > random. > > > The default partitioner does exactly that check the actual code for > hadoop 1.2.1 here > http://goo.gl/he9yHO > So API documentation doesn't specify it, but users still rely on this particular behaviour? 
BTW, is there always one reducer one each node, or can there be multiple reducers on each node? If it's the latter, it should be relatively easy to model this in Infinispan using grouping. If it's the former, I'm not so sure... > > > >> REDUCE > >> > >> 1) Reducer sorts all key,value pairs by key and then groups the values > >> with the same key. As a result reducers receive their keys sorted. > >> > > > > I guess this sorting is only relevant if the reduce phase happens on a > > single thread, on a single node? If the reduce happens in parallel, the > > ordering is going to be lost anyway. > Each reduce task is run on a single thread, but you can run more than > one reduce tasks on a given node. The key ordering will not be lost. The > values are not ordered in any way. Moreover, the call to the reducer is > reduce(Key key, Iterable values) I cannot think of a way that the > order is lost. > > > Right, the call to the reducer is with a single key, but I'm assuming the order of the calls matters (e.g. because the reduces keeps some internal state across reduce() calls), otherwise there's no point in sorting the keys. Calling the same reducer from multiple threads (like we do) would definitely mess up the order of the calls. ATM we only have one reducer per node, which can be called from multiple threads, but it shouldn't be too hard to allow multiple reducers per node and to run each of them in a single thread. > > > >> 2) for each Key,List the reduce function of the reducer is > >> called. Reducer can also emit arbitrary number of key,value pairs > >> > > > > We limit the reducer (and the combiner) to emit a single value, which is > > paired with the input key. We may need to lift this restriction, if only > to > > make porting/adapting tasks easier. > > > > > >> > >> Additionally, hadoop lets you customize almost every aspect of the code > >> run from how the input is split and read as key value pairs to how it is > >> partitioned and sorted. 
> >> > > > > Does that mean you can sort the values as well? I was thinking of each > > reduce() call as independent, and then only the order of values for one > > intermediate key would be relevant. I guess some tasks may require > keeping > > state across all the reduce() calls and then the order of key matters, > but > > then the reduce phase can't be parallelized, either across the cluster or > > on a single node. > > I was not very clear here. You can set the partitioner for a specific > job. You may also set the key comparator, as a result change the way > that intermediate keys are sorted. Additionally, one can change how keys > are grouped into one reduce call by setting the GroupComparator class. A > simple example would be to have sales(date,amount) and you want to > create totals for each month of the year. > so for the key: (year,month) and value: amount. > by overriding the keyClass hashCode function you can send all the > intermediate pairs with the same year to the same reducer > > and then you can set the groupComparator to group together all the > values with the same year. > You mean set the groupComparator to group together all the values with the same month? I don't think so, because the key is already (year, month). But if you wanted to collect the totals for each year you could just use the year as the intermediary key. So I don't quite understand how your example is supposed to work. Besides, each reduce() call receives just one key, if you have keys (2013, 1) and (2013, 2) and the groupComparator decides they should map to the same group, which key does the reducer see? I think a regular equals() should be good enough for us here, since we already need equals() in order to put the intermediary keys in the intermediary cache. Cheers Dan > > Cheers, > Evangelos > > > > > >> A simple example is group by and computing an average over the grouped > >> values. 
Let the dataset be webpages (url,domain,sentiment) and we want > >> to compute the average sentiment for each domain in the dataset then the > >> mapper for each webpages wp. will run > >> map(wp.url,wp): > >> emit(wp.domain,wp.sentiment) > >> > >> and in reducer: > >> reduce(domain,Iterable values): > >> counter = 0 > >> sum = 0 > >> while(values.hasNext()) > >> counter++; > >> sum += values.next() > >> emit(domain,sum/counter) > >> > >> I know that this approach is not optimized. But, I wanted give a simple > >> example. > >> > > > > I think it can also be optimized to use a combiner, if we emit a (domain, > > counter, sum) tuple :) > > > > > > > > >> Dan, only the the values for one intermediate key must be in memory? or > >> all the intermediate key,value pairs that are assigned to one reducer > >> must be in memory? > >> > > > > With the default configuration, all the key/value pairs assigned to one > > reducer must be in memory. But one can define the __tmpMapReduce cache in > > the configuration and configure eviction with a cache store (note that > > because of how our eviction works, the actual container size is at least > > concurrencyLevel rounded up to the next power of 2). The problem is that > > there is only one configuration for all the M/R tasks [1]. > > > > Note that because we only run the combiner after the mapping phase is > > complete, we do need to keep in memory all the results of the mapping > phase > > from that node (those are not stored in a cache). I've created an issue > in > > JIRA for this [2]. > > > > Cheers > > Dan > > > > [1] https://issues.jboss.org/browse/ISPN-4021 > > [2] https://issues.jboss.org/browse/ISPN-4022 > > > > > > > >> Cheers, > >> Evangelos > >> > >> On 02/18/2014 11:59 AM, Dan Berindei wrote: > >>> Radim, this is how our M/R algorithm works (Hadoop may do it > >> differently): > >>> > >>> * The mapping phase generates a Map> on > each > >>> node (Int meaning intermediate). 
> >>> * In the combine (local reduce) phase, a combine operation takes as > input > >>> an IntKey and a Collection with only the values that were > >>> produced on that node. > >>> * In the (global) reduce phase, all the intermediate values for each > key > >>> are merged, and a reduce operation takes an intermediate key and a > >> sequence > >>> of *all* the intermediate values generated for that key. These reduce > >>> operations are completely independent, so each intermediate key can be > >>> mapped to a different node (distributed reduce), while still having > >> access > >>> to all the intermediate values at once. > >>> * In the end, the collator takes the Map from the > >> reduce > >>> phase and produces a single value. > >>> > >>> If a combiner can be used, then I believe it can also be run in > parallel > >>> with a LinkedBlockingQueue between the mapper and the combiner. But > >>> sometimes the reduce algorithm can only be run on the entire collection > >> of > >>> values (e.g if you want to find the median, or a percentile). > >>> > >>> The limitation we have now is that in the reduce phase, the entire list > >> of > >>> values for one intermediate key must be in memory at once. I think > Hadoop > >>> only loads a block of intermediate values in memory at once, and can > even > >>> sort the intermediate values (with a user-supplied comparison function) > >> so > >>> that the reduce function can work on a sorted list without loading the > >>> values in memory itself. > >>> > >>> Cheers > >>> Dan > >> > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140218/a1c60195/attachment-0001.html From emmanuel at hibernate.org Tue Feb 18 09:47:42 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Tue, 18 Feb 2014 15:47:42 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> <20140218130122.GA11962@hibernate.org> Message-ID: <20140218144742.GD11962@hibernate.org> On Tue 2014-02-18 13:27, Sanne Grinovero wrote: > On 18 February 2014 13:01, Emmanuel Bernard wrote: > > On Tue 2014-02-18 14:02, Adrian Nistor wrote: > >> There were some points raised previously like /"if you search for more than > >> one cache transparently, then you probably need to CRUD for more than one > >> cache transparently as well"/. In the SQL world you would also probably CRUD > >> against a table or set of tables and then query against a view - a bit like > >> what we're doing here. I don't see any problem with this in principle. There > >> is however something currently missing in the query result set API - it > >> currently does not provide you the keys of the matching entities. People > > > > Really? I think we have the info in the index at least when the > > "ProvidedId" and the keys are the same. > > We have this info in the engine, but the results to the user don't > usually include the keys. > For some this is a bit unnatural: a different perspective would be to > return _only_ the keys and avoid doing the lookup. > > We provide a "LazyIterator" on the results which fetches only each > matching entry on demand, which I think covers a good deal of use > cases but there might be other usages for these keys. 
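The "LazyIterator" idea above - return the matching keys and fetch each value only on demand - fits in a few lines. This is a toy version, not Hibernate Search's actual class; the class and parameter names are invented for the example.

```java
import java.util.*;
import java.util.function.Function;

// Toy version of the lazy result iteration described above: walk the matching
// keys and look each value up only when next() is called, instead of
// materializing the whole result set eagerly.
public class LazyResultIterator<K, V> implements Iterator<V> {
    private final Iterator<K> matchingKeys;
    private final Function<K, V> fetch; // e.g. cache::get

    public LazyResultIterator(Iterator<K> matchingKeys, Function<K, V> fetch) {
        this.matchingKeys = matchingKeys;
        this.fetch = fetch;
    }

    public boolean hasNext() { return matchingKeys.hasNext(); }

    public V next() { return fetch.apply(matchingKeys.next()); } // lookup on demand

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("k1", "v1");
        cache.put("k2", "v2");
        // pretend a query matched k1 and k2; values are fetched one at a time
        Iterator<String> results =
            new LazyResultIterator<>(Arrays.asList("k1", "k2").iterator(), cache::get);
        while (results.hasNext()) System.out.println(results.next());
    }
}
```

Exposing the key iterator directly (skipping the fetch function) would cover the "return _only_ the keys and avoid doing the lookup" perspective mentioned above.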
> > I would be great if we had Lambda support to allow users to say what > they want us to do with the resultset, rather than fetching it. I was thinking of offering a way to project the key / id select key(user) from User user where user.email = "emmanuel at hibernate.org" select key(user), user from User user where user.email = "emmanuel at hibernate.org" If you guys really want, you can add a cache(user) function as well to project the Cache instance. Looks wrong at first sight though. From vagvaz at gmail.com Tue Feb 18 10:33:20 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 17:33:20 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> Message-ID: <53037D40.6060101@gmail.com> On 02/18/2014 04:39 PM, Dan Berindei wrote: > On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios wrote: > >> On 02/18/2014 01:40 PM, Dan Berindei wrote: >>> On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios >> wrote: >>> >>>> Hi Radim, >>>> >>>> Since Hadoop is the most popular implementation of MapReduce I will give >>>> a brief overview of how it works and then I'll provide with an example >>>> where the reducers must run over the whole list of values with the same >>>> key. >>>> >>>> Hadoop MR overview. >>>> >>>> MAP >>>> >>>> 1) Input file(s) are split into pieces of 64MB >>>> 2) For each split hadoop creates one map task and then assign the task >>>> to a cluster node >>>> 3) The splits are read as key,value pairs and the map function of Mapper >>>> is called. The mapper can output arbitrary number of intermediate >>>> key,value pairs >>>> 4) the output from the mapper is stored in a buffer in memory. 
After a >>>> certain threshold is reached the pairs are sorted by key and if there is >>>> a combiner it is run on the pairs that have the same key. Then, the >>>> output is flushed on the HDFS. >>>> >>> >>> Ok, so Hadoop runs the combiner more or less concurrently with the >> mappers. >>> >>> I'm curious if there are any M/R tasks that benefit from the sorting the >>> keys here, we just put the intermediate values in a Map>. We >>> could do about the same by passing this map (or rather each entry in the >>> map) to the combiner when it reaches a certain threshold, but I'm not >>> convinced about the need to sort it. >>> >> Well there are algorithms that make use of it. Implementing a graph >> algorithm can take use of it.Where the graph is split into k partitions >> and each partition is assigned to one Mapper and Reducer. Mappers >> compute the outgoing messages and output them to reducers. Then, >> reducers can read the partition file sequentially to update the >> vertices. This is just one use case that came to my mind. >> > > I thought the partitioning only happens during the shuffle phase, and > mappers/combiners don't know about partitions at all? > I understand that reducers may need the intermediary keys to be sorted, I'm > asking about the combiners, since even if the keys from one block are > sorted, the complete list of keys they receive is not sorted (unless a new > combiner is created for each input block). You are absolutely right partitioning happens during the shuffle phase and mappers/combiners do not know about partitions. Did I say something different? > >> >>> >>>> SHUFFLE >>>> >>>> hadoop decides the Reducer that should process each key by running a >>>> partitioner. The default partitioner decides with the following way: >>>> reducer = intermidKey.hashCode() % numberOfReducer >>>> Finally, the intermediate key,value pairs are sent to the reducers >>>> >>> >>> Is this algorithm set in stone, in that some M/R tasks rely on it? 
In our >>> impl, the user could use grouping to direct a set of intermediate keys to >>> the same node for reducing, but otherwise the reducing node is more or >> less >>> random. >>> >> The default partitioner does exactly that; check the actual code for >> Hadoop 1.2.1 here >> http://goo.gl/he9yHO >> > > So the API documentation doesn't specify it, but users still rely on this > particular behaviour? > > BTW, is there always one reducer on each node, or can there be multiple > reducers on each node? If it's the latter, it should be relatively easy to > model this in Infinispan using grouping. If it's the former, I'm not so > sure... > Actually, the configuration of the MapReduce job (MapReduce task in infinispan) defines the number of reducers and is programmatically configurable. The short answer to your question is the latter: multiple Reduce tasks are assigned to nodes almost equally. > >>> >>>> REDUCE >>>> >>>> 1) Reducer sorts all key,value pairs by key and then groups the values >>>> with the same key. As a result reducers receive their keys sorted. >>>> >>> >>> I guess this sorting is only relevant if the reduce phase happens on a >>> single thread, on a single node? If the reduce happens in parallel, the >>> ordering is going to be lost anyway. >> Each reduce task is run on a single thread, but you can run more than >> one reduce task on a given node. The key ordering will not be lost. The >> values are not ordered in any way. Moreover, the call to the reducer is >> reduce(Key key, Iterable<Value> values); I cannot think of a way that the >> order is lost. >>> >> > > Right, the call to the reducer is with a single key, but I'm assuming the > order of the calls matters (e.g. because the reducer keeps some internal > state across reduce() calls), otherwise there's no point in sorting the > keys. Calling the same reducer from multiple threads (like we do) would > definitely mess up the order of the calls.
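The default partitioning rule quoted above (reducer = key.hashCode() % numberOfReducers) can be sketched in a few lines. Note that Hadoop's HashPartitioner additionally masks off the sign bit so that keys with negative hash codes still map to a valid reducer index — this is an illustrative sketch, not Hadoop's or Infinispan's actual source:

```java
public class Main {
    // Mimics Hadoop's default HashPartitioner: clear the sign bit, then take
    // the remainder, so the result is always in [0, numReducers).
    static int partition(Object intermediateKey, int numReducers) {
        return (intermediateKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // The same key always maps to the same reducer index...
        System.out.println(partition("word", 4));
        // ...and a key with a negative hashCode still yields a valid index.
        System.out.println(partition(-123, 7));
    }
}
```

Because the mapping is deterministic given the key and the reducer count, tasks can (and, per the discussion above, in practice do) rely on co-locating related keys by overriding hashCode() — even though the API documentation does not promise it.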
> > ATM we only have one reducer per node, which can be called from multiple > threads, but it shouldn't be too hard to allow multiple reducers per node > and to run each of them in a single thread. > I believe the sorting is done in order to group the values with the same key: since large amounts of data are stored in files, the easiest way to group is to sort and then group values with the same keys. > >>> >>>> 2) for each Key,List<Value> the reduce function of the reducer is >>>> called. Reducer can also emit an arbitrary number of key,value pairs >>>> >>> >>> We limit the reducer (and the combiner) to emit a single value, which is >>> paired with the input key. We may need to lift this restriction, if only >> to >>> make porting/adapting tasks easier. >>> >>> >>>> >>>> Additionally, hadoop lets you customize almost every aspect of the code >>>> run from how the input is split and read as key value pairs to how it is >>>> partitioned and sorted. >>>> >>> >>> Does that mean you can sort the values as well? I was thinking of each >>> reduce() call as independent, and then only the order of values for one >>> intermediate key would be relevant. I guess some tasks may require >> keeping >>> state across all the reduce() calls and then the order of keys matters, >> but >>> then the reduce phase can't be parallelized, either across the cluster or >>> on a single node. >> >> I was not very clear here. You can set the partitioner for a specific >> job. You may also set the key comparator and, as a result, change the way >> that intermediate keys are sorted. Additionally, one can change how keys >> are grouped into one reduce call by setting the GroupComparator class. A >> simple example would be to have sales(date,amount) and you want to >> create totals for each month of the year. >> so for the key: (year,month) and value: amount.
>> by overriding the keyClass hashCode function you can send all the >> intermediate pairs with the same year to the same reducer >> >> and then you can set the groupComparator to group together all the >> values with the same year. >> > > You mean set the groupComparator to group together all the values with the > same month? I don't think so, because the key is already (year, month). But > if you wanted to collect the totals for each year you could just use the > year as the intermediary key. So I don't quite understand how your example > is supposed to work. Well you can do that as well, but I meant to group all the months of the same year in one reduce call. The idea is that you want to receive in one reduce call the values for one year, and the values for that year to be sorted by month. > > Besides, each reduce() call receives just one key, if you have keys (2013, > 1) and (2013, 2) and the groupComparator decides they should map to the > same group, which key does the reducer see? I think a regular equals() > should be good enough for us here, since we already need equals() in order > to put the intermediary keys in the intermediary cache. > I may not be very good with examples; you can check this https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-8/sorting. It is more or less the same problem in a different setting. > Cheers > Dan > > Cheers Evangelos > >> >> Cheers, >> Evangelos >> >> >>> >>>> A simple example is group by and computing an average over the grouped >>>> values. Let the dataset be webpages (url,domain,sentiment) and we want >>>> to compute the average sentiment for each domain in the dataset then the >>>> mapper for each webpage wp.
will run >>>> map(wp.url,wp): >>>> emit(wp.domain,wp.sentiment) >>>> >>>> and in reducer: >>>> reduce(domain,Iterable values): >>>> counter = 0 >>>> sum = 0 >>>> while(values.hasNext()) >>>> counter++; >>>> sum += values.next() >>>> emit(domain,sum/counter) >>>> >>>> I know that this approach is not optimized. But, I wanted give a simple >>>> example. >>>> >>> >>> I think it can also be optimized to use a combiner, if we emit a (domain, >>> counter, sum) tuple :) >> >>> >>> >>> >>>> Dan, only the the values for one intermediate key must be in memory? or >>>> all the intermediate key,value pairs that are assigned to one reducer >>>> must be in memory? >>>> >>> >>> With the default configuration, all the key/value pairs assigned to one >>> reducer must be in memory. But one can define the __tmpMapReduce cache in >>> the configuration and configure eviction with a cache store (note that >>> because of how our eviction works, the actual container size is at least >>> concurrencyLevel rounded up to the next power of 2). The problem is that >>> there is only one configuration for all the M/R tasks [1]. >>> >>> Note that because we only run the combiner after the mapping phase is >>> complete, we do need to keep in memory all the results of the mapping >> phase >>> from that node (those are not stored in a cache). I've created an issue >> in >>> JIRA for this [2]. >>> >>> Cheers >>> Dan >>> >>> [1] https://issues.jboss.org/browse/ISPN-4021 >>> [2] https://issues.jboss.org/browse/ISPN-4022 >>> >>> >>> >>>> Cheers, >>>> Evangelos >>>> >>>> On 02/18/2014 11:59 AM, Dan Berindei wrote: >>>>> Radim, this is how our M/R algorithm works (Hadoop may do it >>>> differently): >>>>> >>>>> * The mapping phase generates a Map> on >> each >>>>> node (Int meaning intermediate). >>>>> * In the combine (local reduce) phase, a combine operation takes as >> input >>>>> an IntKey and a Collection with only the values that were >>>>> produced on that node. 
>>>>> * In the (global) reduce phase, all the intermediate values for each >> key >>>>> are merged, and a reduce operation takes an intermediate key and a >>>> sequence >>>>> of *all* the intermediate values generated for that key. These reduce >>>>> operations are completely independent, so each intermediate key can be >>>>> mapped to a different node (distributed reduce), while still having >>>> access >>>>> to all the intermediate values at once. >>>>> * In the end, the collator takes the Map from the >>>> reduce >>>>> phase and produces a single value. >>>>> >>>>> If a combiner can be used, then I believe it can also be run in >> parallel >>>>> with a LinkedBlockingQueue between the mapper and the combiner. But >>>>> sometimes the reduce algorithm can only be run on the entire collection >>>> of >>>>> values (e.g if you want to find the median, or a percentile). >>>>> >>>>> The limitation we have now is that in the reduce phase, the entire list >>>> of >>>>> values for one intermediate key must be in memory at once. I think >> Hadoop >>>>> only loads a block of intermediate values in memory at once, and can >> even >>>>> sort the intermediate values (with a user-supplied comparison function) >>>> so >>>>> that the reduce function can work on a sorted list without loading the >>>>> values in memory itself. 
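The contrast drawn above — between reductions that fold incrementally and those that can only run on the entire collection of values — can be made concrete. A sum can be combined value by value, so a combiner can pre-aggregate it; a median cannot be computed from partial results, which is why the whole value list for a key must be materialized (or sorted externally). A minimal sketch, not tied to any Infinispan or Hadoop interface:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
    // A sum folds value by value, so a combiner can pre-aggregate it.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    // A median needs every value for the key in hand before it can answer;
    // no partial result computed from a subset of the values helps.
    // (For even-sized lists this returns the upper median.)
    static long median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        List<Long> values = java.util.Arrays.asList(5L, 1L, 9L, 3L, 7L);
        System.out.println(sum(values));    // folds incrementally: 25
        System.out.println(median(values)); // needs the full list: 5
    }
}
```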
>>>>> >>>>> Cheers >>>>> Dan >>>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From vblagoje at redhat.com Tue Feb 18 10:36:35 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Tue, 18 Feb 2014 10:36:35 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> Message-ID: <53037E03.9020502@redhat.com> On 2/18/2014, 4:59 AM, Dan Berindei wrote: > > The limitation we have now is that in the reduce phase, the entire > list of values for one intermediate key must be in memory at once. I > think Hadoop only loads a block of intermediate values in memory at > once, and can even sort the intermediate values (with a user-supplied > comparison function) so that the reduce function can work on a sorted > list without loading the values in memory itself. > > Dan and others, This is where Sanne's idea comes into play. Why collect entire list of intermediate values for each intermediate key and then invoke reduce on those values when we can invoke reduce each time new intermediate value gets inserted? https://issues.jboss.org/browse/ISPN-3999 Cheers, Vladimir From vagvaz at gmail.com Tue Feb 18 10:46:05 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Tue, 18 Feb 2014 17:46:05 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <53037E03.9020502@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <53037E03.9020502@redhat.com> Message-ID: <5303803D.9080204@gmail.com> On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote: > On 2/18/2014, 4:59 AM, Dan Berindei wrote: >> >> The limitation we have now is that in the reduce phase, the entire >> list of values for one intermediate key must be in memory at once. I >> think Hadoop only loads a block of intermediate values in memory at >> once, and can even sort the intermediate values (with a user-supplied >> comparison function) so that the reduce function can work on a sorted >> list without loading the values in memory itself. >> >> > Dan and others, > > This is where Sanne's idea comes into play. Why collect the entire list of > intermediate values for each intermediate key and then invoke reduce on > those values when we can invoke reduce each time a new intermediate value > gets inserted? > Because you can't. What you are saying is more like combining than reducing. If there is a combiner in the MapReduceTask you can execute the combiner on a subset (in your case, 2) of the values with the same key and output one. But this is not always possible. > https://issues.jboss.org/browse/ISPN-3999 > > Cheers, > Vladimir > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Tue Feb 18 13:41:52 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 18 Feb 2014 20:41:52 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions.
In-Reply-To: <5303803D.9080204@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <53037E03.9020502@redhat.com> <5303803D.9080204@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 5:46 PM, Evangelos Vazaios wrote: > On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote: > > On 2/18/2014, 4:59 AM, Dan Berindei wrote: > >> > >> The limitation we have now is that in the reduce phase, the entire > >> list of values for one intermediate key must be in memory at once. I > >> think Hadoop only loads a block of intermediate values in memory at > >> once, and can even sort the intermediate values (with a user-supplied > >> comparison function) so that the reduce function can work on a sorted > >> list without loading the values in memory itself. > >> > >> > > Dan and others, > > > > This is where Sanne's idea comes into play. Why collect entire list of > > intermediate values for each intermediate key and then invoke reduce on > > those values when we can invoke reduce each time new intermediate value > > gets inserted? > > > Because you cant. What you are saying is more like combining than > reducing. If there is a combiner in the MapReduceTask you can execute > the combiner on a subset (in your case 2) values with the same key and > output one. But, this is not possible always. > In theory we could stream each intermediate value independently to the combiner and then to the node of the reducer, and the reducer could start up immediately on the reducer node instead of waiting for the mapping phase to finish on all the mapping nodes (blocking when it doesn't have any more values to process). But I imagine that would be kind of tricky to implement. 
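Dan's streaming idea — feed each intermediate value through a bounded queue so the combiner folds it as soon as it arrives, blocking when no more values are ready yet — is essentially the producer/consumer pipeline he mentioned earlier in the thread (a LinkedBlockingQueue between mapper and combiner). A toy single-key sketch, with all names and shapes being assumptions rather than the Infinispan implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Main {
    // Poison pill marking the end of the mapped stream.
    static final Integer END = Integer.MIN_VALUE;

    // Streams 1..n through a bounded queue from a "mapper" thread to the
    // calling "combiner" thread, which folds values as soon as they arrive.
    static long streamSum(int n) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(16);
        Thread mapper = new Thread(() -> {
            try {
                for (int v = 1; v <= n; v++) queue.put(v); // blocks when the queue is full
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mapper.start();
        long sum = 0;
        Integer v;
        while (!(v = queue.take()).equals(END)) { // blocks until a value is ready
            sum += v;
        }
        mapper.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(streamSum(100)); // 5050
    }
}
```

The combiner here never holds more than the queue's capacity of pending values, which is the memory win; the tricky part Dan alludes to is doing this per intermediate key, across nodes, with failure handling.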
> > https://issues.jboss.org/browse/ISPN-3999 > > > > Cheers, > > Vladimir > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140218/b6a52c75/attachment-0001.html From galder at redhat.com Wed Feb 19 01:57:14 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Feb 2014 07:57:14 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <52EB5197.4050801@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> Message-ID: On 31 Jan 2014, at 08:32, Dennis Reed wrote: > It would be a loss of functionality. > > As a common example, the AS web session replication cache is configured > for ASYNC by default, for performance reasons. > But it can be changed to SYNC to guarantee that when the request > finishes that the session was replicated. > > That wouldn't be possible if you could no longer switch between > ASYNC/SYNC with just a configuration change. I disagree :). AS could abstract that configuration detail. IOW, if all Infinispan returned was Futures, AS, or any other client application, has the choice in its hands: does it wait for the future to complete or not? If it does, it's SYNC; if not, ASYNC. AS can still expose this and no functionality is lost. What happens is that the SYNC/ASYNC decision stops being a configuration option (bad, bad, bad) and becomes an actual programming decision Infinispan clients must address (good, good, good).
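Galder's "futures everywhere" proposal can be illustrated with a toy cache facade in which the async write is the only primitive and the sync flavour is just "call async, then block on the returned future" — the shape Radim sketched earlier as put = putAsync + get(). CompletableFuture stands in for Infinispan's NotifyingFuture, and every name here is hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Main {
    // Hypothetical cache facade: the only write primitive is asynchronous.
    static class AsyncFirstCache<K, V> {
        private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();

        // Stand-in for a replicated write; runs off the caller's thread.
        CompletableFuture<V> putAsync(K key, V value) {
            return CompletableFuture.supplyAsync(() -> store.put(key, value));
        }

        // SYNC is a client-side decision: issue the async write, then wait.
        V put(K key, V value) {
            return putAsync(key, value).join();
        }

        V get(K key) { return store.get(key); }
    }

    public static void main(String[] args) {
        AsyncFirstCache<String, String> cache = new AsyncFirstCache<>();
        cache.put("k", "v");                // this caller chose to wait
        System.out.println(cache.get("k")); // v
    }
}
```

As Bela notes at the top of the thread, this only removes the SYNC/ASYNC *configuration* split; the sync path still pays for response messages at the transport level, which a true fire-and-forget mode avoids.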
Cheers, > > -Dennis > > On 01/31/2014 01:08 AM, Galder Zamarreño wrote: >> Hi all, >> >> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >> >> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >> >> WDYT? >> >> Cheers, >> -- >> Galder Zamarreño >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 19 02:05:02 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Feb 2014 08:05:02 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> Message-ID: <34AB5973-4745-4720-8D43-6AF8D32121B1@redhat.com> On 31 Jan 2014, at 11:59, Sanne Grinovero wrote: > Generally I like the systems designed with SYNC_DIST + async shared cachestore.
> > It's probably the best setup we can offer: > - you need a shared cachestore for persistence consistency > - using SYNC distribution to other replicas provides a fairly decent resilience > - if your cachestore needs to be updated in sync, your write > performance will be limited by the cachestore performance: this > prevents you to use Infinispan to buffer, absorbing write spikes, and > reducing write latency Ok, this a limitation of my approach. For such scenarios, you could maybe leave the async store option around, with a note on when the future completes based on this option. > But I agree we should investigate on removing duplicate > "asynchronizations" where they are not needed, there might be some > opportunities to remove thread switching and blocking. > > > On 31 January 2014 10:48, Tristan Tarrant wrote: >> Couldn't this be handled higher up in our implementatoin then ? >> >> If I enable an async mode, all puts / gets become putAsync/getAsync >> transparently to both the application and to the state transfer. >> >> Tristan >> >> On 01/31/2014 08:32 AM, Dennis Reed wrote: >>> It would be a loss of functionality. >>> >>> As a common example, the AS web session replication cache is configured >>> for ASYNC by default, for performance reasons. >>> But it can be changed to SYNC to guarantee that when the request >>> finishes that the session was replicated. >>> >>> That wouldn't be possible if you could no longer switch between >>> ASYNC/SYNC with just a configuration change. >>> >>> -Dennis >>> >>> On 01/31/2014 01:08 AM, Galder Zamarre?o wrote: >>>> Hi all, >>>> >>>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >>>> >>>> Instead, whoever wants to store something asyncronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. 
This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>>> >>>> WDYT? >>>> >>>> Cheers, >>>> -- >>>> Galder Zamarre?o >>>> galder at redhat.com >>>> twitter.com/galderz >>>> >>>> Project Lead, Escalante >>>> http://escalante.io >>>> >>>> Engineer, Infinispan >>>> http://infinispan.org >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 19 02:12:08 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 19 Feb 2014 08:12:08 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On 03 Feb 2014, at 19:01, Dan Berindei wrote: > > > > On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >>>> For sync we would want to invoke directly to avoid context switching. 
> >>> I think you haven't properly understood what I was talking about: the > >>> putAsync should not switch context at all in the ideal design. It should > >>> traverse through the interceptors all the way down (logically, in > >>> current behaviour), invoke JGroups async API and jump out. Then, as soon > >>> as the response is received, the thread which delivered it should > >>> traverse the interceptor stack up (again, logically), and fire the future. > > A Future doesn't make much sense with an async transport. The problem > > is with an async transport you never get back a response so you never > > know when the actual command is completed and thus a Future is > > worthless. The caller wouldn't know if they could rely on the use of > > the Future or not. > > You're right, there's one important difference between putAsync and put > with async transport: in the first case you can find out when the > request is completed while you cannot with the latter. Not requiring the > ack can be an important optimization. I think that both versions are > very valid: first mostly for bulk operations = reduction of latency, > second for modifications that are acceptable to fail without handling that. > I had the first case in my mind when talking about async operations, and > there the futures are necessary. > > A couple more differences: > 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... > 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. If there?s any relationship between both puts for the caller thread, the caller must make sure that the second put is only called after the first has completed. If there?s separate threads calling it and it relies on this, it should call replace the second time, i.e. replaceAsync(k, v1, v2) to get the guarantees it wants. 
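Galder's rule — if the second put must win, the caller sequences it after the first completes — can be sketched with java.util.concurrent.CompletableFuture standing in for the future returned by putAsync. The cache here is a plain ConcurrentHashMap and the names are purely illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Main {
    static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Stand-in for putAsync: the write completes at some arbitrary later time.
    static CompletableFuture<Void> putAsync(String k, String v) {
        return CompletableFuture.runAsync(() -> cache.put(k, v));
    }

    // Two bare calls `putAsync(k, v1); putAsync(k, v2);` may complete in
    // either order, so the cache could end up holding v1. Composing the
    // futures only issues the second put once the first has completed.
    static String orderedPuts() {
        putAsync("k", "v1")
            .thenCompose(ignored -> putAsync("k", "v2"))
            .join();
        return cache.get("k");
    }

    public static void main(String[] args) {
        System.out.println(orderedPuts()); // v2
    }
}
```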
What is really important is that the order in which they are executed in one node/replica is the same order in which they?re executed in all other nodes. This was something that was not maintained when async marshalling was enabled. > > > > > > Also it depends what you are trying to do with async. Currently async > > transport is only for sending messages to another node, we never think > > of when we are the owning node. In this case the calling thread would > > have to go down the interceptor stack and acquire any locks if it is > > the owner, thus causing this "async" to block if you have any > > contention on the given key. The use of another thread would allow > > the calling thread to be able to return immediately no matter what > > else is occurring. Also I don't see what is so wrong about having a > > context switch to run something asynchronously, we shouldn't have a > > context switch to block the user thread imo, which is very possible > > with locking. > > This is an important notice! Locking would complicate the design a lot, > because the thread in "async" mode should do only tryLocks - if this > fails, further processing should be dispatched to another thread. Not > sure if this could be implemented at all, because the thread may be > blocked inside JGroups as well (async API is about receiving the > response asynchronously, not about sending the message asynchronously). > > I don't say that the context switch is that bad. My concern is that you > have a very limited amount of requests that can be processed in > parallel. I consider a "request" something pretty lightweight in concept > - but one thread per request makes this rather heavyweight stuff. > > We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. 
But the feeling I got was that neither is going to make it into 7.0. > > > > > >> +1 much cleaner, I love it. Actually wasn't aware the current code > >> didn't do this :-( > > This is what the current async transport does, but it does nothing with Futures. > > Nevermind the futures, this is not the important part. It's not about > async transport neither, it's about async executors. > (okay, the thread was about dropping async transport, I have hijacked it) > > Radim > > -- > Radim Vansa > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From lthon at redhat.com Wed Feb 19 04:20:09 2014 From: lthon at redhat.com (Ladislav Thon) Date: Wed, 19 Feb 2014 10:20:09 +0100 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: <53037E03.9020502@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <53037E03.9020502@redhat.com> Message-ID: <53047749.5090602@redhat.com> On 18.2.2014 16:36, Vladimir Blagojevic wrote: > On 2/18/2014, 4:59 AM, Dan Berindei wrote: >> >> The limitation we have now is that in the reduce phase, the entire >> list of values for one intermediate key must be in memory at once. I >> think Hadoop only loads a block of intermediate values in memory at >> once, and can even sort the intermediate values (with a user-supplied >> comparison function) so that the reduce function can work on a sorted >> list without loading the values in memory itself. 
>> >> > Dan and others, > > This is where Sanne's idea comes into play. Why collect entire list of > intermediate values for each intermediate key and then invoke reduce on > those values when we can invoke reduce each time new intermediate value > gets inserted? I don't know about MR in Infinispan, but MR in CouchDB is doing a very similar thing to what you describe. In order to actually get a final result, they have to do an entire tree of reductions, and the reduce function has to distinguish between a "first-level" reduce (on bare values) and rereduce (on intermediate results from previous reductions). They are _not_ always the same, and it's fairly confusing. LT > > https://issues.jboss.org/browse/ISPN-3999 > > Cheers, > Vladimir > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From sanne at infinispan.org Wed Feb 19 06:03:35 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 19 Feb 2014 11:03:35 +0000 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On 19 February 2014 07:12, Galder Zamarre?o wrote: > > On 03 Feb 2014, at 19:01, Dan Berindei wrote: > >> >> >> >> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: >> >>>> For sync we would want to invoke directly to avoid context switching. >> >>> I think you haven't properly understood what I was talking about: the >> >>> putAsync should not switch context at all in the ideal design. It should >> >>> traverse through the interceptors all the way down (logically, in >> >>> current behaviour), invoke JGroups async API and jump out. 
Then, as soon >> >>> as the response is received, the thread which delivered it should >> >>> traverse the interceptor stack up (again, logically), and fire the future. >> > A Future doesn't make much sense with an async transport. The problem >> > is with an async transport you never get back a response so you never >> > know when the actual command is completed and thus a Future is >> > worthless. The caller wouldn't know if they could rely on the use of >> > the Future or not. >> >> You're right, there's one important difference between putAsync and put >> with async transport: in the first case you can find out when the >> request is completed while you cannot with the latter. Not requiring the >> ack can be an important optimization. I think that both versions are >> very valid: first mostly for bulk operations = reduction of latency, >> second for modifications that are acceptable to fail without handling that. >> I had the first case in my mind when talking about async operations, and >> there the futures are necessary. >> >> A couple more differences: >> 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... >> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. > > If there?s any relationship between both puts for the caller thread, the caller must make sure that the second put is only called after the first has completed. Actually in such a case I would strongly expect Infinispan to keep the two operations in order. This is not to be pushed on user's responsibility. > > If there?s separate threads calling it and it relies on this, it should call replace the second time, i.e. replaceAsync(k, v1, v2) to get the guarantees it wants. 
> > What is really important is that the order in which they are executed in one node/replica is the same order in which they're executed in all other nodes. This was something that was not maintained when async marshalling was enabled. +1000 But also I'd stress that any sync operation should have a Future returned; someone in this long thread suggested having an option to drop it, for example to speed up bulk imports, but I really can't see a scenario in which I wouldn't want to know about a failure. Let's not make the same mistake that made MongoDB so "popular" ;-) Bulk imports can still be made efficient without strictly needing to go to these lengths. Sanne > >> >> >> > >> > Also it depends what you are trying to do with async. Currently async >> > transport is only for sending messages to another node, we never think >> > of when we are the owning node. In this case the calling thread would >> > have to go down the interceptor stack and acquire any locks if it is >> > the owner, thus causing this "async" to block if you have any >> > contention on the given key. The use of another thread would allow >> > the calling thread to be able to return immediately no matter what >> > else is occurring. Also I don't see what is so wrong about having a >> > context switch to run something asynchronously, we shouldn't have a >> > context switch to block the user thread imo, which is very possible >> > with locking.
My concern is that you >> have a very limited amount of requests that can be processed in >> parallel. I consider a "request" something pretty lightweight in concept >> - but one thread per request makes this rather heavyweight stuff. >> >> We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. But the feeling I got was that neither is going to make it into 7.0. >> >> >> > >> >> +1 much cleaner, I love it. Actually wasn't aware the current code >> >> didn't do this :-( >> > This is what the current async transport does, but it does nothing with Futures. >> >> Nevermind the futures, this is not the important part. It's not about >> async transport neither, it's about async executors. >> (okay, the thread was about dropping async transport, I have hijacked it) >> >> Radim >> >> -- >> Radim Vansa >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From dan.berindei at gmail.com Wed Feb 19 08:22:23 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 19 Feb 2014 15:22:23 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <53037D40.6060101@gmail.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> Message-ID: On Tue, Feb 18, 2014 at 5:33 PM, Evangelos Vazaios wrote: > > On 02/18/2014 04:39 PM, Dan Berindei wrote: > > On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios > wrote: > > > >> On 02/18/2014 01:40 PM, Dan Berindei wrote: > >>> On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios >>> wrote: > >>> > >>>> Hi Radim, > >>>> > >>>> Since Hadoop is the most popular implementation of MapReduce I will > give > >>>> a brief overview of how it works and then I'll provide with an example > >>>> where the reducers must run over the whole list of values with the > same > >>>> key. > >>>> > >>>> Hadoop MR overview. > >>>> > >>>> MAP > >>>> > >>>> 1) Input file(s) are split into pieces of 64MB > >>>> 2) For each split hadoop creates one map task and then assign the task > >>>> to a cluster node > >>>> 3) The splits are read as key,value pairs and the map function of > Mapper > >>>> is called. The mapper can output arbitrary number of intermediate > >>>> key,value pairs > I forgot to ask about this... we already have the entries stored as key,value pairs, so we expect the data to be already in the cache. That means there is no ordering in the inputs, and the mapper can't rely on sequential inputs to be related. Would you consider that to be a reasonable expectation? > >>>> 4) the output from the mapper is stored in a buffer in memory. After a > >>>> certain threshold is reached the pairs are sorted by key and if there > is > >>>> a combiner it is run on the pairs that have the same key. Then, the > >>>> output is flushed on the HDFS. > >>>> > >>> > >>> Ok, so Hadoop runs the combiner more or less concurrently with the > >> mappers. 
> >>> > >>> I'm curious if there are any M/R tasks that benefit from sorting > the > >>> keys here, we just put the intermediate values in a Map>. > We > >>> could do about the same by passing this map (or rather each entry in > the > >>> map) to the combiner when it reaches a certain threshold, but I'm not > >>> convinced about the need to sort it. > >>> > >> Well there are algorithms that make use of it. A graph > >> algorithm can take advantage of it, where the graph is split into k partitions > >> and each partition is assigned to one Mapper and Reducer. Mappers > >> compute the outgoing messages and output them to reducers. Then, > >> reducers can read the partition file sequentially to update the > >> vertices. This is just one use case that came to my mind. > >> > > > > I thought the partitioning only happens during the shuffle phase, and > > mappers/combiners don't know about partitions at all? > > I understand that reducers may need the intermediary keys to be sorted, > I'm > > asking about the combiners, since even if the keys from one block are > > sorted, the complete list of keys they receive is not sorted (unless a > new > > combiner is created for each input block). > You are absolutely right, partitioning happens during the shuffle phase > and mappers/combiners do not know about partitions. Did I say something > different? > > > My initial question was whether there is a real need to sort the keys before calling the combiner. So when you presented the example with the graph being split in k partitions, I got a bit confused and I thought combiners might know about partitions, too. > >> > >>> > >>>> SHUFFLE > >>>> > >>>> Hadoop decides the Reducer that should process each key by running a > >>>> partitioner. 
The default partitioner decides in the following way: > >>>> reducer = intermediateKey.hashCode() % numberOfReducers > >>>> Finally, the intermediate key,value pairs are sent to the reducers > >>>> > >>> > >>> Is this algorithm set in stone, in that some M/R tasks rely on it? In > our > >>> impl, the user could use grouping to direct a set of intermediate keys > to > >>> the same node for reducing, but otherwise the reducing node is more or > >> less > >>> random. > >>> > >> The default partitioner does exactly that; check the actual code for > >> Hadoop 1.2.1 here > >> http://goo.gl/he9yHO > >> > > > > So API documentation doesn't specify it, but users still rely on this > > particular behaviour? > > > > BTW, is there always one reducer on each node, or can there be multiple > > reducers on each node? If it's the latter, it should be relatively easy > to > > model this in Infinispan using grouping. If it's the former, I'm not so > > sure... > > > Actually, the configuration of the MapReduce job (MapReduce task in > infinispan) defines the number of reducers and is programmatically > configurable. The short answer to your question is the latter: multiple > Reduce tasks are assigned to nodes almost equally. > Ok, partitioning sounds like something we could do in Infinispan. Partitioning seems like a pretty big deal in Hadoop M/R descriptions, so implementing it should be quite useful. 
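The default partitioning rule quoted above can be written out as a tiny self-contained method. One detail the formula in the mail glosses over: Hadoop's actual HashPartitioner masks off the sign bit first, so a negative hashCode() cannot yield a negative partition index — the sketch below includes that mask.

```java
public class DefaultPartitioner {
    // Mirrors the rule "reducer = key.hashCode() % numberOfReducers",
    // with the sign bit masked so the result is always in range.
    public static int reducerFor(Object intermediateKey, int numberOfReducers) {
        return (intermediateKey.hashCode() & Integer.MAX_VALUE) % numberOfReducers;
    }

    public static void main(String[] args) {
        // Every key maps to a stable reducer index in [0, numberOfReducers).
        for (String key : new String[] {"alpha", "beta", "gamma"}) {
            System.out.println(key + " -> reducer " + reducerFor(key, 4));
        }
    }
}
```

Because the mapping depends only on the key's hashCode, the same key always lands on the same reducer, which is what makes the shuffle deterministic.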
Moreover, the call to the reducer is > >> reduce(Key key, Iterable values); I cannot think of a way that the > >> order is lost. > >>> > >> > > > > Right, the call to the reducer is with a single key, but I'm assuming the > > order of the calls matters (e.g. because the reducer keeps some internal > > state across reduce() calls), otherwise there's no point in sorting the > > keys. Calling the same reducer from multiple threads (like we do) would > > definitely mess up the order of the calls. > > > > ATM we only have one reducer per node, which can be called from multiple > > threads, but it shouldn't be too hard to allow multiple reducers per node > > and to run each of them in a single thread. > > > I believe the sorting is done in order to group the values with the same key: > since large amounts of data are stored in files, the easiest way to group is > to sort and then group values with the same keys. > Yeah, I realized that my idea of keeping state between reduce() calls is kind of tricky to use, because you'd have to insert a sentinel value in each partition, and make sure that after the sorting the sentinel value will come last, in order to flush the final results to the output. I see Hadoop does offer some stuff to keep global state, like counters, so perhaps it's not even necessary. > > > >>> > >>>> 2) for each Key,List the reduce function of the reducer is > >>>> called. Reducer can also emit an arbitrary number of key,value pairs > >>>> > >>> > >>> We limit the reducer (and the combiner) to emit a single value, which > is > >>> paired with the input key. We may need to lift this restriction, if > only > >> to > >>> make porting/adapting tasks easier. > >>> > >>> > >>>> > >>>> Additionally, Hadoop lets you customize almost every aspect of the > code > >>>> run from how the input is split and read as key value pairs to how it > is > >>>> partitioned and sorted. > >>>> > >>> > >>> Does that mean you can sort the values as well? 
I was thinking of each > >>> reduce() call as independent, and then only the order of values for one > >>> intermediate key would be relevant. I guess some tasks may require > >> keeping > >>> state across all the reduce() calls and then the order of key matters, > >> but > >>> then the reduce phase can't be parallelized, either across the cluster > or > >>> on a single node. > >> > >> I was not very clear here. You can set the partitioner for a specific > >> job. You may also set the key comparator, as a result change the way > >> that intermediate keys are sorted. Additionally, one can change how keys > >> are grouped into one reduce call by setting the GroupComparator class. A > >> simple example would be to have sales(date,amount) and you want to > >> create totals for each month of the year. > >> so for the key: (year,month) and value: amount. > >> by overriding the keyClass hashCode function you can send all the > >> intermediate pairs with the same year to the same reducer > >> > >> and then you can set the groupComparator to group together all the > >> values with the same year. > >> > > > > You mean set the groupComparator to group together all the values with > the > > same month? I don't think so, because the key is already (year, month). > But > > if you wanted to collect the totals for each year you could just use the > > year as the intermediary key. So I don't quite understand how your > example > > is supposed to work. > Well you can do that as well, but I meant to group all the months of the > same year in one reduce call. The idea is that you want to receive in > one reduce the values for one year and the values for that year to be > sorted by month. > Ok, I didn't get it because I was looking at the problem from the other way around: if I'd want the values to be sorted, I'd include the month in the value and configure sorting for the values. But with Hadoop's streaming model it's probably easier to always sort by the keys. 
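The (year, month) sales example above can be simulated without Hadoop. This sketch uses made-up data and plain TreeMaps to show the net effect of partitioning by year while keeping months sorted — what the custom partitioner plus GroupComparator achieve in Hadoop — and assumes a recent JDK for records.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class YearMonthGrouping {
    // One intermediate key per sale: (year, month); the value is the amount.
    public record YearMonth(int year, int month) {}

    // Group by year only (what overriding hashCode achieves in the mail),
    // then keep months sorted so one "reduce call" per year sees its
    // months in order.
    public static SortedMap<Integer, SortedMap<Integer, Integer>> totals(
            Map<YearMonth, List<Integer>> sales) {
        SortedMap<Integer, SortedMap<Integer, Integer>> byYear = new TreeMap<>();
        for (Map.Entry<YearMonth, List<Integer>> e : sales.entrySet()) {
            int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
            byYear.computeIfAbsent(e.getKey().year(), y -> new TreeMap<>())
                  .merge(e.getKey().month(), sum, Integer::sum);
        }
        return byYear;
    }

    public static void main(String[] args) {
        Map<YearMonth, List<Integer>> sales = Map.of(
            new YearMonth(2013, 2), List.of(10, 5),
            new YearMonth(2013, 1), List.of(7),
            new YearMonth(2014, 1), List.of(3));
        System.out.println(totals(sales)); // {2013={1=7, 2=15}, 2014={1=3}}
    }
}
```

In real Hadoop the per-year grouping would come from a partitioner/GroupComparator pair rather than nested maps; the sketch only shows the resulting data shape.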
> > > Besides, each reduce() call receives just one key, if you have keys > (2013, > > 1) and (2013, 2) and the groupComparator decides they should map to the > > same group, which key does the reducer see? I think a regular equals() > > should be good enough for us here, since we already need equals() in > order > > to put the intermediary keys in the intermediary cache. > > > > I am not very good with examples; you can check this > > https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-8/sorting > . > It is more or less the same problem with a different setting. > Sorry, I didn't get too much from that example either, I gave up after the second "registering is fun" popup :) One last question: with Hadoop I imagine it's quite easy to leave the results of the M/R job on the distributed FS and start a new job to M/R from that. Do you think it would be important to offer something similar in Infinispan (i.e. put the result of the reducers in a cache instead of returning it to the user)? > Cheers > > Dan > > > > > Cheers > Evangelos > > > >> > >> Cheers, > >> Evangelos > >> > >> > >>> > >>>> A simple example is group by and computing an average over the grouped > >>>> values. Let the dataset be webpages (url,domain,sentiment) and we want > >>>> to compute the average sentiment for each domain in the dataset. Then > the > >>>> mapper for each webpage wp will run > >>>> map(wp.url,wp): > >>>> emit(wp.domain,wp.sentiment) > >>>> > >>>> and in reducer: > >>>> reduce(domain,Iterable values): > >>>> counter = 0 > >>>> sum = 0 > >>>> while(values.hasNext()) > >>>> counter++; > >>>> sum += values.next() > >>>> emit(domain,sum/counter) > >>>> > >>>> I know that this approach is not optimized. But, I wanted to give a simple > >>>> example. 
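The pseudocode above translates into self-contained Java roughly as follows. The `Webpage` record and the sample data are made up for illustration, and plain in-memory maps stand in for the cache and the shuffle.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AverageSentiment {
    public record Webpage(String url, String domain, double sentiment) {}

    // map(wp.url, wp): emit(wp.domain, wp.sentiment) -- grouped by key,
    // as the shuffle would do.
    public static Map<String, List<Double>> mapPhase(Collection<Webpage> pages) {
        Map<String, List<Double>> intermediate = new HashMap<>();
        for (Webpage wp : pages) {
            intermediate.computeIfAbsent(wp.domain(), d -> new ArrayList<>())
                        .add(wp.sentiment());
        }
        return intermediate;
    }

    // reduce(domain, values): emit(domain, sum / counter)
    public static Map<String, Double> reducePhase(Map<String, List<Double>> intermediate) {
        Map<String, Double> averages = new HashMap<>();
        intermediate.forEach((domain, values) -> {
            double sum = 0;
            for (double v : values) sum += v;
            averages.put(domain, sum / values.size());
        });
        return averages;
    }

    public static void main(String[] args) {
        List<Webpage> pages = List.of(
            new Webpage("a.example.com/1", "example.com", 0.8),
            new Webpage("a.example.com/2", "example.com", 0.4),
            new Webpage("other.org/1", "other.org", -0.2));
        // Average sentiment per domain.
        System.out.println(reducePhase(mapPhase(pages)));
    }
}
```

As the thread notes, this keeps every raw value around until the reduce phase, which is exactly what a combiner avoids.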
> >>>> > >>> > >>> I think it can also be optimized to use a combiner, if we emit a > (domain, > >>> counter, sum) tuple :) > >> > >>> > >>> > >>> > >>>> Dan, only the the values for one intermediate key must be in memory? > or > >>>> all the intermediate key,value pairs that are assigned to one reducer > >>>> must be in memory? > >>>> > >>> > >>> With the default configuration, all the key/value pairs assigned to one > >>> reducer must be in memory. But one can define the __tmpMapReduce cache > in > >>> the configuration and configure eviction with a cache store (note that > >>> because of how our eviction works, the actual container size is at > least > >>> concurrencyLevel rounded up to the next power of 2). The problem is > that > >>> there is only one configuration for all the M/R tasks [1]. > >>> > >>> Note that because we only run the combiner after the mapping phase is > >>> complete, we do need to keep in memory all the results of the mapping > >> phase > >>> from that node (those are not stored in a cache). I've created an issue > >> in > >>> JIRA for this [2]. > >>> > >>> Cheers > >>> Dan > >>> > >>> [1] https://issues.jboss.org/browse/ISPN-4021 > >>> [2] https://issues.jboss.org/browse/ISPN-4022 > >>> > >>> > >>> > >>>> Cheers, > >>>> Evangelos > >>>> > >>>> On 02/18/2014 11:59 AM, Dan Berindei wrote: > >>>>> Radim, this is how our M/R algorithm works (Hadoop may do it > >>>> differently): > >>>>> > >>>>> * The mapping phase generates a Map> on > >> each > >>>>> node (Int meaning intermediate). > >>>>> * In the combine (local reduce) phase, a combine operation takes as > >> input > >>>>> an IntKey and a Collection with only the values that were > >>>>> produced on that node. > >>>>> * In the (global) reduce phase, all the intermediate values for each > >> key > >>>>> are merged, and a reduce operation takes an intermediate key and a > >>>> sequence > >>>>> of *all* the intermediate values generated for that key. 
These reduce > >>>>> operations are completely independent, so each intermediate key can > be > >>>>> mapped to a different node (distributed reduce), while still having > >>>> access > >>>>> to all the intermediate values at once. > >>>>> * In the end, the collator takes the Map from the > >>>> reduce > >>>>> phase and produces a single value. > >>>>> > >>>>> If a combiner can be used, then I believe it can also be run in > >> parallel > >>>>> with a LinkedBlockingQueue between the mapper and the combiner. But > >>>>> sometimes the reduce algorithm can only be run on the entire > collection > >>>> of > >>>>> values (e.g if you want to find the median, or a percentile). > >>>>> > >>>>> The limitation we have now is that in the reduce phase, the entire > list > >>>> of > >>>>> values for one intermediate key must be in memory at once. I think > >> Hadoop > >>>>> only loads a block of intermediate values in memory at once, and can > >> even > >>>>> sort the intermediate values (with a user-supplied comparison > function) > >>>> so > >>>>> that the reduce function can work on a sorted list without loading > the > >>>>> values in memory itself. 
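The combiner optimisation mentioned earlier — emitting a (counter, sum) partial instead of the raw values — can be sketched like this. Because the merge is associative, per-node combine results can be merged globally without shipping every value; the two "nodes" below are simulated, not actual cluster members.

```java
import java.util.List;

public class CombinerAverage {
    // Partial aggregate emitted by the combiner: mergeable and tiny,
    // unlike the raw list of values.
    public record CountSum(long count, double sum) {
        public CountSum merge(CountSum other) {
            return new CountSum(count + other.count, sum + other.sum);
        }
        public double average() { return sum / count; }
    }

    // Combine phase: collapse one node's local values for a key into a partial.
    public static CountSum combine(List<Double> localValues) {
        CountSum acc = new CountSum(0, 0);
        for (double v : localValues) acc = acc.merge(new CountSum(1, v));
        return acc;
    }

    // Reduce phase: merge the per-node partials for the same key.
    public static double reduce(List<CountSum> partials) {
        CountSum acc = new CountSum(0, 0);
        for (CountSum p : partials) acc = acc.merge(p);
        return acc.average();
    }

    public static void main(String[] args) {
        // Two "nodes", each combining its local values for the same key.
        CountSum node1 = combine(List.of(1.0, 2.0, 3.0));
        CountSum node2 = combine(List.of(5.0));
        System.out.println(reduce(List.of(node1, node2))); // 2.75
    }
}
```

Note that this trick only works for algebraic aggregates like averages; as Dan points out, a median or percentile still needs all the values at once.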
> >>>>> > >>>>> Cheers > >>>>> Dan > >>>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/6690d65d/attachment-0001.html From dan.berindei at gmail.com Wed Feb 19 08:43:36 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 19 Feb 2014 15:43:36 +0200 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: On Wed, Feb 19, 2014 at 1:03 PM, Sanne Grinovero wrote: > On 19 February 2014 07:12, Galder Zamarre?o wrote: > > > > On 03 Feb 2014, at 19:01, Dan Berindei wrote: > > > >> > >> > >> > >> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >> >>>> For sync we would want to invoke directly to avoid context > switching. > >> >>> I think you haven't properly understood what I was talking about: > the > >> >>> putAsync should not switch context at all in the ideal design. 
It > should > >> >>> traverse through the interceptors all the way down (logically, in > >> >>> current behaviour), invoke JGroups async API and jump out. Then, as > soon > >> >>> as the response is received, the thread which delivered it should > >> >>> traverse the interceptor stack up (again, logically), and fire the > future. > >> > A Future doesn't make much sense with an async transport. The problem > >> > is with an async transport you never get back a response so you never > >> > know when the actual command is completed and thus a Future is > >> > worthless. The caller wouldn't know if they could rely on the use of > >> > the Future or not. > >> > >> You're right, there's one important difference between putAsync and put > >> with async transport: in the first case you can find out when the > >> request is completed while you cannot with the latter. Not requiring the > >> ack can be an important optimization. I think that both versions are > >> very valid: first mostly for bulk operations = reduction of latency, > >> second for modifications that are acceptable to fail without handling > that. > >> I had the first case in my mind when talking about async operations, and > >> there the futures are necessary. > >> > >> A couple more differences: > >> 1. You can't do commitAsync(), but you can configure the commit to be > replicated asynchronously (1PC). Although we did talk about removing that > option... > >> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering > between the two and you might end up with k=v1 in the cache. > > > > If there's any relationship between both puts for the caller thread, the > caller must make sure that the second put is only called after the first > has completed. > > Actually in such a case I would strongly expect Infinispan to keep the > two operations in order. This is not to be pushed on user's > responsibility. > I think you're talking about some other kind of putAsync(k, v) than we have now... 
all the work in putAsync happens on a separate thread, so there is no ordering between two separate putAsync calls whatsoever. > > > > If there's separate threads calling it and it relies on this, it should > call replace the second time, i.e. replaceAsync(k, v1, v2) to get the > guarantees it wants. > > > > What is really important is that the order in which they are executed in > one node/replica is the same order in which they're executed in all other > nodes. This was something that was not maintained when async marshalling > was enabled. > > +1000 > > But also I'd stress that any sync operation should have a Future > returned, someone in this long thread suggested to have an option to > drop it for example to speedup bulk imports, but I really can't see a > scenario in which I wouldn't want to know about a failure. Let's not > do the same mistake that made MongoDB so "popular" ;-) > Bulk imports can still be mad efficient without strictly needing to go > these lenghts. > You mean if the operation is synchronous, but the cache store/replication is async? I don't see how sync operations could return a Future, since most of them already have a return value. Bulk imports could certainly use putAsync(k, v), and that would indeed return a Future. > > Sanne > > > > > >> > >> > >> > > >> > Also it depends what you are trying to do with async. Currently async > >> > transport is only for sending messages to another node, we never think > >> > of when we are the owning node. In this case the calling thread would > >> > have to go down the interceptor stack and acquire any locks if it is > >> > the owner, thus causing this "async" to block if you have any > >> > contention on the given key. The use of another thread would allow > >> > the calling thread to be able to return immediately no matter what > >> > else is occurring. 
Also I don't see what is so wrong about having a > >> > context switch to run something asynchronously, we shouldn't have a > >> > context switch to block the user thread imo, which is very possible > >> > with locking. > >> > >> This is an important notice! Locking would complicate the design a lot, > >> because the thread in "async" mode should do only tryLocks - if this > >> fails, further processing should be dispatched to another thread. Not > >> sure if this could be implemented at all, because the thread may be > >> blocked inside JGroups as well (async API is about receiving the > >> response asynchronously, not about sending the message asynchronously). > >> > >> I don't say that the context switch is that bad. My concern is that you > >> have a very limited amount of requests that can be processed in > >> parallel. I consider a "request" something pretty lightweight in concept > >> - but one thread per request makes this rather heavyweight stuff. > >> > >> We did talk in Farnborough/Palma about removing the current LockManager > with a queue-based structure like the one used for ordering total-order > transactions. And about removing the implicit stack in the current > interceptor stack with an explicit stack, to allow resuming a command > mid-execution. But the feeling I got was that neither is going to make it > into 7.0. > >> > >> > >> > > >> >> +1 much cleaner, I love it. Actually wasn't aware the current code > >> >> didn't do this :-( > >> > This is what the current async transport does, but it does nothing > with Futures. > >> > >> Nevermind the futures, this is not the important part. It's not about > >> async transport neither, it's about async executors. 
> >> (okay, the thread was about dropping async transport, I have hijacked > it) > >> > >> Radim > >> > >> -- > >> Radim Vansa > >> JBoss DataGrid QA > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Galder Zamarre?o > > galder at redhat.com > > twitter.com/galderz > > > > Project Lead, Escalante > > http://escalante.io > > > > Engineer, Infinispan > > http://infinispan.org > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/297a4821/attachment.html From vblagoje at redhat.com Wed Feb 19 10:26:37 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 19 Feb 2014 10:26:37 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> Message-ID: <5304CD2D.1040909@redhat.com> On 2/19/2014, 8:22 AM, Dan Berindei wrote: > > > Sorry, I didn't get too much from that example either, I gave up after > the second "registering is fun" popup :) > > One last question: with Hadoop I imagine it's quite easy to leave the > results of the M/R job on the distributed FS and start a new job to > M/R from that. Do you think it would be important to offer something > similar in Infinispan (i.e. put the result of the reducers in a cache > instead of returning it to the user)? > > This is on our todo list https://issues.jboss.org/browse/ISPN-4002 From dereed at redhat.com Wed Feb 19 11:44:32 2014 From: dereed at redhat.com (Dennis Reed) Date: Wed, 19 Feb 2014 10:44:32 -0600 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> Message-ID: <5304DF70.2050603@redhat.com> On 02/19/2014 12:57 AM, Galder Zamarre?o wrote: > On 31 Jan 2014, at 08:32, Dennis Reed wrote: > >> It would be a loss of functionality. >> >> As a common example, the AS web session replication cache is configured >> for ASYNC by default, for performance reasons. >> But it can be changed to SYNC to guarantee that when the request >> finishes that the session was replicated. >> >> That wouldn't be possible if you could no longer switch between >> ASYNC/SYNC with just a configuration change. > I disagree :). > > AS could abstract that configuration detail. IOW, if all Infinispan returned was Futures, AS or any other client application, has the choice in their hands: do they wait for the future to complete or not? If they do, they?re SYNC, if not ASYNC. 
AS can still expose this and no functionality is lost. Yes, the functionality is still lost. Your suggestion is just to re-implement the functionality over and over in each ISPN caller. :) > What happens is that SYNC/ASYNC decision stops being a configuration option (bad, bad, bad) and becomes an actual programming decision Infinispan clients must address (good, good, good). This really depends on the client. For the AS session replication use case, a config option is good, good, good. But re-implementing the same functionality in every caller that may want it to be a config option is bad, bad, bad. -Dennis >> -Dennis >> >> On 01/31/2014 01:08 AM, Galder Zamarreño wrote: >>> Hi all, >>> >>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. >>> >>> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>> >>> WDYT? >>> >>> From dan.berindei at gmail.com Wed Feb 19 12:43:36 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 19 Feb 2014 19:43:36 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: <5304CD2D.1040909@redhat.com> References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> <5304CD2D.1040909@redhat.com> Message-ID: On Wed, Feb 19, 2014 at 5:26 PM, Vladimir Blagojevic wrote: > On 2/19/2014, 8:22 AM, Dan Berindei wrote: > > > > > > Sorry, I didn't get too much from that example either, I gave up after > > the second "registering is fun" popup :) > > > > One last question: with Hadoop I imagine it's quite easy to leave the > > results of the M/R job on the distributed FS and start a new job to > > M/R from that. Do you think it would be important to offer something > > similar in Infinispan (i.e. put the result of the reducers in a cache > > instead of returning it to the user)? > > > > > > This is on our todo list https://issues.jboss.org/browse/ISPN-4002 > Cool, I thought I saw it somewhere but I didn't get to actually search in JIRA for it :) Vladimir, what do you think about the partitioning/sorting/grouping stuff? I'm not sure if it should be a priority for us: there are certainly Hadoop jobs that use those and would be pretty tricky to translate to our API, but on the other hand I'm sure most jobs are ok with an unordered Map as the output. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/2918017c/attachment-0001.html From vblagoje at redhat.com Wed Feb 19 14:08:20 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 19 Feb 2014 14:08:20 -0500 Subject: [infinispan-dev] MapReduce limitations and suggestions. 
In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> <5304CD2D.1040909@redhat.com> Message-ID: <53050124.2070002@redhat.com> On 2/19/2014, 12:43 PM, Dan Berindei wrote: > > > This is on our todo list https://issues.jboss.org/browse/ISPN-4002 > > > Cool, I thought I saw it somewhere but I didn't get to actually search > in JIRA for it :) > > Vladimir, what do you think about the partitioning/sorting/grouping > stuff? I'm not sure if it should be a priority for us: there are > certainly Hadoop jobs that use those and would be pretty tricky to > translate to our API, but on the other hand I'm sure most jobs are ok > with an unordered Map as the output. > > Cheers > Dan > Dan, I have to focus on the pending tasks in JIRA and in the meantime I'll read up on this subject of partitioning/sorting/grouping. If anyone else has some extra cycles then they are more than welcome to help out. Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140219/94e3ac45/attachment.html From vblagoje at redhat.com Wed Feb 19 15:45:45 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 19 Feb 2014 15:45:45 -0500 Subject: [infinispan-dev] Further dist.exec and M/R API improvements Message-ID: <530517F9.3060008@redhat.com> Hey guys, As some of you might know we have received additional requirements from the community and internally to add a few things to the dist.executors and map/reduce APIs. On the distributed executors front we need to enable distributed executors to store results into a cache directly rather than returning them to the invoker [1]. As soon as we introduce this API we also need an async mechanism to allow notifications of subtask completion/failure. 
I was thinking we add a concept of DistributedTaskExecutionListener which can be specified in DistributedTaskBuilder: DistributedTaskBuilder<T> executionListener(DistributedTaskExecutionListener<K, T> listener); We needed DistributedTaskExecutionListener anyway. All distributed tasks might use some feedback about task progress, completion/failure and so on. My proposal is roughly: public interface DistributedTaskExecutionListener<K, T> { void subtaskSent(Address node, Set<K> inputKeys); void subtaskFailed(Address node, Set<K> inputKeys, Exception e); void subtaskSucceeded(Address node, Set<K> inputKeys, T result); void allSubtasksCompleted(); } So much for that. If tasks do not use input keys these parameters would be empty sets. Now for [1] we need to add additional methods to DistributedExecutorService. We cannot specify the result cache in DistributedTaskBuilder as we are still bound to only submit methods in DistributedExecutorService that return futures and we don't want that. We need two new void methods: void submitEverywhere(DistributedTask<T> task, Cache<DistExecResultKey<K>, T> result); void submitEverywhere(DistributedTask<T> task, Cache<DistExecResultKey<K>, T> result, K... input); Now, why bother with DistExecResultKey? Well we have tasks that use input keys and tasks that don't. So the results cache could only be keyed by either keys or execution address, or a combination of those two. Therefore, DistExecResultKey could be something like: public interface DistExecResultKey<K> { Address getExecutionAddress(); K getKey(); } If you have a better idea how to address this aspect let us know. So much for distributed executors. For map/reduce we also have to enable storing of map/reduce task results into a cache [2] and allow users to specify a custom cache for intermediate results [3]. Part of task [2] is to allow notification about map/reduce task progress and completion.
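A rough, self-contained sketch of how such a listener might be consumed (method names follow the proposal above, with String standing in for Address so the sketch compiles on its own; the ProgressListener itself is invented for illustration):

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// The listener interface as proposed, with java.lang.String standing in for
// org.infinispan.remoting.transport.Address to keep the sketch self-contained.
interface DistributedTaskExecutionListener<K, T> {
    void subtaskSent(String node, Set<K> inputKeys);
    void subtaskFailed(String node, Set<K> inputKeys, Exception e);
    void subtaskSucceeded(String node, Set<K> inputKeys, T result);
    void allSubtasksCompleted();
}

// One plausible consumer: a progress tracker counting outstanding subtasks.
public class ProgressListener<K, T> implements DistributedTaskExecutionListener<K, T> {
    private final AtomicInteger sent = new AtomicInteger();
    private final AtomicInteger finished = new AtomicInteger();
    private volatile boolean done;

    public void subtaskSent(String node, Set<K> inputKeys) {
        sent.incrementAndGet();           // one more subtask in flight
    }
    public void subtaskFailed(String node, Set<K> inputKeys, Exception e) {
        finished.incrementAndGet();       // failures still count as finished
    }
    public void subtaskSucceeded(String node, Set<K> inputKeys, T result) {
        finished.incrementAndGet();
    }
    public void allSubtasksCompleted() {
        done = true;
    }

    public int outstanding() { return sent.get() - finished.get(); }
    public boolean isDone()  { return done; }
}
```

A task invoker would register such a listener via the proposed DistributedTaskBuilder.executionListener(...) and poll outstanding() for progress reporting.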
Just as in dist.executor I would add a MapReduceTaskExecutionListener interface: public interface MapReduceTaskExecutionListener { void mapTaskInitialized(Address executionAddress); void mapTaskSucceeded(Address executionAddress); void mapTaskFailed(Address executionTarget, Exception cause); void mapPhaseCompleted(); void reduceTaskInitialized(Address executionAddress); void reduceTaskSucceeded(Address executionAddress); void reduceTaskFailed(Address address, Exception cause); void reducePhaseCompleted(); } while MapReduceTask would have an additional method: public void execute(Cache<KOut, VOut> resultsCache); MapReduceTaskExecutionListener could be specified using the fluent MapReduceTask API, just as the intermediate cache would be: public MapReduceTask usingIntermediateCache(Cache<KOut, List<VOut>> tmpCache); thus addressing issue [3]. Let me know what you think, Vladimir [1] https://issues.jboss.org/browse/ISPN-4030 [2] https://issues.jboss.org/browse/ISPN-4002 [3] https://issues.jboss.org/browse/ISPN-4021 From vagvaz at gmail.com Thu Feb 20 03:37:04 2014 From: vagvaz at gmail.com (Evangelos Vazaios) Date: Thu, 20 Feb 2014 10:37:04 +0200 Subject: [infinispan-dev] MapReduce limitations and suggestions. In-Reply-To: References: <52FE31FF.5050507@gmail.com> <52FE3335.1070806@redhat.com> <52FE3C33.3070107@redhat.com> <5301BECC.7010901@redhat.com> <530320F9.300@redhat.com> <5303342A.1050800@gmail.com> <53034F5E.6060706@gmail.com> <53037D40.6060101@gmail.com> Message-ID: <5305BEB0.9000900@gmail.com> On 02/19/2014 03:22 PM, Dan Berindei wrote: > I forgot to ask about this... we already have the entries stored as > key,value pairs, so we expect the data to be already in the cache. That > means there is no ordering in the inputs, and the mapper can't rely on > sequential inputs to be related. Would you consider that to be a reasonable > expectation? > Yes, I have not encountered an algorithm in M/R that assumes such relations during the map phase.
Cheers, Evangelos From faseela.k at ericsson.com Thu Feb 20 04:11:29 2014 From: faseela.k at ericsson.com (Faseela K) Date: Thu, 20 Feb 2014 09:11:29 +0000 Subject: [infinispan-dev] How to add programmatic config to an exisitng xml configured cache Message-ID: Hi, I have some infinispan configurations available in "config.xml". After loading this configuration, I want to append some more configurations programmatically, using Configuration Builder. I am doing something like this : Configuration template = null; ConfigurationBuilder builder = new ConfigurationBuilder(); DefaultCacheManager manager = new DefaultCacheManager( "config.xml"); template = manager.getCacheConfiguration("evictionCache"); builder.read(template); builder.loaders().passivation(false).shared(false).preload(true) .addFileCacheStore().fetchPersistentState(true) .purgerThreads(3).purgeSynchronously(true) .ignoreModifications(false).purgeOnStartup(false) .location("tmp").async() .enabled(true).flushLockTimeout(15000).threadPoolSize(5) .singletonStore().enabled(true).pushStateWhenCoordinator(true) .pushStateTimeout(20000); manager.defineConfiguration("abcd", builder.build()); The problem with this code is, it's overwriting the evictionCache configuration. Can somebody help me to fix this issue? Thanks, Faseela -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140220/31a82d43/attachment.html From galder at redhat.com Thu Feb 20 06:37:13 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Thu, 20 Feb 2014 12:37:13 +0100 Subject: [infinispan-dev] [infinispan-internal] Introducing Infinispan OData server: Remote JSON documents querying In-Reply-To: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com> References: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com> Message-ID: Great work Tomas!! :) On 18 Feb 2014, at 13:35, Tomas Sykora wrote: > Hello all! 
:) > > It's the right time to make it a little bit more public and share some results of work on Infinispan OData server, finally! > This solution can serve as a proof of concept where we are able to remotely query JSON documents stored in Infinispan caches and using industrial standard and platform independent way of communication with the server (OData). > > There is still much to do/implement/improve in the server, but it is working as it is now. > > Check a blog post if you are interested: > http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-server.html > > Any feedback is more than welcome. > > + I'd like to say a big THANK YOU to all who supported me! > Mainly: JDG QE guys, Manik, Mircea, Sanne and Adrian. > It wouldn't be done without your patience and willingness to help me :-) > > Tomas > -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From rvansa at redhat.com Thu Feb 20 09:40:41 2014 From: rvansa at redhat.com (Radim Vansa) Date: Thu, 20 Feb 2014 15:40:41 +0100 Subject: [infinispan-dev] RadarGun 1.1.0.Final released Message-ID: <530613E9.3020602@redhat.com> Hi all, it has been a long time since last release of RadarGun. We have been using it intensively and developed many new features - 1.0.0 had 7,340 lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become multi-purpose tool, used for checking both performance and functionality of caches under stress. During 1.1.0 development, most parts of code changed beyond the beyonds, but we tried to keep the old configuration compatible. However, the design started to be rather limiting, and therefore, we have decided to make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x branch we will provide bugfixes, but all new features should go to 2.0.0. 
Some decoys for features expected for RadarGun 2.0.0: * non-homogenous clusters: client/server setups, cooperation of different versions of products, or easy setup of cross-site deployment with different configurations * abstracting from cache wrapper: you will be able to use RadarGun for more than just caches without any hacks ** current CacheWrapper interface will be designed to match JSR-107 javax.cache.Cache rather than java.util.Map * pluggable reporting: statistics will be directly multiplexed to configured reporters (again, without cheating on directories), reporters will provide the output formatted as CSV, HTML or even can deploy the results to external repository * merging local and distributed benchmark -> master + single slave within one JVM * better property parsing: evaluation of expressions, property replacement executed on slaves I hope you will like it! And enjoy 1.1.0.Final release now. Radim ------ Radim Vansa JBoss DataGrid QA From afield at redhat.com Thu Feb 20 09:49:28 2014 From: afield at redhat.com (Alan Field) Date: Thu, 20 Feb 2014 09:49:28 -0500 (EST) Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <530613E9.3020602@redhat.com> References: <530613E9.3020602@redhat.com> Message-ID: <2023235320.6731410.1392907768166.JavaMail.zimbra@redhat.com> Yes! Congratulations Radim on defeating Maven's Release Plugin! ----- Original Message ----- > From: "Radim Vansa" > To: "infinispan -Dev List" > Sent: Thursday, February 20, 2014 9:40:41 AM > Subject: [infinispan-dev] RadarGun 1.1.0.Final released > > Hi all, > > it has been a long time since last release of RadarGun. We have been > using it intensively and developed many new features - 1.0.0 had 7,340 > lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become > multi-purpose tool, used for checking both performance and functionality > of caches under stress. 
> > During 1.1.0 development, most parts of code changed beyond the beyonds, > but we tried to keep the old configuration compatible. However, the > design started to be rather limiting, and therefore, we have decided to > make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x > branch we will provide bugfixes, but all new features should go to 2.0.0. > > Some decoys for features expected for RadarGun 2.0.0: > > * non-homogenous clusters: client/server setups, cooperation of > different versions of products, or easy setup of cross-site deployment > with different configurations > * abstracting from cache wrapper: you will be able to use RadarGun for > more than just caches without any hacks > ** current CacheWrapper interface will be designed to match JSR-107 > javax.cache.Cache rather than java.util.Map > * pluggable reporting: statistics will be directly multiplexed to > configured reporters (again, without cheating on directories), reporters > will provide the output formatted as CSV, HTML or even can deploy the > results to external repository > * merging local and distributed benchmark -> master + single slave > within one JVM > * better property parsing: evaluation of expressions, property > replacement executed on slaves > > I hope you will like it! And enjoy 1.1.0.Final release now. 
> > Radim > > ------ > Radim Vansa JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > From dan.berindei at gmail.com Fri Feb 21 11:03:11 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 21 Feb 2014 18:03:11 +0200 Subject: [infinispan-dev] ClusteredListeners: message delivered twice In-Reply-To: References: <6F2DD24C-4712-48AF-ABCD-7CE280089875@redhat.com> Message-ID: On Mon, Feb 17, 2014 at 7:44 PM, William Burns wrote: > On Mon, Feb 17, 2014 at 7:53 AM, Sanne Grinovero > wrote: > > On 12 February 2014 10:40, Mircea Markus wrote: > >> Hey Will, > >> > >> With the current design, during a topology change, an event might be > delivered twice to a cluster listener. I think we might be able to identify > such situations (a node becomes a key owner as a result of the topology > change) and add this information to the event we send, e.g. a flag > "potentiallyDuplicate" or something like that. Event implementors might be > able to make good use of this, e.g. checking their internal state if an > event is redelivered or not. What do you think? Are there any other > more-than-once delivery situations we can't keep track of? > > I agree, this would be important to track. I have thus added a new > flag to listeners that is set to true when a modification, removal, or > create that is done on behalf of a command that was retried due to a > topology change during the middle of it. Also this gives the benefit > not just for cluster listeners but regular listeners, since we could > have double notification currently even. > > > > > I would really wish we would not push such a burden to the API > > consumer. If we at least had a modification counter associated with > > each entry this could help to identify duplicate triggers as well (on > > top of ordering of modification events as already discussed many > > times). 
> > The particular issue we have with listeners is when the > primary owner replicates the update to backup owners and then crashes > before the notification is sent. In this case we have no idea from > the originator's perspective whether the backup owner has the update. When > the topology changes the update will be persisted to the new owners > (possibly without notification). We could add a counter, however the > backup owner then has no idea if the primary owner has sent the > notification or not. Without adding some kind of 2PC to the primary > owner to tell the backup that it occurred, it won't know. However > this doesn't reliably tell the backup owner if the notification was > fired even if the node goes down during this period. Without > seriously rewriting our nontx dist code I don't see a viable way to do > this without the API consumer having to be alerted. > There's always going to be the possibility that a replication to one of the backup owners fails and the command is aborted after the listener was notified (but not on the successful backup owners). And even in tx mode, the listeners are notified during the prepare phase and not during the commit. So I don't think we'll ever be able to make listeners 100% reliable, but the "potentially duplicate" flag should be good enough. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140221/6eff2b4d/attachment.html From mmarkus at redhat.com Mon Feb 24 11:32:24 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 16:32:24 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <8955F382-8A6E-43AA-864E-1EC0C190654E@redhat.com> References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> <8955F382-8A6E-43AA-864E-1EC0C190654E@redhat.com> Message-ID: <34A01AED-0DDF-4171-9B83-BB3B6C9DF0E8@redhat.com> On Feb 17, 2014, at 5:35 PM, Galder Zamarreño wrote: > > On 30 Jan 2014, at 20:51, Mircea Markus wrote: > >> >> On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote: >> >>> >>> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote: >>> >>>> >>>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote: >>>> >>>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches right? Otherwise I am not fully understanding why they ask for a unified query. >>>>> Do you have written detailed use cases somewhere for me to better understand what is really requested? >>>> >>>> IMO from a user perspective, being able to run queries spanning several caches simplifies the programming model: each cache corresponding to a single entity type, with potentially different configuration. >>> >>> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. >> >> Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of querying with a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them.
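The type-filtering burden described above, sketched as a self-contained toy (plain Java streams stand in for a Mapper over cache entries; the classes are invented for illustration, not the Infinispan API):

```java
import java.util.Map;

// Minimal stand-ins for the entity types in the example.
class Car { final String colour; Car(String colour) { this.colour = colour; } }
class Person { final String name; Person(String name) { this.name = name; } }

public class GreenCarCounter {
    // With Cars and Persons in the same cache, the "mapper" has to know about
    // every other type stored there just to skip it -- and must be revisited
    // whenever a new type (Pets, ...) is added to the cache.
    public static long countGreen(Map<String, Object> heterogeneousCache) {
        return heterogeneousCache.values().stream()
                .filter(v -> v instanceof Car)        // type filtering forced on the task
                .map(v -> (Car) v)
                .filter(c -> "green".equals(c.colour))
                .count();
    }
}
```

With one cache per entity type, the iteration would be over a Cache<String, Car> directly and the instanceof guard disappears.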
Not only is it harder to write, it also discourages code reuse and makes it hard to maintain (if you add Pets to the same cache in the future you need to update the M/R code as well). And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expiry etc): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future. >> >> The way I see it - and very curious to see your opinion on this - following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spanning multiple caches are both useful and needed (same as a query spanning multiple tables). > > My opinion is that seeing it this way is limiting. A key/value store is schemaless. Your view is forcing a particular schema on how to structure things. I'm not forcing anything at all, people can still use a Cache if they want to. What I'm saying is that, especially for larger applications, grouping the types into caches makes a lot of sense for the users. > > I don't expect everyone to store everything in a single cache and of course there will be situations where it's not ideal or the best solution, such as in cases like the ones you mention above, but if you want to do it, for any of the reasons I or Paul mentioned in [1], it'd be nice to be able to do so. Of course, I don't plan to enforce this model at all, as it is useful. Just pondering on the way the domain model is split between caches. > > Cheers, > > [1] https://issues.jboss.org/browse/ISPN-3640 > >> >>> >>> Just yesterday I discovered this gem in Scala's Shapeless extensions [1]. This is experimental stuff but essentially it allows you to define which key/value type pairs a map will contain, and it does type checking at compile time. I almost wet my pants when I saw that ;) :p.
In the example, it defines a map as containing Int -> String, and String -> Int key/value pairs. If you try to add an Int -> Int, it fails compilation. >> >> Agreed the compile time check is pretty awesome :-) Still mix and matching types in a Map doesn't look great to me for ISPN. >> >>> >>> Java's type checking is not powerful enough to do this, and it's compilation logic is not extendable in the same way Scala macros does, but I think the fact that other languages are looking into this validates Paul's suggestion in [2], on top of all the benefits listed there. >>> >>> Cheers, >>> >>> [1] https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#heterogenous-maps >>> [2] https://issues.jboss.org/browse/ISPN-3640 >>> >>>> Besides the query API that would need to be extended to support accessing multiple caches, not sure what other APIs would need to be extended to take advantage of this? >>>> >>>>> >>>>> Emmanuel >>>>> >>>>> On 14 Jan 2014, at 12:59, Sanne Grinovero wrote: >>>>> >>>>>> Up this: it was proposed again today ad a face to face meeting. >>>>>> Apparently multiple parties have been asking to be able to run >>>>>> cross-cache queries. >>>>>> >>>>>> Sanne >>>>>> >>>>>> On 11 April 2012 12:47, Emmanuel Bernard wrote: >>>>>>> >>>>>>> On 10 avr. 2012, at 19:10, Sanne Grinovero wrote: >>>>>>> >>>>>>>> Hello all, >>>>>>>> currently Infinispan Query is an interceptor registering on the >>>>>>>> specific Cache instance which has indexing enabled; one such >>>>>>>> interceptor is doing all what it needs to do in the sole scope of the >>>>>>>> cache it was registered in. >>>>>>>> >>>>>>>> If you enable indexing - for example - on 3 different caches, there >>>>>>>> will be 3 different Hibernate Search engines started in background, >>>>>>>> and they are all unaware of each other. 
>>>>>>>> >>>>>>>> After some design discussions with Ales for CapeDwarf, but also >>>>>>>> calling attention on something that bothered me since some time, I'd >>>>>>>> evaluate the option to have a single Hibernate Search Engine >>>>>>>> registered in the CacheManager, and have it shared across indexed >>>>>>>> caches. >>>>>>>> >>>>>>>> Current design limitations: >>>>>>>> >>>>>>>> A- If they are all configured to use the same base directory to >>>>>>>> store indexes, and happen to have same-named indexes, they'll share >>>>>>>> the index without being aware of each other. This is going to break >>>>>>>> unless the user configures some tricky parameters, and even so >>>>>>>> performance won't be great: instances will lock each other out, or at >>>>>>>> best write in alternate turns. >>>>>>>> B- The search engine isn't particularly "heavy", still it would be >>>>>>>> nice to share some components and internal services. >>>>>>>> C- Configuration details which need some care - like injecting a >>>>>>>> JGroups channel for clustering - needs to be done right isolating each >>>>>>>> instance (so large parts of configuration would be quite similar but >>>>>>>> not totally equal) >>>>>>>> D- Incoming messages into a JGroups Receiver need to be routed not >>>>>>>> only among indexes, but also among Engine instances. This prevents >>>>>>>> Query to reuse code from Hibernate Search. >>>>>>>> >>>>>>>> Problems with a unified Hibernate Search Engine: >>>>>>>> >>>>>>>> 1#- Isolation of types / indexes. If the same indexed class is >>>>>>>> stored in different (indexed) caches, they'll share the same index. Is >>>>>>>> it a problem? I'm tempted to consider this a good thing, but wonder if >>>>>>>> it would surprise some users. Would you expect that? >>>>>>> >>>>>>> I would not expect that. Unicity in Hibernate Search is not defined per identity but per class + provided id. >>>>>>> I can see people reusing the same class as partial DTO and willing to index that. 
I can even see people >>>>>>> using the Hibernate Search programmatic API to index the "DTO" stored in cache 2 differently than the >>>>>>> domain class stored in cache 1. >>>>>>> I can concede that I am pushing a bit the use case towards bad-ish design approaches. >>>>>>> >>>>>>>> 2#- configuration format overhaul: indexing options won't be set on >>>>>>>> the cache section but in the global section. I'm looking forward to >>>>>>>> use the schema extensions anyway to provide a better configuration >>>>>>>> experience than the current . >>>>>>>> 3#- Assuming 1# is fine, when a search hit is found I'd need to be >>>>>>>> able to figure out from which cache the value should be loaded. >>>>>>>> 3#A we could have the cache name encoded in the index, as part >>>>>>>> of the identifier: {PK,cacheName} >>>>>>>> 3#B we actually shard the index, keeping a physically separate >>>>>>>> index per cache. This would mean searching on the joint index view but >>>>>>>> extracting hits from specific indexes to keep track of "which index".. >>>>>>>> I think we can do that but it's definitely tricky. >>>>>>>> >>>>>>>> It's likely easier to keep indexed values from different caches in >>>>>>>> different indexes. that would mean to reject #1 and mess with the user >>>>>>>> defined index name, to add for example the cache name to the user >>>>>>>> defined string. >>>>>>>> >>>>>>>> Any comment? 
>>>>>>>> >>>>>>>> Cheers, >>>>>>>> Sanne >>>>>>>> _______________________________________________ >>>>>>>> infinispan-dev mailing list >>>>>>>> infinispan-dev at lists.jboss.org >>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev at lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev at lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev at lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> Cheers, >>>> -- >>>> Mircea Markus >>>> Infinispan lead (www.infinispan.org) >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev at lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> -- >>> Galder Zamarre?o >>> galder at redhat.com >>> twitter.com/galderz >>> >>> Project Lead, Escalante >>> http://escalante.io >>> >>> Engineer, Infinispan >>> http://infinispan.org >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > 
_______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Feb 24 11:39:05 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 16:39:05 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> Message-ID: <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: > By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. Curious to hear the whole story :-) We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod).
ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>>> >>>> >>>> //some unified query giving me entries pointing by fk copy to bar and >>>> //buz objects. So I need to manually load these references. >>>> >>>> //happy emmanuel >>>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>>> Bar bar = unifiedCache.get(foo); >>>> Buz buz = unifiedCache.get(baz); >>>> >>>> //not so happy emmanuel >>>> Cache fooCache = cacheManager.getCache("foo"); >>>> Bar bar = fooCache.get(foo); >>>> Cache bazCache = cacheManager.getCache("baz"); >>>> Buz buz = bazCache.put(baz); >>> >>> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. >> >> Not really. >> What makes me unhappy is to have to keep in my app all the >> references to these specific cache store instances. The filtering >> approach only moves the problem. 
>> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Feb 24 11:47:45 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 16:47:45 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <53034BBB.1030809@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <53034BBB.1030809@redhat.com> Message-ID: On Feb 18, 2014, at 12:02 PM, Adrian Nistor wrote: > Well, OGM and Infinispan are different species :) So, Infinispan being what it is today - a non-homogenous, schema-less KV store, without support for entity associations (except embedding) - which simplifies the whole thing a lot, should we or should we not provide transparent cross-cacheManager search capabilities, in this exact context? Vote? TBH I think users should push us for this if they need it. -1 to do it right now. > > There were some points raised previously like "if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well". In the SQL world you would also probably CRUD against a table or set of tables and then query against a view - a bit like what we're doing here. > I don't see any problem with this in principle. There is however something currently missing in the query result set API - it currently does not provide you the keys of the matching entities. would be nice to have an option for that, indeed. 
> People work around this by storing the key in the entity. Now with the addition of the cross-cacheManager search we'll probably need to fix the result api and also provide a reference to the cache (or just the name?) where the entity is stored. > > The (enforced) one entity type per cache rule is not conceptually or technically required for implementing this, so I won't start raving against it :) Sane users should apply it however. > > > On 02/18/2014 12:13 AM, Emmanuel Bernard wrote: >> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >> >> >>> On 17 f?vr. 2014, at 18:51, Emmanuel Bernard >>> wrote: >>> >>> >>>> On Mon 2014-02-17 18:43, Galder Zamarre?o wrote: >>>> >>>> >>>>> On 05 Feb 2014, at 17:30, Emmanuel Bernard >>>>> wrote: >>>>> >>>>> >>>>>> On Wed 2014-02-05 15:53, Mircea Markus wrote: >>>>>> >>>>>> >>>>>>> On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard >>>>>>> wrote: >>>>>>> >>>>>>> Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. >>>>>>> >>>>>> Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) >>>>>> >>>>> >>>>> //some unified query giving me entries pointing by fk copy to bar and >>>>> //buz objects. So I need to manually load these references. 
>>>>> >>>>> //happy emmanuel >>>>> Cache unifiedCache = cacheManager.getMotherOfAllCaches(); >>>>> Bar bar = unifiedCache.get(foo); >>>>> Buz buz = unifiedCache.get(baz); >>>>> >>>>> //not so happy emmanuel >>>>> Cache fooCache = cacheManager.getCache("foo"); >>>>> Bar bar = fooCache.get(foo); >>>>> Cache bazCache = cacheManager.getCache("baz"); >>>>> Buz buz = bazCache.put(baz); >>>>> >>>> Would something like what Paul suggests in https://issues.jboss.org/browse/ISPN-3640 >>>> help you better? IOW, have a single cache, and then have a filtered view for Bar or Buz types? Not sure I understand the differences in your code changes in terms of what makes you happy vs not. >>>> >>> Not really. >>> What makes me unhappy is to have to keep in my app all the >>> references to these specific cache store instances. The filtering >>> approach only moves the problem. >>> _______________________________________________ >>> infinispan-dev mailing list >>> >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Mon Feb 24 12:27:07 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Mon, 24 Feb 2014 17:27:07 +0000 Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <530613E9.3020602@redhat.com> References: <530613E9.3020602@redhat.com> Message-ID: <4EC95CDC-9146-4E27-96D6-FCEAD6B76F27@redhat.com> Nice work, Radim! And the roadmap looks very good. On Feb 20, 2014, at 2:40 PM, Radim Vansa wrote: > Hi all, > > it has been a long time since last release of RadarGun. We have been > using it intensively and developed many new features - 1.0.0 had 7,340 > lines of Java code, 1.1.0 has 32,978 lines. 
RadarGun has become
> a multi-purpose tool, used for checking both the performance and the functionality
> of caches under stress.
>
> During 1.1.0 development, most parts of the code changed almost beyond
> recognition, but we tried to keep the old configuration compatible. However, the
> design started to be rather limiting, and therefore, we have decided to
> make the last release for 1.1.0 and move on to RadarGun 2.0.0. In the 1.1.x
> branch we will provide bugfixes, but all new features should go to 2.0.0.
>
> A few teasers of features expected in RadarGun 2.0.0:
>
> * non-homogeneous clusters: client/server setups, cooperation of
> different versions of products, or easy setup of cross-site deployment
> with different configurations
> * abstracting from the cache wrapper: you will be able to use RadarGun for
> more than just caches without any hacks
> ** the current CacheWrapper interface will be designed to match JSR-107
> javax.cache.Cache rather than java.util.Map
> * pluggable reporting: statistics will be directly multiplexed to
> configured reporters (again, without cheating on directories), reporters
> will provide the output formatted as CSV or HTML, or can even deploy the
> results to an external repository
> * merging local and distributed benchmarks -> master + single slave
> within one JVM
> * better property parsing: evaluation of expressions, property
> replacement executed on slaves
>
> I hope you will like it! And enjoy the 1.1.0.Final release now.
>
> Radim
>
> ------
> Radim Vansa
> JBoss DataGrid QA
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 12:28:56 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 17:28:56 +0000
Subject: [infinispan-dev] [infinispan-internal] Introducing Infinispan OData server: Remote JSON documents querying
In-Reply-To: 
References: <1836092005.4948036.1392726921319.JavaMail.zimbra@redhat.com>
Message-ID: <288CCCC7-A08B-4852-A405-57D7E2D640A1@redhat.com>

Great work! You might want to add a blog entry to the infinispan blog as well? That would gain you more visibility.

On Feb 20, 2014, at 11:37 AM, Galder Zamarreño wrote:

> Great work Tomas!! :)
>
> On 18 Feb 2014, at 13:35, Tomas Sykora wrote:
>
>> Hello all! :)
>>
>> It's the right time to make it a little bit more public and share some results of the work on the Infinispan OData server, finally!
>> This solution can serve as a proof of concept where we are able to remotely query JSON documents stored in Infinispan caches, using an industry-standard and platform-independent way of communicating with the server (OData).
>>
>> There is still much to do/implement/improve in the server, but it is working as it is now.
>>
>> Check out the blog post if you are interested:
>> http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-server.html
>>
>> Any feedback is more than welcome.
>>
>> + I'd like to say a big THANK YOU to all who supported me!
>> Mainly: JDG QE guys, Manik, Mircea, Sanne and Adrian.
>> It wouldn't be done without your patience and willingness to help me :-)
>>
>> Tomas
>>
>
>
> --
> Galder Zamarreño
> galder at redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 12:47:59 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 17:47:59 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: 
References: <20140218130345.GB11962@hibernate.org> <53035D8F.4000604@infinispan.org> <9B723B39-E9F4-4C53-8AF3-1367925BBC2B@redhat.com>
Message-ID: 

On Feb 24, 2014, at 5:39 PM, Sanne Grinovero wrote:

> On 24 February 2014 16:51, Mircea Markus wrote:
>> Just to recap: the main reason for the JPA cache store is to be a replacement for the JDBC CacheStore, nothing more than that.
>> And it certainly has advantages compared with the JDBC cache stores:
>> - JPA offers database independence/portability
>> - doesn't put that many restrictions on the schema
>> - it's easier to write/read from an existing database table
>
> Don't you dare hijack my nice 2-year-old thread :-D :-D
> BTW why is this discussion not public anymore? I missed the switch to undercover.

I don't know at which point it switched to private; make it public again ;)
>>>
>>> Tristan
>>>
>>> On 18/02/2014 14:03, Emmanuel Bernard wrote:
>>>> On Tue 2014-02-18 13:16, Adrian Nistor wrote:
>>>>>> JPA cache store is a waste of time IMO :)
>>>>> +1 :)
>>>> My understanding is that the JPACacheStore discussion is revived because
>>>> users want to map an existing database, load the data in the grid and
>>>> keep both synchronized.
>>>> At least that's the use case I was told needed to be covered.
>>>
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>>
>>

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 11:36:30 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 16:36:30 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <2D1C63B2-7313-4FE4-93D2-D50B91565FF2@redhat.com>
References: <888EA204-30A1-4BFF-9469-7118996024A1@hibernate.org> <6211A55D-9F1D-4686-9EF4-373C216E4927@hibernate.org> <2C233AC3-BEFC-4FD5-A297-A854FEA8165D@hibernate.org> <2D1C63B2-7313-4FE4-93D2-D50B91565FF2@redhat.com>
Message-ID: <92D9B688-7285-4406-9DAE-B120452C1655@redhat.com>

On Feb 17, 2014, at 5:36 PM, Galder Zamarreño wrote:

>
> On 31 Jan 2014, at 09:28, Emmanuel Bernard wrote:
>
>>
>>
>>> On 30 janv. 2014, at 20:51, Mircea Markus wrote:
>>>
>>>
>>>> On Jan 30, 2014, at 9:42 AM, Galder Zamarreño wrote:
>>>>
>>>>
>>>>> On Jan 21, 2014, at 11:52 PM, Mircea Markus wrote:
>>>>>
>>>>>
>>>>>> On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard wrote:
>>>>>>
>>>>>> By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches, right? Otherwise I am not fully understanding why they ask for a unified query.
>>>>>> Have you written detailed use cases somewhere for me to better understand what is really requested?
>>>>>
>>>>> IMO from a user perspective, being able to run queries spanning several caches simplifies the programming model: each cache corresponds to a single entity type, with potentially different configuration.
>>>>
>>>> Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter.
>>>
>>> Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of query against a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, it also discourages code reuse and makes the code hard to maintain (if you add Pets to the same cache in the future you need to update the M/R code as well). And of course there are also different per-cache configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer/expire etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future.
>>>
>>> The way I see it - and I'm very curious to hear your opinion on this - following a database analogy, the CacheManager corresponds to a database and the Cache to a table. Hence my thought that queries spanning multiple caches are both useful and needed (same as queries spanning multiple tables).
>>
>> I know Sanne and you are keen to have one entity type per cache to be able to fine-tune the configuration. I am a little more skeptical but I don't have strong opinions on the subject.
>>
>> However, I don't think you can forbid the case where people want to store heterogeneous types in the same cache:
>>
>> - it's easy to start with
>> - configuration is indeed simpler
>> - when you work in the same service with cats, dogs, owners, addresses and refuges, juggling between these n Cache instances begins to be fugly I suspect - should write some application code to confirm
>> - people will add to the grid types unknown at configuration time. They might want a single bucket.
>
> +100

Totally agreed, there's no plan to forbid people storing heterogeneous values in the same cache. The discussion at hand was actually the other way around: do we want to allow people to store data in multiple caches? If so, querying across multiple caches makes sense, hence this email.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Mon Feb 24 12:57:17 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Mon, 24 Feb 2014 17:57:17 +0000
Subject: [infinispan-dev] Further dist.exec and M/R API improvements
In-Reply-To: <530517F9.3060008@redhat.com>
References: <530517F9.3060008@redhat.com>
Message-ID: <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com>

On Feb 19, 2014, at 8:45 PM, Vladimir Blagojevic wrote:

> Hey guys,
>
> As some of you might know, we have received additional requirements from
> the community and internally to add a few things to dist.executors and
> the map/reduce API. On the distributed executors front we need to enable
> distributed executors to store results into a cache directly rather than
> returning them to the invoker [1]. As soon as we introduce this API we also
> need an async mechanism to allow notifications of subtask
> completion/failure.
I think we need both at the same time :-)

> I was thinking we add a concept of
> DistributedTaskExecutionListener which can be specified in
> DistributedTaskBuilder:
>
> DistributedTaskBuilder
> executionListener(DistributedTaskExecutionListener listener);
>
>
> We needed DistributedTaskExecutionListener anyway. All distributed tasks
> might use some feedback about task progress, completion/failure and so on.
> My proposal is roughly:
>
>
> public interface DistributedTaskExecutionListener {
>
> void subtaskSent(Address node, Set inputKeys);
> void subtaskFailed(Address node, Set inputKeys, Exception e);
> void subtaskSucceeded(Address node, Set inputKeys, T result);
> void allSubtasksCompleted();
>
> }
>
> So much for that.

I think it would make sense to add this logic for monitoring, + additional info such as average execution time etc. I'm not sure if this is a generally useful API though, unless there were people asking for it already?

> If tasks do not use input keys these parameters would
> be empty sets. Now for [1] we need to add additional methods to
> DistributedExecutorService. We cannot specify the result cache in
> DistributedTaskBuilder as we are still bound to only submit methods in
> DistributedExecutorService that return futures and we don't want that.
> We need two new void methods:
>
> void submitEverywhere(DistributedTask task,
> Cache, T> result);
> void submitEverywhere(DistributedTask task,
> Cache, T> result, K... input);
>
>
> Now, why bother with DistExecResultKey? Well, we have tasks that use
> input keys and tasks that don't. So the results cache could only be keyed by
> either keys or execution addresses, or a combination of the two.
> Therefore, DistExecResultKey could be something like:
>
> public interface DistExecResultKey {
>
> Address getExecutionAddress();
> K getKey();
>
> }
>
> If you have a better idea how to address this aspect let us know. So
> much for distributed executors.
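For illustration, the listener contract proposed above can be exercised end-to-end with plain stand-ins. This is only a sketch: the Address interface, the log format and the hand-driven fan-out below are invented for the example and are not Infinispan API; it merely shows the callback order a coordinator might produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeoutException;

// Stand-in for Infinispan's Address type; illustrative only.
interface Address { String name(); }

// The listener contract as proposed in the thread (generics and spelling normalized).
interface DistributedTaskExecutionListener<K, T> {
    void subtaskSent(Address node, Set<K> inputKeys);
    void subtaskFailed(Address node, Set<K> inputKeys, Exception e);
    void subtaskSucceeded(Address node, Set<K> inputKeys, T result);
    void allSubtasksCompleted();
}

public class ListenerSketch {
    // Simulates the coordinator firing callbacks as subtasks progress.
    static List<String> runSimulation() {
        List<String> log = new ArrayList<>();
        DistributedTaskExecutionListener<String, Integer> listener =
            new DistributedTaskExecutionListener<String, Integer>() {
                public void subtaskSent(Address n, Set<String> keys) { log.add("sent:" + n.name()); }
                public void subtaskFailed(Address n, Set<String> keys, Exception e) { log.add("failed:" + n.name()); }
                public void subtaskSucceeded(Address n, Set<String> keys, Integer r) { log.add("ok:" + n.name() + "=" + r); }
                public void allSubtasksCompleted() { log.add("done"); }
            };
        Address a = () -> "nodeA";
        Address b = () -> "nodeB";
        // Two subtasks are "sent"; one succeeds, one times out.
        listener.subtaskSent(a, Set.of("k1"));
        listener.subtaskSent(b, Set.of("k2"));
        listener.subtaskSucceeded(a, Set.of("k1"), 10);
        listener.subtaskFailed(b, Set.of("k2"), new TimeoutException("no response"));
        listener.allSubtasksCompleted();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runSimulation());
    }
}
```

A real implementation would fire these callbacks from the transport layer as responses arrive, which is where the threading questions raised earlier in the thread come back in.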
>
>
> For map/reduce we also have to enable storing of map/reduce task results
> into a cache [2] and allow users to specify a custom cache for intermediate
> results [3]. Part of task [2] is to allow notification about map/reduce
> task progress and completion. Just as in dist.executor, I would add a
> MapReduceTaskExecutionListener interface:
>
>
> public interface MapReduceTaskExecutionListener {
>
> void mapTaskInitialized(Address executionAddress);
> void mapTaskSucceeded(Address executionAddress);
> void mapTaskFailed(Address executionTarget, Exception cause);
> void mapPhaseCompleted();
>
> void reduceTaskInitialized(Address executionAddress);
> void reduceTaskSucceeded(Address executionAddress);
> void reduceTaskFailed(Address address, Exception cause);
> void reducePhaseCompleted();
>
> }

IMO - in the first stage at least - I would rather use a simpler (Notifying)Future, on which the user can wait till the computation happens: it's simpler and more aligned with the rest of our async API.

> while MapReduceTask would have an additional method:
>
> public void execute(Cache resultsCache);

you could overload it with a cache-name-only method.
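The cache-name-only overload suggested here can be sketched with plain maps standing in for caches. CacheManagerStub, MapReduceTaskSketch and the hard-coded reduce output are invented for this example; the point is only the delegation pattern: resolve the named cache, then reuse the Cache-based execute.

```java
import java.util.HashMap;
import java.util.Map;

// A map-backed stand-in for a cache manager resolving named "caches".
class CacheManagerStub {
    private final Map<String, Map<String, Integer>> caches = new HashMap<>();
    Map<String, Integer> getCache(String name) {
        return caches.computeIfAbsent(name, k -> new HashMap<>());
    }
}

class MapReduceTaskSketch {
    private final CacheManagerStub cm;
    MapReduceTaskSketch(CacheManagerStub cm) { this.cm = cm; }

    // Proposed API: write reduced results straight into the given cache.
    void execute(Map<String, Integer> resultsCache) {
        resultsCache.put("word", 42); // stand-in for the real reduce output
    }

    // The cache-name-only overload: resolve by name, then delegate.
    void execute(String resultsCacheName) {
        execute(cm.getCache(resultsCacheName));
    }
}

public class OverloadDemo {
    public static void main(String[] args) {
        CacheManagerStub cm = new CacheManagerStub();
        new MapReduceTaskSketch(cm).execute("results");
        System.out.println(cm.getCache("results"));
    }
}
```

The name-based variant also matters for remote callers that hold no Cache reference of their own, which is the argument made later in the thread for not exposing a Cache-typed intermediate-cache method at all.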
> > MapReduceTaskExecutionListener could be specified using fluent > MapReduceTask API just as intermediate cache would be: > > public MapReduceTask > usingIntermediateCache(Cache> tmpCache); > > thus addressing issue [3] +1 > > Let me know what you think, > Vladimir > > > [1] https://issues.jboss.org/browse/ISPN-4030 > [2] https://issues.jboss.org/browse/ISPN-4002 > [3] https://issues.jboss.org/browse/ISPN-4021 > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vblagoje at redhat.com Mon Feb 24 15:55:43 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Mon, 24 Feb 2014 15:55:43 -0500 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> Message-ID: <530BB1CF.1020307@redhat.com> See inline On 2/24/2014, 12:57 PM, Mircea Markus wrote: > On Feb 19, 2014, at 8:45 PM, Vladimir Blagojevic wrote: > >> Hey guys, >> >> As some of you might know we have received additional requirements from >> community and internally to add a few things to dist.executors and >> map/reduce API. On distributed executors front we need to enable >> distributed executors to store results into cache directly rather than >> returning them to invoker [1]. As soon as we introduce this API we also >> need a asyc. mechanism to allow notifications of subtask >> completion/failure. > I think we need both in at the same time :-) Yes, that is what I actually meant. Poor wording. 
> >> I was thinking we add a concept of >> DistributedTaskExecutionListener which can be specified in >> DistributedTaskBuilder: >> >> DistributedTaskBuilder >> executionListener(DistributedTaskExecutionListener listener); >> >> >> We needed DistributedTaskExecutionListener anyway. All distributed tasks >> might use some feedback about task progress, completion/failure and on. >> My proposal is roughly: >> >> >> public interface DistributedTaskExecutionListener { >> >> void subtaskSent(Address node, Set inputKeys); >> void subtaskFailed(Address node, Set inputKeys, Exception e); >> void subtaskSucceded(Address node, Set inputKeys, T result); >> void allSubtasksCompleted(); >> >> } >> >> So much for that. > I think this it would make sense to add this logic for monitoring, + additional info such as average execution time etc. I'm not sure if this is a generally useful API though, unless there were people asking for it already? Ok, noted. If you remember any references about this let me know and I'll incorporate what people actually asked for rather than guess. > >> If tasks do not use input keys these parameters would >> be emply sets. Now for [1] we need to add additional methods to >> DistributedExecutorService. We can not specify result cache in >> DistributedTaskBuilder as we are still bound to only submit methods in >> DistributedExecutorService that return futures and we don't want that. >> We need two new void methods: >> >> void submitEverywhere(DistributedTask task, >> Cache, T> result); >> void submitEverywhere(DistributedTask task, >> Cache, T> result, K... input); >> >> >> Now, why bother with DistExecResultKey? Well we have tasks that use >> input keys and tasks that don't. So results cache could only be keyed by >> either keys or execution address, or combination of those two. 
>> Therefore, DistExecResultKey could be something like: >> >> public interface DistExecResultKey { >> >> Address getExecutionAddress(); >> K getKey(); >> >> } >> >> If you have a better idea how to address this aspect let us know. So >> much for distributed executors. >> >> >> For map/reduce we also have to enable storing of map reduce task results >> into cache [2] and allow users to specify custom cache for intermediate >> results[3]. Part of task [2] is to allow notification about map/reduce >> task progress and completion. Just as in dist.executor I would add >> MapReduceTaskExecutionListener interface: >> >> >> public interface MapReduceTaskExecutionListener { >> >> void mapTaskInitialized(Address executionAddress); >> void mapTaskSucceeded(Address executionAddress); >> void mapTaskFailed(Address executionTarget, Exception cause); >> void mapPhaseCompleted(); >> >> void reduceTaskInitialized(Address executionAddress); >> void reduceTaskSucceeded(Address executionAddress); >> void reduceTaskFailed(Address address, Exception cause); >> void reducePhaseCompleted(); >> >> } > IMO - in the first stage at leas - I would rather use a simpler (Notifying)Future, on which the user can wait till the computation happens: it's simpler and more aligned with the rest of our async API. > What do you mean? We already have futures in MapReduceTask API. This API is more fine grained and allows monitoring/reporting of task progress. Please clarify. >> while MapReduceTask would have an additional method: >> >> public void execute(Cache resultsCache); > you could overload it with cache name only method. Yeah, good idea. Same for usingIntermediateCache? I actually asked you this here https://issues.jboss.org/browse/ISPN-4021 Thanks Mircea! 
Vladimir

From emmanuel at hibernate.org Tue Feb 25 04:28:51 2014
From: emmanuel at hibernate.org (Emmanuel Bernard)
Date: Tue, 25 Feb 2014 10:28:51 +0100
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com>
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com>
Message-ID: <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org>

> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
>
>
>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>
>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>
> Curious to hear the whole story :-)
> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform-independent (hotrod).

Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
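The "happy vs. not so happy" contrast from earlier in the thread can be made concrete with plain maps standing in for per-entity caches. The unified view here is a toy merge invented for the example, not a proposed Infinispan API; a real cross-cache view would need ownership and conflict rules.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiCacheSketch {
    // The per-entity style: the application juggles one handle per cache.
    static String lookupPerEntity(Map<String, String> personCache, Map<String, String> carCache) {
        return personCache.get("p1") + " owns " + carCache.get("c1");
    }

    // A toy unified read view over several "caches" (a naive merge,
    // assuming globally unique keys across caches).
    static Map<String, String> unifiedView(List<Map<String, String>> caches) {
        Map<String, String> view = new HashMap<>();
        for (Map<String, String> c : caches) view.putAll(c);
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> personCache = new HashMap<>();
        Map<String, String> carCache = new HashMap<>();
        personCache.put("p1", "Alice");
        carCache.put("c1", "a green car");
        System.out.println(lookupPerEntity(personCache, carCache));
        System.out.println(unifiedView(List.of(personCache, carCache)).get("c1"));
    }
}
```

This is exactly the bookkeeping Emmanuel objects to: the first method forces the application to carry a reference per entity cache, while the merged view hides it at the cost of a key-uniqueness assumption.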
From dan.berindei at gmail.com Tue Feb 25 07:33:32 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 25 Feb 2014 14:33:32 +0200 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <530BB1CF.1020307@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> Message-ID: On Mon, Feb 24, 2014 at 10:55 PM, Vladimir Blagojevic wrote: > See inline > On 2/24/2014, 12:57 PM, Mircea Markus wrote: > > On Feb 19, 2014, at 8:45 PM, Vladimir Blagojevic > wrote: > > > >> Hey guys, > >> > >> As some of you might know we have received additional requirements from > >> community and internally to add a few things to dist.executors and > >> map/reduce API. On distributed executors front we need to enable > >> distributed executors to store results into cache directly rather than > >> returning them to invoker [1]. As soon as we introduce this API we also > >> need a asyc. mechanism to allow notifications of subtask > >> completion/failure. > > I think we need both in at the same time :-) > Yes, that is what I actually meant. Poor wording. > Do we really need special support for distributed tasks to write results to another cache? We already allow a task to do cache.getCacheManager().getCache("outputCache").put(k, v) > > > >> I was thinking we add a concept of > >> DistributedTaskExecutionListener which can be specified in > >> DistributedTaskBuilder: > >> > >> DistributedTaskBuilder > >> executionListener(DistributedTaskExecutionListener listener); > >> > >> > >> We needed DistributedTaskExecutionListener anyway. All distributed tasks > >> might use some feedback about task progress, completion/failure and on. 
> >> My proposal is roughly: > >> > >> > >> public interface DistributedTaskExecutionListener { > >> > >> void subtaskSent(Address node, Set inputKeys); > >> void subtaskFailed(Address node, Set inputKeys, Exception e); > >> void subtaskSucceded(Address node, Set inputKeys, T result); > >> void allSubtasksCompleted(); > >> > >> } > >> > >> So much for that. > > I think this it would make sense to add this logic for monitoring, + > additional info such as average execution time etc. I'm not sure if this is > a generally useful API though, unless there were people asking for it > already? > Ok, noted. If you remember any references about this let me know and > I'll incorporate what people actually asked for rather than guess. > Ok, let's wait until we get some actual requests from users then. TBH I don't think distributed tasks with subtasks are something that users care about. E.g. with Map/Reduce the reduce tasks are not subtasks of the map/combine tasks, so this API wouldn't help. Hadoop has a Reporter interface that allows you to report "ticks" and increment counters, maybe we should add something like that instead? > > > > >> If tasks do not use input keys these parameters would > >> be emply sets. Now for [1] we need to add additional methods to > >> DistributedExecutorService. We can not specify result cache in > >> DistributedTaskBuilder as we are still bound to only submit methods in > >> DistributedExecutorService that return futures and we don't want that. > >> We need two new void methods: > >> > >> void submitEverywhere(DistributedTask task, > >> Cache, T> result); > >> void submitEverywhere(DistributedTask task, > >> Cache, T> result, K... input); > >> > >> > >> Now, why bother with DistExecResultKey? Well we have tasks that use > >> input keys and tasks that don't. So results cache could only be keyed by > >> either keys or execution address, or combination of those two. 
> >> Therefore, DistExecResultKey could be something like: > >> > >> public interface DistExecResultKey { > >> > >> Address getExecutionAddress(); > >> K getKey(); > >> > >> } > >> > >> If you have a better idea how to address this aspect let us know. So > >> much for distributed executors. > >> > I think we should allow each distributed task to deal with output in its own way, the existing API should be enough. > >> > >> For map/reduce we also have to enable storing of map reduce task results > >> into cache [2] and allow users to specify custom cache for intermediate > >> results[3]. Part of task [2] is to allow notification about map/reduce > >> task progress and completion. Just as in dist.executor I would add > >> MapReduceTaskExecutionListener interface: > >> > >> > >> public interface MapReduceTaskExecutionListener { > >> > >> void mapTaskInitialized(Address executionAddress); > >> void mapTaskSucceeded(Address executionAddress); > >> void mapTaskFailed(Address executionTarget, Exception cause); > >> void mapPhaseCompleted(); > >> > >> void reduceTaskInitialized(Address executionAddress); > >> void reduceTaskSucceeded(Address executionAddress); > >> void reduceTaskFailed(Address address, Exception cause); > >> void reducePhaseCompleted(); > >> > >> } > > IMO - in the first stage at leas - I would rather use a simpler > (Notifying)Future, on which the user can wait till the computation happens: > it's simpler and more aligned with the rest of our async API. > > > What do you mean? We already have futures in MapReduceTask API. This API > is more fine grained and allows monitoring/reporting of task progress. > Please clarify. > I'm not sure about the usefulness of an API like this either... if the intention is to allow the user to collect statistics about duration of various phases, then I think exposing the durations via MapReduceTasks would be better. 
>
> >> while MapReduceTask would have an additional method:
> >>
> >> public void execute(Cache resultsCache);
> > you could overload it with cache name only method.
> Yeah, good idea. Same for usingIntermediateCache? I actually asked you
> this here https://issues.jboss.org/browse/ISPN-4021

+1 to allow a cache name only. For the intermediate cache I don't think it makes sense to allow a Cache version at all.

From sanne at infinispan.org Tue Feb 25 08:39:00 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 25 Feb 2014 13:39:00 +0000
Subject: [infinispan-dev] Where's the roadmap?
Message-ID: 

I was asked about the Infinispan roadmap on a forum post, my draft reads:

"Sure it's available online, see.."

but then I could actually only find this:
https://community.jboss.org/wiki/InfinispanRoadmap

(which is very outdated).

So, what's the roadmap?

Would be nice if we could have it updated and published on the new website.

Cheers,
Sanne

From mmarkus at redhat.com Tue Feb 25 10:08:06 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Tue, 25 Feb 2014 15:08:06 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org>
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org>
Message-ID: <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com>

On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote:

>> On 24 févr.
2014, at 17:39, Mircea Markus wrote:
>>
>>
>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>>
>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>
>> Curious to hear the whole story :-)
>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform-independent (hotrod).
>
> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.

People are going to use Infinispan with one cache per entity, because it makes sense:
- different config (repl/dist | persistent/non-persistent) for different data types
- have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18

I don't see a reason to forbid this, on the contrary. The way I see it, the relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general-purpose storage engine that is available to different platforms as well.

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From vblagoje at redhat.com Tue Feb 25 10:09:29 2014
From: vblagoje at redhat.com (Vladimir Blagojevic)
Date: Tue, 25 Feb 2014 10:09:29 -0500
Subject: [infinispan-dev] Further dist.exec and M/R API improvements
In-Reply-To: 
References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530CB229.4090301@redhat.com>

On 2/25/2014, 7:33 AM, Dan Berindei wrote:
>
>
> Do we really need special support for distributed tasks to write
> results to another cache?
We already allow a task to do
>
> cache.getCacheManager().getCache("outputCache").put(k, v)

Yeah, very good point Dan. Thanks for being a sanity check. Mircea?

>
>
> >> I was thinking we add a concept of
> >> DistributedTaskExecutionListener which can be specified in
> >> DistributedTaskBuilder:
> >>
> >> DistributedTaskBuilder
> >> executionListener(DistributedTaskExecutionListener listener);
> >>
> >>
> >> We needed DistributedTaskExecutionListener anyway. All
> distributed tasks
> >> might use some feedback about task progress, completion/failure
> and on.
> >> My proposal is roughly:
> >>
> >>
> >> public interface DistributedTaskExecutionListener {
> >>
> >> void subtaskSent(Address node, Set inputKeys);
> >> void subtaskFailed(Address node, Set inputKeys,
> Exception e);
> >> void subtaskSucceded(Address node, Set inputKeys, T result);
> >> void allSubtasksCompleted();
> >>
> >> }
> >>
> >> So much for that.
> > I think this it would make sense to add this logic for
> monitoring, + additional info such as average execution time etc.
> I'm not sure if this is a generally useful API though, unless
> there were people asking for it already?
> Ok, noted. If you remember any references about this let me know and
> I'll incorporate what people actually asked for rather than guess.
>
>
> Ok, let's wait until we get some actual requests from users then. TBH
> I don't think distributed tasks with subtasks are something that users
> care about. E.g. with Map/Reduce the reduce tasks are not subtasks of
> the map/combine tasks, so this API wouldn't help.
>
> Hadoop has a Reporter interface that allows you to report "ticks" and
> increment counters, maybe we should add something like that instead?

The subtask I am referring to here is just to denote part of the distributed task initiated using dist.executors.
This interface (maybe extended a bit with ideas from Reporter) could be used for both monitoring and more application specific logic about task re-execution and so on. > > > I think we should allow each distributed task to deal with output in > its own way, the existing API should be enough. Yes, I can see your point. Mircea? > > > >> public interface MapReduceTaskExecutionListener { > >> > >> void mapTaskInitialized(Address executionAddress); > >> void mapTaskSucceeded(Address executionAddress); > >> void mapTaskFailed(Address executionTarget, Exception cause); > >> void mapPhaseCompleted(); > >> > >> void reduceTaskInitialized(Address executionAddress); > >> void reduceTaskSucceeded(Address executionAddress); > >> void reduceTaskFailed(Address address, Exception cause); > >> void reducePhaseCompleted(); > >> > >> } > > IMO - in the first stage at leas - I would rather use a simpler > (Notifying)Future, on which the user can wait till the computation > happens: it's simpler and more aligned with the rest of our async API. > > > What do you mean? We already have futures in MapReduceTask API. > This API > is more fine grained and allows monitoring/reporting of task progress. > Please clarify. > > > I'm not sure about the usefulness of an API like this either... if the > intention is to allow the user to collect statistics about duration of > various phases, then I think exposing the durations via MapReduceTasks > would be better. How would you design that API Dan? Something other than listener/callback interface? > > >> while MapReduceTask would have an additional method: > >> > >> public void execute(Cache resultsCache); > > you could overload it with cache name only method. > Yeah, good idea. Same for usingIntermediateCache? I actually asked you > this here https://issues.jboss.org/browse/ISPN-4021 > > > +1 to allow a cache name only. For the intermediate cache I don't > think it makes sense to allow a Cache version at all. Ok good. Deal. 
Thanks, Vladimir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140225/da649f6f/attachment.html

From mmarkus at redhat.com Tue Feb 25 11:24:03 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Tue, 25 Feb 2014 16:24:03 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To:
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com>
Message-ID:

On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote:

> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story.
>
> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it

Agreed. I actually don't see how we can stop people who declare a Cache from putting whatever they want in it. Sharing a cache between types also makes total sense for smaller caches, as it is easy to set up etc.
The debate in this email, the way I understood it, was: are/should people be using multiple caches for storing data? If yes, we should consider querying functionality spanning multiple caches.

> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote:
>
> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote:
>
> >> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
> >>
> >>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
> >>>
> >>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher-level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
> >>
> >> Curious to hear the whole story :-)
> >> We cannot mandate that all the users use OGM though, one of the reasons being OGM is not platform independent (hotrod).
> >
> > Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>
> People are going to use infinispan with one cache per entity, because it makes sense:
> - different config (repl/dist | persistent/non-persistent) for different data types
> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
> I don't see a reason to forbid this, on the contrary. The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general-purpose storage engine that is available to different platforms as well.
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com Tue Feb 25 11:33:58 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Tue, 25 Feb 2014 16:33:58 +0000
Subject: [infinispan-dev] Where's the roadmap?
In-Reply-To:
References:
Message-ID: <7335F27A-7B85-4341-A8A0-35670F8E827C@redhat.com>

I'm working on it right now..

On Feb 25, 2014, at 1:39 PM, Sanne Grinovero wrote:

> I was asked about the Infinispan roadmap on a forum post, my draft reads:
>
> "Sure it's available online, see.."
>
> but then I could actually only find this:
> https://community.jboss.org/wiki/InfinispanRoadmap
>
> (which is very outdated).
>
> So, what's the roadmap?
>
> Would be nice if we could have it updated and published on the new website.
> > Cheers,
> > Sanne
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From sanne at infinispan.org Tue Feb 25 12:08:27 2014
From: sanne at infinispan.org (Sanne Grinovero)
Date: Tue, 25 Feb 2014 17:08:27 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To:
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com>
Message-ID:

There also is the opposite problem to be considered, as Emmanuel suggested on 11/04/2012: you can't forbid the user to store the same object (same type and same id) in two different caches, where each Cache might be using different indexing options.

If the "search service" is a global concept, and you run a query which matches object X, we'll return it to the user but he won't be able to figure out from which cache it's being sourced: is that ok? Ultimately this implies a query might return the same object X in multiple positions in the result list of the query; for example it might be the top result according to some criteria but also be the 5th result because of how it was indexed in a different cache: maybe someone will find good use for this "capability" but I see it primarily as a source of confusion.

Finally, if we move the search service as a global component, there might be an impact in how we explain security: an ACL filter applied on one cache - or the index metadata produced by that cache - might not be applied in the same way by an entity being matched through a second cache.
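The duplicate-hit scenario Sanne describes — the same (type, id) entity surfacing at several ranks because it is indexed in several caches — could be handled defensively on the result list by keeping only the best-ranked hit per identity. A minimal sketch; the Hit record and its fields are invented for illustration, not Infinispan Query API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// De-duplicate query hits by (type, id), ignoring which cache sourced them,
// and keep the best (lowest) rank for each logical entity.
class QueryDedupSketch {
    record Hit(String cache, String type, String id, int rank) {}

    static List<Hit> dedupe(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : hits) {
            String key = h.type() + "#" + h.id(); // identity deliberately excludes the source cache
            best.merge(key, h, (a, b) -> a.rank() <= b.rank() ? a : b);
        }
        return new ArrayList<>(best.values());
    }
}
```

Note this only hides the symptom: which copy "wins" is still arbitrary from the user's point of view, which is Sanne's deeper objection to a global search service.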
Not least a user's permission to access one cache (or not) will affect his results in a rather complex way. I'm wondering if we need to prevent such situations. Sanne On 25 February 2014 16:24, Mircea Markus wrote: > > On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: > >> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. >> >> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it > > Agreed. I actually don't see how we can enforce people that declare Cache not put whatever they want in it. Also makes total sense for smaller caches as it is easy to set up etc. > The debate in this email, the way I understood it, was: are/should people using multiple caches for storing data? If yes we should consider querying functionality spreading over multiple caches. > >> >> >> >> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: >> >> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: >> >> >> On 24 f?vr. 2014, at 17:39, Mircea Markus wrote: >> >> >> >> >> >>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >> >>> >> >>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >> >> >> >> Curious to hear the whole story :-) >> >> We cannot mandate all the suers to use OGM though, one of the reasons being OGM is not platform independent (hotrod). >> > >> > Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. 
>> >> People are going to use infinispan with one cache per entity, because it makes sense: >> - different config (repl/dist | persistent/non-persistent) for different data types >> - have map/reduce tasks running only the Person entires not on Dog as well, when you want to select (Person) where age > 18 >> I don't see a reason to forbid this, on the contrary. The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well. >> >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Tue Feb 25 12:09:45 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Tue, 25 Feb 2014 17:09:45 +0000 Subject: [infinispan-dev] Where's the roadmap? In-Reply-To: <7335F27A-7B85-4341-A8A0-35670F8E827C@redhat.com> References: <7335F27A-7B85-4341-A8A0-35670F8E827C@redhat.com> Message-ID: On 25 February 2014 16:33, Mircea Markus wrote: > I'm working on it right now.. Thanks! As soon as you have a draft I'm happy to help with the Query section. Cheers, Sanne > > On Feb 25, 2014, at 1:39 PM, Sanne Grinovero wrote: > >> I was asked about the Infinispan roadmap on a forum post, my draft reads: >> >> "Sure it's available online, see.." >> >> but then I could actually only find this: >> https://community.jboss.org/wiki/InfinispanRoadmap >> >> (which is very outdated). >> >> So, what's the roadmap? >> >> Would be nice if we could have it updated and published on the new website. 
>> >> Cheers, >> Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Tue Feb 25 11:30:25 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Tue, 25 Feb 2014 16:30:25 +0000 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <530CB229.4090301@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> <530CB229.4090301@redhat.com> Message-ID: <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com> On Feb 25, 2014, at 3:09 PM, Vladimir Blagojevic wrote: > On 2/25/2014, 7:33 AM, Dan Berindei wrote: >> >> >> Do we really need special support for distributed tasks to write results to another cache? We already allow a task to do >> >> cache.getCacheManager().getCache("outputCache").put(k, v) > Yeah, very good point Dan. Thanks for being sanity check. Mircea? +1 >> >> >> > >> >> I was thinking we add a concept of >> >> DistributedTaskExecutionListener which can be specified in >> >> DistributedTaskBuilder: >> >> >> >> DistributedTaskBuilder >> >> executionListener(DistributedTaskExecutionListener listener); >> >> >> >> >> >> We needed DistributedTaskExecutionListener anyway. All distributed tasks >> >> might use some feedback about task progress, completion/failure and on. 
>> >> My proposal is roughly: >> >> >> >> >> >> public interface DistributedTaskExecutionListener { >> >> >> >> void subtaskSent(Address node, Set inputKeys); >> >> void subtaskFailed(Address node, Set inputKeys, Exception e); >> >> void subtaskSucceded(Address node, Set inputKeys, T result); >> >> void allSubtasksCompleted(); >> >> >> >> } >> >> >> >> So much for that. >> > I think this it would make sense to add this logic for monitoring, + additional info such as average execution time etc. I'm not sure if this is a generally useful API though, unless there were people asking for it already? >> Ok, noted. If you remember any references about this let me know and >> I'll incorporate what people actually asked for rather than guess. >> >> Ok, let's wait until we get some actual requests from users then. TBH I don't think distributed tasks with subtasks are something that users care about. E.g. with Map/Reduce the reduce tasks are not subtasks of the map/combine tasks, so this API wouldn't help. >> >> Hadoop has a Reporter interface that allows you to report "ticks" and increment counters, maybe we should add something like that instead? > > The subtask I am referring to here is just to denote part of the distributed task initiated using dist.executors. This interface (maybe extended a bit with ideas from Reporter) could be used for both monitoring and more application specific logic about task re-execution and so on. > > >> >> >> I think we should allow each distributed task to deal with output in its own way, the existing API should be enough. > > Yes, I can see your point. Mircea? 
+1 user driven features >> >> >> >> public interface MapReduceTaskExecutionListener { >> >> >> >> void mapTaskInitialized(Address executionAddress); >> >> void mapTaskSucceeded(Address executionAddress); >> >> void mapTaskFailed(Address executionTarget, Exception cause); >> >> void mapPhaseCompleted(); >> >> >> >> void reduceTaskInitialized(Address executionAddress); >> >> void reduceTaskSucceeded(Address executionAddress); >> >> void reduceTaskFailed(Address address, Exception cause); >> >> void reducePhaseCompleted(); >> >> >> >> } >> > IMO - in the first stage at leas - I would rather use a simpler (Notifying)Future, on which the user can wait till the computation happens: it's simpler and more aligned with the rest of our async API. >> > >> What do you mean? We already have futures in MapReduceTask API. This API >> is more fine grained and allows monitoring/reporting of task progress. >> Please clarify. ah right, wasn't aware of MapReduceTask.executeAsynchronously() :-) That's what I was after. >> >> I'm not sure about the usefulness of an API like this either... if the intention is to allow the user to collect statistics about duration of various phases, then I think exposing the durations via MapReduceTasks would be better. > How would you design that API Dan? Something other than listener/callback interface? Functionally, what I was having in mind was JMX stats for the MapReduce tasks in general: like average execution time, count etc. Also the ability to cancel a running task through JMX/JON would be nice. I don't think we need to expose this to the user through the MapReduceTaskExecutionListener above, though. > >> >> >> >> while MapReduceTask would have an additional method: >> >> >> >> public void execute(Cache resultsCache); >> > you could overload it with cache name only method. >> Yeah, good idea. Same for usingIntermediateCache? I actually asked you >> this here https://issues.jboss.org/browse/ISPN-4021 >> >> +1 to allow a cache name only. 
For the intermediate cache I don't think it makes sense to allow a Cache version at all.
> Ok good. Deal.
>
> Thanks,
> Vladimir
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From vblagoje at redhat.com Tue Feb 25 14:31:05 2014
From: vblagoje at redhat.com (Vladimir Blagojevic)
Date: Tue, 25 Feb 2014 14:31:05 -0500
Subject: [infinispan-dev] Further dist.exec and M/R API improvements
In-Reply-To: <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com>
References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> <530CB229.4090301@redhat.com> <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com>
Message-ID: <530CEF79.3000308@redhat.com>

Hey,

I am starting to like this thread more and more :-) In conclusion, for distributed executors we are not adding any new APIs, because Callable implementers can already write to a cache using the existing API. We don't have to add any new elaborate callback/listener API either, as users have not requested one, but we should investigate a Hadoop Reporter-like interface to give users some sense of a task's current execution phase.

For map/reduce we will add a new method:

    public void execute(Cache resultsCache);

Using the fluent MapReduceTask API, users would be able to specify an intermediate cache:

    public MapReduceTask usingIntermediateCache(String cacheName);

We are not adding MapReduceTaskExecutionListener, but rather JMX stats for MapReduce tasks in general: average execution time, count etc. Also the ability to cancel a running task through JMX/JON would be nice.
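The "JMX stats" the summary settles on — invocation count and average execution time per task type — need little more than a thread-safe accumulator behind an MBean. A plain sketch of that accumulator, not the eventual Infinispan MBean:

```java
import java.util.concurrent.atomic.AtomicLong;

// Aggregate statistics of the kind proposed for MapReduce tasks:
// how many executions happened and how long they took on average.
class MapReduceStatsSketch {
    private final AtomicLong invocations = new AtomicLong();
    private final AtomicLong totalMillis = new AtomicLong();

    void recordExecution(long durationMillis) {
        invocations.incrementAndGet();
        totalMillis.addAndGet(durationMillis);
    }

    long getInvocations() { return invocations.get(); }

    double getAverageMillis() {
        long n = invocations.get();
        return n == 0 ? 0.0 : (double) totalMillis.get() / n;
    }
}
```

Exposing this through JMX is then just a matter of registering it with getter attributes; cancellation would additionally require keeping a handle on the running task's future.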
Regards, Vladimir From dan.berindei at gmail.com Tue Feb 25 15:44:18 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Tue, 25 Feb 2014 22:44:18 +0200 Subject: [infinispan-dev] Further dist.exec and M/R API improvements In-Reply-To: <530CEF79.3000308@redhat.com> References: <530517F9.3060008@redhat.com> <696741CC-41E9-4D03-9A91-BF8BA07EFF2D@redhat.com> <530BB1CF.1020307@redhat.com> <530CB229.4090301@redhat.com> <14276AB5-94EC-4303-9CB6-77FCBD667F8F@redhat.com> <530CEF79.3000308@redhat.com> Message-ID: On Tue, Feb 25, 2014 at 9:31 PM, Vladimir Blagojevic wrote: > Hey, > > I am starting to like this thread more and more :-) In conclusion, for > distributed executors we are not adding any new APIs because Callable > implementers can already write to cache using existing API. We don't > have to add any new elaborate callback/listener API either as users have > not requested but should investigate Hadoop Reporter like interface to > allow users some sense of task current execution phase. > > For map/reduce we will add a new method: > > public void execute(Cache resultsCache); > > Using fluent MapReduceTask API users would be able to specify an > intermediate cache: > > public MapReduceTask usingIntermediateCache(String > cacheName); > > We are not adding MapReduceTaskExecutionListener but more like JMX stats > for the MapReduce tasks in general: like average execution time, count > etc. Also the ability to cancel a running task through JMX/JON would be > nice. > For statistics, I was thinking of adding a getStatistics() method to MapReduceTask that would return an object with the duration of each phase and the number of keys processed on each node, after the M/R task is done. This could probably be extended such that it gives the user in-progress information as well. The in-progress information would also tie in nicely with a progress listener, but I feel the events you proposed are too coarse. 
If the user wanted to display a progress bar in his application, and the cluster only had 2 nodes, the progress bar would hover for half of the time around 0% and for the other half of the time around 50%. So we'd need to keep reporting something while a phase is in progress (e.g. by splitting a node's keys to more than one mapping task, and reporting the end of each subtask), otherwise the listener wouldn't be of much use. Anyway, this would be something nice to have, but I don't think it's very important, so supplying some global statistics via JMX should be enough for now. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140225/2c69f464/attachment.html From galder at redhat.com Wed Feb 26 01:56:08 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 07:56:08 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> Message-ID: <28874E57-C988-448A-99BB-1B65849D408F@redhat.com> On 19 Feb 2014, at 12:03, Sanne Grinovero wrote: > On 19 February 2014 07:12, Galder Zamarre?o wrote: >> >> On 03 Feb 2014, at 19:01, Dan Berindei wrote: >> >>> >>> >>> >>> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: >>>>>>> For sync we would want to invoke directly to avoid context switching. >>>>>> I think you haven't properly understood what I was talking about: the >>>>>> putAsync should not switch context at all in the ideal design. It should >>>>>> traverse through the interceptors all the way down (logically, in >>>>>> current behaviour), invoke JGroups async API and jump out. 
Then, as soon >>>>>> as the response is received, the thread which delivered it should >>>>>> traverse the interceptor stack up (again, logically), and fire the future. >>>> A Future doesn't make much sense with an async transport. The problem >>>> is with an async transport you never get back a response so you never >>>> know when the actual command is completed and thus a Future is >>>> worthless. The caller wouldn't know if they could rely on the use of >>>> the Future or not. >>> >>> You're right, there's one important difference between putAsync and put >>> with async transport: in the first case you can find out when the >>> request is completed while you cannot with the latter. Not requiring the >>> ack can be an important optimization. I think that both versions are >>> very valid: first mostly for bulk operations = reduction of latency, >>> second for modifications that are acceptable to fail without handling that. >>> I had the first case in my mind when talking about async operations, and >>> there the futures are necessary. >>> >>> A couple more differences: >>> 1. You can't do commitAsync(), but you can configure the commit to be replicated asynchronously (1PC). Although we did talk about removing that option... >>> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering between the two and you might end up with k=v1 in the cache. >> >> If there?s any relationship between both puts for the caller thread, the caller must make sure that the second put is only called after the first has completed. > > Actually in such a case I would strongly expect Infinispan to keep the > two operations in order. This is not to be pushed on user's > responsibility. If the two operations are executed by the same thread, then yes, I agree that it should be applied one after the other: Thread-1: Future f1 = putAsync(k, v1); Thread-1: Future f2 = putAsync(k, v2); I?d expect v1 to be applied and then v2. 
These operations would be added to some queue and you'd expect both insertions to happen one after the other, in Thread-1, so yeah, we can apply them in order. However, if the following happens:

Thread-1: Future f1 = putAsync(k, v1);
Thread-2: Future f2 = putAsync(k, v2);

We can't be enforcing such ordering. Now, if there's a relationship to the eye of the beholder between v1 and v2, and you expect v2 to be the end result, this is how you'd have to do it (JDK8-esque):

Thread-1: Future f1 = putAsync(k, v1);
Thread-2: Future f2 = f1.map.putAsync(k, v2);

or:

Thread-1: Future f1 = putAsync(k, v1);
Thread-2: Future f2 = f1.map.replaceAsync(k, v1, v2);

> >> If there are separate threads calling it and it relies on this, it should call replace the second time, i.e. replaceAsync(k, v1, v2) to get the guarantees it wants.
> >>
> >> What is really important is that the order in which they are executed in one node/replica is the same order in which they're executed in all other nodes. This was something that was not maintained when async marshalling was enabled.
>
> +1000
>
> But also I'd stress that any sync operation should have a Future
> returned,

^ To me, purely sync operations are any operations that return anything other than a Future. IOW:

void put(k, v);

^ That's an implicit sync operation where you have no choice. An async operation can behave both sync and async:

Future put(k, v);

It can be sync or async, depending on whether the user waits or does something once it completes. If it does not wait, or discards the Future, it's async. If it does something with the future, it's sync.

> someone in this long thread suggested to have an option to
> drop it for example to speed up bulk imports, but I really can't see a
> scenario in which I wouldn't want to know about a failure.

+1, I think everything should return a Future.

> Let's not
> do the same mistake that made MongoDB so "popular" ;-)
> Bulk imports can still be made efficient without strictly needing to go
> to these lengths.
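The "JDK8-esque" f1.map.putAsync(k, v2) chaining sketched above is essentially CompletableFuture composition: the second write is only issued once the first has completed, which restores ordering even when the two puts originate on different threads. A self-contained sketch with a ConcurrentHashMap standing in for the cache (putAsync here is an illustrative stand-in, not the Infinispan method):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Chaining the second async put on the first put's future guarantees k=v2
// ends up as the final state, regardless of thread scheduling.
class AsyncOrderingSketch {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final ExecutorService pool = Executors.newFixedThreadPool(2);

    static CompletableFuture<String> putAsync(String k, String v) {
        return CompletableFuture.supplyAsync(() -> cache.put(k, v), pool);
    }

    static String demo() {
        CompletableFuture<String> f1 = putAsync("k", "v1");
        // "f1.map.putAsync(k, v2)": issue the second put only after the first completes
        CompletableFuture<String> f2 = f1.thenCompose(prev -> putAsync("k", "v2"));
        f2.join();
        pool.shutdown();
        return cache.get("k");
    }
}
```

Without the thenCompose step, two independent putAsync calls race and either value may win — which is exactly the k=v1 surprise mentioned earlier in the thread.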
> > Sanne > > >> >>> >>> >>>> >>>> Also it depends what you are trying to do with async. Currently async >>>> transport is only for sending messages to another node, we never think >>>> of when we are the owning node. In this case the calling thread would >>>> have to go down the interceptor stack and acquire any locks if it is >>>> the owner, thus causing this "async" to block if you have any >>>> contention on the given key. The use of another thread would allow >>>> the calling thread to be able to return immediately no matter what >>>> else is occurring. Also I don't see what is so wrong about having a >>>> context switch to run something asynchronously, we shouldn't have a >>>> context switch to block the user thread imo, which is very possible >>>> with locking. >>> >>> This is an important notice! Locking would complicate the design a lot, >>> because the thread in "async" mode should do only tryLocks - if this >>> fails, further processing should be dispatched to another thread. Not >>> sure if this could be implemented at all, because the thread may be >>> blocked inside JGroups as well (async API is about receiving the >>> response asynchronously, not about sending the message asynchronously). >>> >>> I don't say that the context switch is that bad. My concern is that you >>> have a very limited amount of requests that can be processed in >>> parallel. I consider a "request" something pretty lightweight in concept >>> - but one thread per request makes this rather heavyweight stuff. >>> >>> We did talk in Farnborough/Palma about removing the current LockManager with a queue-based structure like the one used for ordering total-order transactions. And about removing the implicit stack in the current interceptor stack with an explicit stack, to allow resuming a command mid-execution. But the feeling I got was that neither is going to make it into 7.0. >>> >>> >>>> >>>>> +1 much cleaner, I love it. 
Actually wasn't aware the current code >>>>> didn't do this :-( >>>> This is what the current async transport does, but it does nothing with Futures. >>> >>> Nevermind the futures, this is not the important part. It's not about >>> async transport neither, it's about async executors. >>> (okay, the thread was about dropping async transport, I have hijacked it) >>> >>> Radim >>> >>> -- >>> Radim Vansa >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarre?o >> galder at redhat.com >> twitter.com/galderz >> >> Project Lead, Escalante >> http://escalante.io >> >> Engineer, Infinispan >> http://infinispan.org >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 26 03:45:07 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 09:45:07 +0100 Subject: [infinispan-dev] How to add programmatic config to an exisitng xml configured cache In-Reply-To: References: Message-ID: <6D966643-6E98-4642-9DA9-38E9DF3CB49A@redhat.com> Hi Faseela, Can you create a unit test demonstrating this (including the config.xml file)? There are plenty of examples in [1]. 
Cheers, [1] https://github.com/infinispan/infinispan/tree/master/core/src/test/java/org/infinispan/configuration On 20 Feb 2014, at 10:11, Faseela K wrote: > Hi, > > I have some infinispan configurations available in "config.xml". > After loading this configuration, I want to append some more configurations programmatically, using Configuration Builder. > I am doing something like this : > > Configuration template = null; > ConfigurationBuilder builder = new ConfigurationBuilder(); > > DefaultCacheManager manager = new DefaultCacheManager( > "config.xml"); > template = manager.getCacheConfiguration("evictionCache"); > builder.read(template); > builder.loaders().passivation(false).shared(false).preload(true) > .addFileCacheStore().fetchPersistentState(true) > .purgerThreads(3).purgeSynchronously(true) > .ignoreModifications(false).purgeOnStartup(false) > .location("tmp").async() > .enabled(true).flushLockTimeout(15000).threadPoolSize(5) > .singletonStore().enabled(true).pushStateWhenCoordinator(true) > .pushStateTimeout(20000); > > manager.defineConfiguration("abcd", builder.build()); > > The problem with this code is, it's overwriting the evictionCache configuration. > Can somebody help me to fix this issue? > > Thanks, > Faseela > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarre?o galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From galder at redhat.com Wed Feb 26 01:57:03 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 07:57:03 +0100 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? 
In-Reply-To: <5304DF70.2050603@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <5304DF70.2050603@redhat.com> Message-ID: On 19 Feb 2014, at 17:44, Dennis Reed wrote: > On 02/19/2014 12:57 AM, Galder Zamarre?o wrote: >> On 31 Jan 2014, at 08:32, Dennis Reed wrote: >> >>> It would be a loss of functionality. >>> >>> As a common example, the AS web session replication cache is configured >>> for ASYNC by default, for performance reasons. >>> But it can be changed to SYNC to guarantee that when the request >>> finishes that the session was replicated. >>> >>> That wouldn't be possible if you could no longer switch between >>> ASYNC/SYNC with just a configuration change. >> I disagree :). >> >> AS could abstract that configuration detail. IOW, if all Infinispan returned was Futures, AS or any other client application, has the choice in their hands: do they wait for the future to complete or not? If they do, they?re SYNC, if not ASYNC. AS can still expose this and no functionality is lost. > > Yes, the functionality is still lost. Your suggestion is just to > re-implement the functionality over and over in each ISPN caller. :) Yup, welcome to the non-blocking world. > >> What happens is that SYNC/ASYNC decision stops being a configuration option (bad, bad, bad) and becomes an actual programming decision Infinispan clients must address (good, good, good). > > This really depends on the client. For the AS session replication use > case, a config option is good, good, good. > But re-implementing the same functionality in every caller that may want > it to be a config option is bad, bad, bad. > > -Dennis > >>> -Dennis >>> >>> On 01/31/2014 01:08 AM, Galder Zamarre?o wrote: >>>> Hi all, >>>> >>>> The following came to my mind yesterday: I think we should ditch ASYNC modes for DIST/REPL/INV and our async cache store functionality. 
>>>> >>>> Instead, whoever wants to store something asynchronously should use asynchronous methods, i.e. call putAsync. So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base, without affecting our functionality, and it would make things more logical IMO. >>>> >>>> WDYT? >>>> >>>> > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Wed Feb 26 05:30:41 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 12:30:41 +0200 Subject: [infinispan-dev] Ditching ASYNC modes for REPL/DIST/INV/CacheStores? In-Reply-To: <28874E57-C988-448A-99BB-1B65849D408F@redhat.com> References: <57740CDD-8EFD-4D3E-9395-FABCF19B2448@redhat.com> <52EB5197.4050801@redhat.com> <52EB7F6C.505@redhat.com> <52EB9889.9070800@redhat.com> <52EFA355.2070203@redhat.com> <52EFC3AF.5060201@redhat.com> <28874E57-C988-448A-99BB-1B65849D408F@redhat.com> Message-ID: On Wed, Feb 26, 2014 at 8:56 AM, Galder Zamarreño wrote: > > On 19 Feb 2014, at 12:03, Sanne Grinovero wrote: > > > On 19 February 2014 07:12, Galder Zamarreño wrote: > >> > >> On 03 Feb 2014, at 19:01, Dan Berindei wrote: > >> > >>> > >>> > >>> > >>> On Mon, Feb 3, 2014 at 6:28 PM, Radim Vansa wrote: > >>>>>>> For sync we would want to invoke directly to avoid context > switching. > >>>>>> I think you haven't properly understood what I was talking about: > the > >>>>>> putAsync should not switch context at all in the ideal design. It > should > >>>>>> traverse through the interceptors all the way down (logically, in > >>>>>> current behaviour), invoke JGroups async API and jump out.
Then, as > soon > >>>>>> as the response is received, the thread which delivered it should > >>>>>> traverse the interceptor stack up (again, logically), and fire the > future. > >>>> A Future doesn't make much sense with an async transport. The problem > >>>> is with an async transport you never get back a response so you never > >>>> know when the actual command is completed and thus a Future is > >>>> worthless. The caller wouldn't know if they could rely on the use of > >>>> the Future or not. > >>> > >>> You're right, there's one important difference between putAsync and put > >>> with async transport: in the first case you can find out when the > >>> request is completed while you cannot with the latter. Not requiring > the > >>> ack can be an important optimization. I think that both versions are > >>> very valid: first mostly for bulk operations = reduction of latency, > >>> second for modifications that are acceptable to fail without handling > that. > >>> I had the first case in my mind when talking about async operations, > and > >>> there the futures are necessary. > >>> > >>> A couple more differences: > >>> 1. You can't do commitAsync(), but you can configure the commit to be > replicated asynchronously (1PC). Although we did talk about removing that > option... > >>> 2. If you do putAsync(k, v1); putAsync(k, v2), there is no ordering > between the two and you might end up with k=v1 in the cache. > >> > >> If there's any relationship between both puts for the caller thread, > the caller must make sure that the second put is only called after the > first has completed. > > > > Actually in such a case I would strongly expect Infinispan to keep the > > two operations in order. This is not to be pushed on user's > > responsibility. 
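The contrast drawn in this message between waiting for the ack and not requiring it can be sketched with plain JDK futures. This is purely an illustration: the map-backed putAsync below is an invented stand-in, not Infinispan's actual API (which at this point returns a NotifyingFuture).

```java
import java.util.Map;
import java.util.concurrent.*;

// Illustration only: a map-backed stand-in for an async cache API.
// "Sync" callers block on the returned future (they get the ack);
// "fire-and-forget" callers discard it and give up any completion guarantee.
public class SyncVsFireAndForget {
    static final ExecutorService pool = Executors.newSingleThreadExecutor();
    static final Map<String, String> store = new ConcurrentHashMap<>();

    static Future<String> putAsync(String k, String v) {
        return pool.submit(() -> store.put(k, v)); // returns the previous value
    }

    public static void main(String[] args) throws Exception {
        // Sync usage: wait for the ack; the write is visible when get() returns.
        putAsync("k", "v1").get();

        // Fire-and-forget usage: discard the future; no completion or
        // failure notification unless the caller arranges one.
        putAsync("k", "v2");

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(store.get("k")); // single-threaded pool: prints "v2"
    }
}
```

With a single-threaded pool the two submissions happen to run in order; a larger pool would drop even that guarantee, which is exactly the ordering question discussed next.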
> > If the two operations are executed by the same thread, then yes, I agree > that it should be applied one after the other: > > Thread-1: Future f1 = putAsync(k, v1); > Thread-1: Future f2 = putAsync(k, v2); > > I'd expect v1 to be applied and then v2. These operations would be added to > some queue, so you'd expect both insertions to happen one after the other, > in Thread-1, so yeah, we can apply them in order. > This definitely does not happen at the moment in Infinispan. Each putAsync gets its own asynchronous worker thread (there are 25 async threads by default), and the threads are not synchronized in any way. And I'm not sure it makes sense to order them anyway. I mean, if the order between two sequential putAsync operations were preserved, it would be quite natural to expect the ordering between a putAsync and a regular put to be preserved as well. Thread-1: Future f1 = putAsync(k, v1) Thread-1: put(k, v2) Thread-1: assert f1.isDone() && get(k).equals(v2) This would get quite complicated... an async put always creates a new, implicit, transaction, whereas a regular put can be part of an active transaction. So preserving the ordering between the putAsync and the put might mean delaying not the put, but the transaction commit. I'm not saying this couldn't be done, but I'm not sure it would make the semantics of putAsync any clearer than they are now. > > However, if the following happens: > > Thread-1: Future f1 = putAsync(k, v1); > Thread-2: Future f2 = putAsync(k, v2); > > We can't be enforcing such ordering.
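The race just described (each putAsync running on its own unsynchronized worker thread) and the chaining remedy can be made concrete with JDK 8's CompletableFuture. This is a hypothetical sketch: the putAsync below is a stand-in of my own, and Infinispan's real putAsync does not return a CompletableFuture at this point.

```java
import java.util.Map;
import java.util.concurrent.*;

// Sketch: enforcing order between two async puts on the same key by
// chaining, so the second is only submitted once the first completes.
public class ChainedAsyncPuts {
    static final ExecutorService pool = Executors.newFixedThreadPool(4);
    static final Map<String, String> store = new ConcurrentHashMap<>();

    static CompletableFuture<String> putAsync(String k, String v) {
        return CompletableFuture.supplyAsync(() -> store.put(k, v), pool);
    }

    public static void main(String[] args) throws Exception {
        // Two unchained putAsync calls would race on the 4-thread pool, and
        // k could end up as either value. Chained via thenCompose, the second
        // put starts only after the first completes, so k is deterministically v2.
        CompletableFuture<String> f =
            putAsync("k", "v1").thenCompose(prev -> putAsync("k", "v2"));
        f.get();
        System.out.println(store.get("k")); // prints "v2"
        pool.shutdown();
    }
}
```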
> > Now, if there's a relationship in the eye of the beholder between v1 and > v2, and you expect v2 to be the end result, this is how you'd have to do it > (JDK8-esque): > > Thread-1: Future f1 = putAsync(k, v1); > Thread-2: Future f2 = f1.map.putAsync(k, v2); > > or: > > Thread-1: Future f1 = putAsync(k, v1); > Thread-2: Future f2 = f1.map.replaceAsync(k, v1, v2); > Do you mean here that the 2nd putAsync/the replaceAsync operation would start executing only after f1 is done? Or would you expect them both to start executing at once, but with Infinispan ensuring that the 2nd operation is executed on the primary owner after the 1st? If it's the former, it should be quite easy to implement a Future with a getCache() method that returns a delegating cache, allowing you to submit a put operation immediately, but blocking it until the future is done. If it's the latter, I suspect it's going to be a lot more work. > > > >> > >> If there are separate threads calling it and it relies on this, it should > call replace the second time, i.e. replaceAsync(k, v1, v2) to get the > guarantees it wants. > >> > >> What is really important is that the order in which they are executed > in one node/replica is the same order in which they're executed in all > other nodes. This was something that was not maintained when async > marshalling was enabled. > > > > +1000 > > > > But also I'd stress that any sync operation should have a Future > > returned, > > ^ To me, purely sync operations are any operations that return anything > other than a Future. IOW: > > void put(k, v); > > ^ That's an implicit sync operation where you have no choice. > > An async operation can behave both sync and async: > > Future put(k, v); > > Can be sync or async, depending on whether the user waits or does something > once it completes. If it does not wait, or discards the Future, it's async. > If it does something with the future, it's sync. > I don't agree with this.
If the user can do something else while the operation is executing, then the operation is async. I don't know if there is a specific name in Java-land for starting an async call and discarding the Future, but in .Net this pattern is called "fire-and-forget". > > someone in this long thread suggested having an option to > > drop it for example to speed up bulk imports, but I really can't see a > > scenario in which I wouldn't want to know about a failure. > > +1, I think everything should return a Future. > Even void put(k, v)?? > > > Let's not > > make the same mistake that made MongoDB so "popular" ;-) > > Bulk imports can still be made efficient without strictly needing to go > > these lengths. > > > > Sanne > > > > > >> > >>> > >>> > >>>> > >>>> Also, it depends on what you are trying to do with async. Currently async > >>>> transport is only for sending messages to another node, we never think > >>>> of when we are the owning node. In this case the calling thread would > >>>> have to go down the interceptor stack and acquire any locks if it is > >>>> the owner, thus causing this "async" to block if you have any > >>>> contention on the given key. The use of another thread would allow > >>>> the calling thread to be able to return immediately no matter what > >>>> else is occurring. Also I don't see what is so wrong about having a > >>>> context switch to run something asynchronously, we shouldn't have a > >>>> context switch to block the user thread imo, which is very possible > >>>> with locking. > >>> > >>> This is an important point! Locking would complicate the design a lot, > >>> because the thread in "async" mode should do only tryLocks - if this > >>> fails, further processing should be dispatched to another thread. Not > >>> sure if this could be implemented at all, because the thread may be > >>> blocked inside JGroups as well (async API is about receiving the > >>> response asynchronously, not about sending the message asynchronously).
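Radim's "only tryLocks in async mode, dispatch to another thread on failure" idea can be sketched as follows. The names and structure here are invented for illustration and say nothing about Infinispan's actual LockManager or interceptor stack.

```java
import java.util.concurrent.*;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of "tryLock, else hand off": the caller thread takes the lock only
// if it is free; under contention the work is dispatched to an executor so
// the caller never blocks.
public class TryLockOrHandOff {
    static final ReentrantLock lock = new ReentrantLock();
    static final ExecutorService asyncExecutor = Executors.newSingleThreadExecutor();

    static Future<?> update(Runnable work) {
        if (lock.tryLock()) {                 // uncontended: run inline
            try {
                work.run();
                return CompletableFuture.completedFuture(null);
            } finally {
                lock.unlock();
            }
        }
        return asyncExecutor.submit(() -> {   // contended: a worker blocks, not the caller
            lock.lock();
            try { work.run(); } finally { lock.unlock(); }
        });
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        update(() -> sb.append("done")).get();
        System.out.println(sb);               // prints "done"
        asyncExecutor.shutdown();
    }
}
```

As the message above notes, this only helps if no layer underneath (e.g. JGroups) can block the caller anyway.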
> >>> > >>> I don't say that the context switch is that bad. My concern is that you > >>> have a very limited amount of requests that can be processed in > >>> parallel. I consider a "request" something pretty lightweight in > concept > >>> - but one thread per request makes this rather heavyweight stuff. > >>> > >>> We did talk in Farnborough/Palma about removing the current > LockManager with a queue-based structure like the one used for ordering > total-order transactions. And about removing the implicit stack in the > current interceptor stack with an explicit stack, to allow resuming a > command mid-execution. But the feeling I got was that neither is going to > make it into 7.0. > >>> > >>> > >>>> > >>>>> +1 much cleaner, I love it. Actually wasn't aware the current code > >>>>> didn't do this :-( > >>>> This is what the current async transport does, but it does nothing > with Futures. > >>> > >>> Nevermind the futures, this is not the important part. It's not about > >>> async transport neither, it's about async executors. 
> >>> (okay, the thread was about dropping async transport, I have hijacked > it) > >>> > >>> Radim > >>> > >>> -- > >>> Radim Vansa > >>> JBoss DataGrid QA > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> -- > >> Galder Zamarre?o > >> galder at redhat.com > >> twitter.com/galderz > >> > >> Project Lead, Escalante > >> http://escalante.io > >> > >> Engineer, Infinispan > >> http://infinispan.org > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/3228b679/attachment-0001.html From galder at redhat.com Wed Feb 26 05:47:34 2014 From: galder at redhat.com (=?iso-8859-1?Q?Galder_Zamarre=F1o?=) Date: Wed, 26 Feb 2014 11:47:34 +0100 Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <530613E9.3020602@redhat.com> References: <530613E9.3020602@redhat.com> Message-ID: <045153C3-EDCA-4D17-A0FF-C8C58134CE10@redhat.com> Great work Radim!!! Awesome job and very interesting roadmap :) On 20 Feb 2014, at 15:40, Radim Vansa wrote: > Hi all, > > it has been a long time since last release of RadarGun. We have been > using it intensively and developed many new features - 1.0.0 had 7,340 > lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become > multi-purpose tool, used for checking both performance and functionality > of caches under stress. > > During 1.1.0 development, most parts of code changed beyond the beyonds, > but we tried to keep the old configuration compatible. However, the > design started to be rather limiting, and therefore, we have decided to > make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x > branch we will provide bugfixes, but all new features should go to 2.0.0. 
> > Some teasers for the features expected in RadarGun 2.0.0: > > * non-homogeneous clusters: client/server setups, cooperation of > different versions of products, or easy setup of cross-site deployment > with different configurations > * abstracting from the cache wrapper: you will be able to use RadarGun for > more than just caches without any hacks > ** the current CacheWrapper interface will be designed to match JSR-107 > javax.cache.Cache rather than java.util.Map > * pluggable reporting: statistics will be directly multiplexed to > configured reporters (again, without cheating on directories), reporters > will provide the output formatted as CSV or HTML, or can even deploy the > results to an external repository > * merging local and distributed benchmarks -> master + single slave > within one JVM > * better property parsing: evaluation of expressions, property > replacement executed on slaves > > I hope you will like it! And enjoy the 1.1.0.Final release now. > > Radim > > ------ > Radim Vansa JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From dan.berindei at gmail.com Wed Feb 26 06:22:43 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 13:22:43 +0200 Subject: [infinispan-dev] RadarGun 1.1.0.Final released In-Reply-To: <045153C3-EDCA-4D17-A0FF-C8C58134CE10@redhat.com> References: <530613E9.3020602@redhat.com> <045153C3-EDCA-4D17-A0FF-C8C58134CE10@redhat.com> Message-ID: Great job, Radim! Looking forward to RadarGun 2.0! On Wed, Feb 26, 2014 at 12:47 PM, Galder Zamarreño wrote: > Great work Radim!!!
Awesome job and very interesting roadmap :) > > On 20 Feb 2014, at 15:40, Radim Vansa wrote: > > > Hi all, > > > > it has been a long time since last release of RadarGun. We have been > > using it intensively and developed many new features - 1.0.0 had 7,340 > > lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become > > multi-purpose tool, used for checking both performance and functionality > > of caches under stress. > > > > During 1.1.0 development, most parts of code changed beyond the beyonds, > > but we tried to keep the old configuration compatible. However, the > > design started to be rather limiting, and therefore, we have decided to > > make the last release for 1.1.0 and move on to RadarGun 2.0.0. In 1.1.x > > branch we will provide bugfixes, but all new features should go to 2.0.0. > > > > Some decoys for features expected for RadarGun 2.0.0: > > > > * non-homogenous clusters: client/server setups, cooperation of > > different versions of products, or easy setup of cross-site deployment > > with different configurations > > * abstracting from cache wrapper: you will be able to use RadarGun for > > more than just caches without any hacks > > ** current CacheWrapper interface will be designed to match JSR-107 > > javax.cache.Cache rather than java.util.Map > > * pluggable reporting: statistics will be directly multiplexed to > > configured reporters (again, without cheating on directories), reporters > > will provide the output formatted as CSV, HTML or even can deploy the > > results to external repository > > * merging local and distributed benchmark -> master + single slave > > within one JVM > > * better property parsing: evaluation of expressions, property > > replacement executed on slaves > > > > I hope you will like it! And enjoy 1.1.0.Final release now. 
> > > > Radim > > > > ------ > > Radim Vansa JBoss DataGrid QA > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarre?o > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/f6c18c85/attachment.html From mmarkus at redhat.com Wed Feb 26 08:12:07 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 13:12:07 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> Message-ID: <6036A294-231A-484F-8224-C77372987832@redhat.com> On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: > There also is the opposite problem to be considered, as Emmanuel > suggested on 11/04/2012: > you can't forbid the user to store the same object (same type and same > id) in two different caches, where each Cache might be using different > indexing options. > > If the "search service" is a global concept, and you run a query which > matches object X, we'll return it to the user but he won't be able to > figure out from which cache it's being sourced: is that ok? 
Can't the user figure that out based on the way the query is built? I mean, the problem is similar to databases: if address is both a table and a column in the USER table, then it's the query (select) that determines where the address is returned from. > > Ultimately this implies a query might return the same object X in > multiple positions in the result list of the query; for example it > might be the top result according to some criteria but also be the 5th > result because of how it was indexed in a different case: maybe > someone will find good use for this "capability" but I see it > primarily as a source of confusion. Curious whether the source of the data can/cannot be specified within the query. > Finally, if we move the search service as a global component, there > might be an impact in how we explain security: an ACL filter applied > on one cache - or the index metadata produced by that cache - might > not be applied in the same way by an entity being matched through a > second cache. > Not least a user's permission to access one cache (or not) will affect > his results in a rather complex way. I'll let Tristan comment more on this, but is this really different from an SQL database where you grant access on individual tables and run a query involving multiple of them? > > I'm wondering if we need to prevent such situations. > > Sanne > > On 25 February 2014 16:24, Mircea Markus wrote: >> >> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: >> >>> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. >>> >>> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it >> >> Agreed. I actually don't see how we can enforce that people who declare a Cache don't put whatever they want in it.
Also makes total sense for smaller caches as it is easy to set up etc. >> The debate in this email, the way I understood it, was: are/should people be using multiple caches for storing data? If yes, we should consider querying functionality spreading over multiple caches. >> >>> >>> >>> >>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: >>> >>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: >>> >>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: >>>>> >>>>> >>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >>>>>> >>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >>>>> >>>>> Curious to hear the whole story :-) >>>>> We cannot mandate that all the users use OGM though, one of the reasons being that OGM is not platform independent (hotrod). >>>> >>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. >>> >>> People are going to use Infinispan with one cache per entity, because it makes sense: >>> - different config (repl/dist | persistent/non-persistent) for different data types >>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 >>> I don't see a reason to forbid this, on the contrary. The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>>> >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >> >> Cheers, >> -- >> Mircea Markus >> Infinispan lead (www.infinispan.org) >> >> >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ttarrant at redhat.com Wed Feb 26 08:05:54 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Wed, 26 Feb 2014 14:05:54 +0100 Subject: [infinispan-dev] JavaDocs and API documentation Message-ID: <530DE6B2.2060405@redhat.com> Dear all, our JavaDocs currently encompass all of our classes, interfaces, etc with no clear distinction between public and private API/SPI. I would like to clearly mark which of our classes/interfaces are public API. 
Should we: - add some decoration / visual cue to such elements to distinguish them from the internal stuff - generate two JavaDoc bundles: one which only contains the public API/SPIs and one with everything Tristan From dan.berindei at gmail.com Wed Feb 26 09:13:36 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 16:13:36 +0200 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <6036A294-231A-484F-8224-C77372987832@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> Message-ID: On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: > > On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: > > > There also is the opposite problem to be considered, as Emmanuel > > suggested on 11/04/2012: > > you can't forbid the user to store the same object (same type and same > > id) in two different caches, where each Cache might be using different > > indexing options. > > > > If the "search service" is a global concept, and you run a query which > > matches object X, we'll return it to the user but he won't be able to > > figure out from which cache it's being sourced: is that ok? > > Can't the user figure that out based on the way the query is built? > I mean the problem is similar with the databases: if address is both a > table and an column in the USER table, then it's the query (select) that > determines where from the address is returned. > You mean the user should specify the cache name(s) when building the query? 
With a database you have to go a bit out of your way to select from more than one table at a time, normally you have just one primary table that you select from and the others are just to help you filter and transform that table. You also have to add some information about the source table yourself if you need it, otherwise the DB won't tell you what table the results are coming from: SELECT "table1" as source, id FROM table1 UNION ALL SELECT "table2" as source, id FROM table2 Adrian tells our current query API doesn't allow us to do projections with synthetic columns. On the other hand, we need to extend the current API to give us the entry key anyway, so it would be easy to extend it to give us the name of the cache as well. > > > > Ultimately this implies a query might return the same object X in > > multiple positions in the result list of the query; for example it > > might be the top result according to some criteria but also be the 5th > > result because of how it was indexed in a different case: maybe > > someone will find good use for this "capability" but I see it > > primarily as a source of confusion. > > Curious if this cannot be source of data can/cannot be specified within > the query. > Right, the user should be able to scope a search to a single cache, or maybe to multiple caches, even if there is only one global index. But I think the same object can already be inserted twice in the same cache, only with a different key, so returning duplicates from a query is something the user already has to cope with. > > Finally, if we move the search service as a global component, there > > might be an impact in how we explain security: an ACL filter applied > > on one cache - or the index metadata produced by that cache - might > > not be applied in the same way by an entity being matched through a > > second cache. > > Not least a user's permission to access one cache (or not) will affect > > his results in a rather complex way. 
> > I'll let Tristan comment more on this, but is this really different from > an SQL database where you grant access on individual tables and run a query > involving multiple of them? > The difference would be that in a DB each table will have its own index(es), so they only have to check the permissions once and not for every row. OTOH, if we plan to support key-level permissions, that would require checking the permissions on each search result anyway, so this wouldn't cost us anything. > > > > > I'm wondering if we need to prevent such situations. > > > > Sanne > > > > On 25 February 2014 16:24, Mircea Markus wrote: > >> > >> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: > >> > >>> They can do what they please. Either put multiple types in one basket > or put them in separate caches (one type per cache). But allowing / > recommending is one thing, mandating it is a different story. > >>> > >>> There's no reason to forbid _any_ of these scenarios / mandate one > over the other! There was previously in this thread some suggestion of > mandating the one type per cache usage. -1 for it > >> > >> Agreed. I actually don't see how we can enforce people that declare > Cache not put whatever they want in it. Also makes total > sense for smaller caches as it is easy to set up etc. > >> The debate in this email, the way I understood it, was: are/should > people using multiple caches for storing data? If yes we should consider > querying functionality spreading over multiple caches. > >> > >>> > >>> > >>> > >>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus > wrote: > >>> > >>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard > wrote: > >>> > >>>>> On 24 f?vr. 2014, at 17:39, Mircea Markus > wrote: > >>>>> > >>>>> > >>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard < > emmanuel at hibernate.org> wrote: > >>>>>> > >>>>>> By the way, Mircea, Sanne and I had quite a long discussion about > this one and the idea of one cache per entity. 
It turns out that the right > (as in easy) solution does involve a higher level programming model like > OGM provides. You can simulate it yourself using the Infinispan APIs but it > is just cumbersome. > >>>>> > >>>>> Curious to hear the whole story :-) > >>>>> We cannot mandate all the suers to use OGM though, one of the > reasons being OGM is not platform independent (hotrod). > >>>> > >>>> Then solve all the issues I have raised with a magic wand and come > back to me when you have done it, I'm interested. > >>> > >>> People are going to use infinispan with one cache per entity, because > it makes sense: > >>> - different config (repl/dist | persistent/non-persistent) for > different data types > >>> - have map/reduce tasks running only the Person entires not on Dog as > well, when you want to select (Person) where age > 18 > >>> I don't see a reason to forbid this, on the contrary. The way I see it > the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be > a better abstraction and should be recommended as such for the Java > clients, but ultimately we're a general purpose storage engine that is > available to different platforms as well. > >>> > >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/d2d2f701/attachment-0001.html From mmarkus at redhat.com Wed Feb 26 09:02:20 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 14:02:20 +0000 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: <530DE6B2.2060405@redhat.com> References: <530DE6B2.2060405@redhat.com> Message-ID: On Feb 26, 2014, at 1:05 PM, Tristan Tarrant wrote: > Dear all, > > our JavaDocs currently encompass all of our classes, interfaces, etc > with no clear distinction between public and private API/SPI. I would > like to clearly mark which of our classes/interfaces are public API. 
> Should we: > > - add some decoration / visual cue to such elements to distinguish them > from the internal stuff I think Sanne mentioned (and I think it was Hibernate) the convention of impl sub-packages for all the non-public API. Sounds sensible to me, as people will see the impl in the class name when importing it, and that should raise question marks. Shall we adopt that? > - generate two JavaDoc bundles: one which only contains the public > API/SPIs and one with everything > > Tristan > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From mmarkus at redhat.com Wed Feb 26 09:20:54 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 14:20:54 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> Message-ID: On Feb 26, 2014, at 2:13 PM, Dan Berindei wrote: > > > > On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: > > On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: > > > There also is the opposite problem to be considered, as Emmanuel > > suggested on 11/04/2012: > > you can't forbid the user to store the same object (same type and same > > id) in two different caches, where each Cache might be using different > > indexing options.
> > If the "search service" is a global concept, and you run a query which > matches object X, we'll return it to the user but he won't be able to > figure out from which cache it's being sourced: is that ok? > Can't the user figure that out based on the way the query is built? > I mean the problem is similar with databases: if address is both a table and a column in the USER table, then it's the query (select) that determines where the address is returned from. > > You mean the user should specify the cache name(s) when building the query? yes > > With a database you have to go a bit out of your way to select from more than one table at a time, normally you have just one primary table that you select from and the others are just to help you filter and transform that table. You also have to add some information about the source table yourself if you need it, otherwise the DB won't tell you what table the results are coming from: > > SELECT "table1" as source, id FROM table1 > UNION ALL > SELECT "table2" as source, id FROM table2 > > Adrian tells me our current query API doesn't allow us to do projections with synthetic columns. On the other hand, we need to extend the current API to give us the entry key anyway, so it would be easy to extend it to give us the name of the cache as well. > > > > > > Ultimately this implies a query might return the same object X in > > multiple positions in the result list of the query; for example it > > might be the top result according to some criteria but also be the 5th > > result because of how it was indexed in a different cache: maybe > > someone will find good use for this "capability" but I see it > > primarily as a source of confusion. > > Curious whether the source of the data can/cannot be specified within the query. > > Right, the user should be able to scope a search to a single cache, or maybe to multiple caches, even if there is only one global index.
> > But I think the same object can already be inserted twice in the same cache, only with a different key, so returning duplicates from a query is something the user already has to cope with. > > > > Finally, if we move the search service as a global component, there > > might be an impact in how we explain security: an ACL filter applied > > on one cache - or the index metadata produced by that cache - might > > not be applied in the same way by an entity being matched through a > > second cache. > > Not least a user's permission to access one cache (or not) will affect > > his results in a rather complex way. > > I'll let Tristan comment more on this, but is this really different from an SQL database where you grant access on individual tables and run a query involving multiple of them? > > The difference would be that in a DB each table will have its own index(es), so they only have to check the permissions once and not for every row. > > OTOH, if we plan to support key-level permissions, that would require checking the permissions on each search result anyway, so this wouldn't cost us anything. > > > > > > I'm wondering if we need to prevent such situations. > > > > Sanne > > > > On 25 February 2014 16:24, Mircea Markus wrote: > >> > >> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: > >> > >>> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. > >>> > >>> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it > >> > >> Agreed. I actually don't see how we can enforce people that declare Cache not put whatever they want in it. Also makes total sense for smaller caches as it is easy to set up etc. 
> >> The debate in this email, the way I understood it, was: are/should people using multiple caches for storing data? If yes we should consider querying functionality spreading over multiple caches. > >> > >>> > >>> > >>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: > >>> > >>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: > >>> > >>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: > >>>>> > >>>>> > >>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: > >>>>>> > >>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. > >>>>> > >>>>> Curious to hear the whole story :-) > >>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod). > >>>> > >>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. > >>> > >>> People are going to use infinispan with one cache per entity, because it makes sense: > >>> - different config (repl/dist | persistent/non-persistent) for different data types > >>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 > >>> I don't see a reason to forbid this, on the contrary. The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
> >>> > >>> > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From ttarrant at redhat.com Wed Feb 26 09:24:45 2014 From: ttarrant at redhat.com (Tristan Tarrant) Date: Wed, 26 Feb 2014 15:24:45 +0100 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: References: <530DE6B2.2060405@redhat.com> Message-ID: <530DF92D.1020202@redhat.com> On 26/02/2014 15:02, Mircea Markus wrote: > On Feb 26, 2014, at 1:05 PM, Tristan Tarrant wrote: > >> Dear all, >> >> our JavaDocs currently encompass all of our classes, interfaces, etc >> with no clear distinction between public and private API/SPI. I would >> like to clearly mark which of our classes/interfaces are public API. >> Should we: >> >> - add some decoration / visual cue to such elements to distinguish them >> from the internal stuff > I think Sanne mentioned and i think it was Hibernate that has impl sub-packages for all the non-public API. > Sounds sensible to me, as people will see the impl in the class name when importing it, and that should raise question marks. shall we adopt that? That would help, but we would still end up with a lot of noise in the javadocs, for example the list of classes on the left has no separation by package. 
Tristan From anistor at redhat.com Wed Feb 26 09:33:00 2014 From: anistor at redhat.com (Adrian Nistor) Date: Wed, 26 Feb 2014 16:33:00 +0200 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> Message-ID: <530DFB1C.20109@redhat.com> On 02/26/2014 04:20 PM, Mircea Markus wrote: > On Feb 26, 2014, at 2:13 PM, Dan Berindei wrote: > >> >> >> On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: >> >> On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: >> >>> There also is the opposite problem to be considered, as Emmanuel >>> suggested on 11/04/2012: >>> you can't forbid the user to store the same object (same type and same >>> id) in two different caches, where each Cache might be using different >>> indexing options. >>> >>> If the "search service" is a global concept, and you run a query which >>> matches object X, we'll return it to the user but he won't be able to >>> figure out from which cache it's being sourced: is that ok? >> Can't the user figure that out based on the way the query is built? >> I mean the problem is similar with the databases: if address is both a table and an column in the USER table, then it's the query (select) that determines where from the address is returned. >> >> You mean the user should specify the cache name(s) when building the query? > yes Let's say multiple caches are specified when building the query. How can I tell (with current result api) where does the matching entity come from? I still think we should extend the result api in order to provide: 1. 
the key of the entity, 2. the name of the originating cache. The old result api that just gives you an Iterator over the matches should continue to exist because it's more efficient for the cases when the user does not need #1 and #2. > >> With a database you have to go a bit out of your way to select from more than one table at a time, normally you have just one primary table that you select from and the others are just to help you filter and transform that table. You also have to add some information about the source table yourself if you need it, otherwise the DB won't tell you what table the results are coming from: >> >> SELECT "table1" as source, id FROM table1 >> UNION ALL >> SELECT "table2" as source, id FROM table2 >> >> Adrian tells our current query API doesn't allow us to do projections with synthetic columns. On the other hand, we need to extend the current API to give us the entry key anyway, so it would be easy to extend it to give us the name of the cache as well. >> >> >>> Ultimately this implies a query might return the same object X in >>> multiple positions in the result list of the query; for example it >>> might be the top result according to some criteria but also be the 5th >>> result because of how it was indexed in a different case: maybe >>> someone will find good use for this "capability" but I see it >>> primarily as a source of confusion. >> Curious if this cannot be source of data can/cannot be specified within the query. >> >> Right, the user should be able to scope a search to a single cache, or maybe to multiple caches, even if there is only one global index. >> >> But I think the same object can already be inserted twice in the same cache, only with a different key, so returning duplicates from a query is something the user already has to cope with. 
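Adrian's proposal just above - extending the result API to carry (1) the key of the entity and (2) the originating cache name - could be sketched roughly as below. All names here (ResultEntry, cacheName, etc.) are illustrative assumptions, not the actual Infinispan query API:

```java
import java.util.List;

public class QueryResultSketch {

    // What a richer result entry could expose, per the discussion:
    public interface ResultEntry<E> {
        Object key();        // #1: the key under which the entity is stored
        String cacheName();  // #2: which cache the match came from
        E value();
    }

    // Trivial immutable implementation for illustration only.
    public record Entry<E>(Object key, String cacheName, E value)
            implements ResultEntry<E> {}

    // A query spanning several caches could then return the "same" value
    // twice, disambiguated by (cacheName, key) - the duplicate problem
    // debated in this thread becomes explicit rather than confusing.
    public static String describe(List<ResultEntry<String>> results) {
        StringBuilder sb = new StringBuilder();
        for (ResultEntry<String> r : results) {
            sb.append(r.cacheName()).append('/').append(r.key())
              .append('=').append(r.value()).append(';');
        }
        return sb.toString();
    }

    public static List<ResultEntry<String>> demo() {
        return List.<ResultEntry<String>>of(
                new Entry<String>(42, "persons", "Ana"),
                new Entry<String>(42, "employees", "Ana"));
    }
}
```

The plain Iterator-style API would remain untouched for callers who don't need the extra metadata, as Adrian suggests.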
>> >> >>> Finally, if we move the search service as a global component, there >>> might be an impact in how we explain security: an ACL filter applied >>> on one cache - or the index metadata produced by that cache - might >>> not be applied in the same way by an entity being matched through a >>> second cache. >>> Not least a user's permission to access one cache (or not) will affect >>> his results in a rather complex way. >> I'll let Tristan comment more on this, but is this really different from an SQL database where you grant access on individual tables and run a query involving multiple of them? >> >> The difference would be that in a DB each table will have its own index(es), so they only have to check the permissions once and not for every row. >> >> OTOH, if we plan to support key-level permissions, that would require checking the permissions on each search result anyway, so this wouldn't cost us anything. >> >> >>> I'm wondering if we need to prevent such situations. >>> >>> Sanne >>> >>> On 25 February 2014 16:24, Mircea Markus wrote: >>>> On Feb 25, 2014, at 3:46 PM, Adrian Nistor wrote: >>>> >>>>> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story. >>>>> >>>>> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it >>>> Agreed. I actually don't see how we can enforce people that declare Cache not put whatever they want in it. Also makes total sense for smaller caches as it is easy to set up etc. >>>> The debate in this email, the way I understood it, was: are/should people using multiple caches for storing data? If yes we should consider querying functionality spreading over multiple caches. 
>>>> >>>>> >>>>> >>>>> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus wrote: >>>>> >>>>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: >>>>> >>>>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: >>>>>>> >>>>>>> >>>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >>>>>>>> >>>>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >>>>>>> Curious to hear the whole story :-) >>>>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod). >>>>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. >>>>> People are going to use infinispan with one cache per entity, because it makes sense: >>>>> - different config (repl/dist | persistent/non-persistent) for different data types >>>>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 >>>>> I don't see a reason to forbid this, on the contrary. The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>>>>> >>>>> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > Cheers, From dan.berindei at gmail.com Wed Feb 26 10:29:26 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Wed, 26 Feb 2014 17:29:26 +0200 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: <530DF92D.1020202@redhat.com> References: <530DE6B2.2060405@redhat.com> <530DF92D.1020202@redhat.com> Message-ID: On Wed, Feb 26, 2014 at 4:24 PM, Tristan Tarrant wrote: > On 26/02/2014 15:02, Mircea Markus wrote: > > On Feb 26, 2014, at 1:05 PM, Tristan Tarrant > wrote: > > > >> Dear all, > >> > >> our JavaDocs currently encompass all of our classes, interfaces, etc > >> with no clear distinction between public and private API/SPI. I would > >> like to clearly mark which of our classes/interfaces are public API. > >> Should we: > >> > >> - add some decoration / visual cue to such elements to distinguish them > >> from the internal stuff > > I think Sanne mentioned and i think it was Hibernate that has impl > sub-packages for all the non-public API. > > Sounds sensible to me, as people will see the impl in the class name > when importing it, and that should raise question marks. shall we adopt > that? > That would help, but we would still end up with a lot of noise in the > javadocs, for example the list of classes on the left has no separation > by package. > > If we move all internal classes to .impl sub-packages, it will be quite easy to exclude the .impl packages from javadocs with a bit of maven-javadoc-plugin configuration. I don't think we need to generate javadocs for the internal classes at all, as the sources are easily accessible from any IDE. Cheers Dan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140226/d831716b/attachment-0001.html From mmarkus at redhat.com Wed Feb 26 11:08:17 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Wed, 26 Feb 2014 16:08:17 +0000 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <530DFB1C.20109@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <6036A294-231A-484F-8224-C77372987832@redhat.com> <530DFB1C.20109@! redhat.com> Message-ID: <26819DE2-8557-427B-BA3A-2F5BD121DDF5@redhat.com> On Feb 26, 2014, at 2:33 PM, Adrian Nistor wrote: > On 02/26/2014 04:20 PM, Mircea Markus wrote: >> On Feb 26, 2014, at 2:13 PM, Dan Berindei wrote: >> >>> >>> >>> On Wed, Feb 26, 2014 at 3:12 PM, Mircea Markus wrote: >>> >>> On Feb 25, 2014, at 5:08 PM, Sanne Grinovero wrote: >>> >>>> There also is the opposite problem to be considered, as Emmanuel >>>> suggested on 11/04/2012: >>>> you can't forbid the user to store the same object (same type and same >>>> id) in two different caches, where each Cache might be using different >>>> indexing options. >>>> >>>> If the "search service" is a global concept, and you run a query which >>>> matches object X, we'll return it to the user but he won't be able to >>>> figure out from which cache it's being sourced: is that ok? >>> Can't the user figure that out based on the way the query is built? >>> I mean the problem is similar with the databases: if address is both a table and an column in the USER table, then it's the query (select) that determines where from the address is returned. 
>>> >>> You mean the user should specify the cache name(s) when building the query? >> yes > Let's say multiple caches are specified when building the query. How can > I tell (with current result api) where does the matching entity come > from? I'm not talking about the current API here, just looking for a way to be able to specify the source cache for an object in the result. We should be able to do that through the query, or if the result is an alternative we can consider it. > I still think we should extend the result api in order to provide: > 1. the key of the entity, 2. the name of the originating cache. The old > result api that just gives you an Iterator over the matches > should continue to exist because it's more efficient for the cases when > the user does not need #1 and #2. I wouldn't mind that, but TBH i think we should add it only if users ask for it. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From vblagoje at redhat.com Wed Feb 26 11:08:35 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 26 Feb 2014 11:08:35 -0500 Subject: [infinispan-dev] JavaDocs and API documentation In-Reply-To: References: <530DE6B2.2060405@redhat.com> <530DF92D.1020202@redhat.com> Message-ID: <530E1183.3060700@redhat.com> I agree, sounds like a sensible thing to do. But this needs to be planned carefully and when exactly is the good time to do it, soon and have it ready for 7.0.0.Final? On 2/26/2014, 10:29 AM, Dan Berindei wrote: > > > > > If we move all internal classes to .impl sub-packages, it will be > quite easy to exclude the .impl packages from javadocs with a bit of > maven-javadoc-plugin configuration. I don't think we need to generate > javadocs for the internal classes at all, as the sources are easily > accessible from any IDE. 
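Dan's suggestion above - moving internals under .impl sub-packages and then excluding them from the published JavaDoc - could look something like the following maven-javadoc-plugin snippet. This is a sketch; the exclusion pattern would need to be verified against the actual module layout:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <!-- hide internal classes from the published API docs;
         patterns are colon-separated -->
    <excludePackageNames>*.impl:*.impl.*</excludePackageNames>
  </configuration>
</plugin>
```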
> > Cheers > Dan From emmanuel at hibernate.org Wed Feb 26 12:14:49 2014 From: emmanuel at hibernate.org (Emmanuel Bernard) Date: Wed, 26 Feb 2014 18:14:49 +0100 Subject: [infinispan-dev] Design change in Infinispan Query In-Reply-To: <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> Message-ID: <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org> On 25 Feb 2014, at 16:08, Mircea Markus wrote: > > On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote: > >>> On 24 févr. 2014, at 17:39, Mircea Markus wrote: >>> >>> >>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote: >>>> >>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome. >>> >>> Curious to hear the whole story :-) >>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod). >> >> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested. > > People are going to use infinispan with one cache per entity, because it makes sense: > - different config (repl/dist | persistent/non-persistent) for different data types > - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18 > I don't see a reason to forbid this, on the contrary.
The way I see it the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well. > I do disagree on your assessment. I did write a whole essay on why I think your view is problematic - I was getting tired of repeating myself ;P https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity To anecdotally answer your specific example, yes, different configs for different entities is an interesting benefit, but it has to outweigh the drawbacks. If you have to do a map/reduce for tasks as simple as age > 18, your system had better be prepared to run gazillions of M/R jobs. I think that Dogs and any domestic animal are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK, and even if it was, it's still M/R with all its drawbacks. To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache. One cache per entity does make sense for APIs that do support what I call connected entities. Hibernate OGM specifically. But please read the wiki page first before commenting. I did spend a lot of time on it https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity Emmanuel From vblagoje at redhat.com Wed Feb 26 14:31:23 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Wed, 26 Feb 2014 14:31:23 -0500 Subject: [infinispan-dev] Distributed executors and Future(s) they return Message-ID: <530E410B.7050101@redhat.com> Hey, There is an interesting request from the community to include an Address along with the Future returned for a subtask being executed [1].
I think what this user wants makes sense. We might create a Future sub-interface that has a getAddress method and return an object implementing that interface instead of a plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return, e.g., TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. Any thoughts? Vladimir [1] https://community.jboss.org/thread/237442 From sanne at infinispan.org Wed Feb 26 16:25:08 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 26 Feb 2014 21:25:08 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: <530E410B.7050101@redhat.com> References: <530E410B.7050101@redhat.com> Message-ID: I'm a bit skeptical. It might sound like a sensible request currently, but if you do so you inherently "promise" that tasks are going to be executed on a specific server; AFAIK we promise execution based on data locality, but by maintaining a good level of flexibility you can evolve your system towards smarter load balancing of tasks, failover operations, etc. If you expose execution details, you won't be able to develop any of that in future. To make an example from the database world - it seems the analogy is common these days - it's like running a SELECT statement but wanting to pick which CPU core is going to be used. That would be really odd, as you would take away the option from the scheduler to make an effective choice. Still, this approach might be desirable for a database which doesn't do any smart scheduling. Some of these concerns might be mitigated if you return the Address of where the task *was* executed, after it's done. I still don't think it should be of the user's interest, but at least you would be able to implement rescheduling or failover policies in future.
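Combining Vladimir's TargetedFuture idea with Sanne's caveat - only expose the Address of where the task *was* executed, after completion - could be sketched as below. Names are illustrative, not the actual Infinispan API, and Address is stubbed with a record:

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TargetedFutureSketch {

    // Stand-in for org.infinispan.remoting.transport.Address.
    public record Address(String name) {}

    // The sub-interface discussed in the thread: per Sanne's caveat, the
    // address is only guaranteed to be meaningful once the future is done.
    public interface TargetedFuture<V> extends Future<V> {
        Address getExecutionAddress();
    }

    // Minimal already-completed implementation for demonstration.
    static final class CompletedTargetedFuture<V> implements TargetedFuture<V> {
        private final V value;
        private final Address executedAt;

        CompletedTargetedFuture(V value, Address executedAt) {
            this.value = value;
            this.executedAt = executedAt;
        }

        public Address getExecutionAddress() { return executedAt; }
        public boolean cancel(boolean mayInterruptIfRunning) { return false; }
        public boolean isCancelled() { return false; }
        public boolean isDone() { return true; }
        public V get() { return value; }
        public V get(long timeout, TimeUnit unit) { return value; }
    }

    public static TargetedFuture<Integer> demo() {
        // Pretend the subtask already ran on node-B and produced 42.
        return new CompletedTargetedFuture<>(42, new Address("node-B"));
    }
}
```

Because TargetedFuture still is-a Future, returning it from the existing DistributedExecutorService methods would not break callers that expect a plain Future, which is the compatibility point Vladimir makes.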
Sanne On 26 February 2014 19:31, Vladimir Blagojevic wrote: > Hey, > > There is an interesting request from the community to include an Address along with the Future returned for a subtask being executed [1]. > > I think what this user wants makes sense. We might create a Future sub-interface that has a getAddress method and return an object implementing that interface instead of a plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return, e.g., TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. > > Any thoughts? > > Vladimir > > [1] https://community.jboss.org/thread/237442 > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Wed Feb 26 16:45:13 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Wed, 26 Feb 2014 21:45:13 +0000 Subject: [infinispan-dev] Row based security & Queries (Was: Design change in Infinispan Query) Message-ID: To clarify some points raised on the thread "Design change in Infinispan Query", which I don't wish to derail further: The Query engine can actually apply per-entry user access restrictions in an efficient way which doesn't (necessarily) involve checking each result; currently this needs specific user coding, but: # it's not too hard, as Hibernate Search users do it regularly: we provide various helpers and examples. Especially in the book :-) # we are not including a pre-canned strategy as - at least in the case of our Hibernate integration - the details of how people want it done are often exotic. Essentially a typical solution would work with Filters: a filter is a bitset which masks potential results from queries, so it's applied up front, before actual scoring and other more complex match operations.
These bitsets are very suited for filters, and are split on segments so that pre-computed segments related to parts of an index which didn't change can be reused even if the index as a whole is mutating continually. Such a Filter could even encode the response of some external authorization service on a per-document base (slow but effective), or it simply represents user group tokens which are applied as tags on the indexed documents (more efficient as long as role definitions are stable). That said, I'm not suggesting that this should be a priority, but I expect that sometime in the future we could provide a pre-canned strategy to work out of the box with our security extensions, at least for the benefit of remote protocols. So let's keep this in mind while making other design decisions. -- Sanne From dan.berindei at gmail.com Thu Feb 27 05:54:45 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Thu, 27 Feb 2014 12:54:45 +0200 Subject: [infinispan-dev] Row based security & Queries (Was: Design change in Infinispan Query) In-Reply-To: References: Message-ID: Hi Sanne Reading your reply I realized I was wrong in my "evaluation", we could require the user to specify the secure cache(s) he wants to query when building the query and checking that he has read rights on all of them before executing the query, just like a DB would do. And if he doesn't specify any cache, throw an exception if there is any indexed cache that he doesn't have read access to. So we could implement the cache-level security we need now without any performance hit. 
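Sanne's description of filters as bitsets intersected with candidate matches before any scoring can be illustrated with plain java.util.BitSet - a heavily simplified stand-in for what a Lucene Filter does over index segments:

```java
import java.util.BitSet;

// Simplified illustration of the security-filter idea described above:
// a per-user bitset masks candidate documents before scoring happens.
public class AclBitsetFilter {

    // Documents the current user may see (bit i set => doc i is visible).
    private final BitSet allowed;

    public AclBitsetFilter(BitSet allowed) {
        this.allowed = allowed;
    }

    // Intersect raw query matches with the ACL mask - a cheap AND over
    // bitsets, applied before any scoring or complex match operations.
    public BitSet apply(BitSet queryMatches) {
        BitSet visible = (BitSet) queryMatches.clone();
        visible.and(allowed);
        return visible;
    }

    public static BitSet demo() {
        BitSet acl = new BitSet();
        acl.set(0);
        acl.set(2);                      // user may see docs 0 and 2
        BitSet matches = new BitSet();
        matches.set(1);
        matches.set(2);                  // query matched docs 1 and 2
        return new AclBitsetFilter(acl).apply(matches); // only doc 2 survives
    }
}
```

In the real implementation such masks would be computed per index segment, which is what lets unchanged segments reuse pre-computed filter bits as Sanne notes.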
But thanks for the explanation, it sounds like row-level security isn't quite as far-fetched as I was imagining it ;) Cheers Dan On Wed, Feb 26, 2014 at 11:45 PM, Sanne Grinovero wrote: > To clarify some points rised on the thread "Design change in > Infinispan Query", which I don't wish to derail further: > > The Query engine can actually apply per-entry user restriction access > in an efficient way which doesn't involve (necessarily) to check each > result; currently this needs specific user coding but: > # it's not too hard as Hibernate Search users do it regularly: we > provide various helpers and examples. Especially in the book :-) > # is not including a pre-canned strategy as -at least in case of our > Hibernate integration - the details of how people want it done are > often exotic. > > Essentially a typical solution would work with Filters: a filter is a > bitset which masks potential results from queries, so it's applied > upfront actual scoring and other more complex match operations. > These bitsets are very suited for filters, and are split on segments > so that pre-computed segments related to parts of an index which > didn't change can be reused even if the index as a whole is mutating > continually. > Such a Filter could even encode the response of some external > authorization service on a per-document base (slow but effective), or > it simply represents user group tokens which are applied as tags on > the indexed documents (more efficient as long as role definitions are > stable). > > That said, I'm not suggesting that this should be a priority, but I > expect that sometime in the future we could provide a pre-canned > strategy to work out of the box with our security extensions, at least > for the benefit of remote protocols. So let's keep this in mind while > making other design decisions. 
> > -- Sanne > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From sanne at infinispan.org Thu Feb 27 06:59:08 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Feb 2014 11:59:08 +0000 Subject: [infinispan-dev] Row based security & Queries (Was: Design change in Infinispan Query) In-Reply-To: References: Message-ID: On 27 February 2014 10:54, Dan Berindei wrote: > Hi Sanne > > Reading your reply I realized I was wrong in my "evaluation", we could > require the user to specify the secure cache(s) he wants to query when > building the query and checking that he has read rights on all of them > before executing the query, just like a DB would do. And if he doesn't > specify any cache, throw an exception if there is any indexed cache that he > doesn't have read access to. So we could implement the cache-level security > we need now without any performance hit. Right, but when querying indexes, we target an index, not a cache. A user could have access to one cache and not another, and if we go for a shared query engine, the current implementation allows indexes to be shared. You could have a valid situation in which an entry X stored in a Cache A, to which you have access, but also stored in a Cache B, to which you have no access, is retrieved from Cache A (so no security problem) but with its scoring affected by additional metadata which entered the index via Cache B. This wouldn't be a security violation strictly speaking, but it would be highly confusing, as it's often quite complex to figure out why some result is matching. As you say, we could block the query if the user has no access to one of the related caches.
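The guard both sides converge on - blocking a query when the caller lacks read access to any cache feeding a shared index - is a small check. Sketched here with plain collections; the names and the string-based permission model are hypothetical, not the actual Infinispan security API:

```java
import java.util.Set;

public class SharedIndexGuard {

    // Throws if the caller lacks read access to any cache behind the
    // shared index, mirroring what a DB does before a multi-table query.
    public static void checkQueryAllowed(Set<String> cachesInIndex,
                                         Set<String> readableCaches) {
        for (String cache : cachesInIndex) {
            if (!readableCaches.contains(cache)) {
                throw new SecurityException("no read access to cache: " + cache);
            }
        }
    }

    public static boolean demo() {
        try {
            // Index is fed by cacheA and cacheB, but only cacheA is readable.
            checkQueryAllowed(Set.of("cacheA", "cacheB"), Set.of("cacheA"));
            return false; // not reached: cacheB is not readable
        } catch (SecurityException expected) {
            return true;
        }
    }
}
```

The check is done once per query rather than per result, which is the "no performance hit" property Dan points out.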
It could still be puzzling that someone is prevented from retrieving data from a cache to which he does have access, but that's probably easier to explain and document than that shared indexes require the same access permissions on each involved cache. Cheers, Sanne > > But thanks for the explanation, it sounds like row-level security isn't > quite as far-fetched as I was imagining it ;) > > Cheers > Dan > > > > On Wed, Feb 26, 2014 at 11:45 PM, Sanne Grinovero > wrote: >> >> To clarify some points raised on the thread "Design change in >> Infinispan Query", which I don't wish to derail further: >> >> The Query engine can actually apply per-entry user access restrictions >> in an efficient way which doesn't (necessarily) involve checking each >> result; currently this needs specific user coding but: >> # it's not too hard, as Hibernate Search users do it regularly: we >> provide various helpers and examples. Especially in the book :-) >> # we're not including a pre-canned strategy as - at least in the case of our >> Hibernate integration - the details of how people want it done are >> often exotic. >> >> Essentially a typical solution would work with Filters: a filter is a >> bitset which masks potential results from queries, so it's applied >> before actual scoring and other more complex match operations. >> These bitsets are very well suited for filters, and are split by segment >> so that pre-computed segments related to parts of an index which >> didn't change can be reused even if the index as a whole is mutating >> continually. >> Such a Filter could even encode the response of some external >> authorization service on a per-document basis (slow but effective), or >> it could simply represent user group tokens which are applied as tags on >> the indexed documents (more efficient as long as role definitions are >> stable).
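Sanne's filter-as-bitset idea can be sketched in plain Java. This is a simplified stand-in for the Lucene filter machinery he refers to (the class and method names below are invented for illustration, and a real Filter operates per index segment rather than on one global bitset):

```java
import java.util.BitSet;

// Simplified stand-in for a Lucene-style security filter: each document
// carries role tags, and the filter pre-computes a bitset of documents
// visible to a user. The bitset is ANDed with the query's matches before
// any scoring happens, so forbidden documents never reach the result set.
class RoleFilterSketch {

    // Build the "visible documents" bitset for one user role.
    static BitSet visibleDocs(String[][] docRoles, String userRole) {
        BitSet visible = new BitSet(docRoles.length);
        for (int doc = 0; doc < docRoles.length; doc++) {
            for (String role : docRoles[doc]) {
                if (role.equals(userRole)) {
                    visible.set(doc); // document is tagged with the user's role
                    break;
                }
            }
        }
        return visible;
    }

    // Mask the raw query matches with the pre-computed filter.
    static BitSet applyFilter(BitSet queryMatches, BitSet filter) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(filter);
        return result;
    }
}
```

In the real engine the equivalent bitsets are kept per index segment, which is what lets unchanged segments reuse their cached filter even while the index as a whole keeps mutating.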
>> >> That said, I'm not suggesting that this should be a priority, but I >> expect that sometime in the future we could provide a pre-canned >> strategy to work out of the box with our security extensions, at least >> for the benefit of remote protocols. So let's keep this in mind while >> making other design decisions. >> >> -- Sanne >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From galder at redhat.com Thu Feb 27 09:23:37 2014 From: galder at redhat.com (=?windows-1252?Q?Galder_Zamarre=F1o?=) Date: Thu, 27 Feb 2014 15:23:37 +0100 Subject: [infinispan-dev] On the topic of Map/Reduce and Hadoop Message-ID: Hi, Recently we had an email thread on Map/Reduce and Hadoop's API/mechanisms to do Map/Reduce. I've just finished watching [1], which looks at Hadoop's Java API and then looks at the evolutions and improvements that functional programming has enabled. Some food for thought :) Cheers, [1] http://www.infoq.com/presentations/big-data-functional-programming -- Galder Zamarreño galder at redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org From vblagoje at redhat.com Thu Feb 27 10:28:08 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Thu, 27 Feb 2014 10:28:08 -0500 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: References: <530E410B.7050101@redhat.com> Message-ID: <530F5988.6010305@redhat.com> Hmm, very good points Sanne. Yeah, I think we can have a contract that returns an Address where the task was executed. Cheers, Vladimir On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: > I'm a bit skeptical.
> It might sound like a sensible request currently, but if you do so you > inherently "promise" that tasks are going to be executed on a specific > server; AFAIK we promise execution on data locality, but maintaining a > good level of flexibility you can evolve your system to smarter load > balancing of tasks, failover operations, etc. > If you expose execution details, you won't be able to develop any of > that in future. > > To make an example from the database world - seems the analogy is > common these days - it's like you run a SELECT statement but want to > pick which CPU core is going to be used. That would be really odd, as > you would take away the option from the scheduler to make an effective > choice. > Still, this approach might be desirable for a database which doesn't > do any smart scheduling. > > Some of these concerns might be mitigated if you return the Address of > where the task *was* executed, after it's done. I still don't think it > should be of user's interest but at least you would be able to > implement rescheduling or failover policies in future. > > Sanne > > > On 26 February 2014 19:31, Vladimir Blagojevic wrote: >> Hey, >> >> There is an interesting request from the community to include an Address along with a Future returned for a subtask being executed [1]. >> >> I think what this user wants makes sense. We might create a Future sub-interface that has a getAddress method and we can return an object implementing that interface instead of a plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return i.e. TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. >> >> Any thoughts?
>> >> Vladimir >> >> [1] https://community.jboss.org/thread/237442 >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From mmarkus at redhat.com Thu Feb 27 11:58:32 2014 From: mmarkus at redhat.com (Mircea Markus) Date: Thu, 27 Feb 2014 16:58:32 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: <530F5988.6010305@redhat.com> References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> Message-ID: <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic wrote: > Hmm very good points Sanne. Yeah I think we can have a contract that > returns an Address were task was executed. > > > Cheers, > Vladimir > On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: >> I'm a bit skeptical. >> It might sound a sensible request currently, but if you do so you >> inherently "promise" that tasks are going to be executed on a specific >> server; AFAIK we promise execution on data locality, We allow execution to be bound on a specific address: http://goo.gl/H5qTJZ I see your point with data locality vs. specific server. >> but maintaining a >> good level of flexibility you can evolve your system to smarter load >> balancing of tasks, failover operations, etc.. >> If you expose execution details, you won't be able to develop any of >> that in future. >> >> To make an example from the database world - seems the analogy is >> common these days - it's like you run a SELECT statement but want to >> pick which CPU core is going to be used. That would be really odd, as >> you would take away the option from the scheduler to make an effective >> choice. 
>> Still, this approach might be desirable for a database which doesn't >> do any smart scheduling. >> >> Some of these concerns might be mitigated if you return the Address of >> where the task *was* executed, after it's done. I still don't think it >> should be of user's interest but at least you would be able to >> implement rescheduling or failover policies in future. We already have failure policies in place, but the user only needs to audit the failure, not to failover. If users are interested on knowing the failures, another way of doing it is the current future, in the Future.get to throw a custom exception (subclass of ExecutionException) containing as information where the execution failed. >> >> Sanne >> >> >> On 26 February 2014 19:31, Vladimir Blagojevic wrote: >>> Hey, >>> >>> There is an interesting request from community to include an Address along with a Future returned for a subtask being executed [1]. >>> >>> I think it makes sense what this user wants. We might create Future sub interface that has getAddress method and we can return an object implementing that interface instead of plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return i.e TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. >>> >>> Any thoughts? 
>>> >>> Vladimir >>> >>> [1] https://community.jboss.org/thread/237442 >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev at lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) From sanne at infinispan.org Thu Feb 27 13:13:20 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Feb 2014 18:13:20 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> Message-ID: On 27 February 2014 16:58, Mircea Markus wrote: > > On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic wrote: > >> Hmm very good points Sanne. Yeah I think we can have a contract that >> returns an Address were task was executed. >> >> >> Cheers, >> Vladimir >> On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: >>> I'm a bit skeptical. >>> It might sound a sensible request currently, but if you do so you >>> inherently "promise" that tasks are going to be executed on a specific >>> server; AFAIK we promise execution on data locality, > > We allow execution to be bound on a specific address: http://goo.gl/H5qTJZ I know but I think that smells :) Stuff like _Address_ should be an implementation detail. Maybe one day you'll see why and we'll deprecate it ;-) > I see your point with data locality vs. specific server. 
> > >>> but maintaining a >>> good level of flexibility you can evolve your system to smarter load >>> balancing of tasks, failover operations, etc.. >>> If you expose execution details, you won't be able to develop any of >>> that in future. >>> >>> To make an example from the database world - seems the analogy is >>> common these days - it's like you run a SELECT statement but want to >>> pick which CPU core is going to be used. That would be really odd, as >>> you would take away the option from the scheduler to make an effective >>> choice. >>> Still, this approach might be desirable for a database which doesn't >>> do any smart scheduling. >>> >>> Some of these concerns might be mitigated if you return the Address of >>> where the task *was* executed, after it's done. I still don't think it >>> should be of user's interest but at least you would be able to >>> implement rescheduling or failover policies in future. > > We already have failure policies in place, but the user only needs to audit the failure, not to failover. If users are interested on knowing the failures, another way of doing it is the current future, in the Future.get to throw a custom exception (subclass of ExecutionException) containing as information where the execution failed. Right, but the question is whether the user really wants to know about the intermediate failures. I suspect that if someone asks for this, he's actually wishing to implement his own failover policy & monitoring. From the point of view of someone running a database query, I think the user would love to ignore issues altogether, but the real world forces him to at least consider that the whole operation might fail. Sending him specific notifications or exceptions about something that was successful but was actually run on a different resource set than what was originally planned is, I'd say, an exotic request.
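For reference, the exception-based alternative Mircea floats in the quoted paragraph above (Future.get throwing an ExecutionException subclass that carries the failing node) might look roughly like this. The class name and the stubbed Address type are invented for the sketch, not an existing Infinispan API:

```java
import java.util.concurrent.ExecutionException;

// Stand-in for org.infinispan.remoting.transport.Address, to keep the
// sketch self-contained (the real type is an Infinispan interface).
class NodeAddress {
    final String name;
    NodeAddress(String name) { this.name = name; }
}

// Hypothetical ExecutionException subclass: Future.get() would throw it
// when a distributed task fails, carrying the node where it failed so the
// caller can audit the failure.
class RemoteTaskExecutionException extends ExecutionException {
    private final NodeAddress failedNode;

    RemoteTaskExecutionException(String message, Throwable cause, NodeAddress failedNode) {
        super(message, cause);
        this.failedNode = failedNode;
    }

    NodeAddress getFailedNode() {
        return failedNode;
    }
}
```

A caller would catch this around Future.get() and read getFailedNode() purely for auditing, which is the limited use case Mircea describes.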
I like the idea of providing additional information in a Future subtype, but I don't think you should throw it on a get() operation. You could simply add getters to the FutureExtended to retrieve things like an execution plan history, a trace of intermediate failures, etc. Sanne > >>> >>> Sanne >>> >>> On 26 February 2014 19:31, Vladimir Blagojevic wrote: >>>> Hey, >>>> >>>> There is an interesting request from community to include an Address along with a Future returned for a subtask being executed [1]. >>>> >>>> I think it makes sense what this user wants. We might create Future sub interface that has getAddress method and we can return an object implementing that interface instead of plain Future. In some new major release we can officially change the signature of these DistributedExecutorService methods to return i.e TargetedFuture - it would not break existing clients. Maybe even make TargetedFuture extend NotifyingFuture. >>>> >>>> Any thoughts?
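The getter-based shape suggested here could look something like the following sketch. TargetedFuture is the name floated in the thread; the particular getters, the stubbed Address type, and the trivial implementation are illustrative assumptions rather than an actual API:

```java
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Stand-in for org.infinispan.remoting.transport.Address.
interface ClusterAddress { String name(); }

// Sketch of the Future subtype discussed in the thread: execution metadata
// is exposed through extra getters instead of being thrown from get().
interface TargetedFuture<V> extends Future<V> {
    ClusterAddress getExecutionAddress();           // node that produced the result
    List<ClusterAddress> getIntermediateFailures(); // nodes tried before success, if any
}

// Minimal already-completed implementation, for illustration only.
class CompletedTargetedFuture<V> implements TargetedFuture<V> {
    private final V value;
    private final ClusterAddress executedOn;
    private final List<ClusterAddress> failures;

    CompletedTargetedFuture(V value, ClusterAddress executedOn, List<ClusterAddress> failures) {
        this.value = value;
        this.executedOn = executedOn;
        this.failures = failures;
    }

    public ClusterAddress getExecutionAddress() { return executedOn; }
    public List<ClusterAddress> getIntermediateFailures() { return failures; }
    public boolean cancel(boolean mayInterruptIfRunning) { return false; }
    public boolean isCancelled() { return false; }
    public boolean isDone() { return true; }
    public V get() { return value; }
    public V get(long timeout, TimeUnit unit) { return value; }
}
```

Since the metadata only travels through getters, a get() call stays a plain result retrieval, and callers who don't care about placement can ignore the extra methods entirely.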
[infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> Message-ID: On Thu, Feb 27, 2014 at 8:13 PM, Sanne Grinovero wrote: > On 27 February 2014 16:58, Mircea Markus wrote: > > > > On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic > wrote: > > > >> Hmm very good points Sanne. Yeah I think we can have a contract that > >> returns an Address were task was executed. > >> > >> > >> Cheers, > >> Vladimir > >> On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: > >>> I'm a bit skeptical. > >>> It might sound a sensible request currently, but if you do so you > >>> inherently "promise" that tasks are going to be executed on a specific > >>> server; AFAIK we promise execution on data locality, > > > > We allow execution to be bound on a specific address: > http://goo.gl/H5qTJZ > > I know but I think that smells :) > Stuff like _Address_ should be an implementation detail. Maybe one day > you'll see why and we'll deprecate it ;-) > > > I see your point with data locality vs. specific server. > > > > > >>> but maintaining a > >>> good level of flexibility you can evolve your system to smarter load > >>> balancing of tasks, failover operations, etc.. > >>> If you expose execution details, you won't be able to develop any of > >>> that in future. > >>> > >>> To make an example from the database world - seems the analogy is > >>> common these days - it's like you run a SELECT statement but want to > >>> pick which CPU core is going to be used. That would be really odd, as > >>> you would take away the option from the scheduler to make an effective > >>> choice. > >>> Still, this approach might be desirable for a database which doesn't > >>> do any smart scheduling. > >>> > >>> Some of these concerns might be mitigated if you return the Address of > >>> where the task *was* executed, after it's done. 
I still don't think it > >>> should be of user's interest but at least you would be able to > >>> implement rescheduling or failover policies in future. > > > > We already have failure policies in place, but the user only needs to > audit the failure, not to failover. If users are interested on knowing the > failures, another way of doing it is the current future, in the Future.get > to throw a custom exception (subclass of ExecutionException) containing as > information where the execution failed. > > Right, but the question is if the user really wants to know the > intermediate failures? I suspect that if someone asks for this, he's > actually wishing to implement his own failower policy & monitoring. > >From the point of view of someone running a database query, I think > the user would love to ignore issues altogether, but the real world > forces him to at least consider that the whole operation might fail. > Sending him specific notifications or exceptions of something that was > succesfull but was actually run on a different resource set than what > was originally planned is I'd say an exotic request. > I don't think the user was after the address of the "real" executing node, I believe he just wanted a way to map each Future to the target address doing a submitEverywhere(task). > > I like the idea of providing additional information in a Future > subtype, but I don't think you should throw it on a get() operation. > You could simply add getters to the FutureExtended to retrieve like an > execution plan history, a trace of intermediate failures, etc. > > That sounds good, but we shouldn't limit that to just the result of one distributed task execution. We could take the opportunity to return something other than List from submitEverywhere(task) as well, doing a foreach to get all the results is a bit tedious. 
And even if we'd like users to treat the results from different nodes as interchangeable, sometimes they're not, so a way of getting the result from one particular node would also be useful. > Sanne > > > > >>> > >>> Sanne > >>> > >>> > >>> On 26 February 2014 19:31, Vladimir Blagojevic > wrote: > >>>> Hey, > >>>> > >>>> There is an interesting request from community to include an Address > along with a Future returned for a subtask being executed [1]. > >>>> > >>>> I think it makes sense what this user wants. We might create Future > sub interface that has getAddress method and we can return an object > implementing that interface instead of plain Future. In some new major > release we can officially change the signature of these > DistributedExecutorService methods to return i.e TargetedFuture - it would > not break existing clients. Maybe even make TargetedFuture extend > NotifyingFuture. > >>>> > >>>> Any thoughts? > >>>> > >>>> Vladimir > >>>> > >>>> [1] https://community.jboss.org/thread/237442 > >>>> > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> infinispan-dev at lists.jboss.org > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev at lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > Cheers, > > -- > > Mircea Markus > > Infinispan lead (www.infinispan.org) > > > > > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org 
> https://lists.jboss.org/mailman/listinfo/infinispan-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140227/df2315e6/attachment-0001.html From sanne at infinispan.org Thu Feb 27 14:03:58 2014 From: sanne at infinispan.org (Sanne Grinovero) Date: Thu, 27 Feb 2014 19:03:58 +0000 Subject: [infinispan-dev] Distributed executors and Future(s) they return In-Reply-To: References: <530E410B.7050101@redhat.com> <530F5988.6010305@redhat.com> <3F406743-FE81-467E-90BD-2B52D52C7A0F@redhat.com> Message-ID: On 27 February 2014 18:40, Dan Berindei wrote: > > > > On Thu, Feb 27, 2014 at 8:13 PM, Sanne Grinovero > wrote: >> >> On 27 February 2014 16:58, Mircea Markus wrote: >> > >> > On Feb 27, 2014, at 3:28 PM, Vladimir Blagojevic >> > wrote: >> > >> >> Hmm very good points Sanne. Yeah I think we can have a contract that >> >> returns an Address were task was executed. >> >> >> >> >> >> Cheers, >> >> Vladimir >> >> On 2/26/2014, 4:25 PM, Sanne Grinovero wrote: >> >>> I'm a bit skeptical. >> >>> It might sound a sensible request currently, but if you do so you >> >>> inherently "promise" that tasks are going to be executed on a specific >> >>> server; AFAIK we promise execution on data locality, >> > >> > We allow execution to be bound on a specific address: >> > http://goo.gl/H5qTJZ >> >> I know but I think that smells :) >> Stuff like _Address_ should be an implementation detail. Maybe one day >> you'll see why and we'll deprecate it ;-) >> >> > I see your point with data locality vs. specific server. >> > >> > >> >>> but maintaining a >> >>> good level of flexibility you can evolve your system to smarter load >> >>> balancing of tasks, failover operations, etc.. >> >>> If you expose execution details, you won't be able to develop any of >> >>> that in future. 
>> >>> >> >>> To make an example from the database world - seems the analogy is >> >>> common these days - it's like you run a SELECT statement but want to >> >>> pick which CPU core is going to be used. That would be really odd, as >> >>> you would take away the option from the scheduler to make an effective >> >>> choice. >> >>> Still, this approach might be desirable for a database which doesn't >> >>> do any smart scheduling. >> >>> >> >>> Some of these concerns might be mitigated if you return the Address of >> >>> where the task *was* executed, after it's done. I still don't think it >> >>> should be of user's interest but at least you would be able to >> >>> implement rescheduling or failover policies in future. >> > >> > We already have failure policies in place, but the user only needs to >> > audit the failure, not to failover. If users are interested on knowing the >> > failures, another way of doing it is the current future, in the Future.get >> > to throw a custom exception (subclass of ExecutionException) containing as >> > information where the execution failed. >> >> Right, but the question is if the user really wants to know the >> intermediate failures? I suspect that if someone asks for this, he's >> actually wishing to implement his own failower policy & monitoring. >> >From the point of view of someone running a database query, I think >> the user would love to ignore issues altogether, but the real world >> forces him to at least consider that the whole operation might fail. >> Sending him specific notifications or exceptions of something that was >> succesfull but was actually run on a different resource set than what >> was originally planned is I'd say an exotic request. > > > I don't think the user was after the address of the "real" executing node, I > believe he just wanted a way to map each Future to the target address doing > a submitEverywhere(task). 
> >> >> >> I like the idea of providing additional information in a Future >> subtype, but I don't think you should throw it on a get() operation. >> You could simply add getters to the FutureExtended to retrieve like an >> execution plan history, a trace of intermediate failures, etc. >> > > That sounds good, but we shouldn't limit that to just the result of one > distributed task execution. We could take the opportunity to return > something other than List from submitEverywhere(task) as well, doing > a foreach to get all the results is a bit tedious. And even if we'd like > users to treat the results from different nodes as interchangeable, > sometimes they're not, so a way of getting the result from one particular > node would also be useful. +1 > > >> >> Sanne >> >> > >> >>> >> >>> Sanne >> >>> >> >>> >> >>> On 26 February 2014 19:31, Vladimir Blagojevic >> >>> wrote: >> >>>> Hey, >> >>>> >> >>>> There is an interesting request from community to include an Address >> >>>> along with a Future returned for a subtask being executed [1]. >> >>>> >> >>>> I think it makes sense what this user wants. We might create Future >> >>>> sub interface that has getAddress method and we can return an object >> >>>> implementing that interface instead of plain Future. In some new major >> >>>> release we can officially change the signature of these >> >>>> DistributedExecutorService methods to return i.e TargetedFuture - it would >> >>>> not break existing clients. Maybe even make TargetedFuture extend >> >>>> NotifyingFuture. >> >>>> >> >>>> Any thoughts? 
>> >>>> >> >>>> Vladimir >> >>>> >> >>>> [1] https://community.jboss.org/thread/237442 >> >>>> >> >>>> _______________________________________________ >> >>>> infinispan-dev mailing list >> >>>> infinispan-dev at lists.jboss.org >> >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >>> _______________________________________________ >> >>> infinispan-dev mailing list >> >>> infinispan-dev at lists.jboss.org >> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> >> infinispan-dev mailing list >> >> infinispan-dev at lists.jboss.org >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> > >> > Cheers, >> > -- >> > Mircea Markus >> > Infinispan lead (www.infinispan.org) >> > >> > >> > >> > >> > >> > _______________________________________________ >> > infinispan-dev mailing list >> > infinispan-dev at lists.jboss.org >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From manik at infinispan.org Thu Feb 27 16:45:19 2014 From: manik at infinispan.org (Manik Surtani) Date: Thu, 27 Feb 2014 13:45:19 -0800 Subject: [infinispan-dev] Git repo very large? Message-ID: Hi guys. Why's the git repo over 100MB in size for a fresh checkout? Most of this seems to be consumed by git objects: ~/Code/infinispan/.git GIT_DIR! pwd /Users/manik/Code/infinispan/.git ~/Code/infinispan/.git GIT_DIR! du -hs . 54M . Perhaps we added some large files at some point and then removed them? If that is the case we'd need to clean up history as well. 
Pls have a look at http://rtyley.github.io/bfg-repo-cleaner/ and http://stackoverflow.com/questions/6884331/git-repo-still-huge-after-large-files-removed-from-repository-history... the repo shouldn't be more than 20 or 30 MB. - M -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140227/ab31f5d2/attachment.html From dan.berindei at gmail.com Fri Feb 28 06:57:21 2014 From: dan.berindei at gmail.com (Dan Berindei) Date: Fri, 28 Feb 2014 13:57:21 +0200 Subject: [infinispan-dev] Infinispan 7.0.0.Alpha1 Message-ID: Dear Infinispan community, We're proud to announce the first Alpha release of Infinispan 7.0.0. This release adds several new features: - Support for clustered listeners. One of the limitations of Infinispan's distributed mode used to be that listeners could only receive events for cache modifications on their own node. That's no longer the case, and it paves the way for a long-requested feature: HotRod listeners. - Map/Reduce tasks can now execute the mapper/combiner/reducer on multiple threads. Stay tuned for more Map/Reduce improvements in the near future. - The first essential component of cache security has been added, which will be the building block for remote protocol authentication and authorization. - Improved OSGi support in the HotRod Java client. The core components are also getting into shape for OSGi; expect more on this front in the next release. As you can see, many of the new features are stepping stones for bigger things yet to come. Feel free to join us and shape the future releases on our forums, our mailing lists, or our #infinispan IRC channel. For a complete list of features and bug fixes included in this release, please refer to the release notes. Visit our downloads section to find the latest release. Thanks to everyone for their involvement and contribution! Happy hacking!
Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140228/fb5993c7/attachment.html From vblagoje at redhat.com Fri Feb 28 10:30:52 2014 From: vblagoje at redhat.com (Vladimir Blagojevic) Date: Fri, 28 Feb 2014 10:30:52 -0500 Subject: [infinispan-dev] On the topic of Map/Reduce and Hadoop In-Reply-To: References: Message-ID: <5310ABAC.5040806@redhat.com> Scala propaganda! :-) Thanks for sharing! On 2/27/2014, 9:23 AM, Galder Zamarreño wrote: > Hi, > > Recently we had an email thread on Map/Reduce and Hadoop's API/mechanisms to do Map/Reduce. > > I've just finished watching [1], which looks at Hadoop's Java API and then looks at the evolutions and improvements that functional programming has enabled. > > Some food for thought :) > > Cheers, > > [1] http://www.infoq.com/presentations/big-data-functional-programming > -- > Galder Zamarreño > galder at redhat.com > twitter.com/galderz > > Project Lead, Escalante > http://escalante.io > > Engineer, Infinispan > http://infinispan.org > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev From ben.cotton at ALUMNI.RUTGERS.EDU Fri Feb 28 13:16:24 2014 From: ben.cotton at ALUMNI.RUTGERS.EDU (cotton-ben) Date: Fri, 28 Feb 2014 10:16:24 -0800 (PST) Subject: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions Message-ID: <1393611384264-4028925.post@n3.nabble.com> Hi Mircea, Manik, Bela, et al. I want to muse more publicly on this SUBJ line. Here now, then maybe in the ISPN /user/ forum, then maybe JSR-347 provider wide. I know we had a semi-private (Bela led) exchange, but I want to be more public with this conversation. Long post again, sorry. This is just an open musing. I realize this musing should not expect to be accommodated by any "oh, we got to do this in ISPN/JGRPs now!" response ...
there is absolutely only the most infrequent use-case that would /today/ be served by addressing this musing ... but tomorrow that /will/ be a different story. Questions: Does the concept of ISPN/JGRPs transport between "Cluster" nodes currently depend on OSI transport layer sockets' participation? In other words, if all the nodes on my "Cluster" have locality=127.0.0.1, is ISPN/JGRPs accommodating enough to use a native OS IPC choice as an intra-node transport? Or, is it true that my transport choices are always limited to just {TCP,UDP} -- independent of the participating nodes' locality (and that I am thus forced to go over an OSI loopback)? If my transport choices are limited to just {TCP,UDP} for all node localities, then I might ask that you consider additional upcoming modern Java transport options. With the ambitions of upcoming OpenJDK JEPs, which will make mainstream an API capability that today is only available via sun.misc.Unsafe, Java will soon have "more complete" transport options that will include all of { TCP, UDP, RDMA/SDP, IPC }. Some examples of upcoming accommodating providers: 1. RDMA/SDP: via Infiniband VERBS (works today in JDK 7 on OSI physical layer IB NICs, does not work over Ethernet) 2. IPC via OpenHFT's SHM as IPC solution (will work this year) Again, I realize that these transport choices are useful today only in a very rare use case. However, should these transports be in your offering to ISPN/JGRPs customers, then ISPN/JGRPs becomes -- like all of Java has become in recent years -- increasingly more attractive to /all/ HPC Linux supercomputing use cases (not just ours). -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925.html Sent from the Infinispan Developer List mailing list archive at Nabble.com.
From mmarkus at redhat.com  Fri Feb 28 16:14:57 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Fri, 28 Feb 2014 21:14:57 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
Message-ID: 

On Feb 26, 2014, at 5:14 PM, Emmanuel Bernard wrote:
>
> On 25 Feb 2014, at 16:08, Mircea Markus wrote:
>>
>> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard wrote:
>>
>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
>>>>
>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>>>>
>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>>>
>>>> Curious to hear the whole story :-)
>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod).
>>>
>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>>
>> People are going to use infinispan with one cache per entity, because it makes sense:
>> - different config (repl/dist | persistent/non-persistent) for different data types
>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
>> I don't see a reason to forbid this, on the contrary.
>> The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>
> I do disagree on your assessment.
> I did write a whole essay on why I think your view is problematic - I was getting tired of repeating myself ;P
> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity

Thanks for writing this up, it is a good taxonomy of data storage schemes and querying.

> To anecdotally answer your specific example, yes different configs for different entities is an interesting benefit but it has to outweigh the drawbacks.

Using a single cache for all the types is not practical at all :-) Just to expand my idea, people prefer using different caches for many reasons:
- security: the Accounts cache has different security requirements than the News cache
- data consistency: News is a non-transactional cache, Accounts require pessimistic XA transactions
- expiry: expire last year's news from the system. Not the same for Accounts
- availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache
- logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though.

> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.

I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-fold:
- performance: you iterate over data that is not related to your query.
- programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.

> I think that Dogs and any domestic animals are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK and even if it was, it's still M/R with all its drawbacks.
> To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache.

I see where you come from, but I don't think requiring people to use a single cache for all the entities is an option. Besides a natural logical separation, different data has different storage requirements: security, access patterns, consistency, durability, availability etc. For most of the non-trivial use cases, using a single cache just won't do.

> One cache per entity does make sense for APIs that do support what I call connected entities. Hibernate OGM specifically.

OGM does a great job covering this, but it is very specific: Java only and OOP - our C/S mode, Hot Rod specifically, is language independent and not OOP. Also I would like to comment on the following statements:

"I believe a cache API and Hot Rod are well suited to address up to the self-contained object graph use case with a couple of relations maintained manually by the application but that cannot be queried. For the connected entities use case, only a high level paradigm is suited, like JPA."

I don't think storing object graphs should be under scrutiny here: Infinispan C/S mode (and that's where most of the client focus is BTW) has a schema (protobuf) that does not support object graphs. I also think expecting people to use multiple caches for multiple data types is a solid assumption to start from.
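[To make the single-cache drawback concrete, here is a plain-Java sketch of the "who is over 18" job discussed above. This is not Infinispan's actual Map/Reduce API, and the Person/Dog classes and field names are made up for illustration; the point is only that with mixed types in one cache, the scan has to touch and type-check every entry, including the unrelated Dog ones.]

```java
import java.util.*;

// Plain-Java illustration (NOT Infinispan's MapReduceTask API): scanning a
// mixed-type "cache" for Persons over 18. Person and Dog are hypothetical.
public class AdultsJob {
    static class Person {
        final String name; final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }
    static class Dog {
        final String name;
        Dog(String name) { this.name = name; }
    }

    // "Map" phase: every entry is visited, even entries of unrelated types.
    static List<String> adults(Map<String, Object> cache) {
        List<String> out = new ArrayList<>();
        for (Object v : cache.values()) {
            if (v instanceof Person) {          // wasted type check on each Dog entry
                Person p = (Person) v;
                if (p.age > 18) out.add(p.name); // the actual predicate
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("p1", new Person("alice", 30));
        cache.put("p2", new Person("bob", 10));
        cache.put("d1", new Dog("rex"));        // unrelated type, still scanned
        System.out.println(adults(cache));      // prints [alice]
    }
}
```

With one cache per entity, `adults` would take a `Map<String, Person>` and the `instanceof` filtering (and the compile-time dependency on Dog) disappears.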
And here's me speculating: these data types have logical relations between them, so people will ask for querying. In order to run queries on multiple data types, you can either merge them together (your suggestion) or support some sort of new cross-cache indexing/querying/API. X-cache querying is more flexible and less restraining than merging data, but from what I understand from you it has certain implementation challenges. There's no pressure to take a decision now around supporting queries spanning multiple caches - just something to keep an eye on when dealing with use cases/users. ATM merging data is the only solution available, let's wait and see if people ask for more.

> But please read the wiki page first before commenting. I did spend a lot of time on it
> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity

I do read your comments and I really appreciate your feedback. We come from slightly different worlds and look at things from different angles, but discussions like this raise many good points.
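[The per-cache configuration argument above can be sketched in an Infinispan 6-era XML configuration. Element and attribute names are abridged from that schema and the cache names and values are made up; treat this as a sketch, not a complete or validated configuration.]

```xml
<infinispan xmlns="urn:infinispan:config:6.0">
   <!-- News: non-transactional, entries expire after a day -->
   <namedCache name="news">
      <expiration lifespan="86400000"/>
   </namedCache>
   <!-- Accounts: pessimistic transactions; cross-site backup and security
        would also be configured per cache here (omitted for brevity) -->
   <namedCache name="accounts">
      <transaction transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
   </namedCache>
</infinispan>
```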
>
> Emmanuel
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

From mmarkus at redhat.com  Fri Feb 28 16:17:43 2014
From: mmarkus at redhat.com (Mircea Markus)
Date: Fri, 28 Feb 2014 21:17:43 +0000
Subject: [infinispan-dev] Design change in Infinispan Query
In-Reply-To: 
References: <3A0004F8-2CD2-4809-A190-7CD27282870B@hibernate.org> <45FA257A-38F6-4AC4-9E0E-F785D1904FD8@redhat.com> <20140205163032.GB93108@hibernate.org> <20140217175115.GC639@hibernate.org> <8C0E6B99-E7AC-4D12-BA1D-E7E948750733@hibernate.org> <1256491E-44CB-4B7D-8494-CB0061954571@redhat.com> <43ABEAC1-A2EE-4617-9F01-52A1B1A6B729@hibernate.org> <5110442F-6164-48B4-A7FE-73B22B71FB94@redhat.com> <10C5AA9B-FB14-4708-A60A-2BFC03B900DA@hibernate.org>
Message-ID: 

Added a correction:

On Feb 28, 2014, at 9:14 PM, Mircea Markus wrote:
>
>>>>> On 24 févr. 2014, at 17:39, Mircea Markus wrote:
>>>>>
>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard wrote:
>>>>>>
>>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
>>>>>
>>>>> Curious to hear the whole story :-)
>>>>> We cannot mandate all the users to use OGM though, one of the reasons being OGM is not platform independent (hotrod).
>>>>
>>>> Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>>>
>>> People are going to use infinispan with one cache per entity, because it makes sense:
>>> - different config (repl/dist | persistent/non-persistent) for different data types
>>> - have map/reduce tasks running only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
>>> I don't see a reason to forbid this, on the contrary. The way I see it, the relation between (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for the Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>>
>> I do disagree on your assessment.
>> I did write a whole essay on why I think your view is problematic - I was getting tired of repeating myself ;P
>> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity
>
> Thanks for writing this up, it is a good taxonomy of data storage schemes and querying.
>
>> To anecdotally answer your specific example, yes different configs for different entities is an interesting benefit but it has to outweigh the drawbacks.
>
> Using a single cache for all the types is practical at all :-) Just to expand my idea, people prefer using different caches for many reasons:
                                           ^NOT
> - security: the Accounts cache has different security requirements than the News cache
> - data consistency: News is a non-transactional cache, Accounts require pessimistic XA transactions
> - expiry: expire last year's news from the system. Not the same for Accounts
> - availability: I want the Accounts cache to be backed up to another site. I don't want that for the News cache
> - logical data grouping: mixing Accounts with News doesn't make sense. I might want to know which account appeared in the news, though.
>
>> If you have to do a map reduce for tasks as simple as age > 18, I think your system had better be prepared to run gazillions of M/R jobs.
>
> I want to run a simple M/R job in the evening to determine who turns 18 tomorrow, to congratulate them. Once a day, not gazillions of times, and I don't need to index the age field just for that. Also when it comes to Map/Reduce, the drawback of holding all the data in a single cache is two-fold:
> - performance: you iterate over data that is not related to your query.
> - programming model: the Map/Reduce implementation has a dependency on both Dog and Person. If I add Cats to the cache, I'll need to update the M/R code to be aware of that as well. Same if I rename/remove Dog. Not nice.
>
>> I think that Dogs and any domestic animals are fundamentally related to humans - Person in your case. So queries involving both will be required - a cross-cache M/R is not doable today AFAIK and even if it was, it's still M/R with all its drawbacks.
>> To me, the Cache API and Hot Rod are well suited for what I call a self-contained object graph (i.e. where Dog would be an embedded object of Person and not a separate Entity). In that situation, there is a single cache.
>
> I see where you come from, but I don't think requiring people to use a single cache for all the entities is an option. Besides a natural logical separation, different data has different storage requirements: security, access patterns, consistency, durability, availability etc. For most of the non-trivial use cases, using a single cache just won't do.
>
>> One cache per entity does make sense for APIs that do support what I call connected entities. Hibernate OGM specifically.
>
> OGM does a great job covering this, but it is very specific: Java only and OOP - our C/S mode, Hot Rod specifically, is language independent and not OOP. Also I would like to comment on the following statements:
> "I believe a cache API and Hot Rod are well suited to address up to the self-contained object graph use case with a couple of relations maintained manually by the application but that cannot be queried.
For the connected entities use case, only a high level paradigm is suited, like JPA."
>
> I don't think storing object graphs should be under scrutiny here: Infinispan C/S mode (and that's where most of the client focus is BTW) has a schema (protobuf) that does not support object graphs. I also think expecting people to use multiple caches for multiple data types is a solid assumption to start from. And here's me speculating: these data types have logical relations between them, so people will ask for querying. In order to run queries on multiple data types, you can either merge them together (your suggestion) or support some sort of new cross-cache indexing/querying/API. X-cache querying is more flexible and less restraining than merging data, but from what I understand from you it has certain implementation challenges. There's no pressure to take a decision now around supporting queries spanning multiple caches - just something to keep an eye on when dealing with use cases/users. ATM merging data is the only solution available, let's wait and see if people ask for more.
>
>> But please read the wiki page first before commenting. I did spend a lot of time on it
>> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity
>
> I do read your comments and I really appreciate your feedback. We come from slightly different worlds and look at things from different angles, but discussions like this raise many good points.
>
>> Emmanuel
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)