[infinispan-dev] blog on new cache store API

Tue Sep 17 11:50:45 EDT 2013

Right. I'm familiar with the map/reduce process and the proposed improvements.

This part of the blog threw me off:

"as the map/reduce tasks now run in parallel over both the nodes in the cluster and within the same node (multiple threads)"

To me, it implies that there are now multiple map threads per node. Further, I thought that the map / reduce 'working set' was limited to what was in memory. I did not realize that map / reduce would iterate over all of the data both in memory and on disk. That is good to hear, though I'm curious whether it will apply to all cache stores (e.g. LevelDB) and how ISPN map / reduce handles a data set that is greater than the available memory. A lot of in-memory stores face this limitation when backed by on-disk stores. If the data is retrieved one entry at a time, I don't see how multiple threads will help. However, if it is retrieved in bulk, I can see how it might. Not entirely sure.

Shane

----- Original Message -----
From: "Vladimir Blagojevic" <vblagoje at redhat.com>
To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
Cc: "Shane Johnson" <shjohnso at redhat.com>
Sent: Tuesday, September 17, 2013 10:32:39 AM
Subject: Re: [infinispan-dev] blog on new cache store API

Shane,

When a MapReduce command arrives on an Infinispan node, it is executed on
the single thread that carries the incoming message. I have done
preliminary work on multithreaded execution [1] but I have not gotten
around to completing it. The main idea is that the incoming thread
submits a task to an executor, which in turn splits the map/reduce work
across multiple threads and executes it. Once the work is completed, the
incoming thread is given the result and returns the response.
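
To make the idea above concrete, here is a rough sketch of the split
using only plain java.util.concurrent. It is not the code in [1] and it
does not use any Infinispan API: the class name, mapSlice() and
execute() are all made up for illustration, and the "map" step is just a
word count over strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMapSketch {

    // Hypothetical map phase: count the words in one slice of the local values.
    static Map<String, Integer> mapSlice(List<String> slice) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : slice) {
            for (String word : value.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Runs on the "incoming" thread: split the work, hand it to the executor,
    // wait for every slice to finish, then merge the partial results.
    static Map<String, Integer> execute(List<String> localValues, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (localValues.size() + threads - 1) / threads);
            List<Callable<Map<String, Integer>>> tasks = new ArrayList<>();
            for (int start = 0; start < localValues.size(); start += chunk) {
                int end = Math.min(start + chunk, localValues.size());
                List<String> slice = localValues.subList(start, end);
                tasks.add(() -> mapSlice(slice));
            }
            Map<String, Integer> merged = new HashMap<>();
            // invokeAll blocks the calling thread until all tasks have completed.
            for (Future<Map<String, Integer>> partial : executor.invokeAll(tasks)) {
                partial.get().forEach((word, n) -> merged.merge(word, n, Integer::sum));
            }
            return merged;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> values = Arrays.asList("a b", "b c", "c c", "a");
        System.out.println(execute(values, 2));
    }
}
```

A real implementation would presumably reuse a shared executor rather
than create one per command; the per-call pool here only keeps the
sketch self-contained.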

I am not sure how Mircea implemented parallel iteration in the stores,
but it is definitely a different beast, although I agree with him that
parallel reading from the stores definitely helps: the thread mentioned
above will spend much less time waiting on reads from the stores.
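
As a rough illustration of why one map thread can still benefit from
several store threads, here is a sketch in which reader threads load
entries concurrently and a single consumer thread "maps" them as they
arrive. I do not know how the store iteration is really implemented, so
everything here is an assumption: loadFromStore() is a hypothetical
stand-in that sleeps to simulate a slow read, and the queue hand-off is
just one possible design.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class ParallelStoreReadSketch {

    // Sentinel that each reader enqueues once it has no more entries.
    private static final String DONE = "__reader_done__";

    // Hypothetical stand-in for a slow read from an on-disk store.
    static String loadFromStore(int key) throws InterruptedException {
        Thread.sleep(5);
        return "value-" + key;
    }

    // Several reader threads load entries; the calling thread maps them.
    // Returns the number of entries the single map thread processed.
    static int mapAll(int entries, int readers) throws InterruptedException {
        // Bounded queue: readers block when the map thread falls behind,
        // so only a limited number of loaded entries sit in memory at once.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(64);
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        for (int r = 0; r < readers; r++) {
            final int first = r;
            pool.submit(() -> {
                try {
                    // Reader r handles keys r, r + readers, r + 2 * readers, ...
                    for (int key = first; key < entries; key += readers) {
                        queue.put(loadFromStore(key));
                    }
                } finally {
                    queue.put(DONE);
                }
                return null;
            });
        }
        int finishedReaders = 0;
        int mapped = 0;
        while (finishedReaders < readers) {
            String value = queue.take();
            if (DONE.equals(value)) {
                finishedReaders++;
            } else {
                mapped++; // the actual map function would run here
            }
        }
        pool.shutdown();
        return mapped;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(mapAll(20, 4));
    }
}
```

With one reader the map thread waits out every simulated read in turn;
with several readers those waits overlap, which is the effect described
above.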

Hope it all makes more sense now!

Regards,
Vladimir

[1] https://github.com/vblagoje/infinispan/tree/t_2284

On 13-09-16 4:59 PM, Shane Johnson wrote:
> But there are now multiple map threads per node? Or, is there one map thread and multiple cache store threads? I'm not sure how a single map thread could benefit from multiple cache store threads.
>
> Shane
>
> ----- Original Message -----
> From: "Mircea Markus" <mmarkus at redhat.com>
> To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
> Sent: Monday, September 16, 2013 3:31:01 PM
> Subject: Re: [infinispan-dev] blog on new cache store API
>
>
> On Sep 16, 2013, at 7:11 PM, Shane Johnson <shjohnso at redhat.com> wrote:
>
>> "parallel iteration: it is now possible to iterate over entries in the store with multiple threads in parallel. Map/Reduce tasks immediately benefit from this, as the map/reduce tasks now run in parallel over both the nodes in the cluster and within the same node (multiple threads)"
>>
>> Does this apply to entries in the cache as well (ISPN-2284)?
> no, only the entries in the store are iterated in parallel.
>
> Cheers,