Right. I'm familiar with the map/reduce process and the proposed improvements.
This part of the blog threw me off:
"as the map/reduce tasks now run in parallel over both the nodes in the cluster and
within the same node (multiple threads)"
To me, it implies that there are now multiple map threads per node. I also thought
that the map/reduce 'working set' was limited to what was in memory; I did not
realize that map/reduce would iterate over all of the data, both in memory and on disk.
That is good to hear, though I'm curious whether it will apply to all cache stores (e.g.
LevelDB) and how ISPN map/reduce handles a data set that is larger than the available
memory. A lot of in-memory stores face this limitation when backed by on-disk stores. If the
data is retrieved one entry at a time, I don't see how multiple threads will help.
However, if it is retrieved in bulk, I can see how it might. Not entirely sure.
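To make the one-at-a-time versus bulk distinction concrete, here is a minimal Java sketch, assuming a hypothetical `SegmentedStore` that groups entries into segments which can each be read in a single bulk call. It is not Infinispan's cache store API, nor necessarily how parallel iteration was actually implemented; it only shows why bulk reads give each thread independent work: every worker reads and visits a whole segment, so the threads never share one entry-by-entry cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical stand-in for an on-disk store (not an Infinispan class):
// entries are grouped into segments so a whole segment can be read in one call.
class SegmentedStore {
    private final List<Map<String, String>> segments = new ArrayList<>();

    SegmentedStore(int numSegments) {
        for (int i = 0; i < numSegments; i++) {
            segments.add(new TreeMap<>());
        }
    }

    void put(String key, String value) {
        // Route each key to a segment by hash, as a partitioned store might.
        segments.get(Math.floorMod(key.hashCode(), segments.size())).put(key, value);
    }

    // Bulk read: a snapshot of one whole segment, instead of one entry per call.
    Map<String, String> readSegment(int segment) {
        return new TreeMap<>(segments.get(segment));
    }

    // Visits every entry using up to `threads` workers, one segment per task.
    // The visitor is called concurrently, so it must be thread-safe.
    void processInParallel(int threads, BiConsumer<String, String> visitor) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int s = 0; s < segments.size(); s++) {
                final int segment = s;
                pending.add(pool.submit(() -> readSegment(segment).forEach(visitor)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for each segment and surface any failure
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the visitor runs on several threads at once, it has to be thread-safe, for example by writing into a `ConcurrentHashMap`.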
Shane
----- Original Message -----
From: "Vladimir Blagojevic" <vblagoje(a)redhat.com>
To: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org>
Cc: "Shane Johnson" <shjohnso(a)redhat.com>
Sent: Tuesday, September 17, 2013 10:32:39 AM
Subject: Re: [infinispan-dev] blog on new cache store API
Shane,
When a MapReduce command arrives on an Infinispan node, it is executed on the
single thread that carries the incoming message. I have done preliminary
work on multithreaded execution [1] but have not gotten around to
completing it. The main idea is that the incoming thread submits a task to an
executor, which in turn splits the map/reduce work across multiple threads and
executes it. Once the work is completed, the incoming thread is given the result
and returns the response.
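A rough Java sketch of that idea, under simplifying assumptions: the node's local values are a plain `List<String>`, the job is a word count, and the incoming thread itself slices the data and submits each slice to an `ExecutorService` (rather than submitting one coordinating task). `LocalMapReduce` and `mapSlice` are hypothetical names; the actual work in [1] may be structured differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: the thread that received the MapReduce command splits the
// node-local data into slices, maps them on a pool, then reduces the partials.
class LocalMapReduce {
    private final ExecutorService pool;

    LocalMapReduce(ExecutorService pool) {
        this.pool = pool;
    }

    // Word count over the node's local values, using up to `slices` parallel map tasks.
    Map<String, Integer> execute(List<String> localValues, int slices) {
        int size = localValues.size();
        int parts = Math.max(1, slices);
        int chunk = Math.max(1, (size + parts - 1) / parts);
        List<Future<Map<String, Integer>>> partials = new ArrayList<>();
        for (int start = 0; start < size; start += chunk) {
            List<String> slice = localValues.subList(start, Math.min(start + chunk, size));
            partials.add(pool.submit(() -> mapSlice(slice)));
        }
        // The incoming thread waits for every map task, then reduces the partial counts.
        Map<String, Integer> result = new HashMap<>();
        try {
            for (Future<Map<String, Integer>> f : partials) {
                f.get().forEach((word, count) -> result.merge(word, count, Integer::sum));
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    // Map phase for one slice: emit (word, 1) per word and combine within the slice.
    private static Map<String, Integer> mapSlice(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            for (String word : value.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Each slice is mapped into a partial count on a pool thread; the incoming thread blocks on the futures, merges the partials (the reduce step) and returns the combined result as its response.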
I am not sure how Mircea implemented parallel iteration in stores, but it
is definitely a different beast. That said, I agree with him that parallel
reading from stores helps: the incoming thread mentioned above will
spend much less time waiting on store reads.
Hope it all makes more sense now!
Regards,
Vladimir
[1] https://github.com/vblagoje/infinispan/tree/t_2284
On 13-09-16 4:59 PM, Shane Johnson wrote:
But are there now multiple map threads per node? Or is there one map
thread and multiple cache store threads? I'm not sure how a single map thread could
benefit from multiple cache store threads.
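One way a single map thread could benefit is a producer/consumer arrangement: several reader threads load entries from the store and feed one mapper through a bounded queue, so the mapper rarely sits idle waiting on I/O. This is a hypothetical Java sketch (`PrefetchingMapper` and its partition layout are illustrative assumptions), not a description of what Infinispan does; each inner list stands in for the entries one reader thread would load from the store.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

// Hypothetical sketch: several reader threads load entries and feed ONE mapper
// (the calling thread) through a bounded queue.
class PrefetchingMapper {
    // Marker a reader enqueues when its partition is exhausted (compared by identity).
    private static final Map.Entry<String, String> DONE =
            new AbstractMap.SimpleImmutableEntry<>("", "");

    // Each inner list stands in for the entries one reader thread would load from the store.
    static void run(List<List<Map.Entry<String, String>>> partitions,
                    BiConsumer<String, String> mapper) {
        BlockingQueue<Map.Entry<String, String>> queue = new ArrayBlockingQueue<>(1024);
        ExecutorService readers = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            for (List<Map.Entry<String, String>> partition : partitions) {
                readers.execute(() -> {
                    try {
                        for (Map.Entry<String, String> item : partition) {
                            queue.put(item); // blocks while the mapper is behind
                        }
                        queue.put(DONE);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Single map thread: consume until every reader has signalled DONE.
            int finished = 0;
            while (finished < partitions.size()) {
                Map.Entry<String, String> entry = queue.take();
                if (entry == DONE) {
                    finished++;
                } else {
                    mapper.accept(entry.getKey(), entry.getValue());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            readers.shutdownNow();
        }
    }
}
```

Because the mapper runs only on the calling thread, it can write to an unsynchronized `HashMap`. If the readers loaded entries lazily from disk, the bounded queue would also cap how many loaded-but-unmapped entries sit in memory at once, which is one possible answer to the larger-than-memory question.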
Shane
----- Original Message -----
From: "Mircea Markus" <mmarkus(a)redhat.com>
To: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org>
Sent: Monday, September 16, 2013 3:31:01 PM
Subject: Re: [infinispan-dev] blog on new cache store API
On Sep 16, 2013, at 7:11 PM, Shane Johnson <shjohnso(a)redhat.com> wrote:
> "parallel iteration: it is now possible to iterate over entries in the store
> with multiple threads in parallel. Map/Reduce tasks immediately benefit from this, as the
> map/reduce tasks now run in parallel over both the nodes in the cluster and within the
> same node (multiple threads)"
>
> Does this apply to entries in the cache as well (ISPN-2284)?
No, only the entries in the store are iterated in parallel.
Cheers,