Right.
Regarding the cache store, while I'm not fully aware of its implementation, one of the
primary reasons to use LevelDB is that the entire key set no longer has to be in memory.
That's why I asked. Because if it was designed like that, and I have no idea if it is,
m/r would not have the entire key set beforehand.
Shane
----- Original Message -----
From: "Vladimir Blagojevic" <vblagoje(a)redhat.com>
To: "Shane Johnson" <shjohnso(a)redhat.com>
Cc: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org>
Sent: Tuesday, September 17, 2013 10:59:18 AM
Subject: Re: [infinispan-dev] blog on new cache store API
On 13-09-17 11:50 AM, Shane Johnson wrote:
Right. I'm familiar with the map/reduce process and the proposed
improvements.
This part of the blog threw me off:
"as the map/reduce tasks now run in parallel over both the nodes in the cluster and
within the same node (multiple threads)"
To me, it implies that there are now multiple map threads per node. Further, I thought
that the map / reduce 'working set' was limited to what was in memory. I did not
realize that map / reduce would iterate over all of the data both in memory and on disk.
That is good to hear, though I'm curious if it will apply to all cache stores (e.g.
LevelDB) and how ISPN map / reduce handles a data set that is greater than the available
memory. A lot of in-memory stores face this limitation when backed by on-disk stores. If the
data is retrieved one entry at a time, I don't see how multiple threads will help.
However, if it is retrieved in bulk I can see how it might. Not entirely sure.
The implementation in MapReduceManagerImpl.java is cache store agnostic.
The algorithm loads all keys (pinned to that owner node) and iterates over
all values one value at a time.
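To make that access pattern concrete, here is a minimal sketch of "load all keys, then fetch one value at a time". It is not the code in MapReduceManagerImpl.java; the Store interface, mapAll and inMemoryStore are hypothetical names made up for illustration, and the in-memory map only stands in for a real cache store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class KeyThenValueIteration {

    /** Hypothetical stand-in for a cache store; NOT Infinispan's API. */
    interface Store<K, V> {
        List<K> loadAllKeys(); // the whole key set is materialized in memory
        V load(K key);         // one value is fetched per call
    }

    /**
     * Loads every key up front, then fetches and maps one value at a time.
     * Returns the number of entries passed to the mapper.
     */
    static <K, V> int mapAll(Store<K, V> store, BiConsumer<K, V> mapper) {
        List<K> keys = store.loadAllKeys();
        int visited = 0;
        for (K key : keys) {
            V value = store.load(key);
            if (value != null) { // entry may have been removed since the keys were listed
                mapper.accept(key, value);
                visited++;
            }
        }
        return visited;
    }

    /** Assumed helper: wraps a plain Map so the sketch can run without a real store. */
    static <K, V> Store<K, V> inMemoryStore(Map<K, V> data) {
        return new Store<K, V>() {
            @Override
            public List<K> loadAllKeys() {
                return new ArrayList<>(data.keySet());
            }

            @Override
            public V load(K key) {
                return data.get(key);
            }
        };
    }
}
```

In this shape only one value is held at a time, but the full key list still has to fit in memory, which is the concern Shane raises about stores such as LevelDB.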
Now that we are breaking this down into details, I am not sure how
multiple threads in the cache store would help either. Mircea?
Regards,
Vladimir