[infinispan-dev] blog on new cache store API

Tue Sep 17 11:59:18 EDT 2013

On 13-09-17 11:50 AM, Shane Johnson wrote:
> Right. I'm familiar with the map/reduce process and the proposed improvements.
>
> This part of the blog threw me off:
>
> "as the map/reduce tasks now run in parallel over both the nodes in the cluster and within the same node (multiple threads)"
>
> To me, it implies that there are now multiple map threads per node. Further, I thought that the map / reduce 'working set' was limited to what was in memory. I did not realize that map / reduce would iterate over all of the data both in memory and on disk. That is good to hear, though I'm curious if it will apply to all cache stores (e.g. LevelDB) and how ISPN map / reduce handles a data set that is greater than the available memory. A lot of in-memory stores face this limitation when backed by on-disk stores. If the data is retrieved one entry at a time, I don't see how multiple threads will help. However, if it is retrieved in bulk, I can see how it might. Not entirely sure.
>
The implementation in MapReduceManagerImpl.java is cache store agnostic.
The algorithm loads all keys (pinned to that owner node) and iterates over
all values one value at a time.
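
To make that loop concrete, here is a rough sketch of the shape I am
describing. It is not the actual MapReduceManagerImpl code: SimpleStore,
storeOf and mapOverStore are hypothetical names made up for this example,
and the store is just a HashMap standing in for whatever cache store is
configured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

final class OneAtATimeMapSketch {

    /** Hypothetical stand-in for a cache store; NOT an Infinispan interface. */
    interface SimpleStore<K, V> {
        List<K> loadAllKeys();   // all keys owned by this node
        V load(K key);           // fetch a single value by key
    }

    /** Wraps a plain map so the sketch can run without a real store. */
    static <K, V> SimpleStore<K, V> storeOf(Map<K, V> backing) {
        return new SimpleStore<K, V>() {
            @Override
            public List<K> loadAllKeys() {
                return new ArrayList<>(backing.keySet());
            }

            @Override
            public V load(K key) {
                return backing.get(key);
            }
        };
    }

    /**
     * Loads every key up front, then fetches and maps one value at a time
     * on the calling thread. Returns the number of entries visited.
     */
    static <K, V> int mapOverStore(SimpleStore<K, V> store, BiConsumer<K, V> mapper) {
        int visited = 0;
        for (K key : store.loadAllKeys()) {
            V value = store.load(key);   // one store read per entry
            if (value != null) {         // skip keys whose value is missing
                mapper.accept(key, value);
                visited++;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("k1", "hello world");
        data.put("k2", "hello again");

        int visited = mapOverStore(storeOf(data),
                (k, v) -> System.out.println(k + " -> " + v));
        System.out.println("entries visited: " + visited);
    }
}
```

In this sketch only one value is held at a time, but the full key list is
in memory, and every value costs a separate read against the store.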

Now that we are breaking this down into details, I am not sure how
multiple threads in the cache store would help either. Mircea?

Regards,
Vladimir