On 05/09/2012 04:58 PM, Sanne Grinovero wrote:

2) Mapper and Reducer should take advantage of multiple cores
even on the same node, so not just divide & conquer across multiple
nodes but also locally. Was this done already?

Mapping is done across the entire cluster. Reduction is done only on one
node. We want to change that soon:
https://community.jboss.org/wiki/Infinispan60-MapReduceEnhancements

I meant I'd need a way to process all cache entries with a single Map
instance but taking advantage of all CPU cores of a system: The
Mappers should be cloned and passed to different Runnables, each one
being fed a partition of the data (and ideally the first one to finish
should steal some work from the others).
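To make the idea concrete, here is a minimal sketch of what I mean, assuming
Java 8 or later and the JDK's ForkJoinPool (which does work stealing between
its worker threads). WordMapper, MapTask and LocalParallelMap are hypothetical
names for illustration only, not Infinispan's Mapper API, and a Supplier
stands in for cloning the mapper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Supplier;

class LocalParallelMap {

    /** Hypothetical mapper interface; NOT Infinispan's Mapper API. */
    public interface WordMapper {
        void map(String key, String value, Map<String, Integer> collector);
    }

    /** Splits the entries into partitions; each leaf partition gets its own mapper. */
    static final class MapTask extends RecursiveTask<Map<String, Integer>> {
        private static final int THRESHOLD = 2; // tiny on purpose, for the demo
        private final List<Map.Entry<String, String>> entries;
        private final int from;
        private final int to;
        private final Supplier<WordMapper> mapperFactory;

        MapTask(List<Map.Entry<String, String>> entries, int from, int to,
                Supplier<WordMapper> mapperFactory) {
            this.entries = entries;
            this.from = from;
            this.to = to;
            this.mapperFactory = mapperFactory;
        }

        @Override
        protected Map<String, Integer> compute() {
            if (to - from <= THRESHOLD) {
                // Leaf: a fresh mapper (stand-in for a clone) handles this partition.
                WordMapper mapper = mapperFactory.get();
                Map<String, Integer> out = new HashMap<>();
                for (int i = from; i < to; i++) {
                    Map.Entry<String, String> e = entries.get(i);
                    mapper.map(e.getKey(), e.getValue(), out);
                }
                return out;
            }
            int mid = (from + to) >>> 1;
            MapTask left = new MapTask(entries, from, mid, mapperFactory);
            MapTask right = new MapTask(entries, mid, to, mapperFactory);
            left.fork(); // queued; an idle worker thread may steal it
            Map<String, Integer> result = right.compute();
            // Combine the two partial outputs locally.
            left.join().forEach((k, v) -> result.merge(k, v, Integer::sum));
            return result;
        }
    }

    /** Runs the mapper over all entries using every available core. */
    static Map<String, Integer> mapAll(Map<String, String> cache,
                                       Supplier<WordMapper> mapperFactory) {
        List<Map.Entry<String, String>> entries = new ArrayList<>(cache.entrySet());
        ForkJoinPool pool = new ForkJoinPool(); // defaults to one worker per core
        try {
            return pool.invoke(new MapTask(entries, 0, entries.size(), mapperFactory));
        } finally {
            pool.shutdown();
        }
    }

    /** Example use: count words across all values. */
    static Map<String, Integer> countWords(Map<String, String> cache) {
        return mapAll(cache, () -> (key, value, out) -> {
            for (String w : value.split("\\s+")) {
                if (!w.isEmpty()) {
                    out.merge(w, 1, Integer::sum);
                }
            }
        });
    }
}
```

The partial maps are merged on the way back up, so this also does a small
local combine step; a real implementation would iterate the node's data
container directly rather than copying the entries into a list first.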

I need to find my Master's thesis work (look at [1] for an overview). I implemented a compiler for a higher-order parallel programming language which went a bit further than map + reduce. You basically build programs out of implicitly parallel functions, tell it the number of nodes, and the compiler maps the functions onto the nodes, implementing "sub-reduces" in the appropriate places. The theory is probably more interesting than the implementation itself (C++ with MPI), but it's probably still worth a read and might inspire some new directions where we can push Infinispan's distexec module.
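As a rough illustration of the "sub-reduce" idea only (this is not code from the thesis and not the distexec API), here is a minimal Java sketch in which each node first reduces its own values and a final step combines the per-node partials. The class SubReduce and its method are hypothetical names, the nodes are simulated as plain lists, and the operator is assumed to be associative with the given identity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

class SubReduce {

    /**
     * Two-level reduction: one partial result per (simulated) node,
     * then a final reduction over the partials.
     * Assumes op is associative and identity is its neutral element.
     */
    static <T> T reduce(List<List<T>> nodes, T identity, BinaryOperator<T> op) {
        List<T> partials = new ArrayList<>();
        for (List<T> local : nodes) {
            // Sub-reduce: only values held by this node are touched here.
            T acc = identity;
            for (T v : local) {
                acc = op.apply(acc, v);
            }
            partials.add(acc);
        }
        // Final reduce: only one value per node has to travel.
        T result = identity;
        for (T p : partials) {
            result = op.apply(result, p);
        }
        return result;
    }
}
```

With an associative operator this gives the same answer as reducing everything on one node, but only one partial value per node needs to be shipped to wherever the final step runs.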

Tristan

[1] http://www.dcs.ed.ac.uk/home/mic/dagstuhl/roopa/dagstuhl-talk.ps.gz