On 05/09/2012 04:58 PM, Sanne Grinovero wrote:
2) Mapper and Reducer should take advantage of multiple cores even on
the same node, so not just divide & conquer across multiple nodes but
also locally. Was this done already?
> Mapping is done across the entire cluster. Reduction is done only on one
> node. We want to change that soon:
>
https://community.jboss.org/wiki/Infinispan60-MapReduceEnhancements
I meant I'd need a way to process all cache entries with a single Mapper
instance while taking advantage of all CPU cores of a system. The
Mappers should be cloned and passed to different Runnables, each one
being fed a partition of the data (and ideally the first one to finish
should steal some work from the others).
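
A minimal sketch of the idea above, using only the JDK's ForkJoinPool
(whose worker threads steal queued tasks from each other). This is not
Infinispan code: WordMapper, MapTask, countWords and the THRESHOLD value
are hypothetical names chosen for illustration, and a plain Map stands in
for the local cache entries. Each leaf task gets its own mapper copy from
a factory instead of sharing one instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Supplier;

class LocalParallelMapSketch {

    /** Hypothetical stand-in for a mapper; NOT the Infinispan Mapper interface. */
    interface WordMapper {
        void map(String key, String value, Map<String, Integer> out);
    }

    /** Splits the entry list in halves until a partition is small enough to map directly. */
    static final class MapTask extends RecursiveTask<Map<String, Integer>> {
        private static final int THRESHOLD = 2; // assumed partition size, tune for real data
        private final List<Map.Entry<String, String>> entries;
        private final int from;
        private final int to;
        private final Supplier<WordMapper> mapperFactory;

        MapTask(List<Map.Entry<String, String>> entries, int from, int to,
                Supplier<WordMapper> mapperFactory) {
            this.entries = entries;
            this.from = from;
            this.to = to;
            this.mapperFactory = mapperFactory;
        }

        @Override
        protected Map<String, Integer> compute() {
            if (to - from <= THRESHOLD) {
                // Leaf: a fresh mapper copy processes this partition only.
                WordMapper mapper = mapperFactory.get();
                Map<String, Integer> out = new HashMap<>();
                for (int i = from; i < to; i++) {
                    Map.Entry<String, String> e = entries.get(i);
                    mapper.map(e.getKey(), e.getValue(), out);
                }
                return out;
            }
            int mid = (from + to) >>> 1;
            MapTask left = new MapTask(entries, from, mid, mapperFactory);
            MapTask right = new MapTask(entries, mid, to, mapperFactory);
            left.fork();                       // queued; an idle worker may steal it
            Map<String, Integer> result = right.compute();
            Map<String, Integer> leftResult = left.join();
            // Combine the two partial outputs of this subtree.
            leftResult.forEach((k, v) -> result.merge(k, v, Integer::sum));
            return result;
        }
    }

    /** Counts words over all entries using every available core. */
    static Map<String, Integer> countWords(Map<String, String> cache) {
        List<Map.Entry<String, String>> entries = new ArrayList<>(cache.entrySet());
        Supplier<WordMapper> factory = () -> (key, value, out) -> {
            for (String word : value.split("\\s+")) {
                if (!word.isEmpty()) {
                    out.merge(word, 1, Integer::sum);
                }
            }
        };
        ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
        try {
            return pool.invoke(new MapTask(entries, 0, entries.size(), factory));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("k1", "a b a");
        cache.put("k2", "b c");
        System.out.println(countWords(cache));
    }
}
```

The recursive split is what makes stealing possible: a thread that
finishes its half early picks up forked subtasks still queued by other
threads, so no explicit rebalancing code is needed.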
I need to find my Master's thesis work (see [1] for an overview). I
implemented a compiler for a higher-order parallel programming language
which went a bit further than map + reduce. You basically build programs
out of implicitly parallel functions, tell it the number of nodes, and
the compiler maps the functions onto the nodes, implementing "sub-reduces"
in the appropriate places. The theory is probably more interesting than
the implementation itself (C++ with MPI), but it's probably still worth a
read and might inspire some new directions where we can push
Infinispan's distexec module.
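
To make the "sub-reduce" notion concrete, here is a small sketch that
contrasts reducing everything in one place with folding each node's values
locally first and combining only the partial results. It is an
illustration under assumptions, not the thesis compiler and not the
distexec API: the nodes are simulated as plain lists, the class and method
names are made up, and the reduction is a sum (the approach relies on the
reduce function being associative).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SubReduceSketch {

    /** Single-node reduction: every intermediate value is visited by one reducer. */
    static long reduceOnOneNode(List<List<Integer>> valuesPerNode) {
        long total = 0;
        for (List<Integer> nodeValues : valuesPerNode) {
            for (int v : nodeValues) {
                total += v;
            }
        }
        return total;
    }

    /** Sub-reduces: each simulated node folds its own values; only partials are combined. */
    static long reduceWithSubReduces(List<List<Integer>> valuesPerNode) {
        // One partial result per node, computed independently (here: in parallel threads).
        List<Long> partials = valuesPerNode.parallelStream()
                .map(nodeValues -> nodeValues.stream().mapToLong(Integer::longValue).sum())
                .collect(Collectors.toList());
        // Final reduce sees one value per node instead of every intermediate value.
        return partials.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<List<Integer>> valuesPerNode = Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(3),
                Arrays.asList(4, 5, 6));
        System.out.println(reduceOnOneNode(valuesPerNode));
        System.out.println(reduceWithSubReduces(valuesPerNode));
    }
}
```

Both methods return the same value for an associative reduction; the
second one moves far less data to the final reducer.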
Tristan
[1]
http://www.dcs.ed.ac.uk/home/mic/dagstuhl/roopa/dagstuhl-talk.ps.gz