[infinispan-dev] Map/Reduce or other batch processing on CacheLoader stored entries

Vladimir Blagojevic vblagoje at redhat.com
Wed May 9 10:32:57 EDT 2012


Hey Sanne,

On 12-05-09 6:10 AM, Sanne Grinovero wrote:
> Hi all,
> ideally I would like to build the MassIndexer for Infinispan Query on
> top of Map/Reduce, as it seems to provide everything I need "out
> of the box", but technically I'd need it to support a few more
> features:
>
> 1) Some resource injection in the Mapper / Reducer. I know Vladimir
> has been working on a CDI approach; is there a similar approach
> that would work for me when CDI is not available in the environment?
> Having a reference to the Cache would be fine. I remember this was
> discussed a while back, but I forget whether any work was done on it.
I do not think so; you'd need CDI. However, see whether subclassing 
MapReduceTask might be a solution for you.
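A minimal sketch of the kind of hook such a subclass could provide, assuming a node-side step that runs before mapping starts. Mappers are serialized and shipped to each node, so a Cache reference cannot simply be captured when the mapper is constructed; it has to be injected where the mapper runs. CacheAware, SimpleMapper, LocalMapRunner and CacheSizeMapper are hypothetical names, not Infinispan API, and a plain java.util.Map stands in for the Cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical: implemented by mappers that need a reference to the cache they run against.
interface CacheAware<K, V> {
    void injectCache(Map<K, V> cache);
}

// Hypothetical stand-in for a Mapper: emits zero or more (key, value) pairs per input entry.
interface SimpleMapper<KIn, VIn, KOut, VOut> {
    void map(KIn key, VIn value, BiConsumer<KOut, VOut> collector);
}

// Hypothetical node-side runner: the hook a MapReduceTask subclass would need, injecting the
// local cache into the (deserialized) mapper before the first map() call.
final class LocalMapRunner {
    static <K, V, KOut, VOut> Map<KOut, VOut> run(Map<K, V> localCache,
                                                  SimpleMapper<K, V, KOut, VOut> mapper) {
        if (mapper instanceof CacheAware) {
            @SuppressWarnings("unchecked")
            CacheAware<K, V> aware = (CacheAware<K, V>) mapper;
            aware.injectCache(localCache);
        }
        Map<KOut, VOut> collected = new HashMap<>();
        // For brevity the last value emitted per key wins; a real collector would group values.
        localCache.forEach((k, v) -> mapper.map(k, v, collected::put));
        return collected;
    }
}

// Example mapper that uses the injected cache; here it just emits the local cache size per key.
final class CacheSizeMapper implements SimpleMapper<String, String, String, Integer>,
                                       CacheAware<String, String> {
    // transient: in a Serializable mapper this field would not travel with the task.
    private transient Map<String, String> cache;

    @Override
    public void injectCache(Map<String, String> cache) {
        this.cache = cache;
    }

    @Override
    public void map(String key, String value, BiConsumer<String, Integer> collector) {
        collector.accept(key, cache.size());
    }
}
```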

>
> 2) Mapper and Reducer should take advantage of multiple cores
> even on the same node, so not just divide & conquer across multiple
> nodes but also locally. Has this been done already?
Mapping is done across the entire cluster. Reduction is done only on one 
node. We want to change that soon; see 
https://community.jboss.org/wiki/Infinispan60-MapReduceEnhancements
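To illustrate what node-local parallelism could look like, here is a sketch that splits a node's local entries into chunks, maps each chunk on its own thread, and merges per-key partial results as a local combine step. It uses only java.util.concurrent; ParallelLocalMapReduce and the word-count job are hypothetical and do not reflect how Infinispan's MapReduceTask is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelLocalMapReduce {
    // Word count over the values stored on this node: each chunk of entries is mapped on its
    // own thread, and partial counts are merged per key (a local, combiner-style reduction).
    static Map<String, Integer> wordCount(Map<String, String> localEntries, int threads) {
        List<String> values = new ArrayList<>(localEntries.values());
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (values.size() + threads - 1) / threads);
            List<Future<?>> futures = new ArrayList<>();
            for (int start = 0; start < values.size(); start += chunk) {
                List<String> slice = values.subList(start, Math.min(start + chunk, values.size()));
                futures.add(pool.submit(() -> {
                    for (String text : slice) {
                        for (String word : text.split("\\s+")) {
                            if (!word.isEmpty()) {
                                counts.merge(word, 1, Integer::sum); // atomic per-key update
                            }
                        }
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every chunk and surface mapper failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return counts;
    }
}
```

The per-node result is only a partial aggregate; merging the maps from all nodes would still be the job of the (currently single-node) reduce phase.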

> 3) I need the Map/Reduce to execute also on all entries stored in the
> CacheLoader. I don't believe that's the case today, and even
> if I wanted to use just the Executor, I believe the CacheLoader API
> needs to provide some option to load all keys in a streaming form
> suitable for batch processing: using loadAll will likely run me
> out of memory (OOM).
Hm, you are right. Can you please open a JIRA for this?
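One possible shape for a streaming-friendly store API, sketched under the assumption that keys can be ordered: the caller asks for pages of at most batchSize entries after the last key it saw, so only one page is held in memory at a time instead of the full result of loadAll. PagedStore, InMemoryPagedStore and BatchProcessor are hypothetical; this is not the CacheLoader interface, and a real disk-backed store would implement page() with a cursor or range query.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical store API (not CacheLoader): up to `max` entries whose keys sort strictly after
// `afterKey`, or from the first key when `afterKey` is null.
interface PagedStore<K extends Comparable<K>, V> {
    List<Map.Entry<K, V>> page(K afterKey, int max);
}

// In-memory stand-in for a disk-backed store, used only to exercise the pattern.
final class InMemoryPagedStore<K extends Comparable<K>, V> implements PagedStore<K, V> {
    private final NavigableMap<K, V> data = new TreeMap<>();

    void put(K key, V value) {
        data.put(key, value);
    }

    @Override
    public List<Map.Entry<K, V>> page(K afterKey, int max) {
        NavigableMap<K, V> tail = (afterKey == null) ? data : data.tailMap(afterKey, false);
        List<Map.Entry<K, V>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : tail.entrySet()) {
            if (out.size() >= max) {
                break;
            }
            // Copy the entry so the caller holds no live view into the store.
            out.add(new AbstractMap.SimpleImmutableEntry<K, V>(e.getKey(), e.getValue()));
        }
        return out;
    }
}

final class BatchProcessor {
    // Visits every stored entry, holding at most one page of `batchSize` entries at a time.
    // Returns the number of non-empty batches processed.
    static <K extends Comparable<K>, V> int forEachBatch(PagedStore<K, V> store, int batchSize,
                                                         Consumer<List<Map.Entry<K, V>>> action) {
        int batches = 0;
        K lastKey = null;
        while (true) {
            List<Map.Entry<K, V>> page = store.page(lastKey, batchSize);
            if (page.isEmpty()) {
                return batches;
            }
            action.accept(page);
            batches++;
            lastKey = page.get(page.size() - 1).getKey();
        }
    }
}
```

Resuming from the last key seen, rather than from a numeric offset, means that adding or removing entries behind the cursor while the batch action runs does not cause entries to be skipped or visited twice; the action could apply modifications batch by batch.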
>
>
> Even if we decide to perform such jobs with some mechanism other than
> Map/Reduce, we need to support some form of "process all
> entries". What should we suggest as the best way to iterate over,
> and modify, all entries in the data container?
>
>
True, we need to consider this!

