[infinispan-dev] Map/Reduce or other batch processing on CacheLoader stored entries

Wed May 9 10:58:53 EDT 2012

Hey Vladimir,

On 9 May 2012 15:32, Vladimir Blagojevic <vblagoje at redhat.com> wrote:
> Hey Sanne,
>
>
> On 12-05-09 6:10 AM, Sanne Grinovero wrote:
>>
>> Hi all,
>> ideally I would like to build the MassIndexer for Infinispan Query on
>> top of Map/Reduce, as it seems it should provide everything I need "out
>> of the box", but technically I'd need it to support some more
>> features:
>>
>> 1) Some resource injection in the Mapper / Reducer. I know Vladimir
>> has been working on a CDI approach, is there some similar approach
>> which could work for me when not having CDI in the environment?
>> Having a reference to the Cache would be fine - I remember this was
>> discussed a while back but forgot if some work was done on it.
>
> I do not think so, you'd need CDI. However, see if subclassing MapReduceTask
> might be a solution for you.

Right, that might work. Worst case I'll write back ;)

>> 2) Mapper and Reducer should work taking advantage of multiple cores
>> even on the same node .. so not just divide & conquer across multiple
>> nodes but also locally. Was this done already?
>
> Mapping is done across the entire cluster. Reduction is done only on one
> node. We want to change that soon:
> https://community.jboss.org/wiki/Infinispan60-MapReduceEnhancements

I meant I'd need a way to process all cache entries with a single Map
instance but taking advantage of all CPU cores of a system: The
Mappers should be cloned and passed to different Runnables, each one
being fed a partition of the data (and ideally the first one to finish
should steal some work from the others).
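
A minimal sketch of that idea, assuming Java's ForkJoinPool is used to get
the work stealing. EntryMapper, MapRange and mapAll are hypothetical names
for illustration only, not Infinispan API, and the entries are assumed to be
already available as an in-memory list:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.Supplier;

public class LocalParallelMapSketch {

    /** Hypothetical stand-in for a Mapper; NOT the Infinispan interface. */
    public interface EntryMapper<K, V> {
        void map(K key, V value);
    }

    /** Splits an index range in half until it is small, then maps it with a private mapper copy. */
    public static final class MapRange<K, V> extends RecursiveAction {
        private static final int THRESHOLD = 1_000; // assumed partition size
        private final List<Map.Entry<K, V>> entries;
        private final int from;
        private final int to;
        private final Supplier<EntryMapper<K, V>> mapperFactory;

        public MapRange(List<Map.Entry<K, V>> entries, int from, int to,
                        Supplier<EntryMapper<K, V>> mapperFactory) {
            this.entries = entries;
            this.from = from;
            this.to = to;
            this.mapperFactory = mapperFactory;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                // One mapper copy per partition, so mappers need not be thread-safe.
                EntryMapper<K, V> mapper = mapperFactory.get();
                for (int i = from; i < to; i++) {
                    Map.Entry<K, V> e = entries.get(i);
                    mapper.map(e.getKey(), e.getValue());
                }
                return;
            }
            int mid = (from + to) >>> 1;
            // Idle worker threads steal whichever half is still queued.
            invokeAll(new MapRange<>(entries, from, mid, mapperFactory),
                      new MapRange<>(entries, mid, to, mapperFactory));
        }
    }

    /** Maps every entry using all available cores of the local node. */
    public static <K, V> void mapAll(List<Map.Entry<K, V>> entries,
                                     Supplier<EntryMapper<K, V>> mapperFactory) {
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to the CPU count
        try {
            pool.invoke(new MapRange<>(entries, 0, entries.size(), mapperFactory));
        } finally {
            pool.shutdown();
        }
    }
}
```

The Supplier plays the role of "cloning" the Mapper: each partition gets its
own copy, and any shared result collector still has to be thread-safe.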

>
>
>> 3) I need the Map/Reduce to also execute on all entries stored in the
>> CacheLoader. I don't believe that's the case today.. and even
>> if I wanted to use just the Executor, I believe the CacheLoader API
>> needs to provide some option to load all keys in a stream form
>> appropriate for batch processing: using loadAll will likely get me
>> into an OOM.
>
> Hm, you are right. Can you please open a JIRA for this?

Cool, created ISPN-2037
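
To illustrate the kind of streaming access meant in 3), here is a rough
sketch that consumes keys from an Iterator in bounded batches instead of
materializing everything at once. StreamingStore, keyIterator and
forEachBatch are hypothetical names chosen for this example; they are not
part of the CacheLoader API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class StreamingLoadSketch {

    /** Hypothetical store contract that hands out keys lazily; NOT the CacheLoader API. */
    public interface StreamingStore<K> {
        Iterator<K> keyIterator();
    }

    /**
     * Feeds keys to the processor in batches of at most batchSize, so only one
     * batch is referenced by this loop at any time. Returns the number of batches.
     */
    public static <K> int forEachBatch(Iterator<K> keys, int batchSize,
                                       Consumer<List<K>> batchProcessor) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<K> batch = new ArrayList<>(batchSize);
        while (keys.hasNext()) {
            batch.add(keys.next());
            if (batch.size() == batchSize) {
                batchProcessor.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize); // fresh list; the old one can be collected
            }
        }
        if (!batch.isEmpty()) { // trailing partial batch
            batchProcessor.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

Memory use is then bounded by the batch size rather than by the size of the
store, provided the store's iterator itself is lazy.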

>
>>
>>
>> Even if we decide to perform such jobs without using Map/Reduce but
>> in some other form, we need to support some form of "process on all
>> entries". What should we suggest as the best way to iterate and perform
>> modifications to all entries in the data container?
>>
>>
> True, we need to consider this!

Cheers,
Sanne