Re: [infinispan-dev] Map/Reduce or other batch processing on CacheLoader stored entries

Wednesday, 9 May 2012

Hey Vladimir,

On 9 May 2012 15:32, Vladimir Blagojevic <vblagoje(a)redhat.com> wrote:
...
 Hey Sanne,

 On 12-05-09 6:10 AM, Sanne Grinovero wrote:
>
> Hi all,
> ideally I would like to build the MassIndexer for Infinispan Query on
> top of Map/Reduce as it seems it should provide everything I need "out
> of the box", but technically I'd need it to support some more
> features:
>
> 1) Some resource injection in the Mapper / Reducer. I know Vladimir
> has been working on a CDI approach, is there some similar approach
> which could work for me when not having CDI in the environment?
> Having a reference to the Cache would be fine - I remember this was
> discussed a while back but forgot if some work was done on it.

 I do not think so, you'd need CDI. However, see if subclassing MapReduceTask
 might be a solution for you. 
Right, that might work... worst case I'll write back ;)
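A rough, hypothetical sketch of the subclassing idea in plain Java, without CDI: the task hands its cache reference to any mapper that opts in through a marker interface before mapping starts. `CacheAware`, `SimpleMapper` and `InjectingTask` are made-up names, not Infinispan types, and a plain `Map` stands in for the `Cache`; a real `MapReduceTask` subclass would have to hook into whatever the actual API exposes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical opt-in interface: a mapper implementing it receives the cache.
interface CacheAware<K, V> {
    void setCache(Map<K, V> cache);
}

// Hypothetical stand-in for a Mapper; NOT Infinispan's Mapper interface.
interface SimpleMapper<K, V, R> {
    R map(K key, V value);
}

// Hypothetical task that injects its cache reference into the mapper before
// running it, which is what a MapReduceTask subclass could do instead of CDI.
class InjectingTask<K, V, R> {
    private final Map<K, V> cache;

    InjectingTask(Map<K, V> cache) {
        this.cache = cache;
    }

    Map<K, R> execute(SimpleMapper<K, V, R> mapper) {
        if (mapper instanceof CacheAware) {
            @SuppressWarnings("unchecked")
            CacheAware<K, V> aware = (CacheAware<K, V>) mapper;
            aware.setCache(cache); // "resource injection" by plain setter
        }
        // Map every entry; mapper results must be non-null for ConcurrentHashMap.
        Map<K, R> out = new ConcurrentHashMap<>();
        cache.forEach((k, v) -> out.put(k, mapper.map(k, v)));
        return out;
    }
}
```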

...
> 2) Mapper and Reducer should take advantage of multiple cores
> even on the same node, so not just divide & conquer across multiple
> nodes but also locally. Was this done already?

 Mapping is done across the entire cluster. Reduction is done only on one
 node. We want to change that soon:
 https://community.jboss.org/wiki/Infinispan60-MapReduceEnhancements 
I meant I'd need a way to process all cache entries with a single Map
instance but taking advantage of all CPU cores of a system: The
Mappers should be cloned and passed to different Runnables, each one
being fed a partition of the data (and ideally the first one to finish
should steal some work from the others).
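The local, multi-core part could look roughly like this, assuming the entries are already available on the node as a `Map`: a `ForkJoinPool` splits them into partitions, each partition gets its own mapper instance (a `Supplier` factory is used here in place of `clone()`), and idle worker threads steal queued subtasks from busy ones, which gives the "first one to finish steals work" behaviour. `ParallelMapTask`, `mapAll` and the 1,000-entry threshold are hypothetical, and the mapper is a plain `BiFunction`, not Infinispan's `Mapper`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical fork/join mapping task: splits the entry list in halves until
// partitions are small, then maps each partition with a fresh mapper.
class ParallelMapTask<K, V, R> extends RecursiveTask<List<R>> {
    private static final int THRESHOLD = 1_000; // hypothetical partition size
    private final List<Map.Entry<K, V>> entries;
    private final int from;
    private final int to;
    private final Supplier<BiFunction<K, V, R>> mapperFactory;

    ParallelMapTask(List<Map.Entry<K, V>> entries, int from, int to,
                    Supplier<BiFunction<K, V, R>> mapperFactory) {
        this.entries = entries;
        this.from = from;
        this.to = to;
        this.mapperFactory = mapperFactory;
    }

    @Override
    protected List<R> compute() {
        if (to - from <= THRESHOLD) {
            // One mapper instance per partition, so mappers need not be thread-safe.
            BiFunction<K, V, R> mapper = mapperFactory.get();
            List<R> out = new ArrayList<>(to - from);
            for (int i = from; i < to; i++) {
                Map.Entry<K, V> e = entries.get(i);
                out.add(mapper.apply(e.getKey(), e.getValue()));
            }
            return out;
        }
        int mid = (from + to) >>> 1;
        ParallelMapTask<K, V, R> left = new ParallelMapTask<>(entries, from, mid, mapperFactory);
        ParallelMapTask<K, V, R> right = new ParallelMapTask<>(entries, mid, to, mapperFactory);
        left.fork();                              // queued; an idle worker may steal it
        List<R> rightResult = right.compute();    // work on the other half here
        List<R> result = new ArrayList<>(left.join());
        result.addAll(rightResult);
        return result;
    }

    static <K, V, R> List<R> mapAll(Map<K, V> data, Supplier<BiFunction<K, V, R>> mapperFactory) {
        List<Map.Entry<K, V>> entries = new ArrayList<>(data.entrySet());
        return ForkJoinPool.commonPool().invoke(
                new ParallelMapTask<K, V, R>(entries, 0, entries.size(), mapperFactory));
    }
}
```

`ForkJoinPool.commonPool()` sizes itself from the number of available processors; a dedicated `new ForkJoinPool(n)` would let an indexer cap its own parallelism instead.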

...

> 3) I need the Map/Reduce to also execute on all entries stored in the
> CacheLoader. I don't believe that's the case today, and even
> if I wanted to use just the Executor, I believe the CacheLoader API
> needs to provide some option to load all keys in a stream form
> appropriate for batch processing: using loadAll will likely get me
> into an OOM.

 Hm, you are right. Can you please open a JIRA for this? 
Cool, I created ISPN-2037.
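To illustrate the kind of API that would avoid the OOM, here is a minimal sketch of cursor-style paging, where the caller holds only one page of keys in memory at a time instead of everything `loadAll` returns. `PagedKeySource` and `StreamingKeyIterator` are hypothetical, not part of the CacheLoader interface, and the sketch assumes the store can return keys in a stable order and resume after a given key.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical SPI (NOT the Infinispan CacheLoader API): returns up to
// pageSize keys that follow afterKey in a stable order; null means "from the start".
interface PagedKeySource<K> {
    List<K> nextPage(K afterKey, int pageSize);
}

// Iterates over every key while keeping only the current page in memory.
class StreamingKeyIterator<K> implements Iterator<K> {
    private final PagedKeySource<K> source;
    private final int pageSize;
    private List<K> page;
    private int index;

    StreamingKeyIterator(PagedKeySource<K> source, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        this.source = source;
        this.pageSize = pageSize;
        this.page = source.nextPage(null, pageSize);
    }

    @Override
    public boolean hasNext() {
        if (index < page.size()) return true;
        // A short (or empty) page means the source has no further keys.
        if (page.size() < pageSize) return false;
        K last = page.get(page.size() - 1);
        page = source.nextPage(last, pageSize); // previous page becomes garbage
        index = 0;
        return !page.isEmpty();
    }

    @Override
    public K next() {
        if (!hasNext()) throw new NoSuchElementException();
        return page.get(index++);
    }
}
```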

...

>
>
> Even if we decide to perform such jobs in some form other than
> Map/Reduce, we need to support some form of "process on all
> entries", what should we suggest as best way to iterate and perform
> modifications to all entries in the data container?
>
>
 True, we need to consider this! 
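One possible shape for "iterate and modify everything", sketched against a `ConcurrentMap` standing in for the data container (an assumption; the real data container may behave differently): walk the keys and apply the change with a conditional `replace(key, old, new)`, retrying if a concurrent writer got there first and skipping entries removed in the meantime. `BulkUpdate.updateAll` is a hypothetical helper and ignores transactions, eviction, and entries that exist only in the CacheLoader.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

class BulkUpdate {
    // Applies fn to every entry; returns how many entries were updated.
    // fn must not return null (ConcurrentHashMap rejects null values).
    static <K, V> int updateAll(ConcurrentMap<K, V> map, UnaryOperator<V> fn) {
        int updated = 0;
        // ConcurrentHashMap's key set is weakly consistent: no
        // ConcurrentModificationException, but keys added mid-walk may be missed.
        for (K key : map.keySet()) {
            while (true) {
                V current = map.get(key);
                if (current == null) break; // removed concurrently, skip it
                // Succeeds only if nobody changed the value since we read it.
                if (map.replace(key, current, fn.apply(current))) {
                    updated++;
                    break;
                }
            }
        }
        return updated;
    }
}
```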
Cheers,
Sanne
