Hey Vladimir,
On 9 May 2012 15:32, Vladimir Blagojevic <vblagoje(a)redhat.com> wrote:
Hey Sanne,
On 12-05-09 6:10 AM, Sanne Grinovero wrote:
>
> Hi all,
> ideally I would like to build the MassIndexer for Infinispan Query on
> top of Map/Reduce as it seems it should provide everything I need "out
> of the box", but technically I'd need it to support some more
> features:
>
> 1) Some resource injection in the Mapper / Reducer. I know Vladimir
> has been working on a CDI approach, is there some similar approach
> which could work for me when not having CDI in the environment?
> Having a reference to the Cache would be fine - I remember this was
> discussed a while back but forgot if some work was done on it.
I do not think so, you'd need CDI. However, see if subclassing MapReduceTask
might be a solution for you.
Right that might work.. worst case I'll write back ;)
> 2) Mapper and Reducer should work taking advantage of multiple cores
> even on the same node .. so not just divide & conquer across multiple
> nodes but also locally. Was this done already?
Mapping is done across the entire cluster. Reduction is done only on one
node. We want to change that soon:
https://community.jboss.org/wiki/Infinispan60-MapReduceEnhancements
I meant I'd need a way to process all cache entries with a single Map
instance but taking advantage of all CPU cores of a system: The
Mappers should be cloned and passed to different Runnables, each one
being fed a partition of the data (and ideally the first one to finish
should steal some work from the others).
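
The clone-and-partition idea above could be sketched roughly as follows,
using the JDK's ForkJoinPool for the work stealing. This is only an
illustration under assumptions: EntryMapper and LocalParallelMap are
hypothetical names, not the Infinispan Mapper API, and "cloning" is
modelled as a factory that hands a fresh mapper to each partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.Supplier;

// Hypothetical stand-in for a mapper; NOT the Infinispan Mapper interface.
interface EntryMapper<K, V> {
    void map(K key, V value);
}

final class LocalParallelMap {

    // Splits the entry list in halves until a partition is small enough.
    // Idle ForkJoinPool workers steal the halves that are still pending.
    private static final class MapTask<K, V> extends RecursiveAction {
        private static final int THRESHOLD = 1_000; // arbitrary partition size

        private final List<Map.Entry<K, V>> entries;
        private final Supplier<EntryMapper<K, V>> mapperFactory;

        MapTask(List<Map.Entry<K, V>> entries, Supplier<EntryMapper<K, V>> mapperFactory) {
            this.entries = entries;
            this.mapperFactory = mapperFactory;
        }

        @Override
        protected void compute() {
            if (entries.size() <= THRESHOLD) {
                // One mapper "clone" per partition, so mappers need no locking.
                EntryMapper<K, V> mapper = mapperFactory.get();
                for (Map.Entry<K, V> e : entries) {
                    mapper.map(e.getKey(), e.getValue());
                }
                return;
            }
            int mid = entries.size() / 2;
            invokeAll(new MapTask<K, V>(entries.subList(0, mid), mapperFactory),
                      new MapTask<K, V>(entries.subList(mid, entries.size()), mapperFactory));
        }
    }

    // Runs a mapper over every entry of 'data' using all local CPU cores.
    static <K, V> void mapAll(Map<K, V> data, Supplier<EntryMapper<K, V>> mapperFactory) {
        // Copies only the entry references, not the keys and values themselves.
        List<Map.Entry<K, V>> snapshot = new ArrayList<>(data.entrySet());
        ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
        try {
            pool.invoke(new MapTask<K, V>(snapshot, mapperFactory));
        } finally {
            pool.shutdown();
        }
    }
}
```

Anything the mappers write to in common (an index writer, a collector)
would still have to be thread-safe; the sketch only covers how the
entries get spread over the cores.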
> 3) I need the Map/Reduce to also execute on all entries stored in the
> CacheLoader. I don't believe that's the case today.. and even
> if I wanted to use just the Executor, I believe the CacheLoader API
> needs to provide some option to load all keys in a stream form
> appropriate for batch processing: using loadAll will likely get me
> into an OOM.
Hm, you are right. Can you please open a JIRA for this?
Cool, created ISPN-2037
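
To illustrate the "stream form" point: a minimal sketch of batch
processing over a lazily supplied key iterator, so that only one batch
is held in memory at a time. It assumes the store could expose its keys
as a plain java.util.Iterator, which is an assumption made for this
example and not something the CacheLoader API offers here; BatchedKeys
and forEachBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class BatchedKeys {

    // Pulls keys one at a time from 'keys' and hands them to 'batchHandler'
    // in lists of at most 'batchSize' elements. Returns the number of batches.
    static <K> int forEachBatch(Iterator<K> keys, int batchSize, Consumer<List<K>> batchHandler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<K> batch = new ArrayList<>(batchSize);
        while (keys.hasNext()) {
            batch.add(keys.next());
            if (batch.size() == batchSize) {
                batchHandler.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list; the handler may keep the old one
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            batchHandler.accept(batch); // trailing partial batch
            batches++;
        }
        return batches;
    }
}
```

Unlike a loadAll-style call, memory use here is bounded by the batch
size rather than by the size of the store.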
>
>
> Even if we decide to perform such jobs in some other form, without
> using Map/Reduce, we still need to support some form of "process on
> all entries". What should we suggest as the best way to iterate over
> and modify all entries in the data container?
>
>
True, we need to consider this!
Cheers,
Sanne