[infinispan-dev] MapReduce enhancement

Brent Douglas brent.n.douglas at gmail.com
Mon Jan 2 17:48:27 EST 2012


Hi Ondra & Vladimir,

I am also very keen to see Reducer.reduce allowed to return a different
type than it accepts, if that is possible. I was previously looking at using
Infinispan's MapReduce to replace Hazelcast for running reports in a Seam 2
app, but was put off by this and by another issue that I think has since
been resolved.

Sincerely,

Brent Douglas


On Tue, Jan 3, 2012 at 1:49 AM, Vladimir Blagojevic <vblagoje at redhat.com> wrote:

> Hi Ondra,
>
> On 12-01-02 9:40 AM, Ondra Nevelik wrote:
> > Hi all,
> > I was supposed to write an arbitrary app with Infinispan, so I wanted to
> > rewrite one of my programs that is implemented using Hadoop MapReduce and
> > compare the performance of the two.
> >
> > However, the way MapReduce is done in Infinispan right now greatly limits
> > the number of problems that can be solved with it: there is a reduce phase
> > on the local data of each compute node to decrease the amount of data
> > transferred, and a "global" reduce after that. This means that the input
> > key/value types have to be the same as the output types, which differs
> > from the original MapReduce concept.
>
> This is simply not true!
> >
> > A possible solution would be to use a "combiner function" (see [1])
> > instead of the local reduce phase, so that the amount of data transferred
> > could still be reduced where applicable (e.g. the WordCount example would
> > still use the reduce function as the reducer), while input and output
> > types could differ. Having briefly gone through the classes in the
> > mapreduce package, I don't think much work would be needed.
> >
> > What do you think? Is this idea worth implementing?
>
> Possibly. I have not looked into the combiner function. Sanne has
> mentioned it before and he might have further comments!
>
> Regards,
> Vladimir
> >
> > [1] part 4.1 of
> > http://www.mendeley.com/research/mapreducemerge-simplified-relational-data-processing-on-large-clusters/
> >
> > Ondrej Nevelik
> > EDG QE
> >
> > Red Hat Czech s.r.o.
> > Purkynova 99 612 45 Brno, Czech Republic
> > mobile: +420 724 520 140
> >
> > _______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
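To illustrate the type constraint discussed in this thread: a combine step that runs locally on each node must consume and produce the same type, because its output may be combined again, while the final reduce runs once and is free to emit a different type. The sketch below is plain Java and is NOT Infinispan's API; the class CombinerSketch, the Partial record and the mapAndCombine/reduce methods are hypothetical names chosen for illustration. It computes a per-key average: partial (sum, count) pairs are combined locally, and only the global reduce turns them into a Double. Running the averaging step locally instead would produce averages of averages, which is wrong whenever nodes hold different numbers of values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Infinispan's API: a per-node combine step that
 * keeps its input type, followed by a global reduce that changes it.
 */
public class CombinerSketch {

    /** Partial aggregate; merging two partials yields another partial. */
    record Partial(long sum, long count) {
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    /**
     * Map + combine on one node: each raw (key, value) is mapped to a
     * one-element Partial, and partials for the same key are merged locally.
     * The combiner is Partial -> Partial, so it may run any number of times.
     */
    static Map<String, Partial> mapAndCombine(List<Map.Entry<String, Long>> nodeData) {
        Map<String, Partial> combined = new HashMap<>();
        for (Map.Entry<String, Long> e : nodeData) {
            combined.merge(e.getKey(), new Partial(e.getValue(), 1), Partial::merge);
        }
        return combined;
    }

    /**
     * Global reduce: merges the partials from all nodes, then emits a Double
     * per key, i.e. a different type than the combiner works with.
     */
    static Map<String, Double> reduce(List<Map<String, Partial>> perNode) {
        Map<String, Partial> merged = new HashMap<>();
        for (Map<String, Partial> node : perNode) {
            node.forEach((k, p) -> merged.merge(k, p, Partial::merge));
        }
        Map<String, Double> averages = new HashMap<>();
        merged.forEach((k, p) -> averages.put(k, (double) p.sum() / p.count()));
        return averages;
    }

    public static void main(String[] args) {
        // Stand-ins for the data held on two compute nodes.
        List<Map.Entry<String, Long>> node1 =
                List.of(Map.entry("a", 1L), Map.entry("a", 3L), Map.entry("b", 10L));
        List<Map.Entry<String, Long>> node2 =
                List.of(Map.entry("a", 5L), Map.entry("b", 20L));
        Map<String, Double> result = reduce(List.of(mapAndCombine(node1), mapAndCombine(node2)));
        System.out.println(result);
    }
}
```

With this input the result maps a to 3.0 (9 / 3) and b to 15.0 (30 / 2). Only the (sum, count) partials cross node boundaries, which is the data-transfer saving the combiner is meant to preserve.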