[infinispan-dev] MapReduce enhancement
by Ondra Nevelik
Hi all,
I was supposed to write an arbitrary app with Infinispan, so I wanted to rewrite one of my programs, currently implemented using Hadoop MapReduce, to compare the performance of the two.
However, the way MapReduce is done in Infinispan right now greatly limits the number of problems that can be solved with it. There is a reduce phase on local data on each of the compute nodes to decrease the amount of data transferred, and there is a "global" reduce after that. Since the global reduce consumes the output of the local reduce, the types of the input keys/values have to be the same as the output types, and that differs from the original MapReduce concept.
A possible solution would be to use a "combiner function" (see [1]) instead of the local reduce phase. The amount of data transferred could still be reduced where applicable (e.g. the WordCount example would still use the reduce function as the reducer), but it would be possible to have different input and output types. Having gone briefly through the code of the classes in the mapreduce package, I think it would not even need much work.
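To make the idea concrete, here is a minimal sketch in plain Java. It does not use the Infinispan mapreduce API; the class CombinerSketch and the methods run and wordCount are hypothetical names made up for illustration, and each "node" is simulated by a plain list. The point is only that the combiner works on the intermediate type (Integer here) while the reducer is free to return a different type (Long here).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical sketch, not the Infinispan API: map + per-node combine + global reduce.
public class CombinerSketch {

    static <VIn, K extends Comparable<K>, VMid, VOut> Map<K, VOut> run(
            List<List<VIn>> nodes,                              // one list of local values per simulated node
            Function<VIn, List<Map.Entry<K, VMid>>> mapper,     // input value -> intermediate pairs
            BinaryOperator<VMid> combiner,                      // VMid x VMid -> VMid, applied locally
            Function<List<VMid>, VOut> reducer) {               // all partials for a key -> final value

        Map<K, List<VMid>> shuffled = new TreeMap<>();
        for (List<VIn> localData : nodes) {
            // Local phase: combine intermediate values per key before "sending" them.
            Map<K, VMid> local = new HashMap<>();
            for (VIn value : localData) {
                for (Map.Entry<K, VMid> e : mapper.apply(value)) {
                    local.merge(e.getKey(), e.getValue(), combiner);
                }
            }
            // Simulated transfer: at most one partial value per key leaves each node.
            for (Map.Entry<K, VMid> e : local.entrySet()) {
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }

        // Global phase: the reducer may produce a type different from VMid.
        Map<K, VOut> result = new TreeMap<>();
        for (Map.Entry<K, List<VMid>> e : shuffled.entrySet()) {
            result.put(e.getKey(), reducer.apply(e.getValue()));
        }
        return result;
    }

    // WordCount: input String lines, intermediate Integer counts, final Long totals.
    static Map<String, Long> wordCount(List<List<String>> nodes) {
        return CombinerSketch.<String, String, Integer, Long>run(
                nodes,
                line -> {
                    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                    for (String w : line.toLowerCase().split("\\s+")) {
                        if (!w.isEmpty()) {
                            pairs.add(new AbstractMap.SimpleEntry<>(w, 1));
                        }
                    }
                    return pairs;
                },
                Integer::sum,
                partials -> {
                    long total = 0;
                    for (int p : partials) {
                        total += p;
                    }
                    return total;
                });
    }

    public static void main(String[] args) {
        List<List<String>> nodes = Arrays.asList(
                Arrays.asList("a b a", "b c"),   // data local to node 1
                Arrays.asList("a c c"));         // data local to node 2
        System.out.println(wordCount(nodes));    // prints {a=3, b=2, c=3}
    }
}
```

With a separate combiner, only the combiner is constrained to have matching input and output types; the mapper input, the intermediate values and the final result can all differ.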
What do you think? Is this idea worth implementing?
[1] part 4.1 of http://www.mendeley.com/research/mapreducemerge-simplified-relational-dat...
Ondrej Nevelik
EDG QE
Red Hat Czech s.r.o.
Purkynova 99 612 45 Brno, Czech Republic
mobile: +420 724 520 140
DistributedExecutionCompletionService 2.0
by Vladimir Blagojevic
Hey,
One of the users rightfully asked for an extension of
DistributedExecutionCompletionService to include task submission to
a cluster of nodes - http://community.jboss.org/thread/175686?tstart=0
Galder and I debated the resulting
https://github.com/infinispan/infinispan/pull/722 and have concluded
that we want to capture the added methods in
DistributedCompletionService<V> which extends JDK's
CompletionService<V>. The problem is that
DistributedCompletionService<V> is exactly the same as
DistributedExecutorService but without the generics twist. So we end up
with an essentially duplicate interface.
Now, ideally we could have DistributedExecutionCompletionService simply
implement DistributedExecutorService and CompletionService, but the compiler
does not allow us to have a generics-based
DistributedExecutionCompletionService implement a non-generics-based
DistributedExecutorService with the same method definitions.
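The clash can be illustrated with JDK types only. This is a sketch under assumptions: JDK's ExecutorService stands in for DistributedExecutorService, and the class ErasureClashSketch with its helper methods is hypothetical. It assumes the compiler complaint is the usual "same erasure" name clash between a method-level type parameter and a class-level one; the reflection calls just show that both submit(Callable) declarations erase to the same signature.

```java
import java.lang.reflect.Method;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch using JDK interfaces as stand-ins for the Infinispan ones.
public class ErasureClashSketch {

    // Declaring one type that inherits both submit methods, e.g.
    //
    //   abstract class Both<V> implements ExecutorService, CompletionService<V> { }
    //
    // is expected to be rejected by javac with a "name clash ... have the same
    // erasure, yet neither overrides the other" error.

    /** Erased signature of submit(Callable) as declared on the given interface. */
    static String erasedSubmit(Class<?> type) {
        try {
            Method m = type.getMethod("submit", Callable.class);
            return m.getReturnType().getName() + " " + m.getName()
                    + "(" + m.getParameterTypes()[0].getName() + ")";
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Generic signature of the same method, where the two declarations differ. */
    static String genericSubmit(Class<?> type) {
        try {
            return type.getMethod("submit", Callable.class).toGenericString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Identical after erasure ...
        System.out.println(erasedSubmit(ExecutorService.class));
        System.out.println(erasedSubmit(CompletionService.class));
        // ... but different as generic declarations (<T> on the method vs. V on the type).
        System.out.println(genericSubmit(ExecutorService.class));
        System.out.println(genericSubmit(CompletionService.class));
    }
}
```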
Any suggestions?
Vladimir