[infinispan-dev] Map Reduce 2.0

Vladimir Blagojevic vblagoje at redhat.com
Mon Feb 20 14:07:57 EST 2012


On 12-02-20 3:43 PM, Manik Surtani wrote:
> I was under the impression reduce is distributed too?  Don't we do the mapping on each node, then a first-pass reduce on each node too, before streaming results back to the caller node?
What we do in the first-pass reduce is essentially a combine, and we
should not do that blindly, because this eager reduction/combine only
works when the reduce function is both /commutative/ and /associative/!
When it is not, it can lead to problems:
http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
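
To make the combiner point concrete, here is a minimal sketch. The class
and method names (CombinerDemo, sum, mean) are made up for illustration
and are not Infinispan API. It applies the same function once per node
and then again over the per-node results, which is what an eager
first-pass reduce amounts to:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Infinispan.
public class CombinerDemo {

    // Sum is commutative and associative: summing per-node partial sums
    // gives the same answer as summing all values at once.
    public static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Mean is not associative: a mean of per-node means ignores how many
    // values each node contributed.
    public static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        // Values emitted for one key, split across two nodes (assumed data).
        List<Double> nodeA = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> nodeB = Arrays.asList(10.0);
        List<Double> all = Arrays.asList(1.0, 2.0, 3.0, 10.0);

        // Reduce applied per node, then again over the partial results.
        double combinedSum = sum(Arrays.asList(sum(nodeA), sum(nodeB)));
        double combinedMean = mean(Arrays.asList(mean(nodeA), mean(nodeB)));

        System.out.println("sum:  direct=" + sum(all) + " combined=" + combinedSum);
        System.out.println("mean: direct=" + mean(all) + " combined=" + combinedMean);
    }
}
```

With this data the sum is 16 either way, but the direct mean is 4 while
the mean of per-node means is 6, so blindly combining changes the result.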

So yes, the first-pass reduce is distributed, but the second-phase
reduce should be distributed as well! Currently it is not!

> This all makes sense as well, however one problem with the consistent hash approach is that it is prone to change when there is a topology change.  How would you deal with that?  Would you maintain a history of consistent hashes?
>
>
I don't think I understand! Even if there is a topology change, the
intermediate results Map<KOut, List<VOut>> will be migrated. All we need
are the KOuts: we can hash each one and find out where its List<VOut>
is, no?
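
A rough sketch of the lookup I have in mind, using a toy hash ring. The
KeyLocator class and its methods are hypothetical and only illustrate
the idea; this is not Infinispan's ConsistentHash API, and it assumes
the intermediate entries have already been migrated to their new owners
after the topology change:

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy hash ring, not Infinispan's consistent hash.
public class KeyLocator {

    // Ring position -> node name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public KeyLocator(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        for (String node : nodes) {
            ring.put(position(node), node);
        }
    }

    // Map any object to a non-negative position on the ring.
    public static int position(Object o) {
        return o.hashCode() & 0x7fffffff;
    }

    // The owner is the first node at or after the key's position,
    // wrapping around to the first node on the ring.
    public String ownerOf(Object key) {
        Integer p = ring.ceilingKey(position(key));
        return ring.get(p != null ? p : ring.firstKey());
    }

    public static void main(String[] args) {
        KeyLocator before = new KeyLocator(List.of("nodeA", "nodeB", "nodeC"));
        KeyLocator after = new KeyLocator(List.of("nodeA", "nodeB"));

        // Knowing only the KOut, hash it against the current ring to find
        // the node that should now hold its List<VOut>.
        String kOut = "word-count:infinispan";
        System.out.println("owner before: " + before.ownerOf(kOut));
        System.out.println("owner after:  " + after.ownerOf(kOut));
    }
}
```

The point is that no history of hashes is needed: the caller only ever
asks the current ring, and keys that were not owned by the departed
node keep the same owner.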

Vladimir