On 20 Feb 2012, at 19:07, Vladimir Blagojevic wrote:
On 12-02-20 3:43 PM, Manik Surtani wrote:
>
> I was under the impression reduce is distributed too? Don't we do the mapping on
> each node, then a first-pass reduce on each node too, before streaming results
> back to the caller node?
What we do in the first-pass reduce is essentially a combine, and we should not do
it blindly, because this eager reduction/combine is only correct when the reduce
function is both commutative and associative! It can produce wrong results when it
is not:
http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4...
So yes, the first-pass reduce is distributed, but the second-phase reduce should be
distributed as well! Currently it is not!
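
To make that concrete, here is a minimal sketch in plain Java (not Infinispan
code) where the reduce function is an arithmetic mean. Averaging is not
associative over partial results, so an eager per-node combine followed by a
reduce over the partials gives a different answer than a single global reduce:

import java.util.Arrays;
import java.util.List;

public class CombineHazard {

    // Reduce function: arithmetic mean of the intermediate values.
    static double reduce(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue)
                     .average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Double> onNodeA = Arrays.asList(1.0, 2.0, 3.0); // values for a KOut on node A
        List<Double> onNodeB = Arrays.asList(10.0);          // values for the same KOut on node B

        // Correct: one reduce over all intermediate values -> 4.0
        double global = reduce(Arrays.asList(1.0, 2.0, 3.0, 10.0));

        // Eager combine: reduce per node, then reduce the partials -> 6.0
        double combined = reduce(Arrays.asList(reduce(onNodeA), reduce(onNodeB)));

        System.out.printf("global=%.1f, eager combine=%.1f%n", global, combined);
    }
}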
Ok.
> This all makes sense as well; however, one problem with the consistent hash
> approach is that it is prone to change when there is a topology change. How
> would you deal with that? Would you maintain a history of consistent hashes?
I don't think I understand! Even if there is a topology change, the intermediate
results Map<KOut, List<VOut>> will be migrated; all we need is the KOut's. We can
hash them and find out where each List<VOut> lives, no?
Well, if you assign a set of tasks to specific nodes based on a consistent hash and
the topology then changes, you'd lose the information on where you sent specific
tasks.
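
For what it's worth, here is a toy sketch (hypothetical, not Infinispan's
ConsistentHash API) of that problem: each KOut is hashed onto a ring of virtual
nodes to find its owner, and adding one member to the ring changes the owner of
some keys, so tasks dispatched under the old ring point at the wrong nodes unless
the assignment history (or the intermediate state itself) moves with the keys:

import java.util.*;

public class KeyOwnership {

    // Build a ring with 100 virtual nodes per cluster member.
    static SortedMap<Integer, String> ring(List<String> members) {
        SortedMap<Integer, String> ring = new TreeMap<>();
        for (String m : members)
            for (int i = 0; i < 100; i++)
                ring.put((m + "#" + i).hashCode() & Integer.MAX_VALUE, m);
        return ring;
    }

    // Owner = first virtual node at or after the key's hash, wrapping around.
    static String owner(String kOut, SortedMap<Integer, String> ring) {
        int h = kOut.hashCode() & Integer.MAX_VALUE;
        SortedMap<Integer, String> tail = ring.tailMap(h);
        return ring.get(tail.isEmpty() ? ring.firstKey() : tail.firstKey());
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> before = ring(Arrays.asList("A", "B", "C"));
        SortedMap<Integer, String> after  = ring(Arrays.asList("A", "B", "C", "D"));

        // Some KOut keys change owner once D joins: a reduce task sent out
        // under the old ring is now addressed to the wrong node.
        for (String kOut : Arrays.asList("k1", "k2", "k3", "k4", "k5"))
            System.out.printf("%s: %s -> %s%n",
                              kOut, owner(kOut, before), owner(kOut, after));
    }
}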
Cheers
Manik
--
Manik Surtani
manik@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org