On 12-02-20 3:43 PM, Manik Surtani wrote:
> I was under the impression reduce is distributed too? Don't we
> do the mapping on each node, then a first-pass reduce on each node too, before streaming
> results back to the caller node?
What we do in the first-pass reduce is essentially a combine, and we
should not do it blindly: this eager reduction/combine is only correct
when the reduce function is both /commutative/ and /associative/. It can
produce wrong results when it is not:
http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4...
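To make the associativity problem concrete, here is a minimal, self-contained Java sketch (the class and method names are mine, not Infinispan API): an average-style reduce function applied eagerly per node gives a different answer than a single reduce over all values for the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EagerCombineDemo {

    // A reduce function that averages its inputs. It is neither
    // associative nor commutative over partial results, so it must
    // NOT be used as an eager per-node combiner.
    static double avg(List<Double> values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return values.isEmpty() ? 0.0 : sum / values.size();
    }

    public static void main(String[] args) {
        List<Double> nodeA = Arrays.asList(1.0, 2.0, 3.0); // values for KOut on node A
        List<Double> nodeB = Arrays.asList(10.0);          // values for KOut on node B

        // Correct: one reduce over all values for the key.
        List<Double> all = new ArrayList<>(nodeA);
        all.addAll(nodeB);
        double correct = avg(all); // (1+2+3+10)/4 = 4.0

        // Wrong: eagerly "combine" (reduce) on each node, then reduce the partials.
        double eager = avg(Arrays.asList(avg(nodeA), avg(nodeB))); // (2+10)/2 = 6.0

        System.out.println(correct);
        System.out.println(eager);
    }
}
```

A sum or a max would survive this eager combining; the average does not, which is exactly why the combine step has to be opt-in.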
So yes, the first-pass reduce is distributed, but the second-pass reduce
should be distributed as well! Currently it is not!
> This all makes sense as well, however one problem with the consistent
> hash approach is that it is prone to change when there is a topology change. How would
> you deal with that? Would you maintain a history of consistent hashes?
I don't think I understand. Even if there is a topology change, the
intermediate results Map<KOut, List<VOut>> will be migrated too. All we
need are the KOut keys: we can hash each one and find out where its
List<VOut> lives, no?
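The idea above can be sketched in a few lines of Java. This is a toy owner lookup, not Infinispan's real ConsistentHash API: the point is only that the lookup always hashes against the *current* topology, and since migrated List<VOut> entries moved according to the same hash, the owner found for a KOut is still correct after a topology change.

```java
import java.util.List;

public class KOutRouting {

    // Hypothetical owner lookup: hash the intermediate key KOut onto
    // the current node list. (Real consistent hashing moves far fewer
    // keys on a topology change; plain modulo is just for illustration.)
    static <KOut> int ownerIndex(KOut key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    public static void main(String[] args) {
        List<String> before = List.of("nodeA", "nodeB", "nodeC");
        List<String> after  = List.of("nodeA", "nodeB", "nodeC", "nodeD");

        String kOut = "some-intermediate-key";

        // Before and after the change we simply re-hash against whatever
        // topology is current; the migrated List<VOut> followed the same
        // placement rule, so the lookup finds it either way.
        System.out.println(before.get(ownerIndex(kOut, before.size())));
        System.out.println(after.get(ownerIndex(kOut, after.size())));
    }
}
```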
Vladimir