<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 12-02-20 3:43 PM, Manik Surtani wrote:
<blockquote
cite="mid:64ECF39B-F483-4B9C-B039-05E1E7010F0A@jboss.org"
type="cite">
<pre wrap="">
I was under the impression reduce is distributed too? Don't we do the mapping on each node, then a first-pass reduce on each node too, before streaming results back to the caller node?</pre>
</blockquote>
What we do in the first-pass reduce is essentially a combine, and we
should not do it blindly: this eager reduction/combine only works
when the reduce function is both <i>commutative</i> and <i>associative</i>!
When it is not, it can lead to problems:
<a class="moz-txt-link-freetext"
href="http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/">http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/</a><br>
<br>
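A minimal sketch of the problem, not Infinispan code: the class and
method names below are made up for illustration. Sum survives an eager
per-node reduce, mean does not.<br>

```java
public class CombinerDemo {
    // Arithmetic mean: not associative, so it is unsafe as a combiner.
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Sum: commutative and associative, so partial results combine safely.
    static double sum(double[] values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Eager per-node reduce, then a final reduce over the two partial results.
    static double twoPassMean(double[] nodeA, double[] nodeB) {
        return mean(new double[] {mean(nodeA), mean(nodeB)});
    }

    static double twoPassSum(double[] nodeA, double[] nodeB) {
        return sum(new double[] {sum(nodeA), sum(nodeB)});
    }

    public static void main(String[] args) {
        // Hypothetical values for one key, split unevenly across two nodes.
        double[] nodeA = {1, 2, 3};
        double[] nodeB = {10};
        double[] all = {1, 2, 3, 10};

        System.out.println("mean, single pass: " + mean(all));
        System.out.println("mean, two passes:  " + twoPassMean(nodeA, nodeB));
        System.out.println("sum, single pass:  " + sum(all));
        System.out.println("sum, two passes:   " + twoPassSum(nodeA, nodeB));
    }
}
```

With these values the single-pass mean is 4.0 but the two-pass mean is
6.0, while the sum is 16.0 either way.<br>
<br>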
So yes, the first-pass reduce is distributed, but the second-phase
reduce should be distributed as well! Currently it is not!<br>
<br>
<blockquote
cite="mid:64ECF39B-F483-4B9C-B039-05E1E7010F0A@jboss.org"
type="cite">
<pre wrap="">
</pre>
<pre wrap="">
This all makes sense as well, however one problem with the consistent hash approach is that it is prone to change when there is a topology change. How would you deal with that? Would you maintain a history of consistent hashes?
</pre>
</blockquote>
I don't think I understand! Even if there is a topology change, the
intermediate results Map&lt;KOut, List&lt;VOut&gt;&gt; will be
migrated. All we need are the KOuts: we can hash each one and find out
where its List&lt;VOut&gt; is, no?<br>
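<br>
Roughly what I have in mind, as a sketch only: ownerOf below is a
hypothetical helper, and plain modulo hashing stands in for the real
consistent hash. It is not the Infinispan API. The point is that the
lookup only needs the key and the current membership, on the assumption
that the intermediate values have already been migrated to the new
owner.<br>

```java
public class OwnerLookupDemo {
    // Hypothetical stand-in for a consistent hash lookup:
    // maps a key to one member of the current cluster view.
    static String ownerOf(Object key, String[] members) {
        return members[Math.floorMod(key.hashCode(), members.length)];
    }

    public static void main(String[] args) {
        // Made-up node names; the second view models one node leaving.
        String[] viewBefore = {"nodeA", "nodeB", "nodeC"};
        String[] viewAfter = {"nodeA", "nodeB"};

        // A KOut produced by the map phase (hypothetical key).
        String kOut = "some-intermediate-key";

        // No history of old hashes is consulted: the reducer hashes
        // the key against whatever the current view is.
        System.out.println("owner before change: " + ownerOf(kOut, viewBefore));
        System.out.println("owner after change:  " + ownerOf(kOut, viewAfter));
    }
}
```

Whether this holds up depends on the migration completing before the
reduce phase looks the key up, which the sketch does not model.<br>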
<br>
Vladimir<br>
</body>
</html>