<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 12-02-20 3:43 PM, Manik Surtani wrote:

    <blockquote

      cite="mid:64ECF39B-F483-4B9C-B039-05E1E7010F0A@jboss.org"

      type="cite">

      <pre wrap="">

I was under the impression reduce is distributed too?  Don't we do the mapping on each node, then a first-pass reduce on each node too, before streaming results back to the caller node?</pre>

    </blockquote>

    What we do in the first-pass reduce is essentially a combine step, and
    we should not do it blindly: this eager reduction/combine only works
    when the reduce function is both <i>commutative</i> and <i>associative</i>!
    It can produce wrong results when it is not:

    <a class="moz-txt-link-freetext"

href="http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/">http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/</a><br>

    <br>
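    To make the combiner concern concrete, here is a minimal sketch in plain
    Java (not the Infinispan API; the class and method names are made up for
    illustration). It assumes the values for one key are split across two
    nodes and compares reducing everything at once with reducing per node
    first and then reducing the partial results:<br>
    <pre wrap="">
```java
// Hypothetical illustration only: names are not part of any Infinispan API.
final class CombinerDemo {
    // Commutative and associative reduce: safe to apply eagerly on each node.
    static double sum(double[] values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Not associative as a reduce: a mean of per-node means is not the overall mean.
    static double mean(double[] values) {
        return sum(values) / values.length;
    }

    // Joins the values held by two nodes into one array.
    static double[] concat(double[] a, double[] b) {
        double[] out = new double[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        double[] nodeA = {10, 20};      // values for one key held on node A
        double[] nodeB = {30, 40, 50};  // values for the same key held on node B
        double[] all = concat(nodeA, nodeB);

        // Sum: the per-node first pass gives the same answer as a single reduce.
        System.out.println(sum(all));                                           // 150.0
        System.out.println(sum(new double[] {sum(nodeA), sum(nodeB)}));         // 150.0

        // Mean: the per-node first pass silently changes the answer.
        System.out.println(mean(all));                                          // 30.0
        System.out.println(mean(new double[] {mean(nodeA), mean(nodeB)}));      // 27.5
    }
}
```
</pre>
    With sum the eager per-node pass is harmless, while with mean it changes
    the result from 30.0 to 27.5, which is why the combine step should be
    opt-in rather than applied blindly.<br>
    <br>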

    So yes, the first-pass reduce is distributed, but the second-phase reduce
    should be distributed as well! Currently it is not!<br>

    <br>

    <blockquote

      cite="mid:64ECF39B-F483-4B9C-B039-05E1E7010F0A@jboss.org"

      type="cite">


      <pre wrap="">

This all makes sense as well, however one problem with the consistent hash approach is that it is prone to change when there is a topology change.  How would you deal with that?  Would you maintain a history of consistent hashes?

</pre>

    </blockquote>

    I don't think I understand! Even if there is a topology change, the
    intermediate results Map&lt;KOut, List&lt;VOut&gt;&gt; will be
    migrated. All we need are the KOut keys: we can hash each one and find out where
    its List&lt;VOut&gt; is, no?<br>

    <br>
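    A rough sketch of the lookup I have in mind, in plain Java. The
    <i>locate</i> helper and the node names are hypothetical, and simple
    modulo hashing stands in for the real consistent hash, so treat this as
    an assumption-laden illustration rather than how Infinispan does it. The
    point is only that the owner is recomputed from KOut against the current
    view, so no history of hashes is kept:<br>
    <pre wrap="">
```java
// Hypothetical illustration only: not the Infinispan ConsistentHash API.
final class OwnerLookup {
    // Stand-in for a consistent hash: maps a key to one node of the current view.
    static String locate(Object key, String[] currentNodes) {
        int index = Math.floorMod(key.hashCode(), currentNodes.length);
        return currentNodes[index];
    }

    public static void main(String[] args) {
        String[] before = {"nodeA", "nodeB", "nodeC"};           // view before the change
        String[] after = {"nodeA", "nodeB", "nodeC", "nodeD"};   // view after a node joins
        String kOut = "infinispan";                              // an intermediate key

        // The owner is always derived from the key and the current view only.
        System.out.println(locate(kOut, before));
        System.out.println(locate(kOut, after));
    }
}
```
</pre>
    This relies on the intermediate entries having been migrated to whichever
    node the current view maps them to, as described above.<br>
    <br>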

    Vladimir<br>

  </body>

</html>