<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On 20 Feb 2012, at 19:07, Vladimir Blagojevic wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div bgcolor="#FFFFFF" text="#000000">
On 12-02-20 3:43 PM, Manik Surtani wrote:
<blockquote cite="mid:64ECF39B-F483-4B9C-B039-05E1E7010F0A@jboss.org" type="cite">
<pre wrap="">I was under the impression reduce is distributed too? Don't we do the mapping on each node, then a first-pass reduce on each node too, before streaming results back to the caller node?</pre>
</blockquote>
What we do in the first-pass reduce is essentially a combine step, and we should
not do that blindly, because this eager reduction/combine is only correct
when the reduce function is both <i>commutative</i> and <i>associative</i>!
When it is not, it produces wrong results:
<a class="moz-txt-link-freetext" href="http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/">http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/</a><br>
<br>
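<div>A minimal sketch, in Python and independent of the Infinispan API, of why eager combining is unsafe in general. With a plain sum, reducing on each node and then reducing the partial results gives the same answer as one reduce over all values. With an arithmetic mean, which is not associative, the two-pass result differs. The functions and data below are hypothetical and chosen only for illustration.</div>
<pre wrap="">
```python
# Hypothetical illustration: an eager "first-pass reduce" (combiner) is only
# safe when the reduce function is commutative and associative.

def mean(values):
    # Arithmetic mean: NOT associative, so it cannot be applied in two passes.
    return sum(values) / len(values)

def total(values):
    # Sum: commutative and associative, so partial sums can be combined.
    return sum(values)

# Intermediate values for a single KOut, as emitted by mappers on two nodes.
node_a = [1.0, 2.0, 3.0]
node_b = [10.0]

def single_pass(reduce_fn):
    # Reduce all values for the key at once, on one node.
    return reduce_fn(node_a + node_b)

def two_pass(reduce_fn):
    # First-pass reduce on each node, then a final reduce of the partials.
    return reduce_fn([reduce_fn(node_a), reduce_fn(node_b)])

print(single_pass(total), two_pass(total))
print(single_pass(mean), two_pass(mean))
```
</pre>
<div>Here the sum gives 16.0 both ways, while the mean gives 4.0 for a single pass but 6.0 when the node-local means are averaged, because node_a's three values and node_b's single value end up weighted equally.</div>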
So yes, the first-pass reduce is distributed, but the second-phase reduce
should be distributed as well! Currently it is not!<br></div></blockquote><div><br></div><div>Ok.</div><blockquote type="cite"><div bgcolor="#FFFFFF" text="#000000"><blockquote cite="mid:64ECF39B-F483-4B9C-B039-05E1E7010F0A@jboss.org" type="cite"><pre wrap="">This all makes sense as well; however, one problem with the consistent hash approach is that it changes whenever the topology changes. How would you deal with that? Would you maintain a history of consistent hashes?
</pre>
</blockquote>
I don't think I understand! Even if there is a topology change, the
intermediate results Map&lt;KOut, List&lt;VOut&gt;&gt; will be
migrated. All we need are the KOut keys: we can hash each one and find out
where its List&lt;VOut&gt; is, no?<br></div></blockquote><div><br></div><div>Well, if you assign a set of tasks to specific nodes based on a consistent hash and the topology then changes, you'd lose the information on where you sent specific tasks.</div><div><br></div><div>Cheers</div><div>Manik</div></div><div apple-content-edited="true">
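<div>A minimal sketch of the two positions, assuming a bare consistent-hash ring with one position per node. This is not Infinispan's ConsistentHash; Ring, owner and the node names are hypothetical. Hashing a KOut against the current ring always yields its current owner, which supports Vladimir's point provided the intermediate results migrate during rebalancing. A table of assignments recorded before a node joins, however, goes stale for the keys whose arc moved, which is Manik's concern.</div>
<pre wrap="">
```python
# Hypothetical sketch: routing intermediate keys (KOut) to reducer nodes with a
# simple consistent-hash ring, and what a topology change does to recorded routes.
import bisect
import hashlib

def h(value):
    # Stable hash of a string onto the ring (first 8 bytes of SHA-256).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes):
        # One position per node, sorted by hash.
        self.points = sorted((h(n), n) for n in nodes)
        self.hashes = [p for p, _ in self.points]

    def owner(self, key):
        # First node clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

keys = ["k%d" % i for i in range(1000)]
before = Ring(["A", "B", "C"])
after = Ring(["A", "B", "C", "D"])  # node D joins

# Routes recorded when the reduce tasks were dispatched (old topology).
recorded = {k: before.owner(k) for k in keys}

# Recorded routes that no longer match the current owner after D joins.
stale = [k for k in keys if recorded[k] != after.owner(k)]

# Every stale key now belongs to D; recomputing the owner needs no history.
print(len(stale), all(after.owner(k) == "D" for k in stale))
```
</pre>
<div>With one position per node, adding D only moves keys that previously belonged to D's successor on the ring, so every stale entry in recorded now maps to D. Recomputing after.owner(k) needs no history of hashes, but it only finds the data if the intermediate results themselves have moved to the new owner.</div>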
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div><div>--</div><div>Manik Surtani</div><div><a href="mailto:manik@jboss.org">manik@jboss.org</a></div><div><a href="http://twitter.com/maniksurtani">twitter.com/maniksurtani</a></div><div><br></div><div>Lead, Infinispan</div><div><a href="http://www.infinispan.org">http://www.infinispan.org</a></div><div><br></div></div></span><br class="Apple-interchange-newline">
</div>
<br></body></html>