Guys,
I was looking at this again recently and I still do not understand
how a combiner could have a different interface than a Reducer. Hadoop
requires a user to implement the combiner as a Reducer: http://developer.yahoo.com/hadoop/tutorial/module4.html#functionality
and
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setCombinerClass%28java.lang.Class%29
In addition, the original MapReduce paper does not mention any change
of types between the combine and reduce functions.
What we have admittedly done wrong is to apply the Reducer to each
individual Mapper's output without checking whether the reduce
function is both commutative and associative. If it is not, the
combined result can differ from the uncombined one:
http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
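To make the pitfall concrete, here is a small plain-Java sketch (no Hadoop involved; the `average` helper and the sample values are made up for illustration). Averaging is not associative, so reusing a mean-computing reduce function as a per-mapper combiner changes the final answer:

```java
import java.util.Arrays;
import java.util.List;

public class CombinerPitfall {
    // A "reduce" that averages its inputs. Averaging is not associative,
    // so applying it per-mapper and then again over the partial results
    // is not the same as applying it once over all the values.
    static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Values emitted for one key by two different mappers.
        List<Double> mapper1 = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> mapper2 = Arrays.asList(10.0);

        // No combiner: the reducer sees all four values at once.
        double correct = average(Arrays.asList(1.0, 2.0, 3.0, 10.0));

        // With average-as-combiner: each mapper's values are pre-averaged,
        // then the reducer averages the two partial results.
        double withCombiner = average(Arrays.asList(average(mapper1), average(mapper2)));

        System.out.println("correct=" + correct + " withCombiner=" + withCombiner);
        // correct=4.0 withCombiner=6.0 -- the combiner silently changed the result.
    }
}
```

A sum- or max-style reduce would be safe here; it is only non-associative functions like the mean that break when silently combined per mapper.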
So yes, I am all for adding a Combiner (it should do the optional
per-mapper reduce that we currently do automatically), but I do not
see why we would have to change the interface!
Regards,
Vladimir