On 18.2.2014 16:36, Vladimir Blagojevic wrote:
On 2/18/2014, 4:59 AM, Dan Berindei wrote:
>
> The limitation we have now is that in the reduce phase, the entire
> list of values for one intermediate key must be in memory at once. I
> think Hadoop only loads a block of intermediate values in memory at
> once, and can even sort the intermediate values (with a user-supplied
> comparison function) so that the reduce function can work on a sorted
> list without loading the values in memory itself.
>
>
Dan and others,
This is where Sanne's idea comes into play. Why collect the entire list of
intermediate values for each intermediate key and only then invoke reduce
on those values, when we can invoke reduce each time a new intermediate
value is inserted?
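A minimal sketch of that idea (hypothetical, not the Infinispan API): instead of buffering the full value list per intermediate key, fold each newly emitted value into a running reduction as it arrives. Here the reduce function is a simple sum.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: reduce-on-insert instead of reduce-on-full-list.
public class IncrementalReduce {
    private final Map<String, Long> reduced = new ConcurrentHashMap<>();

    // Called once per emitted (key, value) pair; no per-key list is kept.
    public void emit(String key, long value) {
        reduced.merge(key, value, Long::sum);
    }

    public long get(String key) {
        return reduced.getOrDefault(key, 0L);
    }
}
```

Note this only works cleanly when the reduce function is associative and commutative, since values are folded in arrival order rather than as one complete list.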
I don't know about MR in Infinispan, but MR in CouchDB does something very
similar to what you describe. To actually get a final result, they have to
run an entire tree of reductions, and the reduce function has to
distinguish between a first-level reduce (on bare mapped values) and a
rereduce (on intermediate results from previous reductions). The two are
_not_ always the same function, and it's fairly confusing.
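To illustrate why the two levels can't share one function (a hedged sketch, not the CouchDB API): computing an average, the first-level reduce sees raw values and must produce a (sum, count) pair, while the rereduce sees (sum, count) pairs and must add the counts rather than count its inputs.

```java
import java.util.List;

// Hypothetical sketch of the reduce vs. rereduce distinction, using an average.
public class RereduceSketch {
    // Partial result carried between reduction levels.
    record Partial(double sum, long count) {}

    // First-level reduce: runs on bare mapped values.
    static Partial reduce(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return new Partial(sum, values.size());
    }

    // Rereduce: runs on partials from previous reductions. Summing the
    // counts here is exactly what a reused first-level reduce would get
    // wrong (it would count the partials instead).
    static Partial rereduce(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum();
            count += p.count();
        }
        return new Partial(sum, count);
    }

    public static void main(String[] args) {
        Partial a = reduce(List.of(1.0, 2.0, 3.0));
        Partial b = reduce(List.of(4.0, 5.0));
        Partial total = rereduce(List.of(a, b));
        System.out.println(total.sum() / total.count()); // average of all five values
    }
}
```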
LT
https://issues.jboss.org/browse/ISPN-3999
Cheers,
Vladimir
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev