[infinispan-dev] MapReduce limitations and suggestions.

Wed Feb 19 04:20:09 EST 2014

On 18.2.2014 16:36, Vladimir Blagojevic wrote:
> On 2/18/2014, 4:59 AM, Dan Berindei wrote:
>>
>> The limitation we have now is that in the reduce phase, the entire 
>> list of values for one intermediate key must be in memory at once. I 
>> think Hadoop only loads a block of intermediate values in memory at 
>> once, and can even sort the intermediate values (with a user-supplied 
>> comparison function) so that the reduce function can work on a sorted 
>> list without loading the values in memory itself.
>>
>>
> Dan and others,
> 
> This is where Sanne's idea comes into play. Why collect the entire list of
> intermediate values for each intermediate key and then invoke reduce on
> those values, when we can invoke reduce each time a new intermediate value
> gets inserted?

I don't know about MR in Infinispan, but MR in CouchDB does something very
similar to what you describe. To actually get a final result, it has to run
an entire tree of reductions, and the reduce function has to distinguish
between a "first-level" reduce (on bare values) and a rereduce (on
intermediate results from previous reductions). The two are _not_ always
the same, and it's fairly confusing.
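
To make the reduce/rereduce distinction concrete, here is a minimal Python
sketch of computing a mean through a tree of reductions. It is not CouchDB's
or Infinispan's actual API: the names reduce_mean and tree_reduce, the
(sum, count) partial result, and the chunk size are all made up for
illustration. The only thing borrowed from CouchDB is the idea of a boolean
rereduce flag that tells the function which kind of input it is getting.

```python
from typing import List, Sequence, Tuple

Partial = Tuple[float, int]  # hypothetical intermediate result: (sum, count)


def reduce_mean(values: Sequence, rereduce: bool) -> Partial:
    """Reduce step for a mean; behaves differently on the two input kinds."""
    if rereduce:
        # Input is a list of (sum, count) partials from earlier reductions.
        total = sum(s for s, _ in values)
        count = sum(n for _, n in values)
    else:
        # Input is a list of bare values emitted by the map phase.
        total = sum(values)
        count = len(values)
    return (total, count)


def tree_reduce(values: List[float], chunk_size: int = 2) -> Partial:
    """Reduce in small chunks so no step sees the whole value list at once."""
    if not values:
        raise ValueError("nothing to reduce")
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    # First level: reduce chunks of bare values.
    level = [
        reduce_mean(values[i:i + chunk_size], rereduce=False)
        for i in range(0, len(values), chunk_size)
    ]
    # Higher levels: rereduce chunks of partial results until one remains.
    while len(level) > 1:
        level = [
            reduce_mean(level[i:i + chunk_size], rereduce=True)
            for i in range(0, len(level), chunk_size)
        ]
    return level[0]


total, count = tree_reduce([1.0, 2.0, 3.0, 4.0, 10.0])
mean = total / count
```

Here the two branches genuinely differ: averaging the per-chunk averages
(1.5, 3.5 and 10.0) would give 5.0, while carrying (sum, count) through the
rereduce steps gives the correct mean of 4.0. For something like a plain sum
the two branches would be identical, which is part of why the distinction is
easy to get wrong.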

LT

> 
> https://issues.jboss.org/browse/ISPN-3999
> 
> Cheers,
> Vladimir