[infinispan-dev] MapReduce limitations and suggestions.

Tue Feb 18 10:46:05 EST 2014

On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote:
> On 2/18/2014, 4:59 AM, Dan Berindei wrote:
>>
>> The limitation we have now is that in the reduce phase, the entire 
>> list of values for one intermediate key must be in memory at once. I 
>> think Hadoop only loads a block of intermediate values in memory at 
>> once, and can even sort the intermediate values (with a user-supplied 
>> comparison function) so that the reduce function can work on a sorted 
>> list without loading the values in memory itself.
>>
>>
> Dan and others,
> 
> This is where Sanne's idea comes into play. Why collect the entire list
> of intermediate values for each intermediate key and then invoke reduce
> on those values, when we can invoke reduce each time a new intermediate
> value gets inserted?
> 
Because you can't. What you are describing is more like combining than
reducing. If there is a combiner in the MapReduceTask, you can execute
the combiner on a subset of the values with the same key (in your case,
2 values) and output one. But this is not always possible.
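
To make the distinction concrete, here is a rough sketch in plain Java.
It does not use the Infinispan MapReduce API; the class and method names
are made up for illustration. It assumes that "invoke reduce each time a
new value is inserted" means folding each new value into the previous
result. That works for a sum, where partial results can be merged in any
grouping, but not for something like a median, which needs every value
for the key at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: none of these names come from Infinispan.
public class CombineVsReduce {

    // Folding values in one at a time is safe for a sum, because
    // addition is associative: partial results can be merged later.
    static int sumIncrementally(List<Integer> values) {
        int acc = 0;
        for (int v : values) {
            acc = acc + v;
        }
        return acc;
    }

    // A true median needs the complete list of values for the key.
    static double median(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Applying "median of two" each time a value arrives: the median of
    // two numbers is their mean, so earlier values get diluted and the
    // final answer is generally wrong.
    static double medianIncrementally(List<Integer> values) {
        double acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = (acc + values.get(i)) / 2.0;
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 100);
        System.out.println(sumIncrementally(values));    // 103, same as summing the full list
        System.out.println(median(values));              // 2.0
        System.out.println(medianIncrementally(values)); // 50.75, not the median
    }
}
```

So a sum-style function can double as a combiner and run on any subset
of the values, while a median-style reduce still has to see the whole
list for its key.
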
> https://issues.jboss.org/browse/ISPN-3999
> 
> Cheers,
> Vladimir