On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote:
On 2/18/2014, 4:59 AM, Dan Berindei wrote:
>
> The limitation we have now is that in the reduce phase, the entire
> list of values for one intermediate key must be in memory at once. I
> think Hadoop only loads a block of intermediate values in memory at
> once, and can even sort the intermediate values (with a user-supplied
> comparison function) so that the reduce function can work on a sorted
> list without loading the values in memory itself.
>
>
Dan and others,
This is where Sanne's idea comes into play. Why collect the entire list of
intermediate values for each intermediate key and then invoke reduce on
those values, when we could invoke reduce each time a new intermediate
value gets inserted?
Because you can't. What you are describing is combining rather than
reducing. If there is a combiner in the MapReduceTask you can execute
the combiner on a subset of values (in your case, two) with the same key
and output one. But this is not always possible.
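To see why folding each new value into the running result is not always
valid, here is a small generic sketch (plain Java, not the Infinispan
API; the class and method names are illustrative only). The reducer
computes an arithmetic mean, which needs the whole value list at once;
re-applying it pairwise as values arrive gives a different answer:

```java
import java.util.Arrays;
import java.util.List;

public class CombineVsReduce {

    // A reducer that needs the full list of values at once:
    // the arithmetic mean of all intermediate values for a key.
    static double reduce(List<Double> values) {
        return values.stream()
                     .mapToDouble(Double::doubleValue)
                     .average()
                     .orElse(0.0);
    }

    // Naive "reduce on each insert": fold every new value into the
    // running result by re-applying reduce to the pair
    // {runningResult, newValue}.
    static double incremental(List<Double> values) {
        double acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = reduce(Arrays.asList(acc, values.get(i)));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 9.0);
        // True mean: (1 + 2 + 9) / 3 = 4.0
        System.out.println(reduce(values));
        // Pairwise folding: mean(mean(1, 2), 9) = mean(1.5, 9) = 5.25
        System.out.println(incremental(values));
    }
}
```

Only when the reduce function is associative and commutative (e.g. sum,
min, max) do the two agree, which is exactly the extra contract a
combiner imposes but a general reducer does not.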
https://issues.jboss.org/browse/ISPN-3999
Cheers,
Vladimir
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev