[infinispan-dev] MapReduce limitations and suggestions.
Ladislav Thon
lthon at redhat.com
Wed Feb 19 04:20:09 EST 2014
On 18.2.2014 16:36, Vladimir Blagojevic wrote:
> On 2/18/2014, 4:59 AM, Dan Berindei wrote:
>>
>> The limitation we have now is that in the reduce phase, the entire
>> list of values for one intermediate key must be in memory at once. I
>> think Hadoop only loads a block of intermediate values in memory at
>> once, and can even sort the intermediate values (with a user-supplied
>> comparison function) so that the reduce function can work on a sorted
>> list without loading the values in memory itself.
>>
>>
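A minimal Python sketch of the Hadoop-style alternative Dan describes, under stated assumptions. Intermediate values for one key arrive in blocks. Each block is sorted with a user-supplied key, which stands in for the comparison function. The blocks are then merged lazily, so the reduce function consumes an iterator instead of a fully materialized list. The names `sorted_value_stream`, `streaming_reduce` and `count_distinct` are hypothetical and are not Infinispan or Hadoop API. A real engine would spill each sorted block to disk and stream it back; this sketch only simulates that with in-memory lists.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, TypeVar

V = TypeVar("V")
R = TypeVar("R")


def sorted_value_stream(sorted_blocks: List[Iterable[V]],
                        sort_key: Callable[[V], object]) -> Iterator[V]:
    """Lazily merge blocks that are each already sorted by sort_key.

    heapq.merge keeps only one pending value per block in memory.
    """
    return heapq.merge(*sorted_blocks, key=sort_key)


def streaming_reduce(key: str,
                     blocks: List[List[V]],
                     sort_key: Callable[[V], object],
                     reducer: Callable[[str, Iterator[V]], R]) -> R:
    """Sort each block on its own, then hand the reducer a merged iterator."""
    # Assumption: one block fits in memory; a real engine would spill each
    # sorted block to disk and stream it back instead of keeping the list.
    sorted_blocks = [iter(sorted(block, key=sort_key)) for block in blocks]
    return reducer(key, sorted_value_stream(sorted_blocks, sort_key))


def count_distinct(key: str, values: Iterator[int]) -> int:
    """Count distinct values in one pass; works because equal values are adjacent."""
    count = 0
    previous = None
    seen_any = False
    for value in values:
        if not seen_any or value != previous:
            count += 1
        previous = value
        seen_any = True
    return count
```

With blocks `[[5, 1, 3], [3, 2, 5], [1, 4]]` and `sort_key=lambda v: v`, the reducer sees `1, 1, 2, 3, 3, 4, 5, 5` and returns 5. Because the values arrive sorted, it never needs to keep a set of values it has already seen.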
> Dan and others,
>
> This is where Sanne's idea comes into play. Why collect the entire list of
> intermediate values for each intermediate key and then invoke reduce on
> those values, when we can invoke reduce each time a new intermediate value
> gets inserted?
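The following is a rough, hypothetical sketch of the incremental approach proposed in ISPN-3999, and it is not Infinispan's API. Instead of buffering every intermediate value, the sketch folds each emitted value into a running per-key result. This only gives the right answer when the combining step is associative and commutative, as a sum or a max is. The class and method names below are assumptions made for illustration.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class IncrementalReducer(Generic[K, V]):
    """Fold each intermediate value into a per-key partial result on arrival."""

    def __init__(self, combine: Callable[[V, V], V]) -> None:
        # Assumption: combine is associative and commutative (e.g. sum, max),
        # because values for one key may arrive from several nodes in any order.
        self._combine = combine
        self._partials: Dict[K, V] = {}

    def emit(self, key: K, value: V) -> None:
        """Called for every intermediate (key, value); no value list is kept."""
        if key in self._partials:
            self._partials[key] = self._combine(self._partials[key], value)
        else:
            self._partials[key] = value

    def result(self) -> Dict[K, V]:
        """Return a copy of the current per-key results."""
        return dict(self._partials)
```

Consider a word count over "to be or not to be", which emits `(word, 1)` pairs one at a time through `IncrementalReducer(lambda a, b: a + b)`. The reducer holds only one integer per distinct word and ends with `{"to": 2, "be": 2, "or": 1, "not": 1}`. A reduce such as a median has no associative combine step of this kind, so it could not use this path.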
I don't know about MR in Infinispan, but MR in CouchDB does something very
similar to what you describe. To actually get a final result, it has to
perform an entire tree of reductions, and the reduce function has to
distinguish between a "first-level" reduce (on bare values) and a rereduce
(on intermediate results of previous reductions). The two cases are _not_
always the same, and it's fairly confusing.
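The distinction can be made concrete with a Python imitation of CouchDB's convention. In CouchDB, a JavaScript reduce function is called as `(keys, values, rereduce)`, and the `rereduce` flag tells it whether `values` are raw emitted values or earlier reduce results. The `keys` argument is omitted here. `tree_reduce` and its chunk size are assumptions for illustration and do not reflect how CouchDB actually schedules reductions. A count is the classic case where the two levels differ: raw values must be counted, but partial counts must be summed.

```python
from typing import Any, Callable, List, Sequence

ReduceFn = Callable[[Sequence[Any], bool], Any]


def count_reduce(values: Sequence[Any], rereduce: bool) -> int:
    """A CouchDB-style count, written to handle both calling modes."""
    if rereduce:
        # values are counts returned by earlier reduce calls: add them up
        return sum(values)
    # values are raw emitted values: count them
    return len(values)


def tree_reduce(values: List[Any], reduce_fn: ReduceFn, chunk_size: int = 3) -> Any:
    """Reduce raw values in chunks, then rereduce partial results until one is left."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    if not values:
        return reduce_fn([], False)
    # Leaf level: reduce_fn sees raw values (rereduce=False).
    partials = [reduce_fn(values[i:i + chunk_size], False)
                for i in range(0, len(values), chunk_size)]
    # Inner levels: reduce_fn sees its own earlier outputs (rereduce=True).
    while len(partials) > 1:
        partials = [reduce_fn(partials[i:i + chunk_size], True)
                    for i in range(0, len(partials), chunk_size)]
    return partials[0]
```

With seven raw values, `tree_reduce(list("abcdefg"), count_reduce)` returns 7. A naive `lambda vs, rr: len(vs)` that ignores the flag returns 3, because at the second level it counts the partial results `[3, 3, 1]` instead of adding them.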
LT
>
> https://issues.jboss.org/browse/ISPN-3999
>
> Cheers,
> Vladimir