I think that the intermediate cache is not required at all. The M/R
algorithm itself can (and should!) run with memory occupied only by the
result of the reduction. The current implementation, which runs Map first
and Reduce afterwards, will always have these problems; using a cache to
temporarily hold the results is only a workaround.
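A minimal sketch of this idea in plain Java, not the Infinispan API: each mapped value is folded into the running result as soon as it is emitted, so memory holds one entry per output key rather than every intermediate pair. The `Car` record and the `mapAndReduce` helper are hypothetical, and the sketch assumes an associative reducer (here, a count by colour).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class IncrementalReduce {

    // Hypothetical value type standing in for the cached objects.
    record Car(int id, String brand, String colour) {}

    // Folds each mapped value into the running result immediately, so memory
    // is bounded by the size of the reduced result (one entry per key), not
    // by the number of intermediate key/value pairs the map step emits.
    static <T, K, V> Map<K, V> mapAndReduce(Iterable<T> entries,
                                            Function<T, K> keyOf,
                                            Function<T, V> valueOf,
                                            BinaryOperator<V> reducer) {
        Map<K, V> result = new HashMap<>();
        for (T entry : entries) {
            result.merge(keyOf.apply(entry), valueOf.apply(entry), reducer);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(
            new Car(1, "Fiat", "red"),
            new Car(2, "Opel", "red"),
            new Car(3, "Skoda", "blue"));
        Map<String, Integer> countByColour =
            mapAndReduce(cars, Car::colour, c -> 1, Integer::sum);
        // TreeMap only makes the printed order deterministic.
        System.out.println(new TreeMap<>(countByColour)); // {blue=1, red=2}
    }
}
```

This only works because the reducer can combine partial results; a reducer that needs all values for a key at once cannot be folded this way.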
The only situation in which a temporary cache could be useful is when the
result grows linearly (or nearly linearly, or even faster) with the number
of reduced entries. This would be the case for a groupBy producing
Map<Color, List<Entry>> from all entries in the cache. Such a task does
not scale and should be redesigned anyway, but flushing the results into
a cache backed by a cache store could help.
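A rough sketch of the flushing idea, assuming a hypothetical `Sink` interface as a stand-in for a cache backed by a cache store (it is not an Infinispan type): groups are buffered in memory and handed to the sink whenever a size threshold is reached. This bounds heap usage, but the total result still grows with the input, which is why such a task does not scale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlushingGroupBy<K, V> {

    // Hypothetical stand-in for a store-backed cache: anything that can
    // absorb a batch of grouped values and move it out of the heap.
    interface Sink<K, V> {
        void append(K key, List<V> batch);
    }

    private final Map<K, List<V>> buffer = new HashMap<>();
    private final Sink<K, V> sink;
    private final int maxBuffered;
    private int buffered = 0;

    FlushingGroupBy(Sink<K, V> sink, int maxBuffered) {
        this.sink = sink;
        this.maxBuffered = maxBuffered;
    }

    // Adds one value to its group; once the buffer holds maxBuffered values
    // in total, every buffered group is flushed to the sink.
    void add(K key, V value) {
        buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        if (++buffered >= maxBuffered) {
            flush();
        }
    }

    // Hands all buffered groups to the sink and releases them from memory.
    void flush() {
        buffer.forEach(sink::append);
        buffer.clear();
        buffered = 0;
    }

    public static void main(String[] args) {
        // An in-memory sink is used here only to make the sketch runnable.
        Map<String, List<Integer>> stored = new HashMap<>();
        Sink<String, Integer> sink = (key, batch) ->
            stored.computeIfAbsent(key, k -> new ArrayList<>()).addAll(batch);

        FlushingGroupBy<String, Integer> groupBy = new FlushingGroupBy<>(sink, 2);
        for (int id = 1; id <= 5; id++) {
            groupBy.add("red", id); // every car has the same colour
        }
        groupBy.flush();
        System.out.println(stored); // {red=[1, 2, 3, 4, 5]}
    }
}
```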
Radim
On 02/14/2014 04:54 PM, Vladimir Blagojevic wrote:
Tristan,
Actually, they are not addressed in this pull request, but the feature
where a custom output cache is used instead of the results being returned
is next in the implementation pipeline.
Evangelos, indeed, depending on the reducer function, all intermediate
KOut/VOut pairs might be moved to a single node. How would a custom cache
help in this case?
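To illustrate why this happens, the sketch below assumes that the node reducing a pair is chosen from its KOut key alone, as a consistent hash would do; the modulo hash and the `reducerNodeFor` helper are hypothetical simplifications. If every car has the same colour, every pair carries the same key and lands on one node, regardless of cluster size.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducerPartitioning {

    // Simplified, hypothetical routing: any function of the key alone has the
    // same property, namely that all pairs sharing a key go to the same node.
    static int reducerNodeFor(Object key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    // Counts how many intermediate (KOut, VOut) pairs each node would receive.
    static Map<Integer, Integer> pairsPerNode(List<String> intermediateKeys, int numNodes) {
        Map<Integer, Integer> load = new TreeMap<>();
        for (String key : intermediateKeys) {
            load.merge(reducerNodeFor(key, numNodes), 1, Integer::sum);
        }
        return load;
    }

    public static void main(String[] args) {
        // Every car is red, so the map step emits the same KOut for all of them.
        List<String> keys = List.of("red", "red", "red", "red");
        System.out.println(pairsPerNode(keys, 2)); // a single node receives all 4 pairs
    }
}
```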
Regards,
Vladimir
On 2/14/2014, 10:16 AM, Tristan Tarrant wrote:
> Hi Evangelos,
>
> you might be interested in looking into a current pull request which
> addresses some (all?) of these issues
>
> https://github.com/infinispan/infinispan/pull/2300
>
> Tristan
>
> On 14/02/2014 16:10, Evangelos Vazaios wrote:
>> Hello everyone,
>>
>> I started using the MapReduce implementation of Infinispan and came
>> across some possible limitations, so I would like to make some
>> suggestions about the MapReduce (MR) implementation of Infinispan.
>> Depending on the algorithm, there might be memory problems,
>> especially for intermediate results.
>> An example of such a case is group by. Suppose that we have a cluster
>> of 2 nodes, each with 2 GB of memory available, and a distributed cache
>> that stores simple car objects (id, brand, colour), with a total data
>> size of 3.5 GB. If all objects have the same colour, then all 3.5 GB
>> would go to a single reducer, and as a result an OutOfMemoryError would
>> be thrown.
>>
>> To overcome these limitations, I propose adding the name of the
>> intermediate cache to be used as a parameter. This would enable the
>> creation of a custom-configured cache that deals with the memory
>> limitations.
>>
>> Another feature that I would like to have is the ability to set the
>> name of the output cache. The reasoning behind this is similar to the
>> one mentioned above.
>>
>> I look forward to your thoughts on these two suggestions.
>>
>> Regards,
>> Evangelos
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>
>