[infinispan-dev] MapReduce limitations and suggestions.

Fri Feb 14 10:16:05 EST 2014

Hi Evangelos,

You might be interested in a current pull request that addresses
some (all?) of these issues:

https://github.com/infinispan/infinispan/pull/2300

Tristan

On 14/02/2014 16:10, Evangelos Vazaios wrote:
> Hello everyone,
>
> I started using the MapReduce implementation of Infinispan and came
> across some possible limitations, so I want to make some suggestions
> about the MapReduce (MR) implementation.
> Depending on the algorithm, there might be memory problems,
> especially for intermediate results.
> An example of such a case is group by. Suppose that we have a cluster
> of 2 nodes with 2 GB available. Consider a distributed cache in which
> simple car objects (id, brand, colour) are stored, with a total data
> size of 3.5 GB. If all objects have the same colour, then all 3.5 GB
> would go to only one reducer, and an OutOfMemoryError would be thrown.
>
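
To make the skew in the group-by case concrete, here is a minimal,
self-contained sketch in plain Java. It does not use the Infinispan API.
The class name, the method name and the hash-modulo routing are
assumptions made for illustration only; Infinispan's actual routing of
intermediate keys may differ. The point is only that every value sharing
one key ends up on the same reducer, however many reducers exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupBySkewSketch {

    // Counts how many intermediate values each reducer would receive if
    // keys were routed by hash modulo the reducer count.
    // ASSUMPTION: hash-modulo routing is a stand-in, not Infinispan's scheme.
    public static long[] reducerLoad(List<String> keys, int reducers) {
        long[] load = new long[reducers];
        for (String key : keys) {
            // floorMod keeps the index non-negative for negative hash codes.
            load[Math.floorMod(key.hashCode(), reducers)]++;
        }
        return load;
    }

    public static void main(String[] args) {
        // Every car has the same colour, so there is a single group-by key.
        List<String> colours = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            colours.add("red");
        }
        // One entry of the array holds all 1000 values; the other stays at 0.
        System.out.println(Arrays.toString(reducerLoad(colours, 2)));
    }
}
```

With a single distinct key, adding more reducers does not spread the
load, which is why the intermediate results need somewhere other than
one node's heap to live.
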
> To overcome these limitations, I propose adding the name of the
> intermediate cache as a parameter. This would enable the creation of a
> custom-configured cache that deals with the memory limitations.
>
> Another feature that I would like to have is to set the name of the
> output cache. The reasoning behind this is similar to the one mentioned
> above.
>
> I look forward to your thoughts on these two suggestions.
>
> Regards,
> Evangelos
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>