RH Bugzilla Integration commented on ISPN-4440:
-----------------------------------------------
Alan Field <afield(a)redhat.com> changed the Status of [bug
Remove setMaxCollectorSize API from MapReduceTask
-------------------------------------------------
Key: ISPN-4440
URL: https://issues.jboss.org/browse/ISPN-4440
Project: Infinispan
Issue Type: Task
Security Level: Public(Everyone can see)
Components: Distributed Execution and Map/Reduce
Affects Versions: 7.0.0.Alpha4
Reporter: Vladimir Blagojevic
Assignee: Vladimir Blagojevic
Priority: Minor
Fix For: 7.0.0.Alpha5
While refining parallel execution of the M/R algorithm, we introduced a maxCollectorSize
setting at the MapReduceTask level. The idea was that during the map/combine phase the
number of intermediate keys/values held in a Collector could become very large. By
limiting the size of the collector, intermediate keys/values are transferred to the
intermediate cache in batches before the reduce phase is executed, which avoids
OutOfMemoryError issues.
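The batching idea described above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not Infinispan's actual Collector code: the class
names BatchingCollectorSketch, BatchingCollector and IntermediateStore, and the methods
emit, flush, putBatch and run, are all hypothetical, and the intermediate cache is
replaced by a simple in-memory list of batches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Infinispan APIs.
public class BatchingCollectorSketch {

    /** Stand-in for the intermediate cache: it just records each flushed batch. */
    static class IntermediateStore<K, V> {
        final List<Map<K, List<V>>> batches = new ArrayList<>();

        void putBatch(Map<K, List<V>> batch) {
            // Copy the map so that clearing the collector's buffer does not empty the batch.
            batches.add(new HashMap<>(batch));
        }
    }

    /** Buffers emitted key/value pairs and flushes them once the size limit is reached. */
    static class BatchingCollector<K, V> {
        private final int maxCollectorSize;
        private final IntermediateStore<K, V> store;
        private final Map<K, List<V>> buffer = new HashMap<>();
        private int buffered; // number of values held since the last flush

        BatchingCollector(int maxCollectorSize, IntermediateStore<K, V> store) {
            this.maxCollectorSize = maxCollectorSize;
            this.store = store;
        }

        void emit(K key, V value) {
            buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            buffered++;
            if (buffered >= maxCollectorSize) {
                flush(); // bound memory use: hand the batch over before collecting more
            }
        }

        void flush() {
            if (buffered == 0) {
                return;
            }
            store.putBatch(buffer);
            buffer.clear();
            buffered = 0;
        }
    }

    /** Emits the given number of values and returns how many batches were transferred. */
    static int run(int maxCollectorSize, int entries) {
        IntermediateStore<String, Integer> store = new IntermediateStore<>();
        BatchingCollector<String, Integer> collector =
                new BatchingCollector<>(maxCollectorSize, store);
        for (int i = 0; i < entries; i++) {
            collector.emit("key" + (i % 2), i);
        }
        collector.flush(); // transfer whatever is left before the reduce phase
        return store.batches.size();
    }

    public static void main(String[] args) {
        System.out.println("batches transferred: " + run(3, 7));
    }
}
```

With a limit of 3 and 7 emitted values, the sketch transfers two full batches plus a
final partial one, so the collector never holds more than 3 values at a time.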
However, during extensive performance testing, Alan Field, Dan Berindei and I
concluded that a maxCollectorSize of 10000 entries gives the best trade-off between
performance and memory use. Therefore there is no need to expose this value to
MapReduceTask users.
That said, there might be some use cases where holding 10000 intermediate objects with a
large memory footprint leads to OOM; in such cases users should allocate more heap to
MapReduceTasks. We might consider reintroducing this API should such a need arise.