[infinispan-issues] [JBoss JIRA] (ISPN-4440) Remove setMaxCollectorSize API from MapReduceTask

Vladimir Blagojevic (JIRA) issues at jboss.org
Wed Jun 25 10:44:24 EDT 2014


Vladimir Blagojevic created ISPN-4440:
-----------------------------------------

             Summary: Remove setMaxCollectorSize API from MapReduceTask
                 Key: ISPN-4440
                 URL: https://issues.jboss.org/browse/ISPN-4440
             Project: Infinispan
          Issue Type: Task
      Security Level: Public (Everyone can see)
          Components: Distributed Execution and Map/Reduce
    Affects Versions: 7.0.0.Alpha4
            Reporter: Vladimir Blagojevic
            Assignee: Vladimir Blagojevic
            Priority: Minor


During the refinement of parallel execution of the M/R algorithm, we introduced the maxCollectorSize abstraction at the MapReduceTask level. The idea was that during execution of the map/combine phase, the number of intermediate keys/values collected in a Collector could become very large. By limiting the size of the collector, intermediate keys/values are transferred to the intermediate cache in batches before the reduce phase is executed, which also avoids OutOfMemoryError issues.
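
The batching idea described above can be sketched roughly as follows. This is only an illustration and not Infinispan's actual Collector implementation: the class name BatchingCollector, its methods, and the plain Map standing in for the intermediate cache are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a size-bounded collector. Not Infinispan code:
 * the intermediate cache is modelled here as a plain in-memory Map.
 */
public class BatchingCollector<K, V> {
    private final int maxCollectorSize;               // max buffered values before a flush
    private final Map<K, List<V>> buffer = new HashMap<>();
    private final Map<K, List<V>> intermediateStore;  // stand-in for the intermediate cache
    private int buffered = 0;                         // values currently held in the buffer
    private int flushes = 0;                          // number of batches transferred so far

    public BatchingCollector(int maxCollectorSize, Map<K, List<V>> intermediateStore) {
        if (maxCollectorSize <= 0) {
            throw new IllegalArgumentException("maxCollectorSize must be positive");
        }
        this.maxCollectorSize = maxCollectorSize;
        this.intermediateStore = intermediateStore;
    }

    /** Called by the map/combine phase for every intermediate key/value pair. */
    public void emit(K key, V value) {
        buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        buffered++;
        if (buffered >= maxCollectorSize) {
            flush(); // transfer one batch so the buffer never exceeds the limit
        }
    }

    /** Moves everything buffered so far into the intermediate store. */
    public void flush() {
        if (buffered == 0) {
            return;
        }
        for (Map.Entry<K, List<V>> e : buffer.entrySet()) {
            intermediateStore
                .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                .addAll(e.getValue());
        }
        buffer.clear();
        buffered = 0;
        flushes++;
    }

    public int flushCount() { return flushes; }

    public int bufferedCount() { return buffered; }
}
```

In this sketch the mapper side would call flush() once more when the map/combine phase ends, so that a final partial batch also reaches the intermediate store before the reduce phase starts.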

However, during extensive performance testing, Alan Field, Dan Berindei and I concluded that maxCollectorSize set to 10000 entries gives the best trade-off between performance and memory use. Therefore, there is no need to expose this value to MapReduceTask users.

Having said that, there might be some use cases where holding 10000 intermediate objects with a large memory footprint leads to OOM; in such cases users should allocate more heap to MapReduceTasks. We might consider reintroducing this API should such a need arise.



--
This message was sent by Atlassian JIRA
(v6.2.6#6264)

