[infinispan-issues] [JBoss JIRA] (ISPN-4022) M/R: Run the combiner concurrently with the mapper

Vladimir Blagojevic (JIRA) issues at jboss.org
Mon Mar 3 12:50:37 EST 2014


    [ https://issues.jboss.org/browse/ISPN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12949517#comment-12949517 ] 

Vladimir Blagojevic commented on ISPN-4022:
-------------------------------------------

[~dan.berindei] and [~mircea.markus] I thought about this a bit further and concluded that there are additional benefits to this approach. I would call this enhancement "staggered combine". Just as Dan suggests, we should invoke combine at certain thresholds (say, 1K entries in a combiner) during the map phase and move intermediate KOut/VOut values around the cluster as these thresholds are reached. The benefit is that we will not only relieve memory pressure but also no longer risk running out of RAM while storing map output in a Collector. In addition, staggered migration of KOut/VOut to the intermediate cache should alleviate some of the insertion stress we have observed in performance tests.
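
To make the "staggered combine" idea concrete, here is a minimal sketch in plain Java. It is not the Infinispan API: the class StaggeredCollector, its threshold parameter, and the intermediateSink callback (a stand-in for writing to the intermediate cache) are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;

// Hypothetical helper, NOT part of Infinispan: buffers mapper output and
// runs the combiner each time a threshold of buffered values is reached.
class StaggeredCollector<K, V> {
    private final Map<K, List<V>> buffer = new HashMap<>();
    private final int threshold;                     // e.g. 1000 buffered values
    private final BinaryOperator<V> combiner;        // folds two values for one key
    private final BiConsumer<K, V> intermediateSink; // assumed stand-in for the intermediate cache
    private int buffered;

    StaggeredCollector(int threshold, BinaryOperator<V> combiner,
                       BiConsumer<K, V> intermediateSink) {
        this.threshold = threshold;
        this.combiner = combiner;
        this.intermediateSink = intermediateSink;
    }

    // Called by the mapper for every KOut/VOut pair it produces.
    void emit(K key, V value) {
        buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        if (++buffered >= threshold) {
            flush();
        }
    }

    // Combines the buffered values per key, ships the results, frees the buffer.
    // Must also be called once after the mapper finishes to ship the remainder.
    void flush() {
        for (Map.Entry<K, List<V>> e : buffer.entrySet()) {
            // Every list holds at least one value, so the Optional is never empty.
            V combined = e.getValue().stream().reduce(combiner).get();
            intermediateSink.accept(e.getKey(), combined);
        }
        buffer.clear();
        buffered = 0;
    }
}
```

With this shape the collector never holds more than threshold values at a time, and writes to the sink are spread across the map phase instead of arriving in one burst at the end. The sink has to merge partial results for the same key, since a key can be shipped more than once.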

If we are able to combine this feature with ISPN-3999, which Sanne suggested, this should be awesome! WDYT?
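
The issue description also mentions blocking the mapper when too many chunks are in flight. A rough sketch of that back-pressure, again in plain Java and not tied to any Infinispan class, could use a bounded queue between a mapper loop and a combiner thread. The names ChunkedPipeline, chunkSize, and maxInFlight are assumptions for illustration, and summing integers stands in for a real combiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the "mapper" produces chunks, a combiner thread
// consumes them concurrently, and a bounded queue limits chunks in flight.
class ChunkedPipeline {
    // Sentinel compared by identity; tells the combiner there are no more chunks.
    private static final List<Integer> DONE = new ArrayList<>();

    static long run(int totalValues, int chunkSize, int maxInFlight) {
        BlockingQueue<List<Integer>> inFlight = new ArrayBlockingQueue<>(maxInFlight);
        long[] combined = new long[1];

        Thread combiner = new Thread(() -> {
            try {
                for (List<Integer> chunk = inFlight.take(); chunk != DONE; chunk = inFlight.take()) {
                    for (int v : chunk) {
                        combined[0] += v; // stand-in for the real combine step
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        combiner.start();

        try {
            List<Integer> chunk = new ArrayList<>(chunkSize);
            for (int i = 1; i <= totalValues; i++) {   // stand-in for mapper output
                chunk.add(i);
                if (chunk.size() == chunkSize) {
                    inFlight.put(chunk);               // blocks while maxInFlight chunks are pending
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                inFlight.put(chunk);                   // ship the final partial chunk
            }
            inFlight.put(DONE);
            combiner.join();                           // join makes combined[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return combined[0];
    }
}
```

Here memory use is bounded by roughly (maxInFlight + 2) chunks: the ones queued, the one being filled, and the one being combined.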

                
> M/R: Run the combiner concurrently with the mapper
> --------------------------------------------------
>
>                 Key: ISPN-4022
>                 URL: https://issues.jboss.org/browse/ISPN-4022
>             Project: Infinispan
>          Issue Type: Feature Request
>          Components: Core, Distributed Execution and Map/Reduce
>    Affects Versions: 6.0.1.Final
>            Reporter: Dan Berindei
>            Assignee: Vladimir Blagojevic
>             Fix For: 7.0.0.Final
>
>
> Because we only run the combiner after the mapping phase has finished, we need to keep all the results of the mapping phase in memory at once. We should split the output of the mapper into chunks and allow the combiner to process chunks while the mapper is still running, relieving some of the memory pressure. We could even block the mapper if there are too many chunks in flight.
