]
RH Bugzilla Integration commented on ISPN-4240:
-----------------------------------------------
Tristan Tarrant <ttarrant(a)redhat.com> changed the Status of [bug
Migrate one intermediate key/value per maxCollectorSize threshold
-----------------------------------------------------------------
Key: ISPN-4240
URL:
https://issues.jboss.org/browse/ISPN-4240
Project: Infinispan
Issue Type: Enhancement
Components: Distributed Execution and Map/Reduce
Reporter: Vladimir Blagojevic
Assignee: Vladimir Blagojevic
Fix For: 7.0.0.Alpha4
When we hit the maxCollectorSize threshold, we remove all the values from the collector
and we start transferring them into the intermediary cache. But the other mapper threads
keep working, and event with large collector sizes, it's very likely that the
collector will be full again before we finished sending the previous batch. So we could
end up with a lot of threads trying to insert maxCollectorSize values in the intermediary
cache in parallel, and a lot more intermediary values in memory on each mapper node than
the user would expect.
In order to alleviate this observed phenomena Dan proposed an idea to keep the
maxCollectorSize threshold, but to only move a single key at a time from the collector to
the intermediary cache when the threshold is hit (the least recently used one). That way,
the other keys would have time to collect more values (or just run the combiner a few more
times).
This way we still transfer some intermediate keys/values during map/combine phase rather
than all at once at the end of that phase thus alleviating possibility of OOM. At the end
of map/combine phase all remaining keys/values are of course transferred. Application
users can fine tune their requirements and a specific use case using maxCollectorSize
threshold.