[ https://issues.jboss.org/browse/ISPN-4372?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4372:
------------------------------------
I don't think it's fair to say that M/R performance depends on the input
cache's value size. There is also another factor involved:
{{WordCountMapperEmitPerValue}} coalesces all the occurrences of the same word in the
value, so the number of intermediary values {{emit()}}ed by the mapper decreases sharply as
the cache value size increases.
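To make the difference concrete, here is a minimal sketch of the two mapping strategies. It assumes that {{WordCountMapperEmitPerValue}} counts words locally and then emits one (word, count) pair per distinct word, as described above. The actual test source is not shown here. The {{Collector}} interface is a hypothetical stand-in loosely modelled on Infinispan's collector, and {{MapperSketch}}, {{mapPerWord}} and {{mapPerValue}} are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

public class MapperSketch {

    // Hypothetical stand-in for the M/R collector: only the emit() call matters here.
    interface Collector<K, V> {
        void emit(K key, V value);
    }

    // Records how many intermediate entries were emitted and their summed totals.
    static final class CountingCollector implements Collector<String, Integer> {
        int emits = 0;
        final Map<String, Integer> totals = new HashMap<>();

        @Override
        public void emit(String key, Integer value) {
            emits++;
            totals.merge(key, value, Integer::sum);
        }
    }

    // Basic word count: one intermediate entry per word occurrence.
    static void mapPerWord(String value, Collector<String, Integer> collector) {
        for (String word : value.split("\\s+")) {
            if (!word.isEmpty()) {
                collector.emit(word, 1);
            }
        }
    }

    // Coalescing word count: count within the value first, then
    // emit one intermediate entry per distinct word.
    static void mapPerValue(String value, Collector<String, Integer> collector) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : value.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        counts.forEach(collector::emit);
    }

    public static void main(String[] args) {
        String value = "to be or not to be that is the question to be";
        CountingCollector perWord = new CountingCollector();
        CountingCollector perValue = new CountingCollector();
        mapPerWord(value, perWord);
        mapPerValue(value, perValue);
        System.out.println("per-word emits:  " + perWord.emits);   // 12
        System.out.println("per-value emits: " + perValue.emits);  // 8
        System.out.println("same totals:     " + perWord.totals.equals(perValue.totals)); // true
    }
}
```

Both strategies give the same final word counts. They differ only in how many intermediate entries reach the collector, and the gap widens as a value contains more repeated words.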
We should confirm that the same behaviour occurs with the basic {{WordCountMapper}};
otherwise it would be fairer to say that M/R performance depends on the number of
intermediary entries emitted by the Mapper, which is to be expected.
However, it is surprising that M/R throughput stops increasing past 32KB, even though the
cache value size keeps growing (and the number of intermediary values keeps decreasing). This is
definitely worth investigating.
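One way to separate the two factors is to hold the total amount of text fixed and vary only the value size. The sketch below only counts the intermediate entries each strategy would emit. It does not measure throughput. Several assumptions apply: the corpus is synthetic, with words drawn uniformly from a fixed vocabulary; value size is measured in ASCII characters, which equal bytes here; and the names {{IntermediateEntryCount}}, {{corpus}} and {{countEntries}} are hypothetical. Per-word emission stays constant at the corpus size, while per-value emission shrinks once values are large enough to repeat words.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class IntermediateEntryCount {

    // Synthetic corpus: words drawn uniformly from a fixed vocabulary (an assumption).
    static List<String> corpus(int words, int vocabulary, long seed) {
        Random random = new Random(seed);
        List<String> result = new ArrayList<>(words);
        for (int i = 0; i < words; i++) {
            result.add("w" + random.nextInt(vocabulary));
        }
        return result;
    }

    // Packs the corpus greedily into values of at most valueSize characters and returns
    // {entries emitted per word occurrence, entries emitted per distinct word per value}.
    static long[] countEntries(List<String> corpus, int valueSize) {
        long perWord = 0;
        long perValue = 0;
        Set<String> distinct = new HashSet<>();
        int length = 0;
        for (String word : corpus) {
            int needed = word.length() + 1; // word plus a separating space
            if (length > 0 && length + needed > valueSize) {
                perValue += distinct.size(); // flush the current value
                distinct.clear();
                length = 0;
            }
            distinct.add(word);
            length += needed;
            perWord++;
        }
        perValue += distinct.size(); // last, possibly partial, value
        return new long[] {perWord, perValue};
    }

    public static void main(String[] args) {
        List<String> words = corpus(200_000, 5_000, 42L);
        for (int kb = 1; kb <= 128; kb *= 2) {
            long[] counts = countEntries(words, kb * 1024);
            System.out.printf("%4d kB values: per-word %d, per-value %d%n",
                    kb, counts[0], counts[1]);
        }
    }
}
```

If throughput with the basic {{WordCountMapper}} still varies with value size while its entry count stays flat, value size would be a factor in its own right. Otherwise the entry count is the more likely explanation.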
Map/Reduce performance is dependent on cache value size
-------------------------------------------------------
Key: ISPN-4372
URL: https://issues.jboss.org/browse/ISPN-4372
Project: Infinispan
Issue Type: Feature Request
Components: Distributed Execution and Map/Reduce
Affects Versions: 7.0.0.Alpha4
Reporter: Alan Field
Assignee: Dan Berindei
Labels: performance
Performance testing the Map/Reduce changes has shown that the performance improvements
vary with the size of the values in the cache. [1] Values from 8kB to 128kB show a large
performance increase over Infinispan 6, but with smaller and larger values performance is
the same as or slower than Infinispan 6.
[1] http://blog.infinispan.org/2014/06/mapreduce-performance-improvements.html
--
This message was sent by Atlassian JIRA
(v6.2.3#6260)