Guys,
We need some input on how to design the API for intermediate
caches [1]. As you may know, one of the requirements for improving our
M/R is allowing applications to use a custom-defined intermediate
key/value cache, which stores the keys/values produced by the map/combine
phase before they are reduced in the reduce phase.
Currently we have a constructor parameter that specifies whether to use
a shared or a per-task intermediate cache. Now we wanted to add an
additional method:
usingIntermediateCache(String cacheName, String cacheConfigurationName);
that would enable the use of a custom intermediate cache.
Dan, rightly so, found this a bit confusing: when using the method
above, are we referring to the shared or to the per-task intermediate
cache?
His proposal is to use, by default, a per-task intermediate cache with
our default intermediate cache configuration, remove the shared/non-shared
cache parameter from the MapReduceTask constructor, and add
configuration methods for both kinds of cache:
* usingIntermediateCache(String configName) - use a per-task
  intermediate cache with the given configuration
* usingSharedIntermediateCache(String cache) - use a shared cache
  with our default configuration
* usingSharedIntermediateCache(String cache, String configName) - use
  a shared cache with the given configuration
Note that a shared cache needs a name because we want applications to
be able to easily inspect or remove that cache once all M/R tasks
sharing that intermediate cache have finished executing.
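To make the proposal concrete, here is a minimal Java sketch of the three
proposed options. It is hypothetical: the class IntermediateCacheOptions,
the DEFAULT_CONFIG placeholder and the getters are illustrative names only,
not part of MapReduceTask or any released Infinispan API. The sketch just
records which cache mode and configuration an application selected.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the proposed intermediate-cache options.
 * Method names mirror the proposal; everything else is illustrative.
 */
public class IntermediateCacheOptions {

    /** Assumed placeholder for "our default intermediate cache configuration". */
    public static final String DEFAULT_CONFIG = "__defaultIntermediateCacheConfig__";

    private boolean shared = false;          // per-task cache unless told otherwise
    private String sharedCacheName = null;   // only set for a shared cache
    private String configName = DEFAULT_CONFIG;

    /** Per-task intermediate cache with the given configuration. */
    public IntermediateCacheOptions usingIntermediateCache(String configName) {
        this.shared = false;
        this.sharedCacheName = null;
        this.configName = Objects.requireNonNull(configName, "configName");
        return this;
    }

    /** Shared intermediate cache with the default configuration. */
    public IntermediateCacheOptions usingSharedIntermediateCache(String cacheName) {
        return usingSharedIntermediateCache(cacheName, DEFAULT_CONFIG);
    }

    /** Shared intermediate cache with the given configuration. */
    public IntermediateCacheOptions usingSharedIntermediateCache(String cacheName, String configName) {
        this.shared = true;
        // The name lets the application inspect/remove the cache after all tasks finish.
        this.sharedCacheName = Objects.requireNonNull(cacheName, "cacheName");
        this.configName = Objects.requireNonNull(configName, "configName");
        return this;
    }

    public boolean isShared() { return shared; }
    public String getSharedCacheName() { return sharedCacheName; }
    public String getConfigName() { return configName; }

    public static void main(String[] args) {
        IntermediateCacheOptions perTask = new IntermediateCacheOptions()
                .usingIntermediateCache("myIntermediateConfig");
        System.out.println(perTask.isShared() + " " + perTask.getConfigName());

        IntermediateCacheOptions shared = new IntermediateCacheOptions()
                .usingSharedIntermediateCache("wordCountTmp");
        System.out.println(shared.isShared() + " " + shared.getSharedCacheName()
                + " " + shared.getConfigName());
    }
}
```

With no method called, the sketch falls back to a per-task cache with the
default configuration, as in Dan's proposal; both shared variants require a
cache name, so there is no ambiguity about which kind of cache a call refers to.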
What are your thoughts here?
Vladimir
[1] https://issues.jboss.org/browse/ISPN-4021