On 2/25/2014, 7:33 AM, Dan Berindei wrote:
Do we really need special support for distributed tasks to write
results to another cache? We already allow a task to do
cache.getCacheManager().getCache("outputCache").put(k, v)
Yeah, very good point Dan. Thanks for being a sanity check. Mircea?
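For concreteness, here is a self-contained sketch of the pattern Dan describes: a task writes its results into a second cache instead of returning them. Plain `ConcurrentHashMap`s stand in for Infinispan caches so the example runs on its own; in real code the task body would do `cache.getCacheManager().getCache("outputCache").put(k, v)`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch only: plain maps stand in for Infinispan caches to keep the
// example self-contained and runnable.
public class OutputCacheSketch {
    static final Map<String, Integer> inputCache = new ConcurrentHashMap<>();
    static final Map<String, Integer> outputCache = new ConcurrentHashMap<>();

    // The "distributed task": processes its input entries and writes
    // results to the output cache instead of returning them.
    static void runTask(Function<Integer, Integer> f) {
        inputCache.forEach((k, v) -> outputCache.put(k, f.apply(v)));
    }

    public static void main(String[] args) {
        inputCache.put("a", 1);
        inputCache.put("b", 2);
        runTask(v -> v * 10);
        System.out.println(outputCache.get("a")); // 10
        System.out.println(outputCache.get("b")); // 20
    }
}
```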
>
>> I was thinking we add a concept of
>> DistributedTaskExecutionListener which can be specified in
>> DistributedTaskBuilder:
>>
>> DistributedTaskBuilder<T>
>> executionListener(DistributedTaskExecutionListener<K, T> listener);
>>
>>
>> We needed DistributedTaskExecutionListener anyway. All distributed tasks
>> might use some feedback about task progress, completion/failure and so on.
>> My proposal is roughly:
>>
>>
>> public interface DistributedTaskExecutionListener<K, T> {
>>
>> void subtaskSent(Address node, Set<K> inputKeys);
>> void subtaskFailed(Address node, Set<K> inputKeys, Exception e);
>> void subtaskSucceeded(Address node, Set<K> inputKeys, T result);
>> void allSubtasksCompleted();
>>
>> }
>>
>> So much for that.
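To make the proposal concrete, here is how the listener might be exercised. This is a self-contained sketch: `Address` is stubbed as a plain `String` (the real type would be Infinispan's `Address`), the interface mirrors the proposal above (with the spelling `subtaskSucceeded` normalized), and the notification calls are simulated by hand.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the proposed listener in use. Address is stubbed as String.
public class ListenerSketch {
    interface DistributedTaskExecutionListener<K, T> {
        void subtaskSent(String node, Set<K> inputKeys);
        void subtaskFailed(String node, Set<K> inputKeys, Exception e);
        void subtaskSucceeded(String node, Set<K> inputKeys, T result);
        void allSubtasksCompleted();
    }

    // Example implementation that just counts successful subtasks.
    static class CountingListener
            implements DistributedTaskExecutionListener<String, Integer> {
        final AtomicInteger succeeded = new AtomicInteger();
        volatile boolean done;
        public void subtaskSent(String node, Set<String> keys) { }
        public void subtaskFailed(String node, Set<String> keys, Exception e) { }
        public void subtaskSucceeded(String node, Set<String> keys, Integer result) {
            succeeded.incrementAndGet();
        }
        public void allSubtasksCompleted() { done = true; }
    }

    public static void main(String[] args) {
        CountingListener l = new CountingListener();
        // Simulate the executor notifying the listener for two subtasks:
        l.subtaskSucceeded("node-A", Set.of("k1"), 42);
        l.subtaskSucceeded("node-B", Set.of("k2"), 7);
        l.allSubtasksCompleted();
        System.out.println(l.succeeded.get()); // 2
    }
}
```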
> I think it would make sense to add this logic for
monitoring, plus additional info such as average execution time, etc.
I'm not sure this is a generally useful API though, unless
people have already been asking for it?
Ok, noted. If you remember any references about this, let me know and
I'll incorporate what people actually asked for rather than guess.
Ok, let's wait until we get some actual requests from users then. TBH
I don't think distributed tasks with subtasks are something that users
care about. E.g. with Map/Reduce the reduce tasks are not subtasks of
the map/combine tasks, so this API wouldn't help.
Hadoop has a Reporter interface that allows you to report "ticks" and
increment counters, maybe we should add something like that instead?
The subtask I am referring to here just denotes part of the
distributed task initiated using dist.executors. This interface (maybe
extended a bit with ideas from Reporter) could be used both for
monitoring and for more application-specific logic about task
re-execution and so on.
I think we should allow each distributed task to deal with output in
its own way, the existing API should be enough.
Yes, I can see your point. Mircea?
>> public interface MapReduceTaskExecutionListener {
>>
>> void mapTaskInitialized(Address executionAddress);
>> void mapTaskSucceeded(Address executionAddress);
>> void mapTaskFailed(Address executionTarget, Exception cause);
>> void mapPhaseCompleted();
>>
>> void reduceTaskInitialized(Address executionAddress);
>> void reduceTaskSucceeded(Address executionAddress);
>> void reduceTaskFailed(Address address, Exception cause);
>> void reducePhaseCompleted();
>>
>> }
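One plausible use of the phase callbacks is checking event ordering, e.g. that the map phase finishes before any reduce event arrives. The sketch below implements a trimmed version of the proposed interface (only the succeeded/completed callbacks, `Address` stubbed as `String`) and simulates the framework's notifications.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a listener that records the order of phase events.
public class PhaseOrderSketch {
    interface MapReduceTaskExecutionListener {
        void mapTaskSucceeded(String executionAddress);
        void mapPhaseCompleted();
        void reduceTaskSucceeded(String executionAddress);
        void reducePhaseCompleted();
    }

    static class RecordingListener implements MapReduceTaskExecutionListener {
        final List<String> events = new ArrayList<>();
        public void mapTaskSucceeded(String a) { events.add("map:" + a); }
        public void mapPhaseCompleted() { events.add("mapPhaseDone"); }
        public void reduceTaskSucceeded(String a) { events.add("reduce:" + a); }
        public void reducePhaseCompleted() { events.add("reducePhaseDone"); }
    }

    public static void main(String[] args) {
        RecordingListener l = new RecordingListener();
        // Simulated notification sequence from the framework:
        l.mapTaskSucceeded("node-A");
        l.mapTaskSucceeded("node-B");
        l.mapPhaseCompleted();
        l.reduceTaskSucceeded("node-A");
        l.reducePhaseCompleted();
        // The map phase completed before any reduce event:
        System.out.println(l.events.indexOf("mapPhaseDone")
                < l.events.indexOf("reduce:node-A")); // true
    }
}
```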
> IMO - in the first stage at least - I would rather use a simpler
(Notifying)Future, on which the user can wait until the computation
completes: it's simpler and more aligned with the rest of our async API.
>
What do you mean? We already have futures in the MapReduceTask API.
This API is more fine-grained and allows monitoring/reporting of task
progress. Please clarify.
I'm not sure about the usefulness of an API like this either... if the
intention is to allow the user to collect statistics about the duration
of various phases, then I think exposing the durations via MapReduceTask
would be better.
How would you design that API Dan? Something other than
listener/callback interface?
>> while MapReduceTask would have an additional method:
>>
>> public void execute(Cache<KOut, VOut> resultsCache);
> you could overload it with cache name only method.
Yeah, good idea. Same for usingIntermediateCache? I actually asked you
this here
https://issues.jboss.org/browse/ISPN-4021
+1 to allow a cache name only. For the intermediate cache I don't
think it makes sense to allow a Cache version at all.
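The overload pattern being agreed on can be sketched as follows. A map of maps stands in for the cache manager so the example is self-contained; `execute(String)` just resolves the name and delegates, mirroring how an `execute(String cacheName)` overload could delegate to `execute(Cache<KOut, VOut>)`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the name-only overload; maps stand in for caches.
public class ExecuteOverloadSketch {
    static final Map<String, Map<String, Integer>> cacheManager =
            new ConcurrentHashMap<>();

    // Existing form: caller supplies the results cache directly.
    static void execute(Map<String, Integer> resultsCache) {
        resultsCache.put("wordCount", 3); // stand-in for real M/R results
    }

    // Proposed convenience overload: resolve the cache by name, delegate.
    static void execute(String cacheName) {
        execute(cacheManager.computeIfAbsent(cacheName,
                n -> new ConcurrentHashMap<>()));
    }

    public static void main(String[] args) {
        execute("results");
        System.out.println(cacheManager.get("results").get("wordCount")); // 3
    }
}
```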
Ok good. Deal.
Thanks,
Vladimir