Re: [infinispan-dev] Distributed execution framework - API proposal(s)

Tuesday, 11 January 2011

On 11 Jan 2011, at 14:18, Vladimir Blagojevic wrote:

...
 On 11-01-11 10:28 AM, Manik Surtani wrote:
> Hi there.
> 
> Let's take the two API sets (M/R and distributed exec) separately:
> 
> M/R
> * Why does MapReduceTask extend Set?  It probably shouldn't even
> extend DistributedRemoteTask, since some methods such as
> execute(Callable) don't make sense for MapReduceTask.  Perhaps a
> common super-interface?  That alone should simplify the generics...

 That's a leftover I did not catch. I meant Map<K1,K2> instead of
 Set<Map.Entry<K2, V2>>; it does not extend Set, it just constrains one
 of the type parameters :-)

 Completely agree on a common parent that has the common setters for
 the various task aspects.
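A minimal sketch of the common parent both replies point at. All names here (TaskBase, ExecTask, MapReduceTask as a local stand-in) are hypothetical and not the proposal's actual types. The shared setter lives in the parent and returns the concrete subtype for chaining, so the map/reduce task never inherits execute(Callable).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical common parent: holds the setters shared by every task type.
// SELF lets on(...) return the concrete subtype for fluent chaining.
abstract class TaskBase<K, SELF extends TaskBase<K, SELF>> {
    private final List<K> keys = new ArrayList<>();

    // Shared task aspect: restrict execution to the given input keys.
    @SafeVarargs
    public final SELF on(K... inputKeys) {
        keys.addAll(Arrays.asList(inputKeys));
        return self();
    }

    public List<K> keys() {
        return Collections.unmodifiableList(keys);
    }

    protected abstract SELF self();
}

// Hypothetical distributed-exec task: only this type offers execute(Callable).
class ExecTask<K> extends TaskBase<K, ExecTask<K>> {
    @Override
    protected ExecTask<K> self() { return this; }

    public <T> T execute(Callable<T> task) {
        try {
            return task.call(); // local stand-in for remote execution
        } catch (Exception e) {
            throw new RuntimeException(e); // wrapped for brevity
        }
    }
}

// Hypothetical map/reduce task: shares on(...) but has no execute(Callable).
class MapReduceTask<K> extends TaskBase<K, MapReduceTask<K>> {
    @Override
    protected MapReduceTask<K> self() { return this; }
}
```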

> * Why does Mapper.map() return a Map? [1] Shouldn't it return a single 
> transformation for a given K and V?

 Yes, I thought so too, but how would we then match intermediate keys for
 the reduce phase? Our plumbing has to find all the identical
 transformed/intermediate keys returned from the map phase and then invoke
 the reduce phase.
What's an intermediate key?  :-)  For each K, V pair, we call map(K, V) which
transforms V to a T.  So all we expect from map() should be T.
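A minimal sketch of the mapper shape suggested here, assuming a hypothetical Mapper<K, V, T> interface and an illustrative WordCountMapper. The only point it makes is that map() returns a single T per (K, V) pair rather than a Map of intermediate keys.

```java
// Hypothetical mapper: exactly one transformed value T per (K, V) pair.
interface Mapper<K, V, T> {
    T map(K key, V value);
}

// Illustrative mapper: transforms each String value into its word count.
class WordCountMapper implements Mapper<String, String, Integer> {
    @Override
    public Integer map(String key, String value) {
        String trimmed = value.trim();
        // An empty or all-whitespace value has zero words.
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```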

...
> * Reducer.reduce()'s return type need not be the same
> type as the mapped transformation.  Also, does the reducer really need
> the key? Surely just the transformed version of each K/V pair.

 Yeah, when we solve the above this should fall into place. 
Yep.  So we then just call reduce(T, R) for each value of T that we get from the mapping
phase, passing in the reduced result R from the previous invocation.  We may also need
to pass in the key for which the mapping took place, in which case reduce(T, R) would
become reduce(K, T, R).  But TBH I don't really see the use of passing in K.
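A minimal sketch of the reduce(T, R) fold described above, assuming a hypothetical Reducer<T, R> interface and a hypothetical LocalReduce driver. Each call receives one mapped value plus the previous result, so R can differ from T and each node ends up with a single R.

```java
import java.util.List;

// Hypothetical reducer: folds one mapped value T into the running result R.
interface Reducer<T, R> {
    R reduce(T mapped, R previous);
}

// Hypothetical local driver: folds all mapped values on one node into one R.
final class LocalReduce {
    private LocalReduce() {}

    static <T, R> R fold(List<T> mappedValues, R initial, Reducer<T, R> reducer) {
        R result = initial;
        for (T value : mappedValues) {
            // The previous invocation's result feeds the next call.
            result = reducer.reduce(value, result);
        }
        return result;
    }
}
```

With Integer values and a String result, for instance, `LocalReduce.fold(Arrays.asList(1, 2), "", (t, r) -> r + t)` builds up a String, showing that the reduced type is independent of the mapped type.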

...

> * Similarly, Collator.add() should just need the address and the 
> reduced result from each node (each node would only produce 1 result!)
> 
> DistExec
> * Why do you have execute() and executeAsync() with no params?  What 
> do these methods do?

 These are for the case where the user provides a Factory for DistributedCallable. We have to handle
 cases where a simple no-arg constructor for callables is not
 sufficient, hence the factory.
Why are we constructing DistributedCallables?  Surely the user passes in an instance?

...
 Now that I think about it, maybe adding K...
 input as a parameter to the call function of DistributedCallable makes more
 sense, or not? WDYT?
Not sure this is needed either.  

	List<ResultType> res = new DistributedRemoteTask(cache).on(key1, key2).execute(myCallableInstance);
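Manik's one-liner can be exercised without a cluster. This is a minimal local sketch under stated assumptions: LocalRemoteTask is a hypothetical stand-in for the proposed DistributedRemoteTask, a plain Map replaces the cache, and a hypothetical ownerOf function simulates which node owns each key. The caller hands in a ready-made Callable instance, so no factory is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.function.Function;

// Hypothetical local stand-in for the proposed DistributedRemoteTask.
class LocalRemoteTask<K, V> {
    private final Map<K, V> cache;             // stands in for the cache
    private final Function<K, String> ownerOf; // hypothetical key -> node lookup
    private final List<K> keys = new ArrayList<>();

    LocalRemoteTask(Map<K, V> cache, Function<K, String> ownerOf) {
        this.cache = cache;
        this.ownerOf = ownerOf;
    }

    // Select input keys; execution targets the nodes that own them.
    @SafeVarargs
    final LocalRemoteTask<K, V> on(K... inputKeys) {
        keys.addAll(Arrays.asList(inputKeys));
        return this;
    }

    // The caller supplies the Callable instance directly: no factory.
    // It is invoked once per distinct node owning a key present in the cache.
    <T> List<T> execute(Callable<T> callable) {
        Set<String> nodes = new LinkedHashSet<>();
        for (K key : keys) {
            if (cache.containsKey(key)) {
                nodes.add(ownerOf.apply(key));
            }
        }
        List<T> results = new ArrayList<>();
        for (String node : nodes) {
            try {
                results.add(callable.call()); // local stand-in for a remote call on `node`
            } catch (Exception e) {
                throw new RuntimeException("task failed on " + node, e);
            }
        }
        return results;
    }
}
```

Usage mirrors the line above: `new LocalRemoteTask<>(cache, ownerOf).on(key1, key2).execute(myCallableInstance)` returns one result per simulated owning node.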

...
> * Not sure what DistributedTaskContext is all about.  What do read()
> and write() do?  Is this meant to be a "layer" in front of the cache
> in question?  Why not just pass in a ref to the cache?

 True.
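Passing a reference to the cache instead of a read()/write() context could look like the following minimal sketch. The CacheAwareCallable interface, its setEnvironment method and the SumValues example are all hypothetical, and a java.util.Map stands in for the cache; the callable simply reads the cache directly.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical callable that is handed a direct cache reference plus its
// input keys, instead of going through a separate context layer.
interface CacheAwareCallable<K, V, T> extends Callable<T> {
    void setEnvironment(Map<K, V> cache, Set<K> inputKeys);
}

// Illustrative callable: sums the values stored under its input keys.
class SumValues implements CacheAwareCallable<String, Integer, Integer> {
    private Map<String, Integer> cache;
    private Set<String> inputKeys;

    @Override
    public void setEnvironment(Map<String, Integer> cache, Set<String> inputKeys) {
        this.cache = cache;
        this.inputKeys = inputKeys;
    }

    @Override
    public Integer call() {
        int sum = 0;
        for (String key : inputKeys) {
            Integer value = cache.get(key); // plain cache read, no wrapper
            if (value != null) {
                sum += value;
            }
        }
        return sum;
    }
}
```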

> * What does Factory do?

 Creates DistributedCallables. See above. 
Also, isn't the name DistributedRemoteTask redundant?  :-)

Cheers
Manik

--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
