[infinispan-dev] Distributed execution framework - API proposal(s)

Vladimir Blagojevic vblagoje at redhat.com
Tue Jan 11 09:18:13 EST 2011


On 11-01-11 10:28 AM, Manik Surtani wrote:
> Hi there.
>
> Lets take the 2 API sets (M/R and distributed exec) separately:
>
> M/R
> * Why does MapReduceTask extend Set?  It probably shouldn't even 
> extend DistributedRemoteTask, since some methods such as 
> execute(Callable) doesn't make sense for MapReduceTask.  Perhaps a 
> common super-interface?  That alone should simplify the generics...

Leftover that I did not catch. I meant Map<K1,K2> instead of 
Set<Map.Entry<K2, V2>>; it does not extend Set it just constraints one 
of the type parameters :-)

Completely agree on common parent that has the common setters for 
various task aspects.

> * Why does Mapper.map() return a Map? [1] Shouldn't it return a single 
> transformation for a given K and V?

Yes, I thought so too but how would we then match intermediate keys for 
reduce phase. Our plumbing has to find all the same 
transformed/intermediate keys returned from a map phase and then invoke 
the reduce phase.

> * Reducer.reduce()'s return type should not necessarily be the same 
> type as the mapped transformation.  Also, does the reducer really need 
> the key? Surely just the transformed version of each K/V pair.

Yeah, when we solve the above this should fall into place.

> * Similarly, Collator.add() should just need the address and the 
> reduced result from each node (each node would only produce 1 result!)
>
> DistExec
> * Why do you have execute() and executeAsync() with no params?  What 
> do these methods do?

In case user provided Factory for DistributedCallable. We have to handle 
cases where simple no parameter constructor for callables is not 
sufficient - hence factory. Now that I think about it maybe adding K... 
input as a parameter to call function of DistributedCallable makes more 
sense, or no? WDYT?

> * Not sure what DistributedTaskContext is all about.  What does read() 
> and write() do?  Is this meant to be a "layer" in front of the cache 
> in question?  Why not just pass in a ref to the cache?

True.

> * What does Factory do?

Creates DistributedCallables. See above.

>
> Almost there!  :-)
>
> Cheers
> Manik
>
> [1] How many times can I say "map" in one sentence?  :)
>
>
Thanks Manik. Almost there :-)


More information about the infinispan-dev mailing list