And for the distributed executor API, how's this as an alternative (suggested by Eduardo earlier)?

DistributedExecutorService extends ExecutorService [1], adds overloaded methods of submit(Callable) with submit(Callable, K...).  In keeping with the ExecutorService contract, submit() will ensure the Callable is executed just once, on any one node.  We could add a submitEverywhere(Callable) and submitEverywhere(Callable, K...) to mean execution on _all_ nodes.

Also ships with a DistributedCallable which extends Callable, adds setCache() and setEmbeddedCacheManager() setters.  If the callable is a DistributedCallable, then these setters are called before Callable.call(); otherwise just do Callable.call().

WDYT?

[1] http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html

On 11 Jan 2011, at 14:18, Vladimir Blagojevic wrote:

On 11-01-11 10:28 AM, Manik Surtani wrote:
Hi there.

Lets take the 2 API sets (M/R and distributed exec) separately:

M/R
* Why does MapReduceTask extend Set?  It probably shouldn't even
extend DistributedRemoteTask, since some methods such as
execute(Callable) doesn't make sense for MapReduceTask.  Perhaps a
common super-interface?  That alone should simplify the generics...

Leftover that I did not catch. I meant Map<K1,K2> instead of
Set<Map.Entry<K2, V2>>; it does not extend Set it just constraints one
of the type parameters :-)

Completely agree on common parent that has the common setters for
various task aspects.

* Why does Mapper.map() return a Map? [1] Shouldn't it return a single
transformation for a given K and V?

Yes, I thought so too but how would we then match intermediate keys for
reduce phase. Our plumbing has to find all the same
transformed/intermediate keys returned from a map phase and then invoke
the reduce phase.

* Reducer.reduce()'s return type should not necessarily be the same
type as the mapped transformation.  Also, does the reducer really need
the key? Surely just the transformed version of each K/V pair.

Yeah, when we solve the above this should fall into place.

* Similarly, Collator.add() should just need the address and the
reduced result from each node (each node would only produce 1 result!)

DistExec
* Why do you have execute() and executeAsync() with no params?  What
do these methods do?

In case user provided Factory for DistributedCallable. We have to handle
cases where simple no parameter constructor for callables is not
sufficient - hence factory. Now that I think about it maybe adding K...
input as a parameter to call function of DistributedCallable makes more
sense, or no? WDYT?

* Not sure what DistributedTaskContext is all about.  What does read()
and write() do?  Is this meant to be a "layer" in front of the cache
in question?  Why not just pass in a ref to the cache?

True.

* What does Factory do?

Creates DistributedCallables. See above.


Almost there!  :-)

Cheers
Manik

[1] How many times can I say "map" in one sentence?  :)


Thanks Manik. Almost there :-)
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
manik@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org