On 11-01-11 2:42 PM, Manik Surtani wrote:
How about this pseudocode:
On each node:

   mapped = list()
   for entry in cache.entries:
      t = mapper.map(entry.key, entry.value)
      mapped.add(t)

   r = null
   for t in mapped:
      r = reducer.reduce(t, r)

   return r to calling node

On calling node:

   reduced_results = invoke map reduce task on all nodes, retrieve map{address: result}
   for r in reduced_results.entries:
      remote_address = r.key
      remote_reduced_result = r.value
      collator.add(remote_address, remote_reduced_result)

   return collator.collate()
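For the archives, here's roughly what that pseudocode could look like as plain Java. Everything below - the Mapper/Reducer/Collator interfaces, the Address placeholder, the driver methods - is just an illustrative sketch of the model above, not a proposed API:

   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;

   // Illustrative interfaces only; names are placeholders.
   interface Mapper<K, V, T> {
      T map(K key, V value);
   }

   interface Reducer<T, R> {
      R reduce(T mapped, R previouslyReduced);
   }

   interface Collator<R, X> {
      void add(Address remoteAddress, R remoteReducedResult);
      X collate();
   }

   // Stand-in for a cluster member's address.
   final class Address { }

   final class MapReduceSketch {
      // On each node: map every cache entry, then fold the mapped
      // values through the reducer.
      static <K, V, T, R> R runOnNode(Map<K, V> cacheEntries,
                                      Mapper<K, V, T> mapper,
                                      Reducer<T, R> reducer) {
         List<T> mapped = new ArrayList<T>();
         for (Map.Entry<K, V> entry : cacheEntries.entrySet())
            mapped.add(mapper.map(entry.getKey(), entry.getValue()));

         R r = null;
         for (T t : mapped)
            r = reducer.reduce(t, r);
         return r; // shipped back to the calling node
      }

      // On the calling node: feed each node's reduced result to the
      // collator, then produce the final answer.
      static <R, X> X collate(Map<Address, R> reducedResults,
                              Collator<R, X> collator) {
         for (Map.Entry<Address, R> r : reducedResults.entrySet())
            collator.add(r.getKey(), r.getValue());
         return collator.collate();
      }
   }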
Interesting, Manik! Where did you come up with this model? The examples I
tried to write using it (WordCount etc.) would work, yet your model is
definitely different from the originally proposed map/reduce!
Hmm. I think offering both approaches is best, to allow for the most
flexibility. So perhaps overload execute() with the following:
execute(DistributedCallable c); // Javadoc says DistributedCallable
must have a no-arg ctor
execute(DistributedCallableFactory cf); // Javadoc says
DistributedCallableFactory must have a no-arg ctor
execute(Class<? extends DistributedCallable> cc);
execute(Class<? extends DistributedCallableFactory> cfc);
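In interface form those overloads might look something like this - purely a sketch, the generics and names are mine, nothing agreed upon. One wrinkle: the two Class-based variants can't both be named execute(), since Class<? extends DistributedCallable> and Class<? extends DistributedCallableFactory> have the same erasure, so the factory one gets its own name here:

   import java.util.concurrent.Callable;

   // Sketch only; signatures are illustrative, not a final API.
   interface DistributedCallable<T> extends Callable<T> { }

   interface DistributedCallableFactory<T> {
      DistributedCallable<T> newCallable(); // hypothetical factory method
   }

   interface DistributedExecutor {
      // Javadoc would note: must have a no-arg ctor
      <T> T execute(DistributedCallable<T> c);

      // Javadoc would note: the factory must have a no-arg ctor
      <T> T execute(DistributedCallableFactory<T> cf);

      <T> T execute(Class<? extends DistributedCallable<T>> cc);

      // renamed to avoid the erasure clash with the overload above
      <T> T executeFactory(Class<? extends DistributedCallableFactory<T>> cfc);
   }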
Also, wrt. the names and methods of the API for the Distributed Exec part
of things, wdyt about aligning it with the JDK executors (see separate
email from me)?
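(To illustrate the idea only - submitEverywhere() is an invented name here, not the actual proposal from that email:)

   import java.util.List;
   import java.util.concurrent.Callable;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Future;

   // Hypothetical shape: reuse the java.util.concurrent.ExecutorService
   // contract (submit(), invokeAll(), shutdown(), ...) and only add the
   // cluster-wide variants on top of it.
   interface DistributedExecutorService extends ExecutorService {
      // run the task on every node and collect one Future per node
      <T> List<Future<T>> submitEverywhere(Callable<T> task);
   }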
Sure, no problem!
And - not trying to confuse you, promise! - here's some scope for
additional work later on as well: network classloaders to deal with
non-homogeneous deployments with remote tasks like these. ;)
Cheers
Manik