On 11 Jan 2011, at 16:16, Vladimir Blagojevic wrote:
On 11-01-11 12:27 PM, Manik Surtani wrote:
> You can do that but, even as the paper suggests, you usually end up with just one
> result. Essentially it is a case of how simple we want to make the API. I think for a
> large number of the use cases we've come across, mapping a K and V to a single T works
> fine (this is the simple value mapping you see in many functional languages). The
> alternative is to map an entire tuple (e.g., k1, v1 becomes k2, v2), but since Java
> doesn't support multiple return values, you then end up with Mapper.map() returning
> a tuple (Map.Entry<K2, V2> in Java), which I find clunky.
>
> Perhaps, if there is adequate demand, we could extend Mapper with TupleMapper, which
> would support this.
>
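A minimal sketch of the two shapes under discussion, for comparison. `Mapper` follows the `T map(K, V)` signature used later in this thread; `TupleMapper` is only the hypothetical extension floated above, and `WordCountMapper` and `LengthTupleMapper` are illustrative names, not part of any proposed API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Single-value mapping, as proposed: each (K, V) pair becomes exactly one T.
interface Mapper<K, V, T> {
   T map(K key, V value);
}

// Hypothetical TupleMapper: each (K, V) pair becomes a (K2, V2) pair, which
// Java can only return as a single object, here a Map.Entry.
interface TupleMapper<K, V, K2, V2> {
   Map.Entry<K2, V2> map(K key, V value);
}

// Example Mapper: (document id, text) -> number of words in the text.
class WordCountMapper implements Mapper<String, String, Integer> {
   @Override
   public Integer map(String docId, String text) {
      String trimmed = text.trim();
      return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
   }
}

// Example TupleMapper: (document id, text) -> (document id, text length).
class LengthTupleMapper implements TupleMapper<String, String, String, Integer> {
   @Override
   public Map.Entry<String, Integer> map(String docId, String text) {
      return new SimpleEntry<>(docId, text.length());
   }
}
```

The tuple variant works, but every implementation has to allocate and return an entry object, which is the clunkiness described above.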
OK, noted, but I still need a clarification about your proposal:
interface Reducer<T, R> {
   // Incrementally reduces a transformed entry. Called once for each T produced by the mapper.
   // The previously reduced value is passed in each time.
   R reduce(T mapped, R previouslyReduced);
}

interface Collator<R> {
   // Adds reduced results from remote nodes. Called once for each R returned by a RemoteReducer.
   void add(Address origin, R remote);
   // Collates all results added so far.
   R collate();
}
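To illustrate how the proposed `Reducer` and `Collator` might be implemented, here is a hedged word-count style sketch. `SumReducer` and `SumCollator` are hypothetical names, and `Address` is replaced by a plain `String` node name (an assumption, so the sketch compiles without Infinispan on the classpath). Note that the reducer has to cope with `null` as the previous value on its first call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Proposed reducer shape: folds one mapped value into the previous result.
interface Reducer<T, R> {
   R reduce(T mapped, R previouslyReduced);
}

// Proposed collator shape; the node address is simplified to a String here.
interface Collator<R> {
   void add(String origin, R remote);
   R collate();
}

// Sums mapped integers; the first call receives null as the previous result.
class SumReducer implements Reducer<Integer, Integer> {
   @Override
   public Integer reduce(Integer mapped, Integer previouslyReduced) {
      return previouslyReduced == null ? mapped : previouslyReduced + mapped;
   }
}

// Keeps the single reduced result from each node and sums them on collate().
class SumCollator implements Collator<Integer> {
   private final Map<String, Integer> perNode = new LinkedHashMap<>();

   @Override
   public void add(String origin, Integer remote) {
      perNode.put(origin, remote);
   }

   @Override
   public Integer collate() {
      int total = 0;
      for (Integer r : perNode.values()) {
         if (r != null) {
            total += r;
         }
      }
      return total;
   }
}
```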
So far I understand that we invoke reduce() once for each T returned by a map(K, V)
invocation. But where does the previously reduced R parameter come from? Are you invoking
the reduce phase serially across the list of all results returned by map(K, V), passing the
R from the last reduce invocation into the next one?
How about this pseudocode:

On each node:
   mapped = list()
   for entry in cache.entries:
      t = mapper.map(entry.key, entry.value)
      mapped.add(t)
   r = null
   for t in mapped:
      r = reducer.reduce(t, r)
   return r to calling node

On calling node:
   reduced_results = invoke map reduce task on all nodes, retrieve map{address: result}
   for r in reduced_results.entries:
      remote_address = r.key
      remote_reduced_result = r.value
      collator.add(remote_address, remote_reduced_result)
   return collator.collate()
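The pseudocode above translates fairly directly into Java. This is a single-JVM sketch under stated assumptions: the cluster is simulated as a map from node name to that node's local entries, node addresses are plain `String`s rather than Infinispan `Address` objects, and `LocalMapReduce` with its methods is a hypothetical helper, not proposed API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface Mapper<K, V, T> { T map(K key, V value); }
interface Reducer<T, R> { R reduce(T mapped, R previouslyReduced); }
interface Collator<R> { void add(String origin, R remote); R collate(); }

final class LocalMapReduce {
   // "On each node": map every local entry, then fold the mapped values,
   // starting from null and feeding each result into the next reduce call.
   static <K, V, T, R> R runOnNode(Map<K, V> localEntries, Mapper<K, V, T> mapper,
                                   Reducer<T, R> reducer) {
      List<T> mapped = new ArrayList<>();
      for (Map.Entry<K, V> entry : localEntries.entrySet()) {
         mapped.add(mapper.map(entry.getKey(), entry.getValue()));
      }
      R r = null;
      for (T t : mapped) {
         r = reducer.reduce(t, r);
      }
      return r;
   }

   // "On calling node": gather one reduced result per node, then collate them.
   // The cluster is simulated as node name -> that node's local entries.
   static <K, V, T, R> R execute(Map<String, Map<K, V>> cluster, Mapper<K, V, T> mapper,
                                 Reducer<T, R> reducer, Collator<R> collator) {
      Map<String, R> reducedResults = new LinkedHashMap<>();
      for (Map.Entry<String, Map<K, V>> node : cluster.entrySet()) {
         reducedResults.put(node.getKey(), runOnNode(node.getValue(), mapper, reducer));
      }
      for (Map.Entry<String, R> r : reducedResults.entrySet()) {
         collator.add(r.getKey(), r.getValue());
      }
      return collator.collate();
   }
}
```

Because each node returns a single `R`, the collator only ever sees one value per address.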
>>>>> * Similarly, Collator.add() should just need the address and the
>>>>> reduced result from each node (each node would only produce 1
>>>>> result!)
>>>>>
>>>>> DistExec
>>>>> * Why do you have execute() and executeAsync() with no params? What
>>>>> do these methods do?
>>>> They are for the case where the user provided a factory for DistributedCallable.
>>>> We have to handle cases where a simple no-parameter constructor for callables
>>>> is not sufficient, hence the factory.
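A hedged sketch of the factory idea, assuming shapes that the thread does not spell out: `DistributedCallable` is taken to extend `java.util.concurrent.Callable`, and the hypothetical `DistributedCallableFactory.newCallable(List<K>)` is handed the executing node's local keys. `PrefixCountCallable` and `PrefixCountFactory` are illustrative names; the point is a callable that cannot be built with a no-arg constructor, while the factory carries only its configuration.

```java
import java.io.Serializable;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical task type: a Callable that can be shipped to another node.
interface DistributedCallable<T> extends Callable<T>, Serializable {
}

// Hypothetical factory: the executing node asks it for a callable built
// around that node's local keys, so the callable needs no no-arg constructor.
interface DistributedCallableFactory<K, T> extends Serializable {
   DistributedCallable<T> newCallable(List<K> localKeys);
}

// Example callable whose constructor requires arguments.
class PrefixCountCallable implements DistributedCallable<Integer> {
   private final String prefix;
   private final List<String> keys;

   PrefixCountCallable(String prefix, List<String> keys) {
      this.prefix = prefix;
      this.keys = keys;
   }

   // Counts the keys that start with the configured prefix.
   @Override
   public Integer call() {
      int count = 0;
      for (String key : keys) {
         if (key.startsWith(prefix)) {
            count++;
         }
      }
      return count;
   }
}

// Example factory: only the prefix would travel over the wire; keys are supplied per node.
class PrefixCountFactory implements DistributedCallableFactory<String, Integer> {
   private final String prefix;

   PrefixCountFactory(String prefix) {
      this.prefix = prefix;
   }

   @Override
   public DistributedCallable<Integer> newCallable(List<String> localKeys) {
      return new PrefixCountCallable(prefix, localKeys);
   }
}
```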
>>> Why are we constructing DistributedCallables? Surely the user passes in an
>>> instance?
>>>
>> OK, so we need to serialize/migrate this instance of DistributedCallable
>> across JVMs and invoke it on a target Infinispan node, right? If so, then
>> we need to add K... input as a parameter to the call function, since each
>> migrated instance (on the execution node) needs different input for
>> invocation.
> Yeah good point. My bad. :) We'd need to document this appropriately in the
> DistributedCallable javadocs.
>
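For comparison, a sketch of the other option: passing the input to `call` directly, so that one serialized instance can run on every node with different keys. The `call(K... input)` signature follows the suggestion above, but the interface body and `PrefixCountCallable` are hypothetical illustrations.

```java
import java.io.Serializable;

// Hypothetical: one serialized instance is shipped to every node, and each node
// passes in its own keys when it invokes call().
interface DistributedCallable<K, T> extends Serializable {
   T call(K... input) throws Exception;
}

// Example: counts the keys that start with a configured prefix.
class PrefixCountCallable implements DistributedCallable<String, Integer> {
   private final String prefix;

   PrefixCountCallable(String prefix) {
      this.prefix = prefix;
   }

   @Override
   public Integer call(String... input) {
      int count = 0;
      for (String key : input) {
         if (key.startsWith(prefix)) {
            count++;
         }
      }
      return count;
   }
}
```

Here the callable can take whatever constructor arguments it likes, because the caller builds the instance; the trade-off is that all per-node input has to fit through the `call` parameters. javac may also warn about possible heap pollution from the generic varargs parameter.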
:-) Which is more elegant and accommodating for interface implementers: passing
parameter K to the call method of DistributedCallable, or using factories
for DistributedCallable?
Hmm. I think supporting both approaches is best, to allow for the most flexibility. So perhaps
overload execute() with the following:
execute(DistributedCallable c); // Javadoc says DistributedCallable must have a no-arg ctor
execute(DistributedCallableFactory cf); // Javadoc says DistributedCallable must have a no-arg ctor
execute(Class<? extends DistributedCallable> cc);
execute(Class<? extends DistributedCallableFactory> cfc);
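A minimal single-JVM sketch of the four overloads, assuming each simulated node simply runs the task once and returns its result. One wrinkle: as listed above, the two `Class<...>` overloads would not compile side by side in Java, because both erase to `execute(Class)`; the sketch therefore gives them distinct, hypothetical names. `LocalDistributedExecutor`, `AnswerCallable` and `AnswerFactory` are illustrative only, and serialization is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical task and factory types for this sketch.
interface DistributedCallable<T> extends Callable<T> {
}

interface DistributedCallableFactory<T> {
   DistributedCallable<T> newCallable();
}

// Runs the task once per simulated node and returns one result per node.
class LocalDistributedExecutor {
   private final int nodes;

   LocalDistributedExecutor(int nodes) {
      this.nodes = nodes;
   }

   // Same instance on every node (in a cluster, each node would get a deserialized copy).
   <T> List<T> execute(DistributedCallable<T> c) throws Exception {
      List<T> results = new ArrayList<>();
      for (int i = 0; i < nodes; i++) {
         results.add(c.call());
      }
      return results;
   }

   // A fresh callable per node, built by the factory.
   <T> List<T> execute(DistributedCallableFactory<T> cf) throws Exception {
      List<T> results = new ArrayList<>();
      for (int i = 0; i < nodes; i++) {
         results.add(cf.newCallable().call());
      }
      return results;
   }

   // Instantiated reflectively, so the class needs an accessible no-arg constructor.
   <T> List<T> executeCallableClass(Class<? extends DistributedCallable<T>> cc) throws Exception {
      return execute(cc.getDeclaredConstructor().newInstance());
   }

   <T> List<T> executeFactoryClass(Class<? extends DistributedCallableFactory<T>> cfc) throws Exception {
      return execute(cfc.getDeclaredConstructor().newInstance());
   }
}

// Example task with a no-arg constructor, as the Class-based variants require.
class AnswerCallable implements DistributedCallable<Integer> {
   @Override
   public Integer call() {
      return 42;
   }
}

class AnswerFactory implements DistributedCallableFactory<Integer> {
   @Override
   public DistributedCallable<Integer> newCallable() {
      return new AnswerCallable();
   }
}
```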
Also, wrt. the names and methods of the Distributed Exec part of the API, what do you think
about aligning them with the JDK executors (see my separate email)?
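For reference, this is the JDK vocabulary such an alignment would borrow: tasks are `Callable`s, an `ExecutorService` runs them, and results come back as `Future`s. The sketch uses only the standard `java.util.concurrent` API, with one list standing in for the entries on each simulated node; `JdkStyleFanOut` is a hypothetical helper, and how a distributed executor would map onto `ExecutorService` is not settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class JdkStyleFanOut {
   // Submits one task per simulated node, then sums the per-node results
   // by waiting on each Future returned by invokeAll.
   static int countAll(List<List<String>> partitions) throws Exception {
      ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
      try {
         List<Callable<Integer>> tasks = new ArrayList<>();
         for (List<String> partition : partitions) {
            tasks.add(partition::size); // counts the entries held on that node
         }
         int total = 0;
         for (Future<Integer> f : pool.invokeAll(tasks)) {
            total += f.get();
         }
         return total;
      } finally {
         pool.shutdown();
      }
   }
}
```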
And (not trying to confuse you, promise!) here's some scope for additional work
later on as well: network classloaders to deal with non-homogeneous deployments with
remote tasks like these. ;)
Cheers
Manik
Cheers,
Vladimir
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org