[infinispan-dev] Distributed execution framework - API proposal(s)
Vladimir Blagojevic
vblagoje at redhat.com
Tue Jan 11 10:02:05 EST 2011
On 11-01-11 11:37 AM, Manik Surtani wrote:
> What's an intermediate key? :-) For each K, V pair, we call map(K, V) which transforms V to a T. So all we expect from map() should be T.
>
>>> * Reducer.reduce()'s return type should not necessarily be the same
>>> type as the mapped transformation. Also, does the reducer really need
>>> the key? Surely just the transformed version of each K/V pair.
>> Yeah, when we solve the above this should fall into place.
> Yep. So we then just call reduce(T) for each value of T that we get from the mapping phase (along with the previous, reduced result of R from the previous invocation). Maybe we also need to pass in the key for which the mapping took place, in which case reduce(T, R) would look like reduce(K, T, R). But TBH I don't really see the use of passing in K.
>
The way I understood map/reduce is that every map(k1, v1) invocation
produces intermediate pairs (k2, v2). We then group together all
intermediate values that share the same k2 and invoke reduce(k2, list(v2))
once per distinct k2. If we follow your suggestion, the number of
invocations in the reduce phase is the same as in the map phase, which
does not make sense; that is not map/reduce.
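A minimal single-JVM sketch of that model, following the programming model in the Google paper: map emits intermediate (k2, v2) pairs, the values are grouped by k2, and reduce runs once per distinct k2. The Mapper/Reducer interfaces, run() and wordCount() are hypothetical names for illustration, not a proposed Infinispan API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class MapReduceSketch {
    // Hypothetical mapper: map(k1, v1) emits zero or more intermediate (k2, v2) pairs.
    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }

    // Hypothetical reducer: called once per distinct k2 with all values grouped under it.
    interface Reducer<K2, V2, R> {
        R reduce(K2 key, List<V2> values);
    }

    static <K1, V1, K2, V2, R> Map<K2, R> run(Map<K1, V1> input,
                                              Mapper<K1, V1, K2, V2> mapper,
                                              Reducer<K2, V2, R> reducer) {
        // Map phase: one map() call per input pair; intermediate values are grouped by k2.
        Map<K2, List<V2>> groups = new HashMap<>();
        for (Map.Entry<K1, V1> e : input.entrySet()) {
            mapper.map(e.getKey(), e.getValue(),
                    (k2, v2) -> groups.computeIfAbsent(k2, k -> new ArrayList<>()).add(v2));
        }
        // Reduce phase: one reduce() call per distinct intermediate key.
        Map<K2, R> results = new HashMap<>();
        for (Map.Entry<K2, List<V2>> g : groups.entrySet()) {
            results.put(g.getKey(), reducer.reduce(g.getKey(), g.getValue()));
        }
        return results;
    }

    // Word count: document id -> text in, word -> occurrence count out.
    static Map<String, Integer> wordCount(Map<String, String> docs) {
        return run(docs,
                (String id, String text, BiConsumer<String, Integer> emit) -> {
                    for (String w : text.split(" ")) {
                        emit.accept(w, 1);
                    }
                },
                (String word, List<Integer> ones) -> ones.size());
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("doc1", "a b a");
        docs.put("doc2", "b c");
        // Two map() calls (one per document), but three reduce() calls (a, b, c).
        System.out.println(wordCount(docs));
    }
}
```

Note that the number of reduce() calls is driven by the number of distinct intermediate keys, not by the number of map() calls.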
Would you please have a look at section 2 of
http://labs.google.com/papers/mapreduce.html
Or maybe I have completely lost it? :-)
>>> * Similarly, Collator.add() should just need the address and the
>>> reduced result from each node (each node would only produce 1 result!)
>>>
>>> DistExec
>>> * Why do you have execute() and executeAsync() with no params? What
>>> do these methods do?
>> These are for the case where the user provides a Factory for
>> DistributedCallable. We have to handle cases where a simple no-argument
>> constructor for callables is not sufficient, hence the factory.
> Why are we constructing DistributedCallables? Surely the user passes in an instance?
>
OK, so we need to serialize this DistributedCallable instance, migrate it
across JVMs, and invoke it on a target Infinispan node, right? If so, we
need to add K... input as a parameter to the call function, since each
migrated instance (on its execution node) needs different input for its
invocation.
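A minimal in-process sketch of that idea, assuming a hypothetical DistributedCallable interface whose call() takes K... input: one instance is serialized and deserialized (standing in for migration to another JVM), and each copy is invoked with the keys of its own node. Tagger, roundTrip() and the node names are illustrative only; a real implementation would ship the bytes over the cluster instead of through a local byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MigrationSketch {
    // Hypothetical interface: the instance travels to a node, and call()
    // receives only the keys that node should process.
    interface DistributedCallable<K, T> extends Serializable {
        T call(K... input);
    }

    // Example callable; its 'tag' field is state that travels with the instance.
    static class Tagger implements DistributedCallable<String, List<String>> {
        private static final long serialVersionUID = 1L;
        private final String tag;

        Tagger(String tag) {
            this.tag = tag;
        }

        @Override
        public List<String> call(String... input) {
            List<String> out = new ArrayList<>();
            for (String key : input) {
                out.add(tag + ":" + key);
            }
            return out;
        }
    }

    // Serialize and deserialize an object, simulating migration to another JVM.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    // One callable instance, two simulated nodes, different input per node.
    static List<List<String>> simulate() {
        DistributedCallable<String, List<String>> task = new Tagger("job1");
        DistributedCallable<String, List<String>> onNodeA = roundTrip(task);
        DistributedCallable<String, List<String>> onNodeB = roundTrip(task);
        return Arrays.asList(onNodeA.call("k1", "k2"), onNodeB.call("k3"));
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // [[job1:k1, job1:k2], [job1:k3]]
    }
}
```

The user still passes in a single instance; only the per-node input differs between invocations.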
> Not sure this is needed either.
>
> List<ResultType> res = new DistributedRemoteTask(cache).on(key1, key2).execute(myCallableInstance);
>
>
What do you mean?
> Also, isn't the name DistributedRemoteTask redundant? :-)
>
Just DistributedTask should suffice.
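One possible reading of the fluent call quoted above, as a minimal local sketch: DistributedTask collects keys via on(...), and execute(...) invokes the callable once per owning node with only that node's keys. The ownerOf function is a hypothetical stand-in for the cache's key-to-node mapping, everything runs in one JVM, and the callable takes a List<K> rather than K... only to avoid generic array creation in the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class DistributedTask<K> {
    // Hypothetical callable: invoked once per node with that node's keys.
    interface DistributedCallable<K, T> {
        T call(List<K> input);
    }

    private final Function<K, String> ownerOf; // hypothetical key -> node mapping
    private final List<K> keys = new ArrayList<>();

    public DistributedTask(Function<K, String> ownerOf) {
        this.ownerOf = ownerOf;
    }

    // Records the keys to run against; returns this for chaining.
    @SafeVarargs
    public final DistributedTask<K> on(K... input) {
        keys.addAll(Arrays.asList(input));
        return this;
    }

    public <T> List<T> execute(DistributedCallable<K, T> callable) {
        // Group the requested keys by owning node (TreeMap gives a stable node order).
        Map<String, List<K>> byNode = new TreeMap<>();
        for (K k : keys) {
            byNode.computeIfAbsent(ownerOf.apply(k), n -> new ArrayList<>()).add(k);
        }
        // One invocation per node, each seeing only that node's keys.
        List<T> results = new ArrayList<>();
        for (List<K> nodeKeys : byNode.values()) {
            results.add(callable.call(nodeKeys));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical layout: keys ending in "2" live on nodeA, all others on nodeB.
        List<Integer> sizes = new DistributedTask<String>(k -> k.endsWith("2") ? "nodeA" : "nodeB")
                .on("k1", "k2", "k3")
                .execute(input -> input.size());
        System.out.println(sizes); // [1, 2]  (nodeA got k2; nodeB got k1 and k3)
    }
}
```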
Let me know,
Vladimir