[infinispan-dev] Distributed execution framework - API proposal(s)
Manik Surtani
msurtani at redhat.com
Tue Jan 4 20:00:21 EST 2011
The Address is purely additional info, in case the collator cares to filter out results from certain nodes.
And the whole concept of the collator is just another reduce - I couldn't think of a better name. Often it makes sense to reduce once remotely and then once again locally.
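For instance (sketch only, reusing the Integer types from the word-count example quoted below), a collator is structurally just a second, local reduce over the per-node results:

Collator<Integer> secondReduce = new Collator<Integer>() {
   Integer collated;                          // running local reduction
   public void add(Address origin, Integer remote) {
      // same shape as Reducer.reduce(T, R): fold the remote R into the running value
      collated = (collated == null) ? remote : collated + remote;
   }
   public Integer collate() { return collated; }
};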
Hope this makes sense. :)
Sent from my iPhone
On 4 Jan 2011, at 23:20, Sanne Grinovero <sanne.grinovero at gmail.com> wrote:
> As a data-oriented API, why should I deal with details such as Address? Could we avoid that?
> Also I didn't see a Collator in other examples; I've never used M/R seriously, so I might have forgotten the more complex examples, but my impression was that problems should be expressed entirely in a Mapper and a Reducer only.
>
> Sanne
>
>> On 4 Jan 2011 18:47, "Manik Surtani" <manik at jboss.org> wrote:
>>
>> Also, I think we need to be clear about these two functions (map and reduce). In map/reduce speak, "map" doesn't mean "pick a node to run the task on"; it means select/transform data for inclusion in a result set. Perhaps it also makes sense to use smaller/simpler interfaces. I know this breaks away from the F/J API, but I'm beginning to wonder whether there was a proper alignment of purpose there in the first place - going back on my original plans here. How's this for an alternate API:
>>
>> interface Mapper<K, V, T> {
>>    // just "maps" entries on a remote node. Map = filter and transform. Invoked once for each entry on a remote cache.
>>    // null responses are ignored/filtered out.
>>    T map(K key, V value);
>> }
>>
>> interface Reducer<T, R> {
>>    // incrementally reduces a transformed entry. Called once for each T produced by the mapper.
>>    // the previously reduced value is passed back in each time.
>>    R reduce(T transformed, R previouslyReduced);
>> }
>>
>> interface Collator<R> {
>>    // adds a reduced result from a remote node. Called once for each R returned by a remote Reducer.
>>    void add(Address origin, R remote);
>>
>>    // collates all results added so far.
>>    R collate();
>> }
>>
>>
>> And the API could do something like
>>
>> MapReduceContext c = new MapReduceContext(cache);
>>
>> // 1) Distributes 'mapper' cluster-wide. Calls mapper.map() for each K/V pair; stores the resulting T for each invocation where T != null.
>> // 2) For each T, reducer.reduce() is called. Each time, the previous value of R is passed back in to reduce().
>> // 3) The final value of R is sent back as the RPC result. For each result, the origin Address and R are passed to collator.add().
>> // 4) Once all remote RPCs have responded, collator.collate() is called and the result is passed back to the caller.
>> R r = c.invoke(mapper, reducer, collator);
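>>
>> To illustrate the flow (this is only a sketch of what the framework would do internally, not a proposed implementation - the RPC plumbing and the method names here are made up), the remote and local halves roughly look like:
>>
>> // remote half - steps 1) and 2), run against each node's local entries
>> <K, V, T, R> R mapAndReduceLocally(Map<K, V> localEntries, Mapper<K, V, T> mapper, Reducer<T, R> reducer) {
>>    R reduced = null;                      // nothing reduced yet
>>    for (Map.Entry<K, V> e : localEntries.entrySet()) {
>>       T t = mapper.map(e.getKey(), e.getValue());
>>       if (t != null)                      // null = filtered out by the mapper
>>          reduced = reducer.reduce(t, reduced);
>>    }
>>    return reduced;                        // shipped back as the RPC result
>> }
>>
>> // local half - steps 3) and 4), run on the caller once the RPCs return
>> <R> R collateRemoteResults(Map<Address, R> remoteResults, Collator<R> collator) {
>>    for (Map.Entry<Address, R> e : remoteResults.entrySet())
>>       collator.add(e.getKey(), e.getValue());
>>    return collator.collate();
>> }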
>>
>> Variants may include:
>>
>> Filtering nodes:
>> // restricts the set of nodes the RPCs are sent to, based on the subset of the cluster that contains one or more of the given keys (K...).
>> // question: does this mean only K/V pairs that are in K... are passed in to the mapper?
>> R r = c.invoke(mapper, reducer, collator, K...);
>>
>> Using futures:
>> NotifyingFuture<R> f = c.invokeFuture(mapper, reducer, collator)
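>>
>> e.g. so the caller doesn't have to block - a hedged sketch, assuming NotifyingFuture keeps the attachListener() hook we already have on the async cache API:
>>
>> f.attachListener(new FutureListener<R>() {
>>    public void futureDone(Future<R> future) {
>>       try {
>>          R collated = future.get();   // the collated result; throws if a remote task failed
>>          // ... use collated ...
>>       } catch (Exception e) {
>>          // handle the failure
>>       }
>>    }
>> });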
>>
>> Example: implementing a word count, but only for keys that start with "text":
>>
>> Mapper<String, String, Integer> mapper = new Mapper<String, String, Integer>() {
>>    // maps each matching value to its length; returning null filters the entry out
>>    public Integer map(String k, String v) {
>>       return k.startsWith("text") ? v.length() : null;
>>    }
>> };
>>
>> Reducer<Integer, Integer> reducer = new Reducer<Integer, Integer>() {
>>    public Integer reduce(Integer transformed, Integer prevReduced) {
>>       // on the first invocation there is nothing previously reduced yet
>>       return prevReduced == null ? transformed : transformed + prevReduced;
>>    }
>> };
>>
>> Collator<Integer> collator = new Collator<Integer>() {
>>    int collated = 0;
>>    public void add(Address origin, Integer result) { collated += result; }
>>    public Integer collate() { return collated; }
>> };
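>>
>> Putting the example together (sketch only; "text1"/"text2" are just hypothetical keys):
>>
>> MapReduceContext c = new MapReduceContext(cache);
>>
>> // blocking form: total length of all values whose keys start with "text"
>> Integer total = c.invoke(mapper, reducer, collator);
>>
>> // or restricted to the nodes owning the given keys
>> Integer subsetTotal = c.invoke(mapper, reducer, collator, "text1", "text2");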
>>
>>
>> WDYT? :-)
>>
>> Cheers
>> Manik
>>
>>
>>
>> On 3 Jan 2011, at 11:37, Vladimir Blagojevic wrote:
>>
>> > On 11-01-03 6:16 AM, Galder Zamarreño wrote:
>> >> Maybe I'm reading this wrong but are you saying th...
>>
>> --
>> Manik Surtani
>> manik at jboss.org
>> twitter.com/maniksurtani
>>
>> Lead, Infinispan
>> http://www.infinispan.or...
>>
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/...
>>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev