<p>In a data-oriented API, why should I have to deal with details such as Address? Could we avoid that?<br>
Also, I haven't seen a Collator in other examples. I've never used M/R seriously, so I might have forgotten the more complex examples, but my impression was that problems should be expressed entirely in terms of a Mapper and a Reducer.</p>
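<p>To make the question concrete, here is a rough sketch of what an Address-free final step might look like. Everything in it is hypothetical: AddressFreeSketch, combine and the merge function are names made up for illustration, not part of the proposed API or of Infinispan. It assumes the per-node results can be merged pairwise, starting from an identity value.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical sketch: none of these names exist in Infinispan.
class AddressFreeSketch {
    // Folds the per-node results into one value without exposing
    // which node (Address) each result came from.
    static <R> R combine(List<R> perNodeResults, R identity, BinaryOperator<R> merge) {
        R acc = identity;
        for (R r : perNodeResults) {
            acc = merge.apply(acc, r);
        }
        return acc;
    }

    // Three made-up per-node totals, merged by addition.
    static int demo() {
        return combine(Arrays.asList(5, 3, 4), 0, Integer::sum);
    }
}
```

<p>If the caller really does need to know where a result came from, this shape would not be enough, so it only covers the case where the origin is irrelevant.</p>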
<p>Sanne</p>
<p><blockquote type="cite">On 4 Jan 2011 18:47, "Manik Surtani" <<a href="mailto:manik@jboss.org">manik@jboss.org</a>> wrote:<br><br>Also, I think we need to be clear about these two functions (map and reduce). Map doesn't mean "pick a node to run the task on" in map/reduce speak. Map means select/transform data for inclusion in a result set. Perhaps it also makes sense to use smaller, simpler interfaces. I know this breaks away from the F/J API, but I'm beginning to wonder whether there is a proper alignment of purpose here in the first place, which means going back on my original plans. How's this for an alternate API:<br>
<br>
interface Mapper<K, V, T> {<br>
// Just "maps" entries on a remote node. Map = filter and transform. Invoked once for each entry on a remote cache.<br>
// Null responses are ignored/filtered out.<br>
T map(K key, V value);<br>
}<br>
<br>
interface Reducer<T, R> {<br>
// Incrementally reduces a transformed entry. Called once for each T produced by the mapper.<br>
// The previously reduced value is passed in each time.<br>
R reduce(T transformed, R previouslyReduced);<br>
}<br>
<br>
interface Collator<R> {<br>
// Adds reduced results from remote nodes. Called once for each R returned by a remote Reducer.<br>
void add(Address origin, R remote);<br>
<br>
// Collates all results added so far.<br>
R collate();<br>
}<br>
<br>
<br>
And the API could do something like<br>
<br>
MapReduceContext c = new MapReduceContext(cache);<br>
<br>
// 1) Distributes 'mapper' cluster-wide. Calls mapper.map() for each K/V pair. Stores the result T of each invocation if T != null.<br>
// 2) For each T, reducer.reduce() is called. Each time, the previous value of R is passed back in to reduce().<br>
// 3) The final value of R is sent back as an RPC result. For each result, the address and R are passed to collator.add().<br>
// 4) Once all remote RPCs have responded, collator.collate() is called and its result is passed back to the caller.<br>
R r = c.invoke(mapper, reducer, collator);<br>
<br>
Variants may include:<br>
<br>
Filtering nodes:<br>
// Restricts the set of nodes to which RPCs are sent, based on the subset of the cluster that contains one or more of K.<br>
// Question: does this mean only K/V pairs that are in K... are passed in to the mapper?<br>
R r = c.invoke(mapper, reducer, collator, K...);<br>
<br>
Using futures:<br>
NotifyingFuture<R> f = c.invokeFuture(mapper, reducer, collator);<br>
<br>
Example: implementing a word count, but only for keys that start with "text":<br>
<br>
Mapper<String, String, Integer> mapper = new Mapper<String, String, Integer>() {<br>
// Returning null filters the entry out.<br>
public Integer map(String k, String v) {<br>
return k.startsWith("text") ? v.length() : null;<br>
}<br>
};<br>
<br>
Reducer<Integer, Integer> reducer = new Reducer<Integer, Integer>() {<br>
public Integer reduce(Integer transformed, Integer prevReduced) {return transformed + prevReduced;}<br>
};<br>
<br>
Collator<Integer> collator = new Collator<Integer>() {<br>
int collated = 0;<br>
public void add(Address origin, Integer result) {collated += result;}<br>
public Integer collate() {return collated;}<br>
};<br>
<br>
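The four steps and the example above can be tried out locally with a small simulation. This is only a sketch under stated assumptions, not the actual implementation: the cluster is modelled as a map from a node name to that node's entries, Address is replaced by a plain String, the first reduce() call on each node is assumed to receive null, and nodes that produce no result are assumed to be skipped. MapReduceSketch, invoke and demo are hypothetical names.<br>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Single-JVM simulation of the proposed flow; no Infinispan classes are used.
class MapReduceSketch {
    interface Mapper<K, V, T> { T map(K key, V value); }
    interface Reducer<T, R> { R reduce(T transformed, R previous); }
    // Assumption: the node address is modelled as a plain String.
    interface Collator<R> { void add(String origin, R remote); R collate(); }

    // Runs steps 1-4 over a map of "node name -> entries held on that node".
    static <K, V, T, R> R invoke(Map<String, Map<K, V>> nodes, Mapper<K, V, T> mapper,
                                 Reducer<T, R> reducer, Collator<R> collator) {
        for (Map.Entry<String, Map<K, V>> node : nodes.entrySet()) {
            // Step 1: map every entry on this node, dropping null results.
            List<T> mapped = new ArrayList<>();
            for (Map.Entry<K, V> e : node.getValue().entrySet()) {
                T t = mapper.map(e.getKey(), e.getValue());
                if (t != null) mapped.add(t);
            }
            // Step 2: fold the mapped values. Assumption: the first call sees null.
            R reduced = null;
            for (T t : mapped) reduced = reducer.reduce(t, reduced);
            // Step 3: pass the per-node result to the collator.
            // Assumption: a node with no mapped values contributes nothing.
            if (reduced != null) collator.add(node.getKey(), reduced);
        }
        // Step 4: collate once every node has reported.
        return collator.collate();
    }

    static int demo() {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("text1", "hello");
        a.put("other", "ignored");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("text2", "abc");
        Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
        nodes.put("nodeA", a);
        nodes.put("nodeB", b);

        Mapper<String, String, Integer> mapper =
            (k, v) -> k.startsWith("text") ? v.length() : null;
        // Handles the assumed null on the first call.
        Reducer<Integer, Integer> reducer =
            (t, prev) -> prev == null ? t : t + prev;
        Collator<Integer> collator = new Collator<Integer>() {
            int collated = 0;
            public void add(String origin, Integer result) { collated += result; }
            public Integer collate() { return collated; }
        };
        return invoke(nodes, mapper, reducer, collator);
    }
}
```

Note that the reducer in this sketch guards against a null previous value, because the proposal does not say what reduce() receives on its first call.<br>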
<br>
WDYT? :-)<br>
<br>
Cheers<br>
<font color="#888888">Manik<br>
</font><p><font color="#500050"><br><br><br>On 3 Jan 2011, at 11:37, Vladimir Blagojevic wrote:<br></font></p><p><font color="#500050">> On 11-01-03 6:16 AM, Galder Zamarreño wrote:<br>>> Maybe I'm reading this wrong but are you saying th...</font></p>
<p><font color="#500050">--<br>Manik Surtani<br><a href="mailto:manik@jboss.org">manik@jboss.org</a><br><a href="http://twitter.com/maniksurtani">twitter.com/maniksurtani</a><br><br>Lead, Infinispan<br><a href="http://www.infinispan.or.">http://www.infinispan.or.</a>..</font></p>
</blockquote></p>