On 16 Dec 2010, at 11:34, Vladimir Blagojevic wrote:
> On 10-12-16 7:25 AM, Manik Surtani wrote:
>>
>> I think it makes more sense on CacheManager because we might have a task
>> operating on the data of a few caches, not just one. So, to be safe rather
>> than sorry, I'd go with CacheManager. newDistributedTask can have parameters
>> to specify which cache(s) to use.
>>
>> So you'd do CacheManager.newDistributedTask(task, Map<String, K> cacheNamesAndKeys)?
>>
> No, I'd shift these parameters to DistributedTask#execute unless we absolutely
> need them for task creation. Let's keep the factory method for DistributedTask
> as simple as possible. I'd rather have a single factory method, and develop and
> grow the DistributedTask API independently of CacheManager, than have multiple
> overloaded factory methods for DistributedTask.
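Just to make the two shapes we're debating concrete (purely illustrative - none of these signatures exist today, and callable/cacheNamesAndKeys are just placeholder names):

   // Shape 1: targets bound at creation time, via overloaded factory methods on CacheManager
   DistributedTask<R> t1 = cacheManager.newDistributedTask(callable, cacheNamesAndKeys);

   // Shape 2: one minimal factory method; targets only supplied when the task is executed
   DistributedTask<R> t2 = cacheManager.newDistributedTask(callable);
   t2.execute(cacheNamesAndKeys);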
Hmm. Maybe it is better to not involve an API on the CacheManager at all. Following
JSR166y [1], we could do:
DistributedForkJoinPool p = DistributedForkJoinPool.newPool(cache); // I still think it should be on a per-cache basis

DistributedTask<MyResultType, K, V> dt = new DistributedTask<MyResultType, K, V>() {

   public void map(Map.Entry<K, V> entry, Map<K, V> context) {
      // select the entries you are interested in, transform if needed, and store them in context
   }

   public MyResultType reduce(Map<Address, Map<K, V>> contexts) {
      // aggregate from the per-node contexts and return a value
   }
};

MyResultType result = p.invoke(dt, key1, key2, key3); // keys are optional.
What I see happening is:
* dt is broadcast to all nodes that hold any of {key1, key2, key3}. If no keys are provided, it is broadcast to all nodes.
* dt.map() is called on each node, for each specified key (if it exists on the local node).
* The contexts are sent back to the calling node and passed to dt.reduce().
* The result of dt.reduce() is passed to the caller of p.invoke().
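To make that flow concrete, here is a rough sketch of one such task written against the API above (everything here - the errorCount name, the String/Integer type choices, the "ERROR" filter - is just made up for illustration):

   // Hypothetical task: count how many entries, cluster-wide, have a value containing "ERROR".
   // Generic parameters follow the sketch above: result = Integer, K = String, V = String.
   DistributedTask<Integer, String, String> errorCount = new DistributedTask<Integer, String, String>() {

      public void map(Map.Entry<String, String> entry, Map<String, String> context) {
         // keep only the entries we care about; whatever goes into the context
         // is shipped back to the calling node
         if (entry.getValue().contains("ERROR"))
            context.put(entry.getKey(), entry.getValue());
      }

      public Integer reduce(Map<Address, Map<String, String>> contexts) {
         // fold the per-node contexts into a single result
         int total = 0;
         for (Map<String, String> perNode : contexts.values())
            total += perNode.size();
         return total;
      }
   };

   Integer matches = p.invoke(errorCount); // no keys passed, so every node scans its local entries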
What do you think?
[1]
http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/index.html
--
Manik Surtani
manik@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org