On 17 Dec 2010, at 11:41, Vladimir Blagojevic wrote:

Even better this way. I would like to hear more about your reasoning behind using DT on a per-cache basis. Yes, it would be a simpler and easier API for users, but it would not cover use cases where a distributed task needs access to more than one cache during its execution....

I was just wondering whether such a use case exists or whether we're just inventing stuff.  :-)  It would lead to a much more cumbersome API since you'd need to provide a map of cache names to keys, etc.
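To make that concrete, a multi-cache variant would presumably need callers to pass something along these lines (purely hypothetical sketch; none of these signatures exist, and java.util imports are omitted as in the sketches below):

// Hypothetical: the extra bookkeeping a multi-cache invoke() would force on callers.
Map<String, Collection<Object>> keysByCache = new HashMap<String, Collection<Object>>();
keysByCache.put("orders", Arrays.asList(orderKey1, orderKey2));
keysByCache.put("customers", Collections.singletonList(customerKey));

MyResultType result = pool.invoke(dt, keysByCache); // vs. the simple varargs form further down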


On 10-12-16 9:07 AM, Manik Surtani wrote:

Hmm.  Maybe it is better to not involve an API on the CacheManager at all.  Following JSR166y [1], we could do:

DistributedForkJoinPool p = DistributedForkJoinPool.newPool(cache); // I still think it should be on a per-cache basis

DistributedTask<MyResultType, K, V> dt = new DistributedTask<MyResultType, K, V>() {
    
    public void map(Map.Entry<K, V> entry, Map<K, V> context) {
        // select the entries you are interested in.  Transform if needed and store in context
    }

    public MyResultType reduce(Map<Address, Map<K, V>> contexts) {
        // aggregate from context and return value.
    }

};

MyResultType result = p.invoke(dt, key1, key2, key3); // keys are optional.

What I see happening is:

* dt is broadcast to all nodes that hold any of {key1, key2, key3}.  If no keys are provided, it is broadcast to all nodes.
* dt.map() is called on each node, for each key specified (if it exists on the local node).
* Contexts are sent back to the calling node and passed to dt.reduce().
* The result of dt.reduce() is passed to the caller of p.invoke().
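Assuming the proposed API above, a task that counts entries matching some predicate might look like this (illustrative only; DistributedTask and DistributedForkJoinPool are just proposals at this stage, shown here with String keys and Integer values):

// Illustrative sketch against the proposed API, not an existing Infinispan class.
DistributedTask<Integer, String, Integer> countLarge = new DistributedTask<Integer, String, Integer>() {

    public void map(Map.Entry<String, Integer> entry, Map<String, Integer> context) {
        // select only the entries we care about; the context is what gets shipped back
        if (entry.getValue() > 100)
            context.put(entry.getKey(), entry.getValue());
    }

    public Integer reduce(Map<Address, Map<String, Integer>> contexts) {
        // aggregate the per-node contexts into a single result on the calling node
        int count = 0;
        for (Map<String, Integer> ctx : contexts.values())
            count += ctx.size();
        return count;
    }
};

Integer matches = p.invoke(countLarge); // no keys given, so it runs on all nodes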

What do you think? 


--
Manik Surtani
manik@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org