[infinispan-dev] Distributed tasks - specifying task input

Sanne Grinovero sanne.grinovero at gmail.com
Fri Dec 17 09:58:32 EST 2010


About:

 void map(String cacheName, Map.Entry<K, V> e, DistributedTaskContext<K, V, R> ctx);

To satisfy a Lucene query, the task needs at the very least to be able
to read multiple caches at the same time.

It would still do map/reduce on a single cache; I just need to be able
to invoke getCache() during both the map and reduce operations. You
might think that would defeat the purpose, but the situation is that
one cache is distributed and the other replicated, so I won't trigger
remote get operations, as we obviously use the keys of the distributed
one.

So the API seems fine - I hope, as I don't have a prototype - provided
there is some means to obtain a reference to the EmbeddedCacheManager.

Wouldn't it be more useful to have a reference to the Cache object
instead of its name?
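To illustrate the use case, here is a minimal single-JVM sketch of a map callback that also reads a second, replicated cache. Plain HashMaps stand in for the caches, and the names (getCache, mapEntry, the cache names "chunks" and "metadata") are illustrative assumptions, not actual Infinispan API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: map/reduce runs over the *distributed* cache, but the
// callback also consults a *replicated* cache obtained via the cache manager.
// Because the second cache is replicated, that read is always local and never
// triggers a remote get. Plain HashMaps stand in for the real caches.
public class TwoCacheMapSketch {

    // Stand-in for EmbeddedCacheManager.getCache(name)
    static Map<String, Map<String, byte[]>> cacheManager = new HashMap<>();

    static Map<String, byte[]> getCache(String name) {
        return cacheManager.computeIfAbsent(name, n -> new HashMap<>());
    }

    // The map phase: for each entry of the distributed "chunks" cache,
    // also read the matching entry of the replicated "metadata" cache.
    static int mapEntry(Map.Entry<String, byte[]> e) {
        byte[] meta = getCache("metadata").get(e.getKey());
        return e.getValue().length + (meta == null ? 0 : meta.length);
    }

    public static void main(String[] args) {
        getCache("chunks").put("A.0", new byte[]{1, 2, 3});
        getCache("metadata").put("A.0", new byte[]{9});
        int total = 0;
        for (Map.Entry<String, byte[]> e : getCache("chunks").entrySet())
            total += mapEntry(e);
        System.out.println(total); // prints 4
    }
}
```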

Finally, I would find it very useful to be able to declare usage of
keys which don't exist yet - I hope that's not a problem?
In the Lucene case I might know that I'm going to write and rewrite a
file called "A" several times, so I send the operation to the node
which is going to host A, even though this key A might already exist
or might only be created by the task during the process.
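Targeting a key that doesn't exist yet works because, with consistent hashing, the owner of a key is a pure function of the key and the member list. This is a toy sketch of that idea, not Infinispan's actual consistent hash; the node names and ownerOf helper are invented for illustration:

```java
import java.util.List;

// Illustrative only: the node that will own a key is computed from the key
// and the member list alone, so it is well-defined even before any entry
// under that key has been created - which is what routing a task to the
// future owner of "A" relies on.
public class OwnerSketch {

    static String ownerOf(String key, List<String> members) {
        int slot = Math.abs(key.hashCode() % members.size());
        return members.get(slot);
    }

    public static void main(String[] args) {
        List<String> members = List.of("node1", "node2", "node3");
        // "A" need not be present in any cache; its future owner is still known.
        String owner = ownerOf("A", members);
        System.out.println(owner); // prints node3
        // Deterministic: asking again yields the same owner.
        assert owner.equals(ownerOf("A", members));
    }
}
```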

Sanne

2010/12/17 Manik Surtani <manik at jboss.org>:
>
> On 17 Dec 2010, at 11:56, Vladimir Blagojevic wrote:
>
>> On 10-12-17 8:42 AM, Manik Surtani wrote:
>>>
>>> I was just wondering whether such a use case exists or whether we're
>>> just inventing stuff.  :-)  It would lead to a much more cumbersome
>>> API since you'd need to provide a map of cache names to keys, etc.
>>
>> Ok agreed. Unless there is a way to solve this nicely lets go with cache
>> based DT. I'll think more about this over the weekend.
>
> Hmm, just saw Sanne's message.  Ok, so if we were to support multiple caches... how could this look?
>
> interface DistributedTask<K, V, R> {
>
>   Collection<String> cacheNames();
>
>   Collection<K> keys(String cache);
>
>   void map(String cacheName, Map.Entry<K, V> e, DistributedTaskContext <K, V, R> ctx);
>
>   R reduce(Map<Address, DistributedTaskContext<K, V, R>> contexts);
>
> }
>
> interface DistributedTaskContext<K, V, R> {
>
>   String getCacheName();
>
>   // ... map-like methods on K, V
> }
>
> DistributedForkJoinPool p = cacheManager.newDistributedForkJoinPool();
>
> result = p.invoke(new DistributedTask() { ... });
>
> and let the DistributedTask impl provide the names of caches and keys as needed?  Some semantics - a null collection of cache names = all caches, a null collection of keys = all keys?  Or is that a bit shit?  :-)
>
> Cheers
> Manik
>
>>
>> Also, why would you call DT#map for each key specified? I am thinking it
>> should be called once when the task is migrated to the execution node;
>> the context should be filled with the entries on that cache node and
>> passed to map. So call map once per node, with all keys already in context.
>
> Just saves the implementation having to iterate over the collection.  We do the iterating and callback each time.  :-)  Minor point.
>
>> I'd also try to use our own interfaces here rather than pure Map so we
>> can evolve them if we need to - DistributedTaskContext or whatever.....
>
> +1.
>
> --
> Manik Surtani
> manik at jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
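To make the "null collection = everything" semantics proposed above concrete, here is a toy, single-JVM sketch. The two interfaces follow Manik's proposal (with String standing in for Address); the invoke runner, the context implementation, and the emit()/emitted() accumulator methods are hypothetical stand-ins for what a real distributed engine would do:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy local sketch of the proposed API: a null cacheNames() selects every
// cache, a null keys() every key. Nothing here is distributed.
public class NullSemanticsSketch {

    interface DistributedTaskContext<K, V, R> {
        String getCacheName();
        void emit(R partial);   // hypothetical: collect per-cache partial results
        List<R> emitted();
    }

    interface DistributedTask<K, V, R> {
        Collection<String> cacheNames();      // null = all caches
        Collection<K> keys(String cache);     // null = all keys
        void map(String cacheName, Map.Entry<K, V> e,
                 DistributedTaskContext<K, V, R> ctx);
        R reduce(Map<String, DistributedTaskContext<K, V, R>> contexts);
    }

    // Stand-in for the distributed engine: iterate caches and keys locally,
    // applying the null-means-all rule, then hand all contexts to reduce.
    static <K, V, R> R invoke(DistributedTask<K, V, R> task,
                              Map<String, Map<K, V>> caches) {
        Collection<String> names = task.cacheNames();
        if (names == null) names = caches.keySet();          // null = all caches
        Map<String, DistributedTaskContext<K, V, R>> ctxs = new HashMap<>();
        for (String name : names) {
            List<R> partials = new ArrayList<>();
            DistributedTaskContext<K, V, R> ctx = new DistributedTaskContext<>() {
                public String getCacheName() { return name; }
                public void emit(R partial) { partials.add(partial); }
                public List<R> emitted() { return partials; }
            };
            Map<K, V> cache = caches.get(name);
            Collection<K> keys = task.keys(name);
            if (keys == null) keys = cache.keySet();         // null = all keys
            for (K k : keys)
                task.map(name, Map.entry(k, cache.get(k)), ctx);
            ctxs.put(name, ctx);
        }
        return task.reduce(ctxs);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> caches = new HashMap<>();
        caches.put("c1", Map.of("a", "x", "b", "yy"));
        caches.put("c2", Map.of("c", "zzz"));

        // Sum the lengths of every value in every cache, using null/null
        // to mean "all caches, all keys".
        int total = invoke(new DistributedTask<String, String, Integer>() {
            public Collection<String> cacheNames() { return null; }
            public Collection<String> keys(String cache) { return null; }
            public void map(String cacheName, Map.Entry<String, String> e,
                            DistributedTaskContext<String, String, Integer> ctx) {
                ctx.emit(e.getValue().length());
            }
            public Integer reduce(
                    Map<String, DistributedTaskContext<String, String, Integer>> ctxs) {
                int sum = 0;
                for (DistributedTaskContext<String, String, Integer> c : ctxs.values())
                    for (Integer p : c.emitted()) sum += p;
                return sum;
            }
        }, caches);
        System.out.println(total); // prints 6
        assert total == 6;
    }
}
```

Whether "null means everything" is a good API choice is exactly the open question in the thread; the sketch only shows that it is trivially implementable.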
