Re: [infinispan-dev] Comments on latest iteration of Distributed Task Execution design

Friday, 30 July 2010

On 30 Jul 2010, at 16:36, Vladimir Blagojevic wrote:

...
>> * Proposed interfaces - Not sure if I understand the purpose of
>> DistributedCallable#mapped().  You already assign a cache to the callable in
>> DistributedCallable#initialized(), right?

> The intent of the initialized() call was to bootstrap the task into its environment at
> the master node, i.e. the node invoking the distribution of callables. The purpose of
> initialized() is twofold. The task implementor can look up the input that was given to
> it in the DistributedCallable constructor and interpret this input (e.g. which keys to
> use, which particular cache, etc.), and in turn inform the execution environment which
> cache it will use for data input, so that the environment can pass the proper
> ConsistentHash to preferredExecutionNodes(). I wanted to spare the task implementor
> from digging deep into Infinispan internals to figure out how to fulfill the
> preferredExecutionNodes() contract. preferredExecutionNodes() is invoked while the task
> is still at the master node, before it migrates to the execution node.
Not sure I understand.  If you are passing in a reference to the cache the Callable is
executed on (in mapped()) and you are passing in a ConsistentHash in
preferredExecutionNodes(), what is the purpose of initialized()?  Why would a Callable
need a reference to the Cache instance at the "master node", as you put it?
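A minimal sketch of the master-node flow Vladimir describes, in plain Java with no Infinispan dependency. Every name here (SimpleConsistentHash, SketchDistributedCallable, SumTask, MasterEnvironment) and every signature is a hypothetical stand-in, not the actual Infinispan API: initialized() only reports which cache the task reads, and the environment uses that answer to hand the matching hash to preferredExecutionNodes(), so the task never looks up Infinispan internals itself.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a ConsistentHash: maps a key to the node that owns it.
interface SimpleConsistentHash {
    String locate(Object key);
}

// Hypothetical shape of the callable discussed in this thread (names and signatures assumed).
interface SketchDistributedCallable {
    // Runs at the master node: tells the environment which cache the task will read from.
    String initialized();

    // Runs at the master node, after initialized(), with the hash of that cache.
    Set<String> preferredExecutionNodes(SimpleConsistentHash hash);
}

// Example task: its constructor input names the cache and the keys it will touch.
class SumTask implements SketchDistributedCallable {
    private final String cacheName;
    private final List<String> keys;

    SumTask(String cacheName, List<String> keys) {
        this.cacheName = cacheName;
        this.keys = keys;
    }

    @Override
    public String initialized() {
        return cacheName;
    }

    @Override
    public Set<String> preferredExecutionNodes(SimpleConsistentHash hash) {
        Set<String> nodes = new TreeSet<>();
        for (String key : keys) {
            nodes.add(hash.locate(key)); // owner of each input key
        }
        return nodes;
    }
}

// Master-side environment: resolves the cache named by initialized() to its hash.
class MasterEnvironment {
    private final Map<String, SimpleConsistentHash> hashesByCache;

    MasterEnvironment(Map<String, SimpleConsistentHash> hashesByCache) {
        this.hashesByCache = hashesByCache;
    }

    Set<String> plan(SketchDistributedCallable task) {
        String cacheName = task.initialized();
        SimpleConsistentHash hash = hashesByCache.get(cacheName);
        if (hash == null) {
            throw new IllegalArgumentException("unknown cache: " + cacheName);
        }
        return task.preferredExecutionNodes(hash);
    }
}
```

In this sketch the cache-to-hash lookup stays on the environment's side of the interface; the task only names its cache and its keys.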

...
> The purpose of mapped() is to signal to the task that it has been migrated to the
> execution node and that execution is about to begin. Here implementors would actually
> load the data that is local to the JVM and needed for the task computation.
Yup, this makes sense.
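A sketch of the execution-node half, again with hypothetical names and signatures: a task whose mapped() hook receives the node-local store (modelled here as a plain Map, which is an assumption) and keeps only the entries this JVM actually holds, so that call() computes over local data only. The word count is just an illustrative computation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical callable: mapped() runs on the execution node, just before call().
class LocalWordCountTask implements Callable<Integer> {
    private final List<String> keys;        // chosen at the master node
    private Map<String, String> localData;  // filled in on the execution node

    LocalWordCountTask(List<String> keys) {
        this.keys = keys;
    }

    // Hypothetical hook: the environment hands over the node-local store after migration.
    void mapped(Map<String, String> localStore) {
        localData = new HashMap<>();
        for (String key : keys) {
            String value = localStore.get(key);
            if (value != null) {
                localData.put(key, value); // only keys held by this JVM
            }
        }
    }

    @Override
    public Integer call() {
        if (localData == null) {
            throw new IllegalStateException("mapped() was not invoked");
        }
        int words = 0;
        for (String value : localData.values()) {
            String trimmed = value.trim();
            if (!trimmed.isEmpty()) {
                words += trimmed.split("\\s+").length;
            }
        }
        return words;
    }
}
```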

...
>> * DistributedCallable#preferredExecutionNodes() - do we really
>> want to support this at this stage?  Or is it better not to support arbitrary node
>> selection by end-user code?

> A task can still do arbitrary node selection in DistributedTask#map(). This goes toward
> fulfilling objective 1 from above.
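To illustrate node selection inside DistributedTask#map(), here is a hypothetical task that splits a range sum into chunks and places them on nodes round-robin. The map()/reduce() signatures, the use of java.util.function.Supplier in place of a distributed callable, and the round-robin policy are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical fork/join-style task: it decides for itself which node runs each piece.
class RangeSumTask {
    private final int from;   // inclusive
    private final int to;     // exclusive
    private final int chunks;

    RangeSumTask(int from, int to, int chunks) {
        if (chunks <= 0 || to < from) {
            throw new IllegalArgumentException("bad range or chunk count");
        }
        this.from = from;
        this.to = to;
        this.chunks = chunks;
    }

    // Hypothetical map(): returns node -> work units, placed round-robin over the members.
    Map<String, List<Supplier<Long>>> map(List<String> clusterNodes) {
        if (clusterNodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes");
        }
        Map<String, List<Supplier<Long>>> plan = new LinkedHashMap<>();
        long size = (long) to - from;
        for (int i = 0; i < chunks; i++) {
            final long lo = from + size * i / chunks;
            final long hi = from + size * (i + 1) / chunks;
            String node = clusterNodes.get(i % clusterNodes.size());
            plan.computeIfAbsent(node, n -> new ArrayList<>()).add(() -> {
                long sum = 0;
                for (long x = lo; x < hi; x++) {
                    sum += x;
                }
                return sum;
            });
        }
        return plan;
    }

    // Hypothetical reduce(): combines the partial sums returned by every node.
    long reduce(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }
}
```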
Well, like I said, this needs to work in the simpler case of remote Callable execution as
well as for a Fork/Join task.  So in this case we should allow the Callable implementor to
either a) provide a list of preferred nodes, or b) provide a set of affected keys and let
us decide which nodes to route the task to, based on the keys provided.
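The either/or contract in a) and b) could look roughly like this. RoutingHints and Router are hypothetical names, the owner lookup is passed in as a plain Function, and two behaviours are choices made only for this sketch: supplying both hints is rejected, and a task with no hints goes to the first member.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical routing hints a Callable could expose; implementors override at most one.
interface RoutingHints<K> {
    // a) explicit nodes to run on; empty means "no preference"
    default Set<String> preferredNodes() {
        return Collections.emptySet();
    }

    // b) keys the task expects to touch; empty means "no preference"
    default Set<K> affectedKeys() {
        return Collections.emptySet();
    }
}

// Hypothetical framework-side router.
final class Router {
    private Router() {}

    static <K> Set<String> route(RoutingHints<K> task, Function<K, String> ownerOf, List<String> members) {
        Set<String> nodes = task.preferredNodes();
        Set<K> keys = task.affectedKeys();
        if (!nodes.isEmpty() && !keys.isEmpty()) {
            // This sketch treats "either/or" strictly and rejects both hints at once.
            throw new IllegalStateException("provide preferred nodes or affected keys, not both");
        }
        if (!nodes.isEmpty()) {
            return new TreeSet<>(nodes);
        }
        Set<String> owners = new TreeSet<>();
        for (K key : keys) {
            owners.add(ownerOf.apply(key)); // the framework, not the task, works out location
        }
        if (!owners.isEmpty()) {
            return owners;
        }
        if (members.isEmpty()) {
            throw new IllegalStateException("empty cluster");
        }
        return new TreeSet<>(Collections.singleton(members.get(0))); // no hints: arbitrary default
    }
}
```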

...
>> Simpler would be to add a DistributedCallable#getRelatedKeys()
>> which returns a Set of keys which the callable would be expected to touch, so we can
>> decide where to route the task.  Or maybe you want to offer both forms, so that a
>> DistributedCallable could *either* provide a set of nodes to execute on or a set of keys
>> which it expects to touch.

> I still like the original approach better :) Maybe it makes more sense now, with this
> elaboration...
Well, that exposes a ConsistentHash instance to the Callable impl.  In most cases, the
Callable wouldn't need to know or care about a ConsistentHash; all it would be concerned
with is the keys it needs, not their location.  We should be able to figure out the
location ourselves.
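What "figure out location ourselves" might mean on the framework side, sketched with a toy consistent-hash ring. This is not Infinispan's ConsistentHash implementation: ToyHashRing, the FNV-1a hash, and the virtual-node count are assumptions chosen for illustration. The callable would only hand over its keys (for instance via the getRelatedKeys() idea above), and groupByOwner() does the placement.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy consistent-hash ring: each node is placed at several points ("virtual nodes").
final class ToyHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    ToyHashRing(List<String> nodes, int virtualNodes) {
        if (nodes.isEmpty() || virtualNodes <= 0) {
            throw new IllegalArgumentException("need at least one node and one point per node");
        }
        for (String node : nodes) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // 32-bit FNV-1a; any reasonably mixing hash would do for this sketch.
    private static int hash(String s) {
        int h = 0x811c9dc5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Owner = first ring point at or after the key's hash, wrapping around to the start.
    String ownerOf(Object key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(String.valueOf(key)));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // What the framework could do with a task's related keys: group them by owning node.
    Map<String, Set<Object>> groupByOwner(Set<?> relatedKeys) {
        Map<String, Set<Object>> byNode = new TreeMap<>();
        for (Object key : relatedKeys) {
            byNode.computeIfAbsent(ownerOf(key), n -> new HashSet<>()).add(key);
        }
        return byNode;
    }
}
```

A property worth noting in this design: when a node joins, a key either keeps its old owner or moves to the new node, so only a fraction of the routing changes.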

Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
