[infinispan-dev] Comments on latest iteration of Distributed Task Execution design

Fri Jul 30 11:50:17 EDT 2010

On 30 Jul 2010, at 16:36, Vladimir Blagojevic wrote:

>> * Proposed interfaces - Not sure if I understand the purpose of DistributedCallable#mapped().  You already assign a cache to the callable in DistributedCallable#initialized(), right?
> 
> The intent of the initialized() call was to bootstrap the task into the environment at the master node, i.e. the node invoking the distribution of callables. Its purpose is twofold. The task implementor can look up the input given to it in the constructor of DistributedCallable and interpret this input (e.g. which keys to use, which particular cache, etc.), and in turn inform the execution environment which cache it will use for data input, so that the proper ConsistentHash can be passed in to preferredExecutionNodes() by the environment. I wanted to isolate the task implementor from digging deep into Infinispan internals to figure out how to fulfill the preferredExecutionNodes() contract. preferredExecutionNodes() is invoked while the task is still at the master node, before migration to the execution node.

Not sure I understand.  If you are passing in a reference to the cache the Callable is executed on (in mapped()) and you are passing in a ConsistentHash in preferredExecutionNodes(), what is the purpose of initialized()?  Why would a Callable need a reference to the Cache instance at the "master node", as you put it?

> The purpose of mapped() is to signal to the task that it has been migrated to the execution node and that execution is about to begin. Here implementors would actually load the data which is local to the JVM and needed for the task's computation.

Yup, this makes sense.
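
For reference, the callback order described in the quoted text could be sketched roughly as follows. This is only an illustration under assumptions: SketchDistributedCallable and SumTask are hypothetical names, a plain Map stands in for the cache, a Function from key to node name stands in for the ConsistentHash, and the migration of the task between nodes is not modelled at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.function.Function;

// Hypothetical shape of the callbacks discussed in this thread; NOT the real Infinispan interface.
interface SketchDistributedCallable<K, V, R> extends Callable<R> {
    // Master node: bootstrap the task before it is distributed.
    void initialized(Map<K, V> cacheAtMaster);

    // Master node, before migration: a key-to-node function stands in for ConsistentHash.
    Set<String> preferredExecutionNodes(Function<K, String> hash);

    // Execution node: called just before call(), so local data can be loaded.
    void mapped(Map<K, V> cacheAtExecutionNode);
}

// Hypothetical task that sums the values stored under the keys given to its constructor.
final class SumTask implements SketchDistributedCallable<String, Integer, Integer> {
    private final List<String> keys;
    private final List<Integer> loaded = new ArrayList<>();
    final List<String> callbackOrder = new ArrayList<>();

    SumTask(List<String> keys) {
        this.keys = keys;
    }

    @Override
    public void initialized(Map<String, Integer> cacheAtMaster) {
        callbackOrder.add("initialized");
    }

    @Override
    public Set<String> preferredExecutionNodes(Function<String, String> hash) {
        callbackOrder.add("preferredExecutionNodes");
        Set<String> nodes = new LinkedHashSet<>();
        for (String key : keys) {
            nodes.add(hash.apply(key));
        }
        return nodes;
    }

    @Override
    public void mapped(Map<String, Integer> cacheAtExecutionNode) {
        callbackOrder.add("mapped");
        for (String key : keys) {
            Integer value = cacheAtExecutionNode.get(key);
            if (value != null) {
                loaded.add(value); // only data present in the local map is used
            }
        }
    }

    @Override
    public Integer call() {
        int sum = 0;
        for (int value : loaded) {
            sum += value;
        }
        return sum;
    }
}
```

In this sketch the environment would call initialized() and preferredExecutionNodes() on the master, ship the task, and then call mapped() followed by call() on each execution node.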

>> * DistributedCallable#preferredExecutionNodes() - do we really want to support this at this stage?  Or is it better to not support arbitrary node selection by end-user code?  
> 
> A task can still do arbitrary node selection in DistributedTask#map(). This goes towards fulfilling objective 1 from above.

Well, like I said, this needs to work in a simpler, remote Callable execution case as well as a Fork/Join task.  So in this case we should allow the Callable implementor to either a) provide a list of preferred nodes, or b) provide a set of affected keys and let us decide which nodes to route the task to, based on the keys provided.
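
A minimal sketch of that either/or contract, assuming hypothetical names throughout (RoutableCallable, TaskRouter, node addresses as plain Strings, and a key-to-owner Function supplied by the framework). Falling back to all nodes when the task provides neither hint is an assumption of this sketch, not something settled in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.function.Function;

// Hypothetical interface: a task may override either hint, or neither.
interface RoutableCallable<K, R> extends Callable<R> {
    // Option a): explicit list of preferred nodes.
    default List<String> preferredNodes() {
        return Collections.emptyList();
    }

    // Option b): keys the task expects to touch; the framework picks the nodes.
    default Set<K> relatedKeys() {
        return Collections.emptySet();
    }
}

// Hypothetical framework-side routing decision.
final class TaskRouter {
    static <K> List<String> route(RoutableCallable<K, ?> task,
                                  Function<K, String> ownerOf,
                                  List<String> allNodes) {
        List<String> preferred = task.preferredNodes();
        if (!preferred.isEmpty()) {
            return new ArrayList<>(preferred); // explicit node choice wins
        }
        Set<String> targets = new LinkedHashSet<>();
        for (K key : task.relatedKeys()) {
            targets.add(ownerOf.apply(key)); // one entry per distinct owner
        }
        // Assumption of this sketch: with no hints at all, run everywhere.
        return targets.isEmpty() ? new ArrayList<>(allNodes) : new ArrayList<>(targets);
    }
}
```

With this shape a plain remote Callable only overrides preferredNodes(), while a data-driven task only overrides relatedKeys() and never sees where its keys live.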

>> Simpler would be to add a DistributedCallable#getRelatedKeys() which returns a Set of keys which the callable would be expected to touch, so we can decide where to route the task.  Or maybe you want to offer both forms, so that a DistributedCallable could *either* provide a set of nodes to execute on or a set of keys which it expects to touch.
> 
> I still like the original approach better :) Maybe it makes more sense with this elaboration now....

Well, that exposes a ConsistentHash instance to the Callable impl.  In most cases, the Callable wouldn't need to know or care about a ConsistentHash, all it would be concerned with is the keys it needs.  Not the location.  We should be able to figure out location ourselves.
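
To illustrate what "figure out location ourselves" could look like, here is a toy hash ring that turns a set of keys into a node-to-keys mapping without the Callable ever touching it. ToyConsistentHash is a hypothetical stand-in written for this example; it is not Infinispan's ConsistentHash and it ignores replicas, virtual nodes and rehashing.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy hash ring, for illustration only.
final class ToyConsistentHash {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    ToyConsistentHash(Collection<String> nodes) {
        for (String node : nodes) {
            // Nodes whose positions collide would overwrite each other; acceptable for a sketch.
            ring.put(spread(node.hashCode()), node);
        }
    }

    // Owner of a key: first node at or after the key's position, wrapping around the ring.
    String locate(Object key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(spread(key.hashCode()));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Framework-side step: group the keys a task declared by the node that owns them.
    Map<String, Set<Object>> groupByOwner(Collection<?> keys) {
        Map<String, Set<Object>> byNode = new TreeMap<>();
        for (Object key : keys) {
            byNode.computeIfAbsent(locate(key), n -> new LinkedHashSet<>()).add(key);
        }
        return byNode;
    }

    // Cheap bit mixing so similar hash codes do not cluster on the ring.
    private static int spread(int h) {
        return h ^ (h >>> 16);
    }
}
```

The task would only hand over its keys; the keys of the map returned by groupByOwner() are then the nodes to route to.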

Cheers
Manik
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org