On Dec 27, 2010, at 5:25 PM, Vladimir Blagojevic wrote:
Hey,
I spent the last week working on concrete API proposals for the distributed
execution framework. I believe that we are close to finalizing the
proposal, and your input and feedback are important now! Here are the main
ideas where I think we have made progress since we last talked.
Access to multiple caches during task execution
While we have agreed to allow access to multiple caches during task
execution, including this logic in the task API complicates it greatly. The
compromise I found is to focus the whole API on one specific cache but
allow access to other caches through the DistributedTaskContext API. The
focus on one specific cache and its input keys will allow us to
properly CH (consistent hash) map task units across the Infinispan cluster
and will cover most of the use cases. DistributedTaskContext can also
easily be mapped to a single cache. See DistributedTask and
DistributedTaskContext for more details.
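To make the "CH map task units" idea concrete, here is a minimal sketch of how the input keys of a single cache could be grouped into one work unit per owning node. It is only an illustration: KeyToNodeMapper, ownerOf and split are hypothetical names, and the toy hash ring below is not Infinispan's actual consistent hash implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from the proposal or from Infinispan.
final class KeyToNodeMapper {
    // Hash ring: position on the ring -> node name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    KeyToNodeMapper(List<String> nodes) {
        for (String node : nodes) {
            ring.put(spread(node.hashCode()), node);
        }
    }

    // The owner is the first node at or after the key's ring position, wrapping around.
    String ownerOf(Object key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(spread(key.hashCode()));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Groups the task's input keys by owning node: one work unit per node.
    Map<String, List<Object>> split(List<?> inputKeys) {
        Map<String, List<Object>> units = new LinkedHashMap<>();
        for (Object key : inputKeys) {
            units.computeIfAbsent(ownerOf(key), n -> new ArrayList<>()).add(key);
        }
        return units;
    }

    // Cheap bit mixer so that similar hash codes land far apart on the ring.
    private static int spread(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }
}
```

Because every input key belongs to the same cache, a single ownership function is enough to decide where each unit runs. With keys spread over several caches, each with its own ownership, that grouping would presumably no longer be well defined.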
Maybe I'm reading this wrong, but are you saying that multiple caches cause problems
with the mapping of task units to nodes in the cluster?
Or are you just doing it so as not to clutter the API?
I think DistributedTaskContext extending CacheContainer is rather confusing, particularly
when DistributedTaskContext has K,V parameters, which are generally associated with Cache
rather than CacheContainer.
Also, why is a context iterable? Does it iterate the contents of a CacheContainer? extends
generally means "is a". AFAIK, you'd be able to iterate a Map
or Cache, but not a CacheContainer.
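One way to picture the alternative is a context that holds a container rather than extending one. The sketch below is only an assumption about how that could look: plain Maps stand in for caches, and TaskContextSketch, Context, cache and otherCache are hypothetical names, not part of the proposal.

```java
import java.util.Map;

// Hypothetical sketch: plain Maps stand in for caches; names are illustrative only.
final class TaskContextSketch {

    // The context *has* a container instead of *being* one, and it is not Iterable.
    // K and V describe the primary cache only, which is where they naturally belong.
    static final class Context<K, V> {
        private final Map<K, V> primary;
        private final Map<String, Map<?, ?>> container;

        Context(Map<K, V> primary, Map<String, Map<?, ?>> container) {
            this.primary = primary;
            this.container = container;
        }

        // The cache the task is focused on; iterate this if iteration is needed.
        Map<K, V> cache() {
            return primary;
        }

        // Explicit, by-name access to any other cache in the container.
        Map<?, ?> otherCache(String name) {
            return container.get(name);
        }
    }
}
```

With this shape, anything that needs iterating is reached through cache(), so the context itself does not have to pretend to be a collection.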
DistributedTask and DistributedCallable
I found it useful to separate the general task characteristics from the
details of the actual work/computation. Therefore the main task
characteristics are specified through the DistributedTask API, and the
details of the actual task computation are specified through the
DistributedCallable API.
DistributedTask specifies coarse task details (the failover policy, the
task splitting policy, the cancellation policy and so on), while
DistributedCallable implementers focus on the actual details of a
computation/work unit.
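As a rough illustration of that split, the sketch below keeps the coarse, declarative settings in one object and the computation in a plain Callable. Everything in it is hypothetical (TaskSplitSketch, TaskSettings, FailoverPolicy, WordCount, Task); the real interfaces are the ones in the linked branches.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: these types only illustrate the separation of concerns.
final class TaskSplitSketch {

    enum FailoverPolicy { FAIL_FAST, RETRY_ON_OTHER_NODE }

    // Coarse task characteristics: declarative, no computation here.
    static final class TaskSettings {
        final FailoverPolicy failover;
        final boolean cancellable;

        TaskSettings(FailoverPolicy failover, boolean cancellable) {
            this.failover = failover;
            this.cancellable = cancellable;
        }
    }

    // The work unit: only the computation, no policy decisions.
    static final class WordCount implements Callable<Integer> {
        private final String text;

        WordCount(String text) {
            this.text = text;
        }

        @Override
        public Integer call() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // A task pairs the settings with the work unit it governs.
    static final class Task<R> {
        final TaskSettings settings;
        final Callable<R> work;

        Task(TaskSettings settings, Callable<R> work) {
            this.settings = settings;
            this.work = work;
        }
    }
}
```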
Personally, I think DistributedTask has too many generics (K, V, T, R) and it's hard
to read. IMO, only T and R should exist. I would also try to stick to the Callable
convention, which takes a V.
I don't like to see things like this; it reminds me of EJB 2.1, where you were forced to
implement a method simply to get hold of a ctx. There are much nicer ways to do things
like this, if completely necessary (see EJB3):
    @Override
    public void mapped(DistributedTaskContext<String, String> ctx) {
        this.ctx = ctx;
    }
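For comparison, an EJB3-style approach would let the framework inject the context into an annotated field, so user code never implements a callback just to receive it. The sketch below is an assumption about how that might look; the @TaskContext annotation, the Context stand-in and the inject helper are all invented for illustration and do not exist in the proposal.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.concurrent.Callable;

// Hypothetical sketch: the annotation and the injector are invented for illustration.
final class ContextInjectionSketch {

    // Marks the field that should receive the task context.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface TaskContext {}

    // Stand-in for DistributedTaskContext.
    static final class Context {
        final String cacheName;

        Context(String cacheName) {
            this.cacheName = cacheName;
        }
    }

    // User code: a plain Callable with no mapped(ctx) callback to implement.
    static final class MyTask implements Callable<String> {
        @TaskContext
        private Context ctx;

        @Override
        public String call() {
            return "running against " + ctx.cacheName;
        }
    }

    // What the framework could do on the executing node before invoking call().
    static void inject(Object task, Context ctx) {
        for (Field f : task.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(TaskContext.class) && f.getType() == Context.class) {
                try {
                    f.setAccessible(true);
                    f.set(task, ctx);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot inject task context", e);
                }
            }
        }
    }
}
```

The user-visible surface shrinks to one annotation, and the mapped() method disappears from the callable interface.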
Looking at the example provided, it seems to me that all DistributedTaskContext is used
for is to navigate the Cache contents from a user-defined callable, in which case I would
limit its scope.
Finally, what is the aim of the write() methods in DTC? I don't see them in use in the
given example. If they're not meant to be used by the user and are only for internal
framework use, I would not leak them.
I have updated the original document [1] to reflect the API update. You can
see the actual proposal in git here [2], and I have also included a
variation of this approach [3] that separates the map and reduce task phases
with separate interfaces and removes the DistributedCallable interface. I
have also kept Trustin's ideas in another proposal [4], since I would
like to include them as well if possible.
Regards,
Vladimir
[1]
http://community.jboss.org/wiki/InfinispanDistributedExecutionFramework
[2]
https://github.com/vblagoje/infinispan/tree/t_ispn-39_master_prop1
[3]
https://github.com/vblagoje/infinispan/tree/t_ispn-39_master_prop2
[4]
https://github.com/vblagoje/infinispan/tree/t_ispn-39_master_prop3
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache