Hi Manik,
all this aggregating and chaining of Futures looks like the way to go, but
what about streaming?
It would be elegant to think of a completely 'streamed' interface, using
pipes for the asynchronous aspect (a Piped(In/Out)putStream has to be
running in another thread -> the Future's thread?).
Combining Futures and pipes sounds like an even better solution: lower
memory usage and higher throughput!
Being able to stream data in and out of the cache is particularly
interesting for a Qi4j entity store, since we also use marshalling (JSON
for the moment).
And JClouds has a streamed interface, so we could connect all the streams
together (Infinispan extended Map interface -> marshalling -> JClouds) and
fill multiple pipes at the same time... sounds like a dream, a funny one?
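A minimal sketch of that pipe idea using plain JDK classes: a producer
thread fills a PipedOutputStream while the caller reads from the connected
PipedInputStream, and the Future reports completion. All names here are
illustrative, not existing Infinispan or JClouds API.

```java
import java.io.*;
import java.util.concurrent.*;

public class PipedFutureSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // The "marshalling" side runs in another thread, as it must with pipes.
        Future<Void> writer = executor.submit(() -> {
            try (OutputStream os = out) {
                os.write("serialized-entity".getBytes("UTF-8"));
            }
            return null;
        });

        // The consumer (e.g. a store backend) streams the data out of the pipe.
        ByteArrayOutputStream consumed = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) consumed.write(b);

        writer.get(); // surfaces any exception from the producer thread
        System.out.println(consumed.toString("UTF-8"));
        executor.shutdown();
    }
}
```

Because the pipe has a bounded buffer, the producer and consumer naturally
overlap in time, which is where the lower memory footprint comes from.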
Cheers
phil
On Wed, Jan 13, 2010 at 3:22 PM, Manik Surtani <manik(a)jboss.org> wrote:
So I've been spending some time thinking about how we deal with
async tasks
in Infinispan, both from an API perspective as well as an implementation
detail, and wanted to throw a few ideas out there.
First, let's understand the four most expensive things that happen in
Infinispan, operations that are either simply expensive or that could block
under high contention (in descending order): RPC calls, marshalling,
CacheStore access and locking.
We deal with asynchronism in a somewhat haphazard way at the moment, with
each of these functions receiving a different treatment:
1) RPC: using JGroups' ResponseMode of waiting for none.
2) Marshalling: using an async repl executor to take this offline.
3) CacheStore: using an AsyncStore wrapper which places tasks in an
executor.
4) Locking: nothing.
And to add to it,
5) Reads are never asynchronous. E.g., there is no such thing as an async
GET, even if it entails RPC or a CacheStore lookup (which may be a remote
call, such as S3!).
The impact of this approach is that end users never really get to benefit
from the general asynchronism in place, and still need to configure things
in several different places. It also makes dealing with our internal APIs
hard.
Externally, this is quite well encapsulated in the Cache interface, by
offering async methods such as putAsync(), etc. so there would be little to
change here. They return Futures, parameterized to the actual method return
type, e.g., putAsync returns Future<V> in a parameterized Cache<K, V>.
(More precisely, they return a NotifyingFuture, a sub-interface of Future
that allows attaching listeners, but that's a detail.)
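As a toy illustration of that API shape, here is a stand-in async map
whose putAsync() returns a Future parameterized on the value type; this is
not the real org.infinispan.Cache interface, just the pattern it exposes.

```java
import java.util.Map;
import java.util.concurrent.*;

public class AsyncMapSketch<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Mirrors the shape of Cache.putAsync: the previous value, in a Future.
    public Future<V> putAsync(K key, V value) {
        return executor.submit(() -> data.put(key, value));
    }

    public void shutdown() { executor.shutdown(); }

    public static void main(String[] args) throws Exception {
        AsyncMapSketch<String, String> cache = new AsyncMapSketch<>();
        Future<String> first = cache.putAsync("k", "v1");
        String previous = first.get();                     // null: no prior mapping
        String replaced = cache.putAsync("k", "v2").get(); // previous value "v1"
        System.out.println(previous + " / " + replaced);
        cache.shutdown();
    }
}
```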
So I think we should start with that. The user receives a Future. This
Future is an aggregate Future, which should aggregate and block based on
several sub-Futures, one for each of the tasks (1 ~ 4) outlined above. Now
what is the impact of this? Designing such a Future is easy enough, but how
would this change internal components?
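The aggregation itself could be as simple as blocking on each sub-Future in
turn; the sketch below uses a hypothetical awaitAll() helper with dummy
tasks standing in for the RPC, marshalling and store steps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

public class AggregateFutureSketch {
    // get()-style helper: wait for every sub-future, return the primary result.
    static <V> V awaitAll(Future<V> primary, List<Future<?>> others)
            throws InterruptedException, ExecutionException {
        for (Future<?> f : others) f.get(); // block on each sub-task
        return primary.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ex = Executors.newFixedThreadPool(3);
        Future<String> rpc = ex.submit(() -> "ok");   // stands in for RPC
        Future<?> marshalling = ex.submit(() -> 42);  // marshalling task
        Future<?> store = ex.submit(() -> "stored");  // cache store write
        String result = awaitAll(rpc, Arrays.asList(marshalling, store));
        System.out.println(result);
        ex.shutdown();
    }
}
```

A real implementation would wrap this in a Future subtype so cancel(),
isDone() and timed get() fan out to the sub-Futures as well.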
1) RPC. Async RPC is, IMO, broken at the moment. It is unsafe in that it
offers no guarantees that the calls are received by the recipient, and you
have no way of knowing. So RPC should always be synchronous, but wrapped in
a Future so that it is taken offline and the Future can be checked for
success.
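The point of "synchronous RPC wrapped in a Future" is that failure becomes
observable: Future.get() rethrows whatever the blocking call threw. In this
sketch, blockingRpc() is a stand-in for a synchronous JGroups dispatch.

```java
import java.util.concurrent.*;

public class WrappedRpcSketch {
    static String blockingRpc(boolean fail) {
        if (fail) throw new IllegalStateException("recipient unreachable");
        return "ack";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        Future<String> ok = ex.submit(() -> blockingRpc(false));
        Future<String> bad = ex.submit(() -> blockingRpc(true));

        String ack = ok.get(); // "ack": the call demonstrably completed
        String failure = null;
        try {
            bad.get();
        } catch (ExecutionException e) {
            failure = e.getCause().getMessage(); // the failure is observable
        }
        System.out.println(ack + " / " + failure);
        ex.shutdown();
    }
}
```

With fire-and-forget async RPC, the second case would be silently lost;
here the caller can still decide what to do about it.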
2) Marshalling happens offline at the moment and a Future is returned as it
stands, but this could probably be combined with RPC into a single Future,
since this step logically precedes RPC, and RPC relies on marshalling to
complete.
3) The CacheStore interface should only return parameterized Futures.
Implementations that do not offer any async support may simply return a
wrapped value. Calls to this interface that require immediate results would
simply poll the Future. But in all other cases, ranging from the AsyncStore
wrapper to more sophisticated async backends such as JClouds, we get to
harness the async nature offered by the implementation.
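A sketch of the "wrapped value" idea: a synchronous store does its work
inline and hands back an already-completed Future, so callers can treat
every implementation uniformly. The CacheStore interface below is a
simplified illustration, not the real SPI.

```java
import java.util.concurrent.*;

public class FutureStoreSketch {
    interface CacheStore<K, V> {
        Future<V> loadAsync(K key);
    }

    // Synchronous store: does the lookup inline, returns a wrapped value.
    static class SyncStore implements CacheStore<String, String> {
        public Future<String> loadAsync(String key) {
            return CompletableFuture.completedFuture("value-for-" + key);
        }
    }

    public static void main(String[] args) throws Exception {
        CacheStore<String, String> store = new SyncStore();
        Future<String> f = store.loadAsync("k1");
        String v = f.get(); // returns immediately: the Future is already done
        System.out.println(v);
    }
}
```

An async backend would instead submit real work to an executor or use the
backend's own non-blocking client, but the caller's code stays identical.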
4) Locking could be async, but understanding context is important here. We
currently use 2 different types of Locks: JDK reentrant locks for
non-transactional calls, or OwnableReentrantLock for transactional calls
where the lock owner is not the thread but instead the GlobalTransaction.
For this to work, any invocation context should have reference to the
original invoking thread, OwnableReentrantLocks should always be used, and
the owner set accordingly, to ensure an Executor thread doesn't gain
ownership of a lock instead.
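A minimal sketch of the "ownable" idea: ownership is tracked by an
arbitrary owner object (e.g. a GlobalTransaction) rather than by the
acquiring thread, so an executor thread can reacquire on behalf of the
original caller. This is a simplified, non-blocking illustration, not
Infinispan's OwnableReentrantLock.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OwnableLockSketch {
    private final AtomicReference<Object> owner = new AtomicReference<>();

    boolean tryLock(Object requestor) {
        // reentrant for the same owner, regardless of which thread calls
        return owner.compareAndSet(null, requestor) || owner.get() == requestor;
    }

    void unlock(Object requestor) {
        owner.compareAndSet(requestor, null);
    }

    public static void main(String[] args) throws Exception {
        OwnableLockSketch lock = new OwnableLockSketch();
        Object tx = "GlobalTransaction:42"; // stand-in owner identity

        boolean acquired = lock.tryLock(tx);
        // A different thread re-acquires on behalf of the SAME transaction:
        Thread worker = new Thread(() -> {
            if (!lock.tryLock(tx)) throw new AssertionError("owner mismatch");
        });
        worker.start();
        worker.join();

        boolean stranger = lock.tryLock("otherOwner"); // different owner: denied
        System.out.println(acquired + " / " + stranger);
        lock.unlock(tx);
    }
}
```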
The end result of all this is that async API calls will be truly
non-blocking, and will be able to harness as much parallelism as possible in
the system. Another effect is that we'd have a cleaner internal API which
is specifically designed with async operations in mind, and converting any
of the async calls to sync calls is trivial (Future.get()) anywhere along
the invocation chain. Also, I think once we have this in place, any async
transport modes (REPL_ASYNC, DIST_ASYNC) should be deprecated in favour of
an async API, since you get the ability to check that a call completes even
though it happens offline.
And a disclaimer: I have no plans to implement anything like this soon, I
was just throwing this out for comment and distillation. :)
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev