[infinispan-dev] Rethinking asynchronism in Infinispan

Philippe Van Dyck pvdyck at gmail.com
Wed Jan 13 10:29:37 EST 2010


Hi Manik,


All this aggregating and chaining of Futures looks like the way to go, but
what about streaming?

It would be elegant to think of a completely 'streamed' interface, using
Pipes for the asynchronous aspect (a Piped(In/Out)putStream has to run in
another thread -> the Future's thread?).

Combining Futures and Pipes sounds like an even better solution: lower
memory usage and higher throughput!
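
Something along these lines, for instance - just a rough sketch, where the
JSON string stands in for the real marshaller and the executor for whatever
thread ends up backing the Future:

    import java.io.*;
    import java.util.concurrent.*;

    // Rough sketch: the marshaller writes into a pipe on a worker thread,
    // while the caller (or the next stage, e.g. a JClouds upload) reads the
    // other end.
    public class PipedMarshallingSketch {
        public static void main(String[] args) throws Exception {
            ExecutorService executor = Executors.newSingleThreadExecutor();

            final PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);

            // The writer has to run on a different thread from the reader,
            // otherwise the pipe deadlocks once its buffer fills up.
            Future<Void> writeDone = executor.submit(new Callable<Void>() {
                public Void call() throws Exception {
                    try {
                        out.write("{\"key\":\"value\"}".getBytes("UTF-8")); // stand-in for JSON marshalling
                    } finally {
                        out.close();
                    }
                    return null;
                }
            });

            // Consume the stream as it is produced (here we just collect and print it).
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                collected.write(buf, 0, n);
            }
            writeDone.get();   // surfaces any exception thrown by the writer
            System.out.println(collected.toString("UTF-8"));

            executor.shutdown();
        }
    }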

Being able to stream data in and out of the cache is particularly
interesting for a Qi4j EntityStore, since we also use marshalling (JSON for
the moment).

And JClouds has a streamed interface, so we could connect all the streams
together (Infinispan's extended Map interface -> marshalling -> JClouds) and
fill multiple pipes at the same time... sounds like a dream, a funny one?

Cheers

phil


On Wed, Jan 13, 2010 at 3:22 PM, Manik Surtani <manik at jboss.org> wrote:

> So I've been spending some time thinking about how we deal with async tasks
> in Infinispan, both from an API perspective as well as an implementation
> detail, and wanted to throw a few ideas out there.
>
> First, let's understand the 4 most expensive things that happen in
> Infinispan, either simply expensive or things that could block under high
> contention (in descending order): RPC calls, marshalling, CacheStore and
> locking.
>
> We deal with asynchronism in a somewhat haphazard way at the moment, with
> each of these functions receiving a different treatment:
>
> 1) RPC: using JGroups' ResponseMode of waiting for none.
> 2) Marshalling: using an async repl executor to take this offline.
> 3) CacheStore: using an AsyncStore wrapper which places tasks in an executor.
> 4) Locking: nothing.
>
> and to add to it,
>
> 5) READs are never asynchronous.  E.g., no such thing as an async GET -
> even if it entails RPC or a CacheStore lookup (which may be a remote call
> like S3!)
>
> The impact of this approach is that end users never really get to benefit
> from the general asynchronism in place, and still need to configure things
> in several different places.  Internally, it makes our APIs hard to deal
> with.
>
> Externally, this is quite well encapsulated in the Cache interface, which
> offers async methods such as putAsync(), so there would be little to change
> here.  They return Futures parameterized to the actual method return type,
> e.g., putAsync() returns Future<V> on a parameterized Cache<K, V>.  (More
> precisely, they return a NotifyingFuture, a sub-interface of Future that
> allows attaching listeners, but that's a detail.)
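>
> For example, from the caller's side it already looks like this (assuming a
> default local cache; package names from memory, so treat them as
> approximate):
>
>     import org.infinispan.Cache;
>     import org.infinispan.manager.DefaultCacheManager;
>     import org.infinispan.util.concurrent.NotifyingFuture;
>
>     // Minimal sketch of the caller's view of the existing async API.
>     public class AsyncPutExample {
>         public static void main(String[] args) throws Exception {
>             DefaultCacheManager cm = new DefaultCacheManager();
>             Cache<String, String> cache = cm.getCache();
>
>             NotifyingFuture<String> f = cache.putAsync("k", "v");
>             // ... do other work, then block only when the result is needed ...
>             String previous = f.get();   // null on first insert
>
>             cm.stop();
>         }
>     }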
>
> So I think we should start with that.  The user receives a Future.  This
> Future is an aggregate Future, which should aggregate and block based on
> several sub-Futures, one for each of the tasks (1 ~ 4) outlined above.  Now
> what is the impact of this?  Designing such a Future is easy enough, but how
> would this change internal components?
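>
> As an aside, the aggregate Future itself could look something like this -
> just a sketch, not code from the tree:
>
>     import java.util.*;
>     import java.util.concurrent.*;
>
>     // Sketch: get() blocks until every sub-task (marshalling, RPC, store, ...)
>     // has finished, then returns the value of the operation itself.
>     public class AggregateFuture<V> implements Future<V> {
>         private final Future<V> result;           // sub-future carrying the return value
>         private final List<Future<?>> subFutures;
>
>         public AggregateFuture(Future<V> result, Future<?>... others) {
>             this.result = result;
>             this.subFutures = Arrays.asList(others);
>         }
>
>         public V get() throws InterruptedException, ExecutionException {
>             for (Future<?> f : subFutures) f.get();   // propagates failure of any stage
>             return result.get();
>         }
>
>         public V get(long timeout, TimeUnit unit)
>                 throws InterruptedException, ExecutionException, TimeoutException {
>             long deadline = System.nanoTime() + unit.toNanos(timeout);
>             for (Future<?> f : subFutures)
>                 f.get(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
>             return result.get(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
>         }
>
>         public boolean cancel(boolean mayInterruptIfRunning) {
>             boolean cancelled = result.cancel(mayInterruptIfRunning);
>             for (Future<?> f : subFutures) cancelled &= f.cancel(mayInterruptIfRunning);
>             return cancelled;
>         }
>
>         public boolean isCancelled() { return result.isCancelled(); }
>
>         public boolean isDone() {
>             for (Future<?> f : subFutures) if (!f.isDone()) return false;
>             return result.isDone();
>         }
>     }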
>
> 1)  RPC.  Async RPC is, IMO, broken at the moment.  It is unsafe in that it
> offers no guarantees that the calls are received by the recipient, and you
> have no way of knowing.  So RPC should always be synchronous, but wrapped in
> a Future so that it is taken offline and the Future can be checked for
> success.
>
> 2) Marshalling happens offline at the moment and a Future is returned as it
> stands, but this could probably be combined with RPC into a single Future,
> since this step logically comes before RPC and RPC relies on marshalling to
> complete.
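>
> Something like this, roughly - marshall() and sendSync() are just stand-ins
> for the real Marshaller and Transport calls:
>
>     import java.util.concurrent.*;
>
>     // Sketch for (1) + (2): marshalling and a *synchronous* RPC run inside one
>     // task, so the caller gets a single Future covering both stages.
>     public class OfflineSyncRpc {
>         static byte[] marshall(Object o) { return o.toString().getBytes(); }       // placeholder
>         static Object sendSync(byte[] payload) { return "ACK:" + payload.length; } // placeholder
>
>         public static void main(String[] args) throws Exception {
>             ExecutorService asyncExecutor = Executors.newCachedThreadPool();
>             final Object command = "put(k, v)";
>
>             Future<Object> rpcFuture = asyncExecutor.submit(new Callable<Object>() {
>                 public Object call() {
>                     byte[] payload = marshall(command);   // marshalling, logically first
>                     return sendSync(payload);             // sync RPC: we know it was received
>                 }
>             });
>
>             // the invoking thread carries on; failure of either stage surfaces here
>             System.out.println(rpcFuture.get());
>             asyncExecutor.shutdown();
>         }
>     }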
>
> 3) The CacheStore interface should only return parameterized Futures.
>  Implementations that do not offer any async support may simply return a
> wrapped value.  Calls to this interface that require immediate results would
> simply poll the Future.  But in all other cases - ranging from the
> AsyncStore wrapper to more sophisticated async backends such as JClouds - we
> get to harness the async nature offered by the impl.
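>
> The shape of that SPI could be something like the following (hypothetical,
> just to illustrate the "wrapped value" idea):
>
>     import java.util.concurrent.*;
>
>     // Hypothetical Future-only store SPI.  Synchronous implementations simply
>     // return an already-completed Future; async backends (AsyncStore, JClouds)
>     // return whatever their implementation hands them.
>     interface FutureCacheStore {
>         Future<Void> store(Object key, Object value);
>         Future<Object> load(Object key);
>     }
>
>     // The "wrapped value" for stores with no async support of their own.
>     class CompletedFuture<T> implements Future<T> {
>         private final T value;
>         CompletedFuture(T value) { this.value = value; }
>         public T get() { return value; }
>         public T get(long timeout, TimeUnit unit) { return value; }
>         public boolean cancel(boolean mayInterruptIfRunning) { return false; }
>         public boolean isCancelled() { return false; }
>         public boolean isDone() { return true; }
>     }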
>
> 4) Locking could be async, but understanding context is important here.  We
> currently use 2 different types of locks: JDK reentrant locks for
> non-transactional calls, and OwnableReentrantLocks for transactional calls,
> where the lock owner is not the thread but the GlobalTransaction.  For this
> to work, any invocation context should have a reference to the original
> invoking thread, OwnableReentrantLocks should always be used, and the owner
> set accordingly, to ensure an Executor thread doesn't gain ownership of a
> lock instead.
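>
> In code the idea is roughly this - OwnableLock is a stand-in, not the real
> OwnableReentrantLock API:
>
>     import java.util.concurrent.*;
>     import java.util.concurrent.locks.*;
>
>     // Sketch: the logical owner (caller thread or GlobalTransaction) travels
>     // with the task, so a lock acquired on an Executor thread still belongs to
>     // the caller rather than to the worker thread.
>     public class OwnerAwareLocking {
>         static class OwnableLock {
>             private final ReentrantLock delegate = new ReentrantLock();
>             private volatile Object owner;
>             void lock(Object requestor)   { delegate.lock(); owner = requestor; }
>             void unlock(Object requestor) {
>                 if (!requestor.equals(owner)) throw new IllegalMonitorStateException();
>                 owner = null;
>                 delegate.unlock();
>             }
>         }
>
>         public static void main(String[] args) throws Exception {
>             final OwnableLock lock = new OwnableLock();
>             final Object owner = Thread.currentThread();   // or the GlobalTransaction
>             ExecutorService executor = Executors.newSingleThreadExecutor();
>
>             Future<?> f = executor.submit(new Runnable() {
>                 public void run() {
>                     lock.lock(owner);          // owned by the caller, not the worker
>                     try {
>                         // ... apply the write under the lock ...
>                     } finally {
>                         lock.unlock(owner);
>                     }
>                 }
>             });
>             f.get();
>             executor.shutdown();
>         }
>     }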
>
> The end result of all this is that async API calls will be truly
> non-blocking, and will be able to harness as much parallelism as possible in
> the system.  Another effect is that we'd have a cleaner internal API which
> is specifically designed with async operations in mind, and converting any
> of the async calls to sync calls is trivial (Future.get()) anywhere along
> the invocation chain.  Also, I think once we have this in place, the async
> transport modes (REPL_ASYNC, DIST_ASYNC) should be deprecated in favour of
> the async API, since you get the ability to check that a call completes even
> though it happens offline.
>
> And a disclaimer: I have no plans to implement anything like this soon, I
> was just throwing this out for comment and distillation.  :)
>
> Cheers
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>