[infinispan-dev] Rethinking asynchronism in Infinispan
Bela Ban
bban at redhat.com
Wed Jan 13 12:13:21 EST 2010
Manik Surtani wrote:
> So I've been spending some time thinking about how we deal with async
> tasks in Infinispan, both from an API perspective as well as an
> implementation detail, and wanted to throw a few ideas out there.
>
> First, let's understand the 4 most expensive things that happen in
> Infinispan, either simply expensive or things that could block under
> high contention (in descending order): RPC calls, marshalling,
> CacheStore and locking
In my experience, RPC calls (at least async ones) should be much less
costly than CacheStore and *un*-marshalling (marshalling *is* usually
fast). Re CacheStore: writing / reading to a disk is slower than even a
network round trip.
> We deal with asynchronism in a somewhat haphazard way at the moment,
> each of these functions receiving a somewhat different treatment:
>
> 1) RPC: Using JGroups' ResponseMode of waiting for none.
> 2) Marshalling: using an async repl executor to take this offline
The problem here is that you're pushing the problem of marshalling
further down the line. *Eventually* data has to be marshalled, and
somebody *has* to block ! IIRC, you used a bounded queue to place
marshalling tasks onto, so for load peaks this was fine, but for
constant high load, someone will always block on the (full) queue.
> 3) Use an AsyncStore wrapper which places tasks in an executor
Similar issue to above: at some point the thread pool might be full.
Then you need to start discarding tasks, or block.
Both (2) and (3) handle temporary spikes well though...
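
To make the bounded-queue point concrete, here is a minimal sketch in plain java.util.concurrent, not Infinispan's actual executor configuration. The class and method names are made up for illustration. One worker is deliberately stalled, so once the queue fills, every further task is rejected (AbortPolicy); swapping in CallerRunsPolicy would make the submitting thread do the work instead, which is the "somebody has to block" case.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Infinispan or JGroups.
class BoundedQueueSketch {

    /** Submits `tasks` jobs to a single stalled worker backed by a queue of
     *  size `capacity` and returns how many submissions were rejected. */
    static int countRejected(int capacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(capacity),       // bounded queue
                new ThreadPoolExecutor.AbortPolicy());    // reject when full
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task stalls until released, simulating constant high load.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // worker busy and queue full: discard (or block)
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 1 task runs, 2 wait in the queue, the remaining ones are rejected.
        System.out.println("rejected = " + countRejected(2, 5));
    }
}
```

A short spike that fits into the queue is absorbed without anyone noticing; only sustained load beyond the worker's throughput reaches the rejection path.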
> 4) Nothing
>
> and to add to it,
>
> 5) READs are never asynchronous. E.g., no such thing as an async GET -
> even if it entails RPC or a CacheStore lookup (which may be a remote
> call like S3!)
>
> The impact of this approach is that end users never really get to
> benefit from the general asynchronism in place
What is this general asynchronism ? From my view of Infinispan, I don't
see a bias towards asynchronous execution, but I see an API which
supports both async and sync execution.
> and still need to configure stuff in several different places. And
> internally, it makes dealing with internal APIs hard.
>
> Externally, this is quite well encapsulated in the Cache interface, by
> offering async methods such as putAsync(), etc. so there would be
> little to change here. They return Futures, parameterized to the
> actual method return type, e.g., putAsync returns Future<V> in a
> parameterized Cache<K, V>. (More precisely, they return a
> NotifyingFuture, a sub-interface of Future that allows attaching
> listeners, but that's a detail.)
>
> So I think we should start with that. The user receives a Future. This
> Future is an aggregate Future, which should aggregate and block based
> on several sub-Futures, one for each of the tasks (1 ~ 4) outlined
> above. Now what is the impact of this? Designing such a Future is easy
> enough, but how would this change internal components?
Are you suggesting the external API remains the same, but internally
futures are used ? Or do you suggest to make use of futures mandatory ?
Can you show a pseudo code sample ?
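
For reference, one possible reading of the "aggregate Future" idea, sketched with java.util.concurrent.CompletableFuture (a later JDK API, used here only for illustration). This is an assumption about what is meant, not Infinispan code; the class name, the aggregate() helper and the stage names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration class; not Infinispan's NotifyingFuture.
class AggregateFutureSketch {

    /** Returns a future that yields `result`'s value, but only once every
     *  sub-stage (e.g. RPC, marshalling, CacheStore, locking) has completed.
     *  If any stage fails, the aggregate fails as well. */
    static CompletableFuture<String> aggregate(CompletableFuture<String> result,
                                               CompletableFuture<?>... stages) {
        return CompletableFuture.allOf(stages)
                .thenCombine(result, (ignored, value) -> value);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> rpc = new CompletableFuture<>();
        CompletableFuture<Void> store = new CompletableFuture<>();
        CompletableFuture<String> previous = CompletableFuture.completedFuture("old");

        CompletableFuture<String> user = aggregate(previous, rpc, store);
        System.out.println("done before stages finish? " + user.isDone());

        rpc.complete(null);   // simulated RPC acknowledgement
        store.complete(null); // simulated CacheStore write
        System.out.println("previous value: " + user.join());
    }
}
```

The caller sees a single Future; whether the sub-stages are exposed at all is exactly the API question raised above.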
> 1) RPC. Async RPC is, IMO, broken at the moment. It is unsafe in that
> it offers no guarantees that the calls are received by the recipient
No, JGroups guarantees message delivery ! Besides that, async APIs *are*
by definition fire-and-forget (JMS topics), so IMO this is not broken !
Or do you have something akin to persistent JMS messages in mind ?
> and you have no way of knowing.
Yes, but that's the name of the game with *async* RPCs ! If you want to
know, use sync RPCs...
> So RPC should always be synchronous, but wrapped in a Future so that
> it is taken offline and the Future can be checked for success
Who would check on the future ? Pseudo code like this:

    Future<String> future=cache.putWithFuture("name", "value");
    String prev_val=future.get();

doesn't help, as there is no work done before the future is checked.
Sync RPCs are an order of magnitude slower than async ones, so unless you call
1000 sync RPCs, get 1000 futures and then check on the futures, I don't
see the benefits of this. And code like this will be harder to write.
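
The batched pattern described above (issue many calls, collect the futures, check them afterwards) could look roughly like the sketch below. The "remote" cache is simulated with a local ConcurrentHashMap and a thread pool, so the names, the pool size of 8 and the put semantics are assumptions for illustration, not the Infinispan API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; the "remote" map stands in for a cluster.
class BatchedFuturesSketch {

    /** Fires `n` simulated sync puts in the background, then waits for all
     *  of them; returns the number of entries that ended up stored. */
    static int putAllThenCheck(int n) {
        Map<String, String> remote = new ConcurrentHashMap<>();
        ExecutorService rpcPool = Executors.newFixedThreadPool(8);
        try {
            // Phase 1: issue every call without waiting for any of them.
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                String key = "key-" + i;
                futures.add(CompletableFuture.supplyAsync(
                        () -> remote.put(key, "value"), rpcPool));
            }
            // Phase 2: only now check on the futures.
            for (CompletableFuture<String> f : futures) {
                f.join(); // throws if the simulated call failed
            }
            return remote.size();
        } finally {
            rpcPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("stored = " + putAllThenCheck(1000));
    }
}
```

The gain comes only from overlapping the calls; calling get() right after each put degenerates into a plain sync RPC.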
> 2) Marshalling happens offline at the moment and a Future is returned
> as it stands, but this could probably be combined with RPC to a single
> Future since this step is logically before RPC, and RPC relies on
> marshalling to complete
Interesting, can you elaborate more ?
--
Bela Ban
Lead JGroups / Clustering Team
JBoss