Re: [infinispan-dev] Rethinking asynchronism in Infinispan

Wednesday, 13 January 2010

Manik Surtani wrote:
...
 So I've been spending some time thinking about how we deal with async
 tasks in Infinispan, both from an API perspective as well as an
 implementation detail, and wanted to throw a few ideas out there.

 First, let's understand the 4 most expensive things that happen in
 Infinispan, either simply expensive or things that could block under 
 high contention (in descending order): RPC calls, marshalling, 
 CacheStore and locking 

In my experience, RPC calls (at least async ones) should be much less 
costly than CacheStore and *un*-marshalling (marshalling *is* usually 
fast). Re CacheStore: writing / reading to a disk is slower than even a 
network round trip.

...
 We deal with asynchronism in a somewhat haphazard way at the moment,
 each of these functions receiving a somewhat different treatment:

 1) RPC: Using JGroups' ResponseMode of waiting for none.
 2) Marshalling: using an async repl executor to take this offline

The problem here is that you're pushing the problem of marshalling
further down the line. *Eventually* data has to be marshalled, and 
somebody *has* to block ! IIRC, you used a bounded queue to place 
marshalling tasks onto, so for load peaks this was fine, but for 
constant high load, someone will always block on the (full) queue.
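
To make the bounded-queue point concrete, here is a minimal sketch. It does not use Infinispan's actual executor; the class and method names are made up for illustration. It only shows that a bounded queue absorbs a short spike, but once producers outpace the consumer, every further submission either fails or has to block:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {

    /**
     * Enqueues `tasks` placeholder marshalling jobs into a queue of the given
     * capacity while nothing drains it, and returns how many were accepted
     * without blocking. (Hypothetical helper, for illustration only.)
     */
    public static int acceptedWithoutBlocking(int capacity, int tasks) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < tasks; i++) {
            // offer() returns false when the queue is full;
            // put() would block the calling thread at this point instead.
            if (queue.offer(new byte[0])) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedWithoutBlocking(10, 5));   // spike: prints 5
        System.out.println(acceptedWithoutBlocking(10, 100)); // sustained load: prints 10
    }
}
```

With a consumer draining the queue the numbers shift, but as long as the arrival rate stays above the drain rate the queue still ends up full.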

...
 3) Use an AsyncStore wrapper which places tasks in an executor

Similar issue to above: at some point the thread pool might be full. 
Then you need to start discarding tasks, or block.

Both (2) and (3) handle temporary spikes well though...
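
A small sketch of the "discard or block" choice, using a plain java.util.concurrent.ThreadPoolExecutor rather than the real AsyncStore (the class name and the latch standing in for a slow store are assumptions for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SaturatedStorePool {

    /**
     * Submits `writes` store tasks to a single-threaded pool with a bounded
     * queue while the (simulated) store is stalled, and returns how many
     * tasks the rejection handler had to discard.
     */
    public static int countDiscarded(int queueCapacity, int writes) {
        CountDownLatch storeStalled = new CountDownLatch(1);
        AtomicInteger discarded = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Discarding handler. ThreadPoolExecutor.CallerRunsPolicy would
                // instead run the task on the submitting thread, i.e. block it.
                (task, executor) -> discarded.incrementAndGet());
        try {
            for (int i = 0; i < writes; i++) {
                pool.execute(() -> {
                    try {
                        storeStalled.await(); // simulated slow store write
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return discarded.get();
        } finally {
            storeStalled.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // 1 task held by the worker + 2 queued, so 7 of 10 are discarded.
        System.out.println(countDiscarded(2, 10)); // prints 7
    }
}
```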

...
 4) Nothing

 and to add to it,

 5) READs are never asynchronous. E.g., no such thing as an async GET - 
 even if it entails RPC or a CacheStore lookup (which may be a remote 
 call like S3!)

 The impact of this approach is that end users never really get to 
 benefit from the general asynchronism in place

What is this general asynchronism ? From my view of Infinispan, I don't
see a bias towards asynchronous execution, but I see an API which 
supports both async and sync execution.

...
 and still needs to configure stuff in several different places. And 
 internally, it makes dealing with internal APIs hard.

 Externally, this is quite well encapsulated in the Cache interface, by 
 offering async methods such as putAsync(), etc. so there would be 
 little to change here. They return Futures, parameterized to the 
 actual method return type, e.g., putAsync returns Future<V> in a 
 parameterized Cache<K, V>. (More precisely, they return a 
 NotifyingFuture, a sub-interface of Future that allows attaching 
 listeners, but that's a detail.)

 So I think we should start with that. The user receives a Future. This 
 Future is an aggregate Future, which should aggregate and block based 
 on several sub-Futures, one for each of the tasks (1 ~ 4) outlined 
 above. Now what is the impact of this? Designing such a Future is easy 
 enough, but how would this change internal components?

Are you suggesting the external API remains the same, but internally
futures are used ? Or do you suggest making the use of futures mandatory ?

Can you show a pseudo code sample ?
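
As a rough illustration of what such an aggregate Future might look like: the sketch below is only one possible reading of the proposal, not code from Infinispan. `AggregateFuture`, the stage list and `demo()` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Completes only when every stage (marshalling, RPC, store, ...) has completed. */
public class AggregateFuture<V> implements Future<V> {
    private final List<Future<?>> stages; // sub-futures whose values are ignored
    private final Future<V> result;       // sub-future that yields the return value

    public AggregateFuture(List<Future<?>> stages, Future<V> result) {
        this.stages = stages;
        this.result = result;
    }

    public boolean cancel(boolean mayInterrupt) {
        boolean any = result.cancel(mayInterrupt);
        for (Future<?> s : stages) any |= s.cancel(mayInterrupt);
        return any;
    }

    public boolean isCancelled() { return result.isCancelled(); }

    public boolean isDone() {
        for (Future<?> s : stages) if (!s.isDone()) return false;
        return result.isDone();
    }

    public V get() throws InterruptedException, ExecutionException {
        for (Future<?> s : stages) s.get(); // block on each stage in turn
        return result.get();
    }

    public V get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        for (Future<?> s : stages) s.get(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
        return result.get(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    /** Tiny usage example with two pretend stages. */
    public static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> marshalled = pool.submit(() -> { /* pretend to marshal */ });
            Future<String> rpc = pool.submit(() -> "previous-value");
            List<Future<?>> stages = new ArrayList<>();
            stages.add(marshalled);
            return new AggregateFuture<>(stages, rpc).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```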

...
 1) RPC. Async RPC is, IMO, broken at the moment. It is unsafe in that
 it offers no guarantees that the calls are received by the recipient

No, JGroups guarantees message delivery ! Besides that, async APIs *are*
by definition fire-and-forget (JMS topics), so IMO this is not broken !

Or do you have something akin to persistent JMS messages in mind ?

...
 and you have no way of knowing.

Yes, but that's the name of the game with *async* RPCs ! If you want to
know, use sync RPCs...

...
 So RPC should always be synchronous, but wrapped in a Future so that
 it is taken offline and the Future can be checked for success

Who would check on the future ? For example, pseudo code like this:

Future<String> future=cache.putWithFuture("name", "value");
String prev_val=future.get();

doesn't help, as there is no work done before the future is checked.

Sync RPCs are an order of magnitude slower than async ones, so unless you call
1000 sync RPCs, get 1000 futures and then check on the futures, I don't 
see the benefits of this. And code like this will be harder to write.
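
For reference, the one pattern where futures do pay off (issue many calls first, check afterwards) might look roughly like the sketch below. `fakeSyncRpc` is a hypothetical stand-in for a synchronous replication call, and the pool size of 16 is an arbitrary assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedFutures {

    // Hypothetical stand-in for a synchronous RPC that waits for a response.
    static String fakeSyncRpc(String key) throws InterruptedException {
        Thread.sleep(1);
        return "ack:" + key;
    }

    /** Issues all calls first, then checks every future; returns the number of acks. */
    public static int issueThenCheck(int calls) {
        ExecutorService rpcPool = Executors.newFixedThreadPool(16);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                final String key = "k" + i;
                futures.add(rpcPool.submit(() -> fakeSyncRpc(key)));
            }
            // Only now does the caller block, once per future.
            int acked = 0;
            for (Future<String> f : futures) {
                if (f.get().startsWith("ack:")) {
                    acked++;
                }
            }
            return acked;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            rpcPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(issueThenCheck(1000)); // prints 1000
    }
}
```

Note the extra bookkeeping (the list of futures, the second loop, the exception handling) compared with a plain blocking put().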

...
 2) Marshalling happens offline at the moment and a Future is returned
 as it stands, but this could probably be combined with RPC to a single
 Future since this step is logically before RPC, and RPC relies on
 marshalling to complete

Interesting, can you elaborate more ?

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss
