[infinispan-dev] Rethinking asynchronism in Infinispan

Fri Jan 15 04:09:56 EST 2010

On 01/13/2010 06:56 PM, Manik Surtani wrote:
>
> On 13 Jan 2010, at 17:13, Bela Ban wrote:
>
>>
>>
>> Manik Surtani wrote:
>>> So I've been spending some time thinking about how we deal with async
>>> tasks in Infinispan, both from an API perspective as well as an
>>> implementation detail, and wanted to throw a few ideas out there.
>>>
>>> First, let's understand the 4 most expensive things that happen in
>>> Infinispan, either simply expensive or things that could block under
>>> high contention (in descending order): RPC calls, marshalling,
>>> CacheStore and locking.
>>
>>
>> In my experience, RPC calls (at least async ones) should be much less
>> costly than CacheStore and *un*-marshalling (marshalling *is* usually
>> fast). Re CacheStore: writing / reading to a disk is slower than even a
>> network round trip.
>
> Well, it depends on the case: network saturation, load, and which of the two handles concurrency better.  But either way, these are expensive things that can and should be parallelized.
>
>>> We deal with asynchronism in a somewhat haphazard way at the moment,
>>> each of these functions receiving a somewhat different treatment:
>>>
>>> 1) RPC: Using JGroups' ResponseMode of waiting for none.
>>> 2) Marshalling: using an async repl executor to take this offline
>>
>> The problem here is that you're pushing the problem of marshalling
>> further down the line. *Eventually* data has to be marshalled, and
>> somebody *has* to block ! IIRC, you used a bounded queue to place
>> marshalling tasks onto, so for load peaks this was fine, but for
>> constant high load, someone will always block on the (full) queue.
>
> That's where it happens right now.  Just before RPC.  Now perhaps there is some sense in understanding that more than one subsystem may need a marshalled representation of an entry (e.g., the RpcManager to push across the wire, as well as a CacheStore to persist to disk or network again), so this could happen prior to either of these calls.  And also useful to note - as you mention later - that UNmarshalling is often far slower than marshalling, so any sync network calls should not wait for entries to be unmarshalled on the remote end.  We provide for this to some degree with the lazyDeserialization config element which makes use of MarshalledValues.  It could just be better integrated with the rest of what we are doing re: async.
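
To make Bela's point about the bounded queue concrete, here is a minimal
sketch using only java.util.concurrent. It is not Infinispan's async repl
executor; the class name, the queue capacities and the 5 ms sleep standing
in for marshalling are all assumptions for illustration. CallerRunsPolicy
is used so that a full queue pushes the work back onto the submitting
thread, which is one way of "somebody has to block".

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Infinispan.
class BoundedMarshallingQueue {

    // Submits `tasks` fake marshalling jobs to a single worker with a bounded
    // queue and returns how many of them ended up running on the caller.
    static int run(int tasks, int queueCapacity) {
        final Thread caller = Thread.currentThread();
        final AtomicInteger ranOnCaller = new AtomicInteger();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread does the work.
                new ThreadPoolExecutor.CallerRunsPolicy());
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                if (Thread.currentThread() == caller) {
                    ranOnCaller.incrementAndGet();
                }
                try {
                    Thread.sleep(5); // stand-in for the cost of marshalling
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ranOnCaller.get();
    }

    public static void main(String[] args) {
        System.out.println("short burst, ran on caller: " + run(2, 10));
        System.out.println("sustained load, ran on caller: " + run(50, 2));
    }
}
```

A short burst fits in the queue and the caller never pays, while under
sustained load the caller ends up doing part of the marshalling itself.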

Note that there was also some talk in the past about using
MarshalledValues all the time, in an email thread called "storing in
memory data in binary format" on the infinispan-dev list. Doing so would
enable memory-based eviction policies.
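
As a rough illustration of why keeping values in binary form helps here:
once every value is a byte[], its size is known exactly and eviction can
be driven by a byte budget rather than an entry count. The sketch below
is not MarshalledValue or any Infinispan API; BinaryValueStore, its
methods and the use of plain Java serialization are assumptions made
just for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical store: values are kept marshalled, eviction is by total bytes.
class BinaryValueStore {
    // Access-ordered map, so iteration starts at the least recently used entry.
    private final LinkedHashMap<String, byte[]> data =
            new LinkedHashMap<>(16, 0.75f, true);
    private final long maxBytes;
    private long usedBytes;

    BinaryValueStore(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    void put(String key, Serializable value) {
        byte[] bytes = marshal(value);          // marshal once, on write
        byte[] old = data.put(key, bytes);
        if (old != null) {
            usedBytes -= old.length;
        }
        usedBytes += bytes.length;
        evictIfOverBudget();
    }

    Object get(String key) {
        byte[] bytes = data.get(key);           // counts as an access
        return bytes == null ? null : unmarshal(bytes); // unmarshal lazily
    }

    long usedBytes() {
        return usedBytes;
    }

    private void evictIfOverBudget() {
        Iterator<Map.Entry<String, byte[]>> it = data.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    private static byte[] marshal(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    private static Object unmarshal(byte[] bytes) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off is the one discussed above: reads pay for unmarshalling
every time unless the deserialized form is cached alongside the bytes.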

-- 
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache