[infinispan-dev] storeAsBinary keeps both the object and the byte[] - why?

Mircea Markus mmarkus at redhat.com
Tue Aug 6 06:18:29 EDT 2013


Thanks Galder.

On 5 Aug 2013, at 16:41, Galder Zamarreño <galder at redhat.com> wrote:

> Sorry for the delay getting back on this topic. Let me start with a little side note:
> 
> I've been trying to find a previous discussion where I wondered about the merits/complexity/need of storeAsBinary. I'm pretty sure I made a point in the past about whether it was really useful but can't find the discussion.

For large clusters with DIST and data reads without node affinity it's more efficient to store the data in marshalled/binary format in order to reduce the serialisation overhead:
http://lists.jboss.org/pipermail/infinispan-dev/2009-October/004299.html

> I do remember Manik replying back. Anyway, I'm not sure storeAsBinary really does reduce memory consumption, and I'm not sure we have any measurements showing it's quicker in certain scenarios. Even Martin, who investigated cache memory overhead, did not really use storeAsBinary to figure this out, and finally, we no longer need it for lazy deserialization since we have a modular classloader in place.
> The only real use case I've found for it so far has been when developing the JSR-107 facade, and that's to provide store-by-value-like [1] capabilities (as opposed to our default behaviour, which is store-by-ref), and even then storeAsBinary had to be tweaked.
> 
> With this in mind, let me add my reply below…
> 
> [1] https://github.com/infinispan/infinispan/blob/master/core/src/test/java/org/infinispan/marshall/DefensiveCopyTest.java
> 
> On Jul 18, 2013, at 2:44 PM, Mircea Markus <mmarkus at redhat.com> wrote:
> 
>> Hi,
>> 
>> We have the following behaviour when storeAsBinary is enabled:
>> - when an entry is added it is initially stored in binary format (byte[])
>> - when it is read from an *owning node*, it is unmarshalled and the object reference is cached in memory together with the byte representation
>> - the object reference is only cleaned up when cache.compact() is invoked explicitly
>> 
>> Assuming a key is read uniformly on all the nodes, after a while the system ends up with all the entries stored twice: the byte[] and the object in deserialized form. Of course this can be mitigated by asking the users to invoke cache.compact() - but that's quite confusing and not very user friendly, as the user needs to be concerned with memory management.
>> 
>> Can anybody think of some reasons why the value is kept twice? I mean besides optimising for local gets, which I think is not a good enough reason given the potentially huge memory consumption and the complexity added.
> 
> From what I remember, this is to make local gets faster and avoid having to deserialize the entry all the time. However, this optimisation is useless for the only real use case for storeAsBinary that I mentioned above: store-by-value.
> 
> That's because whenever you send a value back to the client, you don't send it as-is; you send a copy, so that the user cannot modify the contents of the cache without calling a cache operation. This is easy to do just by deserializing the object stored in the cache whenever someone requests it. Clearly, if you're doing a lot of local gets it won't be very fast, but it's the price you currently have to pay to get safety. And when you want to store a value, you just serialize it and store it in the cache, so the original reference to the object can no longer be used to modify the cache contents.
> 
> So, no, don't really see the reason to keep it twice.
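
To make the two behaviours discussed above concrete, here is a minimal sketch. It is not Infinispan code: the class names (StoreAsBinarySketch, DualFormValue, ByValueValue) are hypothetical, and plain JDK serialization stands in for the real marshaller. DualFormValue mimics the current behaviour (byte[] plus a cached object reference until compact() is called); ByValueValue keeps only the byte[] and deserializes on every read, which is all store-by-value needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these are Infinispan classes.
public class StoreAsBinarySketch {

    // JDK serialization is an assumption standing in for the real marshaller.
    static byte[] marshal(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object unmarshal(byte[] raw) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    /** Keeps the byte[] and, after the first read, the object reference too. */
    static final class DualFormValue {
        private final byte[] raw;
        private Object instance; // set on first get(), cleared only by compact()

        DualFormValue(Object value) throws IOException {
            this.raw = marshal(value);
        }

        Object get() throws IOException, ClassNotFoundException {
            if (instance == null) {
                instance = unmarshal(raw);
            }
            return instance; // the same reference is handed out on every read
        }

        void compact() {
            instance = null;
        }

        boolean holdsBothForms() {
            return instance != null;
        }
    }

    /** Keeps only the byte[]; every read deserializes a fresh copy. */
    static final class ByValueValue {
        private final byte[] raw;

        ByValueValue(Object value) throws IOException {
            this.raw = marshal(value);
        }

        Object get() throws IOException, ClassNotFoundException {
            return unmarshal(raw);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> original = new ArrayList<>();
        original.add("a");

        DualFormValue dual = new DualFormValue(original);
        ByValueValue byValue = new ByValueValue(original);

        // Mutating the caller's reference after the put reaches neither value.
        original.add("mutated after put");

        Object first = dual.get();
        System.out.println("dual holds both forms after a read: " + dual.holdsBothForms());
        System.out.println("dual returns the same reference:    " + (first == dual.get()));
        dual.compact();
        System.out.println("dual holds both forms after compact: " + dual.holdsBothForms());

        System.out.println("by-value returns distinct copies:   " + (byValue.get() != byValue.get()));
        System.out.println("by-value content:                   " + byValue.get());
    }
}
```

With the by-value variant there is nothing for the user to compact, at the cost of one deserialization per local get.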

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)