Hi,
Sorry for the delay getting back on this topic. Let me start with a little side note:
I've been trying to find a previous discussion where I wondered about the
merits/complexity/need of storeAsBinary. I'm pretty sure I made a point in the past
about whether it was really useful but can't find the discussion. I do remember Manik
replying back. Anyway, I'm not sure storeAsBinary really does reduce memory
consumption and I'm not sure we have any measurements showing it's quicker in
certain scenarios. Even Martin, who investigated cache memory overhead, did not really use
storeAsBinary to figure this out, and finally, we no longer need it for lazy
deserialization since we have a modular classloader in place. The only real use case
I've found for it so far has been when developing the JSR-107 facade, and that's
to provide store-by-value-like [1] capabilities (as opposed to our default behaviour which
is store-by-ref), and even then storeAsBinary had to be tweaked.
With this in mind, let me add my reply below…
[1]
https://github.com/infinispan/infinispan/blob/master/core/src/test/java/o...
On Jul 18, 2013, at 2:44 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
Hi,
We have the following behaviour when storeAsBinary is enabled:
- when an entry is added it is initially stored in binary format (byte[])
- when it is read from an *owning node*, it is unmarshalled and the object reference is
cached in memory together with the byte representation
- the object reference is only cleaned up when cache.compact() is invoked explicitly
Assuming a key is read uniformly on all the nodes, after a while the system ends up with
all the entries stored twice: the byte[] and the object in unserialized form. Of course
this can be mitigated by asking the users to invoke Cache.compact - but that's quite
confusing and not very user friendly as the user needs to be concerned with memory
management.
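To make the duplication concrete, here is a minimal sketch of the behaviour described above. It is not Infinispan code: the class DualFormEntry and its methods are hypothetical names made up for illustration, and a String encoded as UTF-8 stands in for a marshalled value to keep the example short.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of one cache entry under the behaviour described above.
// Not Infinispan's implementation; a String stands in for a marshalled value.
class DualFormEntry {
    private final byte[] binary; // byte form, always kept
    private String instance;     // object form, cached on first read

    DualFormEntry(String value) {
        // On write, only the binary form is stored.
        this.binary = value.getBytes(StandardCharsets.UTF_8);
    }

    synchronized String get() {
        // On the first read the value is unmarshalled and the reference kept,
        // so from now on the entry occupies memory in both forms.
        if (instance == null) {
            instance = new String(binary, StandardCharsets.UTF_8);
        }
        return instance;
    }

    synchronized void compact() {
        // Only an explicit call releases the object form again.
        instance = null;
    }

    synchronized boolean holdsBothForms() {
        return instance != null;
    }
}
```

Once every entry has been read at least once, every entry holds both forms until something calls compact().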
Can anybody think of some reasons why the value is kept twice? I mean besides optimising
for local gets, which I think is not a good enough reason given the potentially huge
memory consumption and the complexity added.
From what I remember, this is to make local gets faster and avoid
having to deserialize the entry all the time. However, this optimisation is useless for
the only real use case for storeAsBinary that I mentioned above: store-by-value.
That's cos whenever you send a value back to the client, you don't send it
as-is, but you send a copy back to avoid the user being able to modify the contents of the
cache without calling a cache operation. This is easy to do just by deserializing the
object stored in the cache whenever someone requests it. Clearly, if you're doing a
lot of local gets it won't be very fast, but it's the price you currently have to
pay to get safety. And when you want to store a value, you just serialize it and store it
in the cache, making the original reference to the object useless to modify the cache
contents.
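A rough sketch of that store-by-value idea, assuming plain Java serialization and a simple map as the backing store. The class StoreByValueSketch is a hypothetical name for illustration and this is not how the JSR-107 facade or storeAsBinary is actually implemented:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not Infinispan's storeAsBinary code.
class StoreByValueSketch<K, V extends Serializable> {
    // Only the serialized form is kept; no object reference is cached.
    private final Map<K, byte[]> store = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        // Serializing on write detaches the stored bytes from the caller's reference.
        store.put(key, serialize(value));
    }

    public V get(K key) {
        // Deserializing on every read hands out a fresh copy each time.
        byte[] bytes = store.get(key);
        return bytes == null ? null : deserialize(bytes);
    }

    private static byte[] serialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    @SuppressWarnings("unchecked")
    private V deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every get pays for a deserialization, which is the cost of safety mentioned above, but nothing is ever held twice.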
So, no, I don't really see the reason to keep it twice.
Cheers,
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org