[infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102

Sanne Grinovero sanne.grinovero at gmail.com
Mon May 23 15:12:35 EDT 2011


2011/5/23 Dan Berindei <dan.berindei at gmail.com>:
> On Mon, May 23, 2011 at 7:44 PM, Sanne Grinovero
> <sanne.grinovero at gmail.com> wrote:
>> To keep stuff simple, I'd add an alternative feature instead:
>> have the custom externalizers optionally recommend an allocation buffer size.
>>
>> In my experience people use a set of well-known types for the key, and
>> maybe for the value as well, for which they actually know the output
>> byte size, so there's no point in Infinispan trying to guess the size
>> and then adapting to it. An exception is the often-used Strings, even
>> in composite keys, but again, as a user of the API I have a pretty
>> good idea of the size I'm going to need for each object I store.
>>
>
> Excellent idea; if the custom externalizer can give us the exact size
> of the serialized object, we wouldn't need to do any guesswork.
> I'm a bit worried about over-zealous externalizers that will spend
> just as much time computing the size of a complex object graph as they
> spend on actually serializing the whole thing, but as long as our
> internal externalizers are good examples I think we're ok.

I'd expect many to just return a static constant, not to compute
anything. I didn't prototype it; I'm just tossing out some ideas, so I
might be missing something.

> Big plus: we could use the size of the serialized object to estimate
> the memory usage of each cache entry, so maybe with this we could
> finally constrain the cache to use a fixed amount of memory :)

That's very interesting!

>> Also in MarshalledValue I see that an ExposedByteArrayOutputStream is
>> created, and after serialization, if the internal buffer turns out to
>> be bigger than the bytes actually written, a copy is made to produce
>> an exactly matching byte[].
>> What about revamping the interface there, to expose the
>> ExposedByteArrayOutputStream instead of a byte[], up to the JGroups
>> level?
>>
>> In case the value is not stored in binary form, the expected life of
>> the stream is very short anyway; after being pushed directly to the
>> network buffers we don't need it anymore... couldn't we pass the
>> non-truncated stream directly to JGroups without this final size
>> adjustment?
>>
>> Of course when values are stored in binary form it might make sense
>> to save some memory, but again, if that were an option I'd make use
>> of it; in the Lucene case I can guess the size with a very good
>> estimate (a few bytes off), compared to buffer sizes of potentially
>> many megabytes which I'd prefer to avoid copying - I'm especially not
>> interested in a copy just to save 2 bytes, even if I were to store
>> values in binary form.
>>
>
> Yeah, but ExposedByteArrayOutputStream grows by 100%, so if
> you're off by 1 in your size estimate you'll waste 50% of the memory
> by keeping that buffer around.

Not really; the current implementation has some smart logic in that
area which reduces the potential waste to 20%: there's a "max doubling
threshold" in bytes, and we could set that to the expected value or
build further on the idea.

Anyway, if I often happen to be "off by one", I'd better add +1 to the
estimate my externalizer computes ;)
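
For illustration, the kind of growth policy I'm referring to, with
made-up constants - the actual numbers in ExposedByteArrayOutputStream
may well differ:

// Sketch of the growth policy under discussion; the threshold value is
// an assumption, not necessarily what ExposedByteArrayOutputStream uses.
class GrowthPolicySketch {
   static final int MAX_DOUBLING_SIZE = 4 * 1024 * 1024; // assumed threshold

   // Below the threshold the buffer doubles (up to ~50% can end up unused);
   // above it, it grows by only 25%, so at most ~20% of the new allocation
   // is wasted even when the estimate is slightly too small.
   static int newCapacity(int current, int required) {
      int grown = current <= MAX_DOUBLING_SIZE
            ? current << 1               // 100% growth for small buffers
            : current + (current >> 2);  // 25% growth for large buffers
      return Math.max(grown, required);
   }

   public static void main(String[] args) {
      System.out.println(newCapacity(128, 129));               // 256
      System.out.println(newCapacity(8 << 20, (8 << 20) + 1)); // 10485760
   }
}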


>
> Even if your estimate is perfect you're still wasting at least 32
> bytes on a 64-bit machine: 16 bytes for the buffer object header + 8
> for the array reference + 4 (rounded up to 8) for the count, though I
> guess you could get that down to 4 bytes by keeping the byte[] and
> count as members of MarshalledValue.

This is true, but only if you have to keep the
ExposedByteArrayOutputStream around after the RPC; otherwise the
overhead is zero, as it's thrown away very quickly - exactly as is
done now.

Maybe we should write a couple of alternative MarshalledValue
implementations, chosen depending on which options are enabled, as
this is getting complex.
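
For the record, a rough sketch of what such an alternative could look
like - class and field names are made up, this is not the existing
MarshalledValue code:

import java.io.IOException;
import java.io.ObjectOutput;

// Keep the non-truncated backing buffer plus the number of bytes written,
// and write exactly that range out, so no exact-size copy is ever made.
class BufferBackedValue {
   private final byte[] raw;  // possibly larger than the serialized form
   private final int count;   // number of bytes actually written

   BufferBackedValue(byte[] raw, int count) {
      this.raw = raw;
      this.count = count;
   }

   // Instead of output.write(toByteArray()) - which trims via a copy -
   // write only the valid region of the untrimmed buffer.
   void writeTo(ObjectOutput output) throws IOException {
      output.write(raw, 0, count);
   }
}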

> Besides, for Lucene couldn't you store the actual data separately as a
> byte[] so that Infinispan doesn't wrap it in a MarshalledValue?

Exactly as we do ;) so in the Lucene case these optimizations apply to
the keys only, which are mostly tuples like "long, int, String".
Considering that the strings contain very short filenames, I'd love
to:
a) default to something smaller than the current 128B
b) avoid making another copy if I've overestimated by a couple of
bytes; the buffer is very short-lived anyway.
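
To make that concrete, a back-of-the-envelope sketch of the estimate -
illustrative only, not the actual Lucene Directory key classes:

// Hypothetical composite key (long fileId, int chunkNumber, String fileName):
// the serialized size can be estimated cheaply from the String length alone.
class ChunkKeySizeEstimate {
   static int estimatedSize(long fileId, int chunkNumber, String fileName) {
      // 8 bytes for the long, 4 for the int, ~2 bytes per char for the
      // String, plus a small allowance for headers/overhead.
      return 8 + 4 + 2 * fileName.length() + 8;
   }

   public static void main(String[] args) {
      // A short file name gives an estimate far below the current 128B default.
      System.out.println(estimatedSize(42L, 0, "_0.cfs")); // 32
   }
}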

Cheers,
Sanne

>
>> Then if we just keep the ExposedByteArrayOutputStream around in the
>> MarshalledValue, we could save some copying by replacing the
>> "output.write(raw)" in writeObject(ObjectOutput output,
>> MarshalledValue mv) with an
>> output.write(byte[], offset, length);
>>
>> Cheers,
>> Sanne
>>
>>
>> 2011/5/23 Bela Ban <bban at redhat.com>:
>>>
>>>
>>> On 5/23/11 6:15 PM, Dan Berindei wrote:
>>>
>>>> I totally agree, combining adaptive size with buffer reuse would be
>>>> really cool. I imagine when passing the buffer to JGroups we'd still
>>>> make an arraycopy, but we'd get rid of a lot of arraycopy calls to
>>>> resize the buffer when the average object size is > 500 bytes. At the
>>>> same time, if a small percentage of the objects are much bigger than
>>>> the rest, we wouldn't reuse those huge buffers so we wouldn't waste
>>>> too much memory.
>>>
>>>
>>> From my experience, reusing and syncing on a buffer will be slower than
>>> making a simple arraycopy. I used to reuse buffers in JGroups, but got
>>> better perf when I simply copied the buffer.
>>> Plus the reservoir sampling's complexity is another source of bugs...
>>>
>>> --
>>> Bela Ban
>>> Lead JGroups / Clustering Team
>>> JBoss



More information about the infinispan-dev mailing list