[infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102

Dan Berindei dan.berindei at gmail.com
Mon May 23 14:42:58 EDT 2011


On Mon, May 23, 2011 at 7:44 PM, Sanne Grinovero
<sanne.grinovero at gmail.com> wrote:
> To keep stuff simple, I'd add an alternative feature instead:
> have the custom externalizers optionally recommend an allocation buffer size.
>
> In my experience people use a set of well-known types for the key, and
> maybe for the value as well, for which they actually know the output
> byte size, so there's no point in Infinispan trying to guess the size
> and then adapting to it. An exception is the often-used Strings, even
> in composite keys, but again, as a user of the API I have a pretty
> good idea of the size I'm going to need for each object I store.
>

Excellent idea: if the custom externalizer can give us the exact size
of the serialized object, we wouldn't need to do any guesswork.
I'm a bit worried about over-zealous externalizers that will spend
just as much time computing the size of a complex object graph as they
spend actually serializing the whole thing, but as long as our
internal externalizers are good examples I think we're ok.
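
Something like this is what I'm picturing -- purely a sketch, the
interface and method names are hypothetical and not part of the
current Externalizer API:

    // Hypothetical mix-in interface (sketch only): a custom externalizer
    // could implement this to recommend an initial buffer size to the
    // marshaller, instead of the marshaller guessing adaptively.
    public interface SizeHintingExternalizer<T> {
       // Expected serialized size of the object in bytes, or -1 if
       // unknown, in which case the marshaller falls back to its own
       // adaptive estimate.
       int estimatedSize(T object);
    }

so e.g. a Lucene key externalizer that already knows its output size
could return it directly and skip the guessing entirely.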

Big plus: we could use the size of the serialized object to estimate
the memory usage of each cache entry, so maybe with this we could
finally constrain the cache to use a fixed amount of memory :)
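
Just to sketch the idea (none of these names exist in Infinispan, it's
purely hypothetical): the data container could keep a running total of
serialized sizes and trigger eviction once it crosses a byte limit.

    // Hypothetical sketch: memory-bounded eviction driven by the
    // serialized size reported for each entry.
    class SizeBoundedAccounting {
       private final long maxBytes;
       private final java.util.concurrent.atomic.AtomicLong usedBytes =
             new java.util.concurrent.atomic.AtomicLong();

       SizeBoundedAccounting(long maxBytes) {
          this.maxBytes = maxBytes;
       }

       // Returns true if eviction should kick in after adding this entry.
       boolean entryAdded(int serializedSize) {
          return usedBytes.addAndGet(serializedSize) > maxBytes;
       }

       void entryRemoved(int serializedSize) {
          usedBytes.addAndGet(-serializedSize);
       }
    }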


> Also, in MarshalledValue I see that an ExposedByteArrayOutputStream is
> created, and after serialization, if the backing buffer is bigger than
> the data it actually holds, a copy is made to create an exactly
> matching byte[].
> What about revamping the interface there to expose the
> ExposedByteArrayOutputStream instead of a byte[], up to the JGroups
> level?
>
> In case the value is not stored in binary form, the expected life of
> the stream is very short anyway: after being pushed directly to the
> network buffers we don't need it anymore... couldn't we pass the
> non-truncated stream directly to JGroups without this final size
> adjustment?
>
> Of course when values are stored in binary form it might make sense to
> save some memory, but again, if that was an option I'd make use of it:
> in the case of Lucene I can guess the size with a very good estimate
> (a few bytes off), compared to buffer sizes of potentially many
> megabytes which I'd prefer to avoid copying - I'm especially not
> interested in copying just to save 2 bytes, even if I were to store
> values in binary.
>

Yeah, but ExposedByteArrayOutputStream grows by 100%, so if you're off
by 1 in your size estimate you'll waste 50% of the memory by keeping
that buffer around.
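
To put numbers on it (assuming the doubling behaviour above):

    // Sketch of the worst case, assuming the buffer doubles when full.
    static int wastedBytes(int estimate, int actual) {
       int capacity = estimate;
       while (capacity < actual) {
          capacity *= 2;              // 1024 -> 2048 for an off-by-one miss
       }
       return capacity - actual;      // wastedBytes(1024, 1025) == 1023, ~50%
    }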

Even if your estimate is perfect you're still wasting at least 32
bytes on a 64-bit machine: 16 bytes for the buffer object header + 8
for the array reference + 4 (rounded up to 8) for the count, though I
guess you could get that down to 4 bytes by keeping the byte[] and
count as members of MarshalledValue.
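
I mean something along these lines (a sketch, not the current
MarshalledValue code), which also lines up with the
output.write(byte[], offset, length) idea from your last paragraph
below:

    // Sketch only: keep the raw buffer and the logical length directly
    // in MarshalledValue instead of holding on to the stream wrapper.
    class MarshalledValueSketch {
       private byte[] raw;        // backing array, possibly oversized
       private int rawLength;     // number of valid bytes in raw

       void setSerialized(byte[] buffer, int length) {
          this.raw = buffer;
          this.rawLength = length;
       }

       // No exact-size copy: write only the valid region of the buffer.
       void writeTo(java.io.ObjectOutput output) throws java.io.IOException {
          output.writeInt(rawLength);
          output.write(raw, 0, rawLength);
       }
    }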

Besides, for Lucene couldn't you store the actual data separately as a
byte[] so that Infinispan doesn't wrap it in a MarshalledValue?

> Then if we just keep the ExposedByteArrayOutputStream around in the
> MarshalledValue, we could save some copying by replacing the
> "output.write(raw)" in writeObject(ObjectOutput output,
> MarshalledValue mv) with output.write(byte[], offset, length);
>
> Cheers,
> Sanne
>
>
> 2011/5/23 Bela Ban <bban at redhat.com>:
>>
>>
>> On 5/23/11 6:15 PM, Dan Berindei wrote:
>>
>>> I totally agree, combining adaptive size with buffer reuse would be
>>> really cool. I imagine when passing the buffer to JGroups we'd still
>>> make an arraycopy, but we'd get rid of a lot of arraycopy calls to
>>> resize the buffer when the average object size is > 500 bytes. At the
>>> same time, if a small percentage of the objects are much bigger than
>>> the rest, we wouldn't reuse those huge buffers so we wouldn't waste
>>> too much memory.
>>
>>
>> From my experience, reusing and syncing on a buffer will be slower than
>> making a simple arraycopy. I used to reuse buffers in JGroups, but got
>> better perf when I simply copied the buffer.
>> Plus the reservoir sampling's complexity is another source of bugs...
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss
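
For context, the reservoir-sampling approach Bela refers to would look
roughly like this -- a sketch of the general technique, not the actual
ISPN-1102 patch:

    // Sketch only: adaptive buffer sizing via reservoir sampling of
    // recently observed serialized sizes.
    class AdaptiveBufferSizePredictor {
       private static final int SAMPLE_SIZE = 128;
       private final int[] reservoir = new int[SAMPLE_SIZE];
       private final java.util.Random random = new java.util.Random();
       private long seen;

       // Record the size of one serialized object (classic reservoir
       // sampling: every observation ends up in the sample with equal
       // probability).
       synchronized void recordSize(int size) {
          if (seen < SAMPLE_SIZE) {
             reservoir[(int) seen] = size;
          } else {
             long slot = (long) (random.nextDouble() * (seen + 1));
             if (slot < SAMPLE_SIZE) {
                reservoir[(int) slot] = size;
             }
          }
          seen++;
       }

       // Suggest the next initial buffer size: the 90th percentile of the
       // sample, so only ~10% of writes should need to grow the buffer.
       synchronized int nextBufferSize() {
          if (seen == 0) {
             return 512;                // arbitrary default
          }
          int n = (int) Math.min(seen, SAMPLE_SIZE);
          int[] copy = java.util.Arrays.copyOf(reservoir, n);
          java.util.Arrays.sort(copy);
          return copy[(int) (0.9 * (n - 1))];
       }
    }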


