[infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102

Mon May 23 12:44:32 EDT 2011

To keep things simple, I'd add an alternative feature instead:
let custom externalizers optionally recommend an allocation buffer size.

In my experience people use a set of well-known types for the key, and
maybe for the value as well, for which they actually know the output
byte size, so there's no point in Infinispan trying to guess the size
and then adapt to it. An exception is the often-used String, even in
composite keys, but again, as a user of the API I have a pretty good
idea of the size I'm going to need for each object I store.
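
To make the suggestion concrete, here is a minimal sketch in Java of
what such a size hint could look like. Everything in it is an
assumption for illustration: SizeHintingExternalizer,
recommendedBufferSize, LongKeyExternalizer and the 512-byte default are
hypothetical names and values, not part of the Infinispan API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface (NOT the Infinispan Externalizer API):
// an externalizer that may recommend an initial buffer size.
interface SizeHintingExternalizer<T> {
    void writeObject(DataOutputStream out, T obj) throws IOException;

    // Recommended buffer size in bytes, or -1 when the size is unknown.
    default int recommendedBufferSize(T obj) {
        return -1;
    }
}

// A key type whose serialized size is known exactly: one long = 8 bytes.
final class LongKeyExternalizer implements SizeHintingExternalizer<Long> {
    @Override
    public void writeObject(DataOutputStream out, Long key) throws IOException {
        out.writeLong(key);
    }

    @Override
    public int recommendedBufferSize(Long key) {
        return Long.BYTES;
    }
}

final class SizeHintDemo {
    // Assumed fallback size, used only when no hint is given.
    static final int DEFAULT_SIZE = 512;

    static <T> byte[] marshall(SizeHintingExternalizer<T> ext, T obj) {
        int hint = ext.recommendedBufferSize(obj);
        // Allocate the buffer from the hint instead of guessing and adapting.
        ByteArrayOutputStream bytes =
                new ByteArrayOutputStream(hint > 0 ? hint : DEFAULT_SIZE);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            ext.writeObject(out, obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = marshall(new LongKeyExternalizer(), 42L);
        System.out.println(raw.length); // prints 8
    }
}
```

With an exact hint the buffer never has to grow, so no resize copies
happen during serialization; externalizers that cannot estimate their
output simply keep the default.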

Also, in MarshalledValue I see that an ExposedByteArrayOutputStream is
created, and after serialization, if its buffer is found to be bigger
than the data written to it, a copy is made to create an exactly
matching byte[].
What about revamping the interface there to expose the
ExposedByteArrayOutputStream instead of the byte[], all the way up to
the JGroups level?

If the value is not stored in binary form, the expected life of the
stream is very short anyway: after it has been pushed directly to the
network buffers we don't need it anymore. Couldn't we pass the
non-truncated stream directly to JGroups without this final size
adjustment?

Of course, when values are stored in binary form it might make sense to
save some memory, but again, if the size hint were an option I'd make
use of it. In the case of Lucene I can estimate the size very well (a
few bytes off), while the buffers are potentially many megabytes, which
I'd prefer to avoid copying. I'm especially not interested in copying
them to save 2 bytes, even if I were to store values in binary form.

Then, if we just keep the ExposedByteArrayOutputStream around in the
MarshalledValue, we could save some copying by replacing the
"output.write(raw)" in writeObject(ObjectOutput output,
MarshalledValue mv) with a call to
output.write(byte[], offset, length);
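
A rough sketch of that idea, using only the JDK. ExposedBuffer here is
a hypothetical stand-in for ExposedByteArrayOutputStream, and
rawBuffer() and send() are names made up for this example; the real
Infinispan and JGroups classes may look different.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for ExposedByteArrayOutputStream: it hands out
// the internal array without copying. The array may be longer than the
// number of valid bytes, which is given by size().
final class ExposedBuffer extends ByteArrayOutputStream {
    ExposedBuffer(int capacity) {
        super(capacity);
    }

    byte[] rawBuffer() {
        return buf; // protected field of ByteArrayOutputStream, not a copy
    }
}

final class NoTrimDemo {
    // Writes only the valid region of the oversized buffer, so no
    // exact-size byte[] has to be created first.
    static byte[] send(ExposedBuffer src) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (DataOutputStream output = new DataOutputStream(wire)) {
            output.write(src.rawBuffer(), 0, src.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wire.toByteArray();
    }

    public static void main(String[] args) {
        ExposedBuffer eb = new ExposedBuffer(64);
        eb.write(new byte[] {1, 2, 3}, 0, 3);
        byte[] wire = send(eb);
        // prints "64 allocated, 3 written"
        System.out.println(eb.rawBuffer().length + " allocated, " + wire.length + " written");
    }
}
```

The receiver still sees exactly the bytes that were serialized; the
only difference is that the trimming copy is skipped on the sending
side.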

Cheers,
Sanne

2011/5/23 Bela Ban <bban at redhat.com>:
>
>
> On 5/23/11 6:15 PM, Dan Berindei wrote:
>
>> I totally agree, combining adaptive size with buffer reuse would be
>> really cool. I imagine when passing the buffer to JGroups we'd still
>> make an arraycopy, but we'd get rid of a lot of arraycopy calls to
>> resize the buffer when the average object size is > 500 bytes. At the
>> same time, if a small percentage of the objects are much bigger than
>> the rest, we wouldn't reuse those huge buffers so we wouldn't waste
>> too much memory.
>
>
> From my experience, reusing and syncing on a buffer will be slower than
> making a simple arraycopy. I used to reuse buffers in JGroups, but got
> better perf when I simply copied the buffer.
> Plus the reservoir sampling's complexity is another source of bugs...
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss