On Mon, May 23, 2011 at 7:44 PM, Sanne Grinovero
<sanne.grinovero(a)gmail.com> wrote:
To keep stuff simple, I'd add an alternative feature instead:
have the custom externalizers to optionally recommend an allocation buffer size.
In my experience people use a set of well known types for the key, and
maybe for the value as well, for which they actually know the output
byte size, so there's no point in Infinispan to try guessing the size
and then adapting on it; an exception being the often used Strings,
even in composite keys, but again as user of the API I have a pretty
good idea of the size I'm going to need, for each object I store.
Excellent idea, if the custom externalizer can give us the exact size
of the serialized object we wouldn't need to do any guesswork.
I'm a bit worried about over-zealous externalizers that will spend
just as much computing the size of a complex object graph as they
spend on actually serializing the whole thing, but as long as our
internal externalizers are good examples I think we're ok.
Big plus: we could use the size of the serialized object to estimate
the memory usage of each cache entry, so maybe with this we could
finally constrain the cache to use a fixed amount of memory :)
Also in MarshalledValue I see that an ExposedByteArrayOutputStream
is
created, and after serialization if the buffer is found to be bigger
than the buffer we're referencing a copy is made to create an exact
matching byte[].
What about revamping the interface there, to expose the
ExposedByteArrayOutputStream instead of byte[], up to the JGroups
level?
In case the value is not stored in binary form, the expected life of
the stream is very short anyway, after being pushed directly to
network buffers we don't need it anymore... couldn't we pass the
non-truncated stream directly to JGroups without this final size
adjustement ?
Of course when values are stored in binary form it might make sense to
save some memory, but again if that was an option I'd make use of it;
in case of Lucene I can guess the size with a very good estimate (some
bytes off), compared to buffer sizes of potentially many megabytes
which I'd prefer to avoid copying - especially not interested in it to
safe 2 bytes even if I where to store values in binary.
Yeah, but ExposedByteArrayOutputStream grows by 100% percent, so if
you're off by 1 in your size estimate you'll waste 50% of the memory
by keeping that buffer around.
Even if your estimate is perfect you're still wasting at least 32
bytes on a 64-bit machine: 16 bytes for the buffer object header + 8
for the array reference + 4 (rounded up to 8) for the count, though I
guess you could get that down to 4 bytes by keeping the byte[] and
count as members of MarshalledValue.
Besides, for Lucene couldn't you store the actual data separately as a
byte[] so that Infinispan doesn't wrap it in a MarshalledValue?
Then if we just keep the ExposedByteArrayOutputStream around in the
MarshalledValue, we could save some copying by replacing the
"output.write(raw)" in writeObject(ObjectOutput output,
MarshalledValue mv) by using a
output.write( byte[] , offset, length );
Cheers,
Sanne
2011/5/23 Bela Ban <bban(a)redhat.com>:
>
>
> On 5/23/11 6:15 PM, Dan Berindei wrote:
>
>> I totally agree, combining adaptive size with buffer reuse would be
>> really cool. I imagine when passing the buffer to JGroups we'd still
>> make an arraycopy, but we'd get rid of a lot of arraycopy calls to
>> resize the buffer when the average object size is> 500 bytes. At the
>> same time, if a small percentage of the objects are much bigger than
>> the rest, we wouldn't reuse those huge buffers so we wouldn't waste
>> too much memory.
>
>
> From my experience, reusing and syncing on a buffer will be slower than
> making a simple arraycopy. I used to reuse buffers in JGroups, but got
> better perf when I simply copied the buffer.
> Plus the reservoir sampling's complexity is another source of bugs...
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev