[infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102

Dan Berindei dan.berindei at gmail.com
Mon May 23 18:01:37 EDT 2011


On Tue, May 24, 2011 at 12:13 AM, Sanne Grinovero
<sanne.grinovero at gmail.com> wrote:
> 2011/5/23 Bela Ban <bban at redhat.com>:
>>
>>
>> On 5/23/11 8:42 PM, Dan Berindei wrote:
>>> On Mon, May 23, 2011 at 7:44 PM, Sanne Grinovero
>>> <sanne.grinovero at gmail.com>  wrote:
>>>> To keep stuff simple, I'd add an alternative feature instead:
>>>> have the custom externalizers optionally recommend an allocation buffer size.
>>>>
>>>> In my experience people use a set of well-known types for the key, and
>>>> maybe for the value as well, for which they actually know the output
>>>> byte size, so there's no point in Infinispan trying to guess the size
>>>> and then adapting to it. An exception is the often-used Strings, even
>>>> in composite keys, but again, as a user of the API I have a pretty
>>>> good idea of the size I'm going to need for each object I store.
>>>>
>>>
>>> Excellent idea: if the custom externalizer can give us the exact size
>>> of the serialized object we wouldn't need to do any guesswork.
>>> I'm a bit worried about over-zealous externalizers that will spend
>>> just as much time computing the size of a complex object graph as
>>> they spend on actually serializing the whole thing, but as long as
>>> our internal externalizers are good examples I think we're ok.
>>>
>>> Big plus: we could use the size of the serialized object to estimate
>>> the memory usage of each cache entry, so maybe with this we could
>>> finally constrain the cache to use a fixed amount of memory :)
>>
>>
>> I don't think this is a good idea because most people won't be able to
>> guess the right buffer sizes. Giving incorrect buffer sizes might even
>> lead to performance degradation, until the buffers have expanded...
>>
>> For example, would you guys be able to guess the buffer sizes of
>> Infinispan used in JBoss AS? We're placing not just session data, but
>> all sorts of crap into the cache, so I for one wouldn't even be able to
>> give you a best estimate...
>
> that's right, I have no clue on that. As you guessed, I was referring
> to the buffers being created to marshal and send an object,
> and proposing an optional method: in fact for my keys I can provide a
> good estimate and the current defaults are quite far from it.
>

The subject of the thread is "Adaptive marshaller buffer sizes", so
I'm pretty sure Galder was thinking about that as well :-)
He probably wasn't thinking about MarshalledValue, but about
AbstractMarshaller, where we use a default estimate of 512 bytes. I
guess the fact that we have two defaults (128 and 512) for essentially
the same thing proves that we need something better...

For some objects, like the Lucene keys and values, it's really easy to
compute the size (relying on the Infinispan internal externalizers to
provide the serialized size for primitives and JDK classes). For
others, navigating the object graph to get a reliable size is too
complicated and we may want to use an estimate instead.
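
Just to make the idea concrete - the SizeHintingExternalizer interface
and the estimateSize method below are purely hypothetical, not existing
Infinispan API, and the Lucene-style chunk key is only a stand-in - a
size hint coming from the externalizer could look roughly like this:

import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical extension point: an externalizer that can recommend a
// buffer size for the objects it knows how to marshal. Not an existing
// Infinispan interface.
interface SizeHintingExternalizer<T> {
    void writeObject(ObjectOutput output, T object) throws IOException;
    T readObject(ObjectInput input) throws IOException, ClassNotFoundException;

    // Returns the expected serialized size in bytes, or a negative value
    // if computing it would be about as expensive as serializing the object.
    int estimateSize(T object);
}

// Example: a Lucene-like chunk key (file name + chunk number) whose
// serialized size is trivial to compute up front.
final class ChunkKeyExternalizer
        implements SizeHintingExternalizer<ChunkKeyExternalizer.ChunkKey> {

    static final class ChunkKey {
        final String fileName;
        final int chunkId;
        ChunkKey(String fileName, int chunkId) {
            this.fileName = fileName;
            this.chunkId = chunkId;
        }
    }

    public void writeObject(ObjectOutput output, ChunkKey key) throws IOException {
        output.writeUTF(key.fileName);
        output.writeInt(key.chunkId);
    }

    public ChunkKey readObject(ObjectInput input) throws IOException {
        return new ChunkKey(input.readUTF(), input.readInt());
    }

    public int estimateSize(ChunkKey key) {
        // writeUTF: 2-byte length prefix + modified UTF-8 bytes
        // (~1 byte per ASCII char), plus 4 bytes for the int chunk id.
        return 2 + key.fileName.length() + 4;
    }
}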

> But going back to your question, it would be neat if people could
> "profile" their use case by enabling some logger that outputs the
> needed data from stress runs, and based on that provide a simple hint
> like an initial buffer size & thresholds to apply.
> Also being able to plug in some "smart" implementation as proposed by
> Galder in the first post could be useful for some, even though we
> might want to avoid that in the default configuration.
>
> When used as a Hibernate second level cache, the sizes of both keys and
> values are quite well defined for every cache region; that could be an
> interesting case for automating such optimizations.
>

In the general case you could have the application using 1MB values in
the first minute and then 1KB values for the rest of the application's
lifecycle, so a single size estimate won't be good for the entire
lifecycle of the application. That's why I liked the reservoir sampling
idea so much when I read about it: your value size can change
dramatically and the buffer size will still adapt to the new size - but
the later sizes carry a smaller weight, so the estimated size will
change less and less. It's just like profiling, only you don't have to
compromise on a single estimate.
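
A rough sketch of what such a predictor could look like (plain
Algorithm R reservoir sampling; the class name, the synchronization and
the 512-byte fallback are all made up for illustration, this is not the
actual ISPN-1102 code):

import java.util.Arrays;
import java.util.Random;

// Sketch of a reservoir-sampling size predictor (Vitter's Algorithm R).
final class ReservoirSizePredictor {

    private final int[] reservoir;
    private final Random random = new Random();
    private long seen; // total number of sizes recorded so far

    ReservoirSizePredictor(int sampleSize) {
        this.reservoir = new int[sampleSize];
    }

    // Record the serialized size of one marshalled object.
    synchronized void recordSize(int size) {
        if (seen < reservoir.length) {
            // Fill the reservoir with the first sampleSize observations.
            reservoir[(int) seen] = size;
        } else {
            // Keep the new size with probability sampleSize / (seen + 1),
            // so later sizes are still sampled but carry a smaller weight.
            long slot = nextNonNegativeLong(seen + 1);
            if (slot < reservoir.length) {
                reservoir[(int) slot] = size;
            }
        }
        seen++;
    }

    // Recommend a buffer size: the given percentile (e.g. 0.9) of the
    // sampled sizes.
    synchronized int estimateSize(double percentile) {
        int count = (int) Math.min(seen, reservoir.length);
        if (count == 0) {
            return 512; // fall back to the current AbstractMarshaller default
        }
        int[] sorted = Arrays.copyOf(reservoir, count);
        Arrays.sort(sorted);
        int index = Math.min(count - 1, (int) (percentile * count));
        return sorted[index];
    }

    private long nextNonNegativeLong(long bound) {
        // java.util.Random has no nextLong(bound), so clear the sign bit
        // and reduce modulo bound (the tiny modulo bias doesn't matter here).
        return (random.nextLong() & Long.MAX_VALUE) % bound;
    }
}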

The other thing about using percentiles is that you know exactly what
you're getting: if you set the estimated buffer size to the 90th
percentile then you know that 90% of the requests will not require a
buffer resize and 10% will. Sure, it's not bulletproof, but it's
better than any hardcoded value could ever hope to be.
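
Using the sketch above, sizing the buffer from the 90th percentile
would look something like this (again just illustrative, with made-up
sample sizes):

// Feed the predictor the observed serialized sizes, then size new
// buffers from the 90th percentile so roughly 9 out of 10 writes
// avoid a resize.
ReservoirSizePredictor predictor = new ReservoirSizePredictor(100);
for (int observedSize : new int[] { 480, 512, 530, 4096, 500 }) {
    predictor.recordSize(observedSize);
}
int bufferSize = predictor.estimateSize(0.9);
byte[] buffer = new byte[bufferSize];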

Perhaps we can combine the two and use adaptive size estimation only
if the custom externalizer can't provide an accurate size (or
providing an accurate size would be too costly). Or maybe an
externalizer for complex objects could use our buffer size predictor
so that it doesn't have to compute the size every time. I don't know,
but I know I'd really like to play with this stuff ;-)
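
For instance, wiring the two together could be as simple as the
following, reusing the hypothetical SizeHintingExternalizer and
ReservoirSizePredictor sketched above (again made-up names, not a
concrete design proposal):

// Prefer the externalizer's exact size when it's cheap to compute,
// otherwise fall back to the adaptive reservoir-sampling estimate.
final class BufferSizeChooser<T> {
    private final SizeHintingExternalizer<T> externalizer;
    private final ReservoirSizePredictor predictor;

    BufferSizeChooser(SizeHintingExternalizer<T> externalizer,
                      ReservoirSizePredictor predictor) {
        this.externalizer = externalizer;
        this.predictor = predictor;
    }

    int chooseBufferSize(T object) {
        int exact = externalizer.estimateSize(object);
        return exact >= 0 ? exact : predictor.estimateSize(0.9);
    }
}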

Cheers
Dan

> Cheers,
> Sanne
>
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss
>>
>


