Hi all,
Re:
https://issues.jboss.org/browse/ISPN-1102
First of all thanks to Dan for his suggestion on reservoir sampling+percentiles, very good
suggestion:). So, I'm looking into this and Trustin's
http://docs.jboss.org/netty/3.2/api/org/jboss/netty/channel/AdaptiveRecei...
but in this email I wanted to discuss the reservoir sampling mechanism
(
http://en.wikipedia.org/wiki/Reservoir_sampling).
So, the way I look at it, to implement this you'd keep track of N buffer sizes used so
far, and out of those chose K samples based on reservoir sampling, and then of those K
samples left, take the 90th percentile.
Calculating the percentile is easy with those K samples stored in an ordered collection.
Now, my problem with this is that reservoir sampling is an O(n) operation and you would
not want to be doing that per each request for a buffer that comes in.
One option I can think of that instead of ever letting a user thread calculate this, the
user thread could just feed the buffer size collection (a concurrent collection) and we
could have a thread in the background that periodically or based on some threshold
calculates the reservoir sample + percentile and this is what's used as next buffer
size. My biggest problem here is the concurrent collection in the middle. You could have a
priority queue ordered by buffer sizes but it's unbounded. The concurrent collection
does not require to be ordered though, the reservoir sampling could do that, but you want
it the collection bounded. But if bounded and the limit is hit, you would not want it to
block but instead override values remove the last element and insert again. You only care
about the last N relevant buffer sizes...
Another option that would avoid the use of a concurrent collection would be if this was
calculated per thread and stored in a thread local. The calculation could be done every X
requests still in the client thread, or could be sent to a separate thread wrapping it
around a callable and keeping the future as thread local, you could query it next time the
thread wants to marshall something.
I feel a bit more inclined towards the latter option although it limits the calculation to
be per-thread for several reasons:
- We already have org.jboss.marshalling.Marshaller and org.jboss.marshalling.Unmarshaller
instances as thread local which have proven to perform well.
- So we could tap into this set up to maintain not only the marshaller, but the size of
the buffer too.
- It could offer the possibility of being extended further to avoid creating buffers all
the time and instead reuse them as long as the size is constant. After a size
recalculation we'd ditch it and create a new one.
Thoughts?
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache