[Hawkular-dev] Little research help with collection times..

Mon Mar 6 06:41:42 EST 2017

Hi,

I could use some help if someone has access to this sort of data. I'm
trying to measure how much our collection interval fluctuates when using
our collectors. The collection interval itself is not interesting; what
matters is how much each run is delayed and how that delay changes from
run to run.

Currently we use the defaults from the Gorilla paper (including the typo
the paper had), which gives us the ranges [-63,64] (7 bits),
[-255,256] (9 bits), [-2047,2048] (12 bits) and everything else. The
first one needs 2 bits of metadata, the second needs 3 and the third
already needs 4 bits.
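To make the cost of those defaults concrete, here is a minimal sketch of how many bits one delta of delta D takes. It assumes the layout from the Gorilla paper: D == 0 is a single '0' bit, the three ranges use the '10', '110' and '1110' control prefixes, and anything else gets '1111' plus 32 bits. The names GORILLA_BUCKETS and gorilla_bits are illustrative only, not from our code base.

```python
# (low, high, prefix_bits, value_bits) for each Gorilla default range, in order.
GORILLA_BUCKETS = [
    (-63, 64, 2, 7),       # '10'   + 7 bits
    (-255, 256, 3, 9),     # '110'  + 9 bits
    (-2047, 2048, 4, 12),  # '1110' + 12 bits
]
FALLBACK_BITS = 4 + 32  # '1111' prefix followed by a 32-bit value


def gorilla_bits(d: int) -> int:
    """Bits used to store one delta-of-delta value d under the defaults."""
    if d == 0:
        return 1  # a single '0' bit
    for low, high, prefix_bits, value_bits in GORILLA_BUCKETS:
        if low <= d <= high:
            return prefix_bits + value_bits
    return FALLBACK_BITS


for d in (0, 10, -100, 300, 5000):
    print(d, gorilla_bits(d))
```

Note the asymmetric bounds: -64 already falls into the 9-bit range, while +64 still fits in 7 bits.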

Those ranges were chosen by the Facebook team based on their data, but
that was at second precision. We use milliseconds, as Prometheus does,
and Prometheus uses ranges of 14, 17 and 20 meaningful bits (IIRC).
However, I don't think their values are useful to us either: from the
small set of data I've gathered, our best ranges are somewhere around:

[-256,256] milliseconds difference in the delta of delta,
D = (t(n) - t(n-1)) - (t(n-1) - t(n-2))

[-512,512] milliseconds

[-2048,2048] milliseconds
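For anyone who wants to check their own data against these ranges, a sketch like the following computes D from millisecond timestamps and counts how many values land in each range. The symmetric bounds 256, 512 and 2048 are the approximate ones listed above, and the timestamps in the usage lines are made up for illustration.

```python
from typing import List

# Approximate proposed ranges for |D| in milliseconds (from the list above).
PROPOSED_RANGES_MS = [256, 512, 2048]


def delta_of_deltas(timestamps_ms: List[int]) -> List[int]:
    """D(n) = (t(n) - t(n-1)) - (t(n-1) - t(n-2)) for n >= 2."""
    return [
        (timestamps_ms[n] - timestamps_ms[n - 1])
        - (timestamps_ms[n - 1] - timestamps_ms[n - 2])
        for n in range(2, len(timestamps_ms))
    ]


def range_histogram(dods: List[int], limits: List[int]) -> List[int]:
    """Count D values per symmetric [-limit, limit] range.

    Each value goes into the first (smallest) range containing it;
    the last slot counts values outside every range.
    """
    counts = [0] * (len(limits) + 1)
    for d in dods:
        for i, limit in enumerate(limits):
            if -limit <= d <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


# Hypothetical 15 s collection with some jitter.
ts = [0, 15010, 30005, 45300, 60290, 78000]
dods = delta_of_deltas(ts)
print(dods, range_histogram(dods, PROPOSED_RANGES_MS))
```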

It's probably wiser to merge the first two into a single range and drop
the first one. While we store one extra bit of data, we save one bit of
metadata. The last one could technically be something like "missed one
datapoint", but then its size would be tied to the collection interval,
and I'm not sure we have any target value for that in the long run.
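The merge trade-off can be checked by counting bits under both layouts. This is a sketch under several assumptions that mirror the Gorilla layout rather than any existing Hawkular code: D == 0 is one bit, range i uses a unary prefix of i + 2 bits, a b-bit range covers [-(2^(b-1) - 1), 2^(b-1)] like Gorilla's [-63,64], and the fallback is an all-ones prefix plus 32 bits. The helpers bits_for and total_bits, and the sample D values, are hypothetical.

```python
from typing import Sequence


def bits_for(d: int, widths: Sequence[int]) -> int:
    """Bits for one D value; widths lists the value bits of each range."""
    if d == 0:
        return 1  # single '0' bit
    for i, b in enumerate(widths):
        if -(2 ** (b - 1) - 1) <= d <= 2 ** (b - 1):
            return (i + 2) + b  # '10', '110', '1110', ... then b value bits
    # Fallback uses an all-ones prefix as long as the last range's prefix.
    return (len(widths) + 1) + 32


def total_bits(dods: Sequence[int], widths: Sequence[int]) -> int:
    return sum(bits_for(d, widths) for d in dods)


THREE_RANGES = [9, 10, 12]  # roughly [-256,256], [-512,512], [-2048,2048]
MERGED = [10, 12]           # first two ranges merged into one

sample = [0, 100, -300, 450, 1500, 3000]  # hypothetical D values in ms
print(total_bits(sample, THREE_RANGES), total_bits(sample, MERGED))  # 90 87
```

Under these assumptions the merged layout costs one bit more for each nonzero D in the smallest range and one bit less for every other nonzero D, so it pays off whenever fewer values land in the smallest range than in all the larger ones combined.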

But I'm guessing too much. From hawkular-openshift-agent I can see
that values are fetched sequentially, so a delay in fetching one metric
delays the next one, and so on. I can measure how long this takes on an
empty instance, but what can we expect when running in a congested
environment?
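A simplified, hypothetical model of the sequential fetching shows why a single slow fetch matters for D. It assumes each run starts on a fixed schedule, metrics in a run are fetched strictly one after another, and each metric's timestamp is taken when its fetch starts; the function names and durations are illustrative, not taken from hawkular-openshift-agent.

```python
from itertools import accumulate
from typing import List


def start_offsets_ms(fetch_durations_ms: List[int]) -> List[int]:
    """Offset of each metric's fetch from the start of the run.

    Metric k starts only after metrics 0..k-1 have been fetched.
    """
    return list(accumulate(fetch_durations_ms[:-1], initial=0))


def metric_timestamps(run_starts_ms: List[int],
                      runs: List[List[int]], k: int) -> List[int]:
    """Timestamps recorded for metric k across all runs."""
    return [start + start_offsets_ms(durations)[k]
            for start, durations in zip(run_starts_ms, runs)]


starts = [0, 15_000, 30_000]
# Second run: the fetch of metric 1 is slow (200 ms instead of 20 ms).
runs = [[20, 20, 20, 20], [20, 200, 20, 20], [20, 20, 20, 20]]
for k in range(4):
    t = metric_timestamps(starts, runs, k)
    print(k, (t[2] - t[1]) - (t[1] - t[0]))
```

In this model metrics 0 and 1 keep D = 0, while every metric fetched after the slow one is shifted by 180 ms in that run and gets D = -360, so congestion shows up as D values spread across all later metrics.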

Saving a bit or two might not sound like much, but let's say we have
30 000 pods in OpenShift with 32 metrics per pod at a 15-second interval:
that's 5 529 600 000 datapoints per day, and it adds up (659 MiB of space
saved per day for a single bit). So getting the ranges right nets us a
lot of savings.
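The arithmetic behind those numbers, as a quick check using the pod count, metrics per pod and interval from the example above:

```python
PODS = 30_000
METRICS_PER_POD = 32
INTERVAL_S = 15
SECONDS_PER_DAY = 24 * 60 * 60

# 86 400 / 15 = 5 760 collections per metric per day.
datapoints_per_day = PODS * METRICS_PER_POD * (SECONDS_PER_DAY // INTERVAL_S)

# Saving one bit on every datapoint, converted to bytes and then MiB.
bytes_saved = datapoints_per_day / 8
mib_saved = bytes_saved / 2**20

print(datapoints_per_day)  # 5529600000
print(round(mib_saved))    # 659
```

The 659 figure is in binary megabytes (MiB); in decimal megabytes the same saving is about 691 MB.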

   - Micke