Hi,
I could use some help if someone has access to this sort of data. I'm
trying to characterize how our collection interval fluctuates when using
our collectors. The actual collection interval itself is not interesting;
what matters is how much each run is delayed and how that delay changes.
Currently we use the defaults from the Gorilla paper (including the typo
the paper had), which gives us the ranges [-63,64] (7 bits),
[-255,256] (9 bits), [-2047,2048] (12 bits), and a catch-all for the
rest. The first range needs 2 bits of metadata, the second needs 3, and
the third already needs 4.
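To make the cost per stored value concrete, here's a small sketch of my
reading of those rules (the metadata prefixes are the variable-length
codes from the paper; adjust if the typo fix changes the boundaries):

```python
def bits_for_dod(d):
    """Total bits used to store one delta-of-delta value d,
    including the variable-length metadata prefix."""
    if d == 0:
        return 1            # '0'    -> 1 bit total
    if -63 <= d <= 64:
        return 2 + 7        # '10'   + 7-bit value
    if -255 <= d <= 256:
        return 3 + 9        # '110'  + 9-bit value
    if -2047 <= d <= 2048:
        return 4 + 12       # '1110' + 12-bit value
    return 4 + 32           # '1111' + full 32-bit value

print(bits_for_dod(0), bits_for_dod(50), bits_for_dod(5000))
```

With second-precision timestamps most values land in the cheap buckets;
with millisecond jitter they don't, which is exactly the problem below.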
Those ranges were derived by the Facebook guys from their data, but that
data used second precision. We use milliseconds, like for example
Prometheus does, and Prometheus uses ranges sized for 14, 17, and 20
meaningful bits (IIRC). I don't think their values are useful to us,
however: from what I've gathered from my small data set, our best ranges
are somewhere around:
[-256,256] millisecond difference in delta of delta,
D = (t_n - t_{n-1}) - (t_{n-1} - t_{n-2})
[-512,512] milliseconds
[-2048,2048] milliseconds
The first two are probably wiser to merge into a single range, dropping
the first one: while we store one extra bit of payload, we save one bit
of metadata. The last one could technically mean something like "missed
one datapoint", but then that would depend on the collection interval,
and I'm not sure we have any target value for that in the long run.
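For anyone who wants to check their own data against these ranges, the
delta-of-delta from the formula above is straightforward to compute from
a list of millisecond timestamps (the sample timestamps here are made up):

```python
def deltas_of_deltas(timestamps_ms):
    """D_n = (t_n - t_{n-1}) - (t_{n-1} - t_{n-2})."""
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# e.g. a 15 s scrape interval with millisecond jitter:
ts = [0, 15_012, 30_003, 45_020]
print(deltas_of_deltas(ts))   # [-21, 26] -> both fit in [-256,256]
```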
But I'm guessing too much. From the hawkular-openshift-agent I can see
that values are fetched sequentially, so a delay in fetching one metric
will delay the next one, and so on. I can see how long this takes on an
empty instance, but what can we expect when running in a congested
environment?
Saving a bit or two might not sound like much, but let's say we have
30 000 pods in OpenShift with 32 metrics per pod at a 15 second
interval: we're talking about 5 529 600 000 datapoints per day, and it
adds up (659 MiB of space saved per day with a single bit). So getting
the ranges right nets us a lot of savings.
- Micke