On 9 Mar 2017, at 9:21, Joel Takvorian wrote:
If datapoints expire (that is, inventory "blobs"), their underlying
metrics will still be present in the database, even with no data
associated. We may want to delete them as well.
I would expect them to also expire by TTL.
It is a discussion we need to have, though, as this may not be
what is wanted:
Suppose you have an application with the label app=shop.
Now that application has a component that comes from a 3rd-party vendor
and which has a memory leak. The vendor is not willing to provide a
quick fix, only one at their usual 3-month cadence. To keep your shop
online, you will either 'manually' restart it every night, or add a
liveness check for Kubernetes that checks for free memory and, when
this is exhausted, reports the container as non-live. Kubernetes will
then kill the pod and start a new one.
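Such a liveness check could be sketched as a Kubernetes exec probe; the command and the threshold below are illustrative assumptions, not a recommendation:

```yaml
# Illustrative only: fail liveness when available memory drops below ~50 MB.
# Note: /proc/meminfo reflects the node, not the container's cgroup, so a
# real check would read the cgroup memory stats instead.
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "awk '/MemAvailable/ {exit ($2 < 51200)}' /proc/meminfo"
  periodSeconds: 60
  failureThreshold: 3
```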
In either case you get a new pod for your app=shop, and the old server
in inventory is no longer needed; but e.g. for capacity planning you
want to keep all metrics for app=shop.
What may hold true, though, is that e.g. after a week the granularity of
the data may decrease and we may only keep aggregates. In this case the
auto-expiry of the data could/would need to trigger an aggregation into
another time series that holds the aggregates. And perhaps at that point
(as I wrote before) we would drop the container-id(*) part from the
metric id and actually "aggregate into the label", e.g.:
container1.cpu.usage{app=shop1} -> label.app=shop1.cpu.usage
container2.cpu.usage{app=shop1} -> label.app=shop1.cpu.usage
(the 'label.' prefix is used here to distinguish from "normal metrics",
for illustration purposes)
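That re-keying step could be sketched roughly like this in Python; the metric-id format and the function name are made up for illustration and do not match any real storage schema:

```python
import re
from collections import defaultdict

def aggregate_into_label(datapoints):
    """Re-key per-container series onto their label, merging datapoints.

    `datapoints` maps a metric id like 'container1.cpu.usage{app=shop1}'
    to a list of (timestamp, value) samples.
    """
    merged = defaultdict(list)
    for metric_id, samples in datapoints.items():
        m = re.match(r'\w+\.(?P<metric>[\w.]+)\{(?P<label>[^}]+)\}', metric_id)
        if not m:
            continue  # not a per-container metric; leave untouched
        # e.g. 'container1.cpu.usage{app=shop1}' -> 'label.app=shop1.cpu.usage'
        new_id = 'label.{}.{}'.format(m.group('label'), m.group('metric'))
        merged[new_id].extend(samples)
    for samples in merged.values():
        samples.sort()  # keep the merged series in time order
    return dict(merged)
```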
If containers 1 and 2 above happen to run in parallel, then the "usual"
aggregation rules for a group would apply:

label.app=shop1.cpu.usage.max = max(container1.cpu.usage{app=shop1},
container2.cpu.usage{app=shop1})

and so on. Basically: aggregation over all parallel-running containers,
and over a time interval.
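For the max rule, a rough sketch, assuming samples are already aligned to common timestamps (in practice the series would first be bucketed into the chosen time interval):

```python
from collections import defaultdict

def group_max(series_list):
    """Pointwise max across parallel container series.

    Each series is a list of (timestamp, value); a container that was not
    running at a given timestamp simply contributes no sample there.
    """
    by_ts = defaultdict(list)
    for series in series_list:
        for ts, value in series:
            by_ts[ts].append(value)
    # result would be stored as e.g. label.app=shop1.cpu.usage.max
    return sorted((ts, max(values)) for ts, values in by_ts.items())
```

The same shape works for min/avg/sum by swapping the reducer.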