[Hawkular-dev] Garbage collection of outdated Servers/Metrics - especially with (orchestrated) containers

Heiko W.Rupp hrupp at redhat.com
Thu Mar 9 03:52:53 EST 2017


On 9 Mar 2017, at 9:21, Joel Takvorian wrote:

> If datapoints expire (that is, inventory "blobs"), their underlying 
> metrics
> will still be present in the database, even with no data associated. 
> We may want to delete them as well.

I would expect them to expire by TTL as well.

This is a discussion we need to have, though, as this may not be
what is wanted:
Suppose you have an application with label app=shop.
Now that application has a component that comes from a 3rd party vendor 
and which has a memory leak. The vendor is not willing to provide a 
quick fix, but only ships fixes at their usual 3 month cadence. To keep 
your shop online, you will either 'manually' restart it every night, or 
have a liveness check for Kubernetes that checks for free memory and, 
when this is exhausted, reports the container as non-live. Kubernetes 
will then kill the pod and start a new one.
In either case you get a new pod for your app=shop and the old server 
in inventory is no longer needed, but e.g. for capacity planning you 
want to keep all metrics for app=shop.

What may hold true, though, is that e.g. after a week the granularity of 
the data may decrease and we may only keep aggregates. In this case the 
auto-expiry of the data could/would need to trigger an aggregation into 
another time-series that holds the aggregates, and perhaps at that point 
(as I wrote before) drop the container id(*) part from the metric id and 
actually "aggregate into the label", e.g.:

container1.cpu.usage{app=shop1}  -> label.app=shop1.cpu.usage
container2.cpu.usage{app=shop1}  -> label.app=shop1.cpu.usage

(the 'label.' prefix is used here to distinguish from "normal metrics" 
for illustration purposes.)
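
As a rough sketch of that id rewrite (not an existing Hawkular API): the 
function name label_metric_id and the regular expression below are 
assumptions made purely for illustration, and the id syntax is just the 
one from the two example lines above, with a single key=value label.

```python
import re

# Assumed id shape, taken from the examples above:
#   <container>.<metric>{<key>=<value>}
# This is an illustration only, not a format Hawkular defines.
_METRIC_ID = re.compile(
    r"^(?P<container>[^.]+)\.(?P<metric>[^{]+)\{(?P<key>[^=]+)=(?P<value>[^}]+)\}$"
)


def label_metric_id(metric_id: str) -> str:
    """Drop the container part and move the label into the metric id."""
    match = _METRIC_ID.match(metric_id)
    if match is None:
        raise ValueError(f"unexpected metric id: {metric_id!r}")
    # The 'label.' prefix only distinguishes aggregates from normal metrics.
    return f"label.{match['key']}={match['value']}.{match['metric']}"
```

With this, both container1.cpu.usage{app=shop1} and 
container2.cpu.usage{app=shop1} map to the same id, so data from 
restarted pods ends up in one time-series per label.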

If container 1 and 2 above happen to run in parallel, then the "usual" 
aggregation rules for a group would apply:

label.app=shop1.cpu.usage.max = max (container1.cpu.usage{app=shop1}, 
container2.cpu.usage{app=shop1})

and so on: basically, aggregation across all containers running in 
parallel and over a time interval.
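
To make the max case concrete, here is a minimal sketch under some 
assumptions: datapoints are (timestamp in seconds, value) pairs, the 
time interval is a fixed-size bucket, and the name aggregate_max and 
the input layout are hypothetical, not part of Hawkular Metrics.

```python
from collections import defaultdict


def aggregate_max(series_by_container, bucket_seconds):
    """Max over all containers sharing a label, per time bucket.

    series_by_container: {container_id: [(timestamp_seconds, value), ...]}
    (hypothetical layout, chosen for this illustration only)
    Returns {bucket_start_seconds: max_value}.
    """
    buckets = defaultdict(list)
    for points in series_by_container.values():
        for timestamp, value in points:
            # Align each datapoint to the start of its bucket.
            bucket_start = timestamp - timestamp % bucket_seconds
            buckets[bucket_start].append(value)
    # One aggregated datapoint per bucket, across all containers.
    return {start: max(values) for start, values in sorted(buckets.items())}


shop1 = {
    "container1": [(0, 10.0), (30, 40.0), (60, 20.0)],
    "container2": [(10, 25.0), (70, 35.0)],
}
label_max = aggregate_max(shop1, bucket_seconds=60)
```

Containers that ran one after the other simply contribute to different 
buckets, while containers that ran in parallel are merged within the 
same bucket. min, avg and so on would follow the same pattern.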

