[Hawkular-dev] Garbage collection of outdated Servers/Metrics - especially with (orchestrated) containers

Wed Mar 8 11:54:46 EST 2017

Hi,

I know I have brought this up in the past ... (and I am sure Juca will easily
find where :-)

In the good old datacenter, one deployed an application server with an
application, added it to monitoring, and it hummed along there happily. The
server sometimes got rebooted for OS updates or other things, but the
app server always stayed the same, the monitoring system knew all its
pets by name, and the admins happily did this:
http://www.starwars-union.de/bilder/news/20110401_sunset.jpg

Nowadays, with orchestration systems, the situation looks more like this:
http://www.animationsfilme.ch/wp-content/uploads/2013/07/DespicableMe_01.jpg

where containers come and go: once an application in a container has
died, it is not restarted; instead, a new container with its own ID is started.

Of course we can identify applications with labels, so that I don't need
to know the container ID. That way I can gather and display metrics for
them, and all is fine.

But: all those containers will create new
- metric ids
- inventory entries
- ???

The question is now: how long do we want/need to keep them?
Hawkular-metrics has a TTL for the datapoints, but I think metric 
definitions are not evicted.
Similarly, a container that gets killed can't easily tell inventory that
its entry can go away.
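To make the gap concrete: once every datapoint of a metric has aged out of the TTL, its definition is just an orphan. A minimal sketch of evicting such definitions, assuming an in-memory stand-in where each definition records the timestamp of its newest datapoint (MetricDefinition, expired_definitions and evict are hypothetical names, not the Hawkular-metrics API):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetricDefinition:
    # Hypothetical stand-in for a stored metric definition.
    metric_id: str
    last_datapoint_ts: float  # seconds since epoch of the newest stored datapoint


def expired_definitions(defs: Dict[str, MetricDefinition],
                        data_ttl: float, now: float) -> List[str]:
    """Return ids of definitions whose datapoints have all aged out of the TTL."""
    return sorted(d.metric_id for d in defs.values()
                  if now - d.last_datapoint_ts > data_ttl)


def evict(defs: Dict[str, MetricDefinition],
          data_ttl: float, now: float) -> Dict[str, MetricDefinition]:
    """Drop orphaned definitions in place and return the remaining ones."""
    for metric_id in expired_definitions(defs, data_ttl, now):
        del defs[metric_id]
    return defs


defs = {
    "pod-a/cpu": MetricDefinition("pod-a/cpu", 1000.0),  # pod died long ago
    "pod-b/cpu": MetricDefinition("pod-b/cpu", 9000.0),  # still reporting
}
```

With a TTL of 3600 s at now=10000, only "pod-a/cpu" is evicted. Running this as a periodic job keeps definitions bounded by the same TTL that already bounds the datapoints.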

For inventory-new we could use the expiration feature that
Hawkular-metrics has for datapoints, where e.g. the agent would regularly
sync data and thus refresh the last-seen time to keep an entry "alive".
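The last-seen idea can be sketched as a small in-memory registry: every agent sync upserts an entry and refreshes its timestamp, and a purge pass drops whatever has not been refreshed within the expiry window. This is an illustration under assumptions (ExpiringInventory, sync and purge_expired are hypothetical names), not how inventory-new is implemented:

```python
import time
from typing import Callable, Dict, List, Optional


class ExpiringInventory:
    """Hypothetical inventory whose entries live only while agents keep syncing them."""

    def __init__(self, expiry_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.expiry = expiry_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[str, dict] = {}
        self._last_seen: Dict[str, float] = {}

    def sync(self, entry_id: str, data: dict) -> None:
        # An agent sync upserts the entry and refreshes its last-seen time.
        self._entries[entry_id] = data
        self._last_seen[entry_id] = self.clock()

    def purge_expired(self) -> List[str]:
        # Remove entries not refreshed within the expiry window; return their ids.
        now = self.clock()
        dead = [e for e, seen in self._last_seen.items() if now - seen > self.expiry]
        for e in dead:
            del self._entries[e]
            del self._last_seen[e]
        return dead

    def get(self, entry_id: str) -> Optional[dict]:
        return self._entries.get(entry_id)


# Simulated clock: container-1 stops syncing, container-2 keeps syncing.
t = [0.0]
inv = ExpiringInventory(60, clock=lambda: t[0])
inv.sync("container-1", {"app": "shop"})
inv.sync("container-2", {"app": "shop"})
t[0] = 50.0
inv.sync("container-2", {"app": "shop"})
t[0] = 100.0
```

The nice property is that a killed container needs to do nothing: it simply stops syncing, and its entry falls out on the next purge. The expiry has to be comfortably larger than the agent's sync interval so that a single missed sync does not drop a live entry.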

Also, for the pure metrics: how much historic data do we want/need?
And if we were to aggregate those for long(er)-term storage, I think we
could actually aggregate the individual time series of many parallel pods
into a single one for the entire application.
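As a rough illustration of that roll-up, the sketch below groups raw per-pod samples by their stable application label and a fixed time bucket, then keeps one aggregate per (application, bucket) pair. The Sample and Rollup types, the choice of avg/max/pod count, and rollup_by_app are all assumptions for illustration; which aggregates make sense depends on the metric:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    pod_id: str   # container/pod identity, changes with every new container
    app: str      # stable application label, e.g. app=shop
    ts: int       # seconds since epoch
    value: float


@dataclass
class Rollup:
    avg: float
    max: float
    pods: int     # number of distinct pods that reported in this bucket


def rollup_by_app(samples: Iterable[Sample],
                  bucket_seconds: int) -> Dict[Tuple[str, int], Rollup]:
    """Collapse per-pod series into one series per application label."""
    groups: Dict[Tuple[str, int], List[Sample]] = defaultdict(list)
    for s in samples:
        bucket_start = s.ts - s.ts % bucket_seconds
        groups[(s.app, bucket_start)].append(s)

    result: Dict[Tuple[str, int], Rollup] = {}
    for key, group in groups.items():
        values = [s.value for s in group]
        result[key] = Rollup(avg=sum(values) / len(values),
                             max=max(values),
                             pods=len({s.pod_id for s in group}))
    return result


samples = [
    Sample("shop-1", "shop", 10, 1.0),
    Sample("shop-2", "shop", 20, 3.0),
    Sample("shop-3", "shop", 3700, 5.0),  # replacement pod, next hour
]
rollups = rollup_by_app(samples, bucket_seconds=3600)
```

Note that the average here is taken over raw datapoints, so a pod that reports more often weighs more; averaging each pod's bucket first would avoid that. Either way, the long-term series is keyed by the application label only, so the churning container IDs no longer create new long-lived series.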

   Heiko