[Hawkular-dev] Garbage collection of outdated Servers/Metrics - especially with (orchestrated) containers

Michael Burman miburman at redhat.com
Wed Mar 15 09:11:22 EDT 2017


On 03/14/2017 11:29 PM, Larry O'Leary wrote:
> If I understand how tags are getting used, I think tags will provide 
> the ability to keep track of history regardless of whether
> But to take the burden off the user of tagging each new container that 
> represents helloworld, I would hope that we can figure out how to do 
> this automatically and then allow the user to update/fix tags if they 
> are wrong?
>
I think most of your concerns are no longer in our domain; they are 
rather about how Openshift, for example, manages this and whether we can 
store that "inventory information". Here's one example of what we've 
extracted while gathering information:

{
   "tags": {
     "resource_id_description": "Identifier(s) specific to a metric",
     "labels": "deployment:router-1,deploymentconfig:router,router:router",
     "nodename": "localhost",
     "resource_id": "/",
     "type": "pod_container",
     "hostname": "localhost",
     "container_base_image": "openshift/origin-haproxy-router:latest",
     "namespace_id": "ef59e1bb-ea0d-11e6-9dc8-a0d3c1f893c0",
     "descriptor_name": "filesystem/usage",
     "pod_name": "router-1-bwvdt",
     "container_name": "router",
     "units": "bytes",
     "host_id": "localhost",
     "group_id": "router/filesystem/usage",
     "pod_namespace": "default",
     "pod_id": "fe42efce-ea0d-11e6-9dc8-a0d3c1f893c0",
     "namespace_name": "default"
   },
   "tenantId": "default",
   "dataRetention": 7,
   "minTimestamp": 1488967200000,
   "type": "gauge",
   "id": "router/fe42efce-ea0d-11e6-9dc8-a0d3c1f893c0/filesystem/usage//",
   "maxTimestamp": 1489581220000
}

If your "helloworld" is always running in the project 
"ef59e1bb-ea0d-11e6-9dc8-a0d3c1f893c0", then there is one search criterion 
that will not change even if you redeploy it, upgrade the EAP to a newer 
version, etc.
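
For example, here is a minimal sketch of such a tag query. The base URL, 
the use of Python requests and the exact endpoint path are illustrative 
assumptions, not something stated above; only the tag name and value come 
from the example definition:

import requests

# Assumed Hawkular Metrics base URL; adjust to your deployment.
BASE = "http://localhost:8080/hawkular/metrics"
HEADERS = {"Hawkular-Tenant": "default"}

# Find every gauge definition tagged with the project's namespace_id,
# no matter how many pods have come and gone underneath it.
resp = requests.get(
    BASE + "/gauges",
    headers=HEADERS,
    params={"tags": "namespace_id:ef59e1bb-ea0d-11e6-9dc8-a0d3c1f893c0"},
)
for definition in resp.json():
    print(definition["id"], definition.get("tags", {}).get("pod_id"))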

Next, we can look at descriptor_name (or group_id); that is the metric 
you might want to follow. This one does not change either, so you can 
still keep track of the newest running instance this way. The labels are 
set in Openshift, and they give you the information you might seek, such 
as labeling the deployment "application_name = Our ultimate frontend 
application". Now you can track by that name as well, regardless of 
whether the pod changed underneath.
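
A sketch of narrowing that down, with the same assumed endpoint as above. 
The application_name label is the hypothetical label from the previous 
paragraph, and since the Openshift labels arrive as one comma-separated 
"labels" tag value, the label match is done client-side here for 
simplicity:

import requests

BASE = "http://localhost:8080/hawkular/metrics"  # assumed endpoint
HEADERS = {"Hawkular-Tenant": "default"}

# Combine the tags that stay stable across redeployments: the project and
# the metric name (descriptor_name).
resp = requests.get(
    BASE + "/gauges",
    headers=HEADERS,
    params={"tags": "namespace_id:ef59e1bb-ea0d-11e6-9dc8-a0d3c1f893c0,"
                    "descriptor_name:filesystem/usage"},
)

# Pick out the definitions whose "labels" tag carries the application label.
frontends = [
    g for g in resp.json()
    if "application_name:Our ultimate frontend application"
    in g.get("tags", {}).get("labels", "")
]
print([g["id"] for g in frontends])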

So these issues you've noted are mostly a matter of how you use Openshift. 
If you mark things there so that they can be followed, then our tags allow 
you to follow them, even if you have multiple pods running concurrently, 
you update versions, the pod crashes, it changes node, and so on.

But you can also track along all of those dimensions: "show me the 
filesystem/usage on node1 versus node2", "show me how the previous 
version compares to the new version", or "list all the instances running 
physically on node localhost".
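
That last one is just one more tag in the query. A rough sketch, again 
against the assumed endpoint above, using the hostname tag from the 
example definition:

import requests

BASE = "http://localhost:8080/hawkular/metrics"  # assumed endpoint
HEADERS = {"Hawkular-Tenant": "default"}

# List every filesystem/usage series running physically on node "localhost",
# using the hostname tag the collector attached to each definition.
resp = requests.get(
    BASE + "/gauges",
    headers=HEADERS,
    params={"tags": "descriptor_name:filesystem/usage,hostname:localhost"},
)
for g in resp.json():
    print(g["tags"].get("pod_name"), "->", g["id"])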

We don't do inventorying for container environments, but we try to gather 
as much information as possible so the end user can create their own 
groupings without restricting the possibilities, even in a dynamic 
environment.

I'm not sure if I answered your problem definition, as I don't think it's 
in our hands anymore. This comment might not apply to the ManageIQ 
environment ;)

Now, once this pod goes down, this metricId will not receive new metrics 
(a new pod_id will come up and a new metricId will be created). However, 
we do not delete the history of old pod_ids and their metric definitions, 
even after the datapoints associated with them have expired. Your queries, 
if done with tags, do not change, and they do not have to care how many 
times the underlying pod was restarted, which I think was one of your 
worries. Consider the metricId internal information of Hawkular-Metrics.
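
To make that concrete, here is a sketch of reading datapoints without ever 
hard-coding a metricId: resolve whatever definitions currently match the 
tags, then fetch the raw data of each. The /gauges/{id}/raw path is an 
illustrative assumption, and the start/end timestamps are simply the 
min/max values from the example definition above:

import requests
from urllib.parse import quote

BASE = "http://localhost:8080/hawkular/metrics"  # assumed endpoint
HEADERS = {"Hawkular-Tenant": "default"}
TAGS = ("namespace_id:ef59e1bb-ea0d-11e6-9dc8-a0d3c1f893c0,"
        "descriptor_name:filesystem/usage")

# Resolve the metric definitions that currently match the tags - old and
# new pod_ids alike - and read the raw datapoints of each. The caller never
# needs to know or store the generated metricIds.
definitions = requests.get(BASE + "/gauges", headers=HEADERS,
                           params={"tags": TAGS}).json()
for d in definitions:
    # Metric ids contain '/' characters, so encode them for the URL path.
    raw = requests.get(
        BASE + "/gauges/" + quote(d["id"], safe="") + "/raw",
        headers=HEADERS,
        params={"start": 1488967200000, "end": 1489581220000},
    ).json()
    print(d["id"], len(raw), "datapoints")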

   - Micke

