[Hawkular-dev] HOSA and conversion from Prometheus to Hawkular Metrics

John Mazzitelli mazz at redhat.com
Wed Feb 1 09:38:52 EST 2017


For the past several days I've been working on an enhancement to HOSA that came in from the community (in fact, I would consider it a bug). I'm just about ready to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I wanted to post this to announce it and see if there is any feedback.

Today, HOSA collects metrics from any Prometheus endpoint that you declare - for example:

   metrics:
   - name: go_memstats_sys_bytes
   - name: process_max_fds
   - name: process_open_fds

But if a Prometheus metric has labels, Prometheus itself considers each unique combination of labels on that metric to be an individual time series. This is different from how Hawkular Metrics works - each Hawkular Metrics metric ID (even if its metric definition or its datapoints have tags) is a single time series. We need to account for this difference. For example, if our agent is configured with:

   metrics:
   - name: jvm_memory_pool_bytes_committed

And the Prometheus endpoint emits that metric with a label called "pool" like this:

   jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7
   jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7

then to Prometheus this is actually 2 time series (the number of bytes committed per pool type), not 1. Even though the metric name is the same (what Prometheus calls the "metric family name"), there are two unique combinations of labels - one with "Code Cache" and one with "PS Eden Space" - so they are 2 distinct time series.

Today, the agent only creates a single Hawkular-Metrics metric in this case, tagging each datapoint with that datapoint's Prometheus labels. But we don't want to aggregate them like that, since we lose the granularity that the Prometheus endpoint gives us (that is, the number of bytes committed in each pool type). I will say I think we might be able to get that granularity back through datapoint tag queries in Hawkular-Metrics, but I don't know how well (if at all) that is supported, how efficient such queries would be even if supported, or how efficient storage would be if we tag every datapoint with these labels (I'm not sure that is the general purpose of tags in H-Metrics). But regardless, the fact that these really are different time series should (IMO) be represented as different time series metrics (via distinct metric definitions/metric IDs) in Hawkular-Metrics.

To support labeled Prometheus endpoint data like this, the agent needs to split this one named metric into N Hawkular-Metrics metrics (where N is the number of unique label combinations for that named metric). So even though the agent is configured with the one metric "jvm_memory_pool_bytes_committed", we actually need to create two Hawkular-Metrics metric definitions (each with its own distinct metric ID).
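
To make that splitting concrete, here is a rough sketch in Go (the language the agent is written in) of how one metric ID per unique label combination could be derived. This is purely illustrative - it is not the PR's actual code:

   package main

   import (
       "fmt"
       "sort"
       "strings"
   )

   // buildMetricID builds one Hawkular-Metrics metric ID for a single
   // unique label combination, in the form
   // metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}
   // Illustrative sketch only - not the actual HOSA code.
   func buildMetricID(family string, labels map[string]string) string {
       if len(labels) == 0 {
           return family // no labels: the family name alone is the metric ID
       }
       names := make([]string, 0, len(labels))
       for name := range labels {
           names = append(names, name)
       }
       sort.Strings(names) // sort so the ID stays stable from scrape to scrape
       pairs := make([]string, 0, len(names))
       for _, name := range names {
           pairs = append(pairs, name+"="+labels[name])
       }
       return fmt.Sprintf("%s{%s}", family, strings.Join(pairs, ","))
   }

   func main() {
       fmt.Println(buildMetricID("jvm_memory_pool_bytes_committed",
           map[string]string{"pool": "Code Cache"}))
       // prints: jvm_memory_pool_bytes_committed{pool=Code Cache}
   }

Sorting the label names in the sketch is just to keep the generated ID stable even if the endpoint emits the labels in a different order on a later scrape.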

The PR [1] that is ready to go does this. By default it will create multiple metric definitions/metric IDs in the form "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}". If you want a different form, you can define an "id" and use "${labelName}" placeholders in the ID you declare (such as "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever; see the example config after the list below). But I suspect the default format will be what most people want, so nothing needs to be done. In the above example, two metric definitions with the following IDs are created:

1. jvm_memory_pool_bytes_committed{pool=Code Cache}
2. jvm_memory_pool_bytes_committed{pool=PS Eden Space}
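
If you do want your own ID format instead of the default, the configuration would look something like the following. This is only a sketch: the ID pattern is made up, and I'm assuming "id" sits alongside "name" in each metric entry, in the same YAML layout as the snippets above:

   metrics:
   - name: jvm_memory_pool_bytes_committed
     id: committed_bytes_for_${pool}

With the endpoint data above, that would yield metric IDs like "committed_bytes_for_Code Cache" and "committed_bytes_for_PS Eden Space".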

--John Mazz

[1] https://github.com/hawkular/hawkular-openshift-agent/pull/117


