For the past several days I've been working on an enhancement to HOSA that came in from
the community (in fact, I would consider it a bug fix). I'm about ready to merge the PR
[1] for this and do a HOSA 1.1.0.Final release. I wanted to post this to announce it and
see if there is any feedback, too.
Today, HOSA collects metrics from any Prometheus endpoint that you declare - for example:
metrics:
- name: go_memstats_sys_bytes
- name: process_max_fds
- name: process_open_fds
But if a Prometheus metric has labels, Prometheus itself considers each metric with a
unique combination of labels to be an individual time series. This is different from
how Hawkular Metrics works - each Hawkular Metrics metric ID (even if its metric
definition or its datapoints have tags) is a single time series. We need to account for
this difference. For example, if our agent is configured with:
metrics:
- name: jvm_memory_pool_bytes_committed
And the Prometheus endpoint emits that metric with a label called "pool" like
this:
jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7
jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7
then to Prometheus this is actually two time series (the number of bytes committed
per pool type), not one. Even though the metric name is the same (what Prometheus calls a
"metric family" name), there are two unique combinations of labels - one with
"Code Cache" and one with "PS Eden Space" - so these are two distinct
time series.
Today, the agent only creates a single Hawkular Metrics metric in this case, with each
datapoint tagged with the Prometheus labels. But we don't want to aggregate them like
that, since we lose the granularity that the Prometheus endpoint gives us (that is, the
number of bytes committed in each pool type). I think we might be able to get that
granularity back through datapoint tag queries in Hawkular Metrics, but I don't know how
well (if at all) that is supported, how efficient such queries would be even if
supported, or how efficient storage would be if we tag every datapoint with these labels
(I'm not sure that is the general purpose of tags in Hawkular Metrics). Regardless, the
fact that these really are different time series means they should (IMO) be represented
as different time series (via metric definitions/metric IDs) in Hawkular Metrics.
To support labeled Prometheus endpoint data like this, the agent needs to split this one
named metric into N Hawkular Metrics metrics (where N is the number of unique label
combinations for that named metric). So even though the agent is configured with the one
metric "jvm_memory_pool_bytes_committed", it must actually create two
Hawkular Metrics metric definitions (with two different and unique metric IDs, obviously).
The PR [1] that is ready to go does this. By default it creates multiple metric
definitions/metric IDs in the form
"metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}". If you
want a different form, you can define an "id" and put "${labelName}" in the ID you
declare (such as "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever);
a sketch of this follows the example below. But I suspect the default format will be what
most people want, and thus nothing needs to be done. In the above example, two metric
definitions with the following IDs are created:
1. jvm_memory_pool_bytes_committed{pool=Code Cache}
2. jvm_memory_pool_bytes_committed{pool=PS Eden Space}
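To illustrate the custom "id" option, here is a minimal sketch of what such a
configuration could look like. The "name" and "id" keys and the "${labelName}"
substitution come from the description above; the specific ID pattern
("committed_bytes_in_${pool}") is just a hypothetical example of mine:

metrics:
- name: jvm_memory_pool_bytes_committed
  # hypothetical custom ID pattern; "${pool}" should be replaced with each
  # unique value of the Prometheus "pool" label
  id: committed_bytes_in_${pool}

With the endpoint data shown earlier, that would presumably yield the two metric IDs
"committed_bytes_in_Code Cache" and "committed_bytes_in_PS Eden Space".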
--John Mazz
[1] https://github.com/hawkular/hawkular-openshift-agent/pull/117