[Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics

Joel Takvorian jtakvori at redhat.com
Fri Feb 17 02:44:16 EST 2017


For the curly braces in Grafana, I'm going to investigate.

For your second point, I'm trying to put myself in the shoes of an ops
person: if I want to create a dashboard that shows a labelled metric (in
terms of Prometheus labels), I'd like to see all its variants in the same
chart, to be able to compare them and see where they converge or diverge.
And maybe compare them across all pods of a given container name. That
would mean queries with tags:

Query tags:
- container_name: something
- family_name (or "metric_base_name", or whatever name we give to that
tag): what_i_ate

I can't be 100% sure that it's going to be used, as people do what they
want in Grafana. But it seems interesting to me. The question is: what's
the cost of adding a tag? I believe metric tags are relatively cheap in
terms of storage. So, having both "metric_name" (what_i_ate{food=Banana})
and "family_name" (what_i_ate) would solve all our issues, no?



On Thu, Feb 16, 2017 at 6:23 PM, John Mazzitelli <mazz at redhat.com> wrote:

> I need to resurrect this thread now that some others have had experience
> with what we have - specifically, what Thomas reported in this issue:
> https://github.com/hawkular/hawkular-openshift-agent/issues/126
>
> It has to do with Prometheus metrics and how HOSA names and tags them in
> H-Metrics.
>
> Just some quick background first:
>
> Prometheus metrics have two parts - a "family name" (like
> "http_response_count") and labels (like "method"). This means you can have
> N metrics in Prometheus with the same metric family name but each with
> different label values (like "http_response_count{method=GET}" and
> "http_response_count{method=POST}". Each unique combination of family
> name plus label values represent a different set of time series data (so
> http_response_count{method=GET} is one set of time series data and
> http_response_count{method=POST} is another set of time series data).
>
> H-Metrics doesn't really have this concept of metric family. H-Metrics has
> metric definitions each with unique names (or "metric IDs") and a set of
> tags (H-Metrics uses the name "tags" rather than "labels"). In H-Metrics,
> you cannot have N metrics with the same name (ID). You must have unique IDs
> to represent different sets of time series data.
>
> OK, with that quick intro, two things:
>
> =====
>
> 1) Metrics coming from Prometheus by default will be stored in H-Metrics
> with metric IDs like:
>
>    metric_family_name{label_name1=value1,label_name2=value2}
>
> Basically, HOSA stores the H-Metric ID so it looks identical to the metric
> data coming from Prometheus endpoints (name with labels comma-separated and
> enclosed within curly braces).
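>
> For example, with the http_response_count metric from the background above,
> the default IDs would be:
>
>    http_response_count{method=GET}
>    http_response_count{method=POST}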
>
> But Grafana might have issues with the curly braces. However, the original
> opinion when this was first implemented in HOSA was that flattening
> everything with underscores in H-Metrics IDs, for example:
>
>    metric_family_name_label_name1_value1_label_name2_value2
>
> would make querying from H-Metrics more difficult (it all looks like one
> big name and it is hard to distinguish the labels within it).
>
> QUESTION #1a: Does Grafana really have an issue with displaying metrics
> whose names have curly braces - {} - and commas in them?
> QUESTION #1b: If so, what should the default metric ID look like when we
> have Prometheus labels like this, if not by using curly braces and commas?
>
> =====
>
> 2) These Prometheus metrics don't look right in the current OpenShift UI.
> If we have two Prometheus metrics stored in H-Metrics with the IDs:
>
> what_i_ate{food=Banana}
> what_i_ate{food=Apple}
>
> what you see in the OpenShift UI console is two metric graphs, each with
> the same metric name "what_i_ate" - you can't tell which one is which.
>
> Why? Application metrics like these are now shown in the OpenShift UI, and
> that works fine even for Prometheus metrics UNLESS the Prometheus metrics
> have labels (like the example above with the Prometheus labels food=Apple or
> food=Banana). This is because when we tag these metrics in H-Metrics, one
> tag we add to the metric definition is "metric_name", and for Prometheus the
> value of this tag is the METRIC FAMILY name. This is what Joel was asking
> for (see the last messages in this thread). But the OS UI console uses this
> metric_name tag for the label of the graph (the full, real ID of the metric
> is ugly in order to make sure it's unique within the cluster - e.g.
> "pod/3e4553ew-34553d-345433-123a/custom/what_i_ate{food=Banana}" - so we
> don't really want to show that to a user).
>
> QUESTION #2a: Should I switch back and make metric_name be the last part
> of the actual metric ID (not the Prometheus family name), like
> "what_i_ate{food=Banana}", so the OS UI console works? Or do we fix the OS
> UI console to parse the full metric ID and only show the last part (after
> the "/custom/" part), thus leaving the "metric_name" tag in H-Metrics as the
> Prometheus metric family name and making querying easier (a la Joel's
> suggestion)?
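>
> Just to illustrate that second option: the OS UI would strip everything up
> to and including the "/custom/" part, e.g.:
>
>    pod/3e4553ew-34553d-345433-123a/custom/what_i_ate{food=Banana}
>    -> what_i_ate{food=Banana}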
>
> QUESTION #2b: Is having the metric family name as an H-Metrics tag useful in
> the first place? If so, and if "metric_name" is to be changed to make the OS
> UI work, I will have to get HOSA to create a new tag "base_metric_name". But
> is having the Prometheus metric family name even a useful thing? Joel seemed
> to think so; I would like to make sure it is a useful thing before I go and
> implement this change.
>
> ----- Forwarded Message -----
> From: "John Mazzitelli" <mazz at redhat.com>
> To: "Discussions around Hawkular development" <
> hawkular-dev at lists.jboss.org>
> Sent: Wednesday, February 1, 2017 11:47:18 AM
> Subject: Re: [Hawkular-dev] HOSA and conversion from prometheus to
> hawkular metrics
>
> https://github.com/hawkular/hawkular-openshift-agent/blob/master/deploy/openshift/hawkular-openshift-agent-configmap.yaml#L20
>
> :D
>
> That's already there - the ${METRIC:name} resolves to the name of the
> metric (not the new ID) and our default config puts that tag on every
> metric.
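>
> For reference, the relevant piece of that default config is roughly along
> these lines (paraphrased; see the configmap linked above for the exact
> structure and any other default tags):
>
>    tags:
>      metric_name: ${METRIC:name}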
>
>
> ----- Original Message -----
> >
> > +1, if that is not being done I think it would be good. Actually, it's
> > probably a good "best practice" as it makes it easier to slice and dice
> > the data.
> >
> > On 2/1/2017 10:35 AM, Joel Takvorian wrote:
> >
> > +1
> >
> > Conversion based on labels seems more sane.
> >
> > I wonder if a new tag that recalls the Prometheus metric name would be
> > useful; e.g. "baseName=jvm_memory_pool_bytes_committed", to retrieve all
> > metrics of that family. Just an idea.
> >
> > On Wed, Feb 1, 2017 at 4:25 PM, John Mazzitelli <mazz at redhat.com> wrote:
> >
> >
> > > Are you also tagging the Prometheus metrics with the labels?
> >
> > Yes, that is what was originally being done, and that is still in there.
> >
> > ----- Original Message -----
> > >
> > > Mazz, this makes sense to me. Our decision to use unique ids (well +type)
> > > is going to lead to this sort of thing. The ids are going to basically be
> > > large concatenations of the tags that identify the data. Then, additionally
> > > we're going to have to tag the metrics with the same name/value pairs that
> > > are present in the id. Are you also tagging the Prometheus metrics with
> > > the labels?
> > >
> > > On 2/1/2017 9:38 AM, John Mazzitelli wrote:
> > >
> > > The past several days I've been working on an enhancement to HOSA that
> > > came in from the community (in fact, I would consider it a bug). I'm about
> > > ready to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I
> > > wanted to post this to announce it and see if there is any feedback, too.
> > >
> > > Today, HOSA collects metrics from any Prometheus endpoint that you
> > > declare - for example:
> > >
> > > metrics:
> > > - name: go_memstats_sys_bytes
> > > - name: process_max_fds
> > > - name: process_open_fds
> > >
> > > But if a Prometheus metric has labels, Prometheus itself considers each
> > > metric with a unique combination of labels as an individual time series
> > > metric. This is different from how Hawkular Metrics works - each Hawkular
> > > Metrics metric ID (even if its metric definition or its datapoints have
> > > tags) is a single time series metric. We need to account for this
> > > difference. For example, if our agent is configured with:
> > >
> > > metrics:
> > > - name: jvm_memory_pool_bytes_committed
> > >
> > > And the Prometheus endpoint emits that metric with a label called
> > > "pool", like this:
> > >
> > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7
> > > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7
> > >
> > > then to Prometheus this is actually 2 time series metrics (the number of
> > > bytes committed per pool type), not 1. Even though the metric name is the
> > > same (what Prometheus calls a "metric family name"), there are two unique
> > > combinations of labels - one with "Code Cache" and one with "PS Eden
> > > Space" - so they are 2 distinct sets of time series data.
> > >
> > > Today, the agent only creates a single Hawkular-Metrics metric in this
> > > case, with the Prometheus labels stored as tags on the appropriate data
> > > points. But we don't want to aggregate them like that, since we lose the
> > > granularity that the Prometheus endpoint gives us (that is, the number of
> > > bytes committed in each pool type). I will say I think we might be able to
> > > get that granularity back through datapoint tag queries in
> > > Hawkular-Metrics, but I don't know how well (if at all) that is supported,
> > > how efficient such queries would be even if supported, and how efficient
> > > storage of these metrics would be if we tag every data point with these
> > > labels (I'm not sure that is the general purpose of tags in H-Metrics).
> > > But, regardless, the fact that these really are different time series
> > > metrics should (IMO) be represented as different time series metrics (via
> > > metric definitions/metric IDs) in Hawkular-Metrics.
> > >
> > > To support labeled Prometheus endpoint data like this, the agent needs to
> > > split this one named metric into N Hawkular-Metrics metrics (where N is
> > > the number of unique label combinations for that named metric). So even
> > > though the agent is configured with the one metric
> > > "jvm_memory_pool_bytes_committed", we need to actually create two
> > > Hawkular-Metrics metric definitions (with two different and unique metric
> > > IDs, obviously).
> > >
> > > The PR [1] that is ready to go does this. By default it will create
> > > multiple metric definitions/metric IDs in the form
> > > "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}",
> > > unless you want a different form, in which case you can define an "id" and
> > > put "${labelName}" in the ID you declare (such as
> > > "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever). But
> > > I suspect the default format will be what most people want, and thus
> > > nothing needs to be done. In the above example, two metric definitions with
> > > the following IDs are created:
> > >
> > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache}
> > > 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space}
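> > >
> > > For completeness, a custom ID would be declared something like this in the
> > > agent config (just a sketch - "committed_bytes_${pool}" is a made-up ID
> > > pattern using the ${labelName} substitution described above):
> > >
> > > metrics:
> > > - name: jvm_memory_pool_bytes_committed
> > >   id: committed_bytes_${pool}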
> > >
> > > --John Mazz
> > >
> > > [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117
> _______________________________________________
> hawkular-dev mailing list
> hawkular-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>