From jkremser at redhat.com Wed Feb 1 06:29:32 2017 From: jkremser at redhat.com (Jiri Kremser) Date: Wed, 1 Feb 2017 12:29:32 +0100 Subject: [Hawkular-dev] Docker image size does matter Message-ID: Hello, I was looking into the google/cadvisor docker image that is only 47 megs large and wondering how we can improve. To some extent it is so small because of the Go lang, but not only. Here are the results: base image with JRE 8 and Alpine linux: 76.8 MB wildfly 10.1.0.Final image 215 MB hawkular-services 320 MB Just for the record, here is the status quo: base CentOS image w/ JDK 8: 149 MB wf image: 580 MB hawkular-services image 672 MB All the mini-images are based on Alpine (that itself is based on BusyBox), so the price for it is less convenience when debugging the images. I also removed 9.2M /opt/jboss/wildfly/docs and wanted to remove 9.0M /opt/jboss/wildfly/modules/system/layers/base/org/hibernate 5.1M /opt/jboss/wildfly/modules/system/layers/base/org/apache/lucene 5.6M /opt/jboss/wildfly/modules/system/layers/base/org/apache/cxf but for some reason the h-services fails to start because it couldn't find some class from that hibernate module, so I put it back. What also helped was squashing all the image layers into 1. This makes the download faster and possibly the image smaller. When applying docker-squash [1] to the current h-services image it saves ~50megs. I am aware that this probably won't fly with some RH policy that we should base our SW on Fedora/RHEL base OS images, but I am gonna use them for development and because I often run out of space because of Docker. Oh and I haven't published it on dockerhub yet, but the repo is here [2] jk [1]: https://github.com/goldmann/docker-squash [2]: https://github.com/Jiri-Kremser/hawkular-services-mini-dockerfiles -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170201/73e33cb5/attachment.html From hrupp at redhat.com Wed Feb 1 07:19:34 2017 From: hrupp at redhat.com (Heiko W.Rupp) Date: Wed, 01 Feb 2017 13:19:34 +0100 Subject: [Hawkular-dev] Docker image size does matter In-Reply-To: References: Message-ID: <619CC058-3A1B-4BE0-ABB3-57283418D28A@redhat.com> On 1 Feb 2017, at 12:29, Jiri Kremser wrote: > base image with JRE 8 and Alpine linux: 76.8 MB Yes, alpine is only 3-4 MB, that is great. > I also removed > 9.2M /opt/jboss/wildfly/docs Makes sense. > What also helped was squashing all the image layers into 1. This makes > the > download faster and possibly the image smaller. When applying > docker-squash > [1] to the current h-services image it saves ~50megs This is a bit of a false friend as docker pull only transfers layers it does not yet have. E.g $ docker pull pilhuhn/hawkular-services:0.30.0.Final 0.30.0.Final: Pulling from pilhuhn/hawkular-services 08d48e6f1cff: Already exists 664e6a3041e6: Already exists 2f8461e7022b: Already exists 9500f4548bd3: Already exists 69e2e5217a47: Already exists cf95509fd4ad: Downloading [======> ] 10.75 MB/89.61 MB So what you say is true for the first download, but afterwards all the base layers of wf + jdk + ... are present. With stripping into 1 layer there is no chance of caching. Situation of course changes when the base layer is updated > I am aware that this probably won't fly with some RH policy that we > should > base our SW on Fedora/RHEL base OS images, but I am gonna use them for > development and because I often run out of space because of Docker.
I like those alpine images and use them for private stuff, but for Hawkular upstream I think we should use something that is close for downstream so minimise the moving parts. From jkremser at redhat.com Wed Feb 1 07:49:05 2017 From: jkremser at redhat.com (Jiri Kremser) Date: Wed, 1 Feb 2017 13:49:05 +0100 Subject: [Hawkular-dev] Docker image size does matter In-Reply-To: <619CC058-3A1B-4BE0-ABB3-57283418D28A@redhat.com> References: <619CC058-3A1B-4BE0-ABB3-57283418D28A@redhat.com> Message-ID: So what you say is true for the first download, but afterwards all the base layers of wf + jdk + ... are present. With stripping into 1 layer there is no chance of caching. Situation of course changes when the base layer is updated. Right, good point, I was optimizing more for the very first download, when people want to try it on their boxes or put the image id in the openshift and spin the h-services as fast as possible. While layers are useful in the long tern or when you have multiple containers based on the same image or the same image with multiple versions. But if squashing everything into one layer can actually make the image smaller, then there must be something strange happening like the 1st layer creating X and the 2nd layer removing X. In our case it's probably the wildfly image creating the standalone wildfly server while the h-services actually replacing it with our wf server + h-services. btw. there was good talk on the devconf about it https://www.youtube.com/watch?v=ZVX8aXJ-hV4 jk On Wed, Feb 1, 2017 at 1:19 PM, Heiko W.Rupp wrote: > On 1 Feb 2017, at 12:29, Jiri Kremser wrote: > > > base image with JRE 8 and Alpine linux: 76.8 MB > > Yes, alpine is only 3-4 MB that is great. > > > I also removed > > 9.2M /opt/jboss/wildfly/docs > > Makes sense. > > > > What also helped was squashing all the image layers into 1. This makes > > the > > download faster and possibly the image smaller. When applying > > docker-squash > > [1] to the current h-services image it saves ~50megs > > This is a bit of a false friend as docker pull only transfers layers it > does not yet have. > > E.g > > $ docker pull pilhuhn/hawkular-services:0.30.0.Final > 0.30.0.Final: Pulling from pilhuhn/hawkular-services > 08d48e6f1cff: Already exists > 664e6a3041e6: Already exists > 2f8461e7022b: Already exists > 9500f4548bd3: Already exists > 69e2e5217a47: Already exists > cf95509fd4ad: Downloading [======> > ] 10.75 MB/89.61 MB > > So what you say is true for the first download, but afterwards all > the base layers of wf + jdk + ... are present. With stripping > into 1 layer there is no chance of caching. > Situation of course changes when the base layer is updated > > > > I am aware that this probably wont fly with some RH policy that we > > should > > base our SW on Fedora/RHEL base OS images, but I am gonna use them for > > development and because I often run out of space because of Docker. > > I like those alpine images and use them for private stuff, > but for Hawkular upstream I think we should use something > that is close for downstream so minimise the moving parts. > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170201/d2bb2940/attachment.html From mazz at redhat.com Wed Feb 1 09:38:52 2017 From: mazz at redhat.com (John Mazzitelli) Date: Wed, 1 Feb 2017 09:38:52 -0500 (EST) Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: <1513161861.1521790.1485958110722.JavaMail.zimbra@redhat.com> Message-ID: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> The past several days I've been working on an enhancement to HOSA that came in from the community (in fact, I would consider it a bug). I'm about ready to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I wanted to post this to announce it and see if there is any feedback, too. Today, HOSA collects metrics from any Prometheus endpoint which you declare - example: metrics - name: go_memstats_sys_bytes - name: process_max_fds - name: process_open_fds But if a Prometheus metric has labels, Prometheus itself considers each metric with a unique combination of labels as an individual time series metric. This is different than how Hawkular Metric works - each Hawkular Metric metric ID (even if its metric definition or its datapoints have tags) is a single time series metric. We need to account for this difference. For example, if our agent is configured with: metrics: - name: jvm_memory_pool_bytes_committed And the Prometheus endpoint emits that metric with a label called "pool" like this: jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7 then to Prometheus this is actually 2 time series metrics (the number of bytes committed per pool type), not 1. Even though the metric name is the same (what Prometheus calls a "metric family name"), there are two unique combinations of labels - one with "Code Cache" and one with "PS Eden Space" - so they are 2 distinct time series metric data. Today, the agent only creates a single Hawkular-Metric in this case, with each datapoint tagged with those Prometheus labels on the appropriate data point. But we don't want to aggregate them like that since we lose the granularity that the Prometheus endpoint gives us (that is, the number of bytes committed in each pool type). I will say I think we might be able to get that granularity back through datapoint tag queries in Hawkular-Metrics but I don't know how well (if at all) that is supported and how efficient such queries would be even if supported, and how efficient storage of these metrics would be if we tag every data point with these labels (not sure if that is the general purpose of tags in H-Metrics). But, regardless, the fact that these really are different time series metrics should (IMO) be represented as different time series metrics (via metric definitions/metric IDs) in Hawkular-Metrics. To support labeled Prometheus endpoint data like this, the agent needs to split this one named metric into N Hawkular-Metrics metrics (where N is the number of unique label combinations for that named metric). So even though the agent is configured with the one metric "jvm_memory_pool_bytes_committed" we need to actually create two Hawkular-Metric metric definitions (with two different and unique metric IDs obviously). The PR [1] that is ready to go does this. 
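To make the mapping concrete, here is the example from above end to end (the resulting IDs use the default naming described below):

   # agent configuration
   metrics:
   - name: jvm_memory_pool_bytes_committed

   # what the Prometheus endpoint emits (two unique label combinations)
   jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7
   jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7

   # metric definitions the agent creates in Hawkular-Metrics (one per label combination)
   jvm_memory_pool_bytes_committed{pool=Code Cache}
   jvm_memory_pool_bytes_committed{pool=PS Eden Space}
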
By default it will create multiple metric definitions/metric IDs in the form "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}" unless you want a different form in which case you can define an "id" and put in "${labelName}" in the ID you declare (such as "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever). But I suspect the default format will be what most people want and thus nothing needs to be done. In the above example, two metric definitions with the following IDs are created: 1. jvm_memory_pool_bytes_committed{pool=Code Cache} 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space} --John Mazz [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117 From jshaughn at redhat.com Wed Feb 1 10:15:48 2017 From: jshaughn at redhat.com (Jay Shaughnessy) Date: Wed, 1 Feb 2017 10:15:48 -0500 Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> Message-ID: <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> Mazz, this makes sense to me. Our decision to use unique ids (well +type) is going to lead to this sort of thing. The ids are going to basically be large concatenations of the tags that identify the data. Then, additionally we're going to have to tag the metrics with the same name/value pairs that are present in the id. Are you also tagging the Prometheus metrics with the labels? On 2/1/2017 9:38 AM, John Mazzitelli wrote: > The past several days I've been working on an enhancement to HOSA that came in from the community (in fact, I would consider it a bug). I'm about ready to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I wanted to post this to announce it and see if there is any feedback, too. > > Today, HOSA collects metrics from any Prometheus endpoint which you declare - example: > > metrics > - name: go_memstats_sys_bytes > - name: process_max_fds > - name: process_open_fds > > But if a Prometheus metric has labels, Prometheus itself considers each metric with a unique combination of labels as an individual time series metric. This is different than how Hawkular Metric works - each Hawkular Metric metric ID (even if its metric definition or its datapoints have tags) is a single time series metric. We need to account for this difference. For example, if our agent is configured with: > > metrics: > - name: jvm_memory_pool_bytes_committed > > And the Prometheus endpoint emits that metric with a label called "pool" like this: > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7 > > then to Prometheus this is actually 2 time series metrics (the number of bytes committed per pool type), not 1. Even though the metric name is the same (what Prometheus calls a "metric family name"), there are two unique combinations of labels - one with "Code Cache" and one with "PS Eden Space" - so they are 2 distinct time series metric data. > > Today, the agent only creates a single Hawkular-Metric in this case, with each datapoint tagged with those Prometheus labels on the appropriate data point. But we don't want to aggregate them like that since we lose the granularity that the Prometheus endpoint gives us (that is, the number of bytes committed in each pool type). 
I will say I think we might be able to get that granularity back through datapoint tag queries in Hawkular-Metrics but I don't know how well (if at all) that is supported and how efficient such queries would be even if supported, and how efficient storage of these metrics would be if we tag every data point with these labels (not sure if that is the general purpose of tags in H-Metrics). But, regardless, the fact that these really are different time series metrics should (IMO) be represented as different time series metrics (via metric definitions/metric IDs) in Hawkular-Metrics. > > To support labeled Prometheus endpoint data like this, the agent needs to split this one named metric into N Hawkular-Metrics metrics (where N is the number of unique label combinations for that named metric). So even though the agent is configured with the one metric "jvm_memory_pool_bytes_committed" we need to actually create two Hawkular-Metric metric definitions (with two different and unique metric IDs obviously). > > The PR [1] that is ready to go does this. By default it will create multiple metric definitions/metric IDs in the form "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}" unless you want a different form in which case you can define an "id" and put in "${labelName}" in the ID you declare (such as "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever). But I suspect the default format will be what most people want and thus nothing needs to be done. In the above example, two metric definitions with the following IDs are created: > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache} > 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space} > > --John Mazz > > [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117 > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170201/4001f470/attachment.html From mazz at redhat.com Wed Feb 1 10:25:16 2017 From: mazz at redhat.com (John Mazzitelli) Date: Wed, 1 Feb 2017 10:25:16 -0500 (EST) Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> Message-ID: <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> > Are you also tagging the Prometheus metrics with the labels? Yes, that is what was originally being done, and that is still in there. ----- Original Message ----- > > Mazz, this makes sense to me. Our decision to use unique ids (well +type) is > going to lead to this sort of thing. The ids are going to basically be large > concatenations of the tags that identify the data. Then, additionally we're > going to have to tag the metrics with the same name/value pairs that are > present in the id. Are you also tagging the Prometheus metrics with the > labels? > > On 2/1/2017 9:38 AM, John Mazzitelli wrote: > > > > The past several days I've been working on an enhancement to HOSA that came > in from the community (in fact, I would consider it a bug). I'm about ready > to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I wanted to > post this to announce it and see if there is any feedback, too. 
> > Today, HOSA collects metrics from any Prometheus endpoint which you declare - > example: > > metrics > - name: go_memstats_sys_bytes > - name: process_max_fds > - name: process_open_fds > > But if a Prometheus metric has labels, Prometheus itself considers each > metric with a unique combination of labels as an individual time series > metric. This is different than how Hawkular Metric works - each Hawkular > Metric metric ID (even if its metric definition or its datapoints have tags) > is a single time series metric. We need to account for this difference. For > example, if our agent is configured with: > > metrics: > - name: jvm_memory_pool_bytes_committed > > And the Prometheus endpoint emits that metric with a label called "pool" like > this: > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7 > > then to Prometheus this is actually 2 time series metrics (the number of > bytes committed per pool type), not 1. Even though the metric name is the > same (what Prometheus calls a "metric family name"), there are two unique > combinations of labels - one with "Code Cache" and one with "PS Eden Space" > - so they are 2 distinct time series metric data. > > Today, the agent only creates a single Hawkular-Metric in this case, with > each datapoint tagged with those Prometheus labels on the appropriate data > point. But we don't want to aggregate them like that since we lose the > granularity that the Prometheus endpoint gives us (that is, the number of > bytes committed in each pool type). I will say I think we might be able to > get that granularity back through datapoint tag queries in Hawkular-Metrics > but I don't know how well (if at all) that is supported and how efficient > such queries would be even if supported, and how efficient storage of these > metrics would be if we tag every data point with these labels (not sure if > that is the general purpose of tags in H-Metrics). But, regardless, the fact > that these really are different time series metrics should (IMO) be > represented as different time series metrics (via metric definitions/metric > IDs) in Hawkular-Metrics. > > To support labeled Prometheus endpoint data like this, the agent needs to > split this one named metric into N Hawkular-Metrics metrics (where N is the > number of unique label combinations for that named metric). So even though > the agent is configured with the one metric > "jvm_memory_pool_bytes_committed" we need to actually create two > Hawkular-Metric metric definitions (with two different and unique metric IDs > obviously). > > The PR [1] that is ready to go does this. By default it will create multiple > metric definitions/metric IDs in the form > "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}" > unless you want a different form in which case you can define an "id" and > put in "${labelName}" in the ID you declare (such as > "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever). But > I suspect the default format will be what most people want and thus nothing > needs to be done. In the above example, two metric definitions with the > following IDs are created: > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache} > 2. 
jvm_memory_pool_bytes_committed{pool=PS Eden Space} > > --John Mazz > > [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117 > _______________________________________________ > hawkular-dev mailing list hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From jtakvori at redhat.com Wed Feb 1 10:35:17 2017 From: jtakvori at redhat.com (Joel Takvorian) Date: Wed, 1 Feb 2017 16:35:17 +0100 Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> Message-ID: +1 Conversion based on labels seems more sane. I wonder if a new tag that recalls the prometheus metric name would be useful; ex. "baseName=jvm_memory_pool_bytes_committed", to retrieve all metrics of that family. Just an idea. On Wed, Feb 1, 2017 at 4:25 PM, John Mazzitelli wrote: > > Are you also tagging the Prometheus metrics with the labels? > > Yes, that is what was originally being done, and that is still in there. > > ----- Original Message ----- > > > > Mazz, this makes sense to me. Our decision to use unique ids (well > +type) is > > going to lead to this sort of thing. The ids are going to basically be > large > > concatenations of the tags that identify the data. Then, additionally > we're > > going to have to tag the metrics with the same name/value pairs that are > > present in the id. Are you also tagging the Prometheus metrics with the > > labels? > > > > On 2/1/2017 9:38 AM, John Mazzitelli wrote: > > > > > > > > The past several days I've been working on an enhancement to HOSA that > came > > in from the community (in fact, I would consider it a bug). I'm about > ready > > to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I wanted > to > > post this to announce it and see if there is any feedback, too. > > > > Today, HOSA collects metrics from any Prometheus endpoint which you > declare - > > example: > > > > metrics > > - name: go_memstats_sys_bytes > > - name: process_max_fds > > - name: process_open_fds > > > > But if a Prometheus metric has labels, Prometheus itself considers each > > metric with a unique combination of labels as an individual time series > > metric. This is different than how Hawkular Metric works - each Hawkular > > Metric metric ID (even if its metric definition or its datapoints have > tags) > > is a single time series metric. We need to account for this difference. > For > > example, if our agent is configured with: > > > > metrics: > > - name: jvm_memory_pool_bytes_committed > > > > And the Prometheus endpoint emits that metric with a label called "pool" > like > > this: > > > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 > > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7 > > > > then to Prometheus this is actually 2 time series metrics (the number of > > bytes committed per pool type), not 1. Even though the metric name is the > > same (what Prometheus calls a "metric family name"), there are two unique > > combinations of labels - one with "Code Cache" and one with "PS Eden > Space" > > - so they are 2 distinct time series metric data. 
> > > > Today, the agent only creates a single Hawkular-Metric in this case, with > > each datapoint tagged with those Prometheus labels on the appropriate > data > > point. But we don't want to aggregate them like that since we lose the > > granularity that the Prometheus endpoint gives us (that is, the number of > > bytes committed in each pool type). I will say I think we might be able > to > > get that granularity back through datapoint tag queries in > Hawkular-Metrics > > but I don't know how well (if at all) that is supported and how efficient > > such queries would be even if supported, and how efficient storage of > these > > metrics would be if we tag every data point with these labels (not sure > if > > that is the general purpose of tags in H-Metrics). But, regardless, the > fact > > that these really are different time series metrics should (IMO) be > > represented as different time series metrics (via metric > definitions/metric > > IDs) in Hawkular-Metrics. > > > > To support labeled Prometheus endpoint data like this, the agent needs to > > split this one named metric into N Hawkular-Metrics metrics (where N is > the > > number of unique label combinations for that named metric). So even > though > > the agent is configured with the one metric > > "jvm_memory_pool_bytes_committed" we need to actually create two > > Hawkular-Metric metric definitions (with two different and unique metric > IDs > > obviously). > > > > The PR [1] that is ready to go does this. By default it will create > multiple > > metric definitions/metric IDs in the form > > "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}" > > unless you want a different form in which case you can define an "id" and > > put in "${labelName}" in the ID you declare (such as > > "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever). > But > > I suspect the default format will be what most people want and thus > nothing > > needs to be done. In the above example, two metric definitions with the > > following IDs are created: > > > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache} > > 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space} > > > > --John Mazz > > > > [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117 > > _______________________________________________ > > hawkular-dev mailing list hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > > > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170201/0ae2141a/attachment-0001.html From jshaughn at redhat.com Wed Feb 1 11:40:31 2017 From: jshaughn at redhat.com (Jay Shaughnessy) Date: Wed, 1 Feb 2017 11:40:31 -0500 Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> Message-ID: <141247e6-4e22-e0ea-6634-c60deaa4a4c7@redhat.com> +1, if that is not being done I think it would good. 
Actually, it's probably a good "best practice" as it make it easier to slice and dice the data. On 2/1/2017 10:35 AM, Joel Takvorian wrote: > +1 > > Conversion based on labels seems more sane. > > I wonder if a new tag that recalls the prometheus metric name would be > useful; ex. "baseName=jvm_memory_pool_bytes_committed", to retrieve > all metrics of that family. Just an idea. > > On Wed, Feb 1, 2017 at 4:25 PM, John Mazzitelli > wrote: > > > Are you also tagging the Prometheus metrics with the labels? > > Yes, that is what was originally being done, and that is still in > there. > > ----- Original Message ----- > > > > Mazz, this makes sense to me. Our decision to use unique ids > (well +type) is > > going to lead to this sort of thing. The ids are going to > basically be large > > concatenations of the tags that identify the data. Then, > additionally we're > > going to have to tag the metrics with the same name/value pairs > that are > > present in the id. Are you also tagging the Prometheus metrics > with the > > labels? > > > > On 2/1/2017 9:38 AM, John Mazzitelli wrote: > > > > > > > > The past several days I've been working on an enhancement to > HOSA that came > > in from the community (in fact, I would consider it a bug). I'm > about ready > > to merge the PR [1] for this and do a HOSA 1.1.0.Final release. > I wanted to > > post this to announce it and see if there is any feedback, too. > > > > Today, HOSA collects metrics from any Prometheus endpoint which > you declare - > > example: > > > > metrics > > - name: go_memstats_sys_bytes > > - name: process_max_fds > > - name: process_open_fds > > > > But if a Prometheus metric has labels, Prometheus itself > considers each > > metric with a unique combination of labels as an individual time > series > > metric. This is different than how Hawkular Metric works - each > Hawkular > > Metric metric ID (even if its metric definition or its > datapoints have tags) > > is a single time series metric. We need to account for this > difference. For > > example, if our agent is configured with: > > > > metrics: > > - name: jvm_memory_pool_bytes_committed > > > > And the Prometheus endpoint emits that metric with a label > called "pool" like > > this: > > > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 > > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} > 2.3068672E7 > > > > then to Prometheus this is actually 2 time series metrics (the > number of > > bytes committed per pool type), not 1. Even though the metric > name is the > > same (what Prometheus calls a "metric family name"), there are > two unique > > combinations of labels - one with "Code Cache" and one with "PS > Eden Space" > > - so they are 2 distinct time series metric data. > > > > Today, the agent only creates a single Hawkular-Metric in this > case, with > > each datapoint tagged with those Prometheus labels on the > appropriate data > > point. But we don't want to aggregate them like that since we > lose the > > granularity that the Prometheus endpoint gives us (that is, the > number of > > bytes committed in each pool type). I will say I think we might > be able to > > get that granularity back through datapoint tag queries in > Hawkular-Metrics > > but I don't know how well (if at all) that is supported and how > efficient > > such queries would be even if supported, and how efficient > storage of these > > metrics would be if we tag every data point with these labels > (not sure if > > that is the general purpose of tags in H-Metrics). 
But, > regardless, the fact > > that these really are different time series metrics should (IMO) be > > represented as different time series metrics (via metric > definitions/metric > > IDs) in Hawkular-Metrics. > > > > To support labeled Prometheus endpoint data like this, the agent > needs to > > split this one named metric into N Hawkular-Metrics metrics > (where N is the > > number of unique label combinations for that named metric). So > even though > > the agent is configured with the one metric > > "jvm_memory_pool_bytes_committed" we need to actually create two > > Hawkular-Metric metric definitions (with two different and > unique metric IDs > > obviously). > > > > The PR [1] that is ready to go does this. By default it will > create multiple > > metric definitions/metric IDs in the form > > > "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}" > > unless you want a different form in which case you can define an > "id" and > > put in "${labelName}" in the ID you declare (such as > > "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or > whatever). But > > I suspect the default format will be what most people want and > thus nothing > > needs to be done. In the above example, two metric definitions > with the > > following IDs are created: > > > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache} > > 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space} > > > > --John Mazz > > > > [1] > https://github.com/hawkular/hawkular-openshift-agent/pull/117 > > > _______________________________________________ > > hawkular-dev mailing list hawkular-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > > > > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170201/8482d884/attachment.html From mazz at redhat.com Wed Feb 1 11:47:18 2017 From: mazz at redhat.com (John Mazzitelli) Date: Wed, 1 Feb 2017 11:47:18 -0500 (EST) Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: <141247e6-4e22-e0ea-6634-c60deaa4a4c7@redhat.com> References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> <141247e6-4e22-e0ea-6634-c60deaa4a4c7@redhat.com> Message-ID: <1012981008.1719259.1485967638574.JavaMail.zimbra@redhat.com> https://github.com/hawkular/hawkular-openshift-agent/blob/master/deploy/openshift/hawkular-openshift-agent-configmap.yaml#L20 :D That's already there - the ${METRIC:name} resolves to the name of the metric (not the new ID) and our default config puts that tag on every metric. ----- Original Message ----- > > +1, if that is not being done I think it would good. Actually, it's probably > a good "best practice" as it make it easier to slice and dice the data. > > On 2/1/2017 10:35 AM, Joel Takvorian wrote: > > > > +1 > > Conversion based on labels seems more sane. 
> > I wonder if a new tag that recalls the prometheus metric name would be > useful; ex. "baseName=jvm_memory_pool_bytes_committed", to retrieve all > metrics of that family. Just an idea. > > On Wed, Feb 1, 2017 at 4:25 PM, John Mazzitelli < mazz at redhat.com > wrote: > > > > Are you also tagging the Prometheus metrics with the labels? > > Yes, that is what was originally being done, and that is still in there. > > ----- Original Message ----- > > > > Mazz, this makes sense to me. Our decision to use unique ids (well +type) > > is > > going to lead to this sort of thing. The ids are going to basically be > > large > > concatenations of the tags that identify the data. Then, additionally we're > > going to have to tag the metrics with the same name/value pairs that are > > present in the id. Are you also tagging the Prometheus metrics with the > > labels? > > > > On 2/1/2017 9:38 AM, John Mazzitelli wrote: > > > > > > > > The past several days I've been working on an enhancement to HOSA that came > > in from the community (in fact, I would consider it a bug). I'm about ready > > to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I wanted to > > post this to announce it and see if there is any feedback, too. > > > > Today, HOSA collects metrics from any Prometheus endpoint which you declare > > - > > example: > > > > metrics > > - name: go_memstats_sys_bytes > > - name: process_max_fds > > - name: process_open_fds > > > > But if a Prometheus metric has labels, Prometheus itself considers each > > metric with a unique combination of labels as an individual time series > > metric. This is different than how Hawkular Metric works - each Hawkular > > Metric metric ID (even if its metric definition or its datapoints have > > tags) > > is a single time series metric. We need to account for this difference. For > > example, if our agent is configured with: > > > > metrics: > > - name: jvm_memory_pool_bytes_committed > > > > And the Prometheus endpoint emits that metric with a label called "pool" > > like > > this: > > > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 > > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7 > > > > then to Prometheus this is actually 2 time series metrics (the number of > > bytes committed per pool type), not 1. Even though the metric name is the > > same (what Prometheus calls a "metric family name"), there are two unique > > combinations of labels - one with "Code Cache" and one with "PS Eden Space" > > - so they are 2 distinct time series metric data. > > > > Today, the agent only creates a single Hawkular-Metric in this case, with > > each datapoint tagged with those Prometheus labels on the appropriate data > > point. But we don't want to aggregate them like that since we lose the > > granularity that the Prometheus endpoint gives us (that is, the number of > > bytes committed in each pool type). I will say I think we might be able to > > get that granularity back through datapoint tag queries in Hawkular-Metrics > > but I don't know how well (if at all) that is supported and how efficient > > such queries would be even if supported, and how efficient storage of these > > metrics would be if we tag every data point with these labels (not sure if > > that is the general purpose of tags in H-Metrics). But, regardless, the > > fact > > that these really are different time series metrics should (IMO) be > > represented as different time series metrics (via metric definitions/metric > > IDs) in Hawkular-Metrics. 
> > > > To support labeled Prometheus endpoint data like this, the agent needs to > > split this one named metric into N Hawkular-Metrics metrics (where N is the > > number of unique label combinations for that named metric). So even though > > the agent is configured with the one metric > > "jvm_memory_pool_bytes_committed" we need to actually create two > > Hawkular-Metric metric definitions (with two different and unique metric > > IDs > > obviously). > > > > The PR [1] that is ready to go does this. By default it will create > > multiple > > metric definitions/metric IDs in the form > > "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}" > > unless you want a different form in which case you can define an "id" and > > put in "${labelName}" in the ID you declare (such as > > "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever). But > > I suspect the default format will be what most people want and thus nothing > > needs to be done. In the above example, two metric definitions with the > > following IDs are created: > > > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache} > > 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space} > > > > --John Mazz > > > > [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117 > > _______________________________________________ > > hawkular-dev mailing list hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > > > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > > _______________________________________________ > hawkular-dev mailing list hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mwringe at redhat.com Thu Feb 2 08:38:45 2017 From: mwringe at redhat.com (Matt Wringe) Date: Thu, 2 Feb 2017 08:38:45 -0500 (EST) Subject: [Hawkular-dev] Docker image size does matter In-Reply-To: <619CC058-3A1B-4BE0-ABB3-57283418D28A@redhat.com> References: <619CC058-3A1B-4BE0-ABB3-57283418D28A@redhat.com> Message-ID: <107538713.26228021.1486042725468.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "Heiko W.Rupp" > To: "Discussions around Hawkular development" > Sent: Wednesday, 1 February, 2017 7:19:34 AM > Subject: Re: [Hawkular-dev] Docker image size does matter > > On 1 Feb 2017, at 12:29, Jiri Kremser wrote: > > > base image with JRE 8 and Alpine linux: 76.8 MB > > Yes, alpine is only 3-4 MB that is great. Please also take into consideration that minimal docker images do not contain common commands and can be a pain if you need to go into the container and run commands. Trying to debug issues when you don't have access to basic commands can be a bit frustrating. Especially if you are not root and cannot just install binaries in the container. > > I also removed > > 9.2M /opt/jboss/wildfly/docs > > Makes sense. > > > > What also helped was squashing all the image layers into 1. This makes > > the > > download faster and possibly the image smaller. 
When applying > > docker-squash > > [1] to the current h-services image it saves ~50megs > > This is a bit of a false friend as docker pull only transfers layers it > does not yet have. > > E.g > > $ docker pull pilhuhn/hawkular-services:0.30.0.Final > 0.30.0.Final: Pulling from pilhuhn/hawkular-services > 08d48e6f1cff: Already exists > 664e6a3041e6: Already exists > 2f8461e7022b: Already exists > 9500f4548bd3: Already exists > 69e2e5217a47: Already exists > cf95509fd4ad: Downloading [======> > ] 10.75 MB/89.61 MB > > So what you say is true for the first download, but afterwards all > the base layers of wf + jdk + ... are present. With stripping > into 1 layer there is no chance of caching. > Situation of course changes when the base layer is updated > > > > I am aware that this probably wont fly with some RH policy that we > > should > > base our SW on Fedora/RHEL base OS images, but I am gonna use them for > > development and because I often run out of space because of Docker. > > I like those alpine images and use them for private stuff, > but for Hawkular upstream I think we should use something > that is close for downstream so minimise the moving parts. > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From garethahealy at gmail.com Thu Feb 2 08:47:38 2017 From: garethahealy at gmail.com (Gareth Healy) Date: Thu, 2 Feb 2017 13:47:38 +0000 Subject: [Hawkular-dev] Docker image size does matter In-Reply-To: <107538713.26228021.1486042725468.JavaMail.zimbra@redhat.com> References: <619CC058-3A1B-4BE0-ABB3-57283418D28A@redhat.com> <107538713.26228021.1486042725468.JavaMail.zimbra@redhat.com> Message-ID: >From experience in the field, i've not had any customers complain / comment on the size of images as of yet - been on OpenShift engagements for over a year now. On Thu, Feb 2, 2017 at 1:38 PM, Matt Wringe wrote: > ----- Original Message ----- > > From: "Heiko W.Rupp" > > To: "Discussions around Hawkular development" < > hawkular-dev at lists.jboss.org> > > Sent: Wednesday, 1 February, 2017 7:19:34 AM > > Subject: Re: [Hawkular-dev] Docker image size does matter > > > > On 1 Feb 2017, at 12:29, Jiri Kremser wrote: > > > > > base image with JRE 8 and Alpine linux: 76.8 MB > > > > Yes, alpine is only 3-4 MB that is great. > > Please also take into consideration that minimal docker images do not > contain common commands and can be a pain if you need to go into the > container and run commands. > > Trying to debug issues when you don't have access to basic commands can be > a bit frustrating. Especially if you are not root and cannot just install > binaries in the container. > > > > I also removed > > > 9.2M /opt/jboss/wildfly/docs > > > > Makes sense. > > > > > > > What also helped was squashing all the image layers into 1. This makes > > > the > > > download faster and possibly the image smaller. When applying > > > docker-squash > > > [1] to the current h-services image it saves ~50megs > > > > This is a bit of a false friend as docker pull only transfers layers it > > does not yet have. 
> > > > E.g > > > > $ docker pull pilhuhn/hawkular-services:0.30.0.Final > > 0.30.0.Final: Pulling from pilhuhn/hawkular-services > > 08d48e6f1cff: Already exists > > 664e6a3041e6: Already exists > > 2f8461e7022b: Already exists > > 9500f4548bd3: Already exists > > 69e2e5217a47: Already exists > > cf95509fd4ad: Downloading [======> > > ] 10.75 MB/89.61 MB > > > > So what you say is true for the first download, but afterwards all > > the base layers of wf + jdk + ... are present. With stripping > > into 1 layer there is no chance of caching. > > Situation of course changes when the base layer is updated > > > > > > > I am aware that this probably wont fly with some RH policy that we > > > should > > > base our SW on Fedora/RHEL base OS images, but I am gonna use them for > > > development and because I often run out of space because of Docker. > > > > I like those alpine images and use them for private stuff, > > but for Hawkular upstream I think we should use something > > that is close for downstream so minimise the moving parts. > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170202/bed14667/attachment.html From neil.okamoto+hawkular at gmail.com Thu Feb 2 20:31:07 2017 From: neil.okamoto+hawkular at gmail.com (Neil Okamoto) Date: Thu, 2 Feb 2017 17:31:07 -0800 Subject: [Hawkular-dev] Hawkular APM and instrumenting clojure Message-ID: As an experiment I'm instrumenting a service written in clojure using opentracing-java. Through the clojure/java interop I've mostly succeeded in getting trace information reported through to the Hawkular APM server. I say "mostly succeeded" because sooner or later in every one of my hacking sessions I get to the point where the spans I am creating in the app are no longer reported in the web ui. For convenience I'm using the Hawkular dev docker image . In my test app I'm doing nothing more than initializing an APMTracer with the appropriate environment variables set, and then calling buildSpan("foo"), withTag("sampling.priority", 1), start(), sleep for a while, and then finish(). Where all of the previous was done in clojure, but I'm talking in pseudocode here just to make the intent clear. So like I said, sometimes these traces are reported, other times they seem to be silently dropped. I can't detect any consistent pattern how or why this happens... (1) Is using a "sampling.priority" of 1 merely advisory? It would explain everything if those traces are meant to be dropped. (2) Is there any convenient way I can see, with increased logging or something, which traces are actually being sent from the client, and which are actually received by the server? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170202/57cf47d5/attachment.html From gbrown at redhat.com Fri Feb 3 04:54:11 2017 From: gbrown at redhat.com (Gary Brown) Date: Fri, 3 Feb 2017 04:54:11 -0500 (EST) Subject: [Hawkular-dev] Hawkular APM and instrumenting clojure In-Reply-To: References: Message-ID: <1079364124.23287018.1486115651499.JavaMail.zimbra@redhat.com> Hi Neil ----- Original Message ----- > > As an experiment I'm instrumenting a service written in clojure using > opentracing-java. Through the clojure/java interop I've mostly succeeded in > getting trace information reported through to the Hawkular APM server. > > I say "mostly succeeded" because sooner or later in every one of my hacking > sessions I get to the point where the spans I am creating in the app are no > longer reported in the web ui. > > For convenience I'm using the Hawkular dev docker image . In my test app I'm > doing nothing more than initializing an APMTracer with the appropriate > environment variables set, and then calling buildSpan("foo"), > withTag("sampling.priority", 1), start(), sleep for a while, and then > finish(). Where all of the previous was done in clojure, but I'm talking in > pseudocode here just to make the intent clear. > > So like I said, sometimes these traces are reported, other times they seem to > be silently dropped. I can't detect any consistent pattern how or why this > happens... > > (1) Is using a "sampling.priority" of 1 merely advisory? It would explain > everything if those traces are meant to be dropped. If using the default constructor for APMTracer, then the default behaviour should be to trace all - and setting the sampling.priority to 1 should not override that. Could you try not setting this tag to see if there is any difference? > > (2) Is there any convenient way I can see, with increased logging or > something, which traces are actually being sent from the client, and which > are actually received by the server? You could initially check the traces stored in Elasticsearch using something like: curl http://localhost:9200/apm-hawkular/trace/_search | python -m json.tool Do you have a pure Java example that reproduces the same issue? Might be worth creating a jira in https://issues.jboss.org/projects/HWKAPM to track the issue. Regards Gary > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From neil.okamoto+hawkular at gmail.com Fri Feb 3 09:39:05 2017 From: neil.okamoto+hawkular at gmail.com (Neil Okamoto) Date: Fri, 3 Feb 2017 06:39:05 -0800 Subject: [Hawkular-dev] Hawkular APM and instrumenting clojure In-Reply-To: <1079364124.23287018.1486115651499.JavaMail.zimbra@redhat.com> References: <1079364124.23287018.1486115651499.JavaMail.zimbra@redhat.com> Message-ID: Thanks Gary. On Fri, Feb 3, 2017 at 1:54 AM, Gary Brown wrote: > > > (1) Is using a "sampling.priority" of 1 merely advisory? It would explain > > everything if those traces are meant to be dropped. > > If using the default constructor for APMTracer, then the default behaviour > should be to trace all - and setting the sampling.priority to 1 should not > override that. Could you try not setting this tag to see if there is any > difference? > I see. Well, I am using the default constructor, and I have tried with and without sampling.priority=1 and it's the same situation either way. 
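For reference, a minimal Clojure sketch of the interop calls described in my first mail (the APMTracer package name is written from memory and may need adjusting; the span name and sleep are only illustrative):

  (import 'org.hawkular.apm.client.opentracing.APMTracer)   ; package name assumed, adjust if needed

  ;; default constructor; configuration comes from the HAWKULAR_APM_* environment variables
  (def tracer (APMTracer.))

  (let [span (-> (.buildSpan tracer "foo")                   ; "foo" is just an example operation name
                 (.withTag "sampling.priority" (int 1))
                 (.start))]
    (Thread/sleep 2000)                                      ; simulated work
    (.finish span))
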
> (2) Is there any convenient way I can see, with increased logging or > > something, which traces are actually being sent from the client, and > which > > are actually received by the server? > > You could initially check the traces stored in Elasticsearch using > something like: curl http://localhost:9200/apm-hawkular/trace/_search | > python -m json.tool > Right now I have a repl launched with HAWKULAR_APM_LOG_LEVEL set to FINEST. I'm creating spans in the repl as described earlier. Each time I create a trace I see a log entry from the client like this: FINEST: [TracePublisherRESTClient] [Thread[pool-2-thread-1,5,main]] Status code is: 204 and that 204 would suggest the trace info was successfully sent. But inside the docker container I can curl Elasticsearch and those new traces are not to be found. Incidentally, I started the repl last night, did a few successful tests, and then closed the lid of my laptop for the night with the Hawkular container still running and the repl still running. I've also had this issue occur immediately on launch of the repl, so I don't think it's specifically about long running repls and/or sleeping, but for completeness I thought I would clarify how I am running this. > Do you have a pure Java example that reproduces the same issue? Might be worth creating a jira in https://issues.jboss.org/projects/HWKAPM to track the issue. No, not yet... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170203/ff6056c4/attachment.html From gbrown at redhat.com Fri Feb 3 10:15:11 2017 From: gbrown at redhat.com (Gary Brown) Date: Fri, 3 Feb 2017 10:15:11 -0500 (EST) Subject: [Hawkular-dev] Hawkular APM and instrumenting clojure In-Reply-To: References: <1079364124.23287018.1486115651499.JavaMail.zimbra@redhat.com> Message-ID: <611252231.23367861.1486134911732.JavaMail.zimbra@redhat.com> Hi Neil Sounds strange. Would it be possible to try running the server outside docker to see if there may be issues there. If you create the jira with reproducer then we will investigate aswell. Thanks for the additional info. Regards Gary ----- Original Message ----- > Thanks Gary. > > On Fri, Feb 3, 2017 at 1:54 AM, Gary Brown < gbrown at redhat.com > wrote: > > > > > (1) Is using a "sampling.priority" of 1 merely advisory? It would explain > > everything if those traces are meant to be dropped. > > If using the default constructor for APMTracer, then the default behaviour > should be to trace all - and setting the sampling.priority to 1 should not > override that. Could you try not setting this tag to see if there is any > difference? > > I see. Well, I am using the default constructor, and I have tried with and > without sampling.priority=1 and it's the same situation either way. > > > > > (2) Is there any convenient way I can see, with increased logging or > > something, which traces are actually being sent from the client, and which > > are actually received by the server? > > You could initially check the traces stored in Elasticsearch using something > like: curl http://localhost:9200/apm-hawkular/trace/_search | python -m > json.tool > > Right now I have a repl launched with HAWKULAR_APM_LOG_LEVEL set to FINEST. > I'm creating spans in the repl as described earlier. 
Each time I create a > trace I see a log entry from the client like this: > > FINEST: [TracePublisherRESTClient] [Thread[pool-2-thread-1,5,main]] Status > code is: 204 > > and that 204 would suggest the trace info was successfully sent. But inside > the docker container I can curl Elasticsearch and those new traces are not > to be found. > > Incidentally, I started the repl last night, did a few successful tests, and > then closed the lid of my laptop for the night with the Hawkular container > still running and the repl still running. I've also had this issue occur > immediately on launch of the repl, so I don't think it's specifically about > long running repls and/or sleeping, but for completeness I thought I would > clarify how I am running this. > > > Do you have a pure Java example that reproduces the same issue? Might be > > worth creating a jira in https://issues.jboss.org/projects/HWKAPM to track > > the issue. > > No, not yet... > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From neil.okamoto at gmail.com Fri Feb 3 10:53:49 2017 From: neil.okamoto at gmail.com (Neil Okamoto) Date: Fri, 3 Feb 2017 07:53:49 -0800 Subject: [Hawkular-dev] Hawkular APM and instrumenting clojure In-Reply-To: <611252231.23367861.1486134911732.JavaMail.zimbra@redhat.com> References: <1079364124.23287018.1486115651499.JavaMail.zimbra@redhat.com> <611252231.23367861.1486134911732.JavaMail.zimbra@redhat.com> Message-ID: <3415515B-A08F-4C33-8D6A-3A8966531EC6@gmail.com> Thanks Gary. I'll try running the server outside docker, but before I do that I'm going to run the container on a machine with more memory. > On Feb 3, 2017, at 7:15 AM, Gary Brown wrote: > > Hi Neil > > Sounds strange. Would it be possible to try running the server outside docker to see if there may be issues there. > > If you create the jira with reproducer then we will investigate aswell. > > Thanks for the additional info. > > Regards > Gary > > ----- Original Message ----- >> Thanks Gary. >> >> On Fri, Feb 3, 2017 at 1:54 AM, Gary Brown < gbrown at redhat.com > wrote: >> >> >> >>> (1) Is using a "sampling.priority" of 1 merely advisory? It would explain >>> everything if those traces are meant to be dropped. >> >> If using the default constructor for APMTracer, then the default behaviour >> should be to trace all - and setting the sampling.priority to 1 should not >> override that. Could you try not setting this tag to see if there is any >> difference? >> >> I see. Well, I am using the default constructor, and I have tried with and >> without sampling.priority=1 and it's the same situation either way. >> >> >> >>> (2) Is there any convenient way I can see, with increased logging or >>> something, which traces are actually being sent from the client, and which >>> are actually received by the server? >> >> You could initially check the traces stored in Elasticsearch using something >> like: curl http://localhost:9200/apm-hawkular/trace/_search | python -m >> json.tool >> >> Right now I have a repl launched with HAWKULAR_APM_LOG_LEVEL set to FINEST. >> I'm creating spans in the repl as described earlier. Each time I create a >> trace I see a log entry from the client like this: >> >> FINEST: [TracePublisherRESTClient] [Thread[pool-2-thread-1,5,main]] Status >> code is: 204 >> >> and that 204 would suggest the trace info was successfully sent. 
But inside >> the docker container I can curl Elasticsearch and those new traces are not >> to be found. >> >> Incidentally, I started the repl last night, did a few successful tests, and >> then closed the lid of my laptop for the night with the Hawkular container >> still running and the repl still running. I've also had this issue occur >> immediately on launch of the repl, so I don't think it's specifically about >> long running repls and/or sleeping, but for completeness I thought I would >> clarify how I am running this. >> >>> Do you have a pure Java example that reproduces the same issue? Might be >>> worth creating a jira in https://issues.jboss.org/projects/HWKAPM to track >>> the issue. >> >> No, not yet... >> >> _______________________________________________ >> hawkular-dev mailing list >> hawkular-dev at lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/hawkular-dev >> > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev From neil.okamoto+hawkular at gmail.com Fri Feb 3 21:53:43 2017 From: neil.okamoto+hawkular at gmail.com (Neil Okamoto) Date: Fri, 3 Feb 2017 18:53:43 -0800 Subject: [Hawkular-dev] Hawkular APM and instrumenting clojure Message-ID: Since this morning I've had a server running inside docker on a separate machine with more installed memory. I haven't seen any problems since then. In retrospect I wish I had thought of this sooner. For now I'm moving on from this problem to do a more complete instrumentation of the clojure app. I'll keep an eye open for further problems and I'll report back if there's anything noteworthy. thanks Gary, Neil On Fri, Feb 3, 2017 at 7:53 AM, Neil Okamoto wrote: > Thanks Gary. I'll try running the server outside docker, but before I do > that I'm going to run the container on a machine with more memory. > > > On Feb 3, 2017, at 7:15 AM, Gary Brown wrote: > > > > Hi Neil > > > > Sounds strange. Would it be possible to try running the server outside > docker to see if there may be issues there. > > > > If you create the jira with reproducer then we will investigate aswell. > > > > Thanks for the additional info. > > > > Regards > > Gary > > > > ----- Original Message ----- > >> Thanks Gary. > >> > >> On Fri, Feb 3, 2017 at 1:54 AM, Gary Brown < gbrown at redhat.com > wrote: > >> > >> > >> > >>> (1) Is using a "sampling.priority" of 1 merely advisory? It would > explain > >>> everything if those traces are meant to be dropped. > >> > >> If using the default constructor for APMTracer, then the default > behaviour > >> should be to trace all - and setting the sampling.priority to 1 should > not > >> override that. Could you try not setting this tag to see if there is any > >> difference? > >> > >> I see. Well, I am using the default constructor, and I have tried with > and > >> without sampling.priority=1 and it's the same situation either way. > >> > >> > >> > >>> (2) Is there any convenient way I can see, with increased logging or > >>> something, which traces are actually being sent from the client, and > which > >>> are actually received by the server? > >> > >> You could initially check the traces stored in Elasticsearch using > something > >> like: curl http://localhost:9200/apm-hawkular/trace/_search | python -m > >> json.tool > >> > >> Right now I have a repl launched with HAWKULAR_APM_LOG_LEVEL set to > FINEST. > >> I'm creating spans in the repl as described earlier. 
Each time I create > a > >> trace I see a log entry from the client like this: > >> > >> FINEST: [TracePublisherRESTClient] [Thread[pool-2-thread-1,5,main]] > Status > >> code is: 204 > >> > >> and that 204 would suggest the trace info was successfully sent. But > inside > >> the docker container I can curl Elasticsearch and those new traces are > not > >> to be found. > >> > >> Incidentally, I started the repl last night, did a few successful > tests, and > >> then closed the lid of my laptop for the night with the Hawkular > container > >> still running and the repl still running. I've also had this issue occur > >> immediately on launch of the repl, so I don't think it's specifically > about > >> long running repls and/or sleeping, but for completeness I thought I > would > >> clarify how I am running this. > >> > >>> Do you have a pure Java example that reproduces the same issue? Might > be > >>> worth creating a jira in https://issues.jboss.org/projects/HWKAPM to > track > >>> the issue. > >> > >> No, not yet... > >> > >> _______________________________________________ > >> hawkular-dev mailing list > >> hawkular-dev at lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/hawkular-dev > >> > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170203/02895690/attachment.html From gbrown at redhat.com Mon Feb 6 06:10:13 2017 From: gbrown at redhat.com (Gary Brown) Date: Mon, 6 Feb 2017 06:10:13 -0500 (EST) Subject: [Hawkular-dev] Hawkular APM 0.14.0.Final Released In-Reply-To: <822438281.23642962.1486379157567.JavaMail.zimbra@redhat.com> Message-ID: <346675232.23643607.1486379413069.JavaMail.zimbra@redhat.com> Hi We are pleased to announce that version 0.14.0.Final of Hawkular APM has been released. The details for the release can be found here: https://github.com/hawkular/hawkular-apm/releases/tag/0.14.0.Final The main new feature is a UI for comparing the performance of service versions, particularly useful in a continuous deployment environment (in combination with strategies such as canary, blue/green, a/b) to assess the quality of newly deployed versions. This new feature is shown in a recent blog: http://www.hawkular.org/blog/2017/02/04/hawkular-apm-service-deployments.html Regards Gary From lponce at redhat.com Tue Feb 7 04:16:24 2017 From: lponce at redhat.com (Lucas Ponce) Date: Tue, 7 Feb 2017 04:16:24 -0500 (EST) Subject: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final In-Reply-To: <536334893.655415.1486458833301.JavaMail.zimbra@redhat.com> Message-ID: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> Hello, Is there any objection / potential problem if we upgrade from 1.0.0.Final to 1.1.0.Final ? During investigation of a clustering issue, there are some fixes related that seems to be packaged on 1.1.0.Final. I am still working on this but I would want to know if upgrading the Wildfly version in parent could have consensus. Thanks, Lucas From hrupp at redhat.com Tue Feb 7 04:18:47 2017 From: hrupp at redhat.com (Heiko W.Rupp) Date: Tue, 07 Feb 2017 10:18:47 +0100 Subject: [Hawkular-dev] Hawkular-services 0.31 released Message-ID: <724E986F-59F3-4388-87D4-910FE633477B@redhat.com> Hello, Hawkular-services 0.31 was just released. Major change to 0.30 [1] is the update of HAM to 0.23.4. 
I have pushed updated docker images for pilhuhn/hawkular-services. [1] http://www.hawkular.org/blog/2017/01/31/hawkular-services-0.30-released.html From hrupp at redhat.com Tue Feb 7 04:21:36 2017 From: hrupp at redhat.com (Heiko W.Rupp) Date: Tue, 07 Feb 2017 10:21:36 +0100 Subject: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final In-Reply-To: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> References: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> Message-ID: <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> On 7 Feb 2017, at 10:16, Lucas Ponce wrote: > Is there any objection / potential problem if we upgrade from > 1.0.0.Final to 1.1.0.Final ? Do we know if 10.1 is just a bug-fix release or would it introduce new features, that we are then using? > During investigation of a clustering issue, there are some fixes > related that seems to be packaged on 1.1.0.Final. If it is bugfixes only, I am all for updating. From lponce at redhat.com Tue Feb 7 04:28:05 2017 From: lponce at redhat.com (Lucas Ponce) Date: Tue, 7 Feb 2017 04:28:05 -0500 (EST) Subject: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final In-Reply-To: <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> References: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> Message-ID: <732648764.661603.1486459685119.JavaMail.zimbra@redhat.com> ----- Mensaje original ----- > De: "Heiko W.Rupp" > Para: "Discussions around Hawkular development" > Enviados: Martes, 7 de Febrero 2017 10:21:36 > Asunto: Re: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final > > On 7 Feb 2017, at 10:16, Lucas Ponce wrote: > > > Is there any objection / potential problem if we upgrade from > > 1.0.0.Final to 1.1.0.Final ? > > Do we know if 10.1 is just a bug-fix release or would it > introduce new features, that we are then using? > There are some new features http://wildfly.org/news/2016/08/19/WildFly10-1-Released/ > > During investigation of a clustering issue, there are some fixes > > related that seems to be packaged on 1.1.0.Final. > > If it is bugfixes only, I am all for updating. I am starting to test it locally, at least for Alerting, to see if it resolves issues I am having. But if there is some way to test other components it would be good to give it a try. From jshaughn at redhat.com Tue Feb 7 08:29:27 2017 From: jshaughn at redhat.com (Jay Shaughnessy) Date: Tue, 7 Feb 2017 08:29:27 -0500 Subject: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final In-Reply-To: <732648764.661603.1486459685119.JavaMail.zimbra@redhat.com> References: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> <732648764.661603.1486459685119.JavaMail.zimbra@redhat.com> Message-ID: <84e18c06-66c0-27d1-e584-0b409cc59bdb@redhat.com> I'll repost what I mentioned in December: "I noticed that on Openshift we are running Hawkular Metrics on WildFly 10.1.0. It was upped from 10.0.0 several months ago due to a blocking issue that had been fixed in EAP but not WF 10.0. I ran into a new issue when trying to deploy Metrics master on OS Origin. It failed to deploy on WF 10.1.0. I was able to solve the issue without a major change but it called out the fact that we are building Hawkular against WF 10.0.1 bom and running itests against 10.0.0 server. Because OS is a primary target platform I'm wondering if we should bump the parent pom deps to the 10.1.0 bom and server (as well as upping a few related deps as well, like ISPN). 
As part of my investigation I did this locally for parent pom, commons, alerting and metrics and did not see any issues." I think I have a branch somewhere that fixed whatever that deployment issue is... On 2/7/2017 4:28 AM, Lucas Ponce wrote: > > ----- Mensaje original ----- >> De: "Heiko W.Rupp" >> Para: "Discussions around Hawkular development" >> Enviados: Martes, 7 de Febrero 2017 10:21:36 >> Asunto: Re: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final >> >> On 7 Feb 2017, at 10:16, Lucas Ponce wrote: >> >>> Is there any objection / potential problem if we upgrade from >>> 1.0.0.Final to 1.1.0.Final ? >> Do we know if 10.1 is just a bug-fix release or would it >> introduce new features, that we are then using? >> > There are some new features > > http://wildfly.org/news/2016/08/19/WildFly10-1-Released/ > > >>> During investigation of a clustering issue, there are some fixes >>> related that seems to be packaged on 1.1.0.Final. >> If it is bugfixes only, I am all for updating. > I am starting to test it locally, at least for Alerting, to see if it resolves issues I am having. > > But if there is some way to test other components it would be good to give it a try. > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170207/d6bbbfbf/attachment.html From mwringe at redhat.com Tue Feb 7 15:51:23 2017 From: mwringe at redhat.com (Matt Wringe) Date: Tue, 7 Feb 2017 15:51:23 -0500 (EST) Subject: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final In-Reply-To: <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> References: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> Message-ID: <830064420.28185617.1486500683672.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "Heiko W.Rupp" > To: "Discussions around Hawkular development" > Sent: Tuesday, 7 February, 2017 4:21:36 AM > Subject: Re: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final > > On 7 Feb 2017, at 10:16, Lucas Ponce wrote: > > > Is there any objection / potential problem if we upgrade from > > 1.0.0.Final to 1.1.0.Final ? > > Do we know if 10.1 is just a bug-fix release or would it > introduce new features, that we are then using? With Wildfly there is no backporting to fix issues. To make sure that we are running with the latest bug fixes and security updates, whenever there is a new release of Wildfly we should be moving to that as soon as possible. > > During investigation of a clustering issue, there are some fixes > > related that seems to be packaged on 1.1.0.Final. > > If it is bugfixes only, I am all for updating. 
> _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mazz at redhat.com Tue Feb 7 15:55:44 2017 From: mazz at redhat.com (John Mazzitelli) Date: Tue, 7 Feb 2017 15:55:44 -0500 (EST) Subject: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final In-Reply-To: <830064420.28185617.1486500683672.JavaMail.zimbra@redhat.com> References: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> <830064420.28185617.1486500683672.JavaMail.zimbra@redhat.com> Message-ID: <756502719.1309406.1486500944157.JavaMail.zimbra@redhat.com> > To make sure that we are running with the latest bug fixes and security > updates, whenever there is a new release of Wildfly we should be moving to > that as soon as possible. The problem with that is, it may break us if run in older EAP versions (like EAP6). But this may not be an issue with server-side components like metrics and alerts. It is an issue with the wildfly agent, though. From mazz at redhat.com Tue Feb 7 21:08:47 2017 From: mazz at redhat.com (John Mazzitelli) Date: Tue, 7 Feb 2017 21:08:47 -0500 (EST) Subject: [Hawkular-dev] hosa - /metrics can be behind auth; 2 new metrics In-Reply-To: <210693744.1364031.1486518429334.JavaMail.zimbra@redhat.com> Message-ID: <256341300.1369293.1486519727212.JavaMail.zimbra@redhat.com> [this is more for Matt W, but will post here] Two new things in HOSA - these have been released under the 1.2.0.Final version and is available on docker hub - see https://hub.docker.com/r/hawkular/hawkular-openshift-agent/tags/ 1) Hawkular WildFly Agent has its own metrics endpoint (so it can monitor itself). The endpoint is /metrics. This is nothing new. But the /metrics can now be configured behind basic auth. If you configure this in the agent config, you must authenticate to see the metrics: emitter: metrics_credentials: username: foo password: bar You can pass these in via env. vars and thus you can use OpenShift secrets for it. 2) There are now two new metrics (both gauges) the agent itself emits: hawkular_openshift_agent_monitored_pods (The number of pods currently being monitored) hawkular_openshift_agent_monitored_endpoints (The number of endpoints currently being monitored) That is all. From mwringe at redhat.com Wed Feb 8 17:33:36 2017 From: mwringe at redhat.com (Matt Wringe) Date: Wed, 8 Feb 2017 17:33:36 -0500 (EST) Subject: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final In-Reply-To: <756502719.1309406.1486500944157.JavaMail.zimbra@redhat.com> References: <1581897772.656530.1486458984972.JavaMail.zimbra@redhat.com> <691118B1-648A-47B9-B104-F9C2BFBB326A@redhat.com> <830064420.28185617.1486500683672.JavaMail.zimbra@redhat.com> <756502719.1309406.1486500944157.JavaMail.zimbra@redhat.com> Message-ID: <1640060724.28587966.1486593216685.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "John Mazzitelli" > To: "Discussions around Hawkular development" > Sent: Tuesday, 7 February, 2017 3:55:44 PM > Subject: Re: [Hawkular-dev] Upgrade to Wildfly 1.1.0.Final > > > To make sure that we are running with the latest bug fixes and security > > updates, whenever there is a new release of Wildfly we should be moving to > > that as soon as possible. > > The problem with that is, it may break us if run in older EAP versions (like > EAP6). If you are relying on an older version of Wildfly to behave the same way as EAP you are going to be disappointed. 
They may share a common ancestor, but the longer that EAP is out there the more out of sync it gets with Wildfly. EAP will have backported fixes and other updates that the older Wildfly version it's based on will never get. They are similar but different enough that it's probably a wise idea to support them separately. We have to do this with our metric images for OpenShift. For the community version we should always be updating, even just to make sure we have the latest security updates.

> But this may not be an issue with server-side components like metrics and alerts.
>
> It is an issue with the wildfly agent, though.
> _______________________________________________
> hawkular-dev mailing list
> hawkular-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>

From jsanda at redhat.com Wed Feb 8 22:59:13 2017
From: jsanda at redhat.com (John Sanda)
Date: Wed, 8 Feb 2017 22:59:13 -0500
Subject: [Hawkular-dev] [metrics] configurable data retention
Message-ID: <8A708E4B-3A32-4D89-89E2-5FF8A49FAE7B@redhat.com>

Pretty much from the start of the project we have provided configurable data retention. There is a system-wide default retention that can be set at startup. You can also set the data retention per tenant as well as per individual metric. Do we need to provide this fine-grained level of configurability, or is it sufficient to only have a system-wide data retention which is configurable? It is worth noting that in OpenShift *only* the system-wide data retention is set.

Recently we have been dealing with a number of production issues including:

* Cassandra crashing with an OutOfMemoryError
* Stats queries failing in Hawkular Metrics due to high read latencies in Cassandra
* Expired data not getting purged in a timely fashion

These issues all involve compaction. In older versions of Hawkular Metrics we were using the default, size tiered compaction strategy (STCS). Time window compaction strategy (TWCS) is better suited for time series data such as ours. We are already seeing good results with some early testing. Using the correct and properly configured compaction strategy can have a significant impact on several things including:

* I/O usage
* cpu usage
* read performance
* disk usage

TWCS was developed for some very specific use cases which are nonetheless common with Cassandra. TWCS is recommended for time series that meet the following criteria:

* append-only writes
* no deletes
* global (i.e., table-wide) TTL
* few out of order writes (at least it is the exception and not the norm)

It is the third bullet which has prompted this email. If we allow/support different TTLs per tenant and/or per metric we will lose a lot of the benefits of TWCS and likely continue to struggle with some of the issues we have been facing as of late. If you ask me exactly how well or poorly compaction will perform using mixed TTLs, I can only speculate. I simply do not have the bandwidth to test things that C* docs and C* devs say *not* to do.

I am of the opinion, at least for OpenShift users, that disk usage is much more important than fine-grained data retentions. A big question I have is, what about outside of OpenShift? This may be a question for some people not on this list, so I want to make sure it does reach the right people.

I think we could potentially tie together configurable data retention with rollups. Let's say we add support for 15min, 1hr, 6hr, and 24hr rollups where each rollup is stored in its own table and each larger rollup has a longer retention.
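To make that idea concrete, here is a very rough sketch of the retention-to-rollup mapping I have in mind. It is purely illustrative - none of these class names or thresholds exist anywhere today:

import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;

public class RollupPolicyExample {

    enum Rollup { FIFTEEN_MIN, ONE_HOUR, SIX_HOUR, TWENTY_FOUR_HOUR }

    // Hypothetical policy: the longer a tenant's retention, the more rollups
    // we generate for it. The thresholds here are made up for illustration.
    static Set<Rollup> rollupsFor(Duration retention) {
        long days = retention.toDays();
        if (days <= 7) {
            return EnumSet.noneOf(Rollup.class);   // raw data only
        } else if (days <= 31) {
            return EnumSet.of(Rollup.FIFTEEN_MIN, Rollup.ONE_HOUR);
        } else if (days <= 90) {
            return EnumSet.of(Rollup.FIFTEEN_MIN, Rollup.ONE_HOUR, Rollup.SIX_HOUR);
        }
        return EnumSet.allOf(Rollup.class);
    }

    public static void main(String[] args) {
        // a one month retention would translate into 15min and 1hr rollups
        System.out.println(rollupsFor(Duration.ofDays(30)));
    }
}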
Different levels of data retentions could be used to determine what rollups a tenant has. If a tenant wants a data retention of a month for example, then that could translate into generating 15min and 1hr rollups for that tenant. - John From snegrea at redhat.com Thu Feb 9 15:02:27 2017 From: snegrea at redhat.com (Stefan Negrea) Date: Thu, 9 Feb 2017 14:02:27 -0600 Subject: [Hawkular-dev] Hawkular Metrics 0.24.0 - Release Message-ID: Hello, I am happy to announce release 0.24.0 of Hawkular Metrics. This release is anchored by a new tag query language and general stability improvements. Here is a list of major changes: - *Tag Query Language* - A query language was added to support complex constructs for tag based queries for metrics - The old tag query syntax is deprecated but can still be used; the new syntax takes precedence - The new syntax supports: - logical operators: AND,OR - equality operators: =, != - value in array operators: IN, NOT IN - existential conditions: - tag without any operator is equivalent to = '*' - tag preceded by the NOT operator matches only instances without the tag defined - all the values in between single quotes are treated as regex expressions - simple text values do not need single quotes - spaces before and after equality operators are not necessary - For more details please see: Pull Request 725 , HWKMETRICS-523 - Sample queries: a1 = 'bcd' OR a2 != 'efg' a1='bcd' OR a2!='efg' a1 = efg AND ( a2 = 'hijk' OR a2 = 'xyz' ) a1 = 'efg' AND ( a2 IN ['hijk', 'xyz'] ) a1 = 'efg' AND a2 NOT IN ['hijk'] a1 = 'd' OR ( a1 != 'ab' AND ( c1 = '*' ) ) a1 OR a2 NOT a1 AND a2 a1 = 'a' AND NOT b2 a1 = a AND NOT b2 - *Performance* - Updated compaction strategies for data tables from size tiered compaction (STCS) to time window compaction (TWCS) (HWKMETRICS-556 ) - Jobs now execute on RxJava's I/O scheduler thread pool ( HWKMETRICS-579 ) - *Administration* - The admin tenant is now configurable via ADMIN_TENANT environment variable (HWKMETRICS-572 ) - Internal metric collection is disabled by default (HWKMETRICS-578 ) - Resolved a null pointer exception in DropWizardReporter due to admin tenant changes (HWKMETRICS-577 ) - *Job Scheduler* - Resolved an issue where the compression job would stop running after a few days (HWKMETRICS-564 ) - Updated the job scheduler to renew job locks during job execution ( HWKMETRICS-570 ) - Updated the job scheduler to reacquire job lock after server restarts (HWKMETRICS-583 ) - *Hawkular Alerting - Major Updates* - Resolved several issues where schema upgrades were not applied after the initial schema install (HWKALERTS-220 , HWKALERTS-222 ) *Hawkular Alerting - Included* - Version 1.5.1 - Project details and repository: Github - Documentation: REST API , Examples , Developer Guide *Hawkular Metrics Clients* - Python: https://github.com/hawkular/hawkular-client-python - Go: https://github.com/hawkular/hawkular-client-go - Ruby: https://github.com/hawkular/hawkular-client-ruby - Java: https://github.com/hawkular/hawkular-client-java *Release Links* Github Release: https://github.com/hawkular/hawkular-metrics/releases/tag/0.24.0 JBoss Nexus Maven artifacts: http://origin-repository.jboss.org/nexus/content/repositorie s/public/org/hawkular/metrics/ Jira release tracker: https://issues.jboss.org/projects/HWKMETRICS/versions/12332966 A big "Thank you" goes to John Sanda, Matt Wringe, Michael Burman, Joel Takvorian, Jay Shaughnessy, Lucas Ponce, and Heiko Rupp for their project contributions. 
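For anyone who wants to try the new tag query syntax against a running server, something along these lines should work. The endpoint, the tags parameter, and the Hawkular-Tenant header are quoted from memory here, so please double-check them against the REST API documentation:

curl -G -H "Hawkular-Tenant: my-tenant" --data-urlencode "tags=a1 = 'bcd' OR a2 != 'efg'" http://localhost:8080/hawkular/metrics/metrics

The expression is passed URL-encoded in the tags query parameter; as noted above, the old syntax still works but the new language takes precedence.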
Thank you, Stefan Negrea -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170209/473243b5/attachment-0001.html From theute at redhat.com Fri Feb 10 07:28:45 2017 From: theute at redhat.com (Thomas Heute) Date: Fri, 10 Feb 2017 13:28:45 +0100 Subject: [Hawkular-dev] Collecting PV usage ? Message-ID: Mazz, in your metric collection adventure for HOSA have you met a way to see the usage of PVs attached to a pod ? User should know (be able to visualize) how much of the PVs are used and then be alerted if it reach a certain %. Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170210/0ae92356/attachment.html From mazz at redhat.com Fri Feb 10 07:36:18 2017 From: mazz at redhat.com (John Mazzitelli) Date: Fri, 10 Feb 2017 07:36:18 -0500 (EST) Subject: [Hawkular-dev] Collecting PV usage ? In-Reply-To: References: Message-ID: <807728075.2531836.1486730178463.JavaMail.zimbra@redhat.com> No. It is on the list though: https://github.com/hawkular/hawkular-openshift-agent/issues/110 I honestly don't know where to get these persistent volume stats (or any of the low level stats) - but I believe its just prometheus data so we should be able to get it if we just know the correct URLs. But so far we've been concentrating on app metrics coming from user's pods so we haven't worked on that yet. ----- Original Message ----- > Mazz, > > in your metric collection adventure for HOSA have you met a way to see the > usage of PVs attached to a pod ? > User should know (be able to visualize) how much of the PVs are used and > then be alerted if it reach a certain %. > > Thomas > From theute at redhat.com Fri Feb 10 07:55:46 2017 From: theute at redhat.com (Thomas Heute) Date: Fri, 10 Feb 2017 13:55:46 +0100 Subject: [Hawkular-dev] Collecting PV usage ? In-Reply-To: <807728075.2531836.1486730178463.JavaMail.zimbra@redhat.com> References: <807728075.2531836.1486730178463.JavaMail.zimbra@redhat.com> Message-ID: On Fri, Feb 10, 2017 at 1:36 PM, John Mazzitelli wrote: > No. It is on the list though: > > https://github.com/hawkular/hawkular-openshift-agent/issues/110 I don't think they are collected by Heapster today > I honestly don't know where to get these persistent volume stats (or any > of the low level stats) - but I believe its just prometheus data so we > should be able to get it if we just know the correct URLs. But so far we've > been concentrating on app metrics coming from user's pods so we haven't > worked on that yet. > Right. I haven't found evidence of a prometheus endpoint yet. It doesn't seem straightforward, but an important data to expose. I'm afraid that it will depend on the storage solution being used (to see the actual usage) Thomas > > ----- Original Message ----- > > Mazz, > > > > in your metric collection adventure for HOSA have you met a way to see > the > > usage of PVs attached to a pod ? > > User should know (be able to visualize) how much of the PVs are used and > > then be alerted if it reach a certain %. > > > > Thomas > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170210/0677b292/attachment.html From jdoyle at redhat.com Fri Feb 10 09:49:44 2017 From: jdoyle at redhat.com (John Doyle) Date: Fri, 10 Feb 2017 09:49:44 -0500 Subject: [Hawkular-dev] Hawkular Metrics 0.24.0 - Release In-Reply-To: References: Message-ID: Congratulations! On Thu, Feb 9, 2017 at 3:02 PM, Stefan Negrea wrote: > Hello, > > I am happy to announce release 0.24.0 of Hawkular Metrics. This release is > anchored by a new tag query language and general stability improvements. > > Here is a list of major changes: > > - *Tag Query Language* > - A query language was added to support complex constructs for tag > based queries for metrics > - The old tag query syntax is deprecated but can still be used; the > new syntax takes precedence > - The new syntax supports: > - logical operators: AND,OR > - equality operators: =, != > - value in array operators: IN, NOT IN > - existential conditions: > - tag without any operator is equivalent to = '*' > - tag preceded by the NOT operator matches only instances > without the tag defined > - all the values in between single quotes are treated as regex > expressions > - simple text values do not need single quotes > - spaces before and after equality operators are not necessary > - For more details please see: Pull Request 725 > , > HWKMETRICS-523 > - Sample queries: > > a1 = 'bcd' OR a2 != 'efg' > a1='bcd' OR a2!='efg' > a1 = efg AND ( a2 = 'hijk' OR a2 = 'xyz' ) > a1 = 'efg' AND ( a2 IN ['hijk', 'xyz'] ) > a1 = 'efg' AND a2 NOT IN ['hijk'] > a1 = 'd' OR ( a1 != 'ab' AND ( c1 = '*' ) ) > a1 OR a2 > NOT a1 AND a2 > a1 = 'a' AND NOT b2 > a1 = a AND NOT b2 > > > - *Performance* > - Updated compaction strategies for data tables from size tiered > compaction (STCS) to time window compaction (TWCS) (HWKMETRICS-556 > ) > - Jobs now execute on RxJava's I/O scheduler thread pool ( > HWKMETRICS-579 ) > - *Administration* > - The admin tenant is now configurable via ADMIN_TENANT environment > variable (HWKMETRICS-572 > ) > - Internal metric collection is disabled by default (HWKMETRICS-578 > ) > - Resolved a null pointer exception in DropWizardReporter due to > admin tenant changes (HWKMETRICS-577 > ) > - *Job Scheduler* > - Resolved an issue where the compression job would stop running > after a few days (HWKMETRICS-564 > ) > - Updated the job scheduler to renew job locks during job execution > (HWKMETRICS-570 ) > - Updated the job scheduler to reacquire job lock after server > restarts (HWKMETRICS-583 > ) > - *Hawkular Alerting - Major Updates* > - Resolved several issues where schema upgrades were not applied > after the initial schema install (HWKALERTS-220 > , HWKALERTS-222 > ) > > > *Hawkular Alerting - Included* > > - Version 1.5.1 > > - Project details and repository: Github > > - Documentation: REST API > , Examples > , Developer > Guide > > > *Hawkular Metrics Clients* > > - Python: https://github.com/hawkular/hawkular-client-python > - Go: https://github.com/hawkular/hawkular-client-go > - Ruby: https://github.com/hawkular/hawkular-client-ruby > - Java: https://github.com/hawkular/hawkular-client-java > > > *Release Links* > > Github Release: https://github.com/hawkular/hawkular-metrics/ > releases/tag/0.24.0 > > JBoss Nexus Maven artifacts: > http://origin-repository.jboss.org/nexus/content/repositorie > s/public/org/hawkular/metrics/ > > Jira release tracker: > https://issues.jboss.org/projects/HWKMETRICS/versions/12332966 > > A big "Thank you" goes to John Sanda, Matt Wringe, 
Michael Burman, Joel > Takvorian, Jay Shaughnessy, Lucas Ponce, and Heiko Rupp for their project > contributions. > > Thank you, > Stefan Negrea > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170210/dba92f4a/attachment.html From mwringe at redhat.com Fri Feb 10 11:45:32 2017 From: mwringe at redhat.com (Matt Wringe) Date: Fri, 10 Feb 2017 11:45:32 -0500 (EST) Subject: [Hawkular-dev] Collecting PV usage ? In-Reply-To: References: <807728075.2531836.1486730178463.JavaMail.zimbra@redhat.com> Message-ID: <571400962.29493096.1486745131995.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "Thomas Heute" > To: "John Mazzitelli" > Cc: "Discussions around Hawkular development" > Sent: Friday, 10 February, 2017 7:55:46 AM > Subject: Re: [Hawkular-dev] Collecting PV usage ? > > > > On Fri, Feb 10, 2017 at 1:36 PM, John Mazzitelli < mazz at redhat.com > wrote: > > > No. It is on the list though: > > https://github.com/hawkular/hawkular-openshift-agent/issues/110 > > I don't think they are collected by Heapster today Heapster collects a bunch of filesystem data, I believe its pulling in all volumes being used which should include PVs. It includes the usage and limit for each of the volumes being used. > > I honestly don't know where to get these persistent volume stats (or any of > the low level stats) - but I believe its just prometheus data so we should > be able to get it if we just know the correct URLs. But so far we've been > concentrating on app metrics coming from user's pods so we haven't worked on > that yet. > > Right. I haven't found evidence of a prometheus endpoint yet. The main prometheus endpoint for each node should be found here curl -k -H "Authorization: Bearer $ADMIN_TOKEN" -X GET https://${NODE_IP}:10250/metrics The summary endpoint (not Prometheus but json based) that Heapster uses and contains volume information, can be found here: reset;curl -k -H "Authorization: Bearer $ADMIN_TOKEN" -X POST -d '{"num_stats":1}' https://${NODE_IP}:10250/stats/summary I think you can get the same information from the prometheus endpoint here that you can from the summary one, but it might be a bit tricky to parse things out. But it could also be possible that the summary endpoint is including new information. > > It doesn't seem straightforward, but an important data to expose. I'm afraid > that it will depend on the storage solution being used (to see the actual > usage) > > > Thomas > > > > > ----- Original Message ----- > > Mazz, > > > > in your metric collection adventure for HOSA have you met a way to see the > > usage of PVs attached to a pod ? > > User should know (be able to visualize) how much of the PVs are used and > > then be alerted if it reach a certain %. > > > > Thomas > > > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mwringe at redhat.com Fri Feb 10 11:46:27 2017 From: mwringe at redhat.com (Matt Wringe) Date: Fri, 10 Feb 2017 11:46:27 -0500 (EST) Subject: [Hawkular-dev] Collecting PV usage ? 
In-Reply-To: <571400962.29493096.1486745131995.JavaMail.zimbra@redhat.com> References: <807728075.2531836.1486730178463.JavaMail.zimbra@redhat.com> <571400962.29493096.1486745131995.JavaMail.zimbra@redhat.com> Message-ID: <1423363865.29493314.1486745187288.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "Matt Wringe" > To: "Discussions around Hawkular development" > Cc: "John Mazzitelli" > Sent: Friday, 10 February, 2017 11:45:32 AM > Subject: Re: [Hawkular-dev] Collecting PV usage ? > > ----- Original Message ----- > > From: "Thomas Heute" > > To: "John Mazzitelli" > > Cc: "Discussions around Hawkular development" > > > > Sent: Friday, 10 February, 2017 7:55:46 AM > > Subject: Re: [Hawkular-dev] Collecting PV usage ? > > > > > > > > On Fri, Feb 10, 2017 at 1:36 PM, John Mazzitelli < mazz at redhat.com > wrote: > > > > > > No. It is on the list though: > > > > https://github.com/hawkular/hawkular-openshift-agent/issues/110 > > > > I don't think they are collected by Heapster today > > Heapster collects a bunch of filesystem data, I believe its pulling in all > volumes being used which should include PVs. > > It includes the usage and limit for each of the volumes being used. > > > > > I honestly don't know where to get these persistent volume stats (or any of > > the low level stats) - but I believe its just prometheus data so we should > > be able to get it if we just know the correct URLs. But so far we've been > > concentrating on app metrics coming from user's pods so we haven't worked > > on > > that yet. > > > > Right. I haven't found evidence of a prometheus endpoint yet. > > The main prometheus endpoint for each node should be found here > > curl -k -H "Authorization: Bearer $ADMIN_TOKEN" -X GET > https://${NODE_IP}:10250/metrics If we want to collect things like cpu/memory/etc for the pods directly instead of using heapster, this should be the endpoint to use as well. > The summary endpoint (not Prometheus but json based) that Heapster uses and > contains volume information, can be found here: > > reset;curl -k -H "Authorization: Bearer $ADMIN_TOKEN" -X POST -d > '{"num_stats":1}' https://${NODE_IP}:10250/stats/summary > > I think you can get the same information from the prometheus endpoint here > that you can from the summary one, but it might be a bit tricky to parse > things out. But it could also be possible that the summary endpoint is > including new information. > > > > > It doesn't seem straightforward, but an important data to expose. I'm > > afraid > > that it will depend on the storage solution being used (to see the actual > > usage) > > > > > > Thomas > > > > > > > > > > ----- Original Message ----- > > > Mazz, > > > > > > in your metric collection adventure for HOSA have you met a way to see > > > the > > > usage of PVs attached to a pod ? > > > User should know (be able to visualize) how much of the PVs are used and > > > then be alerted if it reach a certain %. 
> > > > > > Thomas > > > > > > > > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > >

From miburman at redhat.com Sat Feb 11 05:04:51 2017
From: miburman at redhat.com (Michael Burman)
Date: Sat, 11 Feb 2017 12:04:51 +0200
Subject: [Hawkular-dev] RxJava2 preliminary testing
Message-ID: <2c187395-e920-e2f0-f9ed-c147ad4d0783@redhat.com>

Hi,

I did yesterday evening and today some testing on how using RxJava2 would benefit us (I'm expecting more from RxJava 2.1 actually, since it has some enhanced parallelism features which we might benefit from).

Short notes from the RxJava2 migration: it's more painful than I assumed. The code changes can be small in terms of lines of code changed, but almost every method has had its signature or behavior changed. So I've had to read the documentation all the time when doing things and try to unlearn what I've done in RxJava1.

And all this comes with a backwards compatibility pressure for Java 6 (so you can't benefit from many Java 8 advantages). Reactive-Commons / Reactor have started from Java 8 to provide a cleaner implementation. Grr.

I wrote a simple write path modification in the PR #762 (metrics) that writes Gauges using the micro-batching feature ported to RxJava2. There's still some RxJavaInterOp use in it, so that might slow down the performance a little bit. However, it is possible to merge these two code paths. There are also some other optimizations I think could be worth it. I'd advise against it though; reading the code gets quite complex. I would almost suggest that we do the MetricsServiceImpl/DataAccessImpl merging by rewriting small parts at a time in the new class with RxJava2 and make that call the old code with RxJavaInterOp. That way we could move slowly to the newer codebase.

I fixed the JMH-benchmarks (as they're not compiled in our CI and were actually broken by some other PRs) and ran some tests. These are the tests that measure only the metrics-core-service performance and do not touch the REST-interface (or Wildfly) at all, thus giving a better comparison of how our internal changes behave.

What I'm seeing is around a 20-30% difference in performance when writing gauges this way. So this should offset some of the issues we saw when we improved error handling (which caused performance degradation). I did run into HWKMETRICS-542 (BusyPoolException), so the tests were run with 1024 connections.

I'll continue next week with some more testing, but at the same time I proved that the micro-batching features do improve performance in the internal processing, especially when there's a small number of writers to a single node. But testing those features could probably benefit from more benchmark tests without Wildfly (which takes so much processing power that most performance improvements can't be measured correctly anymore).

- Micke

From mazz at redhat.com Sat Feb 11 09:01:45 2017
From: mazz at redhat.com (John Mazzitelli)
Date: Sat, 11 Feb 2017 09:01:45 -0500 (EST)
Subject: [Hawkular-dev] HOSA now limits amount of metrics per pod; new agent metrics added
In-Reply-To: <267921126.3065551.1486820768829.JavaMail.zimbra@redhat.com>
Message-ID: <255878990.3068210.1486821705414.JavaMail.zimbra@redhat.com>

FYI: New enhancement to Hawkular OpenShift Agent (HOSA).
To avoid having a misconfigured or malicious pod from flooding HOSA and H-Metrics with large amounts of metric data, HOSA has now been enhanced to support the setting of "max_metrics_per_pod" (this is a setting in the agent global configuration). Its default is 50. Any pod that asks the agent to collect more than that (sum total across all of its endpoints) will be throttled down and only the maximum number of metrics will be stored for that pod. Note: when I say "metrics" here I do not mean datapoints - this limits the number of unique metric IDs allowed to be stored per pod) If you enable the status endpoint, you'll see this in the yaml report when a max limit is reached for the endpoint in question: openshift-infra|the-pod-name-73fgt|prometheus|http://172.19.0.5:8080/metrics: METRIC LIMIT EXCEEDED. Last collection at [Sat, 11 Feb 2017 13:46:44 +0000] gathered [54] metrics, [4] were discarded, in [1.697787ms] A warning will also be logged in the log file: "Reached max limit of metrics for [openshift-infra|the-pod-name-73fgt|prometheus|http://172.19.0.5:8080/metrics] - discarding [4] collected metrics" (As part of this code change, the status endpoint was enhanced to now show the number of metrics collected from each endpoint under each pod. This is not the total number of datapoints; it is showing unique metric IDs - this number will always be <= the max metrics per pod) Finally, the agent now collects and emits 4 metrics of its own (in addition to all the other "go" related ones like memory used, etc). They are: 1 Counter: hawkular_openshift_agent_metric_data_points_collected_total The total number of individual metric data points collected from all endpoints. 3 Gauges: hawkular_openshift_agent_monitored_pods The number of pods currently being monitored. hawkular_openshift_agent_monitored_endpoints The number of endpoints currently being monitored. hawkular_openshift_agent_monitored_metrics The total number of metrics currently being monitored across all endpoints. All of this is in master and will be in the next HOSA release, which I hope to do this weekend. From mazz at redhat.com Sat Feb 11 09:15:12 2017 From: mazz at redhat.com (John Mazzitelli) Date: Sat, 11 Feb 2017 09:15:12 -0500 (EST) Subject: [Hawkular-dev] HOSA now limits amount of metrics per pod; new agent metrics added In-Reply-To: <255878990.3068210.1486821705414.JavaMail.zimbra@redhat.com> References: <255878990.3068210.1486821705414.JavaMail.zimbra@redhat.com> Message-ID: <203665192.3069775.1486822511989.JavaMail.zimbra@redhat.com> > All of this is in master and will be in the next HOSA release, which I hope to do this weekend. FYI: But it is available now if you want to try it out - HOSA github project uses travis to publish the lastest master build on docker hub - so just pull the "latest" tagged version to get the new stuff. https://hub.docker.com/r/hawkular/hawkular-openshift-agent/tags/ From miburman at redhat.com Sun Feb 12 10:40:33 2017 From: miburman at redhat.com (Michael Burman) Date: Sun, 12 Feb 2017 17:40:33 +0200 Subject: [Hawkular-dev] Performance problems? (was RxJava2 preliminary testing) In-Reply-To: <2c187395-e920-e2f0-f9ed-c147ad4d0783@redhat.com> References: <2c187395-e920-e2f0-f9ed-c147ad4d0783@redhat.com> Message-ID: <2f2cbdeb-9920-0d2f-4e62-c77d677e20ae@redhat.com> Hi, Testing revealed something else that worried me, I was quite sure that I've had far higher numbers before than what we have now, so I went back to test some older versions to see if this was true.. 
The testing setup is as follows: 4 cores, 8GB, ccm Cassandra (3.0.10, 3.10 made no significant difference here). So all the CPU freed from HWKMETRICS is used by Cassandra (they compete on resources). Uses core-metrics-service only, no REST interface - directly writing with RxJava using addDatapoints. 1 or 10 datapoint per metric, writing 100 000 metrics in one call to addDatapoints (insertBenchmark, not the -Single ending one). Sources in the jmh-benchmark module (fix_jmh branch, PR # 0.19.3: 1 datapoint -> 31004 metrics / second (31004 datapoints per second) 10 datapoints -> 19027 metrics / second (190270 datapoints per second) Current master: 1 datapoint -> 8535 metrics / second (8535 datapoints per second) 10 datapoints -> 7065 metrics / second (70650 datapoints per second) So performance has dropped significantly between current master and 0.19.3 (0.19.0 was the first release without double writing). With HWKMETRICS-599 (micro-batching on top of the current master): RxJava1: 1 datapoint -> 55036 metrics / second (55036 datapoints / second) 10 datapoints -> 17870 metrics / second (178700 datapoints / second) RxJava2: 1 datapoint -> 76216 metrics / second (76216 datapoints / second) 10 datapoints -> 20088 metrics / second (200880 datapoints / second) HWKMETRICS-599 without retryWhen made no difference (an assumption that this was the problem creator). The 1 datapoint per metric is the most interesting use-case, so that's something we should concentrate on. But before someone asks: 100 metrics, 1000 datapoints per metric and batch size of 1000 (instead of default 50 in the micro-batching): 298030 datapoints / second. That's all folks! - Micke On 02/11/2017 12:04 PM, Michael Burman wrote: > Hi, > > I did yesterday evening and today some testing on how using RxJava2 > would benefit us (I'm expecting more from RxJava 2.1 actually, since it > has some enhanced parallelism features which we might benefit from). > > Short notes from RxJava2 migration, it's more painful than I assumed. > The code changes can be small in terms of lines of code changed, but > almost every method has had their signature or behavior changed. So at > least I've had to read the documentation all the time when doing things > and trying to unlearn what I've done in the RxJava1. > > And all this comes with a backwards compatibility pressure for Java 6 > (so you can't benefit from many Java 8 advantages). Reactive-Commons / > Reactor have started from Java 8 to provide cleaner implementation. Grr. > > I wrote a simple write path modification in the PR #762 (metrics) that > writes Gauges using RxJava2 ported micro-batching feature. There's still > some RxJavaInterOp use in it, so that might slow down the performance a > little bit. However, it is possible to merge these two codes. There are > also some other optimizations I think could be worth it. > > I'd advice against it though, reading gets quite complex. I would almost > suggest that we would do the MetricsServiceImpl/DataAccessImpl merging > by rewriting small parts at a time in the new class with RxJava2 and > make that call the old code with RxJavaInterOp. That way we could move > slowly to the newer codebase. > > I fixed the JMH-benchmarks (as they're not compiled in our CI and were > actually broken by some other PRs) and ran some tests. These are the > tests that measure only the metrics-core-service performance and do not > touch the REST-interface (or Wildfly) at all, thus giving better > comparison in how our internal changes behave. 
> > What I'm seeing is around 20-30% difference in performance when writing > gauges this way. So this should offset some of the issues we saw when we > improved error handling (which caused performance degradation). I did > ran into the HWKMETRICS-542 (BusyPoolException) so the tests were run > with 1024 connections. > > I'll continue next week some more testing, but at the same time I proved > that the micro-batching features do improve performance in the internal > processing, especially when there's small amount of writers to a single > node. But testing those features could probably benefit from more > benchmark tests without WIldfly (which takes so much processing power > that most performance improvements can't be measured correctly anymore). > > - Micke > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev From mazz at redhat.com Sun Feb 12 13:38:43 2017 From: mazz at redhat.com (John Mazzitelli) Date: Sun, 12 Feb 2017 13:38:43 -0500 (EST) Subject: [Hawkular-dev] HOSA now limits amount of metrics per pod; new agent metrics added In-Reply-To: <255878990.3068210.1486821705414.JavaMail.zimbra@redhat.com> References: <255878990.3068210.1486821705414.JavaMail.zimbra@redhat.com> Message-ID: <1671204035.3312536.1486924723171.JavaMail.zimbra@redhat.com> > will be in the next HOSA release, which I hope to do this weekend. 1.2.1.Final has been released: https://hub.docker.com/r/hawkular/hawkular-openshift-agent/tags/ ----- Original Message ----- > FYI: New enhancement to Hawkular OpenShift Agent (HOSA). > > To avoid having a misconfigured or malicious pod from flooding HOSA and > H-Metrics with large amounts of metric data, HOSA has now been enhanced to > support the setting of "max_metrics_per_pod" (this is a setting in the agent > global configuration). Its default is 50. Any pod that asks the agent to > collect more than that (sum total across all of its endpoints) will be > throttled down and only the maximum number of metrics will be stored for > that pod. Note: when I say "metrics" here I do not mean datapoints - this > limits the number of unique metric IDs allowed to be stored per pod) > > If you enable the status endpoint, you'll see this in the yaml report when a > max limit is reached for the endpoint in question: > > openshift-infra|the-pod-name-73fgt|prometheus|http://172.19.0.5:8080/metrics: > METRIC > LIMIT EXCEEDED. Last collection at [Sat, 11 Feb 2017 13:46:44 +0000] > gathered > [54] metrics, [4] were discarded, in [1.697787ms] > > A warning will also be logged in the log file: > > "Reached max limit of metrics for > [openshift-infra|the-pod-name-73fgt|prometheus|http://172.19.0.5:8080/metrics] > - discarding [4] collected metrics" > > (As part of this code change, the status endpoint was enhanced to now show > the number of metrics collected from each endpoint under each pod. This is > not the total number of datapoints; it is showing unique metric IDs - this > number will always be <= the max metrics per pod) > > Finally, the agent now collects and emits 4 metrics of its own (in addition > to all the other "go" related ones like memory used, etc). They are: > > 1 Counter: > > hawkular_openshift_agent_metric_data_points_collected_total > The total number of individual metric data points collected from all > endpoints. > > 3 Gauges: > > hawkular_openshift_agent_monitored_pods > The number of pods currently being monitored. 
> > hawkular_openshift_agent_monitored_endpoints > The number of endpoints currently being monitored. > > hawkular_openshift_agent_monitored_metrics > The total number of metrics currently being monitored across all endpoints. > > All of this is in master and will be in the next HOSA release, which I hope > to do this weekend. From theute at redhat.com Tue Feb 14 08:36:12 2017 From: theute at redhat.com (Thomas Heute) Date: Tue, 14 Feb 2017 14:36:12 +0100 Subject: [Hawkular-dev] Collecting PV usage ? In-Reply-To: <571400962.29493096.1486745131995.JavaMail.zimbra@redhat.com> References: <807728075.2531836.1486730178463.JavaMail.zimbra@redhat.com> <571400962.29493096.1486745131995.JavaMail.zimbra@redhat.com> Message-ID: On Fri, Feb 10, 2017 at 5:45 PM, Matt Wringe wrote: > ----- Original Message ----- > > From: "Thomas Heute" > > To: "John Mazzitelli" > > Cc: "Discussions around Hawkular development" < > hawkular-dev at lists.jboss.org> > > Sent: Friday, 10 February, 2017 7:55:46 AM > > Subject: Re: [Hawkular-dev] Collecting PV usage ? > > > > > > > > On Fri, Feb 10, 2017 at 1:36 PM, John Mazzitelli < mazz at redhat.com > > wrote: > > > > > > No. It is on the list though: > > > > https://github.com/hawkular/hawkular-openshift-agent/issues/110 > > > > I don't think they are collected by Heapster today > > Heapster collects a bunch of filesystem data, I believe its pulling in all > volumes being used which should include PVs. > > It includes the usage and limit for each of the volumes being used. It seems to always return 0 unfortunately: [image: Inline image 1] -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170214/f2efad53/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 119257 bytes Desc: not available Url : http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170214/f2efad53/attachment-0001.png From theute at redhat.com Tue Feb 14 09:12:09 2017 From: theute at redhat.com (Thomas Heute) Date: Tue, 14 Feb 2017 15:12:09 +0100 Subject: [Hawkular-dev] Collecting PV usage ? In-Reply-To: References: <807728075.2531836.1486730178463.JavaMail.zimbra@redhat.com> <571400962.29493096.1486745131995.JavaMail.zimbra@redhat.com> Message-ID: filesystem/limit and filesystem/available are actually correct, it returns the same host full capacity and remaining space for all containers. I don't seem to get PV data in particular, they are simply host directories though. On Tue, Feb 14, 2017 at 2:36 PM, Thomas Heute wrote: > > > On Fri, Feb 10, 2017 at 5:45 PM, Matt Wringe wrote: > >> ----- Original Message ----- >> > From: "Thomas Heute" >> > To: "John Mazzitelli" >> > Cc: "Discussions around Hawkular development" < >> hawkular-dev at lists.jboss.org> >> > Sent: Friday, 10 February, 2017 7:55:46 AM >> > Subject: Re: [Hawkular-dev] Collecting PV usage ? >> > >> > >> > >> > On Fri, Feb 10, 2017 at 1:36 PM, John Mazzitelli < mazz at redhat.com > >> wrote: >> > >> > >> > No. It is on the list though: >> > >> > https://github.com/hawkular/hawkular-openshift-agent/issues/110 >> > >> > I don't think they are collected by Heapster today >> >> Heapster collects a bunch of filesystem data, I believe its pulling in >> all volumes being used which should include PVs. >> >> It includes the usage and limit for each of the volumes being used. 
> > > > It seems to always return 0 unfortunately: > [image: Inline image 1] > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170214/3b753807/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 119257 bytes Desc: not available Url : http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170214/3b753807/attachment-0001.png From mazz at redhat.com Thu Feb 16 12:23:12 2017 From: mazz at redhat.com (John Mazzitelli) Date: Thu, 16 Feb 2017 12:23:12 -0500 (EST) Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: <1012981008.1719259.1485967638574.JavaMail.zimbra@redhat.com> References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> <141247e6-4e22-e0ea-6634-c60deaa4a4c7@redhat.com> <1012981008.1719259.1485967638574.JavaMail.zimbra@redhat.com> Message-ID: <1230203468.6182728.1487265792257.JavaMail.zimbra@redhat.com> I need to resurrect this thread now that some others have had experience with what we have - specifically, what Thomas reported in this issue: https://github.com/hawkular/hawkular-openshift-agent/issues/126 It has to do with Prometheus metrics and how HOSA names and tags them in H-Metrics. Just some quick background first: Prometheus metrics have two parts - a "family name" (like "http_response_count") and labels (like "method"). This means you can have N metrics in Prometheus with the same metric family name but each with different label values (like "http_response_count{method=GET}" and "http_response_count{method=POST}". Each unique combination of family name plus label values represent a different set of time series data (so http_response_count{method=GET} is one set of time series data and http_response_count{method=POST} is another set of time series data). H-Metrics doesn't really have this concept of metric family. H-Metrics has metric definitions each with unique names (or "metric IDs") and a set of tags (h-metrics uses the name "tags" rather than "labels"). In H-Metrics, you cannot have N metrics with the same name (ID). You must have unique IDs to represent different sets of time series data. OK, with that quick intro, two things: ===== 1) Metrics coming from Prometheus by default will be stored in H-Metrics with metric IDs like: metric_family_name{label_name1=value1,label_name2=value2} Basically, HOSA stores the H-Metric ID so it looks identical to the metric data coming from Prometheus endpoints (name with labels comma-separated and enclosed within curly braces). But Grafana might have issues with the curly braces. However, the original opinion when this was first implemented in HOSA was that just using underscores in H-Metrics IDs, for example: metric_family_name_label_name1_value1_label_name2_value2 will make querying from H-Metrics more difficult (it all looks like one big name and it is hard to distinguish the labels in the name). QUESTION #1a: Does Grafana really have an issue with displaying metrics whose names have curly braces - {} - and commas in them? QUESTION #1b: If so, what should the default metric ID look like when we have Prometheus labels like this, if not by using curly braces and commas? ===== 2) These Prometheus metrics don't look right in the current OpenShift UI. 
If we have two Prometheus metrics stored in H-Metrics with the IDs: what_i_ate{food=Banana} what_i_ate{food=Apple} what you see in the OpenShift UI console is two metric graphs each with the same metric name "what_i_ate" - you don't know which ones they are. Why? Application metrics like these are now shown in the OpenShift UI and it works fine even for Prometheus metrics UNLESS the Prometheus metrics had labels (like the example above with Prometheus labels food=Apple or food=Banana). This is because when we tag these metrics in H-Metrics, one tag we add to the metric definition is "metric_name" and for Prometheus the value of this tag is the METRIC FAMILY name. This is what Joel was asking for (see the last messages in this thread). But the OS UI console uses this metric_name tag for the label of the graph (the full, real ID of the metric is ugly to make sure its unique within the cluster - e.g. "pod/3e4553ew-34553d-345433-123a/custom/what_i_ate{food=Banana}" - so we don't really want to show that to a user). QUESTION #2a: Should I switch back and make metric_name be the last part of the actual metric ID (not Prometheus family name) like "what_i_ate{food=Banana}" so the OS UI console works? Or do we fix the OS UI console to parse the full metric ID and only show the last part (after the "/custom/" part) thus leaving "metric_name" tag in H-Metrics be the Prometheus metric family name and make querying easier (a-la Joel's suggestion). QUESTION #2b: Is having metric family name a useful thing to have as a H-Metric tag in the first place? If so, I will have to get HOSA to create a new tag "base_metric_name" if "metric_name" is to be fixed to get the OS UI to work. But does having the Prometheus metric family name even a useful thing? Joel seemed to think so; I would like to make sure it is a useful thing before I go and implement this change. ----- Forwarded Message ----- From: "John Mazzitelli" To: "Discussions around Hawkular development" Sent: Wednesday, February 1, 2017 11:47:18 AM Subject: Re: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics https://github.com/hawkular/hawkular-openshift-agent/blob/master/deploy/openshift/hawkular-openshift-agent-configmap.yaml#L20 :D That's already there - the ${METRIC:name} resolves to the name of the metric (not the new ID) and our default config puts that tag on every metric. ----- Original Message ----- > > +1, if that is not being done I think it would good. Actually, it's probably > a good "best practice" as it make it easier to slice and dice the data. > > On 2/1/2017 10:35 AM, Joel Takvorian wrote: > > > > +1 > > Conversion based on labels seems more sane. > > I wonder if a new tag that recalls the prometheus metric name would be > useful; ex. "baseName=jvm_memory_pool_bytes_committed", to retrieve all > metrics of that family. Just an idea. > > On Wed, Feb 1, 2017 at 4:25 PM, John Mazzitelli < mazz at redhat.com > wrote: > > > > Are you also tagging the Prometheus metrics with the labels? > > Yes, that is what was originally being done, and that is still in there. > > ----- Original Message ----- > > > > Mazz, this makes sense to me. Our decision to use unique ids (well +type) > > is > > going to lead to this sort of thing. The ids are going to basically be > > large > > concatenations of the tags that identify the data. Then, additionally we're > > going to have to tag the metrics with the same name/value pairs that are > > present in the id. Are you also tagging the Prometheus metrics with the > > labels? 
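To make the default ID construction described above concrete, here is a minimal Go sketch of how a Prometheus family name plus its labels can be flattened into a single H-Metrics metric ID. This is illustrative only, not the agent's actual code; in particular, sorting the label names so the same label set always yields the same ID is an assumption.

    package main

    import (
        "fmt"
        "sort"
        "strings"
    )

    // metricID flattens a Prometheus family name and its labels into one
    // Hawkular-Metrics metric ID of the form family{k1=v1,k2=v2}.
    // Label names are sorted so a given label set always produces the same ID.
    func metricID(family string, labels map[string]string) string {
        if len(labels) == 0 {
            return family
        }
        names := make([]string, 0, len(labels))
        for name := range labels {
            names = append(names, name)
        }
        sort.Strings(names)

        pairs := make([]string, 0, len(names))
        for _, name := range names {
            pairs = append(pairs, name+"="+labels[name])
        }
        return family + "{" + strings.Join(pairs, ",") + "}"
    }

    func main() {
        // Two label combinations of the same family become two distinct IDs.
        fmt.Println(metricID("what_i_ate", map[string]string{"food": "Banana"})) // what_i_ate{food=Banana}
        fmt.Println(metricID("what_i_ate", map[string]string{"food": "Apple"}))  // what_i_ate{food=Apple}
    }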
> > > > On 2/1/2017 9:38 AM, John Mazzitelli wrote: > > > > > > > > The past several days I've been working on an enhancement to HOSA that came > > in from the community (in fact, I would consider it a bug). I'm about ready > > to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I wanted to > > post this to announce it and see if there is any feedback, too. > > > > Today, HOSA collects metrics from any Prometheus endpoint which you declare > > - > > example: > > > > metrics > > - name: go_memstats_sys_bytes > > - name: process_max_fds > > - name: process_open_fds > > > > But if a Prometheus metric has labels, Prometheus itself considers each > > metric with a unique combination of labels as an individual time series > > metric. This is different than how Hawkular Metric works - each Hawkular > > Metric metric ID (even if its metric definition or its datapoints have > > tags) > > is a single time series metric. We need to account for this difference. For > > example, if our agent is configured with: > > > > metrics: > > - name: jvm_memory_pool_bytes_committed > > > > And the Prometheus endpoint emits that metric with a label called "pool" > > like > > this: > > > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 > > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7 > > > > then to Prometheus this is actually 2 time series metrics (the number of > > bytes committed per pool type), not 1. Even though the metric name is the > > same (what Prometheus calls a "metric family name"), there are two unique > > combinations of labels - one with "Code Cache" and one with "PS Eden Space" > > - so they are 2 distinct time series metric data. > > > > Today, the agent only creates a single Hawkular-Metric in this case, with > > each datapoint tagged with those Prometheus labels on the appropriate data > > point. But we don't want to aggregate them like that since we lose the > > granularity that the Prometheus endpoint gives us (that is, the number of > > bytes committed in each pool type). I will say I think we might be able to > > get that granularity back through datapoint tag queries in Hawkular-Metrics > > but I don't know how well (if at all) that is supported and how efficient > > such queries would be even if supported, and how efficient storage of these > > metrics would be if we tag every data point with these labels (not sure if > > that is the general purpose of tags in H-Metrics). But, regardless, the > > fact > > that these really are different time series metrics should (IMO) be > > represented as different time series metrics (via metric definitions/metric > > IDs) in Hawkular-Metrics. > > > > To support labeled Prometheus endpoint data like this, the agent needs to > > split this one named metric into N Hawkular-Metrics metrics (where N is the > > number of unique label combinations for that named metric). So even though > > the agent is configured with the one metric > > "jvm_memory_pool_bytes_committed" we need to actually create two > > Hawkular-Metric metric definitions (with two different and unique metric > > IDs > > obviously). > > > > The PR [1] that is ready to go does this. 
By default it will create > > multiple > > metric definitions/metric IDs in the form > > "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}" > > unless you want a different form in which case you can define an "id" and > > put in "${labelName}" in the ID you declare (such as > > "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or whatever). But > > I suspect the default format will be what most people want and thus nothing > > needs to be done. In the above example, two metric definitions with the > > following IDs are created: > > > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache} > > 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space} > > > > --John Mazz > > > > [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117 From jtakvori at redhat.com Fri Feb 17 02:44:16 2017 From: jtakvori at redhat.com (Joel Takvorian) Date: Fri, 17 Feb 2017 08:44:16 +0100 Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: <1230203468.6182728.1487265792257.JavaMail.zimbra@redhat.com> References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> <141247e6-4e22-e0ea-6634-c60deaa4a4c7@redhat.com> <1012981008.1719259.1485967638574.JavaMail.zimbra@redhat.com> <1230203468.6182728.1487265792257.JavaMail.zimbra@redhat.com> Message-ID: For the curly braces in Grafana, I'm going to investigate. For your second point, I'm trying to put me in the shoes of an ops: if I want to create a dashboard that shows a labelled metric (in term of prometheus label), I'd like to see all its avatars in the same chart to be able to compare them, see in what they converge or in what they diverge. And maybe compare them in all pods of a given container name. That would be queries with tags: Query tags: - container_name: something - family_name (or "metric_base_name", or whatever name we give to that tag): what_i_ate I can't be 100% sure that it's going to be used, as people do what they want in Grafana. But it seems interesting to me. The question is: what's the cost of adding a tag? I believe metric tags are relatively cheap in term of storage. So, having both "metric_name" (what_i_ate{food=Banana}) and "family_name" (what_i_ate) would solve all our issues, no? On Thu, Feb 16, 2017 at 6:23 PM, John Mazzitelli wrote: > I need to resurrect this thread now that some others have had experience > with what we have - specifically, what Thomas reported in this issue: > https://github.com/hawkular/hawkular-openshift-agent/issues/126 > > It has to do with Prometheus metrics and how HOSA names and tags them in > H-Metrics. > > Just some quick background first: > > Prometheus metrics have two parts - a "family name" (like > "http_response_count") and labels (like "method"). This means you can have > N metrics in Prometheus with the same metric family name but each with > different label values (like "http_response_count{method=GET}" and > "http_response_count{method=POST}". Each unique combination of family > name plus label values represent a different set of time series data (so > http_response_count{method=GET} is one set of time series data and > http_response_count{method=POST} is another set of time series data). > > H-Metrics doesn't really have this concept of metric family. H-Metrics has > metric definitions each with unique names (or "metric IDs") and a set of > tags (h-metrics uses the name "tags" rather than "labels"). 
In H-Metrics, > you cannot have N metrics with the same name (ID). You must have unique IDs > to represent different sets of time series data. > > OK, with that quick intro, two things: > > ===== > > 1) Metrics coming from Prometheus by default will be stored in H-Metrics > with metric IDs like: > > metric_family_name{label_name1=value1,label_name2=value2} > > Basically, HOSA stores the H-Metric ID so it looks identical to the metric > data coming from Prometheus endpoints (name with labels comma-separated and > enclosed within curly braces). > > But Grafana might have issues with the curly braces. However, the original > opinion when this was first implemented in HOSA was that just using > underscores in H-Metrics IDs, for example: > > metric_family_name_label_name1_value1_label_name2_value2 > > will make querying from H-Metrics more difficult (it all looks like one > big name and it is hard to distinguish the labels in the name). > > QUESTION #1a: Does Grafana really have an issue with displaying metrics > whose names have curly braces - {} - and commas in them? > QUESTION #1b: If so, what should the default metric ID look like when we > have Prometheus labels like this, if not by using curly braces and commas? > > ===== > > 2) These Prometheus metrics don't look right in the current OpenShift UI. > If we have two Prometheus metrics stored in H-Metrics with the IDs: > > what_i_ate{food=Banana} > what_i_ate{food=Apple} > > what you see in the OpenShift UI console is two metric graphs each with > the same metric name "what_i_ate" - you don't know which ones they are. > > Why? Application metrics like these are now shown in the OpenShift UI and > it works fine even for Prometheus metrics UNLESS the Prometheus metrics had > labels (like the example above with Prometheus labels food=Apple or > food=Banana). This is because when we tag these metrics in H-Metrics, one > tag we add to the metric definition is "metric_name" and for Prometheus the > value of this tag is the METRIC FAMILY name. This is what Joel was asking > for (see the last messages in this thread). But the OS UI console uses this > metric_name tag for the label of the graph (the full, real ID of the metric > is ugly to make sure its unique within the cluster - e.g. > "pod/3e4553ew-34553d-345433-123a/custom/what_i_ate{food=Banana}" - so we > don't really want to show that to a user). > > QUESTION #2a: Should I switch back and make metric_name be the last part > of the actual metric ID (not Prometheus family name) like > "what_i_ate{food=Banana}" so the OS UI console works? Or do we fix the OS > UI console to parse the full metric ID and only show the last part (after > the "/custom/" part) thus leaving "metric_name" tag in H-Metrics be the > Prometheus metric family name and make querying easier (a-la Joel's > suggestion). > > QUESTION #2b: Is having metric family name a useful thing to have as a > H-Metric tag in the first place? If so, I will have to get HOSA to create a > new tag "base_metric_name" if "metric_name" is to be fixed to get the OS UI > to work. But does having the Prometheus metric family name even a useful > thing? Joel seemed to think so; I would like to make sure it is a useful > thing before I go and implement this change. 
> > ----- Forwarded Message ----- > From: "John Mazzitelli" > To: "Discussions around Hawkular development" < > hawkular-dev at lists.jboss.org> > Sent: Wednesday, February 1, 2017 11:47:18 AM > Subject: Re: [Hawkular-dev] HOSA and conversion from prometheus to > hawkular metrics > > https://github.com/hawkular/hawkular-openshift-agent/blob/ > master/deploy/openshift/hawkular-openshift-agent-configmap.yaml#L20 > > :D > > That's already there - the ${METRIC:name} resolves to the name of the > metric (not the new ID) and our default config puts that tag on every > metric. > > > ----- Original Message ----- > > > > +1, if that is not being done I think it would good. Actually, it's > probably > > a good "best practice" as it make it easier to slice and dice the data. > > > > On 2/1/2017 10:35 AM, Joel Takvorian wrote: > > > > > > > > +1 > > > > Conversion based on labels seems more sane. > > > > I wonder if a new tag that recalls the prometheus metric name would be > > useful; ex. "baseName=jvm_memory_pool_bytes_committed", to retrieve all > > metrics of that family. Just an idea. > > > > On Wed, Feb 1, 2017 at 4:25 PM, John Mazzitelli < mazz at redhat.com > > wrote: > > > > > > > Are you also tagging the Prometheus metrics with the labels? > > > > Yes, that is what was originally being done, and that is still in there. > > > > ----- Original Message ----- > > > > > > Mazz, this makes sense to me. Our decision to use unique ids (well > +type) > > > is > > > going to lead to this sort of thing. The ids are going to basically be > > > large > > > concatenations of the tags that identify the data. Then, additionally > we're > > > going to have to tag the metrics with the same name/value pairs that > are > > > present in the id. Are you also tagging the Prometheus metrics with the > > > labels? > > > > > > On 2/1/2017 9:38 AM, John Mazzitelli wrote: > > > > > > > > > > > > The past several days I've been working on an enhancement to HOSA that > came > > > in from the community (in fact, I would consider it a bug). I'm about > ready > > > to merge the PR [1] for this and do a HOSA 1.1.0.Final release. I > wanted to > > > post this to announce it and see if there is any feedback, too. > > > > > > Today, HOSA collects metrics from any Prometheus endpoint which you > declare > > > - > > > example: > > > > > > metrics > > > - name: go_memstats_sys_bytes > > > - name: process_max_fds > > > - name: process_open_fds > > > > > > But if a Prometheus metric has labels, Prometheus itself considers each > > > metric with a unique combination of labels as an individual time series > > > metric. This is different than how Hawkular Metric works - each > Hawkular > > > Metric metric ID (even if its metric definition or its datapoints have > > > tags) > > > is a single time series metric. We need to account for this > difference. For > > > example, if our agent is configured with: > > > > > > metrics: > > > - name: jvm_memory_pool_bytes_committed > > > > > > And the Prometheus endpoint emits that metric with a label called > "pool" > > > like > > > this: > > > > > > jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7 > > > jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7 > > > > > > then to Prometheus this is actually 2 time series metrics (the number > of > > > bytes committed per pool type), not 1. 
Even though the metric name is > the > > > same (what Prometheus calls a "metric family name"), there are two > unique > > > combinations of labels - one with "Code Cache" and one with "PS Eden > Space" > > > - so they are 2 distinct time series metric data. > > > > > > Today, the agent only creates a single Hawkular-Metric in this case, > with > > > each datapoint tagged with those Prometheus labels on the appropriate > data > > > point. But we don't want to aggregate them like that since we lose the > > > granularity that the Prometheus endpoint gives us (that is, the number > of > > > bytes committed in each pool type). I will say I think we might be > able to > > > get that granularity back through datapoint tag queries in > Hawkular-Metrics > > > but I don't know how well (if at all) that is supported and how > efficient > > > such queries would be even if supported, and how efficient storage of > these > > > metrics would be if we tag every data point with these labels (not > sure if > > > that is the general purpose of tags in H-Metrics). But, regardless, the > > > fact > > > that these really are different time series metrics should (IMO) be > > > represented as different time series metrics (via metric > definitions/metric > > > IDs) in Hawkular-Metrics. > > > > > > To support labeled Prometheus endpoint data like this, the agent needs > to > > > split this one named metric into N Hawkular-Metrics metrics (where N > is the > > > number of unique label combinations for that named metric). So even > though > > > the agent is configured with the one metric > > > "jvm_memory_pool_bytes_committed" we need to actually create two > > > Hawkular-Metric metric definitions (with two different and unique > metric > > > IDs > > > obviously). > > > > > > The PR [1] that is ready to go does this. By default it will create > > > multiple > > > metric definitions/metric IDs in the form > > > "metric-family-name{labelName1=labelValue1, > labelName2=labelValue2,...}" > > > unless you want a different form in which case you can define an "id" > and > > > put in "${labelName}" in the ID you declare (such as > > > "${oneLabelName}_my_own_metric_name_${theOtherLabelName}" or > whatever). But > > > I suspect the default format will be what most people want and thus > nothing > > > needs to be done. In the above example, two metric definitions with the > > > following IDs are created: > > > > > > 1. jvm_memory_pool_bytes_committed{pool=Code Cache} > > > 2. jvm_memory_pool_bytes_committed{pool=PS Eden Space} > > > > > > --John Mazz > > > > > > [1] https://github.com/hawkular/hawkular-openshift-agent/pull/117 > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170217/155ac4d1/attachment-0001.html From jtakvori at redhat.com Fri Feb 17 04:26:05 2017 From: jtakvori at redhat.com (Joel Takvorian) Date: Fri, 17 Feb 2017 10:26:05 +0100 Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> <141247e6-4e22-e0ea-6634-c60deaa4a4c7@redhat.com> <1012981008.1719259.1485967638574.JavaMail.zimbra@redhat.com> <1230203468.6182728.1487265792257.JavaMail.zimbra@redhat.com> Message-ID: About the first point, just answered here https://github.com/hawkular/hawkular-openshift-agent/issues/126 In short, there shouldn't be any problem with curly braces in Grafana. On Fri, Feb 17, 2017 at 8:44 AM, Joel Takvorian wrote: > For the curly braces in Grafana, I'm going to investigate. > > For your second point, I'm trying to put me in the shoes of an ops: if I > want to create a dashboard that shows a labelled metric (in term of > prometheus label), I'd like to see all its avatars in the same chart to be > able to compare them, see in what they converge or in what they diverge. > And maybe compare them in all pods of a given container name. That would be > queries with tags: > > Query tags: > - container_name: something > - family_name (or "metric_base_name", or whatever name we give to that > tag): what_i_ate > > I can't be 100% sure that it's going to be used, as people do what they > want in Grafana. But it seems interesting to me. The question is: what's > the cost of adding a tag? I believe metric tags are relatively cheap in > term of storage. So, having both "metric_name" (what_i_ate{food=Banana}) > and "family_name" (what_i_ate) would solve all our issues, no? > > > > On Thu, Feb 16, 2017 at 6:23 PM, John Mazzitelli wrote: > >> I need to resurrect this thread now that some others have had experience >> with what we have - specifically, what Thomas reported in this issue: >> https://github.com/hawkular/hawkular-openshift-agent/issues/126 >> >> It has to do with Prometheus metrics and how HOSA names and tags them in >> H-Metrics. >> >> Just some quick background first: >> >> Prometheus metrics have two parts - a "family name" (like >> "http_response_count") and labels (like "method"). This means you can have >> N metrics in Prometheus with the same metric family name but each with >> different label values (like "http_response_count{method=GET}" and >> "http_response_count{method=POST}". Each unique combination of family >> name plus label values represent a different set of time series data (so >> http_response_count{method=GET} is one set of time series data and >> http_response_count{method=POST} is another set of time series data). >> >> H-Metrics doesn't really have this concept of metric family. H-Metrics >> has metric definitions each with unique names (or "metric IDs") and a set >> of tags (h-metrics uses the name "tags" rather than "labels"). In >> H-Metrics, you cannot have N metrics with the same name (ID). You must have >> unique IDs to represent different sets of time series data. 
>> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170217/007acdfb/attachment-0001.html From jshaughn at redhat.com Fri Feb 17 16:13:35 2017 From: jshaughn at redhat.com (Jay Shaughnessy) Date: Fri, 17 Feb 2017 16:13:35 -0500 Subject: [Hawkular-dev] HOSA and conversion from prometheus to hawkular metrics In-Reply-To: References: <1971347286.1581541.1485959932755.JavaMail.zimbra@redhat.com> <7b794f4d-749c-1ea9-3ebc-64d17b3c0392@redhat.com> <1466597081.1642029.1485962716407.JavaMail.zimbra@redhat.com> <141247e6-4e22-e0ea-6634-c60deaa4a4c7@redhat.com> <1012981008.1719259.1485967638574.JavaMail.zimbra@redhat.com> <1230203468.6182728.1487265792257.JavaMail.zimbra@redhat.com> Message-ID: <1799f55e-bf72-6677-ecf2-ae274632af1e@redhat.com> +1. It seems to me that underlying metric ids are something we just want to hide as an implementation detail. Querying for a "family name" and narrowing by other tags gives you a useful set of TS. On 2/17/2017 2:44 AM, Joel Takvorian wrote: > For the curly braces in Grafana, I'm going to investigate. > > For your second point, I'm trying to put me in the shoes of an ops: if > I want to create a dashboard that shows a labelled metric (in term of > prometheus label), I'd like to see all its avatars in the same chart > to be able to compare them, see in what they converge or in what they > diverge. And maybe compare them in all pods of a given container name. > That would be queries with tags: > > Query tags: > - container_name: something > - family_name (or "metric_base_name", or whatever name we give to that > tag): what_i_ate > > I can't be 100% sure that it's going to be used, as people do what > they want in Grafana. But it seems interesting to me. The question is: > what's the cost of adding a tag? I believe metric tags are relatively > cheap in term of storage. So, having both "metric_name" > (what_i_ate{food=Banana}) and "family_name" (what_i_ate) would solve > all our issues, no? > > > > On Thu, Feb 16, 2017 at 6:23 PM, John Mazzitelli > wrote: > > I need to resurrect this thread now that some others have had > experience with what we have - specifically, what Thomas reported > in this issue: > https://github.com/hawkular/hawkular-openshift-agent/issues/126 > > > It has to do with Prometheus metrics and how HOSA names and tags > them in H-Metrics. > > Just some quick background first: > > Prometheus metrics have two parts - a "family name" (like > "http_response_count") and labels (like "method"). This means you > can have N metrics in Prometheus with the same metric family name > but each with different label values (like > "http_response_count{method=GET}" and > "http_response_count{method=POST}". Each unique combination of > family name plus label values represent a different set of time > series data (so http_response_count{method=GET} is one set of time > series data and http_response_count{method=POST} is another set of > time series data). > > H-Metrics doesn't really have this concept of metric family. > H-Metrics has metric definitions each with unique names (or > "metric IDs") and a set of tags (h-metrics uses the name "tags" > rather than "labels"). In H-Metrics, you cannot have N metrics > with the same name (ID). You must have unique IDs to represent > different sets of time series data. 
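As a rough illustration of that query pattern, the sketch below asks H-Metrics for all gauge definitions carrying a given family-name tag, narrowed by container_name (the "something" value comes from the example query earlier in the thread). The /hawkular/metrics/gauges path, the tags=name:value parameter syntax, and the Hawkular-Tenant header reflect my reading of the H-Metrics REST API and should be verified against the released docs; the family_name tag is the hypothetical tag discussed above, and the tenant name is a placeholder.

    package main

    import (
        "fmt"
        "io/ioutil"
        "net/http"
        "net/url"
    )

    func main() {
        // Assumed endpoint and parameter syntax; verify against the H-Metrics docs.
        base := "http://hawkular-metrics.example.com/hawkular/metrics/gauges"
        params := url.Values{}
        // Tag filter: every series of one Prometheus family, in one container.
        params.Set("tags", "family_name:what_i_ate,container_name:something")

        req, err := http.NewRequest("GET", base+"?"+params.Encode(), nil)
        if err != nil {
            panic(err)
        }
        req.Header.Set("Hawkular-Tenant", "my-project") // tenant name is a placeholder

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := ioutil.ReadAll(resp.Body)
        fmt.Println(resp.Status)
        fmt.Println(string(body)) // JSON list of matching metric definitions
    }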
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170217/cc3bc46c/attachment-0001.html From mazz at redhat.com Sat Feb 18 23:45:11 2017 From: mazz at redhat.com (John Mazzitelli) Date: Sat, 18 Feb 2017 23:45:11 -0500 (EST) Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <1562886047.7401894.1487479320691.JavaMail.zimbra@redhat.com> Message-ID: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> Has anyone been able to use "oc cluster up --metrics" in order to run OpenShift Origin *and* Origin Metrics but running a local build (i.e. I need to pick up changes in master branch of Origin/Origin Metrics that aren't released yet). The docs make it look very complicated, and nothing I found seems to help a dev get this up and running quickly without having to look at tons of docs and run lots of commands with bunches of yaml :). I'm hoping it is easy, but not documented. This link doesn't even mention "cluster up" let alone running with Origin Metrics: https://github.com/openshift/origin/blob/master/CONTRIBUTING.adoc#develop-locally-on-your-host If I run "openshift start" - how do I get my own build of Origin Metrics to deploy, like "oc cluster up --metrics" does? It seems no matter what I do, using "cluster up" pulls down images from docker hub. I have no idea how to run Origin+Metrics using a local build. I'm hoping someone knows how to do this and can give me the steps. From jpkroehling at redhat.com Mon Feb 20 04:58:06 2017 From: jpkroehling at redhat.com (=?UTF-8?Q?Juraci_Paix=c3=a3o_Kr=c3=b6hling?=) Date: Mon, 20 Feb 2017 10:58:06 +0100 Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> Message-ID: <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> Mazz, On 02/19/2017 05:45 AM, John Mazzitelli wrote: > Has anyone been able to use "oc cluster up --metrics" in order to run OpenShift Origin *and* Origin Metrics but running a local build (i.e. I need to pick up changes in master branch of Origin/Origin Metrics that aren't released yet). As long as you have the desired artifacts in a Docker registry (local or remote), you should be able to specify it vi the --image and --version flags: oc cluster up --image='openshift/origin' --version=latest In case that's not enough: on a recent task, I built OpenShift locally and started it with "sudo ./openshift start", and then, from the origin-metrics clone: $ ./hack/build-images.sh \ --prefix="jpkroehling/origin-" \ --version="dev" $ oc process -f metrics.yaml \ -v IMAGE_PREFIX="jpkroehling/origin-" \ -v IMAGE_VERSION="dev" \ -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com \ -v USE_PERSISTENT_STORAGE=false \ -v CASSANDRA_NODES=2 | oc create -n openshift-infra -f - - Juca. From fbrychta at redhat.com Mon Feb 20 06:41:11 2017 From: fbrychta at redhat.com (Filip Brychta) Date: Mon, 20 Feb 2017 06:41:11 -0500 (EST) Subject: [Hawkular-dev] Performance problems? 
(was RxJava2 preliminary testing) In-Reply-To: <2f2cbdeb-9920-0d2f-4e62-c77d677e20ae@redhat.com> References: <2c187395-e920-e2f0-f9ed-c147ad4d0783@redhat.com> <2f2cbdeb-9920-0d2f-4e62-c77d677e20ae@redhat.com> Message-ID: <1740463496.23624663.1487590871835.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi, > > Testing revealed something else that worried me, I was quite sure that > I've had far higher numbers before than what we have now, so I went back > to test some older versions to see if this was true.. > > The testing setup is as follows: 4 cores, 8GB, ccm Cassandra (3.0.10, > 3.10 made no significant difference here). So all the CPU freed from > HWKMETRICS is used by Cassandra (they compete on resources). Uses > core-metrics-service only, no REST interface - directly writing with > RxJava using addDatapoints. > > 1 or 10 datapoint per metric, writing 100 000 metrics in one call to > addDatapoints (insertBenchmark, not the -Single ending one). Sources in > the jmh-benchmark module (fix_jmh branch, PR # REST equivalent to previous call would be a POST request containing 100 000 metrics with 1 or 10 datapoints for each metric? Is that a possible scenario? Is it possible to compare perf results (not absolute numbers but at least trends) from tests bypassing the REST interface with end-to-end perf tests? Your results show huge drop which is not visible in my results from end-to-end test. Since 0.19.3 I can see only two drops caused by PR652 and PR710. In total about 20% drop (for small msgs). > 0.19.3: > > 1 datapoint -> 31004 metrics / second (31004 datapoints per second) > > 10 datapoints -> 19027 metrics / second (190270 datapoints per second) > > Current master: > > 1 datapoint -> 8535 metrics / second (8535 datapoints per second) > > 10 datapoints -> 7065 metrics / second (70650 datapoints per second) > > So performance has dropped significantly between current master and > 0.19.3 (0.19.0 was the first release without double writing). > > With HWKMETRICS-599 (micro-batching on top of the current master): > > RxJava1: > > 1 datapoint -> 55036 metrics / second (55036 datapoints / second) > > 10 datapoints -> 17870 metrics / second (178700 datapoints / second) > > RxJava2: > > 1 datapoint -> 76216 metrics / second (76216 datapoints / second) > > 10 datapoints -> 20088 metrics / second (200880 datapoints / second) > > HWKMETRICS-599 without retryWhen made no difference (an assumption that > this was the problem creator). > > The 1 datapoint per metric is the most interesting use-case, so that's > something we should concentrate on. But before someone asks: > > 100 metrics, 1000 datapoints per metric and batch size of 1000 (instead > of default 50 in the micro-batching): > > 298030 datapoints / second. > > That's all folks! > > - Micke > > On 02/11/2017 12:04 PM, Michael Burman wrote: > > Hi, > > > > I did yesterday evening and today some testing on how using RxJava2 > > would benefit us (I'm expecting more from RxJava 2.1 actually, since it > > has some enhanced parallelism features which we might benefit from). > > > > Short notes from RxJava2 migration, it's more painful than I assumed. > > The code changes can be small in terms of lines of code changed, but > > almost every method has had their signature or behavior changed. So at > > least I've had to read the documentation all the time when doing things > > and trying to unlearn what I've done in the RxJava1. 
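For readers who have not looked at HWKMETRICS-599: "micro-batching" here means buffering incoming datapoints and flushing them to the store in small groups (the default batch size of 50 mentioned above) instead of issuing one write per point. The sketch below is a generic Go illustration of that idea under simple assumptions (flush when the batch is full, or when no new point arrives for a while); it is not the RxJava implementation being benchmarked, and the batch size and wait time are arbitrary.

    package main

    import (
        "fmt"
        "time"
    )

    // microBatch groups items from a channel into slices of at most batchSize
    // and hands each slice to flush. A partial batch is flushed when no new
    // item arrives within maxWait, so a slow trickle is not held back forever.
    func microBatch(in <-chan string, batchSize int, maxWait time.Duration, flush func([]string)) {
        batch := make([]string, 0, batchSize)
        for {
            select {
            case item, ok := <-in:
                if !ok {
                    if len(batch) > 0 {
                        flush(batch)
                    }
                    return
                }
                batch = append(batch, item)
                if len(batch) == batchSize {
                    flush(batch)
                    batch = make([]string, 0, batchSize)
                }
            case <-time.After(maxWait):
                if len(batch) > 0 {
                    flush(batch)
                    batch = make([]string, 0, batchSize)
                }
            }
        }
    }

    func main() {
        in := make(chan string)
        go func() {
            for i := 0; i < 120; i++ {
                in <- fmt.Sprintf("datapoint-%d", i)
            }
            close(in)
        }()
        microBatch(in, 50, 100*time.Millisecond, func(b []string) {
            // Stand-in for one batched write to the metrics store.
            fmt.Println("flushing", len(b), "datapoints")
        })
    }

Much of the win measured above comes from the single-datapoint-per-metric case: with batching, many independent one-point writes can share a round trip instead of each paying for its own.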
> > > > And all this comes with a backwards compatibility pressure for Java 6 > > (so you can't benefit from many Java 8 advantages). Reactive-Commons / > > Reactor have started from Java 8 to provide cleaner implementation. Grr. > > > > I wrote a simple write path modification in the PR #762 (metrics) that > > writes Gauges using RxJava2 ported micro-batching feature. There's still > > some RxJavaInterOp use in it, so that might slow down the performance a > > little bit. However, it is possible to merge these two codes. There are > > also some other optimizations I think could be worth it. > > > > I'd advice against it though, reading gets quite complex. I would almost > > suggest that we would do the MetricsServiceImpl/DataAccessImpl merging > > by rewriting small parts at a time in the new class with RxJava2 and > > make that call the old code with RxJavaInterOp. That way we could move > > slowly to the newer codebase. > > > > I fixed the JMH-benchmarks (as they're not compiled in our CI and were > > actually broken by some other PRs) and ran some tests. These are the > > tests that measure only the metrics-core-service performance and do not > > touch the REST-interface (or Wildfly) at all, thus giving better > > comparison in how our internal changes behave. > > > > What I'm seeing is around 20-30% difference in performance when writing > > gauges this way. So this should offset some of the issues we saw when we > > improved error handling (which caused performance degradation). I did > > ran into the HWKMETRICS-542 (BusyPoolException) so the tests were run > > with 1024 connections. > > > > I'll continue next week some more testing, but at the same time I proved > > that the micro-batching features do improve performance in the internal > > processing, especially when there's small amount of writers to a single > > node. But testing those features could probably benefit from more > > benchmark tests without WIldfly (which takes so much processing power > > that most performance improvements can't be measured correctly anymore). > > > > - Micke > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mazz at redhat.com Mon Feb 20 07:27:52 2017 From: mazz at redhat.com (John Mazzitelli) Date: Mon, 20 Feb 2017 07:27:52 -0500 (EST) Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> Message-ID: <1743615631.7811273.1487593672907.JavaMail.zimbra@redhat.com> > On 02/19/2017 05:45 AM, John Mazzitelli wrote: > > Has anyone been able to use "oc cluster up --metrics" in order to run > > OpenShift Origin *and* Origin Metrics but running a local build (i.e. I > > need to pick up changes in master branch of Origin/Origin Metrics that > > aren't released yet). > > As long as you have the desired artifacts in a Docker registry (local or > remote), you should be able to specify it vi the --image and --version > flags: > > oc cluster up --image='openshift/origin' --version=latest How do you build the images from source so they are put in your local Docker registry? 
I couldn't see anything in their Makefiles of origin and origin-metrics that do this. From miburman at redhat.com Mon Feb 20 07:45:26 2017 From: miburman at redhat.com (Michael Burman) Date: Mon, 20 Feb 2017 14:45:26 +0200 Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> Message-ID: <42bceb16-fef4-bd85-70a6-daeca3dbd044@redhat.com> Hi, The way I tested these was to modify the openshift/origin-metrics and deploy the metrics that way (copy new ear from metrics build etc). There's a Dockerfile to deploy to your local instance also. But I didn't of course use --use-metrics or oc cluster up but did those steps manually also. - Micke On 02/19/2017 06:45 AM, John Mazzitelli wrote: > Has anyone been able to use "oc cluster up --metrics" in order to run OpenShift Origin *and* Origin Metrics but running a local build (i.e. I need to pick up changes in master branch of Origin/Origin Metrics that aren't released yet). > > The docs make it look very complicated, and nothing I found seems to help a dev get this up and running quickly without having to look at tons of docs and run lots of commands with bunches of yaml :). > > I'm hoping it is easy, but not documented. > > This link doesn't even mention "cluster up" let alone running with Origin Metrics: https://github.com/openshift/origin/blob/master/CONTRIBUTING.adoc#develop-locally-on-your-host > > If I run "openshift start" - how do I get my own build of Origin Metrics to deploy, like "oc cluster up --metrics" does? > > It seems no matter what I do, using "cluster up" pulls down images from docker hub. I have no idea how to run Origin+Metrics using a local build. > > I'm hoping someone knows how to do this and can give me the steps. > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev From jpkroehling at redhat.com Mon Feb 20 08:16:00 2017 From: jpkroehling at redhat.com (=?UTF-8?Q?Juraci_Paix=c3=a3o_Kr=c3=b6hling?=) Date: Mon, 20 Feb 2017 14:16:00 +0100 Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <1743615631.7811273.1487593672907.JavaMail.zimbra@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> <1743615631.7811273.1487593672907.JavaMail.zimbra@redhat.com> Message-ID: <3b8ee0c2-4a70-9b72-003c-57894c109e1c@redhat.com> On 02/20/2017 01:27 PM, John Mazzitelli wrote: > How do you build the images from source so they are put in your local Docker registry? I couldn't see anything in their Makefiles of origin and origin-metrics that do this. Take a look at `hack/build-images.sh`. For the record: as I had all the tools set already from a previous build, I just re-built and ran with "sudo ./openshift start" from `_output/local/bin/linux/amd64` - Juca. From pawanpal004 at gmail.com Tue Feb 21 01:07:58 2017 From: pawanpal004 at gmail.com (Pawan Pal) Date: Tue, 21 Feb 2017 11:37:58 +0530 Subject: [Hawkular-dev] Test Account credentials for Hawkular Android Client Message-ID: Hi all, I would like to know the credentials of any testing account for Hawkular android-client. I found jdoe/ password, but it is not working. Also please give server and port. Thanks. 
-- Pawan Pal *B.Tech (Information Technology and Mathematical Innovation)* *Cluster Innovation Centre, University of Delhi* -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170221/98342898/attachment.html From pawanpal004 at gmail.com Tue Feb 21 03:06:04 2017 From: pawanpal004 at gmail.com (Pawan Pal) Date: Tue, 21 Feb 2017 13:36:04 +0530 Subject: [Hawkular-dev] Test Account credentials for Hawkular Android Client In-Reply-To: References: Message-ID: Hi, I set up my Hawkular server following this guide : http://www.hawkular.org/hawkular-services/docs/installation-guide/ On Tue, Feb 21, 2017 at 11:37 AM, Pawan Pal wrote: > Hi all, > I would like to know the credentials of any testing account for Hawkular > android-client. I found jdoe/ password, but it is not working. Also please > give server and port. > > Thanks. > > -- > Pawan Pal > *B.Tech (Information Technology and Mathematical Innovation)* > *Cluster Innovation Centre, University of Delhi* > > > > -- Pawan Pal *B.Tech (Information Technology and Mathematical Innovation)* *Cluster Innovation Centre, University of Delhi* -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170221/ddda38cd/attachment-0001.html From theute at redhat.com Tue Feb 21 03:21:39 2017 From: theute at redhat.com (Thomas Heute) Date: Tue, 21 Feb 2017 09:21:39 +0100 Subject: [Hawkular-dev] Test Account credentials for Hawkular Android Client In-Reply-To: References: Message-ID: Then it would be localhost:8080 with myUsername and myPassword as defined in step 3 of the guide. On Tue, Feb 21, 2017 at 9:06 AM, Pawan Pal wrote: > Hi, > I set up my Hawkular server following this guide : > http://www.hawkular.org/hawkular-services/docs/installation-guide/ > > > > > On Tue, Feb 21, 2017 at 11:37 AM, Pawan Pal wrote: > >> Hi all, >> I would like to know the credentials of any testing account for Hawkular >> android-client. I found jdoe/ password, but it is not working. Also please >> give server and port. >> >> Thanks. >> >> -- >> Pawan Pal >> *B.Tech (Information Technology and Mathematical Innovation)* >> *Cluster Innovation Centre, University of Delhi* >> >> >> >> > > > -- > Pawan Pal > *B.Tech (Information Technology and Mathematical Innovation)* > *Cluster Innovation Centre, University of Delhi* > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170221/7de3b3ae/attachment.html From anuj1708 at gmail.com Tue Feb 21 06:40:18 2017 From: anuj1708 at gmail.com (Anuj Garg) Date: Tue, 21 Feb 2017 17:10:18 +0530 Subject: [Hawkular-dev] Test Account credentials for Hawkular Android Client In-Reply-To: References: Message-ID: Hello pawan. I was last maintainer of this android client. Lets talk on hangout for detailed interaction if you interested in maintaining this code On 21 Feb 2017 1:52 p.m., "Thomas Heute" wrote: Then it would be localhost:8080 with myUsername and myPassword as defined in step 3 of the guide. 
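As a quick sanity check that the server and those credentials actually line up before pointing the Android client at it, something along these lines should answer against a stock hawkular-services install (a sketch only -- the endpoint paths and the "hawkular" tenant name are assumptions, not something spelled out in this thread):

$ curl http://localhost:8080/hawkular/metrics/status            # is the server up at all?
$ curl -u myUsername:myPassword -H "Hawkular-Tenant: hawkular" \
       http://localhost:8080/hawkular/metrics/metrics            # do the credentials work?

If the second call comes back 401, the username/password do not match what was passed to the server in step 3 of the guide.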
On Tue, Feb 21, 2017 at 9:06 AM, Pawan Pal wrote: > Hi, > I set up my Hawkular server following this guide : > http://www.hawkular.org/hawkular-services/docs/installation-guide/ > > > > > On Tue, Feb 21, 2017 at 11:37 AM, Pawan Pal wrote: > >> Hi all, >> I would like to know the credentials of any testing account for Hawkular >> android-client. I found jdoe/ password, but it is not working. Also please >> give server and port. >> >> Thanks. >> >> -- >> Pawan Pal >> *B.Tech (Information Technology and Mathematical Innovation)* >> *Cluster Innovation Centre, University of Delhi* >> >> >> >> > > > -- > Pawan Pal > *B.Tech (Information Technology and Mathematical Innovation)* > *Cluster Innovation Centre, University of Delhi* > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > _______________________________________________ hawkular-dev mailing list hawkular-dev at lists.jboss.org https://lists.jboss.org/mailman/listinfo/hawkular-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170221/6102d5e8/attachment.html From mwringe at redhat.com Tue Feb 21 14:28:34 2017 From: mwringe at redhat.com (Matt Wringe) Date: Tue, 21 Feb 2017 14:28:34 -0500 (EST) Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <3b8ee0c2-4a70-9b72-003c-57894c109e1c@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> <1743615631.7811273.1487593672907.JavaMail.zimbra@redhat.com> <3b8ee0c2-4a70-9b72-003c-57894c109e1c@redhat.com> Message-ID: <42971047.36571164.1487705314568.JavaMail.zimbra@redhat.com> Also, just a note that deploying metrics via the deployer pod is deprecated after OpenShift 3.4 The current way to deploy metrics to OpenShift is it use Ansible. The docs for how to use this should be available here https://docs.openshift.org/latest/install_config/cluster_metrics.html but it looks like the docs have not been updated yet (the PR has been merged: https://github.com/openshift/openshift-docs/pull/3529) For the time being the deployer still needs to stay around for backwards compatibility reasons and to support 'oc cluster up', but you should be moving away from this for development purposes. ----- Original Message ----- > From: "Juraci Paix?o Kr?hling" > To: hawkular-dev at lists.jboss.org > Sent: Monday, 20 February, 2017 8:16:00 AM > Subject: Re: [Hawkular-dev] openshift - using cluster up but building from source > > On 02/20/2017 01:27 PM, John Mazzitelli wrote: > > How do you build the images from source so they are put in your local > > Docker registry? I couldn't see anything in their Makefiles of origin and > > origin-metrics that do this. > > Take a look at `hack/build-images.sh`. For the record: as I had all the > tools set already from a previous build, I just re-built and ran with > "sudo ./openshift start" from `_output/local/bin/linux/amd64` > > - Juca. 
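Pulling the pieces of this thread together, the local build-and-run loop is roughly the following (a sketch only -- the checkout location and the final `oc get nodes` smoke test are assumptions, and `hack/build-images.sh` expects the Origin binaries to have been built beforehand):

$ cd $GOPATH/src/github.com/openshift/origin
$ hack/build-images.sh        # puts the freshly built Origin images into the local Docker daemon
$ cd _output/local/bin/linux/amd64
$ sudo ./openshift start      # run the binary you just built
# in a second terminal:
$ export KUBECONFIG=$GOPATH/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.config/master/admin.kubeconfig
$ oc get nodes                # oc now authenticates via the generated admin kubeconfig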
> _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mazz at redhat.com Tue Feb 21 17:25:22 2017 From: mazz at redhat.com (John Mazzitelli) Date: Tue, 21 Feb 2017 17:25:22 -0500 (EST) Subject: [Hawkular-dev] more openshift issues Message-ID: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> If I start openshift with "sudo ./openshift start" and then try to log in like this: oc login -u system:admin What would cause this: Authentication required for https://192.168.1.15:8443 (openshift) Username: system:admin Password: error: username system:admin is invalid for basic auth When I start with "oc cluster up" I do not get asked for a password and it "just works" From vnguyen at redhat.com Tue Feb 21 18:55:36 2017 From: vnguyen at redhat.com (Viet Nguyen) Date: Tue, 21 Feb 2017 18:55:36 -0500 (EST) Subject: [Hawkular-dev] more openshift issues In-Reply-To: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> References: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> Message-ID: <551149505.24695954.1487721336269.JavaMail.zimbra@redhat.com> I had similar issue but with ansible-playbook-installed cluster. The oc client uses credentials in config.yaml in /etc/origin/master and the error usually means the user running the script doesn't have permissions to access the yaml file. hope it helps. Viet ----- Original Message ----- From: "John Mazzitelli" To: "Discussions around Hawkular development" Sent: Tuesday, February 21, 2017 2:25:22 PM Subject: [Hawkular-dev] more openshift issues If I start openshift with "sudo ./openshift start" and then try to log in like this: oc login -u system:admin What would cause this: Authentication required for https://192.168.1.15:8443 (openshift) Username: system:admin Password: error: username system:admin is invalid for basic auth When I start with "oc cluster up" I do not get asked for a password and it "just works" _______________________________________________ hawkular-dev mailing list hawkular-dev at lists.jboss.org https://lists.jboss.org/mailman/listinfo/hawkular-dev From mazz at redhat.com Wed Feb 22 00:16:16 2017 From: mazz at redhat.com (John Mazzitelli) Date: Wed, 22 Feb 2017 00:16:16 -0500 (EST) Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> Message-ID: <1916494421.9180507.1487740576629.JavaMail.zimbra@redhat.com> > $ oc process -f metrics.yaml \ > -v IMAGE_PREFIX="jpkroehling/origin-" \ > -v IMAGE_VERSION="dev" \ > -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com \ > -v USE_PERSISTENT_STORAGE=false \ > -v CASSANDRA_NODES=2 | oc create -n openshift-infra -f - OK, has anyone tried this on the latest code? (that is, 1.5 alpha 3 OR master). I have a feeling in the latest code this "metrics.yaml" isn't all that you need now. There is more that is needed. 
Because if you only create the entities within metrics.yaml, you get errors from this oc create: "Error from server (Forbidden): pods "metrics-deployer-" is forbidden: service account openshift-infra/metrics-deployer was not found, retry after the service account is created" OK, so I then try the following: before this "oc process -f metrics.yaml" I run this extra command: $ oc create -n openshift-infra -f metrics-deployer-setup.yaml Which all results in: serviceaccount "metrics-deployer" created pod "metrics-deployer-3k89r" created Sounds good right? Well, go to the UI Console and see that metrics deployer has an error: MountVolume.SetUp failed for volume "kubernetes.io/secret/2a843217-f8bd-11e6-95ab-54ee7549ae45-secret" (spec.Name: "secret") pod "2a843217-f8bd-11e6-95ab-54ee7549ae45" (UID: "2a843217-f8bd-11e6-95ab-54ee7549ae45") with: secrets "metrics-deployer" not found From mazz at redhat.com Wed Feb 22 00:30:00 2017 From: mazz at redhat.com (John Mazzitelli) Date: Wed, 22 Feb 2017 00:30:00 -0500 (EST) Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <1916494421.9180507.1487740576629.JavaMail.zimbra@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> <1916494421.9180507.1487740576629.JavaMail.zimbra@redhat.com> Message-ID: <1954198388.9181373.1487741400663.JavaMail.zimbra@redhat.com> OK, I got it. In case anyone cares (yes, I found these in the docs - go figure.) oc create -n openshift-infra -f metrics-deployer-setup.yaml oc adm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer -n openshift-infra oc secrets new metrics-deployer nothing=/dev/null -n openshift-infra oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra oc adm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster -n openshift-infra NOW you can oc create metrics.yaml :) At least now I see all my "latest" containers starting up. I'll see what else is broke after it all starts :) ----- Original Message ----- > > $ oc process -f metrics.yaml \ > > -v IMAGE_PREFIX="jpkroehling/origin-" \ > > -v IMAGE_VERSION="dev" \ > > -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com \ > > -v USE_PERSISTENT_STORAGE=false \ > > -v CASSANDRA_NODES=2 | oc create -n openshift-infra -f - > > OK, has anyone tried this on the latest code? (that is, 1.5 alpha 3 OR > master). > > I have a feeling in the latest code this "metrics.yaml" isn't all that you > need now. There is more that is needed. Because if you only create the > entities within metrics.yaml, you get errors from this oc create: > > "Error from server (Forbidden): pods "metrics-deployer-" is forbidden: > service account openshift-infra/metrics-deployer was not found, retry > after the service account is created" > > OK, so I then try the following: before this "oc process -f metrics.yaml" I > run this extra command: > > $ oc create -n openshift-infra -f metrics-deployer-setup.yaml > > Which all results in: > > serviceaccount "metrics-deployer" created > pod "metrics-deployer-3k89r" created > > Sounds good right? 
Well, go to the UI Console and see that metrics deployer > has an error: > > MountVolume.SetUp failed for volume > "kubernetes.io/secret/2a843217-f8bd-11e6-95ab-54ee7549ae45-secret" > (spec.Name: "secret") pod "2a843217-f8bd-11e6-95ab-54ee7549ae45" (UID: > "2a843217-f8bd-11e6-95ab-54ee7549ae45") with: secrets "metrics-deployer" > not found > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From jpkroehling at redhat.com Wed Feb 22 04:37:30 2017 From: jpkroehling at redhat.com (=?UTF-8?Q?Juraci_Paix=c3=a3o_Kr=c3=b6hling?=) Date: Wed, 22 Feb 2017 10:37:30 +0100 Subject: [Hawkular-dev] more openshift issues In-Reply-To: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> References: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> Message-ID: <1ce4be40-a0ba-7d79-6feb-c7be8d3251b4@redhat.com> On 02/21/2017 11:25 PM, John Mazzitelli wrote: > If I start openshift with "sudo ./openshift start" and then try to log in like this: > > oc login -u system:admin > > What would cause this: > > Authentication required for https://192.168.1.15:8443 (openshift) > Username: system:admin > Password: > error: username system:admin is invalid for basic auth > > When I start with "oc cluster up" I do not get asked for a password and it "just works" It seems I forgot to mention that. You need to use a "kubeconfig", created during the first boot. Most of the `oc` commands accept a config as parameter, but the easiest is to export a KUBECONFIG pointing to the admin file. I have this on my ~/.bashrc : export KUBECONFIG="/home/jpkroehling/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.config/master/admin.kubeconfig" - Juca. From mazz at redhat.com Wed Feb 22 08:00:27 2017 From: mazz at redhat.com (John Mazzitelli) Date: Wed, 22 Feb 2017 08:00:27 -0500 (EST) Subject: [Hawkular-dev] more openshift issues In-Reply-To: <1ce4be40-a0ba-7d79-6feb-c7be8d3251b4@redhat.com> References: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> <1ce4be40-a0ba-7d79-6feb-c7be8d3251b4@redhat.com> Message-ID: <12474656.9349565.1487768427214.JavaMail.zimbra@redhat.com> Here's what I have so far: https://github.com/jmazzitelli/hawkular-openshift-agent/blob/hack-os-start/hack/start-openshift.sh which uses: https://github.com/jmazzitelli/hawkular-openshift-agent/blob/hack-os-start/hack/env-openshift.sh All my pods are blue. But I had to run it overnight - metrics took FOREVER for its ready probe to show "ready". But they are all blue now - says everything is running. But I have no idea if things are working because I have no way to interact with my pods because: I have no valid routes working, I can't look at pod logs, and I can't use a pod terminal to probe around. And people wonder why I don't upgrade things. Been days now, and I still can't get the latest OpenShift to work like it did before. :/ Here's the problems: 1. I can no longer get to the Hawkular Metrics URL from my browser or HawkFX: $ curl https://hawkular-metrics.example.com/ curl: (6) Could not resolve host: hawkular-metrics.example.com UI Console has a message about this: "The route is not accepting traffic yet because it has not been accepted by a router." 2. The same thing with my HOSA route - it doesn't work either with the same warning about not being accepted by a router. But I have a route defined - I can see it in my UI. 
Route named "hawkular-openshift-agent" has Hostname "http://hawkular-openshift-agent-default.router.default.svc.cluster.local/status " and Route To service "hawkular-openshift-agent". That service is defined, too. It think the way routes are defined might have changed from 1.5-alpha2 to 1.5-alpha3. 3. I can't see any Logs for any pod. For example, I used to be able to see logs for HOSA but now all the Logs tab says, "The logs are no longer available or could not be loaded." 4. I can no longer use the Terminal tab in the UI Console. All pod terminals (including HOSA's) page in the UI say "Could not connect to the container. Do you have sufficient privileges?" Note that my "admin" user has "cluster-admin" role via "oc adm policy add-cluster-role-to-user cluster-admin admin" so it should see everything (at least, that's how it worked in the older version). ----- Original Message ----- > On 02/21/2017 11:25 PM, John Mazzitelli wrote: > > If I start openshift with "sudo ./openshift start" and then try to log in > > like this: > > > > oc login -u system:admin > > > > What would cause this: > > > > Authentication required for https://192.168.1.15:8443 (openshift) > > Username: system:admin > > Password: > > error: username system:admin is invalid for basic auth > > > > When I start with "oc cluster up" I do not get asked for a password and it > > "just works" > > It seems I forgot to mention that. You need to use a "kubeconfig", > created during the first boot. Most of the `oc` commands accept a config > as parameter, but the easiest is to export a KUBECONFIG pointing to the > admin file. > > I have this on my ~/.bashrc : > > export > KUBECONFIG="/home/jpkroehling/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.config/master/admin.kubeconfig" > > - Juca. > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mazz at redhat.com Wed Feb 22 08:19:10 2017 From: mazz at redhat.com (John Mazzitelli) Date: Wed, 22 Feb 2017 08:19:10 -0500 (EST) Subject: [Hawkular-dev] more openshift issues In-Reply-To: <12474656.9349565.1487768427214.JavaMail.zimbra@redhat.com> References: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> <1ce4be40-a0ba-7d79-6feb-c7be8d3251b4@redhat.com> <12474656.9349565.1487768427214.JavaMail.zimbra@redhat.com> Message-ID: <1098354744.9397099.1487769550279.JavaMail.zimbra@redhat.com> Oh, I just restarted OpenShift, and now both my Cassandra pods are orange - readiness probe is failing. This stuff is just AWESOME! ----- Original Message ----- > Here's what I have so far: > > https://github.com/jmazzitelli/hawkular-openshift-agent/blob/hack-os-start/hack/start-openshift.sh > > which uses: > > https://github.com/jmazzitelli/hawkular-openshift-agent/blob/hack-os-start/hack/env-openshift.sh > > All my pods are blue. But I had to run it overnight - metrics took FOREVER > for its ready probe to show "ready". But they are all blue now - says > everything is running. > > But I have no idea if things are working because I have no way to interact > with my pods because: I have no valid routes working, I can't look at pod > logs, and I can't use a pod terminal to probe around. > > And people wonder why I don't upgrade things. Been days now, and I still > can't get the latest OpenShift to work like it did before. :/ > > Here's the problems: > > 1. 
I can no longer get to the Hawkular Metrics URL from my browser or HawkFX: > > $ curl https://hawkular-metrics.example.com/ > curl: (6) Could not resolve host: hawkular-metrics.example.com > > UI Console has a message about this: "The route is not accepting traffic yet > because it has not been accepted by a router." > > 2. The same thing with my HOSA route - it doesn't work either with the same > warning about not being accepted by a router. But I have a route defined - I > can see it in my UI. Route named "hawkular-openshift-agent" has Hostname > "http://hawkular-openshift-agent-default.router.default.svc.cluster.local/status > " and Route To service "hawkular-openshift-agent". That service is defined, > too. > > It think the way routes are defined might have changed from 1.5-alpha2 to > 1.5-alpha3. > > 3. I can't see any Logs for any pod. For example, I used to be able to see > logs for HOSA but now all the Logs tab says, "The logs are no longer > available or could not be loaded." > > 4. I can no longer use the Terminal tab in the UI Console. All pod terminals > (including HOSA's) page in the UI say "Could not connect to the container. > Do you have sufficient privileges?" > > Note that my "admin" user has "cluster-admin" role via "oc adm policy > add-cluster-role-to-user cluster-admin admin" so it should see everything > (at least, that's how it worked in the older version). > > ----- Original Message ----- > > On 02/21/2017 11:25 PM, John Mazzitelli wrote: > > > If I start openshift with "sudo ./openshift start" and then try to log in > > > like this: > > > > > > oc login -u system:admin > > > > > > What would cause this: > > > > > > Authentication required for https://192.168.1.15:8443 (openshift) > > > Username: system:admin > > > Password: > > > error: username system:admin is invalid for basic auth > > > > > > When I start with "oc cluster up" I do not get asked for a password and > > > it > > > "just works" > > > > It seems I forgot to mention that. You need to use a "kubeconfig", > > created during the first boot. Most of the `oc` commands accept a config > > as parameter, but the easiest is to export a KUBECONFIG pointing to the > > admin file. > > > > I have this on my ~/.bashrc : > > > > export > > KUBECONFIG="/home/jpkroehling/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.config/master/admin.kubeconfig" > > > > - Juca. 
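For completeness, the same admin config can also be handed to individual commands instead of exporting the variable, e.g. when run from the Origin checkout root (a sketch; whether your oc build spells the flag --config or --kubeconfig may vary):

$ oc --config=_output/local/bin/linux/amd64/openshift.local.config/master/admin.kubeconfig \
     get pods --all-namespaces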
> > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mwringe at redhat.com Wed Feb 22 08:58:22 2017 From: mwringe at redhat.com (Matt Wringe) Date: Wed, 22 Feb 2017 08:58:22 -0500 (EST) Subject: [Hawkular-dev] openshift - using cluster up but building from source In-Reply-To: <1954198388.9181373.1487741400663.JavaMail.zimbra@redhat.com> References: <787504021.7401912.1487479511361.JavaMail.zimbra@redhat.com> <5aecc16c-fbdf-cd93-9b64-f3275cf2ae35@redhat.com> <1916494421.9180507.1487740576629.JavaMail.zimbra@redhat.com> <1954198388.9181373.1487741400663.JavaMail.zimbra@redhat.com> Message-ID: <1922023130.36872178.1487771902934.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "John Mazzitelli" > To: "Discussions around Hawkular development" > Sent: Wednesday, 22 February, 2017 12:30:00 AM > Subject: Re: [Hawkular-dev] openshift - using cluster up but building from source > > OK, I got it. In case anyone cares (yes, I found these in the docs - go > figure.) > > oc create -n openshift-infra -f metrics-deployer-setup.yaml > oc adm policy add-role-to-user edit > system:serviceaccount:openshift-infra:metrics-deployer -n openshift-infra > oc secrets new metrics-deployer nothing=/dev/null -n openshift-infra > oc adm policy add-role-to-user view > system:serviceaccount:openshift-infra:hawkular -n openshift-infra > oc adm policy add-cluster-role-to-user cluster-reader > system:serviceaccount:openshift-infra:heapster -n openshift-infra > > NOW you can oc create metrics.yaml :) > > At least now I see all my "latest" containers starting up. I'll see what else > is broke after it all starts :) Just a reminder, this is using the deployer which is deprecated. Please start using openshift-ansible to deploy metrics. > ----- Original Message ----- > > > $ oc process -f metrics.yaml \ > > > -v IMAGE_PREFIX="jpkroehling/origin-" \ > > > -v IMAGE_VERSION="dev" \ > > > -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com \ > > > -v USE_PERSISTENT_STORAGE=false \ > > > -v CASSANDRA_NODES=2 | oc create -n openshift-infra -f - > > > > OK, has anyone tried this on the latest code? (that is, 1.5 alpha 3 OR > > master). > > > > I have a feeling in the latest code this "metrics.yaml" isn't all that you > > need now. There is more that is needed. Because if you only create the > > entities within metrics.yaml, you get errors from this oc create: > > > > "Error from server (Forbidden): pods "metrics-deployer-" is forbidden: > > service account openshift-infra/metrics-deployer was not found, retry > > after the service account is created" > > > > OK, so I then try the following: before this "oc process -f metrics.yaml" I > > run this extra command: > > > > $ oc create -n openshift-infra -f metrics-deployer-setup.yaml > > > > Which all results in: > > > > serviceaccount "metrics-deployer" created > > pod "metrics-deployer-3k89r" created > > > > Sounds good right? 
Well, go to the UI Console and see that metrics deployer > > has an error: > > > > MountVolume.SetUp failed for volume > > "kubernetes.io/secret/2a843217-f8bd-11e6-95ab-54ee7549ae45-secret" > > (spec.Name: "secret") pod "2a843217-f8bd-11e6-95ab-54ee7549ae45" (UID: > > "2a843217-f8bd-11e6-95ab-54ee7549ae45") with: secrets "metrics-deployer" > > not found > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mwringe at redhat.com Wed Feb 22 09:03:22 2017 From: mwringe at redhat.com (Matt Wringe) Date: Wed, 22 Feb 2017 09:03:22 -0500 (EST) Subject: [Hawkular-dev] more openshift issues In-Reply-To: <1098354744.9397099.1487769550279.JavaMail.zimbra@redhat.com> References: <1935619244.9131150.1487715922804.JavaMail.zimbra@redhat.com> <1ce4be40-a0ba-7d79-6feb-c7be8d3251b4@redhat.com> <12474656.9349565.1487768427214.JavaMail.zimbra@redhat.com> <1098354744.9397099.1487769550279.JavaMail.zimbra@redhat.com> Message-ID: <1695499719.36874008.1487772202406.JavaMail.zimbra@redhat.com> It can be frustrating to run into issues. To me it sounds to me like you have a problem in your install. I am using the latest origin (built from master) and I am not running into any issues. It might be more productive to jump into #openshift-dev in freenode, or use the OpenShift mailing lists. - Matt ----- Original Message ----- > From: "John Mazzitelli" > To: "Discussions around Hawkular development" > Sent: Wednesday, 22 February, 2017 8:19:10 AM > Subject: Re: [Hawkular-dev] more openshift issues > > Oh, I just restarted OpenShift, and now both my Cassandra pods are orange - > readiness probe is failing. > > This stuff is just AWESOME! > > ----- Original Message ----- > > Here's what I have so far: > > > > https://github.com/jmazzitelli/hawkular-openshift-agent/blob/hack-os-start/hack/start-openshift.sh > > > > which uses: > > > > https://github.com/jmazzitelli/hawkular-openshift-agent/blob/hack-os-start/hack/env-openshift.sh > > > > All my pods are blue. But I had to run it overnight - metrics took FOREVER > > for its ready probe to show "ready". But they are all blue now - says > > everything is running. > > > > But I have no idea if things are working because I have no way to interact > > with my pods because: I have no valid routes working, I can't look at pod > > logs, and I can't use a pod terminal to probe around. > > > > And people wonder why I don't upgrade things. Been days now, and I still > > can't get the latest OpenShift to work like it did before. :/ > > > > Here's the problems: > > > > 1. I can no longer get to the Hawkular Metrics URL from my browser or > > HawkFX: > > > > $ curl https://hawkular-metrics.example.com/ > > curl: (6) Could not resolve host: hawkular-metrics.example.com > > > > UI Console has a message about this: "The route is not accepting traffic > > yet > > because it has not been accepted by a router." > > > > 2. The same thing with my HOSA route - it doesn't work either with the same > > warning about not being accepted by a router. But I have a route defined - > > I > > can see it in my UI. Route named "hawkular-openshift-agent" has Hostname > > "http://hawkular-openshift-agent-default.router.default.svc.cluster.local/status > > " and Route To service "hawkular-openshift-agent". 
That service is defined, > > too. > > > > It think the way routes are defined might have changed from 1.5-alpha2 to > > 1.5-alpha3. > > > > 3. I can't see any Logs for any pod. For example, I used to be able to see > > logs for HOSA but now all the Logs tab says, "The logs are no longer > > available or could not be loaded." > > > > 4. I can no longer use the Terminal tab in the UI Console. All pod > > terminals > > (including HOSA's) page in the UI say "Could not connect to the container. > > Do you have sufficient privileges?" > > > > Note that my "admin" user has "cluster-admin" role via "oc adm policy > > add-cluster-role-to-user cluster-admin admin" so it should see everything > > (at least, that's how it worked in the older version). > > > > ----- Original Message ----- > > > On 02/21/2017 11:25 PM, John Mazzitelli wrote: > > > > If I start openshift with "sudo ./openshift start" and then try to log > > > > in > > > > like this: > > > > > > > > oc login -u system:admin > > > > > > > > What would cause this: > > > > > > > > Authentication required for https://192.168.1.15:8443 (openshift) > > > > Username: system:admin > > > > Password: > > > > error: username system:admin is invalid for basic auth > > > > > > > > When I start with "oc cluster up" I do not get asked for a password and > > > > it > > > > "just works" > > > > > > It seems I forgot to mention that. You need to use a "kubeconfig", > > > created during the first boot. Most of the `oc` commands accept a config > > > as parameter, but the easiest is to export a KUBECONFIG pointing to the > > > admin file. > > > > > > I have this on my ~/.bashrc : > > > > > > export > > > KUBECONFIG="/home/jpkroehling/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.config/master/admin.kubeconfig" > > > > > > - Juca. > > > _______________________________________________ > > > hawkular-dev mailing list > > > hawkular-dev at lists.jboss.org > > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From theute at redhat.com Wed Feb 22 12:35:14 2017 From: theute at redhat.com (Thomas Heute) Date: Wed, 22 Feb 2017 18:35:14 +0100 Subject: [Hawkular-dev] APM - lost requests ? Message-ID: How do we explain the drops in requests in the following example: http://www.hawkular.org/img/blog/2017/2017-02-13-teaser.png Shouldn't it always be the same load ? Do we know if it's an it's an issue in APM, or is it that the load balancing "lose" some requests as we scale down a server ? or else ? Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170222/9f0b72d9/attachment.html From jpkroehling at redhat.com Wed Feb 22 12:46:17 2017 From: jpkroehling at redhat.com (=?UTF-8?Q?Juraci_Paix=c3=a3o_Kr=c3=b6hling?=) Date: Wed, 22 Feb 2017 18:46:17 +0100 Subject: [Hawkular-dev] APM - lost requests ? 
In-Reply-To: References: Message-ID: <08045738-05ec-fcb2-9585-2d91d825db0d@redhat.com> On 02/22/2017 06:35 PM, Thomas Heute wrote: > How do we explain the drops in requests in the following example: > http://www.hawkular.org/img/blog/2017/2017-02-13-teaser.png > > Shouldn't it always be the same load ? Yes. > Do we know if it's an it's an issue in APM, or is it that the load > balancing "lose" some requests as we scale down a server ? or else ? Not yet, but I have it on my radar. I think there's something in our example that is not perfectly coded, as I've seen some Vert.x warnings like "timed out waiting for answer from ...". - Juca. From theute at redhat.com Wed Feb 22 13:37:20 2017 From: theute at redhat.com (Thomas Heute) Date: Wed, 22 Feb 2017 19:37:20 +0100 Subject: [Hawkular-dev] APM - lost requests ? In-Reply-To: <08045738-05ec-fcb2-9585-2d91d825db0d@redhat.com> References: <08045738-05ec-fcb2-9585-2d91d825db0d@redhat.com> Message-ID: On Wed, Feb 22, 2017 at 6:46 PM, Juraci Paix?o Kr?hling < jpkroehling at redhat.com> wrote: > On 02/22/2017 06:35 PM, Thomas Heute wrote: > > How do we explain the drops in requests in the following example: > > http://www.hawkular.org/img/blog/2017/2017-02-13-teaser.png > > > > Shouldn't it always be the same load ? > > Yes. > > > Do we know if it's an it's an issue in APM, or is it that the load > > balancing "lose" some requests as we scale down a server ? or else ? > > Not yet, but I have it on my radar. I think there's something in our > example that is not perfectly coded, as I've seen some Vert.x warnings > like "timed out waiting for answer from ...". > Ok, thanks ! > > - Juca. > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170222/5a0f3ed9/attachment.html From gbrown at redhat.com Thu Feb 23 02:55:10 2017 From: gbrown at redhat.com (Gary Brown) Date: Thu, 23 Feb 2017 02:55:10 -0500 (EST) Subject: [Hawkular-dev] APM - lost requests ? In-Reply-To: <08045738-05ec-fcb2-9585-2d91d825db0d@redhat.com> References: <08045738-05ec-fcb2-9585-2d91d825db0d@redhat.com> Message-ID: <615326552.62706247.1487836510306.JavaMail.zimbra@redhat.com> I don't think it can be APM, as it wouldn't be impacted by the scaling of the app - i.e. if it was APM we would see drops in txns at other points where the scaling wasn't occurring. Regards Gary ----- Original Message ----- > On 02/22/2017 06:35 PM, Thomas Heute wrote: > > How do we explain the drops in requests in the following example: > > http://www.hawkular.org/img/blog/2017/2017-02-13-teaser.png > > > > Shouldn't it always be the same load ? > > Yes. > > > Do we know if it's an it's an issue in APM, or is it that the load > > balancing "lose" some requests as we scale down a server ? or else ? > > Not yet, but I have it on my radar. I think there's something in our > example that is not perfectly coded, as I've seen some Vert.x warnings > like "timed out waiting for answer from ...". > > - Juca. 
> _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From jpkroehling at redhat.com Thu Feb 23 05:02:32 2017 From: jpkroehling at redhat.com (=?UTF-8?Q?Juraci_Paix=c3=a3o_Kr=c3=b6hling?=) Date: Thu, 23 Feb 2017 11:02:32 +0100 Subject: [Hawkular-dev] APM - lost requests ? In-Reply-To: <615326552.62706247.1487836510306.JavaMail.zimbra@redhat.com> References: <08045738-05ec-fcb2-9585-2d91d825db0d@redhat.com> <615326552.62706247.1487836510306.JavaMail.zimbra@redhat.com> Message-ID: <9c3cbf4d-4e91-86d7-2227-2aadfd9de0b4@redhat.com> On 02/23/2017 08:55 AM, Gary Brown wrote: > I don't think it can be APM, as it wouldn't be impacted by the scaling of the app - i.e. if it was APM we would see drops in txns at other points where the scaling wasn't occurring. Right, I think it's a problem in our code *for the example*. APM is correctly doing its job, telling us there's a problem in our example :) - Juca. From jshaughn at redhat.com Fri Feb 24 08:04:22 2017 From: jshaughn at redhat.com (Jay Shaughnessy) Date: Fri, 24 Feb 2017 08:04:22 -0500 Subject: [Hawkular-dev] Cross-Tenant endpoints in Alerting on OS In-Reply-To: <1655878501.37837405.1487891140663.JavaMail.zimbra@redhat.com> References: <1655878501.37837405.1487891140663.JavaMail.zimbra@redhat.com> Message-ID: On 2/23/2017 6:05 PM, Matt Wringe wrote: > Is there any reason why this being sent in private emails and not to a mailing list? Matt, Not really, sending to dev-list for anyone interested in the discussion... > ----- Original Message ----- >> There was an IRC discussion today about $SUBJECT. Here is a summary of >> a conversation Matt and I had to drill down into whether there was a >> cross-tenant security concern with the Alerting API in OS. In short, >> the answer seems to be no. Alerting (1.4+) offers two endpoints for >> fetching cross-tenant: /alerts/admin/alerts and /alerts/admin/events. >> Note that the 'admin' is just in the path, and was chosen just to group >> what we deemed were admin-level endpoints, the first two of which are >> these cross-tenant fetches. The 'admin' does not mean anything else in >> this context, it does not reflect a special user or tenant. The way >> these endpoints work is that that they accept a Hawkular-Tenant HTTP >> header that can be a comma-separated-list of tenantIds. As with any of >> the alerting endpoints. Alerting does not perform any security in the >> request handling. But, in OS the HAM deployments both have the OS >> security filtering in place. That filtering does two things, for a >> cluster-admin user it's basically a pass-thru, the CSL Hawkular-Tenant >> header is passed on and the endpoints work. For all other users the >> Hawkular-Tenant header is validated. Because each project name is a >> tenant name, the value must match a project name. As such, the >> validation fails if a CSL is supplied. This is decent behavior for now >> as it prevents any undesired access. Note that as a corner-case, these >> endpoints will work fine if the header just supplies a single tenant, in >> which case they are basically the same as the typical single-tenant >> fetch endpoints. > What has happened is now Alerts is not considering the Hawkular-tenant header to contain just a string, but a comma separated lists of strings. > > eg "Hawkular-tenant: projectA,projectB" Note, not in general, comma-separated-lists handled only for the two cross-tenant endpoints mentioned above. 
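To make the mechanics concrete, such a cross-tenant fetch looks roughly like this on the wire (the hostname, the /hawkular/alerts context root and the bearer token are placeholders/assumptions; only the Hawkular-Tenant CSL header and the admin endpoints themselves come from the description above):

$ curl -H "Authorization: Bearer $OS_TOKEN" \
       -H "Hawkular-Tenant: projectA,projectB" \
       https://hawkular-metrics.example.com/hawkular/alerts/admin/alerts

The single-tenant corner case mentioned above is just the same call with one project name in the header.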
> The OpenShift filter still considers this to be a string, so it will check with OpenShift if the user has permission to access the project named with a string value of "projectA,projectB". Since a project cannot have a ',' within its name, this check will always fail and return an access denied error. > > If the user is a cluster level user they are given access to everything, even impossibly named projects. So a cluster level user will happen to be able to use the current setup just due to how this works. > > So there doesn't appear to be any security issue that we need to deal with immediately, but we do probably want to handle this properly in the future. It might not be too difficult to add support to the tenant to consider a csl. > >> I'm not totally familiar with the Metrics approach to cross-tenant >> handling but going forward we (Metrics and Alerting) should probably >> look for some consistency, if possible. Moreover, any solution should >> reflect what best serves OS. The idea of a CSL for the header is fairly >> simple and flexible. It may be something to consider, for the OS filter >> it would mean validating that the bearer has access to each of the >> individual tenants before forwarding the request. > I don't recall any meetings about adding multitenancy to Metrics. From what I recall, there is no plans at all to introduce multitenancy at all for metrics. > > If I was aware of this discussion when this was brought up for alerts, I would have probably objected to the endpoint being called 'admin' since I don't think that reflects what the true purpose of this is suppose to be. Its not really an admin endpoint, but an endpoint for cross-tenancy. I could have access to projectA and projectB, but not be an 'admin' > > If we are making changes like this which affect security, I would really like to be notified so that I can make sure our security filters will function properly. Even if I am in the meeting when its being discussed it would be good to ping me on the PR with the actual implementation. Of course. This stuff went in in mid November and at that time we (in alerting) were really just getting settled with the initial integration into metrics for OS. Going forward I think we have a better idea of what is relevant to OS and can more easily flag items of import. From hrupp at redhat.com Mon Feb 27 06:02:33 2017 From: hrupp at redhat.com (Heiko W.Rupp) Date: Mon, 27 Feb 2017 12:02:33 +0100 Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift Message-ID: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> Hi, right now Hawkular metrics and Hawkular APM are going relatively separate ways. This is in part due to the backend choice, but probably also for other reasons. I am proposing that we try to get the two closer together because at the end neither tracing data alone, not classic monitoring data can answer all the questions like: APM - why is my service XY slow (my be overload of underlying CPU) - how much disk will my service need in two years - how much network usage did my service have yesterday Classic montoring - which service will fail if I pull the plug here - what are customers buying - why is my service slow (may be come from a dependency) I am proposing that we integrate the two over the UI - in the first scenario here the key driver is the APM UI with its trace diagrams (red boxes). A klick on such a box will then show related metrics from the classic monitoring. 
On the level of the individual pod, both APM and Classic 'instrumentations' are present. For JVM-based apps this is on one side the APM agent and/or APM instrumentation ("OT-instrumentation") (*a) On the other side the jolokia agent/agent bond (*b) ![](cid:60528B4C-9911-4942-9091-D66129836840 at redhat.com "hosa-apm1.png") In this first scenario, APM and classic still have separate agents and connections to the backends and different backend storage. The 2nd scenario, assumes that it is possible to use only one agent binary that does both APM and classic metric export. For classic metrics, Hosa will poll it with P8s metrics. And on top APM trace data will also be made available for grab by Hosa, which will then forward them to the APM server. ![](cid:C4FC8BD0-2735-4CF5-9EEA-30A5B176909B at redhat.com "hosa-apm2.png") Thoughts? Heiko *a) I propose to always deploy the APM agent to get a quick and easy coverage of standard scenarios, so that the user only needs explicit instrumentation to increase granularity and/or to cover cases the agent can't cover. Also "manual" instrumentation should be able to use the agent's connection to talk to the APM server. *b) I think it would make sense to always use the Prometheus protocol (and Hosa may learn how to use the more efficient binary protocol) as Jolokia/http is JVM/Jmx specific, while P8s exporters also exist for other environments like Node or Ruby -- Reg. Adresse: Red Hat GmbH, Technopark II, Haus C, Werner-von-Siemens-Ring 14, D-85630 Grasbrunn Handelsregister: Amtsgericht M?nchen HRB 153243 Gesch?ftsf?hrer: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170227/e6c47959/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: hosa-apm1.png Type: image/png Size: 59889 bytes Desc: not available Url : http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170227/e6c47959/attachment-0002.png -------------- next part -------------- A non-text attachment was scrubbed... Name: hosa-apm2.png Type: image/png Size: 56574 bytes Desc: not available Url : http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170227/e6c47959/attachment-0003.png From hrupp at redhat.com Mon Feb 27 10:53:50 2017 From: hrupp at redhat.com (Heiko W.Rupp) Date: Mon, 27 Feb 2017 16:53:50 +0100 Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift In-Reply-To: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> References: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> Message-ID: <350AC5B7-CE17-47AF-92BD-C062F44D1DEA@redhat.com> I clicked 'send' a bit too quickly :) On 27 Feb 2017, at 12:02, Heiko W.Rupp wrote: > I am proposing that we integrate the two over the UI - in the first > scenario > here the key driver is the APM UI with its trace diagrams (red boxes). Of course the other way of the integration should be possible as well. If for a pod we know it hosts a service that is traced with Hawkular APM, it should be possible to get (to) the APM-view. For the 1st scenario I quickly looked at Grafana, but it looks like one can only deep-link to a dashboard (or perhaps panel in a dashboard), but not by giving the name of an individual metric to graph. 
One could dynamically set up a dashboard on the fly though, which could even be interesting to see all the 'classic metrics' of an APM trace in one Grafana dashboard. From gbrown at redhat.com Mon Feb 27 11:15:37 2017 From: gbrown at redhat.com (Gary Brown) Date: Mon, 27 Feb 2017 11:15:37 -0500 (EST) Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift In-Reply-To: <350AC5B7-CE17-47AF-92BD-C062F44D1DEA@redhat.com> References: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> <350AC5B7-CE17-47AF-92BD-C062F44D1DEA@redhat.com> Message-ID: <503919943.64146644.1488212137350.JavaMail.zimbra@redhat.com> It would be worth experimenting with different ideas, but we should remember that Jaeger is being built by Uber with the aim of making sure it helps their ops team easily identify and focus in on the cause of problems. So I think we need to build upon this work as much as possible. Regards Gary ----- Original Message ----- > I clicked 'send' a bit too quickly :) > > On 27 Feb 2017, at 12:02, Heiko W.Rupp wrote: > > > I am proposing that we integrate the two over the UI - in the first > > scenario > > here the key driver is the APM UI with its trace diagrams (red boxes). > > Of course the other way of the integration should be possible > as well. If for a pod we know it hosts a service that is traced > with Hawkular APM, it should be possible to get (to) the APM-view. > > For the 1st scenario I quickly looked at Grafana, but it looks like > one can only deep-link to a dashboard (or perhaps panel in a > dashboard), but not by giving the name of an individual metric to > graph. One could dynamically set up a dashboard on the fly though, > which could even be interesting to see all the 'classic metrics' of > an APM trace in one Grafana dashboard. > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From mazz at redhat.com Mon Feb 27 11:58:02 2017 From: mazz at redhat.com (John Mazzitelli) Date: Mon, 27 Feb 2017 11:58:02 -0500 (EST) Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift In-Reply-To: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> References: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> Message-ID: <1255502513.329033.1488214682647.JavaMail.zimbra@redhat.com> > *b) I think it would make sense to always use the Prometheus protocol (and > Hosa may learn how to use the more efficient binary protocol) If you are referring to the Prometheus binary exposition format [1] (aka content type of "application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited" as opposed to "text/plain"), HOSA already supports this - it can read any Prometheus endpoint that exposes metric data in either the text or binary format. See the source code [2] and a test [3] that shows this working. 
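For anyone who wants to see the two formats by hand: a Prometheus endpoint switches between them via content negotiation, e.g. (the endpoint URL here is only a placeholder):

$ curl http://localhost:8080/metrics          # text format (the default)
$ curl -H 'Accept: application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited' \
       http://localhost:8080/metrics          # binary protobuf format, the one handled by [2]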
[1] https://prometheus.io/docs/instrumenting/exposition_formats/ [2] https://github.com/hawkular/hawkular-openshift-agent/blob/master/prometheus/prometheus_scraper.go#L69-L78 [3] https://github.com/hawkular/hawkular-openshift-agent/blob/master/prometheus/prometheus_scraper_test.go#L155-L164 From mazz at redhat.com Mon Feb 27 17:21:02 2017 From: mazz at redhat.com (John Mazzitelli) Date: Mon, 27 Feb 2017 17:21:02 -0500 (EST) Subject: [Hawkular-dev] HOSA: creating tags based on pod labels In-Reply-To: <1419600755.399491.1488233691574.JavaMail.zimbra@redhat.com> Message-ID: <2029210085.405925.1488234062032.JavaMail.zimbra@redhat.com> I spoke to Stefan about this last week - wanted to post here to let everyone know about it. Let me know if you see something wrong with this. The PR is here: https://github.com/hawkular/hawkular-openshift-agent/pull/140 Rather than me regurgitate what this new feature is, you can read it in the new README section "Pod Label Tags" found here: https://github.com/jmazzitelli/hawkular-openshift-agent/blob/146618afa5276d916307a051fda59340c33472a1/README.adoc#pod-label-tags In short, you can tag a pod's metrics with that pod's labels to make querying for metric data easier. From neil.okamoto+hawkular at gmail.com Mon Feb 27 18:23:36 2017 From: neil.okamoto+hawkular at gmail.com (neil.okamoto+hawkular at gmail.com) Date: Mon, 27 Feb 2017 15:23:36 -0800 Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift In-Reply-To: <1255502513.329033.1488214682647.JavaMail.zimbra@redhat.com> References: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> <1255502513.329033.1488214682647.JavaMail.zimbra@redhat.com> Message-ID: <4B93AFC4-3450-4568-A5EB-C62C4792B23F@gmail.com> >From the outside looking in, this proposal seems to assume monitoring/metrics are similar enough to APM traces that they can and should converge in a single collection system. Whereas I would have expected the proposal was to combine the in the presentation/UI *after* collection. Am I misunderstanding? Because I would have thought the characteristics of these systems would be unique enough to stay separate, e.g. the traffic rates for metrics vs traces would be different, the ?schema? of what needs to be forwarded and stored is different, and for sure the APM traces need to be correlated based on their trace IDs and otherwise filtered based on somewhat arbitrary tags supplied in the instrumentation. > On Feb 27, 2017, at 8:58 AM, John Mazzitelli wrote: > >> *b) I think it would make sense to always use the Prometheus protocol (and >> Hosa may learn how to use the more efficient binary protocol) > > If you are referring to the Prometheus binary exposition format [1] (aka content type of "application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited" as opposed to "text/plain"), HOSA already supports this - it can read any Prometheus endpoint that exposes metric data in either the text or binary format. > > See the source code [2] and a test [3] that shows this working. 
> > [1] https://prometheus.io/docs/instrumenting/exposition_formats/ > [2] https://github.com/hawkular/hawkular-openshift-agent/blob/master/prometheus/prometheus_scraper.go#L69-L78 > [3] https://github.com/hawkular/hawkular-openshift-agent/blob/master/prometheus/prometheus_scraper_test.go#L155-L164 > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev From gbrown at redhat.com Tue Feb 28 03:24:03 2017 From: gbrown at redhat.com (Gary Brown) Date: Tue, 28 Feb 2017 03:24:03 -0500 (EST) Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift In-Reply-To: <4B93AFC4-3450-4568-A5EB-C62C4792B23F@gmail.com> References: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> <1255502513.329033.1488214682647.JavaMail.zimbra@redhat.com> <4B93AFC4-3450-4568-A5EB-C62C4792B23F@gmail.com> Message-ID: <565161360.64367068.1488270243505.JavaMail.zimbra@redhat.com> Agree - the most important aspect of the proposal is the ability for a user to navigate between the two types of information in the UI. Having a common collection mechanism may simplify the infrastructure, but in this respect the greater benefit would be convergence on Cassandra at the backend - which is something we (APM team) will be investigating. Regards Gary ----- Original Message ----- > From the outside looking in, this proposal seems to assume monitoring/metrics > are similar enough to APM traces that they can and should converge in a > single collection system. Whereas I would have expected the proposal was to > combine the in the presentation/UI *after* collection. Am I > misunderstanding? Because I would have thought the characteristics of these > systems would be unique enough to stay separate, e.g. the traffic rates for > metrics vs traces would be different, the ?schema? of what needs to be > forwarded and stored is different, and for sure the APM traces need to be > correlated based on their trace IDs and otherwise filtered based on somewhat > arbitrary tags supplied in the instrumentation. > > > > > On Feb 27, 2017, at 8:58 AM, John Mazzitelli wrote: > > > >> *b) I think it would make sense to always use the Prometheus protocol (and > >> Hosa may learn how to use the more efficient binary protocol) > > > > If you are referring to the Prometheus binary exposition format [1] (aka > > content type of "application/vnd.google.protobuf; > > proto=io.prometheus.client.MetricFamily; encoding=delimited" as opposed to > > "text/plain"), HOSA already supports this - it can read any Prometheus > > endpoint that exposes metric data in either the text or binary format. > > > > See the source code [2] and a test [3] that shows this working. 
> > > > [1] https://prometheus.io/docs/instrumenting/exposition_formats/ > > [2] > > https://github.com/hawkular/hawkular-openshift-agent/blob/master/prometheus/prometheus_scraper.go#L69-L78 > > [3] > > https://github.com/hawkular/hawkular-openshift-agent/blob/master/prometheus/prometheus_scraper_test.go#L155-L164 > > > > _______________________________________________ > > hawkular-dev mailing list > > hawkular-dev at lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/hawkular-dev > > > _______________________________________________ > hawkular-dev mailing list > hawkular-dev at lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hawkular-dev > From hrupp at redhat.com Tue Feb 28 04:41:47 2017 From: hrupp at redhat.com (Heiko W.Rupp) Date: Tue, 28 Feb 2017 10:41:47 +0100 Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift In-Reply-To: <4B93AFC4-3450-4568-A5EB-C62C4792B23F@gmail.com> References: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> <1255502513.329033.1488214682647.JavaMail.zimbra@redhat.com> <4B93AFC4-3450-4568-A5EB-C62C4792B23F@gmail.com> Message-ID: <8700AADC-8DD9-44A4-AD9E-9E950A9788FB@redhat.com> On 28 Feb 2017, at 0:23, neil.okamoto+hawkular at gmail.com wrote: > From the outside looking in, this proposal seems to assume > monitoring/metrics are similar enough to APM traces that they can and > should converge in a single collection system. Whereas Scenario 2 may be unclear here. I did not try to imply that this may be the same collection system or that they are close enough, but rather the same "forwarding" system. Right now one would e.g. deploy Jolokia + DropWizard Metrics to expose the classic metrics and also the APM agent for the traces. This means double configuration, potentially duplicated code and a lot more places for errors. > I would have expected the proposal was to combine the in the > presentation/UI *after* collection. Am I misunderstanding? No. And this was the main focus of the 1st scenario. > Because I would have thought the characteristics of these systems > would be unique enough to stay separate, e.g. the traffic rates for > metrics vs traces would be different, the ?schema? of what needs > to be forwarded and stored is different, and for sure the APM traces > need to be correlated based on their trace IDs and otherwise filtered > based on somewhat arbitrary tags supplied in the instrumentation. Absolutely. And still when you have a trace like the following ![](cid:6EE38915-A238-4335-8A96-839E1905C916 at redhat.com "Bildschirmfoto 2017-02-28 um 10.38.00.png") one wants to find out why it took 800ms and where the difference on 600ms was spent. Here looking at data from the underlying pod or even node may be very helpful. Which is where the classical monitoring data comes into play. Similarly if you have a service scaled to 3 instances and you get a larger variation in request time, the pod information may help you pinpoint the variations -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170228/66ffcaf2/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Bildschirmfoto 2017-02-28 um 10.38.00.png Type: image/png Size: 15792 bytes Desc: not available Url : http://lists.jboss.org/pipermail/hawkular-dev/attachments/20170228/66ffcaf2/attachment-0001.png From jsanda at redhat.com Tue Feb 28 09:35:46 2017 From: jsanda at redhat.com (John Sanda) Date: Tue, 28 Feb 2017 09:35:46 -0500 Subject: [Hawkular-dev] Proposing closer integration of APM and "Hawkular metrics" on Kubernetes / OpenShift In-Reply-To: <565161360.64367068.1488270243505.JavaMail.zimbra@redhat.com> References: <6CC176AF-5B23-4EAE-8BA4-4660712AA215@redhat.com> <1255502513.329033.1488214682647.JavaMail.zimbra@redhat.com> <4B93AFC4-3450-4568-A5EB-C62C4792B23F@gmail.com> <565161360.64367068.1488270243505.JavaMail.zimbra@redhat.com> Message-ID: <93EBFDED-6A5A-4657-8CB1-5A2EA34B0A64@redhat.com> > On Feb 28, 2017, at 3:24 AM, Gary Brown wrote: > > Agree - the most important aspect of the proposal is the ability for a user to navigate between the two types of information in the UI. > > Having a common collection mechanism may simplify the infrastructure, but in this respect the greater benefit would be convergence on Cassandra at the backend - which is something we (APM team) will be investigating. I had spent some time not too long ago trying to get familiar enough with the APM code base so that I could propose and hopefully do a small PoC with a Cassandra backend. I did not make much progress and got busy with other tasks, but I do think it is very doable. I would love to help out to the extent that I can. Please keep me in the loop as efforts on this start to progress.