October 2016 - hawkular-dev - Jboss List Archives

by John Mazzitelli

There was some concern that the feed ID autogenerated by the agent might not be what a person always wants (e.g. feed ID for WF10 agents will be the unique UUID of the wildfly server). So, if you want to define your own feed ID, you can configure it now. In the agent's standalone.xml <storage-adapter> you can specify a feedId attribute if you want. By default, it isn't specified, so the agent will autogenerate a unique feed ID for itself (this should really be the normal mode of operation, but some people like to complain, so I made it easy to shut them up :-) This is in master. Will be in the next release.

15 hours, 33 minutes

2
1
0 / 0

"Getting Things Done" (David Allen) System: Artificer

by Brett Meyer

FWIW, I've been experimenting with Artificer as a "Getting Things Done" solution, mainly as a bit of dogfooding. However, it fits in surprisingly well. I'll be maintaining a demo that showcases a GTD ontology/classifier system, showing Artificer as not-just-another-artifact-repo, but also a powerful information and reference system. https://developer.jboss.org/en/artificer/blog/2015/03/31/getting-things-d...

5 months, 2 weeks

3
2
0 / 0

Labeling needs ?

by Heiko W.Rupp

Hey, we have labels in Hawkular-metrics right now, but apparently there are use cases that are not yet covered (and I know Matt has more) - Listing keys of tags. Currently one has to know the available keys to be able to list the available values - tag values are currently a comma-separated string and not an array, which may have implications on allowed characters and escaping of separators - Post-tagging of data points (tagging could be provoked by another system that e.g. parses log files and sees an anomaly) The idea behind it is that one should be able to create tags at certain points in time for a single metric or a list of metrics to uniquely identify that point in time for queries (relative performance of two versions of a deployment). What else?

7 years, 8 months

6
9
0 / 0

hawkular metrics go client - need assistance for a couple issues

by John Mazzitelli

I found two bugs in the Hawkular Go Client and I need assistance: 1. https://github.com/hawkular/hawkular-client-go/issues/10 2. https://github.com/hawkular/hawkular-client-go/issues/12 The first one I have a PR written that appears to fix it (link to PR in the issue). I would need someone to peer review it and merge it if it is OK. Frankly, I don't see how it ever worked - but I suspect that Hawkular Metrics changed the REST API recently in this area and that's why it wasn't seen before. The second one I do not have a PR, but in the issue I mention a quick and dirty fix, but I don't think it is the proper way to fix it. I need someone to look at that issue and recommend a proper fix. Just need a way or a GoLang API that encodes a string for a URL *path* (not for a URL query string, which is happening here) due to the different ways spaces are encoded. I'd do it myself, but it's really late, and I have to get up early to hopefully meet up with a turkey that is "dying" to make it on my dinner table. Bu Wa Ha... BU WA HAHAHAHAHAHA! :) That is all. --John Mazz

7 years, 9 months

2
4
0 / 0

Hawkular Inventory Storage and Queries In a Nutshell

by Lukas Krejci

Hi all, given the recent discussion about the backend we should be using for inventory, I've quickly put together a brief explanation of what inventory actually stores and how it queries things. This is expected to raise more questions than to provide answers though and is meant as a starter for the discussion that I hope will happen in the comments section of the google doc [1]. Thanks, [1] https://docs.google.com/document/d/ 1Z1pIQS4EML7_0tuSyHUPX6jzNvaeEqGm4XAiYcRGbvQ/edit?usp=sharing -- Lukas Krejci

7 years, 9 months

2
1
0 / 0

Hawkular Metrics 0.21.0 - Release

by Stefan Negrea

Hello Everybody, I am happy to announce release 0.21.0 of Hawkular Metrics. This release is anchored by performance enhancements and general fixes. Here is a list of major changes: - Cassandra - *Cassandra 3.0.9 is now the supported version of Cassandra.* - Note: this is a rollback from previously supported version of 3.7 due to Cassandra community recommendations for stability and production deployment. Cassandra 3.7 or 3.9 are still compatible but development and testing use 3.0.9 release. - Compression - Fixed an issue that allowed duplicate instances of the compression job to get scheduled on server restart (HWKMETRICS-492 <https://issues.jboss.org/browse/HWKMETRICS-492>) - Improved the fault tolerance of the compression job (HWKMETRICS-494 <https://issues.jboss.org/browse/HWKMETRICS-494>) - Improved the performance of the merge process for reading compressed data (HWKMETRICS-488 <https://issues.jboss.org/browse/HWKMETRICS-488>) - Fixed wrong ordering when fetching compressed and uncompressed data (HWKMETRICS-506 <https://issues.jboss.org/browse/HWKMETRICS-506>) - Compression job provides back pressure (HWKMETRICS-500 <https://issues.jboss.org/browse/HWKMETRICS-500>) - The job scheduler now handles failure scenarios (HWKMETRICS-505 <https://issues.jboss.org/browse/HWKMETRICS-505>) - Cassandra Schema - Fixed an issue where the server can fail to start due to Cassalog being in inconsistent state (HWKMETRICS-495 <https://issues.jboss.org/browse/HWKMETRICS-495>) - gc_grace_second is set to zero for single node clusters ( HWKMETRICS-381 <https://issues.jboss.org/browse/HWKMETRICS-381>) - API Updates - Inserting data points has server side retries to increase the fault tolerance for simple error scenarios (HWKMETRICS-510 <https://issues.jboss.org/browse/HWKMETRICS-510>) - fromEarliest parameter is now supported in all query endpoints ( HWKMETRICS-445 <https://issues.jboss.org/browse/HWKMETRICS-445>) - Configuration - The configuration options did not have a consistent naming scheme. hawkular-metrics, hawkular.metrics, and hawkular prefixes were used along no prefixes. - In this release the naming schema has been standardized to hawkular.metrics.* for metrics specific configuration and hawkular.* for general configuration. - Here is list of all configuration options currently available: ConfigurationKey <https://github.com/hawkular/hawkular-metrics/blob/release/0.21.0/api/metr...> - For more details: HWKMETRICS-508 <https://issues.jboss.org/browse/HWKMETRICS-508> *Hawkular Alerting - included* - Version 1.3.0 <https://issues.jboss.org/projects/HWKALERTS/versions/12331985> - Project details and repository: Github <https://github.com/hawkular/hawkular-alerts> - Documentation: REST API Documentation <http://www.hawkular.org/docs/rest/rest-alerts.html>, Examples <https://github.com/hawkular/hawkular-alerts/tree/master/examples>, Developer Guide <http://www.hawkular.org/community/docs/developer-guide/alerts.html> *Hawkular Metrics Clients* - Python: https://github.com/hawkular/hawkular-client-python - Go: https://github.com/hawkular/hawkular-client-go - Ruby: https://github.com/hawkular/hawkular-client-ruby - Java: https://github.com/hawkular/hawkular-client-java *Release Links* Github Release: https://github.com/hawkular/hawkular-metrics/releases/tag/0.21.0 JBoss Nexus Maven artifacts: http://origin-repository.jboss.org/nexus/content/repositorie s/public/org/hawkular/metrics/ Jira release tracker: https://issues.jboss.org/projects/HWKMETRICS/versions/12331718 A big "Thank you" goes to John Sanda, Matt Wringe, Michael Burman, Joel Takvorian, Jay Shaughnessy, Lucas Ponce, and Heiko Rupp for their project contributions. Thank you, Stefan Negrea

7 years, 9 months

4
4
0 / 0

Memory-usage of Hawkular-services

by Heiko W.Rupp

Hey, tl;dr: we need to investigate heap usage - especially in the case of compression kicking in - it looks like there could be a memory leak. Compression timing seems mostly ok. I originally wanted to see how more feeds influence the metrics compression timing. So I started the server with -Xmx512m as I did in all the weeks before and pointed a few feeds at the server - to see it crash with OOME shortly after the compression started. Now I restarted the server with -Xmx1024m and also -XX:MaxMetaspaceSize=512m (up from 256m before) and am running the server now with 1 feed for a day. To be continued below ... ( Time is in GMT, which is 2h off of my local time) hawkular_1 | 15:00:44,764 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 15:00:45,452 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 687 ms hawkular_1 | 17:00:44,757 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 17:00:46,796 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 2039 ms hawkular_1 | 19:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 19:00:47,293 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 2541 ms hawkular_1 | 21:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 21:00:46,267 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1517 ms hawkular_1 | 23:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 23:00:45,472 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 722 ms hawkular_1 | 01:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 01:00:46,241 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1492 ms hawkular_1 | 03:00:44,747 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 03:00:45,780 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1033 ms hawkular_1 | 05:00:44,746 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 05:00:45,781 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1034 ms hawkular_1 | 07:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 07:00:46,386 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1636 ms hawkular_1 | 09:00:44,748 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 09:00:45,682 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 934 ms hawkular_1 | 11:00:44,750 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 11:00:46,339 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1589 ms hawkular_1 | 13:00:44,748 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 13:00:45,880 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1132 ms Looking at the memory usage I see the memory usage there is often a peak in heap usage around and after the compression - looking at the accumulated GC-time also shows heavy GC activity. I guess the peak in heap usage (and thus committed heap) comes from promoting objects from young generation into old during the compression and then later after compression is over, they are garbage collected, so that heap usage goes down and the system is able to free some memory. Starting at around 4:10pm (2pm in the above output) I am running with 10 extra feeds (hawkfly on an external box) While it looks like there is a slowly growing non-heap, it was growing a lot when the 10 extra feeds connected. Also it looks like it is growing a bit more each time compression kicks in ("non-heap") The first compression run with 11 feeds did not take too long, the next was hawkular_1 | 17:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 17:00:50,277 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5528 ms Which used a lot more memory than before. Non-heap was able to reclaim some memory though. hawkular_1 | 19:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 19:00:50,093 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5341 ms hawkular_1 | 21:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 21:00:49,465 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4714 ms hawkular_1 | 23:00:44,753 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 23:00:48,925 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4171 ms hawkular_1 | 01:00:44,750 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 01:00:48,554 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 3803 ms hawkular_1 | 03:00:44,761 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 03:00:48,659 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 3898 ms hawkular_1 | 05:00:44,748 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 05:00:49,134 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4385 ms hawkular_1 | 07:00:44,755 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 07:00:49,831 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5076 ms hawkular_1 | 09:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 09:00:49,508 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4757 ms Now at 11:20 (9:20 for above logs) I started 5 more feeds (with a 2min sleep between starts) hawkular_1 | 11:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 11:00:49,751 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5002 ms hawkular_1 | 13:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 13:00:56,594 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 11845 ms hawkular_1 | 15:00:44,754 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 15:00:53,985 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 9231 ms And another 5 starting at 17:12 (15:00 in above logs timezone) hawkular_1 | 17:00:44,768 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 17:00:57,824 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 13056 ms And another 5 starting at 19:57 (17:57 in above log timezone ) hawkular_1 | 19:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution hawkular_1 | 19:01:24,401 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,] could not be finished hawkular_1 | 19:01:40,918 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not be finished hawkular_1 | 19:02:04,619 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not be finished hawkular_1 | 19:02:22,423 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not be finished hawkular_1 | 19:02:30,247 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 105495 ms This took almost 2 minutes, where the server was non-responsive 21:00:44,753 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution 21:01:06,520 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 21767 ms 23:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution And here it went south and the server more or less died with OOME exceptions (it is still responding to queries, and potentially even ingesting new data, but the scheduler seems not to run anymore. I can imagine that once the OOME happened, the scheduler thread died, freeing up memory that the compression job held and which then allowed the server itself to continue. But it certainly is in an unusable state. This is the final "one day" with "max" values plotted: ![](cid:7C83CDE5-A45F-42C6-853A-2956DC3E1719@redhat.com "Bildschirmfoto 2016-10-11 um 08.21.53.png") The peaks at compression time for heap-max (green line) are clearly visible.

7 years, 9 months

2
7
0 / 0

hawkular-openshift-agent can limit what pods are monitored

by John Mazzitelli

The Hawkular OpenShift Agent will have a new feature once the PR is merged - a pod authorization list. You can now tell the agent (via environment variable or in its YAML config) to only authorize a specific set of pods (i.e. only the pods on the list can have their metrics collected) - any other pods not on the list are ignored and will not have metrics collected for them. See: https://github.com/hawkular/hawkular-openshift-agent/issues/6 https://github.com/hawkular/hawkular-openshift-agent/pull/13

7 years, 9 months

1
0
0 / 0

hawkular-openshift-agent now supports jolokia endpoints

by John Mazzitelli

Hawkular OpenShift Agent now supports Jolokia endpoints. This means we can now scrape metrics from both Prometheus endpoints and Jolokia endpoints running inside OpenShift pods (it also supports endpoints if the agent is running outside OS, too). This PR needs to be reviewed and merged for this feature to get in: https://github.com/hawkular/hawkular-openshift-agent/pull/10 There are several issues that need to be addressed to further build out this agent. If interested in working on this, see https://github.com/hawkular/hawkular-openshift-agent/issues

7 years, 9 months

1
0
0 / 0

hawkular openshift agent integration tests

by John Mazzitelli

(moving this thread to the mailing list for public consumption) Mazz says, > As for the Hawkular OpenShift Agent, note that there are no integration tests yet. > I have no idea how to do that yet. I've got unit tests throughout, but not itests. > Something we'll need to do eventually. Gotta figure out how to mock an OpenShift environment. Matt says, > For origin metrics we run e2e tests. So we point our test to our OpenShift instance, > it will then deploy our metrics components (with the various different deployment options) > and then check and see that everything deployed properly, there are no errors or restarts > and that we can gather metrics. > > For the agent we might want to do something similar. I don't know how useful it would be > to mock up an OpenShift environment when you can just have it directly use one. > > ... I would look to do this using the kubernetes client more directly and using a proper test framework. Matt - I have a followup question: Does this mean we would not be able to run itests on github commits via Travis? Not that that is a bad thing - I would not be heartbroken if I was told we will not be able to use Travis :) I'm just wondering if this means there will be no CI.

7 years, 9 months

2
1
0 / 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

hawkular-dev October 2016