Labeling needs?
by Heiko W. Rupp
Hey,
we have labels in Hawkular-metrics right now, but apparently there are
use cases that are not yet covered (and I know Matt has more)
- Listing tag keys. Currently one has to already know the available keys in
order to list the available values
- Tag values are currently a comma-separated string rather than an array,
which has implications for allowed characters and for escaping the
separator (see the sketch below)
- Post-tagging of data points (tagging could be triggered by another
system that e.g. parses log files and spots an anomaly)
The idea behind this is that one should be able to create tags at
certain points in time, for a single metric or a list of metrics, to
uniquely identify that point in time in queries (e.g. comparing the
relative performance of two versions of a deployment).
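As a quick illustration of the separator problem from the second point, here
is a minimal sketch with made-up tag values (not the actual Hawkular API):

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // Two logically distinct tag values, one of which contains a comma...
        values := []string{"east,west", "prod"}
        flat := strings.Join(values, ",") // stored flattened as "east,west,prod"

        // ...cannot be told apart again after splitting the flat string:
        fmt.Println(strings.Split(flat, ",")) // [east west prod] - three values, not two
    }

With a real array (or an escaping convention for the separator) that
ambiguity goes away.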
What else?
hawkular metrics go client - need assistance for a couple issues
by John Mazzitelli
I found two bugs in the Hawkular Go Client and I need assistance:
1. https://github.com/hawkular/hawkular-client-go/issues/10
2. https://github.com/hawkular/hawkular-client-go/issues/12
For the first one, I have a PR written that appears to fix it (the link to the PR is in the issue). I need someone to peer review it and merge it if it is OK. Frankly, I don't see how it ever worked - but I suspect that Hawkular Metrics changed the REST API in this area recently, and that's why it wasn't seen before.
For the second one I do not have a PR; in the issue I mention a quick-and-dirty fix, but I don't think it is the proper way to fix it. I need someone to look at that issue and recommend a proper fix. We just need a way, or a Go API, to encode a string for a URL *path* (not for a URL query string, which is what happens now), because spaces are encoded differently in the two cases. I'd do it myself, but it's really late, and I have to get up early to hopefully meet up with a turkey that is "dying" to make it onto my dinner table. Bu Wa Ha... BU WA HAHAHAHAHAHA! :)
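For reference, a minimal sketch of the difference I mean, using only the
standard library (url.PathEscape only exists in newer Go releases, so treat
the available Go version as an assumption):

    package main

    import (
        "fmt"
        "net/url"
    )

    func main() {
        id := "my metric id"

        // Query-string escaping turns a space into "+", which is not what a path segment wants.
        fmt.Println(url.QueryEscape(id)) // my+metric+id

        // Path escaping turns it into "%20", which is what a URL *path* needs.
        fmt.Println(url.PathEscape(id)) // my%20metric%20id
    }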
That is all.
--John Mazz
Hawkular Inventory Storage and Queries In a Nutshell
by Lukas Krejci
Hi all,
given the recent discussion about the backend we should be using for
inventory, I've quickly put together a brief explanation of what inventory
actually stores and how it queries things.
This is expected to raise more questions than it answers, though, and is
meant as a starter for the discussion that I hope will happen in the comments
section of the Google doc [1].
Thanks,
[1] https://docs.google.com/document/d/1Z1pIQS4EML7_0tuSyHUPX6jzNvaeEqGm4XAiYcRGbvQ/edit?usp=sharing
--
Lukas Krejci
Hawkular Metrics 0.21.0 - Release
by Stefan Negrea
Hello Everybody,
I am happy to announce release 0.21.0 of Hawkular Metrics. This release is
anchored by performance enhancements and general fixes.
Here is a list of major changes:
- Cassandra
  - *Cassandra 3.0.9 is now the supported version of Cassandra.*
  - Note: this is a rollback from the previously supported version 3.7,
    due to Cassandra community recommendations for stability and production
    deployments. Cassandra 3.7 and 3.9 are still compatible, but development
    and testing use the 3.0.9 release.
- Compression
  - Fixed an issue that allowed duplicate instances of the compression job
    to get scheduled on server restart (HWKMETRICS-492
    <https://issues.jboss.org/browse/HWKMETRICS-492>)
  - Improved the fault tolerance of the compression job (HWKMETRICS-494
    <https://issues.jboss.org/browse/HWKMETRICS-494>)
  - Improved the performance of the merge process for reading compressed
    data (HWKMETRICS-488 <https://issues.jboss.org/browse/HWKMETRICS-488>)
  - Fixed wrong ordering when fetching compressed and uncompressed data
    (HWKMETRICS-506 <https://issues.jboss.org/browse/HWKMETRICS-506>)
  - The compression job now provides back pressure (HWKMETRICS-500
    <https://issues.jboss.org/browse/HWKMETRICS-500>)
  - The job scheduler now handles failure scenarios (HWKMETRICS-505
    <https://issues.jboss.org/browse/HWKMETRICS-505>)
- Cassandra Schema
  - Fixed an issue where the server could fail to start due to Cassalog
    being in an inconsistent state (HWKMETRICS-495
    <https://issues.jboss.org/browse/HWKMETRICS-495>)
  - gc_grace_seconds is set to zero for single-node clusters (HWKMETRICS-381
    <https://issues.jboss.org/browse/HWKMETRICS-381>)
- API Updates
  - Inserting data points now has server-side retries to increase fault
    tolerance for simple error scenarios (HWKMETRICS-510
    <https://issues.jboss.org/browse/HWKMETRICS-510>)
  - The fromEarliest parameter is now supported in all query endpoints
    (HWKMETRICS-445 <https://issues.jboss.org/browse/HWKMETRICS-445>)
- Configuration
  - The configuration options did not have a consistent naming scheme; the
    prefixes hawkular-metrics, hawkular.metrics, and hawkular were used,
    alongside options with no prefix at all.
  - In this release the naming scheme has been standardized to
    hawkular.metrics.* for metrics-specific configuration and hawkular.*
    for general configuration.
  - Here is a list of all configuration options currently available:
    ConfigurationKey
    <https://github.com/hawkular/hawkular-metrics/blob/release/0.21.0/api/metr...>
  - For more details: HWKMETRICS-508
    <https://issues.jboss.org/browse/HWKMETRICS-508>
*Hawkular Alerting - included*
- Version 1.3.0
  <https://issues.jboss.org/projects/HWKALERTS/versions/12331985>
- Project details and repository: Github
  <https://github.com/hawkular/hawkular-alerts>
- Documentation: REST API Documentation
  <http://www.hawkular.org/docs/rest/rest-alerts.html>, Examples
  <https://github.com/hawkular/hawkular-alerts/tree/master/examples>,
  Developer Guide
  <http://www.hawkular.org/community/docs/developer-guide/alerts.html>
*Hawkular Metrics Clients*
- Python: https://github.com/hawkular/hawkular-client-python
- Go: https://github.com/hawkular/hawkular-client-go
- Ruby: https://github.com/hawkular/hawkular-client-ruby
- Java: https://github.com/hawkular/hawkular-client-java
*Release Links*
Github Release:
https://github.com/hawkular/hawkular-metrics/releases/tag/0.21.0
JBoss Nexus Maven artifacts:
http://origin-repository.jboss.org/nexus/content/repositories/public/org/hawkular/metrics/
Jira release tracker:
https://issues.jboss.org/projects/HWKMETRICS/versions/12331718
A big "Thank you" goes to John Sanda, Matt Wringe, Michael Burman, Joel
Takvorian, Jay Shaughnessy, Lucas Ponce, and Heiko Rupp for their project
contributions.
Thank you,
Stefan Negrea
Memory-usage of Hawkular-services
by Heiko W. Rupp
Hey,
tl;dr: we need to investigate heap usage - especially when compression
kicks in - as it looks like there could be a memory leak. Compression
timing seems mostly OK.
I originally wanted to see how more feeds influence the metrics
compression timing.
So I started the server with -Xmx512m, as I did in all the weeks before,
and pointed a few feeds at the server - only to see it crash with an OOME
shortly after compression started.
I then restarted the server with -Xmx1024m and also
-XX:MaxMetaspaceSize=512m (up from 256m before) and have been running the
server with 1 feed for a day.
To be continued below ...
(Times are in GMT, which is 2 hours off from my local time.)
hawkular_1 | 15:00:44,764 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 15:00:45,452 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
687 ms
hawkular_1 | 17:00:44,757 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:46,796 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
2039 ms
hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:00:47,293 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
2541 ms
hawkular_1 | 21:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 21:00:46,267 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
1517 ms
hawkular_1 | 23:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 23:00:45,472 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
722 ms
hawkular_1 | 01:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 01:00:46,241 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
1492 ms
hawkular_1 | 03:00:44,747 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 03:00:45,780 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
1033 ms
hawkular_1 | 05:00:44,746 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 05:00:45,781 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
1034 ms
hawkular_1 | 07:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 07:00:46,386 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
1636 ms
hawkular_1 | 09:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 09:00:45,682 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
934 ms
hawkular_1 | 11:00:44,750 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 11:00:46,339 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
1589 ms
hawkular_1 | 13:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 13:00:45,880 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
1132 ms
Looking at the memory usage, I see that there is often a peak in heap usage
around and after compression - the accumulated GC time also shows heavy GC
activity. I guess the peak in heap usage (and thus committed heap) comes
from promoting objects from the young generation into the old generation
during compression; once compression is over they are garbage collected, so
heap usage goes down and the system is able to free some memory.
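As an aside, here is a small throwaway sketch (not part of Hawkular) that
pulls the durations out of log lines like the above, which makes it easier to
see how the run time grows as feeds are added; feed it the container log
output on stdin:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "regexp"
        "strconv"
    )

    func main() {
        // Matches the "Finished compressing data in NNNN ms" lines of the CompressData job.
        re := regexp.MustCompile(`Finished compressing data in (\d+) ms`)
        scanner := bufio.NewScanner(os.Stdin)
        run := 0
        for scanner.Scan() {
            if m := re.FindStringSubmatch(scanner.Text()); m != nil {
                ms, _ := strconv.Atoi(m[1])
                run++
                fmt.Printf("run %2d: %6d ms\n", run, ms)
            }
        }
    }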
Starting at around 4:10pm (2pm in the above output) I am running with 10
extra feeds (hawkfly on an external box).
While non-heap memory seems to be growing slowly all the time, it grew a
lot when the 10 extra feeds connected.
It also looks like non-heap grows a bit more each time compression kicks in.
The first compression run with 11 feeds did not take too long; the next one
was:
hawkular_1 | 17:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:50,277 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
5528 ms
That run used a lot more memory than before. Non-heap was able to reclaim
some memory afterwards, though.
hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:00:50,093 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
5341 ms
hawkular_1 | 21:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 21:00:49,465 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
4714 ms
hawkular_1 | 23:00:44,753 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 23:00:48,925 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
4171 ms
hawkular_1 | 01:00:44,750 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 01:00:48,554 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
3803 ms
hawkular_1 | 03:00:44,761 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 03:00:48,659 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
3898 ms
hawkular_1 | 05:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 05:00:49,134 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
4385 ms
hawkular_1 | 07:00:44,755 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 07:00:49,831 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
5076 ms
hawkular_1 | 09:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 09:00:49,508 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
4757 ms
Now at 11:20 (9:20 for above logs) I started 5 more feeds (with a 2min
sleep between starts)
hawkular_1 | 11:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 11:00:49,751 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
5002 ms
hawkular_1 | 13:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 13:00:56,594 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
11845 ms
hawkular_1 | 15:00:44,754 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 15:00:53,985 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
9231 ms
And another 5 starting at 17:12 (15:00 in above logs timezone)
hawkular_1 | 17:00:44,768 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:57,824 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
13056 ms
And another 5 starting at 19:57 (17:57 in above log timezone )
hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:01:24,401 WARN org.apache.activemq.artemis.ra AMQ152007:
Thread Thread[TearDown/ActiveMQActivation,5,] could not be finished
hawkular_1 | 19:01:40,918 WARN org.apache.activemq.artemis.ra AMQ152007:
Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not
be finished
hawkular_1 | 19:02:04,619 WARN org.apache.activemq.artemis.ra AMQ152007:
Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not
be finished
hawkular_1 | 19:02:22,423 WARN org.apache.activemq.artemis.ra AMQ152007:
Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not
be finished
hawkular_1 | 19:02:30,247 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data in
105495 ms
This took almost 2 minutes, during which the server was unresponsive.
21:00:44,753 INFO org.hawkular.metrics.core.jobs.CompressData Starting
execution
21:01:06,520 INFO org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 21767 ms
23:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting
execution
And here it went south and the server more or less died with OOME
exceptions (it is still responding to queries, and potentially even
ingesting new data, but the scheduler does not seem to run anymore).
I can imagine that once the OOME happened, the scheduler thread died,
freeing up the memory that the compression job held, which then allowed
the server itself to continue. But it is certainly in an unusable state.
This is the final "one day" with "max" values plotted:
[attached chart: one-day plot of heap usage with "max" values]
The peaks at compression time for heap-max (green line) are clearly
visible.
hawkular openshift agent integration tests
by John Mazzitelli
(moving this thread to the mailing list for public consumption)
Mazz says,
> As for the Hawkular OpenShift Agent, note that there are no integration tests yet.
> I have no idea how to do that yet. I've got unit tests throughout, but not itests.
> Something we'll need to do eventually. Gotta figure out how to mock an OpenShift environment.
Matt says,
> For origin metrics we run e2e tests. So we point our test to our OpenShift instance,
> it will then deploy our metrics components (with the various different deployment options)
> and then check and see that everything deployed properly, there are no errors or restarts
> and that we can gather metrics.
>
> For the agent we might want to do something similar. I don't know how useful it would be
> to mock up an OpenShift environment when you can just have it directly use one.
>
> ... I would look to do this using the kubernetes client more directly and using a proper test framework.
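For concreteness, here is a minimal sketch of the kind of e2e check Matt
describes, assuming a recent k8s.io/client-go and a kubeconfig pointing at
the test cluster. The namespace, label selector, and kubeconfig path are made
up for illustration (not the agent's real deployment names), and the List
signature varies across client-go versions:

    package main

    import (
        "context"
        "fmt"
        "log"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Point this at the OpenShift/Kubernetes test cluster.
        config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
        if err != nil {
            log.Fatal(err)
        }
        client, err := kubernetes.NewForConfig(config)
        if err != nil {
            log.Fatal(err)
        }

        // Deploy the agent out of band, then assert its pods are ready and have not restarted.
        pods, err := client.CoreV1().Pods("hawkular").List(context.TODO(),
            metav1.ListOptions{LabelSelector: "name=hawkular-openshift-agent"})
        if err != nil {
            log.Fatal(err)
        }
        for _, p := range pods.Items {
            for _, cs := range p.Status.ContainerStatuses {
                if cs.RestartCount > 0 || !cs.Ready {
                    log.Fatalf("pod %s container %s is not healthy", p.Name, cs.Name)
                }
            }
        }
        fmt.Println("all agent pods are ready with no restarts")
    }

A real itest would then also gather a metric through the agent and check
that it shows up in Hawkular Metrics, as Matt describes for origin metrics.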
Matt - I have a followup question:
Does this mean we would not be able to run itests on github commits via Travis? Not that that is a bad thing - I would not be heartbroken if I was told we will not be able to use Travis :) I'm just wondering if this means there will be no CI.
Hawkular Alerting 1.3.0.Final has been released!
by Jay Shaughnessy
The Hawkular Alerting team is happy to announce the release of Hawkular
Alerting 1.3.0.Final.
This is a feature and fix release.
* [HWKALERTS-176] - Support conditions on missing events and data
     o An *exciting* new alerting feature! This introduces
       *MissingCondition*. MissingConditions let you generate alerts or
       events when expected data fails to report, or when an expected
       event does not happen (see the sketch after this list).
* [HWKALERTS-174] - Add CORS filters
o Cross-Origin Resource Sharing support allows for optional
       request origin validation.
* [HWKALERTS-181] - Add clustering information on status endpoint
o The /status endpoint now reflects cluster topology!
* [HWKALERTS-175] - Improvements on webhook plugins
* [HWKALERTS-177] - Add new perf tests to study asynchronous send*() calls
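To give a feel for what a missing condition does, here is a toy sketch of
the general idea only - this is not the Hawkular Alerting API: raise an
event when an expected source stays silent for longer than its reporting
interval.

    package main

    import (
        "fmt"
        "time"
    )

    // watchForMissing emits an alert whenever no data arrives within the expected interval.
    func watchForMissing(data <-chan string, interval time.Duration, alerts chan<- string) {
        for {
            select {
            case d := <-data:
                fmt.Println("received:", d) // data arrived in time, nothing to do
            case <-time.After(interval):
                alerts <- "no data received within " + interval.String()
            }
        }
    }

    func main() {
        data := make(chan string)
        alerts := make(chan string, 1)
        go watchForMissing(data, 500*time.Millisecond, alerts)

        data <- "heartbeat"   // the source reports once, on time
        fmt.Println(<-alerts) // then goes quiet, and the missing condition fires
    }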
Additionally, this is the first Hawkular Alerting release to deliver
Alerting features in *3* different distributions:
* The alerting engine used inside Hawkular Services and supporting the
Middleware Provider in ManageIQ.
* As a standalone alerting engine for general use.
* And soon to be released, embedded inside Hawkular Metrics!!
For more details on this release:
https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12315924&versi...
Hawkular Alerting Team
Jay Shaughnessy (jshaughn(a)redhat.com)
Lucas Ponce (lponce(a)redhat.com)