Hawkular Inventory Storage and Queries In a Nutshell
by Lukas Krejci
Hi all,
given the recent discussion about the backend we should be using for
inventory, I've quickly put together a brief explanation of what inventory
actually stores and how it queries things.
This will probably raise more questions than it answers, but it is meant as a
starter for the discussion that I hope will happen in the comments section of
the Google doc [1].
Thanks,
[1] https://docs.google.com/document/d/
1Z1pIQS4EML7_0tuSyHUPX6jzNvaeEqGm4XAiYcRGbvQ/edit?usp=sharing
--
Lukas Krejci
Hawkular Metrics 0.21.0 - Release
by Stefan Negrea
Hello Everybody,
I am happy to announce release 0.21.0 of Hawkular Metrics. This release is
anchored by performance enhancements and general fixes.
Here is a list of major changes:

- Cassandra
  - *Cassandra 3.0.9 is now the supported version of Cassandra.*
    - Note: this is a rollback from the previously supported version 3.7,
      due to Cassandra community recommendations for stability and
      production deployment. Cassandra 3.7 or 3.9 are still compatible,
      but development and testing use the 3.0.9 release.
- Compression
  - Fixed an issue that allowed duplicate instances of the compression
    job to get scheduled on server restart (HWKMETRICS-492
    <https://issues.jboss.org/browse/HWKMETRICS-492>)
  - Improved the fault tolerance of the compression job (HWKMETRICS-494
    <https://issues.jboss.org/browse/HWKMETRICS-494>)
  - Improved the performance of the merge process for reading compressed
    data (HWKMETRICS-488 <https://issues.jboss.org/browse/HWKMETRICS-488>)
  - Fixed wrong ordering when fetching compressed and uncompressed data
    (HWKMETRICS-506 <https://issues.jboss.org/browse/HWKMETRICS-506>)
  - The compression job now provides back pressure (HWKMETRICS-500
    <https://issues.jboss.org/browse/HWKMETRICS-500>)
  - The job scheduler now handles failure scenarios (HWKMETRICS-505
    <https://issues.jboss.org/browse/HWKMETRICS-505>)
- Cassandra Schema
  - Fixed an issue where the server could fail to start due to Cassalog
    being in an inconsistent state (HWKMETRICS-495
    <https://issues.jboss.org/browse/HWKMETRICS-495>)
  - gc_grace_seconds is set to zero for single-node clusters
    (HWKMETRICS-381 <https://issues.jboss.org/browse/HWKMETRICS-381>);
    a verification sketch follows after this list.
- API Updates
  - Inserting data points now has server-side retries to increase fault
    tolerance for simple error scenarios (HWKMETRICS-510
    <https://issues.jboss.org/browse/HWKMETRICS-510>)
  - The fromEarliest parameter is now supported in all query endpoints
    (HWKMETRICS-445 <https://issues.jboss.org/browse/HWKMETRICS-445>);
    a query sketch also follows after this list.
- Configuration
  - The configuration options did not have a consistent naming scheme:
    the hawkular-metrics, hawkular.metrics, and hawkular prefixes were
    used, along with no prefix at all.
  - In this release the naming scheme has been standardized to
    hawkular.metrics.* for metrics-specific configuration and hawkular.*
    for general configuration.
  - Here is a list of all configuration options currently available:
    ConfigurationKey
    <https://github.com/hawkular/hawkular-metrics/blob/release/0.21.0/api/metr...>
  - For more details: HWKMETRICS-508
    <https://issues.jboss.org/browse/HWKMETRICS-508>
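As a small illustration of the gc_grace_seconds change, here is a sketch of how
one could read the effective value back from a running cluster. This is not
part of the release itself - the keyspace name hawkular_metrics is only the
default and may differ in your deployment, and the snippet assumes the DataStax
Java driver 3.x on the classpath:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class GcGraceCheck {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // system_schema.tables holds per-table options in Cassandra 3.x.
            ResultSet rs = session.execute(
                    "SELECT table_name, gc_grace_seconds FROM system_schema.tables "
                    + "WHERE keyspace_name = 'hawkular_metrics'");
            for (Row row : rs) {
                System.out.println(row.getString("table_name") + " -> "
                        + row.getInt("gc_grace_seconds"));
            }
        }
    }
}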
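And a minimal sketch of a query that relies on the new fromEarliest parameter,
using plain HttpURLConnection. The gauge id, tenant, and host are placeholders;
the /hawkular/metrics/gauges/{id}/raw path and the Hawkular-Tenant header are
my reading of the REST API, so double-check against the API docs:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FromEarliestQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical metric id and tenant; adjust to your deployment.
        String metricId = "my-gauge";
        URL url = new URL("http://localhost:8080/hawkular/metrics/gauges/"
                + metricId + "/raw?fromEarliest=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Hawkular-Tenant", "my-tenant");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON array of raw data points
            }
        }
    }
}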
*Hawkular Alerting - included*
- Version 1.3.0
  <https://issues.jboss.org/projects/HWKALERTS/versions/12331985>
- Project details and repository: Github
  <https://github.com/hawkular/hawkular-alerts>
- Documentation: REST API Documentation
  <http://www.hawkular.org/docs/rest/rest-alerts.html>, Examples
  <https://github.com/hawkular/hawkular-alerts/tree/master/examples>,
  Developer Guide
  <http://www.hawkular.org/community/docs/developer-guide/alerts.html>
*Hawkular Metrics Clients*
- Python: https://github.com/hawkular/hawkular-client-python
- Go: https://github.com/hawkular/hawkular-client-go
- Ruby: https://github.com/hawkular/hawkular-client-ruby
- Java: https://github.com/hawkular/hawkular-client-java
*Release Links*
Github Release:
https://github.com/hawkular/hawkular-metrics/releases/tag/0.21.0
JBoss Nexus Maven artifacts:
http://origin-repository.jboss.org/nexus/content/repositories/public/org/hawkular/metrics/
Jira release tracker:
https://issues.jboss.org/projects/HWKMETRICS/versions/12331718
A big "Thank you" goes to John Sanda, Matt Wringe, Michael Burman, Joel
Takvorian, Jay Shaughnessy, Lucas Ponce, and Heiko Rupp for their project
contributions.
Thank you,
Stefan Negrea
Memory-usage of Hawkular-services
by Heiko W.Rupp
Hey,
tl;dr: we need to investigate heap usage - especially when compression kicks
in - as it looks like there could be a memory leak. Compression timing seems
mostly ok.
I originally wanted to see how more feeds influence the metrics compression
timing. So I started the server with -Xmx512m, as I did in all the weeks
before, and pointed a few feeds at the server - only to see it crash with an
OOME shortly after the compression started.
Now I restarted the server with -Xmx1024m and also -XX:MaxMetaspaceSize=512m
(up from 256m before) and have been running the server with 1 feed for a day.
To be continued below ...
( Time is in GMT, which is 2h off of my local time)
hawkular_1 | 15:00:44,764 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 15:00:45,452 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 687 ms
hawkular_1 | 17:00:44,757 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:46,796 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 2039 ms
hawkular_1 | 19:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:00:47,293 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 2541 ms
hawkular_1 | 21:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 21:00:46,267 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1517 ms
hawkular_1 | 23:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 23:00:45,472 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 722 ms
hawkular_1 | 01:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 01:00:46,241 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1492 ms
hawkular_1 | 03:00:44,747 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 03:00:45,780 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1033 ms
hawkular_1 | 05:00:44,746 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 05:00:45,781 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1034 ms
hawkular_1 | 07:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 07:00:46,386 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1636 ms
hawkular_1 | 09:00:44,748 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 09:00:45,682 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 934 ms
hawkular_1 | 11:00:44,750 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 11:00:46,339 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1589 ms
hawkular_1 | 13:00:44,748 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 13:00:45,880 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 1132 ms
Looking at the memory usage, I see there is often a peak in heap usage around
and after the compression - looking at the accumulated GC time also shows
heavy GC activity. I guess the peak in heap usage (and thus committed heap)
comes from promoting objects from the young generation into the old generation
during the compression; later, after compression is over, they are garbage
collected, so heap usage goes down and the system is able to free some memory.
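One way to confirm that theory would be to run the server with GC logging
switched on and correlate the tenuring/promotion numbers with the compression
runs - for example with the standard HotSpot (Java 8) flags added next to the
existing -Xmx setting:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-XX:+PrintTenuringDistribution -Xloggc:/tmp/hawkular-gc.log

(The log file path is arbitrary; this is only a suggestion for further
investigation.)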
Starting at around 4:10pm (2pm in the above output) I am running with 10 extra
feeds (hawkfly on an external box).
While it looks like non-heap is slowly growing overall, it grew a lot when the
10 extra feeds connected. It also looks like non-heap grows a bit more each
time compression kicks in.
The first compression run with 11 feeds did not take too long; the next one
was:
hawkular_1 | 17:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:50,277 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5528 ms
This run used a lot more memory than before; non-heap was able to reclaim some
memory, though.
hawkular_1 | 19:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:00:50,093 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5341 ms
hawkular_1 | 21:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 21:00:49,465 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4714 ms
hawkular_1 | 23:00:44,753 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 23:00:48,925 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4171 ms
hawkular_1 | 01:00:44,750 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 01:00:48,554 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 3803 ms
hawkular_1 | 03:00:44,761 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 03:00:48,659 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 3898 ms
hawkular_1 | 05:00:44,748 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 05:00:49,134 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4385 ms
hawkular_1 | 07:00:44,755 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 07:00:49,831 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5076 ms
hawkular_1 | 09:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 09:00:49,508 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 4757 ms
Now at 11:20 (9:20 in the above logs) I started 5 more feeds (with a 2 min
sleep between starts).
hawkular_1 | 11:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 11:00:49,751 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 5002 ms
hawkular_1 | 13:00:44,749 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 13:00:56,594 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 11845 ms
hawkular_1 | 15:00:44,754 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 15:00:53,985 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 9231 ms
And another 5 starting at 17:12 (15:00 in above logs timezone)
hawkular_1 | 17:00:44,768 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:57,824 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 13056 ms
And another 5 starting at 19:57 (17:57 in the above log timezone)
hawkular_1 | 19:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:01:24,401 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,] could not be finished
hawkular_1 | 19:01:40,918 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not be finished
hawkular_1 | 19:02:04,619 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not be finished
hawkular_1 | 19:02:22,423 WARN org.apache.activemq.artemis.ra AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,default-threads] could not be finished
hawkular_1 | 19:02:30,247 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 105495 ms
This took almost 2 minutes, during which the server was non-responsive.
21:00:44,753 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
21:01:06,520 INFO org.hawkular.metrics.core.jobs.CompressData Finished compressing data in 21767 ms
23:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting execution
And here it went south: the server more or less died with OOME exceptions (it
is still responding to queries, and potentially even ingesting new data, but
the scheduler seems not to run anymore).
I can imagine that once the OOME happened, the scheduler thread died, freeing
up the memory that the compression job held, which then allowed the server
itself to continue. But it certainly is in an unusable state.
This is the final "one day" view with "max" values plotted:
[Screenshot "Bildschirmfoto 2016-10-11 um 08.21.53.png" not included - a
one-day chart of heap/non-heap usage with max values]
The peaks at compression time for heap-max (green line) are clearly
visible.
hawkular openshift agent integration tests
by John Mazzitelli
(moving this thread to the mailing list for public consumption)
Mazz says,
> As for the Hawkular OpenShift Agent, note that there are no integration tests yet.
> I have no idea how to do that yet. I've got unit tests throughout, but not itests.
> Something we'll need to do eventually. Gotta figure out how to mock an OpenShift environment.
Matt says,
> For origin metrics we run e2e tests. So we point our test to our OpenShift instance,
> it will then deploy our metrics components (with the various different deployment options)
> and then check and see that everything deployed properly, there are no errors or restarts
> and that we can gather metrics.
>
> For the agent we might want to do something similar. I don't know how useful it would be
> to mock up an OpenShift environment when you can just have it directly use one.
>
> ... I would look to do this using the kubernetes client more directly and using a proper test framework.
Matt - I have a followup question:
Does this mean we would not be able to run itests on github commits via Travis? Not that that is a bad thing - I would not be heartbroken if I was told we will not be able to use Travis :) I'm just wondering if this means there will be no CI.
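A bare-bones version of the e2e-style check Matt describes might look like the
sketch below. It is only an illustration of the idea - the agent itself is
written in Go and would probably use the Go client, the "hawkular" namespace is
made up, and it assumes the fabric8 kubernetes-client is on the classpath:

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class AgentE2eSmokeTest {
    public static void main(String[] args) {
        // Uses the standard kubeconfig / service-account auto-detection to connect.
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            for (Pod pod : client.pods().inNamespace("hawkular").list().getItems()) {
                String phase = pod.getStatus().getPhase();
                System.out.println(pod.getMetadata().getName() + " -> " + phase);
                if (!"Running".equals(phase)) {
                    throw new IllegalStateException(
                            "Pod not running: " + pod.getMetadata().getName());
                }
            }
        }
    }
}

The same client could then deploy the agent and, afterwards, the test could
query Hawkular Metrics to verify that data actually arrived.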
Hawkular Alerting 1.3.0.Final has been released!
by Jay Shaughnessy
The Hawkular Alerting team is happy to announce the release of Hawkular
Alerting 1.3.0.Final.
This is a feature and fix release.
* [HWKALERTS-176] - Support conditions on missing events and data
   o An *exciting* new alerting feature! This introduces *MissingCondition*.
     A MissingCondition lets you generate alerts or events when expected data
     fails to report or an expected event does not happen. (A rough sketch of
     using it follows right after this list.)
* [HWKALERTS-174] - Add CORS filters
   o Cross Origin Resource Sharing support allows for optional request origin
     validation.
* [HWKALERTS-181] - Add clustering information on status endpoint
   o The /status endpoint now reflects cluster topology!
* [HWKALERTS-175] - Improvements on webhook plugins
* [HWKALERTS-177] - Add new perf tests to study asynchronous send*() calls
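To make the new MissingCondition a bit more concrete, here is a rough sketch of
attaching one to an existing trigger over the REST API. Treat it purely as an
illustration: the trigger id, data id, tenant, and interval are made up, and
the endpoint path and JSON field names (notably "type": "MISSING" and
"interval") are my reading of the feature rather than something verified
against 1.3.0 - the REST API documentation is authoritative:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class MissingConditionSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical: a trigger "my-trigger" already exists for tenant "my-tenant".
        URL url = new URL("http://localhost:8080/hawkular/alerts/"
                + "triggers/my-trigger/conditions/FIRING");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Hawkular-Tenant", "my-tenant");

        // "interval" is meant as: how long the data ("heartbeat" here) may stay
        // silent before an alert/event is generated. Field names are assumptions.
        String json = "[{\"type\":\"MISSING\","
                + "\"dataId\":\"heartbeat\","
                + "\"interval\":300000}]";

        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}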
Additionally, this is the first Hawkular Alerting release to deliver
Alerting features in *3* different distributions:
* The alerting engine used inside Hawkular Services and supporting the
Middleware Provider in ManageIQ.
* As a Standalone alerting engine for general use.
* And soon to be released, embedded inside Hawkular Metrics!!
For more details on this release:
https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12315924&versi...
Hawkular Alerting Team
Jay Shaughnessy (jshaughn(a)redhat.com)
Lucas Ponce (lponce(a)redhat.com)
Hawkular OpenShift Agent is now available
by John Mazzitelli
(FYI: we had "underwhelming" participation in the naming poll. What was decided upon was a more descriptive name rather than some code name anyway.)
Hawkular OpenShift Agent source has been published on github.com here:
https://github.com/hawkular/hawkular-openshift-agent
For now, we'll track issues in github until we figure out what we want to do (if anything) in JIRA and where.
If interested, read the README - it provides build and run instructions.
Currently the Hawkular OpenShift Agent supports the following:
1) Watches OpenShift as pods and configmaps are added, modified, and removed
2) As things change in OpenShift, the agent adjusts what it monitors
3) All metric data is stored in Hawkular Metrics
4) Pods tell the agent what should be monitored via an annotation that names a config map. That config map contains a single YAML configuration with all the endpoint information the agent needs in order to monitor the pod and store its data. Pods can ask for multiple endpoints to be monitored, and multiple pods within the node can be monitored - but only one node is monitored per agent. If you have multiple nodes, you need one agent per node.
5) Each endpoint can have its data stored in its own tenant (as defined in the config map YAML)
6) The agent can also monitor any endpoints you define in the global agent config file - you don't need pods/config maps for this (useful if the agent is running outside of OpenShift, or if there are things on the agent's node that you want monitored without having to look up pods/configmaps).
7) Currently, Prometheus endpoints are supported (both binary and text protocols). A rough sketch of that collection flow follows this list.
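To give a feel for that flow, here is a very rough sketch of one
collect-and-store pass in plain Java. It is illustrative only - the real agent
is written in Go and driven by the config maps described above; the URLs,
tenant, and the simplistic Prometheus text parsing below are placeholders, and
only the POST to /hawkular/metrics/gauges/raw with a Hawkular-Tenant header is
meant to reflect the Hawkular Metrics REST API:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ScrapeAndStoreSketch {
    public static void main(String[] args) throws Exception {
        // 1. Scrape a Prometheus text endpoint exposed by a pod (placeholder URL).
        StringBuilder points = new StringBuilder();
        long now = System.currentTimeMillis();
        HttpURLConnection scrape = (HttpURLConnection)
                new URL("http://my-pod:8080/metrics").openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(scrape.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Skip comments; also skip labelled series to keep the sketch simple.
                if (line.isEmpty() || line.startsWith("#") || line.contains("{")) {
                    continue;
                }
                String[] parts = line.split(" ");
                if (parts.length < 2) {
                    continue;
                }
                double value;
                try {
                    value = Double.parseDouble(parts[1]);
                } catch (NumberFormatException e) {
                    continue;
                }
                if (Double.isNaN(value) || Double.isInfinite(value)) {
                    continue; // JSON cannot carry NaN/Infinity
                }
                if (points.length() > 0) {
                    points.append(',');
                }
                points.append("{\"id\":\"").append(parts[0])
                      .append("\",\"data\":[{\"timestamp\":").append(now)
                      .append(",\"value\":").append(value).append("}]}");
            }
        }

        // 2. Push the collected samples to Hawkular Metrics under the endpoint's tenant.
        HttpURLConnection push = (HttpURLConnection)
                new URL("http://hawkular-metrics:8080/hawkular/metrics/gauges/raw").openConnection();
        push.setRequestMethod("POST");
        push.setDoOutput(true);
        push.setRequestProperty("Content-Type", "application/json");
        push.setRequestProperty("Hawkular-Tenant", "my-app-tenant");
        try (OutputStream out = push.getOutputStream()) {
            out.write(("[" + points + "]").getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Hawkular Metrics responded: " + push.getResponseCode());
    }
}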
There are many things we need to get done.
* Jolokia support is not yet implemented
* Secure access (encryption and authentication) to both Hawkular-Metrics and the metric endpoints
* Details on how to run the agent within OpenShift (daemon set?)
* Tag the metrics being stored (there are no tags being associated with the metrics yet)
* Determine the names of the metrics (right now it's just using the names of the Prometheus metrics as-is)
* etc, etc, etc
Many thanks to Matt Wringe, who got this kicked off with his ideas and OpenShift integration code, which was the foundation of the current codebase.
--John Mazz
OpenShift Pet vs Cattle metaphor
by Jiri Kremser
Hello,
today I was on an L&L about storage in OpenShift and I learned an interesting
thing. I always thought that everything needs to be immutable and stateless
and that all the state needs to be handled by means of NFS persistent volumes.
Luckily, there is a feature in Kubernetes (since 1.3) that allows pods to be
treated as pets. It's called PetSet [1] and it assigns a unique ID (and a
persistent DNS record) to a pod that runs in this "mode". The common use case
for PetSet is a set of pods running relational DBs that use some kind of
master-slave replication, where the slaves need to know the master's address.
But it can be used for anything. We could use the hostname as the feed id, for
instance.
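As a tiny illustration of that last point (nothing Hawkular-specific - just
how a process inside such a pod could derive a stable feed id, assuming the
PetSet hostname really is stable across restarts):

import java.net.InetAddress;

public class FeedIdFromHostname {
    public static void main(String[] args) throws Exception {
        // In a PetSet pod the hostname is stable, so it can double as the feed id.
        String feedId = InetAddress.getLocalHost().getHostName();
        System.out.println("feed id = " + feedId);
    }
}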
I don't know how popular this will be, because it kind of defeats the purpose
of immutable infrastructure, but it could save us some work with the feed
identity. And of course we also need to support the "normal" pod scenario.
[1]: http://kubernetes.io/docs/user-guide/petset/
jk
New repo for various travis scripts?
by Joel Takvorian
Hi,
I just wonder if we should create a new git repo to store the different files
that are required for integration tests on the hawkular clients (ruby, java,
now dropwizard...). For now there are just 2 required files afaik:
".travis/wait_for_services.rb" and the docker-compose file, but there may be
more in the future.
So, rather than storing a copy of each file in each client that uses the
docker-based integration tests, wouldn't it be better to store them in a new
repo and download them from the travis script?
There's also a maven install script that I picked from inventory and copied to
the java client repo; that would fit in this scripts repo as well.
Joel