Availability metrics: aggregate stats series
by Joel Takvorian
Hello all,
I'm still aiming to add some features to the Grafana plugin. I've started
to integrate availabilities, but now I'm facing a problem when it comes to
showing aggregated availabilities; for example, think of an OpenShift pod
that is scaled up to several instances.
Since availability is basically "up" or "down" (or, to simplify away the
other states such as "unknown", say it's either "up" or "non-up"), I
propose to add this new feature: availability stats with aggregation. The
call would be parameterized with an aggregation method, either "all of" or
"any of": with "all of", the aggregated series is UP only when all its
parts are UP; with "any of", it is UP when at least one part is UP.
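To make the semantics concrete, here is a minimal sketch in Java (the enum
and class names are illustrative, not actual Hawkular Metrics code):

import java.util.List;

// Illustrative names only; this is not existing Hawkular Metrics code.
enum AggregationMethod { ALL_OF, ANY_OF }

class AvailabilityAggregator {
    // "parts" holds the up/down state of each underlying series at one instant.
    static boolean isUp(List<Boolean> parts, AggregationMethod method) {
        switch (method) {
            case ALL_OF:
                // Aggregated series is UP only when every part is UP.
                return parts.stream().allMatch(up -> up);
            case ANY_OF:
                // Aggregated series is UP when at least one part is UP.
                return parts.stream().anyMatch(up -> up);
            default:
                throw new IllegalArgumentException("Unknown method: " + method);
        }
    }
}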
It would require a new endpoint, since the AvailabilityHandler currently
only exposes stats queries with the metric id as a query parameter, which
is not suitable for multiple metrics.
Any objections or remarks about this feature?
Joel
metrics: REST API evolution when querying for multiple metrics
by Joel Takvorian
Hello,
Following the discussions here:
https://github.com/hawkular/hawkular-metrics/pull/584 I would like to bring
that up:
---------
The problem
I've run into some limitations of the current Hawkular Metrics REST API
while adding features to the Grafana plugin. The main point here is that,
because of an issue in Grafana (and/or golang) which you've certainly
already discussed before I joined, we're limited in the use of @get
endpoints when metric ids are sent as query parameters. Although it has
already been solved in the current version of the Grafana plugin, the
problem reappears when I try to add features.
But besides those Grafana-centric considerations, I think this could be a
good occasion to bring more consistency to the REST API when querying data
for multiple metrics. There are several endpoints that, in my opinion,
could be harmonized.
-----
So what's missing?
(Note: by "*" I mean "gauges", "counters", "availability" or "string", not
"metrics". When I write "/raw", the same applies to "/rate".)
- Some endpoints like "@get */stats" allow querying by a list of ids or a
list of tags, but there is no equivalent for "@get */raw", and "@post
*/raw/query" doesn't handle tags.
Note that I've already opened a JIRA ticket for query-by-tag
generalization: https://issues.jboss.org/browse/HWKMETRICS-466 . I can't
find any good reason why we can do it when querying stats but not when
querying raw data.
- On the other hand, endpoints like "@post */raw/query", used for Grafana,
have no equivalent for "stats". I would really like to have this one for
Grafana.
- The list of metrics is sometimes referred to as "ids" (class
QueryRequest), sometimes as "metrics" (all @get query params). We could add
"metrics" to QueryRequest but keep/deprecate "ids" for compatibility? (See
the sketch below.)
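As a sketch of that last point, assuming Jackson is used for the JSON
binding (the annotations below are standard Jackson; the exact QueryRequest
shape is from memory and may differ):

import java.util.List;
import com.fasterxml.jackson.annotation.JsonProperty;

// Accept both "metrics" (new) and "ids" (deprecated) in the request payload.
public class QueryRequest {
    private List<String> metrics;

    public List<String> getMetrics() { return metrics; }

    @JsonProperty("metrics")
    public void setMetrics(List<String> metrics) { this.metrics = metrics; }

    // Backward compatibility: map the old "ids" field onto "metrics".
    @Deprecated
    @JsonProperty("ids")
    public void setIds(List<String> ids) { this.metrics = ids; }
}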
-----
Suggestions:
1st Option: keep the API in the current form and bring all endpoints up to
the same level of functionality, that is:
* Create "@get */raw" that can take ids or tags
* Make "@post */raw/query" understand tags
* Create "@post */stats/query" that can take ids or tags
---
2nd Option: change the current model to dissociate queries by ids from
queries by tags.
Basically, the idea is to provide different paths depending on whether we
query by ids or by tags, to avoid the assertion we currently have ("must
provide either ids or tags but not both"). I think it's cleaner to separate
them. Of course, it has the downside of breaking (deprecating) all
query-by-tag endpoints in the current API.
So, queries by ids would be the default one:
* "@get */raw"
* "@get */stats"
* "@post */raw/query"
* "@post */stats/query"
None of them would accept tags
And query by tag would be:
* "@get */raw/tags" (or "*/raw/tags/{tags}" ?)
* "@get */stats/tags"
* "@post */raw/tags/query"
* "@post */stats/tags/query"
None of them would accept ids
---
Other Option?
=> Personally, given the downside of the 2nd option, I would rather go for
the 1st one.
"simpler" agent config ?
by John Mazzitelli
Some people have complained that the agent configuration file is too "complicated". Now, in defense of the agent, its configuration is nothing more than the WildFly standard configuration file (standalone.xml) that all WildFly subsystems use. Since the agent is a standard WildFly subsystem like all the rest, it uses the same configuration file as every other subsystem (which is nice: we look like any other subsystem, changes made to the agent config via the JBoss CLI are persisted for us, and the JBoss CLI can query the agent config like any other subsystem).
That said, it has been suggested that we provide an "easier to understand" configuration file for those that don't know or care about the WildFly internals (say, for those that want to run the swarm agent, or those that run a standalone agent managing remote resources).
So I wrote a PoC that does this. It uses YAML for a configuration file because apparently it's all the rage :-) It is in this PR: https://github.com/hawkular/hawkular-agent/pull/250
A sample config file could look like this:
========
subsystem:
  enabled: true
  storage-adapter:
    url: http://localhost:8080
    tenant-id: hawkular
    username: jdoe
    password: password
  managed-servers:
    remote-dmr:
      - name: My WildFly
        enabled: true
        host: my-host
        port: 9999
        username: admin
        password: wotgorilla?
========
Some things to note (many of these need to be addressed if this feature is to get merged into master):
1. Not everything you can configure in standalone.xml can be configured in the yaml. No metric types, resource types, etc. are configurable (though I may need to change this for the JMX stuff, because the JMX resource types are not "hard-codable" like the WildFly ones, simply because we don't know what MBeans people want to monitor). Of course, the more I make the yaml look like standalone.xml, the more it becomes just as complicated and defeats the purpose.
2. Right now the yaml is read-only. If, for example, you go through the JBoss CLI and enable a disabled managed server or you create a new one (like a new wildfly endpoint) nothing gets written out to the yaml. This is because all the wiring in WildFly Server and the JBoss CLI touches standalone.xml as this is the way things were designed to work inside of WildFly subsystems (WildFly does not want subsystems to have their own external config files - we are going against the design philosophy of the WildFly architecture here). This is not a problem for the swarm-based agent because its standalone.xml-based agent config is already read-only - so there is no difference there.
2a. Not only that, but since the JBoss CLI reads data that ultimately comes from standalone.xml, it currently will not reflect any data found in the yaml. Again, because the Swarm agent doesn't support the JBoss CLI anyway, it's a moot point for the swarm agent.
3. Security concerns are for the most part ignored in this PoC.
3a. See the password in clear text? We do not have access to the WildFly vault like we would if we put this configuration in standalone.xml. The agent can encrypt passwords in its configuration, but only if you use standalone.xml for that config (because we just rely on WildFly's vault feature). If you put the config in the external yaml config, we lose that functionality. If we want obfuscation, we have to write our own mechanism and ask people to manage it using the agent's own tools (like we would have to provide a tool to encrypt agent passwords for users to use). (caveat: I don't know if it is possible, but we can look into how the vault works and maybe the agent can make internal API calls to take encrypted text from the yaml and request its decrypted value from the vault? Would need to research this possibility).
3b. WildFly provides the ability to configure "security realms" and lets subsystems share those realms. This gives subsystems access to keystores and truststores configured in WildFly Server's security realms. These security realms are configured in standalone.xml. There is no such configuration in the yaml - in order to define security realms, you need to do it in standalone.xml (these security realm definitions are their own section in standalone.xml, outside of any subsystem config - this is why they are not in the yaml: they are WildFly configuration settings, not agent configuration). Can we define our own keystore and truststore config settings in the yaml? Yes, probably. But this would be outside of the WildFly standard config, and I'm not sure how easily implemented this would be. This is, again, something that needs to be researched.
4. Right now there are no JMX endpoints configurable in the yaml - it only supports local-dmr, remote-dmr, and remote-prometheus. It's doable, but for the PoC I limited the work to making only WildFly servers and Prometheus servers configurable in the yaml.
5. The yaml is actually an overlay - for example, if the <subsystem> entry in standalone.xml shows "enabled=true" but the yaml shows "enabled=false", the yaml is overlaid on top of standalone.xml (thus the yaml's enabled=false takes effect). If the yaml defines a remote-dmr with name "Foo" and standalone.xml defines a remote-dmr with the same name, the yaml overrides standalone.xml (any settings in the yaml take effect; if a setting is not in the yaml but is in standalone.xml, the setting from standalone.xml takes effect).
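The overlay rule in point 5 is essentially a per-setting map merge; a toy
Java illustration (not the actual agent code):

import java.util.HashMap;
import java.util.Map;

public class ConfigOverlay {
    // Start from the standalone.xml values, then let any setting that is
    // present in the yaml win; settings absent from the yaml fall through.
    static Map<String, String> overlay(Map<String, String> standaloneXml,
                                       Map<String, String> yaml) {
        Map<String, String> effective = new HashMap<>(standaloneXml);
        effective.putAll(yaml);
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> xml = new HashMap<>();
        xml.put("enabled", "true");
        xml.put("host", "my-host");
        Map<String, String> yml = new HashMap<>();
        yml.put("enabled", "false");
        // Result: enabled=false comes from the yaml, host=my-host falls
        // through from standalone.xml.
        System.out.println(overlay(xml, yml));
    }
}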
That is all. Just wanted to bring this up, let people know that we can have a "simpler" agent configuration (with limitations as described above), if we so choose. This is not going to be merged without a more in-depth discussion to see if people really want it and if we really want to support this (it adds some complexity to the implementation, and if we are to work around some of the limitations, more complexity will be introduced).
Business transaction support with opentracing API
by Gary Brown
Hi
The current release of Hawkular APM has introduced support for Zipkin clients, so that existing Zipkin-instrumented applications can now report their span data to the APM server and have it represented in the UI.
Further work is being done in this release to ensure that it works well with polyglot examples, including the Red Hat helloworld MSA example.
This means that in a JVM-only environment it is possible to use the non-intrusive Byteman approach to instrument the applications, but if you want to monitor a polyglot environment, we can now also support Zipkin-instrumented clients.
However, this means that for those Zipkin-instrumented applications it will not be possible to take advantage of the "business transaction" capabilities that are available with the non-intrusive instrumentation approach, as shown here: https://vimeo.com/167739840
Looking ahead, we are also investigating providing support for the opentracing API (http://opentracing.io/) - which will provide end users with the ability to more easily switch between backend application tracing solutions.
The purpose of this email is to look at the possible approaches that could be used to introduce business transaction support for opentracing and potentially zipkin.
1) Only support as part of our own opentracing providers
The opentracing API provides the means to record 'log' events against the spans being reported: https://github.com/opentracing/opentracing-java/blob/master/opentracing-a...
As shown in this javascript example:
http.request(opts, function (res) {
  res.setEncoding('utf8');
  res.on('error', function (err) {
    span.logEvent('request_error', err);
    span.finish();
  });
  res.on('data', function (chunk) {
    span.logEvent('data_received', chunk);
  });
  res.on('end', function (err) {
    span.logEvent('request_end', err);
    span.finish();
  });
}).end();
The log event can be used to record data - in this case from the response.
Our opentracing provider could use this mechanism to process request or response data and extract relevant business properties.
The benefit of this approach is that the API used by the application is 'standard' - they just need to ensure the relevant business data is supplied as log events, and then define a business transaction config to process that data to extract the relevant information.
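For a Java application, option 1 could look something like this (assuming
the 0.x opentracing-java API, where Span has a log(eventName, payload)
method; the operation and event names here are made up):

import io.opentracing.Span;
import io.opentracing.Tracer;

public class OrderService {

    private final Tracer tracer; // would be our Hawkular APM opentracing provider

    public OrderService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void processOrder(String orderJson) {
        Span span = tracer.buildSpan("process-order").start();
        try {
            // Log the business payload as a span event; a business transaction
            // config on the APM server would then extract properties from it.
            span.log("order_received", orderJson);
            // ... handle the order ...
        } finally {
            span.finish();
        }
    }
}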
2) Provide additional module used by application above the zipkin/opentracing API
This would require the application to deal with a Hawkular-APM specific client library to preprocess their business messages and then associate relevant properties with the zipkin/opentracing spans.
This approach obviously does not provide a standard API to the application for this aspect and, as it can only deal with the spans, is potentially limited in the actions it can perform - i.e. possibly only extracting business properties and representing them as tags (in opentracing) or binary annotations (in zipkin).
My current preference is option 1 as it encapsulates all of the functionality in our own opentracing providers (one per language supported), and gives us a bit more flexibility in terms of what the business txn config actions can perform.
Any feedback/suggestions welcome.
Regards
Gary
Hawkular Alerting 1.1.3.Final has been released!
by Jay Shaughnessy
Hawkular Alerting 1.1.3.Final has been released!
* [HWKALERTS-155] - Failed schema creation because of
OperationTimedOutException
o A useful bug fix
* [HWKALERTS-156] - Propagate type from metrics to alerting
o This is part of the overall hawkular-1102 solution, the alerting
side of improved filtering between metrics and alerting
* [HWKALERTS-157] - Sequential processing of ConditionEval could result
in false alerting
o *Important* bug fix for clients using multi-condition triggers!
For more details on this release:
https://issues.jboss.org/projects/HWKALERTS/versions/12331263
Hawkular Alerting Team
Jay Shaughnessy (jshaughn(a)redhat.com)
Lucas Ponce (lponce(a)redhat.com)
Re: [Hawkular-dev] Hawkular Services 0.0.12.Final released
by Heiko W.Rupp
And now the same for 0.12 with updates to alerts and metrics.
Grab it while it is hot :)
https://github.com/hawkular/hawkular-services/releases/tag/0.0.12.Final
On 6 Sep 2016, at 11:44, Juraci Paixão Kröhling wrote:
> Team,
>
> Hawkular Services 0.0.11.Final has (finally) been released.
>
> As with the previous distributions, the Agent has to be configured with a
> user. This can be accomplished by:
>
> - Adding a user via bin/add-user.sh like:
> ./bin/add-user.sh \
> -a \
> -u <theusername> \
> -p <thepassword> \
> -g read-write,read-only
>
> - Changing the Agent's credentials in standalone.xml to the credentials
> from the previous step or by passing hawkular.rest.user /
> hawkular.rest.password as system properties
> (-Dhawkular.rest.user=jdoe)
>
> You can find the release packages, sources and checksums at GitHub, in
> addition to Maven:
>
> https://github.com/hawkular/hawkular-services/releases/tag/0.0.11.Final
>
> Shortcuts for the downloads:
> Zip - https://git.io/viGKO
> tar.gz - https://git.io/viGKL
>
> - Juca.
--
Reg. Adresse: Red Hat GmbH, Technopark II, Haus C,
Werner-von-Siemens-Ring 14, D-85630 Grasbrunn
Handelsregister: Amtsgericht München HRB 153243
Geschäftsführer: Charles Cachera, Michael Cunningham, Michael O'Neill,
Eric Shander
Hawkular Metrics 0.19.0 - Release
by Stefan Negrea
Hello Everybody,
I am happy to announce release 0.19.0 of Hawkular Metrics. This release is
anchored by performance enhancements and a lot of REST API enhancements.
Here is a list of major changes:
1. *REST API - Query Improvements*
- It is now possible to use relative timestamps when querying for data
via `+` or `-` added to timestamp parameters. When used, the
timestamp is relative to the local system timestamp (HWKMETRICS-358
<https://issues.jboss.org/browse/HWKMETRICS-358>, HWKMETRICS-457
<https://issues.jboss.org/browse/HWKMETRICS-457>)
- Querying for data from earliest available data point has been
expanded to raw data queries for gauges and counters (HWKMETRICS-435
<https://issues.jboss.org/browse/HWKMETRICS-435>)
- `DELETE tenants/{id}` has been added to allow the deletion of an
entire tenant (HWKMETRICS-446
<https://issues.jboss.org/browse/HWKMETRICS-446>)
- AvailabilityType is serialized as simple string in bucket data
points (HWKMETRICS-436
<https://issues.jboss.org/browse/HWKMETRICS-436>)
2. *Performance Enhancements*
- The write performance has been increased by 10% across the board after
reorganizing the meta-data internal indexes. (HWKMETRICS-422
<https://issues.jboss.org/browse/HWKMETRICS-422>)
   - GZIP replaced LZ4 as the compression algorithm for the data table.
This produces anywhere between 60% and 70% disk usage savings over LZ4. (
HWKMETRICS-454 <https://issues.jboss.org/browse/HWKMETRICS-454>)
3. *Job Scheduler - Improvements*
- The newly introduced internal job scheduler received several
improvements and fixes; the main focus was to enhance the scalability and
fault tolerance.
- The job scheduler will be used only for internal tasks.
- For more details: HWKMETRICS-221
<https://issues.jboss.org/browse/HWKMETRICS-221>, HWKMETRICS-451
<https://issues.jboss.org/browse/HWKMETRICS-451>, HWKMETRICS-453
<https://issues.jboss.org/browse/HWKMETRICS-453>, HWKMETRICS-94
<https://issues.jboss.org/browse/HWKMETRICS-94>
*Hawkular Metrics Clients*
- Python: https://github.com/hawkular/hawkular-client-python
- Go: https://github.com/hawkular/hawkular-client-go
- Ruby: https://github.com/hawkular/hawkular-client-ruby
- Java: https://github.com/hawkular/hawkular-client-java
*Release Links*
Github Release:
https://github.com/hawkular/hawkular-metrics/releases/tag/0.19.0
JBoss Nexus Maven artifacts:
http://origin-repository.jboss.org/nexus/content/repositories/public/org/...
Jira release tracker:
https://issues.jboss.org/browse/HWKMETRICS/fixforversion/12331192
A big "Thank you" goes to John Sanda, Thomas Segismont, Mike Thompson, Matt
Wringe, Michael Burman, Joel Takvorian, and Heiko Rupp for their project
contributions.
Thank you,
Stefan Negrea
managing cassandra cluster
by John Sanda
To date we haven’t really done anything by way of managing/monitoring the Cassandra cluster. We need to monitor Cassandra in order to know things like:
* When additional nodes are needed
* When disk space is low
* When I/O is too slow
* When more heap space is needed
Cassandra exposes a lot of metrics. I created HWKMETRICS-448, which briefly talks about collecting metrics from Cassandra. In terms of managing the cluster, I will provide a few concrete examples that have come up recently in OpenShift.
Scenario 1: User deploys additional node(s) to reduce the load on cluster
After the new node has bootstrapped and is running, we need to run nodetool cleanup on each node (or run it via JMX) in order to remove keys/data that each node no longer owns; otherwise, disk space won't be freed up. The cleanup operation can potentially be resource intensive as it triggers compactions. Given this, we probably want to run it one node at a time, as sketched below. Right now the user is left to do this manually.
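For illustration, driving that over JMX could look like the following
sketch. The MBean name and operation match Cassandra's StorageService as I
understand it for the 2.x line, but treat that as an assumption to verify
against the Cassandra version in use; the host and keyspace names are
examples:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClusterCleanup {

    public static void main(String[] args) throws Exception {
        String[] nodes = {"node1", "node2", "node3"}; // example hosts
        for (String node : nodes) {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + node + ":7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ObjectName storageService =
                        new ObjectName("org.apache.cassandra.db:type=StorageService");
                // Equivalent of "nodetool cleanup <keyspace>". Done one node
                // at a time on purpose, since cleanup triggers compactions.
                mbsc.invoke(storageService, "forceKeyspaceCleanup",
                        new Object[] { "hawkular_metrics", new String[0] },
                        new String[] { String.class.getName(), String[].class.getName() });
            }
        }
    }
}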
Scenario 2: User deploys additional node(s) to get replication and fault tolerance
I connect to Cassandra directly via cqlsh and update replication_factor. I then need to run repair on each node, which can be tricky because 1) it is resource intensive, 2) it can take a long time, 3) it is prone to failure, and 4) Cassandra does not give progress indicators.
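The replication change itself is just one CQL statement; for example, with
the DataStax Java driver (keyspace name and replication settings are
examples only, and the per-node repair afterwards is still a separate step):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class IncreaseReplication {

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build();
             Session session = cluster.connect()) {
            // Same statement you would run in cqlsh; repair must then be run
            // on each node for the new replicas to be populated.
            session.execute(
                    "ALTER KEYSPACE hawkular_metrics WITH replication = " +
                    "{'class': 'SimpleStrategy', 'replication_factor': 2}");
        }
    }
}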
Scenario 3: User sets up regularly scheduled repair to ensure data is consistent across the cluster
Once replication_factor > 1, repair needs to be run on a regular basis. More specifically, it should be run within gc_grace_seconds, which is configured per table and defaults to 10 days. The data table in metrics has gc_grace_seconds reduced to 1 day, and we could probably reduce it to zero since the table is append-only. The value for gc_grace_seconds might vary per table based on access patterns, which means the frequency of repair should vary as well.
There has already been some discussion of these things for Hawkular Metrics in the context of OpenShift, and it applies to all of Hawkular Services as well. Initially I was thinking about building some management components directly into metrics, but it probably makes more sense as a separate, shared component (or components) that can be reused both in standalone metrics in OpenShift and in a full Hawkular Services deployment in MiQ, for example.
We are already running into these scenarios in OpenShift and probably need to start putting something in place sooner rather than later.