"simpler" agent config ?
by John Mazzitelli
Some people have complained that the agent configuration file is too "complicated". In the agent's defense, its configuration is nothing more than the standard WildFly configuration file (standalone.xml) that all WildFly subsystems use. Since the agent is a standard WildFly subsystem like all the rest, it uses the same configuration file as every other subsystem (which is nice: we look like any other subsystem, changes made to the agent config via the JBoss CLI are persisted for us, and the JBoss CLI can query the agent config like any other subsystem).
That said, it has been suggested that we provide an "easier to understand" configuration file for those who don't know or care about the WildFly internals (say, those who want to run the swarm agent, or those who run a standalone agent managing remote resources).
So I wrote a PoC that does this. It uses YAML for the configuration file because apparently it's all the rage :-) It is in this PR: https://github.com/hawkular/hawkular-agent/pull/250
A sample config file could look like this:
========
subsystem:
  enabled: true
  storage-adapter:
    url: http://localhost:8080
    tenant-id: hawkular
    username: jdoe
    password: password
  managed-servers:
    remote-dmr:
      - name: My WildFly
        enabled: true
        host: my-host
        port: 9999
        username: admin
        password: wotgorilla?
========
Some things to note (many of these need to be addressed if this feature is to get merged into master):
1. Not everything you can configure in standalone.xml can be configured in the yaml. No metric types, resource types, etc. are configurable (though I may need to change this for the JMX stuff, because the JMX resource types are not "hard-codable" like the WildFly ones, simply because we don't know what MBeans people want to monitor). Of course, the more I make the yaml look like standalone.xml, the more it becomes just as complicated and defeats the purpose.
2. Right now the yaml is read-only. If, for example, you go through the JBoss CLI and enable a disabled managed server, or you create a new one (like a new WildFly endpoint), nothing gets written out to the yaml. This is because all the wiring in WildFly Server and the JBoss CLI touches standalone.xml, as this is the way things were designed to work inside WildFly subsystems (WildFly does not want subsystems to have their own external config files - we are going against the design philosophy of the WildFly architecture here). This is not a problem for the swarm-based agent because its standalone.xml-based agent config is already read-only - so there is no difference there.
2a. Not only that, but since the JBoss CLI reads data that ultimately comes from standalone.xml, it currently will not reflect any data found in the yaml. Again, because the Swarm agent doesn't support the JBoss CLI anyway, it's a moot point for the swarm agent.
3. Security concerns are for the most part ignored in this PoC.
3a. See the password in clear text? We do not have access to the WildFly vault like we would if we put this configuration in standalone.xml. The agent can encrypt passwords in its configuration, but only if you use standalone.xml for that config (because we just rely on WildFly's vault feature). If you put the config in the external yaml file, we lose that functionality. If we want obfuscation, we have to write our own mechanism and ask people to manage it using the agent's own tools (for example, we would have to provide a tool for users to encrypt agent passwords). (Caveat: I don't know if it is possible, but we can look into how the vault works - maybe the agent can make internal API calls to take encrypted text from the yaml and request its decrypted value from the vault? Would need to research this possibility.)
3b. WildFly provides the ability to configure "security realms" and allows subsystems to share those realms. This gives subsystems access to keystores and truststores configured in WildFly Server's security realms. These security realms are configured in standalone.xml. There is no such configuration in the yaml - in order to define security realms you need to do it in standalone.xml (these security realm definitions are their own section in standalone.xml, outside of any subsystem config - this is why they are not in the yaml, because they are WildFly configuration settings, not agent configuration). Can we define our own keystore and truststore config settings in the yaml? Yes, probably. But this would be outside of the WildFly standard config, and I'm not sure how easily implemented this would be. This is, again, something that needs to be researched.
4. Right now there are no JMX endpoints configurable in the yaml - it only supports local-dmr, remote-dmr, and remote-prometheus. It's doable, but for the PoC I limited the work so that only WildFly servers and Prometheus servers are configurable in the yaml.
5. The yaml is actually an overlay - for example, if the <subsystem> entry in standalone.xml shows "enabled=true" but the yaml shows "enabled=false", the yaml is overlayed on top of standalone.xml (thus the yaml's enabled=false takes effect). If the yaml defines a remote-dmr named "Foo" and standalone.xml defines a remote-dmr with the same name, the yaml overrides standalone.xml (any settings in the yaml take effect; if a setting is not in the yaml but is in standalone.xml, that setting from standalone.xml takes effect).
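To make the overlay concrete, here is a hypothetical example (the remote-dmr name and values are invented for illustration):

```yaml
# Suppose standalone.xml defines remote-dmr "Foo" with enabled=true,
# port=9999, username=admin. The yaml below overrides only what it names:
subsystem:
  managed-servers:
    remote-dmr:
      - name: Foo
        enabled: false   # yaml wins: effective enabled=false
        # port and username are not set here, so port=9999 and
        # username=admin from standalone.xml remain in effect
```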
That is all. Just wanted to bring this up and let people know that we can have a "simpler" agent configuration (with the limitations described above), if we so choose. This is not going to be merged without a more in-depth discussion to see if people really want it and if we really want to support it (it adds some complexity to the implementation, and if we are to work around some of the limitations, more complexity will be introduced).
9 years, 5 months
Business transaction support with opentracing API
by Gary Brown
Hi
The current release of Hawkular APM has introduced support for zipkin clients, so that existing zipkin instrumented applications can now report their span data to the APM server and have it represented in the UI.
Further work is being done in this release to ensure that it works well with polyglot examples, including the Red Hat helloworld MSA example.
This means that in a JVM-only environment it is possible to use the non-intrusive byteman approach to instrument the applications, but if you want to monitor a polyglot environment, we can now also support zipkin instrumented clients.
However, this means that for those zipkin instrumented applications it will not be possible to take advantage of the "business transaction" capabilities that are available with the non-intrusive instrumentation approach, as shown here: https://vimeo.com/167739840
Looking ahead, we are also investigating providing support for the opentracing API (http://opentracing.io/) - which will provide end users with the ability to more easily switch between backend application tracing solutions.
The purpose of this email is to look at the possible approaches that could be used to introduce business transaction support for opentracing and potentially zipkin.
1) Only support as part of our own opentracing providers
The opentracing API provides the means to record 'log' events against the spans being reported: https://github.com/opentracing/opentracing-java/blob/master/opentracing-a...
As shown in this javascript example:
http.request(opts, function (res) {
    res.setEncoding('utf8');
    res.on('error', function (err) {
        span.logEvent('request_error', err);
        span.finish();
    });
    res.on('data', function (chunk) {
        span.logEvent('data_received', chunk);
    });
    res.on('end', function (err) {
        span.logEvent('request_end', err);
        span.finish();
    });
}).end();
The log event can be used to record data - in this case, from the response.
Our opentracing provider could use this mechanism to process request or response data and extract relevant business properties.
The benefit of this approach is that the API used by the application is 'standard' - they just need to ensure the relevant business data is supplied as log events, and then define a business transaction config to process that data to extract the relevant information.
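To sketch option 1 in Java: the class and method names below (BusinessTxnExtractor, extract) are hypothetical, not actual Hawkular APM or opentracing API - this only illustrates how a provider could map configured log event names to business properties extracted from log payloads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not real Hawkular APM code.
public class BusinessTxnExtractor {

    // Business txn config: log event name -> business property to record.
    private final Map<String, String> eventToProperty;

    public BusinessTxnExtractor(Map<String, String> eventToProperty) {
        this.eventToProperty = eventToProperty;
    }

    /**
     * Called by our opentracing provider for each log event reported against a span.
     * If the event name appears in the business txn config, its payload is recorded
     * as a business property; otherwise nothing is extracted.
     */
    public Map<String, String> extract(String eventName, String payload) {
        Map<String, String> properties = new HashMap<>();
        String propertyName = eventToProperty.get(eventName);
        if (propertyName != null) {
            properties.put(propertyName, payload);
        }
        return properties;
    }
}
```

The application keeps using the standard API (it just emits log events); only the config decides which events feed business transaction properties.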
2) Provide additional module used by application above the zipkin/opentracing API
This would require the application to deal with a Hawkular-APM specific client library to preprocess their business messages and then associate relevant properties with the zipkin/opentracing spans.
This approach obviously does not provide a standard API to the application for this aspect, and as it can only deal with the spans, it is potentially limited in the actions it can perform - i.e. possibly only extracting business properties and representing them as tags (in opentracing) or binary annotations (in zipkin).
My current preference is option 1 as it encapsulates all of the functionality in our own opentracing providers (one per language supported), and gives us a bit more flexibility in terms of what the business txn config actions can perform.
Any feedback/suggestions welcome.
Regards
Gary
9 years, 5 months
Hawkular Alerting 1.1.3.Final has been released!
by Jay Shaughnessy
Hawkular Alerting 1.1.3.Final has been released!
* [HWKALERTS-155] - Failed schema creation because of OperationTimedOutException
  o A useful bug fix
* [HWKALERTS-156] - Propagate type from metrics to alerting
  o This is part of the overall hawkular-1102 solution, the alerting side of improved filtering between metrics and alerting
* [HWKALERTS-157] - Sequential processing of ConditionEval could create false alerts
  o *Important* bug fix for clients using multi-condition triggers!
For more details for this release:
https://issues.jboss.org/projects/HWKALERTS/versions/12331263
Hawkular Alerting Team
Jay Shaughnessy (jshaughn(a)redhat.com)
Lucas Ponce (lponce(a)redhat.com)
9 years, 5 months
Re: [Hawkular-dev] Hawkular Services 0.0.12.Final released
by Heiko W.Rupp
And now the same for 0.12 with updates to alerts and metrics.
Grab it while it is hot :)
https://github.com/hawkular/hawkular-services/releases/tag/0.0.12.Final
On 6 Sep 2016, at 11:44, Juraci Paixão Kröhling wrote:
> Team,
>
> Hawkular Services 0.0.11.Final has (finally) been released.
>
> As with the previous distributions, the Agent has to be configured with a
> user. This can be accomplished by:
>
> - Adding a user via bin/add-user.sh like:
> ./bin/add-user.sh \
> -a \
> -u <theusername> \
> -p <thepassword> \
> -g read-write,read-only
>
> - Changing the Agent's credentials in standalone.xml to the credentials
> from the previous step, or by passing hawkular.rest.user /
> hawkular.rest.password as system properties
> (-Dhawkular.rest.user=jdoe)
>
> You can find the release packages, sources and checksums at GitHub, in
> addition to Maven:
>
> https://github.com/hawkular/hawkular-services/releases/tag/0.0.11.Final
>
> Shortcuts for the downloads:
> Zip - https://git.io/viGKO
> tar.gz - https://git.io/viGKL
>
> - Juca.
> _______________________________________________
> hawkular-dev mailing list
> hawkular-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>
--
Reg. Adresse: Red Hat GmbH, Technopark II, Haus C,
Werner-von-Siemens-Ring 14, D-85630 Grasbrunn
Handelsregister: Amtsgericht München HRB 153243
Geschäftsführer: Charles Cachera, Michael Cunningham, Michael O'Neill,
Eric Shander
9 years, 5 months
Hawkular Metrics 0.19.0 - Release
by Stefan Negrea
Hello Everybody,
I am happy to announce release 0.19.0 of Hawkular Metrics. This release is
anchored by performance improvements and a number of REST API enhancements.
Here is a list of major changes:
1. *REST API - Query Improvements*

 - It is now possible to use relative timestamps when querying for data via `+` or `-` added to timestamp parameters. When used, the timestamp is relative to the local system timestamp (HWKMETRICS-358 <https://issues.jboss.org/browse/HWKMETRICS-358>, HWKMETRICS-457 <https://issues.jboss.org/browse/HWKMETRICS-457>)
 - Querying for data from the earliest available data point has been expanded to raw data queries for gauges and counters (HWKMETRICS-435 <https://issues.jboss.org/browse/HWKMETRICS-435>)
 - `DELETE tenants/{id}` has been added to allow the deletion of an entire tenant (HWKMETRICS-446 <https://issues.jboss.org/browse/HWKMETRICS-446>)
 - AvailabilityType is serialized as a simple string in bucket data points (HWKMETRICS-436 <https://issues.jboss.org/browse/HWKMETRICS-436>)

2. *Performance Enhancements*

 - Write performance has been increased by 10% across the board after reorganizing the internal meta-data indexes (HWKMETRICS-422 <https://issues.jboss.org/browse/HWKMETRICS-422>)
 - GZIP replaced LZ4 as the compression algorithm for the data table. This produces between 60% and 70% disk usage savings over LZ4 (HWKMETRICS-454 <https://issues.jboss.org/browse/HWKMETRICS-454>)

3. *Job Scheduler - Improvements*

 - The newly introduced internal job scheduler received several improvements and fixes; the main focus was to enhance scalability and fault tolerance.
 - The job scheduler will be used only for internal tasks.
 - For more details: HWKMETRICS-221 <https://issues.jboss.org/browse/HWKMETRICS-221>, HWKMETRICS-451 <https://issues.jboss.org/browse/HWKMETRICS-451>, HWKMETRICS-453 <https://issues.jboss.org/browse/HWKMETRICS-453>, HWKMETRICS-94 <https://issues.jboss.org/browse/HWKMETRICS-94>
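As a sketch of what the relative-timestamp parameters mean for a client, something like the following resolves them to absolute values (the millisecond-offset interpretation is my assumption here; see the HWKMETRICS issues above for the exact semantics):

```java
// Resolves a timestamp query parameter that may be relative ("+5000", "-5000")
// or absolute ("1473168000000"). Offsets are taken as milliseconds from "now".
public class RelativeTimestamps {
    public static long resolve(String param, long nowMillis) {
        if (param.startsWith("+")) {
            return nowMillis + Long.parseLong(param.substring(1));
        }
        if (param.startsWith("-")) {
            return nowMillis - Long.parseLong(param.substring(1));
        }
        return Long.parseLong(param);
    }
}
```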
*Hawkular Metrics Clients*
- Python: https://github.com/hawkular/hawkular-client-python
- Go: https://github.com/hawkular/hawkular-client-go
- Ruby: https://github.com/hawkular/hawkular-client-ruby
- Java: https://github.com/hawkular/hawkular-client-java
*Release Links*
Github Release:
https://github.com/hawkular/hawkular-metrics/releases/tag/0.19.0
JBoss Nexus Maven artifacts:
http://origin-repository.jboss.org/nexus/content/repositories/public/org/...
Jira release tracker:
https://issues.jboss.org/browse/HWKMETRICS/fixforversion/12331192
A big "Thank you" goes to John Sanda, Thomas Segismont, Mike Thompson, Matt
Wringe, Michael Burman, Joel Takvorian, and Heiko Rupp for their project
contributions.
Thank you,
Stefan Negrea
9 years, 5 months
managing cassandra cluster
by John Sanda
To date we haven’t really done anything by way of managing/monitoring the Cassandra cluster. We need to monitor Cassandra in order to know things like:
* When additional nodes are needed
* When disk space is low
* When I/O is too slow
* When more heap space is needed
Cassandra exposes a lot of metrics. I created HWKMETRICS-448. It briefly talks about collecting metrics from Cassandra. In terms of managing the cluster, I will provide a few concrete examples that have come up recently in OpenShift.
Scenario 1: User deploys additional node(s) to reduce the load on cluster
After the new node has bootstrapped and is running, we need to run nodetool cleanup on each node (or run it via JMX) in order to remove keys/data that each node no longer owns; otherwise, disk space won't be freed up. The cleanup operation can be resource intensive because it triggers compactions, so we probably want to run it one node at a time. Right now the user is left to do this manually.
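The orchestration we would eventually want could look like this sketch (the cleanup runner is a stand-in for however cleanup is actually invoked on a node - shelling out to nodetool or a JMX call - not a real API):

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch only: "runCleanup" is a placeholder, not actual Hawkular or Cassandra API.
public class SequentialCleanup {

    /**
     * Runs cleanup on one node at a time. Each call is expected to block until
     * that node's cleanup finishes, so the compaction load triggered by cleanup
     * never hits more than one node at once.
     */
    public static void cleanupOneNodeAtATime(List<String> nodes, Consumer<String> runCleanup) {
        for (String node : nodes) {
            runCleanup.accept(node);
        }
    }
}
```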
Scenario 2: User deploys additional node(s) to get replication and fault tolerance
I connect to Cassandra directly via cqlsh and update replication_factor. I then need to run repair on each node, which can be tricky because 1) it is resource intensive, 2) it can take a long time, 3) it is prone to failure, and 4) Cassandra does not give progress indicators.
Scenario 3: User sets up a regularly scheduled repair to ensure data is consistent across the cluster
Once replication_factor > 1, repair needs to be run on a regular basis. More specifically, it should be run within gc_grace_seconds, which is configured per table and defaults to 10 days. The data table in metrics has reduced gc_grace_seconds to 1 day, and we will probably reduce it to zero since the table is append-only. The value for gc_grace_seconds might vary per table based on access patterns, which means the frequency of repair should vary as well.
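A scheduling component would have to derive a repair cadence from the per-table gc_grace_seconds values. A minimal sketch (the "half of the smallest grace period" safety margin is my assumption, not an established policy):

```java
import java.util.Map;

// Sketch: pick a repair period that stays well inside the smallest positive
// gc_grace_seconds across tables.
public class RepairScheduler {

    private static final long DEFAULT_GC_GRACE = 10L * 24 * 3600; // Cassandra default: 10 days

    /** Returns a repair period (in seconds) at half the smallest positive gc_grace_seconds. */
    public static long repairPeriodSeconds(Map<String, Long> gcGraceByTable) {
        long min = gcGraceByTable.values().stream()
                .filter(g -> g > 0) // gc_grace_seconds == 0: append-only table, no repair driver
                .min(Long::compare)
                .orElse(DEFAULT_GC_GRACE);
        return min / 2;
    }
}
```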
There has already been some discussion of these things for Hawkular Metrics in the context of OpenShift. It applies to all of Hawkular Services as well. Initially I was thinking about building some management components directly into metrics, but it probably makes more sense as a separate, shared component (or components) that can be reused both in standalone metrics in OpenShift and in a full Hawkular Services deployment in MiQ, for example.
We are already running into these scenarios in OpenShift and probably need to start putting something in place sooner rather than later.
9 years, 5 months
[Inventory] Performance of Tinkerpop3 backends
by Lukas Krejci
Hi all,
to move inventory forward, we need to port it to Tinkerpop3 - a new(ish) and
actively maintained version of the Tinkerpop graph API.
Apart from the huge improvement in the API expressiveness and capabilities,
the important thing is that it comes with a variety of backends, 2 of which
are of particular interest to us ATM. The Titan backend (with Titan in version
1.0) and SQL backend (using the sqlg library).
The SQL backend is a much improved (yet still unfinished in terms of
optimizations and some corner case features) version of the toy SQL backend
for Tinkerpop2.
Back in March I ran performance comparisons for SQL/postgres and Titan (0.5.4)
on Tinkerpop2 and concluded that Titan was the best choice then.
After completing a simplistic port of inventory to Tinkerpop3 (not taking
advantage of any new features or opportunities to simplify inventory
codebase), I've run the performance tests again for the 2 new backends - Titan
1.0 and Sqlg (on postgres).
This time the results are not so clear as the last time.
From the charts [1] you can see that Postgres is actually quite a bit faster
on reads and can better handle concurrent read access while Titan shines in
writes (arguably thanks to Cassandra as its storage).
Of course, I can imagine that the read performance advantage of Postgres would
decrease with the growing amount of data stored (the tests ran with the
inventory size of ~10k entities) but I am quite positive we'd get competitive
read performance from both solutions up to the sizes of inventory we
anticipate (100k-1M entities).
Now the question is whether the insert performance is something we should be
too worried about in Postgres. IMHO, there should be some room for
improvement in Sqlg, and our move to /sync for agent synchronization would
also make this less of a problem (because there would not be that many initial
imports creating vast amounts of entities).
Nevertheless, I currently cannot say which is the "winner" here. Each backend has
its pros and cons:
Titan:
Pros:
- high write throughput
- backed by cassandra
Cons:
- slower reads
- project virtually dead
- complex codebase (self-made fixes unlikely)
Sqlg:
Pros:
- small codebase
- everybody knows SQL
- faster reads
- faster concurrent reads
Cons:
- slow writes
- another backend needed (Postgres)
Therefore my intention here is to go forward with a "proper" port to
Tinkerpop3 with Titan still enabled but focus primarily on Sqlg to see if we
can do anything with the write performance.
IMHO, any choice we make is "workable" even as it is today, but we need to
weigh in the productization requirements. For those, Sqlg with its small dep
footprint and postgres backend seems preferable to the huge dependency mess of
Titan.
[1] https://dashboards.ly/ua-TtqrpCXcQ3fnjezP5phKhc
--
Lukas Krejci
9 years, 6 months
Assertj framework
by Joel Takvorian
Hello,
I'd like to propose the addition of the "assertj" lib in test scope for
hawkular-metrics (and/or other modules).
If you don't know it, assertj is basically a "fluent assertion framework".
You write assertions like:

assertThat(result).extracting(DataPoint::getTimestamp).containsExactly(0L, 1L, 5L, 7L, 10L);
It has a very rich collection of assertions, I find it particularly
powerful when working on collections, whether they are sorted or not,
whether they have nested objects or not.
Also, the fact that it's "fluent" helps a lot when you write a test,
because your IDE auto-completion helps you find the assertion you're
looking for - something that is not so easy with hamcrest.
I see it as a kind of virtuous cycle: tests are easier to write, easier to
read, so you write more tests / better tests.
Would you be ok to give it a try?
If you want to read about it: official site is there
http://joel-costigliola.github.io/assertj/
or an article that promotes it nicely:
https://www.infoq.com/articles/custom-assertions
or an example of file I'd like to push, if you're ok with assertj :) :
https://github.com/jotak/hawkular-metrics/blob/72e2f95f7e19c9433ce44ee83d...
Joel
9 years, 6 months