Re: [Hawkular-dev] agent running as a VM javaagent - non-EAP based solution
by John Mazzitelli
As for the question "Did you try deployments over this?" - I have been writing a bunch of itests. I basically copied the itests we already have and tweaked them to run against the javaagent. So far, so good. All of the tests pass in this PR with virtually no changes to the core engine (they pass on my box; who knows what Travis is doing - sometimes they pass, sometimes they don't).
https://github.com/hawkular/hawkular-agent/pull/302
I need to create two more sets of tests before I can say it all works the same as the wildfly agent without breakage:
1) domain tests (I've got the standalone tests passing - including deployments - just have to test domain mode)
2) immutability tests (show that the agent won't try to change anything if it's immutable)
Tests I'm building are here:
https://github.com/jmazzitelli/hawkular-agent/tree/refactor-core/hawkular...
NOTE: the agents (both javaagent and wildfly agent) now support invoking JMX operations in both local and remote JMX servers in my branch - the old agent did not have this support.
6 years, 2 months
agent running as a VM javaagent - non-EAP based solution
by John Mazzitelli
Right now, the Hawkular WildFly Agent runs as a WildFly subsystem extension - which means it must run in EAP.
To avoid this requirement, I refactored the agent code [1] so it can be run using a standard VM javaagent (using the VM argument -javaagent).
This means the agent need not run as a subsystem extension; in fact, it can run in any VM (non-EAP/WildFly based). Just pass the -javaagent command line argument to your VM and you've got an agent. You have a Karaf container in a JVM exposing metrics via JMX? You can use this. You've got a vert.x server in a JVM exposing metrics via JMX? You can use this.
This new agent's config file is a single YAML file that mimics virtually the same settings as you see today in the agent's <subsystem> XML in standalone.xml.
Like the EAP-based agent, this new agent can still:
* collect DMR metrics and inventory from any EAP 6.4, EAP 7.x, or WildFly 10+ servers - local or remote.
* collect JMX metrics and inventory from any remote Jolokia endpoint or local JMX MBeanServer
* receive websocket messages from a Hawkular Server (if running in full Hawkular mode)
* run in metrics-only mode where it only stores data to Hawkular-Metrics and does not connect via websocket to the server nor stores inventory to H-Inventory
* Talk to the Hawkular Server and remote DMR/JMX endpoints via SSL/https using defined security-realms/keystores
Unlike the EAP-based agent, this new agent:
* does NOT have a CLI (jboss-cli.sh is the EAP-based agent's CLI)
* does NOT have a built-in configuration persistence mechanism (EAP-based agents can persist their config changes to standalone.xml when those changes are made via the WildFly management interface - because the javaagent is completely independent, it has nothing like this for its YAML config).
* does NOT integrate with the lifecycle of any EAP/WildFly running in the same VM (in other words, if you ask to "reload" the EAP server, the javaagent knows nothing about that and won't also itself reload. Remember, this javaagent is now completely independent of any co-located EAP/WildFly server).
In addition, if your JVM is installed in OpenShift, you have the nice option of putting the javaagent's YAML config in a config map and mounting it so your javaagent can access it. This means you can edit your javaagent YAML directly in the OpenShift UI :)
I still need to see how well this supports the disabling/enabling of metrics on the fly from the websocket commands (e.g. I will need to add the ability to change the YAML config to persist the settings, that is not done yet - the EAP-based agent got this support for free - its config changes go into standalone.xml).
Other than persisting changes to the YAML file, I believe this javaagent will work essentially the same as the current agent (indeed, the core agent engine is identical - it's the same code). Just have to make sure the websocket commands can work considering there is a YAML config backing the settings and not standalone.xml.
I need people's feedback on this. Thoughts? If you want to try it, you can build it from my PR right now using the README [2] and the instructions [3].
--John Mazz
[1] https://github.com/hawkular/hawkular-agent/pull/302
[2] https://github.com/jmazzitelli/hawkular-agent/blob/refactor-core/hawkular...
[3] Instructions to build using the PR and test it by collecting EAP metrics:
1) Build it:
$ git clone git@github.com:hawkular/hawkular-agent.git
$ cd hawkular-agent
$ git checkout -b jmazzitelli-refactor-core master
$ git pull https://github.com/jmazzitelli/hawkular-agent.git refactor-core
$ mvn clean install
2) Install it - assuming you have a base EAP 6.4 or 7.x or WildFly 10+ installed at <eap.dir>
2a) Change this line in <eap.dir>/bin/standalone.conf:
JBOSS_MODULES_SYSTEM_PKGS="org.jboss.byteman"
to this:
JBOSS_MODULES_SYSTEM_PKGS="org.jboss.byteman,org.jboss.logmanager"
2b) Inside that same <eap.dir>/bin/standalone.conf you add this line in the
appropriate place (you know where, just under where it sets JAVA_OPTS):
JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:$JBOSS_HOME/bin/hawkular-javaagent-*.jar=config=$JBOSS_HOME/standalone/configuration/real-config*.yaml,delay=10"
2c) Copy the binary/config files you built to that EAP / WildFly install:
cp hawkular-agent/hawkular-javaagent/target/hawkular-javaagent-*-jar-with-dependencies.jar <eap.dir>/bin
if EAP 7.x/WildFly 10+:
cp hawkular-agent/hawkular-javaagent/src/test/resources/real-config.yaml <eap.dir>/standalone/configuration
if EAP 6.4:
cp hawkular-agent/hawkular-javaagent/src/test/resources/real-config-eap6.yaml <eap.dir>/standalone/configuration
6 years, 2 months
Hawkular Metrics 0.25.0 - Release
by Stefan Negrea
Hello,
I am happy to announce release 0.25.0 of Hawkular Metrics. This release is
anchored by general stability improvements and enhanced query capabilities
for the external metrics alerter.
Here is a list of major changes:
- *External Metrics Alerter - Enhancements*
- Both syntax and query capabilities have been revamped to allow
defining conditions using flexible expressions with embedded stats queries
- The ExternalCondition expression is now JSON and has support for
the new tag query language
- The full documentation about this feature can be found in the Alerting
section
<http://www.hawkular.org/hawkular-metrics/docs/user-guide/#_alerting>
of the user guide
<http://www.hawkular.org/hawkular-metrics/docs/user-guide/>
- For more details please see: HWKMETRICS-566
<https://issues.jboss.org/browse/HWKMETRICS-566> and Pull Request 727
<https://github.com/hawkular/hawkular-metrics/pull/727>
- *Dropwizard-Metrics - Merged*
- The Dropwizard-Metrics reporter
<https://github.com/hawkular/hawkular-dropwizard-reporter> has been
merged into the Metrics repository in clients/dropwizard
<https://github.com/hawkular/hawkular-metrics/tree/44677cc5d6267e738ea51bd...>
module
- The old repository
<https://github.com/hawkular/hawkular-dropwizard-reporter> is
decommissioned; all new development will happen in the Metrics project
- This was done to simplify the compatibility matrix between the
reporter and Hawkular Metrics REST API; going forward they will have
identical versions. Furthermore, the compatibility is now tested via
continuous integration tests.
- Please use the new maven artifact hawkular-dropwizard-reporter
<http://origin-repository.jboss.org/nexus/content/repositories/public/org/...>
- For more details please see: HWKMETRICS-585
<https://issues.jboss.org/browse/HWKMETRICS-585>
- *Tag Query Language - Enhancements*
- The tag query language now supports the dot character in the tag
name. The list of allowed characters is *a-zA-Z_0-9.*
- The query language allows regex matching for tag values but not tag
names
- This allows translating JSON-like tag structures into Hawkular
Metrics tags and querying them with the new tag query language
- For example, a tag structure like [tag.subtag1: value1,
tag.subtag2.subsubtag1: value2] is now queryable via the tag query
language with queries like tag.subtag1 = value1 or
tag.subtag2.subsubtag1
- For more details please see: HWKMETRICS-611
<https://issues.jboss.org/browse/HWKMETRICS-611>
- *REST API - Request Logging*
- It is now possible to enable detailed logging for all REST API
requests
- Two properties have been added to enable this feature:
hawkular.metrics.request.logging.level to enable logging for all read
requests and hawkular.metrics.request.logging.level.writes to enable
logging for write requests
- By default this feature is disabled; to enable it, set the log
level via each property
- For more details please see: HWKMETRICS-589
<https://issues.jboss.org/browse/HWKMETRICS-589>
- Here is a sample log:
INFO [org.hawkular.metrics.api.jaxrs.util.RequestLoggingFilter]
(default task-49)
REST API request:
--------------------------------------
path: /metrics
segments: [metrics]
method: GET
query parameters: {type=[availability]}
Tenant: T9a116f18-28cf-41b3-8ff8-c9752ac60e26232
- *Other Updates*
- Automatically fix schema issues that occur when the server is
restarted during initial schema installation (HWKMETRICS-594
<https://issues.jboss.org/browse/HWKMETRICS-594>)
- Metric data points inserts have been optimized to use token ranges
for Cassandra writes (HWKMETRICS-599
<https://issues.jboss.org/browse/HWKMETRICS-599>)
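The tag-name rule and the JSON-like tag translation described under "Tag Query Language - Enhancements" above can be illustrated with a minimal sketch. This is not Hawkular Metrics code: the helper name flatten_tags and the nested-dict input are hypothetical, and only the allowed character set (a-zA-Z_0-9.) and the example tag names come from the notes above.

```python
import re

# Allowed tag-name characters as listed in the release notes: a-zA-Z_0-9.
TAG_NAME = re.compile(r"[a-zA-Z_0-9.]+")


def flatten_tags(structure, prefix=""):
    """Flatten a nested dict into dotted tag names (hypothetical helper)."""
    tags = {}
    for key, value in structure.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the dotted prefix down one level.
            tags.update(flatten_tags(value, name))
        else:
            if not TAG_NAME.fullmatch(name):
                raise ValueError(f"invalid tag name: {name!r}")
            tags[name] = str(value)
    return tags


# The structure from the release notes, written as a nested dict.
nested = {"tag": {"subtag1": "value1", "subtag2": {"subsubtag1": "value2"}}}
flat = flatten_tags(nested)
print(flat)
```

The flattened names are then the ones a tag query such as tag.subtag1 = value1 would refer to.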
*Hawkular Alerting - Included*
- Version 1.5.3
<https://issues.jboss.org/projects/HWKALERTS/versions/12333651>
- Project details and repository: Github
<https://github.com/hawkular/hawkular-alerts>
- Documentation: REST API
<http://www.hawkular.org/docs/rest/rest-alerts.html>, Examples
<https://github.com/hawkular/hawkular-alerts/tree/master/examples>,
Developer Guide
<http://www.hawkular.org/community/docs/developer-guide/alerts.html>
*Hawkular Metrics Clients*
- Python: https://github.com/hawkular/hawkular-client-python
- Go: https://github.com/hawkular/hawkular-client-go
- Ruby: https://github.com/hawkular/hawkular-client-ruby
- Java: https://github.com/hawkular/hawkular-client-java
*Release Links*
Github Release:
https://github.com/hawkular/hawkular-metrics/releases/tag/0.25.0
JBoss Nexus Maven artifacts:
http://origin-repository.jboss.org/nexus/content/repositories/public/org/hawkular/metrics/
Jira release tracker:
https://issues.jboss.org/projects/HWKMETRICS/versions/12333676
A big "Thank you" goes to John Sanda, Matt Wringe, Michael Burman, Joel
Takvorian, Jay Shaughnessy, Lucas Ponce, and Heiko Rupp for their project
contributions.
Thank you,
Stefan Negrea
6 years, 2 months
Re: [Hawkular-dev] Command gateway and WF agent
by John Mazzitelli
> Can you explain to me what's the big picture of the command gateway? What's
> the chain of calls when, for instance, there's a new deployment in Wildfly?
Joel asked this question - I figured it is good to post to h-dev, too, since I'm sure others don't know this.
The "big picture" is:
First, remember the MiQ UI is connected to Hawkular-Services server via websocket. The agents all connect to some Hawkular-Services server via a websocket. There can be more than one Hawkular-Services server running and the UI may be talking to a server that is different than the server the agent is talking to:
UI <---websocket---> Server A
                        ^
                        |
                  (message bus)
                        |
                        V
                     Server B <---websocket---> agent
1) MiQ sends a JSON message over the websocket to a Hawkular-Services server (Server A in the diagram above).
2) Hawkular-Services server A looks at the ResourcePath in the message to determine what agent is responsible for managing that resource.
3) Hawkular-Services server A addresses the message to the proper agent and puts the JSON message on the agent's queue on the message bus.
3b) At this point the Hawkular-Services server A sends back a JSON message to the UI over the websocket that says "message forwarded to the bus".
4) The Hawkular-Services server that has a websocket connection to the targeted agent picks off that message from the bus (Server B in the diagram)
5) That Hawkular-Services server B forwards the message to the agent via its websocket connection.
6) Agent looks at the message's ResourcePath and the rest of the JSON to determine what it needs to do. It does the action.
// now the process goes in reverse
7) Agent sends back a response to the server B over websocket (either a success or fail message)
8) Server B takes response, figures out which UI the response should go to, and puts it on message bus addressed to the correct UI
9) Server A takes the response message off the bus
10) Server A sends the response message to the UI over its websocket connection.
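The ten steps above can be sketched as a small in-memory simulation. This is only an illustration of the routing, not the real command gateway: the Bus and Server classes, the message field names (resourcePath, replyTo, outcome), and the ids "ui-1" and "agent-1" are all hypothetical, and plain Python queues stand in for both the websockets and the message bus.

```python
from collections import defaultdict, deque


class Bus:
    """Stand-in for the message bus: one FIFO queue per destination id."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def put(self, destination, message):
        self.queues[destination].append(message)

    def take(self, destination):
        return self.queues[destination].popleft()


class Server:
    """Stand-in for a Hawkular-Services server attached to the bus."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus

    def on_ui_request(self, ui_id, message, agent_for_path):
        # Steps 2-3: find the responsible agent from the ResourcePath and
        # put the message on that agent's queue.
        agent_id = agent_for_path[message["resourcePath"]]
        self.bus.put(agent_id, {**message, "replyTo": ui_id})
        # Step 3b: acknowledge to the UI right away.
        return {"status": "message forwarded to the bus"}

    def forward_to_agent(self, agent_id, agent):
        # Steps 4-5: pick the message off the bus and hand it to the agent.
        request = self.bus.take(agent_id)
        # Steps 6-7: the agent acts and returns a success/fail response.
        response = agent(request)
        # Step 8: address the response to the UI that asked.
        self.bus.put(request["replyTo"], response)

    def deliver_to_ui(self, ui_id):
        # Steps 9-10: take the response off the bus and send it to the UI.
        return self.bus.take(ui_id)


def agent(request):
    # Hypothetical agent: always reports success for the requested resource.
    return {"resourcePath": request["resourcePath"], "outcome": "success"}


bus = Bus()
server_a, server_b = Server("A", bus), Server("B", bus)
routes = {"example-resource": "agent-1"}  # ResourcePath -> agent id

ack = server_a.on_ui_request(
    "ui-1", {"resourcePath": "example-resource", "action": "deploy"}, routes
)
server_b.forward_to_agent("agent-1", agent)
reply = server_a.deliver_to_ui("ui-1")
print(ack["status"], "/", reply["outcome"])
```

Note that the UI receives two messages: the immediate acknowledgement from step 3b and, later, the agent's actual response.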
> Tell me if I'm correct, for what I've seen it's MIQ which, though the ruby
> client, sends a websocket event, caught by the WF agent through the command
> gateway. In the end, the WF agent will perform a full sync of affected root
> resources in inventory. Is that correct?
In the case of a "Deployment" JSON message, yes, the agent will trigger a full discovery scan so it can quickly discover the new deployment you just added to WildFly.
> If that's correct, there's no direct interaction between inventory and the
> command gateway. I'm asking because I hope we can keep the command gateway
> messages unchanged, and change only places where there's direct calls to
> the inventory rest api.
Most of the time, it's just using the ResourcePath API (just that simple inventory POJO). HOWEVER, I believe Jay added some code that does interact with Inventory to do things like create relationships in inventory when new cluster entities are added. That is why (I think) he did this commit. Though I can't remember where this code is that builds inventory relationships - he'll have to point that out. But I'm pretty sure there is some inventory stuff going on under the covers for that stuff - more than just using the canonical ResourcePath stuff.
6 years, 2 months
agent running in EAP6.4 cannot talk to Hawkular Server over HTTPS
by John Mazzitelli
Josejulio, cc hawkular-dev:
<TL;DR>
Due to incomplete API support in an EAP 6.4 library, we cannot support the agent installed as a subsystem extension inside EAP6 if the agent is to talk to the Hawkular Server over HTTPS.
</TL;DR>
I don't know how to work around this one - maybe someone has a bright idea. But right now, it looks like we can't support an EAP6-based agent talking to Hawkular-Metrics over HTTPS *unless* the agent is running as a javaagent (a new feature not even in master yet, but I tried it and it works).
This is an EAP 6.4 method that OKHttp is calling when making an HTTP request requiring SSL - I'll give you the summary - it's a one-line auto-generated stub method that does "return null;" :)
https://github.com/wildfly/wildfly-core/blame/de6b17d4d342e98871c0e95f7e6...
I stepped into this code via a debugger and the line number and behavior (returning null always) matches up with that code.
Needless to say, this causes a NullPointerException later on in the OKHttp library, so the agent cannot talk to the Hawkular Server over HTTPS.
Here's the stack trace that got me there:
Daemon Thread [Hawkular WildFly Agent Startup Thread] (Suspended)
org.jboss.as.domain.management.security.WrapperSSLContext$WrapperSpi$WrapperSSLSocketFactory.createSocket(java.net.Socket, java.lang.String, int, boolean) line: 126
okhttp3.internal.connection.RealConnection.connectTls(int, int, okhttp3.internal.connection.ConnectionSpecSelector) line: 230
okhttp3.internal.connection.RealConnection.establishProtocol(int, int, okhttp3.internal.connection.ConnectionSpecSelector) line: 198
okhttp3.internal.connection.RealConnection.buildConnection(int, int, int, okhttp3.internal.connection.ConnectionSpecSelector) line: 174
okhttp3.internal.connection.RealConnection.connect(int, int, int, java.util.List<okhttp3.ConnectionSpec>, boolean) line: 114
okhttp3.internal.connection.StreamAllocation.findConnection(int, int, int, boolean) line: 193
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(int, int, int, boolean, boolean) line: 129
okhttp3.internal.connection.StreamAllocation.newStream(okhttp3.OkHttpClient, boolean) line: 98
okhttp3.internal.connection.ConnectInterceptor.intercept(okhttp3.Interceptor$Chain) line: 42
okhttp3.internal.http.RealInterceptorChain.proceed(okhttp3.Request, okhttp3.internal.connection.StreamAllocation, okhttp3.internal.http.HttpStream, okhttp3.Connection) line: 92
okhttp3.internal.http.RealInterceptorChain.proceed(okhttp3.Request) line: 67
okhttp3.internal.cache.CacheInterceptor.intercept(okhttp3.Interceptor$Chain) line: 109
okhttp3.internal.http.RealInterceptorChain.proceed(okhttp3.Request, okhttp3.internal.connection.StreamAllocation, okhttp3.internal.http.HttpStream, okhttp3.Connection) line: 92
okhttp3.internal.http.RealInterceptorChain.proceed(okhttp3.Request) line: 67
okhttp3.internal.http.BridgeInterceptor.intercept(okhttp3.Interceptor$Chain) line: 93
okhttp3.internal.http.RealInterceptorChain.proceed(okhttp3.Request, okhttp3.internal.connection.StreamAllocation, okhttp3.internal.http.HttpStream, okhttp3.Connection) line: 92
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(okhttp3.Interceptor$Chain) line: 124
okhttp3.internal.http.RealInterceptorChain.proceed(okhttp3.Request, okhttp3.internal.connection.StreamAllocation, okhttp3.internal.http.HttpStream, okhttp3.Connection) line: 92
okhttp3.internal.http.RealInterceptorChain.proceed(okhttp3.Request) line: 67
okhttp3.RealCall.getResponseWithInterceptorChain() line: 170
okhttp3.RealCall.execute() line: 60
org.hawkular.agent.monitor.service.MonitorService(org.hawkular.agent.monitor.service.AgentCoreEngine).waitForHawkularServer() line: 648
org.hawkular.agent.monitor.service.MonitorService(org.hawkular.agent.monitor.service.AgentCoreEngine).startHawkularAgent(org.hawkular.agent.monitor.config.AgentCoreEngineConfiguration) line: 279
org.hawkular.agent.monitor.service.MonitorService(org.hawkular.agent.monitor.service.AgentCoreEngine).startHawkularAgent() line: 164
org.hawkular.agent.monitor.service.MonitorService$1CustomPropertyChangeListener$1.run() line: 395
java.lang.Thread.run() line: 745
6 years, 2 months
auto testing client integrations with hawkular apm
by Neil Okamoto
I'm in the process of developing a Clojure library that wraps the
opentracing-java client (the abstract interface), and provides concrete
implementations for Hawkular APM and Zipkin.
I'm considering ways to test the concrete Hawkular implementation in
Travis. One thought is to configure Travis to launch a hawkular dev server
(from the docker image) and then execute a series of test cases to collect
traces, and finally use the REST API to read back what hawkular received.
The problem is that I haven't found the documentation for the query
parameters you pass to "GET /traces/fragments/search".
https://hawkular.gitbooks.io/hawkular-apm-user-guide/content/restapi/#GET__traces_fragments_search
Any pointers how to do this? How does the hawkular development team run
integration tests?
6 years, 2 months
Little research help with collection times..
by Michael Burman
Hi,
I could use some help if someone has access to this sort of data. I'm
trying to find our collection interval fluctuations when using our
collectors. The actual collection interval is not interesting, but the
amount it's delayed per run and how that changes.
Currently we use the defaults from the Gorilla paper (with the typo
that the paper had), which gives us ranges: [-63,64] (7 bits),
[-255,256] (9 bits), [-2047,2048] (12 bits) and the rest. The first one
needs 2 bits of metadata, the second needs 3 and the third already needs
4 bits.
Those ranges were derived by the Facebook guys from their data, but that
was using second precision. We use milliseconds, as for example
Prometheus does, and they use ranges of 14, 17 and 20 meaningful bits
(IIRC). However, I don't think their values are useful to us, as from
what I've gathered from my small set of data, our best ranges are
somewhere around:
[-256,256] millisecond difference in delta of delta,
D = (t(n) - t(n-1)) - (t(n-1) - t(n-2))
[-512,512] milliseconds
[-2048,2048] milliseconds
It is probably wiser to merge the first two into a single range and drop
the first one. While we store one extra bit of info, we save one bit of
metadata. The last one could technically be something like "missed one
datapoint", but then that would be the collection interval, and I'm not
sure if we have any target value for that in the long range.
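To make the bit accounting above concrete, here is a minimal sketch that computes the delta of delta for a series of millisecond timestamps and counts the bits each value would need under the Gorilla ranges quoted in this message. Two details are not stated here and are assumptions taken from the Gorilla paper's scheme: a delta of delta of zero costs a single bit, and values outside all ranges cost 4 metadata bits plus a 32-bit value. The sample timestamps are invented.

```python
def delta_of_deltas(timestamps):
    """D = (t(n) - t(n-1)) - (t(n-1) - t(n-2)) for each n >= 2."""
    return [
        (timestamps[n] - timestamps[n - 1]) - (timestamps[n - 1] - timestamps[n - 2])
        for n in range(2, len(timestamps))
    ]


# (low, high, value bits, metadata bits), as quoted in the message.
GORILLA_RANGES = [(-63, 64, 7, 2), (-255, 256, 9, 3), (-2047, 2048, 12, 4)]


def encoded_bits(d, ranges=GORILLA_RANGES, fallback_value_bits=32, fallback_meta_bits=4):
    """Bits needed to store one delta-of-delta value."""
    if d == 0:
        return 1  # assumption from the Gorilla scheme: a single control bit
    for low, high, value_bits, meta_bits in ranges:
        if low <= d <= high:
            return meta_bits + value_bits
    # Assumption: the "rest" case stores a full 32-bit value.
    return fallback_meta_bits + fallback_value_bits


# Invented sample: nominal 15 s interval with jitter and one late collection.
samples_ms = [0, 15000, 30040, 45010, 60010, 78000]
dods = delta_of_deltas(samples_ms)
bits = [encoded_bits(d) for d in dods]
print(dods, bits, sum(bits))
```

Passing a different ranges list to encoded_bits makes it possible to compare candidate ranges, such as the millisecond ones proposed above, against real collection data.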
But, I'm guessing too much. From the hawkular-openshift-agent I can see
that values are fetched in sequential mode, so delay in fetching one
metric will cause the next one to be delayed and so on. I can see from
an empty instance how much this time is, but when running in a congested
environment, what can we expect?
Saving a bit or two might not sound like much, but let's say with 30 000
pods in OpenShift and 32 metrics per pod at a 15-second interval, we're
talking about 5 529 600 000 datapoints per day, and it adds up (659MB per
day of saved space with a single bit). So getting the ranges correct nets
us a lot of savings.
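The arithmetic behind those figures can be checked with a few lines. The inputs are the ones given above; the only assumption is that "MB" here means 2^20 bytes, which is what makes one bit per datapoint come out at about 659.

```python
pods = 30_000
metrics_per_pod = 32
interval_seconds = 15
seconds_per_day = 24 * 60 * 60

# Collections per metric per day, times the number of metrics.
datapoints_per_day = pods * metrics_per_pod * (seconds_per_day // interval_seconds)

# One bit saved per datapoint, converted to bytes and then to 2**20-byte units.
bytes_saved_per_bit = datapoints_per_day / 8
mib_saved_per_bit = bytes_saved_per_bit / 2**20

print(datapoints_per_day, round(mib_saved_per_bit))
```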
- Micke
6 years, 2 months
Requirement of Aerogear Unified Push Server Instance
by Anuj Garg
Hello team,
We need one instance of AeroGear Unified Push Server to set up the push
messaging feature in the Android client of Hawkular.
I can think of 2 ways this can be achieved.
1. Creating an instance on OpenShift.
2. Asking the AeroGear team if they can provide an account for one application.
I would appreciate suggestions on which of these, or what other options,
we could pursue.
Regards
Anuj Garg
6 years, 2 months
Rolling upgrades (on Kubernetes)
by Heiko W.Rupp
Hey,
on Kubernetes a common use case (that has been demonstrated by Thomas
and Juca recently) is rolling upgrades (blue/green, canary, ...) where a
new version of a service comes into play and gets scaled up; once it
is scaled up, the old version of the service gets scaled down and eventually
disappears.
It can also happen though that the new version is buggy and the previous
version gets scaled up again to take the full load until a fix is available.
We need to make sure to support those rolling upgrades and also the
rollback to the previous version of a service.
E.g. if a (No)SQL schema update happens in version N+1, a still
running instance of version N must not be broken by it when N+1
is rolling out while N is still active. Similarly, when N+1 has upgraded
the schema and the application gets rolled back to version N, it must
still be able to cope with the upgraded schema version.
Same applies to APIs, but this is something we have discussed and
mastered in the past.
We should perhaps investigate how we can automatically test
this scenario in our Travis testing.
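As a rough illustration of what such an automated check might assert, the sketch below runs every reader version against rows written by every schema version. Everything in it is hypothetical: the row layouts, the added "tags" column, and the reader functions are invented for the example and are not taken from any Hawkular schema.

```python
# Hypothetical row layouts: version N+1 adds a "tags" column.
ROW_WRITTEN_BY_N = {"id": "m1", "value": 1.5}
ROW_WRITTEN_BY_N_PLUS_1 = {"id": "m1", "value": 1.5, "tags": {"host": "a"}}


def read_as_version_n(row):
    """Version N reader: picks only the columns it knows, ignores the rest."""
    return {"id": row["id"], "value": row["value"]}


def read_as_version_n_plus_1(row):
    """Version N+1 reader: tolerates rows written before the upgrade."""
    return {"id": row["id"], "value": row["value"], "tags": row.get("tags", {})}


def compatibility_matrix():
    """Run every reader against every row; map (reader, row) to success."""
    readers = {"N": read_as_version_n, "N+1": read_as_version_n_plus_1}
    rows = {"N": ROW_WRITTEN_BY_N, "N+1": ROW_WRITTEN_BY_N_PLUS_1}
    results = {}
    for reader_name, reader in readers.items():
        for row_name, row in rows.items():
            try:
                reader(row)
                results[(reader_name, row_name)] = True
            except KeyError:
                # A reader that requires a missing column breaks the rollout.
                results[(reader_name, row_name)] = False
    return results


matrix = compatibility_matrix()
print(matrix)
```

The (N reader, N+1 row) pair covers both the rollout overlap and the rollback case; the (N+1 reader, N row) pair covers data written before the upgrade.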
6 years, 3 months