Hawkular APM version 0.10.0.Final released with Zipkin integration
by Gary Brown
We are pleased to announce the release of version 0.10.0 of the Hawkular APM project.
The release notes can be found here: https://github.com/hawkular/hawkular-apm/releases/tag/0.10.0.Final
The highlights for this release are:
* In addition to the existing aggregated view of service invocations, it is now possible to view the list of trace instances for that (potentially filtered) aggregated view, and then select an individual instance to display the end-to-end call trace for that instance. (Screenshots are available in the release notes.)
* Zipkin integration. It is now possible to point applications, instrumented using Zipkin-compliant libraries, at the Hawkular APM server and have their information processed and visually represented in the UI.
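As a rough illustration of what this can look like from the application side (this is not from the release notes; the collector URL, port, path, and service name below are assumptions, and the exact API depends on which Zipkin client library and version you use), a Brave-instrumented Java application would simply report its spans to whatever Zipkin-compatible endpoint the Hawkular APM server exposes:

import brave.Tracing;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.urlconnection.URLConnectionSender;

public class ZipkinToHawkularApm {
    public static Tracing buildTracing() {
        // Assumed endpoint - point the sender at wherever your Hawkular APM
        // server accepts Zipkin span data.
        URLConnectionSender sender =
            URLConnectionSender.create("http://localhost:8080/api/v1/spans");
        AsyncReporter<zipkin2.Span> reporter = AsyncReporter.create(sender);
        return Tracing.newBuilder()
            .localServiceName("orders-service") // hypothetical service name
            .spanReporter(reporter)
            .build();
    }
}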
Feedback on the new features would be very welcome!
Regards
Hawkular APM Team
metrics on the bus
by Jay Shaughnessy
Lucas and I were talking over JIRA [1], which has to do with
metrics/alerting scale. This was discussed a bit on IRC recently as
well. Today, metrics publishes all datapoints to the bus (metrics and
avail go to different topics). The only consumer of that data is
alerting, and it consumes a small fraction of the total data (actually
it consumes none of it out of the box at the moment, but that will
hopefully change as Lucas's alerting work comes online in MIQ).
Although in its purest form this publish-it-all is the essence of bus
publishing, we both feel it's an unnecessary waste of resources, as
metrics can reach very high volume. There are a few approaches to
reducing the publishing/filtering that we're currently doing. The
options we discussed boil down to:
* No Publishing
o Just query metrics for the data needed for alerting (or whatever
other external use we may have for the data)
o This is essentially a polling approach with frequent polling
* Demand Publishing
o The "just tell me what movie you want to see" approach
o Let clients request the metric ids they want published to the bus
(a rough sketch of this option is included below)
I'm purposefully not going into much detail at this point. I'd rather
we talk out a preferred approach between these two, or something not
presented. But we'd like to move away from the current publish-it-all
approach.
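Just to make the demand-publishing option a bit more concrete, here is a rough sketch. Every name in it is hypothetical - this is not the actual metrics or bus API, it only shows the shape of the idea: clients register the metric ids they care about, and only those datapoints hit the bus.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical sketch only - not the real Hawkular Metrics or bus API.
public class DemandPublisher {

    // metric id -> number of interested subscribers
    private final Map<String, Integer> requested = new ConcurrentHashMap<>();

    // stand-in for the real bus producer (e.g. a JMS topic publisher)
    private final BiConsumer<String, Double> busPublisher;

    public DemandPublisher(BiConsumer<String, Double> busPublisher) {
        this.busPublisher = busPublisher;
    }

    // Called by a client (e.g. alerting) that wants a metric on the bus.
    public void subscribe(String metricId) {
        requested.merge(metricId, 1, Integer::sum);
    }

    // Called when the client no longer needs the metric published.
    public void unsubscribe(String metricId) {
        requested.computeIfPresent(metricId, (id, n) -> n > 1 ? n - 1 : null);
    }

    // Called for every ingested datapoint; only demanded ids reach the bus.
    public void onDataPoint(String metricId, double value) {
        if (requested.containsKey(metricId)) {
            busPublisher.accept(metricId, value);
        }
    }
}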
[1] https://issues.jboss.org/browse/HWKALERTS-118.
change notification: parameter metadata is in a different place
by John Mazzitelli
Recently, we added the ability for the Hawkular WildFly Agent to advertise what parameters can be passed to an operation by storing parameter metadata in the operation type's general properties.
However, Hawkular-Inventory provides an "official" place to store this data. Rather than have general properties host this metadata, H-Inventory wants parameters stored in a child data entity called "parameterTypes" under the operation type. [3]
The agent now does it this "official" way. However, to avoid breaking clients before they can fix themselves to get parameters from this new location, the agent retains the original parameter metadata in general properties as well.
But of course we do not want the agent to store copies of the same metadata in two different locations in Hawkular-Inventory, so a JIRA [1] has been created to remove the parameters from general properties - a PR has been submitted and is ready to be merged [2].
So, clients that look up operation parameters in H-Inventory need to look at the parameterTypes data entity child and NOT look for them in general properties. If you have a client (MiQ? Ruby gem? Hawkfx?) that obtains parameter metadata from operation types' general properties, it should be fixed, because once this PR is merged, parameter information will no longer exist in general properties.
-- John Mazz
[1] https://issues.jboss.org/browse/HWKAGENT-130
[2] https://github.com/hawkular/hawkular-agent/pull/244
[3] This is what the "official" parameter types entity looks like - parameters are stored in a data entity child called "parameterTypes" under the operation type. This example shows the parameters for the WildFly Server's "Shutdown" operation (the parameters are "timeout" and "restart"):
{
  "path": "/t;hawkular/f;74ceb88e-4740-49f8-9e46-8d192dc6e14a/rt;WildFly%20Server/ot;Shutdown/d;parameterTypes",
  "name": "parameterTypes",
  "identityHash": "1ad5d5963d6978975e958de5abcb49f778695f",
  "contentHash": "5ff8951bc6a6ab6483f69ec0ab139d5a76e22c",
  "syncHash": "27163827304a2c50e31b9ce2ee74e5e32f396d58",
  "value": {
    "timeout": {
      "type": "int",
      "description": "Timeout in seconds to allow active connections to drain",
      "defaultValue": "0",
      "required": false
    },
    "restart": {
      "type": "bool",
      "description": "Should the server be restarted after shutdown?",
      "defaultValue": "false",
      "required": false
    }
  }
}
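For clients that fetch this data entity over the inventory REST API, here is a minimal sketch of pulling the parameter metadata out of the JSON above. It uses Jackson, and the class and method names are just for illustration - how you actually retrieve the JSON (REST call or inventory client) is up to your client:

import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ParameterTypesReader {

    // 'json' is the body of the parameterTypes data entity shown above.
    public static void printParameters(String json) throws IOException {
        JsonNode entity = new ObjectMapper().readTree(json);
        JsonNode params = entity.get("value"); // map of parameter name -> metadata
        params.fields().forEachRemaining(p -> {
            JsonNode meta = p.getValue();
            System.out.printf("%s: type=%s, default=%s, required=%b (%s)%n",
                    p.getKey(),
                    meta.path("type").asText(),
                    meta.path("defaultValue").asText(),
                    meta.path("required").asBoolean(),
                    meta.path("description").asText());
        });
    }
}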
services itest failure
by John Mazzitelli
Regarding the services itest failure with metric storage from the agent:
When running on my desktop, I don't even get that far - tests can't even start without timing out.
When running on my laptop (which is a faster machine with SSD), I confirmed the failure (ran it 2 times, failed the same way both times).
I changed this line:
https://github.com/hawkular/hawkular-services/blob/master/itest/src/test/...
from: Retry.times(50).delay(100))
to: Retry.times(50).delay(1000));
and ran it several times, all times it successfully passed.
So it looks like a timing issue - the test just didn't wait long enough for the data to come in (which would explain the 204 it was getting). Moving it from a 5s to a 50s max timeout solved it for me.
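For context, here is a simplified stand-in for what a times/delay style retry helper does (this is not the actual test utility, just an illustration of why 50 x 100ms gives roughly a 5s ceiling and 50 x 1000ms roughly 50s):

import java.util.function.BooleanSupplier;

public class RetrySketch {

    // Poll 'condition' up to 'times' attempts, sleeping 'delayMillis' between
    // attempts, so the maximum wait is roughly times * delayMillis.
    public static boolean retry(int times, long delayMillis, BooleanSupplier condition)
            throws InterruptedException {
        for (int i = 0; i < times; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(delayMillis);
        }
        return false;
    }
    // retry(50, 100, ...)  -> waits at most ~5 seconds
    // retry(50, 1000, ...) -> waits at most ~50 seconds
}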
Integration tests - Improvements
by Lucas Ponce
Hello,
Today we have a situation with the integration tests that is starting to be a concern.
Basically, there is a lack of integration testing that lets some potential bugs escape preliminary controls, and they only show up in ManageIQ.
I guess we could increase the integration test coverage overall and define some additional processes to improve this.
These are some ideas discussed with the team:
- Today, we have some coverage per component, but not at the hawkular-services level. It would be good if we could add more end-to-end tests at the hawkular-services level.
- Before a component is released, it would be good if the component ran the hawkular-services integration suite to validate that nothing is broken, or, if something is, to start a discussion and reach a consensus/tradeoff. From the isolated component's point of view everything may look OK, but in the hawkular-services context it could be a potential problem. (Not sure if the CI can help us here, e.g. by extending it to run the new component against the hawkular-services itests.)
- This also happens with the clients: in hawkular-client-ruby we have a recorded set of HTTP calls (VCR cassettes). This works fine and it is great, but today we have a mix of versions recorded (at least for the hawkular-services part), so it would be great if that could be normalized. For example, for a new hawkular-services release, the CI could run the tests to validate whether the recordings are still valid.
- The same situation happens on the ManageIQ side: there are VCR cassettes recorded against different versions of hawkular-services (we have started to annotate the version), but the tests can pass without that meaning the latest hawkular-services version will pass. So an action item could be that whenever the hawkular-client-ruby version changes, the VCR tests are re-recorded against the hawkular-services version that has been validated.
These are just some ideas to start discussing.
I guess that CI (Travis, or internally using torii) can help to some degree, but I lack experience with them.
In any case, the goal of this is not to slow down development, but just to have an early indicator, so that if there is some change in a component, the overall system is notified and we can address it better than if it shows up during a demo or the final stage of a QE task, for example.
Any thoughts about these ideas are welcome.
Lucas
change in hawkular wildfly agent platform resource type and metric type IDs
by John Mazzitelli
I realized recently that the Hawkular WildFly Agent generates poorly named resource type and metric type IDs for the platform resources (e.g. the Operating System, File Store, Memory, Processor, and Power Source resource types and their metrics like "Total Memory" and "Available Memory"). Because all types must be stored in Hawkular Inventory at the root level and are therefore all peers of one another, the names need to be such that they don't risk collisions with other names now or in the future (e.g. the type name "Available Memory" is just way too generic for anyone to know what it is or which resource type it belongs to, and it risks colliding with other resource types that may need to define an "Available Memory" metric type in the future).
This JIRA [1] has been fixed in a PR [2].
I do not think this affects anybody, because if clients are doing "the right thing" and not hardcoding IDs (tsk tsk if you are), then a normal query of resources to get their resource types and metric types will give you the new IDs without having to change anything.
Besides, I don't think anyone is using the Platform resources now anyway (the old Hawkular UI did, I think, but I don't think we do in the MiQ stuff).
But I bring this up just in case. Look at your stuff - do you use the platform resources like Operating System, File Store, Memory, Processor, etc.? If you do, and you hardcoded IDs, you need to change your code (better yet, change your code so it doesn't hardcode IDs at all and instead uses the inventory API to query the data).
--John Mazz
[1] JIRA: https://issues.jboss.org/browse/HWKAGENT-133
[2] PR: https://github.com/hawkular/hawkular-agent/pull/246
Property location in the APM model
by Gary Brown
Hi
Currently the concept of a Property is associated with a trace fragment in the APM model. The fragment is the root object of the call trace that occurs within a particular service. So an end-to-end trace is composed of a set of fragments captured from the various services that were involved in the execution of a particular business transaction.
The reason for initially associating the Property with the fragment was that these properties represented contextual information, extracted from the message contents (or headers), that could be used to search for the relevant trace (business transaction) instance. So, for example, the "order id" may be extracted and associated with the fragment, which would subsequently become associated with the end-to-end call trace.
However, Zipkin and the OpenTracing standard are based on the concept of spans, which represent particular points in the trace and don't have an equivalent 'fragment' concept. Therefore all 'properties' (binary annotations in Zipkin, tags in OpenTracing) are associated with each instrumentation point (a node in the APM model). This is also because the information recorded in binary annotations/tags is not necessarily business contextual information, but can also be lower level.
So we are considering moving the APM Property concept to the Node instead of the Trace fragment, to align more closely with Zipkin/OpenTracing. However, when querying the various data types in APM, the properties would still be aggregated as before; the only difference is that the finer-grained association between Node and Property will now be maintained.
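To make the proposed change concrete, here is a simplified sketch of where the Property would live (these are not the actual APM model classes, just an illustration):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only - not the real Hawkular APM model classes.
class Property {
    String name;  // e.g. "order id"
    String value;
}

class Node {
    String operation;
    List<Node> children = new ArrayList<>();
    // Proposed: properties captured at the instrumentation point itself,
    // mirroring Zipkin binary annotations / OpenTracing tags.
    Set<Property> properties = new HashSet<>();
}

class TraceFragment {
    String traceId;
    List<Node> nodes = new ArrayList<>();
    // Current: properties attached to the whole fragment; under the proposal
    // this set moves down onto Node, and queries still aggregate as before.
    // Set<Property> properties = new HashSet<>();
}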
Let me know if this is an issue for anyone.
Regards
Gary