Business app/services representation in Inventory
by Gary Brown
Hi
Before going too far down the BTM road, I just wanted to confirm whether or not we want the business apps, their component services, and their relationships to the IT resources they use, stored in Hawkular Inventory?
An alternative approach would be to derive the structure and relationships dynamically from the business transaction instance information.
The benefit of storing this in Inventory is that it enables end users to navigate the inventory to understand the relationships to the business apps/services, as well as allowing other tooling (e.g. impact analysis) to determine the effect of IT resource downtime on business apps.
Thoughts?
Regards
Gary
9 years, 8 months
RHQ Metrics - 0.2.7 & Hawkular Future
by Stefan Negrea
Hello Everybody,
I want to summarize the latest release of the RHQ Metrics project and the future of the project.
1) RHQ Metrics migrates to Hawkular organization
Release 0.2.7 of RHQ Metrics is the last one from the current repository. But do not panic! Beyond the mechanics of the transfer and rename, development will continue with the regular crew.
For the migration, two project repositories (rhq-metrics and rhq-metrics-openshift) will just be transferred to the Hawkular organization. The code from rhqm-charts was already moved to Hawkular, so we will just close the RHQ repository. We will have a follow up communication once all the infrastructure is in place under the new organization.
2) RHQ Metrics 0.2.7 was released today
This release has mainly stability fixes and minor enhancements. The Keycloak integration was delayed and is not part of this release (as announced in the planning notes). For more details, check out the Github release notes.
Github Release:
https://github.com/rhq-project/rhq-metrics/releases/tag/0.2.7
JBoss Nexus Maven artifacts:
http://origin-repository.jboss.org/nexus/content/repositories/public/org/...
3) OpenShift Cartridge for RHQ Metrics 0.2.7
The cartridge supports RHQ Metrics 0.2.7, 0.2.6, and 0.2.5. Just a reminder, the cartridge is the simplest and easiest way to get a public facing instance of RHQ Metrics in just a few minutes with a single command. The cartridge configures Cassandra, Wildfly, and RHQ Metrics (REST interface and UI console) to run in a single gear. For more details please visit the Github repository of the project.
Sample command to create a new RHQ Metrics deployment:
rhc app create test_app https://raw.githubusercontent.com/rhq-project/rhq-metrics-openshift/maste...
Github Repository:
https://github.com/rhq-project/rhq-metrics-openshift
A big "Thank you!" goes to John Sanda, Mike Thompson, Heiko Rupp, and Thomas Segismont for their project contributions.
Any discussion, suggestions, or contributions are more than welcome, so feel free to reply to this email or comment directly on the various forum threads.
Thank you,
Stefan Negrea
Software Engineer
_______________________________________________
rhq-devel mailing list
rhq-devel(a)lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/rhq-devel
9 years, 8 months
RfC: Availability
by Heiko W.Rupp
Hey,
there was apparently some watercooler discussion yesterday without any minutes, so the following cannot refer to it in any way.
Hawkular needs to have a way to store, retrieve and display availability
of a resource or a bunch of them [1].
While we have some short term goals, for the longer run we need to
better identify what needs to be done.
I think we need to separately look at the following concerns:
* availability reporting
  * api
  * values
* availability computation
* availability storage
* availability retrieval
* alerting on availability
* computed resource state
The basic assumption here is that availability is something relatively stable, meaning that usually the same state (hopefully "UP") is reported over and over for a veeery long period of time (there are some servers with uptimes >> 1 year).
== reporting
Feeds report availability to Hawkular, where the data may be further
processed and stored.
The reported values are probably in the range of "UP", "DOWN". I can
also imagine that e.g.
an application server that starts shutting down could send a
"GOING_DOWN" value.
On the API side, we need to be able to receive (a list of) tuples
`< resource id, report time, state >`
In the case of full Hawkular, the _resource id_ needs to be a valid one from Inventory. _Report time_ is the local time on the resource / agent when that state was retrieved, represented in ms since the epoch (UTC). Finally, the _state_ would be an enum of "UP", "DOWN" and potentially some other values. While I have described the states as strings here, the representation on the wire may be implemented differently, e.g. as 1 and 0 or true and false.
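To make that shape concrete, here is a minimal sketch in Java of what a single report tuple could look like; the class and enum names are illustrative only, not an agreed API:

// Sketch only - names are made up for illustration, not an agreed API.
public enum AvailabilityState { UP, DOWN, UNKNOWN, GOING_DOWN }

public class AvailabilityReport {
    private final String resourceId;        // must be a valid Inventory resource id in full Hawkular
    private final long reportTimeMs;        // ms since the epoch, UTC, taken on the agent/resource
    private final AvailabilityState state;  // on the wire this could equally be 1/0 or true/false

    public AvailabilityReport(String resourceId, long reportTimeMs, AvailabilityState state) {
        this.resourceId = resourceId;
        this.reportTimeMs = reportTimeMs;
        this.state = state;
    }
}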
== computed availability
In addition to the above reporting, we may have feeds that either are not able to deliver availability or where the availability is delivered as a numeric value - see e.g. the pinger, where a <rid>.status.code is delivered as a metric value representing the http status code. Here we need to apply a mapping from return code -> availability:
f(code) -> code < 400 ? "UP" : "DOWN"
and then further proceed with that computed availability value.
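As a sketch (reusing the illustrative AvailabilityState enum from above), that mapping is trivial to express:

// Sketch: map an HTTP status code (e.g. from the pinger's <rid>.status.code metric)
// to an availability value.
public static AvailabilityState fromStatusCode(int code) {
    return code < 400 ? AvailabilityState.UP : AvailabilityState.DOWN;
}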
See also [2] and [3]
=== "Backfill"
As feeds may not report back all the time, we may want to have a watchdog which adds a transition into the "UNKNOWN" state when no report has been seen for a while.
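A rough sketch of such a watchdog; the stale threshold, the lastHeard map and the recordTransition() helper are all assumptions for illustration:

// Sketch: periodically mark resources as UNKNOWN when no report has arrived for a while.
// lastHeard maps resource id -> timestamp (ms) of the last received availability report.
void backfill(Map<String, Long> lastHeard, long staleAfterMs) {
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
        if (now - e.getValue() > staleAfterMs) {
            recordTransition(e.getKey(), AvailabilityState.UNKNOWN, now); // hypothetical helper
        }
    }
}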
=== Admin-down
A feed may discover resources that report their state as DOWN but where this is not an issue and rather an administrative decision. Take a network card as an example: the card has 8 ports, but only 4 of them are connected. So the other 4 will be reported as DOWN, but in fact they are DOWN on purpose. The admin may mark those interfaces as ADMIN_DOWN, which also implies that further incoming DOWN reports (what about UP, UNKNOWN?) can be ignored until the admin re-enables the interface.
This admin-down probably also needs to be marked in inventory.
=== Maintenance mode
On top of availability we also have maintenance mode, which is orthogonal to availability and is meant more for alert suppression and SLA computation. Maintenance mode should not overwrite the recorded or computed availability; we still want to record the original state no matter what the maintenance mode is.
== Storage
As I wrote earlier, the base assumption is that availability is supposed to stay the same for long periods of time. For that reason, run-length encoded storage is advised:
`< resource id, state, from, to >`
The fields are more or less self-explanatory - _to_ would be "null" if the current state continues.
This is also sort of what we have done in RHQ, where we have also been running into some issues (especially as we had a very db-bound approach). One issue is that if you have a transition from UP to DOWN, the DB situation looks like this:

Start:
  <rid, UP, from1, null>
UP -> DOWN at time = from2:
  find tuple <rid, ??, ??, null> and update it to
  <rid, UP, from1, from2>
  append new tuple
  <rid, DOWN, from2, null>

The other issue is getting the current availability (for display in the UI and/or to determine the previous transition):
  find tuple <rid, ??, ??, null>
Both of these lookups are expensive.
The retrieval of the current availability for a resource can be improved by introducing a cache that stores, as minimal information, <rid, last state>.
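To illustrate how the run-length encoded update and the <rid, last state> cache fit together, a rough sketch; findOpenRecord/closeRecord/appendRecord are placeholders for whatever storage backend we end up with:

// Sketch: handle one incoming report against run-length encoded storage.
void onReport(String rid, long time, AvailabilityState state) {
    AvailabilityState last = lastStateCache.get(rid);   // cache of <rid, last state>
    if (state == last) {
        return; // same state as before: nothing to store in the RLE model
    }
    if (last != null) {
        // close the open record <rid, last, from, null> by setting its 'to' field
        closeRecord(findOpenRecord(rid), time);
    }
    appendRecord(rid, state, time);   // append new open record <rid, state, time, null>
    lastStateCache.put(rid, state);
}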
Another issue that Yak pointed out is that if availability is recorded infrequently and at random points in time, just recording when a transition from UP to DOWN or even UNKNOWN happened may not be enough, as there are scenarios where it is still important to know when we heard the last UP report. So the above storage (and cache) tuple needs to be extended to contain the _last heard_ time:
`< resource id, state, from, to, last_heard >`
In this case, as we do not want to update that record for each incoming availability report, we really need to cache this information and have either some periodic write-back to the store, or at least write back when a shutdown listener indicates that Hawkular is going down. In case we have multiple API endpoints that receive availability reports, this may need to be a distributed cache.
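Purely as an illustration of that write-back idea (field names and the updateLastHeard() call are made up; in the multi-endpoint case the map would be a distributed cache rather than a local one):

// Sketch: cache entry carrying last_heard so the stored record is not updated on every report.
class CachedAvailability {
    String resourceId;
    AvailabilityState state;
    long from;       // start of the current run
    long lastHeard;  // time of the most recent report confirming this state
}

// Periodically, or when a shutdown listener fires, flush last_heard back to the store.
void flush(Collection<CachedAvailability> entries) {
    for (CachedAvailability c : entries) {
        updateLastHeard(c.resourceId, c.lastHeard);   // hypothetical storage call
    }
}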
== Retrieval
Retrieval of availability information may actually be a bit more tricky than returning the current availability state, as there will be more information to convey. We have two basic cases:
* return the current availability / resource state: this can probably be answered directly from the above mentioned cache
* return a timeline between some arbitrary start and end times: here we need to go out and return all records that satisfy something like
  (start_time < requested_start && end_time > requested_start) || (start_time > requested_start && end_time <= requested_end)
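One way to express that check, handling the open-ended records where 'to' is still null, could be (again just a sketch):

// Sketch: does a stored record <rid, state, from, to> overlap the requested window?
// A null 'to' means the state still continues, so treat it as "now".
boolean overlaps(long from, Long to, long requestedStart, long requestedEnd) {
    long effectiveTo = (to == null) ? System.currentTimeMillis() : to;
    return from <= requestedEnd && effectiveTo >= requestedStart;
}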
=== application / group of resources
For applications the situation becomes more complicated as we need to
retrieve the state (records) for each involved resource and then compute
the total state of the application.
Take an app with a load balancer, 3 app servers and a DB; then this computation may go like:

avail(app) :=
  UP    if all resources are UP
  MIXED if one app server is not UP
  DOWN  otherwise

Actually this may even contain a time component:

avail(app, time_of_day) :=
  if (business_hours(time_of_day))
    UP    if all resources are UP
    MIXED if one app server is not UP
    DOWN  otherwise
  else
    UP    if all resources are UP
    MIXED if two app servers are not UP
    DOWN  otherwise
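For the simple (non time-of-day) case, and assuming the per-resource states have already been retrieved, the roll-up could be sketched as:

// Sketch: roll the per-resource states up into one application-level state.
enum AppState { UP, MIXED, DOWN }   // MIXED only exists at the application level

AppState availability(List<AvailabilityState> appServers, List<AvailabilityState> otherResources) {
    long appServersNotUp = appServers.stream().filter(s -> s != AvailabilityState.UP).count();
    boolean othersUp = otherResources.stream().allMatch(s -> s == AvailabilityState.UP);
    if (appServersNotUp == 0 && othersUp) return AppState.UP;    // all resources are UP
    if (appServersNotUp == 1 && othersUp) return AppState.MIXED; // one app server is not UP
    return AppState.DOWN;                                        // otherwise
}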
It may be a good idea not to compute that on the fly at retrieval time, but to feed the result as synthetic availability records into the normal availability processing stream, as indicated earlier in the "computed availability" section. This way, the computed information is also available as input for alerting.
== Alerting on availability
Alerting will need to see the (computed) availability data and also the maintenance mode information to be able to alert on:
* is UP/DOWN/... (for X time)
* goes UP/DOWN/...
With the above, I think that alerting should not need to do complex availability calculations on its own, but rather work on the stream of incoming (computed) availability data.
[1] https://issues.jboss.org/browse/HWKMETRICS-35
[2] http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html
[3] http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000402.html
9 years, 8 months
Public Hawkular Instances
by Matthew Mahoney
A couple notes on the Public Hawkular instance:
1) Improvements have been made to the Public Hawkular instance ( http://209.132.178.218:18080 ) smoke test, to mitigate the number of false failures that were happening.
We invite & encourage community use of this instance.
2) A new Public Hawkular instance ( http://209.132.178.218:18090 ) has been created, intended to be used by development/the community for use cases such as demos, where you need to ensure that the instance will not be randomly restarted (for reasons such as new Hawkular images being created, test-case git pushes, infrastructure improvements, etc.).
This instance is an on-demand build only. Please ping/email me ( mmahoney(a)redhat.com ) if you want access to the Jenkins Hawkular Dev build job.
Thanks,
Matt
9 years, 8 months
availability and metric endpoints
by John Sanda
There has been some good discussion around availability lately. I want to add one more item to the mix, but hopefully this one is not as in-depth as some of the other topics. Right now in metrics we have endpoints like:
POST /metrics/numeric/data
GET /metrics/numeric/{id}/data
POST /metrics/availability/data
GET /metrics/availability/{id}/data
I would like to change these to,
POST /metrics/data
GET /metrics/{id}/data
POST /availability/data
GET /availability/{id}/data
I think the “metrics” prefix is awkward and unnecessary. I think that it is intuitive enough that metrics on its own refers to numeric data. Thoughts?
- John
9 years, 8 months
hawkular agent reporting metrics
by John Mazzitelli
OK, I can report some success.
In our new Hawkular Agent repo [1] I have a new-and-improved hawkular-wildfly-monitor maven module. It produces the hawkular monitor subsystem that gets deployed inside Wildfly and can monitor any number of attributes of any number of wildfly resources (right now I just have it configured to collect some memory and thread metric data - see [2] for the subsystem configuration's metricSet definitions).
It isn't baked into kettle, but using Libor's nice maven plugin, when you build that new agent you can tell maven to install it in your kettle instance (see [3] for the command to run when building hawkular-agent/hawkular-wildfly-monitor).
You won't be able to see anything in any nice graphs since kettle is strongly typed to that pinger thing. But the kettle log file should show many messages about the data getting pushed into hawkular-metrics. So, it should be in there.
This hawkular agent's sole job is to monitor the Wildfly instance it is running in.
That's all for now. Just wanted to give an early status since I was out last week.
--John Mazz
[1] https://github.com/hawkular/hawkular-agent
[2] https://github.com/hawkular/hawkular-agent/blob/master/hawkular-wildfly-m...
[3] mvn -Dorg.hawkular.wildfly.home=/source/hawkular/kettle/target/wildfly-8.2.0.Final clean install wildfly-extension:deploy
9 years, 8 months
hawkular wildfly agent
by John Mazzitelli
I am almost done with the hawkular monitor agent - I basically took the wildfly-monitor project and updated it as our first agent.
I created hawkular-agent repo with stuff in it: https://github.com/hawkular/hawkular-agent
I should be done by today - just have to finish integrating with metrics.
What this will give us is a subsystem that you install in Wildfly that can then monitor other subsystems in that Wildfly instance. So once it's done, I can put it in kettle, and we'll get nice graphs for the internals of the kettle itself.
Today it does not monitor external wildfly instances, but that should be an easy enhancement.
9 years, 8 months