March 2015 - hawkular-dev - Jboss List Archives

"Getting Things Done" (David Allen) System: Artificer

by Brett Meyer

FWIW, I've been experimenting with Artificer as a "Getting Things Done" solution, mainly as a bit of dogfooding. However, it fits in surprisingly well. I'll be maintaining a demo that showcases a GTD ontology/classifier system, showing Artificer as not-just-another-artifact-repo, but also a powerful information and reference system. https://developer.jboss.org/en/artificer/blog/2015/03/31/getting-things-d...

4 months

4
3
0 / 0

Coming to Parent: publishing JavaDoc and sources via Maven

by Peter Palaga

Hi *, there is a question about handling JavaDoc checks violations at the bottom. I have added maven-javadoc-plugin and maven-source-plugin to the new "release" profile in Hawkular Parent. https://github.com/hawkular/hawkular-parent-pom/pull/20 I have tested the above only with Hawkular Bus. Therefore, please test the setup with your Hawkular component: * Make whatever is appropriate to make your component use the pom from https://github.com/hawkular/hawkular-parent-pom/pull/20 * Invoke … [View More]

9 years, 11 months

6
12
0 / 0

Business app/services representation in Inventory

by Gary Brown

Hi Before going too far down the BTM road, I just wanted to confirm whether or not we want the business app, their components services, and their relationships to IT resources they use, stored in Hawkular Inventory? An alternative approach would be to derive the structure and relationships dynamically from the business transaction instance information. The benefit of storing in Inventory is it enables end users to navigate through the inventory to understand the relationships to the business … [View More]

9 years, 11 months

6
12
0 / 0

RHQ Metrics - 0.2.7 & Hawkular Future

by Stefan Negrea

Hello Everybody, I want to summarize the latest release of the RHQ Metrics project and the future of the project. 1) RHQ Metrics migrates to Hawkular organization Release 0.2.7 of the RHQ Metrics is the last one from the current repository. But do not panic! Beyond the mechanics of the transfer and rename, the development will continue with the regular crew. For the migration, two project repositories (rhq-metrics and rhq-metrics-openshift) will just be transferred to the Hawkular … [View More]

9 years, 11 months

2
2
0 / 0

RfC: Availability

by Heiko W.Rupp

Hey, there was apparently some watercooler discussion yesterday without any minutes, so the following will not be able to refer to it in any way. Hawkular needs to have a way to store, retrieve and display availability of a resource or a bunch of them [1]. While we have some short term goals, for the longer run we need to better identify what needs to be done. I think we need to separately look at the following concerns: * availability reporting * api *values * availability … [View More]computation * availability storage * availability retrieval * alerting on availability * computed resource state The basic assumption here is that availability is something relatively stable. Meaning that usually the same state (hopefully "UP") is reported each time in a row for a veeery long period of time (there are some servers with uptimes >> 1 year). == reporting Feeds report availability to Hawkular, where the data may be further processed and stored. The reported values are probably in the range of "UP", "DOWN". I can also imagine that e.g. an application server that starts shutting down could send a "GOING_DOWN" value. On the API side, we need to be able to receive (a list of) tuples `< resource id, report time, state >` In case of full Hawkular, the _resource id_ needs to be a valid one from Inventory. _Report time_ is the local time on the resource / agent when that state was retrieved, represented in ms since the epoch UTC and then finally the _state_ which would be an Enum of "UP", "DOWN" and potentially some other values. While I have described them as string here, the representation on the wire may be implemented differently like 1 and 0 or true and false. == computed availability In addition to above reporting we may have feeds that either are not able to deliver availability or where the availability is delivered as a numeric value - see e.g. the pinger, where a <rid>.status.code is delivered as metric value representing the http status codes. Here we need to be apply a mapping from return code -> availability. f(code) -> code < 400 ? "UP" : "DOWN" and then further proceed with that computed availability value. See also [2] and [3] === "Backfill" As feeds may not report back all the time, we may want to have a watchdog which adds a transition into "UNKNOWN" state. === Admin-down A feed may discover resources that report their state as DOWN but where this is not an issue and rather an administrative decision. Take a network card as example where the card as 8 ports, but only 4 of them are connected. So the other 4 will be reported as DOWN, but in fact they are DOWN on purpose. The admin may mark those interfaces as ADMIN_DOWN, which also implies that further incoming DOWN-reports (what about UP, UNKNOWN?) reports can be ignored until the admin re-enables the interface. This admin-down probably also needs to be marked in inventory. === Maintenance mode On top of the availability we also have maintenance mode which is orthogonal to availability and is more meant for alert suppression and SLA computation. Maintenance mode should not overwrite the recorded or computed availability. We still want to record the original state no matter how maintenance mode is. == Storage As I wrote earlier, the base assumption is that availability is supposed to stay the same for long periods of time. For that reason run-length encoded storage is advised < resource id, state, from , to > The fields are more or less self-explanatory - to would be "null" if the current state continues. This is also sort of what we have done in RHQ, where we have also been running into some issues, (especially as we had a very db-bound approach). One issue is that if you have a transition from UP to DOWN the DB situation looks like this: Start: <rid , UP, from1 , null > up-> Down at time = from2 find tuple <rid, ??, ??, null > and update to <rid, UP, from1, from2> append new tuple <rid, DOWN, from2, null> The other issue is to get the current availability (for display in UI and/or in the previous transition) find tuple <rid, ??, ??, null> which are expensive. The retrieval of the current availability for a resource can be improved by introducing a cache that stores as minimal information <rid, last state>. Another issue that Yak pointed out is that if availability is recorded infrequently and at random points in time, just recording when a transition from UP to DOWN or even UNKNOWN happened may be not enough, as there are scenarios when it is still important to know when we heard the last UP report. So above storage (and cache) tuple needs to be extended to contain the _last heard_ time: < resource id, state, from , to, last_head > In this case, as we do not want to update that record for each incoming availability report, we need to really cache this information and have either some periodic write back to the store or at least when a shutdown listener indicates that Hawkular is going down. In case that we have multiple API endpoints that receive alert reports , this may need to be a distributed cache. == Retrieval Retrieval of availability information may actually a bit more tricky as returning the current availability state, as there will be more information to convey: We have two basic cases * return current availability / resource state : this can probably be answered directly from above mentioned cache * return a timeline between some arbitrary start and end times. Here we need to go out and return all records that satisfy something like ( start_time < requested start && end_time > requested start ) || (start_time > requested start && end_time <= requested_end ) === application / group of resources For applications the situation becomes more complicated as we need to retrieve the state (records) for each involved resource and then compute the total state of the application. Take an app with a load balancer, 3 app servers and a DB then this computation may go like avail ( app ) := UP if all resources are UP MIXED if one app server is not UP DOWN otherwise Actually this may even contain a time component avail ( app , time of day ) := if (business_hours (time of day) ) UP if all resources are UP MIXED if one app server is not UP DOWN otherwise else UP if all resources are UP MIXED if two app servers are not UP DOWN otherwise It may be a good idea to not compute that on the fly at retrieval time, but to add the result as synthetic availability records for the computation into the normal availability processing stream as indicated earlier in the "computed availability" section. This way, the computed information is also available for alerting as input == Alerting on availability Alerting will need to see the (computed) availability data and also the maintenance mode information to be able to alert on * is UP/DOWN/... ( for X time ) * goes UP/DOWN/... With the above I think that alerting should not need to do complex availability calculations on its own, but rather work on the stream of incoming (compute [1] https://issues.jboss.org/browse/HWKMETRICS-35 [2] http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html [3] http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000402.html [View Less]

9 years, 11 months

4
8
0 / 0

Public Hawkular Instances

by Matthew Mahoney

A couple notes on the Public Hawkular instance: 1) Improvements have been made to the Public Hawkular instance ( http://209.132.178.218:18080 ) smoke test, as to mitigate the number of false failures that were happening. We invite & encourage community use of this instance. 2) A new Public Hawkular instance ( http://209.132.178.218:18090 ) has been created which is intended to be used by Development/community for those use-cases such as Demos where you need to ensure that the instance … [View More]

9 years, 11 months

5
6
0 / 0

availability and metric endpoints

by John Sanda

There has been some good discussion around availability lately. I want to add one more to the mix, but hopefully this one is not as in-depth as some of the other topics. Right in metrics we have endpoints like, POST /metrics/numeric/data GET /metrics/numeric/{id}/data POST /metrics/availability/data GET /metrics/availability/{id}/data I would like to change these to, POST /metrics/data GET /metrics/{id}/data POST /availability/data GET /availability/{id}/data I think the … [View More]

9 years, 12 months

6
28
0 / 0

hawkular agent reporting metrics

by John Mazzitelli

OK, I can report some success. In our new Hawkular Agent repo [1] I have a new-and-improved hawkular-wildfly-monitor maven module. It produces the hawkular monitor subsystem that gets deployed inside Wildfly and can monitor any number of attributes of any number of wildfly resources (right now I just have it configured to collect some memory and thread metric data - see [2] for the subsystem configuration's metricSet definitions). It isn't baked into kettle, but using Libor's nice maven … [View More]

9 years, 12 months

1
1
0 / 0

hawkular wildfly agent

by John Mazzitelli

I am almost done with the hawkular monitor agent - I basically took the wildfly-monitor project and updated it as our first agent. I created hawkular-agent repo with stuff in it: https://github.com/hawkular/hawkular-agent I should be done by today - just have to finish integrating with metrics. What this will give us is a subsystem that you install in Wildfly that can then monitor other subsystems in that wildfly instance. So it should be that once done, I can put it in kettle, and we'll get … [View More]

9 years, 12 months

1
0
0 / 0

Proposal: Add PGP artifact signing

by Peter Palaga

Hi *, I propose to add maven-gpg-plugin to the release profile, similarly as I did for javadoc and sources in https://github.com/hawkular/hawkular-parent-pom/commit/d54a8d03b4ef251d59... A pom.xml snippet is in https://issues.jboss.org/browse/HAWKULAR-108 == Why? Because Maven Central requires it [1]. Although apparently, they already have accepted our unsigned artifacts already. I would not let our CI to sign the SNAPSHOT releases. == So what is the problem? The team members doing … [View More]

9 years, 12 months

2
1
0 / 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

hawkular-dev March 2015