April 2015 - hawkular-dev - Jboss List Archives

by Lukas Krejci

The following is a new meeting request: Subject: [Hawkular-dev] hawkular monitor agent - inventory Organiser: "Lukas Krejci" <lkrejci(a)redhat.com> Time: Friday, 17 April, 2015, 4:00:00 PM - 5:00:00 PM GMT +01:00 Belgrade, Bratislava, Budapest, Ljubljana, Prague Invitees: mazz(a)redhat.com; hawkular-dev(a)lists.jboss.org *~*~*~*~*~*~*~*~*~* Agenda: Discuss how to set up and represent wildfly management model entities in hawkular inventory. To join the Meeting: https://bluejeans.com/691016054 To join via Browser: https://bluejeans.com/691016054/browser To join with Lync: https://bluejeans.com/691016054/lync To join via Room System: Video Conferencing System: bjn.vc -or- 199.48.152.152 Meeting ID: 691016054 To join via Phone: 1) Dial: +442035746870 (see all numbers - https://www.intercallonline.com/listNumbersByCode.action?confCode=8169978803) 2) Enter Conference ID: 8169978803

10 years, 8 months

1
0
0 / 0

hawkular monitor agent - inventory

by John Mazzitelli

The Hawkular Monitor Agent (the thing that runs inside of WildFly as a subsystem) can now monitor both its own WildFly instance as well as any remote WildFly instance. It can collect metrics and perform availability checks on any attribute in any subsystem within WildFly and can store that data in Hawkular-Metrics or Hawkular ecosystem. (so if there is any product that runs inside of WildFly, we can now monitor it). I am at a point where I can integrate it into kettle. I need to now talk to Lukas and gang about the inventory stuff. I have no idea where to start when it comes to integrating with inventory. Lukas - can you setup a call of some sort where you guys can tell me what I have to do and I can ask questions? Hopefully this can help us figure out what we need from inventory from a client perspective.

10 years, 8 months

1
0
0 / 0

metrics explorer UI

by John Mazzitelli

Can someone describe to me what needs to be done to see the Hawkular-Metrics UI (the explorer)? If I store metric and avail data into kettle via the Metrics REST API, I can't use the kettle UI because it is geared to the pinger. Is there a explorer UI that comes with kettle outside of the main hawkular UI? If not, is there something I have to do to enable that explorer webapp?

10 years, 8 months

6
11
0 / 0

Business app/services representation in Inventory

by Gary Brown

Hi Before going too far down the BTM road, I just wanted to confirm whether or not we want the business app, their components services, and their relationships to IT resources they use, stored in Hawkular Inventory? An alternative approach would be to derive the structure and relationships dynamically from the business transaction instance information. The benefit of storing in Inventory is it enables end users to navigate through the inventory to understand the relationships to the business apps/services, as well as allow other tooling (e.g. impact analysis) to determine the effect of IT resource downtime on business apps. Thoughts? Regards Gary

10 years, 8 months

6
12
0 / 0

RHQ Metrics - 0.2.7 & Hawkular Future

by Stefan Negrea

Hello Everybody, I want to summarize the latest release of the RHQ Metrics project and the future of the project. 1) RHQ Metrics migrates to Hawkular organization Release 0.2.7 of the RHQ Metrics is the last one from the current repository. But do not panic! Beyond the mechanics of the transfer and rename, the development will continue with the regular crew. For the migration, two project repositories (rhq-metrics and rhq-metrics-openshift) will just be transferred to the Hawkular organization. The code from rhqm-charts was already moved to Hawkular, so we will just close the RHQ repository. We will have a follow up communication once all the infrastructure is in place under the new organization. 2) RHQ Metrics 0.2.7 was released today This release has mainly stability fixes and minor enhancements. The Keycloak integration was delayed and not part of this release (as announced in the planning notes). For more details checkout the Github release notes. Github Release: https://github.com/rhq-project/rhq-metrics/releases/tag/0.2.7 JBoss Nexus Maven artifacts: http://origin-repository.jboss.org/nexus/content/repositories/public/org/... 2) OpenShift Cartridge for RHQ Metrics 0.2.7 The cartridge supports RHQ Metrics 0.2.7, 0.2.6, and 0.2.5. Just a reminder, the cartridge is the simplest and easiest way to get a public facing instance of RHQ Metrics in just a few minutes with a single command. The cartridge configures Cassandra, Wildfly, and RHQ Metrics (REST interface and UI console) to run in a single gear. For more details please visit the Github repository of the project. Sample command to create a new RHQ Metrics deployment: rhc app create test_app https://raw.githubusercontent.com/rhq-project/rhq-metrics-openshift/maste... Github Repository: https://github.com/rhq-project/rhq-metrics-openshift A big "Thank you!" goes to John Sanda, Mike Thompson, Heiko Rupp, and Thomas Segismont for their project contributions. Any discussion, suggestions or contributions are more than welcomed; so feel free to reply to this email or comment directly on the various forum threads. Thank you, Stefan Negrea Software Engineer _______________________________________________ rhq-devel mailing list rhq-devel(a)lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/rhq-devel

10 years, 8 months

2
2
0 / 0

Another perspective on Availability

by mike thompson

I found this view of availability interesting from open source monitoring project: CachetHQ(https://cachethq.io/ <https://cachethq.io/>) Perhaps we might want to incorporate these ideas of component status versus incident status. Component Status https://docs.cachethq.io/v1.0/docs/component-statuses <https://docs.cachethq.io/v1.0/docs/component-statuses> Status Name Description 1 Operational The component is working. 2 Performance Issues The component is experiencing some slowness. 3 Partial Outage The component may not be working for everybody. This could be a geographical issue for example. 4 Major Outage The component is not working for anybody. Incident Status https://docs.cachethq.io/v1.0/docs/incident-statuses <https://docs.cachethq.io/v1.0/docs/incident-statuses> Status Name Description 0 Scheduled This status is used for a scheduled status. 1 Investigating You have reports of a problem and you're currently looking into them. 2 Identified You've found the issue and you're working on a fix. 3 Watching You've since deployed a fix and you're currently watching the situation. 4 Fixed The fix has worked, you're happy to close the incident.

10 years, 8 months

2
1
0 / 0

Stronger typing of metrics

by Michael Burman

Hi, With our metric definitions, I'd like to see stronger definition of what sort of data we're storing and how it could be processed in the future. And with this I mean the same sort of stuff we had in the RHQ, such as "cumulative / gauge / trendsup / etc", so that we could give better post processing capabilities when fetching the data such as transforming the data between deltas and cumulative (depending on the user needs). While this could definitely be done with the tags, using such as "units", "type", we don't have any defined names for these options that we could depend on later. I guess this goes to the other tags discussion also, but I assume our tags are designed to work for searching capabilities as well as for definitions? - Micke

10 years, 8 months

2
1
0 / 0

RfC: Availability

by Heiko W.Rupp

Hey, there was apparently some watercooler discussion yesterday without any minutes, so the following will not be able to refer to it in any way. Hawkular needs to have a way to store, retrieve and display availability of a resource or a bunch of them [1]. While we have some short term goals, for the longer run we need to better identify what needs to be done. I think we need to separately look at the following concerns: * availability reporting * api *values * availability computation * availability storage * availability retrieval * alerting on availability * computed resource state The basic assumption here is that availability is something relatively stable. Meaning that usually the same state (hopefully "UP") is reported each time in a row for a veeery long period of time (there are some servers with uptimes >> 1 year). == reporting Feeds report availability to Hawkular, where the data may be further processed and stored. The reported values are probably in the range of "UP", "DOWN". I can also imagine that e.g. an application server that starts shutting down could send a "GOING_DOWN" value. On the API side, we need to be able to receive (a list of) tuples `< resource id, report time, state >` In case of full Hawkular, the _resource id_ needs to be a valid one from Inventory. _Report time_ is the local time on the resource / agent when that state was retrieved, represented in ms since the epoch UTC and then finally the _state_ which would be an Enum of "UP", "DOWN" and potentially some other values. While I have described them as string here, the representation on the wire may be implemented differently like 1 and 0 or true and false. == computed availability In addition to above reporting we may have feeds that either are not able to deliver availability or where the availability is delivered as a numeric value - see e.g. the pinger, where a <rid>.status.code is delivered as metric value representing the http status codes. Here we need to be apply a mapping from return code -> availability. f(code) -> code < 400 ? "UP" : "DOWN" and then further proceed with that computed availability value. See also [2] and [3] === "Backfill" As feeds may not report back all the time, we may want to have a watchdog which adds a transition into "UNKNOWN" state. === Admin-down A feed may discover resources that report their state as DOWN but where this is not an issue and rather an administrative decision. Take a network card as example where the card as 8 ports, but only 4 of them are connected. So the other 4 will be reported as DOWN, but in fact they are DOWN on purpose. The admin may mark those interfaces as ADMIN_DOWN, which also implies that further incoming DOWN-reports (what about UP, UNKNOWN?) reports can be ignored until the admin re-enables the interface. This admin-down probably also needs to be marked in inventory. === Maintenance mode On top of the availability we also have maintenance mode which is orthogonal to availability and is more meant for alert suppression and SLA computation. Maintenance mode should not overwrite the recorded or computed availability. We still want to record the original state no matter how maintenance mode is. == Storage As I wrote earlier, the base assumption is that availability is supposed to stay the same for long periods of time. For that reason run-length encoded storage is advised < resource id, state, from , to > The fields are more or less self-explanatory - to would be "null" if the current state continues. This is also sort of what we have done in RHQ, where we have also been running into some issues, (especially as we had a very db-bound approach). One issue is that if you have a transition from UP to DOWN the DB situation looks like this: Start: <rid , UP, from1 , null > up-> Down at time = from2 find tuple <rid, ??, ??, null > and update to <rid, UP, from1, from2> append new tuple <rid, DOWN, from2, null> The other issue is to get the current availability (for display in UI and/or in the previous transition) find tuple <rid, ??, ??, null> which are expensive. The retrieval of the current availability for a resource can be improved by introducing a cache that stores as minimal information <rid, last state>. Another issue that Yak pointed out is that if availability is recorded infrequently and at random points in time, just recording when a transition from UP to DOWN or even UNKNOWN happened may be not enough, as there are scenarios when it is still important to know when we heard the last UP report. So above storage (and cache) tuple needs to be extended to contain the _last heard_ time: < resource id, state, from , to, last_head > In this case, as we do not want to update that record for each incoming availability report, we need to really cache this information and have either some periodic write back to the store or at least when a shutdown listener indicates that Hawkular is going down. In case that we have multiple API endpoints that receive alert reports , this may need to be a distributed cache. == Retrieval Retrieval of availability information may actually a bit more tricky as returning the current availability state, as there will be more information to convey: We have two basic cases * return current availability / resource state : this can probably be answered directly from above mentioned cache * return a timeline between some arbitrary start and end times. Here we need to go out and return all records that satisfy something like ( start_time < requested start && end_time > requested start ) || (start_time > requested start && end_time <= requested_end ) === application / group of resources For applications the situation becomes more complicated as we need to retrieve the state (records) for each involved resource and then compute the total state of the application. Take an app with a load balancer, 3 app servers and a DB then this computation may go like avail ( app ) := UP if all resources are UP MIXED if one app server is not UP DOWN otherwise Actually this may even contain a time component avail ( app , time of day ) := if (business_hours (time of day) ) UP if all resources are UP MIXED if one app server is not UP DOWN otherwise else UP if all resources are UP MIXED if two app servers are not UP DOWN otherwise It may be a good idea to not compute that on the fly at retrieval time, but to add the result as synthetic availability records for the computation into the normal availability processing stream as indicated earlier in the "computed availability" section. This way, the computed information is also available for alerting as input == Alerting on availability Alerting will need to see the (computed) availability data and also the maintenance mode information to be able to alert on * is UP/DOWN/... ( for X time ) * goes UP/DOWN/... With the above I think that alerting should not need to do complex availability calculations on its own, but rather work on the stream of incoming (compute [1] https://issues.jboss.org/browse/HWKMETRICS-35 [2] http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html [3] http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000402.html

10 years, 8 months

4
8
0 / 0

Jira work log

by John Sanda

I see that there is a work log tab for tickets. Is there a way to set it up so that when commits are pushed, the ticket can automatically be updated? I am pushing commits to my personal repo, and it would be nice for the ticket to indicate this so that it is easy to see where work is being done. I could just add a comment with the info, but I figured it would be worth checking to see if there is an automagic way for the ticket to be updated with that info. - John

10 years, 8 months

2
1
0 / 0

migration to the new inventory

by Jiri Kremser

Hi, we've merged all the pull requests so the new inventory is there. There are still some minor issues though. In UI you may see couple of errors, but the pinger seems to be working after all. Everything should be buildable of course. Hopefully I'll resolve the rest of the issues tomorrow, jk

10 years, 8 months

1
0
0 / 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

hawkular-dev April 2015