March 2015 - hawkular-dev - Jboss List Archives

by Thomas Segismont

Hi everyone, We've been talking about Influx support in Metrics recently so I've started a gap analysis document: https://docs.google.com/document/d/1xZRt7MjvZJ7J8jKFNrR8bYebcObihfd1YIljv... It doesn't say how we could implement missing features but rather lists them (well, the most important). Feel free to comment. Regards, Thomas

9 years, 9 months


[metrics] Consistent error replies from REST API

by Thomas Segismont

Hi,

There's a growing portion of application code in the metrics REST API which returns errors (with whichever HTTP status) in the following form:

====
{"errorMsg": "blah"}
====

Whether this format is nice or suitable is not the purpose of this message. I just want to draw your attention to the fact that consistent error reporting is important.

In this regard, I've added a new ExceptionMapper for org.jboss.resteasy.spi.ReaderException[1] instances. So now not only application errors conform to the format above, but also errors coming from payload parsing.

Regards,
Thomas

[1] http://docs.jboss.org/resteasy/docs/3.0.9.Final/userguide/html_single/#bu...
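One way to keep replies consistent is to build the error body in a single place that every error path calls. The sketch below is only an illustration in plain Java, not the code from the metrics REST API: the class name `ErrorReplies` and both method names are hypothetical, and the escaping is deliberately minimal. A JAX-RS `ExceptionMapper` could delegate to such a helper from its `toResponse` method.

```java
/** Hypothetical helper: the one place where the error body is built. */
final class ErrorReplies {
    private ErrorReplies() {}

    /** Returns a JSON object of the form {"errorMsg": "..."} for the given message. */
    static String errorBody(String message) {
        String text = (message == null) ? "" : message;
        StringBuilder sb = new StringBuilder("{\"errorMsg\": \"");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Minimal escaping so the message cannot break out of the JSON string.
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }

    /** Uses the exception message, or the exception's simple class name if there is none. */
    static String fromException(Throwable t) {
        String msg = t.getMessage();
        return errorBody(msg != null ? msg : t.getClass().getSimpleName());
    }
}
```

With this in place, application errors and payload-parsing errors would both go through `ErrorReplies.errorBody(...)`, so a later change of format only touches one method.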

9 years, 9 months


Computed resource state

by Heiko W.Rupp

But in fact (and we were discussing that already), if the above URL "ping" were done from two different sites (e.g. US and EU) and one returned 200 and the other a timeout, then the real availability would be UP, as the resource is reachable (*1). Here a single feed (a pinger in one location) is no longer able to determine the availability alone.

Also, it may not be enough to determine availability by status code alone, as a 200 after 2 minutes is, for the end customer, equivalent to down.

We also found out in RHQ that having only the availability states "UP" and "DOWN" is not enough: individual resources may be down on purpose, or the feed may just not report anything. And when you look at a group of resources (or a composite resource) like an application consisting of multiple services, the total availability of my shop may be up but degraded (e.g. slow response time). Or it may be up and fast, but one of the 3 servers in the cluster is down.

This is why I am proposing
a) to have a more differentiated set of "resource states", and
b) to have this state be a function of several input parameters.

About a): this is a list of possible resource states, where UP and DOWN correspond to the classical binary availability terms.

* UP: Resource is available and working normally
* DEGRADED: Resource is available but not at full performance
* DOWN: Resource is at fault and not working normally
* MAINTENANCE: There is a scheduled maintenance period, availability may be UP or DOWN
* MISSING: The resource was recorded in inventory, but does not exist in reality (e.g. was deleted on the file system)
* ADMIN_DOWN/DISABLED: The resource exists, but was disabled by the admin (e.g. a network interface on an 8-port card where only 1 cable is connected)
* UNKNOWN: Resource state cannot be determined

Aggregated state

A state of "MIXED" can be added for groups or applications (e.g. 3 servers in a cluster, one server is down, 2 are up).

For groups, the aggregated state could be computed as follows (but see below):

* All UP: Group is UP
* All DOWN: Group is DOWN
* Otherwise: Group is MIXED

Wrt b), computation of state

For the example of the URL ping, the resource state could be computed as

    function(list<code, time>) {
        for (<code, time> in list) {
            if (code == 200 && time < threshold) {
                return UP;
            }
        }
        return DOWN;
    }

This is already partially what alerting is doing right now, and we could use this in a rectified way:

    [input values] ----> [ resource state processor ] ---(+)

and then at the (+) point we expose the resource state to e.g. the UI and other services, where one of the services is the alert engine:

    (+) ----> [ alert engine ] ----> [ notification handlers ]

The alert engine decides, based on the computed states, whether alerting needs to be done and in what way.

*1) Of course we still need to flag the timeout, as the timeout may have an impact on customers being able to reach the shop.
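The URL-ping rule and the group aggregation from this message can be written out in Java roughly as follows. This is a sketch under assumptions, not an existing Hawkular API: the names `ResourceStates`, `Ping`, `fromPings` and `aggregate` are made up for illustration, the threshold is passed in as a parameter, a timeout is represented here as status code 0, and an empty group is mapped to UNKNOWN (the message does not cover that case).

```java
import java.util.List;

/** Hypothetical sketch of the proposed resource state computation. */
final class ResourceStates {
    private ResourceStates() {}

    /** States from the proposal, plus MIXED for groups. */
    enum State { UP, DEGRADED, DOWN, MAINTENANCE, MISSING, DISABLED, UNKNOWN, MIXED }

    /** One probe result: HTTP status code (0 = timeout, an assumption) and response time. */
    static final class Ping {
        final int code;
        final long millis;

        Ping(int code, long millis) {
            this.code = code;
            this.millis = millis;
        }
    }

    /** UP if any location saw a 200 that was faster than the threshold, else DOWN. */
    static State fromPings(List<Ping> pings, long thresholdMillis) {
        for (Ping p : pings) {
            if (p.code == 200 && p.millis < thresholdMillis) {
                return State.UP;
            }
        }
        return State.DOWN;
    }

    /** All UP gives UP, all DOWN gives DOWN, anything else gives MIXED. */
    static State aggregate(List<State> members) {
        if (members.isEmpty()) {
            return State.UNKNOWN; // assumption: nothing to aggregate
        }
        boolean allUp = true;
        boolean allDown = true;
        for (State s : members) {
            if (s != State.UP) allUp = false;
            if (s != State.DOWN) allDown = false;
        }
        if (allUp) return State.UP;
        if (allDown) return State.DOWN;
        return State.MIXED;
    }
}
```

A fast 200 from the EU combined with a timeout from the US yields UP, while a lone 200 that took two minutes yields DOWN against a two-second threshold. Flagging the timeout itself, as footnote *1 asks for, would need an extra output that this sketch leaves out.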

9 years, 9 months


Availability vs uptime for URL "ping"

by Thomas Heute

In the most simplistic form of monitoring we're looking at pinging a website and reporting up/down and response time from the initiator of the HTTP HEAD/GET.

We've discussed a bit about availability vs uptime.

First question: Do we need to distinguish? Is it important for someone who wants to know if his website is accessible to really separate the 2 concepts? (Details vs simplicity)

Second question: If we separate the 2, how do we distinguish?

A suggestion:

* HTTP Code 2xx and 3xx -> URL is up and available
* HTTP Code 4xx -> The server may reject the request (it may not like bots, the user entered a wrong URL (should be checked upfront), or the resource has been deleted)... Server is up, availability is unknown
* HTTP Code 5xx -> URL is up but not available
* Timeout -> URL is down and not available
* Couldn't resolve host 'www.fffffffffefwefdwdf.com' -> Domain name is deleted: URL is down and not available

4xx is likely the most debatable; it's a client issue and likely needs either a code fix or user intervention... (And unfortunately we can't expect servers to fully respect the codes.)

Thoughts?

Thomas
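The suggested mapping can be expressed as a small Java function, which makes the debatable cases easy to see and change. This is only a sketch: `UrlPingClassifier`, `Result`, `classify` and `noResponse` are hypothetical names, and treating any status outside 2xx to 5xx as "up, availability unknown" is an assumption that the suggestion above does not cover.

```java
/** Hypothetical sketch of the suggested uptime/availability mapping. */
final class UrlPingClassifier {
    private UrlPingClassifier() {}

    enum Uptime { UP, DOWN }
    enum Availability { AVAILABLE, UNAVAILABLE, UNKNOWN }

    /** Pair of the two concepts the message proposes to separate. */
    static final class Result {
        final Uptime uptime;
        final Availability availability;

        Result(Uptime uptime, Availability availability) {
            this.uptime = uptime;
            this.availability = availability;
        }
    }

    /** Classifies a response that carried an HTTP status code. */
    static Result classify(int status) {
        if (status >= 200 && status < 400) {
            return new Result(Uptime.UP, Availability.AVAILABLE);   // 2xx and 3xx
        }
        if (status >= 500 && status < 600) {
            return new Result(Uptime.UP, Availability.UNAVAILABLE); // 5xx
        }
        // 4xx as suggested; any other code is an assumption of this sketch.
        return new Result(Uptime.UP, Availability.UNKNOWN);
    }

    /** Timeout or unresolvable host: no response at all. */
    static Result noResponse() {
        return new Result(Uptime.DOWN, Availability.UNAVAILABLE);
    }
}
```

Keeping the two outcomes in separate fields leaves the first question open: a caller who only wants the simple view can look at `uptime` alone.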

9 years, 9 months


infinispan + cassandra

by John Mazzitelli

I'm looking at Infinispan and how it can be configured in Wildfly (there was talk about us needing a clustered cache in the kettle, so this is what started me looking at this). Since we already have C* in use by metrics, and there is talk that inventory is going to use C* under the covers, I was wondering what people thought about utilizing C* as the persistent backend for Infinispan: http://infinispan.org/docs/cachestores/cassandra/ This provides persistence across a clustered Infinispan cache. Is this something we'd be interested in or would want to use?

9 years, 9 months


@Null / @NotNull

by Heiko W.Rupp

Hey,

as we were having a discussion this morning on #hawkular-dev about null, I think we should start (on top of writing more JavaDoc) using annotations to mark whether methods and method parameters allow null or not, and whether methods may return null or not.

This is especially important on API classes (*1) that may be consumed by 3rd parties that may or may not have insight into the source of the implementation (on top of that, such annotations are an enhancement of the API contract).

I know we have been talking about that in the past without any real result. Luckily, since then JSR 308 has been marked as final and thus a standard exists that includes such annotations (+ a checking framework, + tooling to retroactively add such annotations to existing source).

See http://types.cs.washington.edu/checker-framework/ and, for mvn artifacts, http://mvnrepository.com/artifact/org.checkerframework

http://mvnrepository.com/artifact/org.checkerframework/checker-qual/1.8.10 <- annotations
http://types.cs.washington.edu/checker-framework/current/api/ <- Javadoc

Another option is the org.jetbrains versions of the annotations, which have been around for a while.

If we only care about documentation, but not compile time and/or static analysis, we can also use the ones from BeanValidation, which live under the javax package namespace (iirc). (*2)

Here is a more complete list of frameworks: http://stackoverflow.com/questions/4963300/which-notnull-java-annotation-...

*1) Java and REST
*2) BV may be a good idea for more complex input like in alert definition
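To show what such a contract looks like on an API class, here is a small self-contained sketch. The `Nullable` and `NonNull` annotations are declared locally as stand-ins so the example compiles without dependencies; in a real project they would be imported from one of the libraries mentioned above. The class `MetricRegistry` and its methods are hypothetical and are not part of Hawkular.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical API class whose null contract is visible in the signatures. */
final class MetricRegistry {

    /** Local stand-in: the value may be null. */
    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target({ElementType.METHOD, ElementType.PARAMETER})
    @interface Nullable {}

    /** Local stand-in: the value must not be null. */
    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target({ElementType.METHOD, ElementType.PARAMETER})
    @interface NonNull {}

    private final Map<String, String> descriptions = new HashMap<>();

    /** Sets the description; passing null removes it. */
    void describe(@NonNull String metricId, @Nullable String description) {
        if (metricId == null) {
            throw new NullPointerException("metricId must not be null");
        }
        if (description == null) {
            descriptions.remove(metricId);
        } else {
            descriptions.put(metricId, description);
        }
    }

    /** Returns the description, or null if none was set. */
    @Nullable
    String descriptionOf(@NonNull String metricId) {
        return descriptions.get(metricId);
    }
}
```

The stand-ins only document intent. Compile-time or static checking, as discussed in the message, comes from the tooling that ships with the real annotation libraries.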

9 years, 9 months


Forecasting of bad things

by Heiko W. Rupp

Something I was talking about for a while ... I saw this today in Android 5 - they give you (when the device is running on battery) not only a chart about past battery consumption (dark green), but also about estimated future depletion (grey-green) and on top of the chart the current state (40%) + an estimate that the battery will last ~4 more days at the same usage level. -- Heiko Rupp hwr(a)pilhuhn.de Blog: http://pilhuhn.blogspot.com @pilhuhn
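The simplest version of such a forecast is a straight-line extrapolation from two samples, assuming usage stays the same. The sketch below shows that arithmetic only; it is not how Android computes its estimate, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: linear forecast of when a draining resource reaches zero. */
final class DepletionForecast {
    private DepletionForecast() {}

    /**
     * Estimates the hours left until the level reaches zero, measured from the
     * second sample, assuming the drain rate between the two samples continues.
     * Returns positive infinity if the level is not falling.
     */
    static double hoursUntilEmpty(double hours0, double level0, double hours1, double level1) {
        if (hours1 <= hours0) {
            throw new IllegalArgumentException("second sample must be later than the first");
        }
        double drainPerHour = (level0 - level1) / (hours1 - hours0);
        if (drainPerHour <= 0) {
            return Double.POSITIVE_INFINITY; // flat or charging: no depletion predicted
        }
        return level1 / drainPerHour;
    }
}
```

For example, with made-up samples of 50% at hour 0 and 40% at hour 24, the estimate is about 96 hours, i.e. roughly four more days. The same calculation applies to any monitored value that trends toward a limit, such as free disk space.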

9 years, 9 months


SE Radio episode on Continuous Delivery

by Heiko W.Rupp

Hey, this is another excellent episode of Software Engineering Radio - this time on Continuous Delivery. http://www.se-radio.net/2015/02/episode-221-jez-humble-on-continuous-deli... Something to consider for your commute to work. Heiko

9 years, 9 months


C* at Apple

by Heiko W.Rupp

https://www.youtube.com/watch?v=Bc4ql9TDzyg&index=48&list=PLqcm6qE9lgKJkx...

9 years, 10 months


Notification messages

by Gary Brown

Hi,

I started looking at Hawkular alerts with an eye on RTGov eventually using the notification mechanism to represent what we currently store/display as 'situations'. I had a couple of questions:

1) The notification message currently has a notifierId. Does this mean an alert trigger will only have a single notifier? Or could a single alert (notification message) potentially be sent to multiple notifiers (e.g. twitter, email, etc.), and if so, require a notifierId list?

2) The only other field is currently a description, which is fine for targets such as email, twitter etc., but in RTGov 'situations' are also used to hold other information that can be used to understand the source of the problem and tie it back to the originating business transaction. Will it be possible to add such fields to the notification message, even though they may not be relevant for the email/sms/twitter type notifiers?

Regards
Gary

9 years, 10 months

