[Hawkular-dev] Availability (Service and UI)
Michael Burman
miburman at redhat.com
Fri Feb 13 17:18:37 EST 2015
Hi,
I disagree, having seen this pattern first-hand result in a large penalty
payment in one case (we were unable to prove that monitoring had reported
up at a certain time when the system wasn't responsive). If there's a need
to monitor availability, there must be a recorded event flow, e.g.:
Up 11:00
Up 11:05
Up 11:10
Down 11:15
Down 11:20
Up 11:25
And why? If we only record:
Up 11:00 - 11:10
Down 11:15 - 11:20
Up 11:25 - ..
There's a difference. Did the monitoring system report at 11:05, or was the
monitoring system down at 11:05? In many cases there is a need to prove that
the system was in a certain state at certain times, especially when there
are SLA disagreements. Even more importantly, what happened between 11:10
and 11:15, and between 11:20 and 11:25?
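To make the information loss concrete, here is a minimal Python sketch (the
`to_periods` helper and the sample data are hypothetical, not part of any
Hawkular API). It collapses the sample stream above into periods, then does
the same for a stream in which the 11:05 check is missing; both produce
identical periods.

```python
from datetime import time

Sample = tuple[time, str]           # (check time, reported state)
Period = tuple[time, time, str]     # (first check, last check, state)


def to_periods(samples: list[Sample]) -> list[Period]:
    """Collapse consecutive samples with the same state into one period."""
    periods: list[Period] = []
    for t, state in samples:
        if periods and periods[-1][2] == state:
            first, _, _ = periods[-1]
            periods[-1] = (first, t, state)   # extend the current period
        else:
            periods.append((t, t, state))     # state changed: start a new period
    return periods


complete = [(time(11, 0), "up"), (time(11, 5), "up"), (time(11, 10), "up"),
            (time(11, 15), "down"), (time(11, 20), "down"), (time(11, 25), "up")]
# Same stream, but the 11:05 check was never recorded (e.g. the monitor itself was down).
missing_1105 = [s for s in complete if s[0] != time(11, 5)]

print(to_periods(complete))
print(to_periods(complete) == to_periods(missing_1105))   # True: the gap is invisible
```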
SLA behaviour requires that some system noticed something happening at a
certain point in time. And if there are issues with system responsiveness
at, say, 11:05, there needs to be an event showing that the system really
did report up.
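If the raw samples are kept, both questions become answerable. A minimal
sketch, assuming checks are expected every 5 minutes and that a sample within
1 minute of the expected time counts; `reported_state`, `missing_checks` and
`hhmm` are hypothetical helper names and the date is made up:

```python
from datetime import datetime, timedelta
from typing import Optional


def hhmm(hour: int, minute: int) -> datetime:
    """Hypothetical fixed date, used only to build example timestamps."""
    return datetime(2015, 2, 13, hour, minute)


def reported_state(samples, at: datetime, tolerance: timedelta) -> Optional[str]:
    """State of the sample closest to `at`, or None if no sample lies within `tolerance`."""
    close = [(abs(t - at), state) for t, state in samples if abs(t - at) <= tolerance]
    return min(close)[1] if close else None


def missing_checks(samples, start: datetime, end: datetime,
                   interval: timedelta, tolerance: timedelta) -> list[datetime]:
    """Expected check times in [start, end] with no recorded sample nearby."""
    missing = []
    t = start
    while t <= end:
        if reported_state(samples, t, tolerance) is None:
            missing.append(t)
        t += interval
    return missing


# The 11:05 check is absent: the monitor (not necessarily the server) was silent.
samples = [(hhmm(11, 0), "up"), (hhmm(11, 10), "up"), (hhmm(11, 15), "down"),
           (hhmm(11, 20), "down"), (hhmm(11, 25), "up")]
five_min, one_min = timedelta(minutes=5), timedelta(minutes=1)

print(missing_checks(samples, hhmm(11, 0), hhmm(11, 25), five_min, one_min))
print(reported_state(samples, hhmm(11, 5), one_min))    # None: no evidence either way
print(reported_state(samples, hhmm(11, 16), one_min))   # down
```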
- Micke
On 13.02.2015 14:01, Thomas Heute wrote:
>
> Getting back to availability discussion...
>
> To me, availability is a set of periods rather than a "time series", and
> we should just record changes of status (closing the previous event and
> opening a new one).
>
> - Server is up from 8:00am to 11:30am
> - Server is down from 11:30am to 11:32am
> - Server is unknown from 11:32am to 12:00pm (an agent running on a
> machine can tell whether a server is up or down; if the agent dies, we
> don't know whether the server is up or down)
> - Server is in an erratic state from 12:00pm to 12:30pm (the agent
> reports down every few requests)
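A minimal sketch of this "record only changes" model in Python, assuming the
agent (or the server, for "unknown") already classifies each report into one
of the states above; `AvailabilityLog` and `Period` are hypothetical names,
not an existing Hawkular API, and deciding when a resource is "erratic" is
left out:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Period:
    state: str                        # "up", "down", "unknown", "erratic", ...
    start: datetime
    end: Optional[datetime] = None    # None while the period is still open


class AvailabilityLog:
    """Availability stored as periods; only a change of state is written."""

    def __init__(self) -> None:
        self.periods: list[Period] = []

    def report(self, state: str, at: datetime) -> None:
        current = self.periods[-1] if self.periods else None
        if current is not None and current.state == state:
            return                                # same state: nothing to record
        if current is not None:
            current.end = at                      # close the previous period
        self.periods.append(Period(state, at))    # open a new one


log = AvailabilityLog()
day = datetime(2015, 2, 13)                       # hypothetical date
for hour, minute, state in [(8, 0, "up"), (9, 0, "up"), (11, 30, "down"),
                            (11, 32, "unknown"), (12, 0, "erratic")]:
    log.report(state, day.replace(hour=hour, minute=minute))

for p in log.periods:
    print(p.state, p.start.time(), p.end.time() if p.end else "open")
```

Note that the 9:00 "up" report writes nothing, which is exactly the property
Micke objects to above.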
>
> We were discussing the best way to represent availability over time in
> a graph. The representation in RHQ [1] is very decent IMO, and can be
> extended with more colors to reflect how often/long the website was down
> for each "brick" (if the line represents a year with 52 blocks, 1 block
> can be more or less red depending on how long it was down during the week).
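The per-brick shading could be driven by the fraction of each bucket spent
down. A sketch, assuming closed, non-overlapping down periods and weekly
buckets; the function name, start date and outages are hypothetical:

```python
from datetime import datetime, timedelta


def downtime_fraction_per_bucket(down_periods, range_start: datetime,
                                 bucket: timedelta, n_buckets: int) -> list[float]:
    """Fraction (0.0-1.0) of each consecutive bucket covered by down periods.

    down_periods: closed (start, end) pairs, assumed not to overlap each other.
    """
    fractions = []
    for i in range(n_buckets):
        b_start = range_start + i * bucket
        b_end = b_start + bucket
        down = timedelta(0)
        for start, end in down_periods:
            overlap = min(end, b_end) - max(start, b_start)
            if overlap > timedelta(0):
                down += overlap
        fractions.append(down / bucket)
    return fractions


week = timedelta(weeks=1)
year_start = datetime(2015, 1, 5)              # hypothetical start of the first block
outages = [
    (datetime(2015, 1, 5, 0, 0), datetime(2015, 1, 5, 16, 48)),    # 16.8 h in week 0
    (datetime(2015, 1, 11, 12, 0), datetime(2015, 1, 12, 12, 0)),  # spans weeks 0 and 1
]
shades = downtime_fraction_per_bucket(outages, year_start, week, 52)
print([round(f, 3) for f in shades[:3]])       # [0.171, 0.071, 0.0]
```

A renderer could then map each fraction to a shade of red, with 0.0 drawn as
a fully "up" brick.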
>
> But thinking of it more, an availability graph is not that interesting
> by itself IMO; it is more interesting in the context of other values.
> I attached a mockup of what I mean: a red area displayed on the
> response time graph means that the system is down, so obviously no
> response time is reported in that period. Earlier there is an erratic
> area, which seems related to higher response time ;)
> The rest of the time the system is just up and running...
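For such an overlay, the periods have to be turned into regions clipped to
the graph's time window, which a charting layer could then paint behind the
response time line. A minimal sketch, assuming state changes are stored as
sorted (start, state) pairs with the last one still open; `shaded_regions`,
`hhmm` and the times are hypothetical:

```python
from datetime import datetime


def shaded_regions(changes, window_start: datetime, window_end: datetime,
                   highlight=("down", "erratic", "unknown")):
    """Turn state changes into (start, end, state) regions to shade on another graph.

    changes: (start, state) pairs sorted by start; each state lasts until the
    next change, and the last one is still open (it runs to window_end).
    """
    regions = []
    for i, (start, state) in enumerate(changes):
        end = changes[i + 1][0] if i + 1 < len(changes) else window_end
        lo, hi = max(start, window_start), min(end, window_end)   # clip to window
        if state in highlight and lo < hi:
            regions.append((lo, hi, state))
    return regions


def hhmm(hour: int, minute: int) -> datetime:
    return datetime(2015, 2, 13, hour, minute)    # hypothetical date


changes = [(hhmm(8, 0), "up"), (hhmm(11, 30), "down"), (hhmm(11, 32), "unknown"),
           (hhmm(12, 0), "erratic"), (hhmm(12, 30), "up")]
for lo, hi, state in shaded_regions(changes, hhmm(11, 0), hhmm(13, 0)):
    print(state, lo.time(), "-", hi.time())
```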
>
> Additionally, I would want to see availability reports:
> - overall availability over a period of time (a day, a month, a
> year...), e.g. "99.99% available in the past month"
> - lists of the down periods with start dates and durations for a
> particular resource or set of resources (with filtering options)
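Both reports fall out of the period model directly. A sketch, assuming
periods are stored as (resource, state, start, end) tuples and that
availability is counted as time "up" divided by the whole window, so
"unknown" and "erratic" time count against it (other definitions are equally
possible); `availability_report`, `hhmm` and the data are hypothetical:

```python
from datetime import datetime, timedelta


def availability_report(periods, resource: str, start: datetime, end: datetime):
    """Availability % and down periods for one resource within [start, end).

    periods: (resource, state, period_start, period_end) tuples with closed ends.
    Availability = time "up" / length of the window, so "unknown" and "erratic"
    time (and time with no data) count against it.
    """
    up = timedelta(0)
    downs = []
    for res, state, p_start, p_end in periods:
        if res != resource:
            continue                                   # filter by resource
        lo, hi = max(p_start, start), min(p_end, end)  # clip to the report window
        if lo >= hi:
            continue
        if state == "up":
            up += hi - lo
        elif state == "down":
            downs.append((lo, hi - lo))                # (start date, duration)
    return 100 * (up / (end - start)), sorted(downs)


def hhmm(hour: int, minute: int) -> datetime:
    return datetime(2015, 2, 13, hour, minute)         # hypothetical date


periods = [("web", "up", hhmm(8, 0), hhmm(11, 30)),
           ("web", "down", hhmm(11, 30), hhmm(11, 36)),
           ("web", "up", hhmm(11, 36), hhmm(12, 0)),
           ("db", "down", hhmm(9, 0), hhmm(10, 0))]
pct, downs = availability_report(periods, "web", hhmm(8, 0), hhmm(12, 0))
print(f"{pct:.2f}% available")                          # 97.50% available
for when, duration in downs:
    print(when.time(), duration)                        # 11:30:00 0:06:00
```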
>
> Thoughts ?
>
> [1]
> http://3.bp.blogspot.com/-0MsmG5h5i5E/TfjTMZlvx3I/AAAAAAAAABU/6PKDs0RlzuI/s1600/ProblemManagement-RHQ.png
>
> Thomas
>