Getting back to the availability discussion...
To me availability is a set of periods, not so much a "time series",
and we should just record each change of status (closing the previous
event and opening a new one). For example:
- Server is up from 8:00am to 11:30am
- Server is down from 11:30am to 11:32am
- Server is unknown from 11:32am to 12:00pm (an agent running
on a machine can tell if a server is up or down; if the agent dies,
we don't know whether the server is up or down)
- Server is in an erratic state from 12:00pm to 12:30pm (agent
reports down every few requests)
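
To make the "close the previous event, open a new one" idea concrete,
here is a minimal sketch in Python. The names (Period, AvailabilityLog,
record) and the date are made up for illustration and are not an
existing Hawkular or RHQ API; the times are the ones from the list
above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Period:
    status: str                     # "up", "down", "unknown" or "erratic"
    start: datetime
    end: Optional[datetime] = None  # None while the period is still open


class AvailabilityLog:
    """Hypothetical store: a list of periods instead of raw samples."""

    def __init__(self) -> None:
        self.periods: List[Period] = []

    def record(self, status: str, at: datetime) -> None:
        """Handle a status report; only a change of status is stored."""
        current = self.periods[-1] if self.periods else None
        if current is not None and current.status == status:
            return                # same status as before: nothing to store
        if current is not None:
            current.end = at      # close the previous period
        self.periods.append(Period(status, at))  # open a new one


day = datetime(2015, 1, 1)  # arbitrary date, for the example only
log = AvailabilityLog()
log.record("up", day.replace(hour=8))
log.record("up", day.replace(hour=9))  # no change of status, ignored
log.record("down", day.replace(hour=11, minute=30))
log.record("unknown", day.replace(hour=11, minute=32))
log.record("erratic", day.replace(hour=12))
```

After these calls the log holds four periods; the last one (erratic)
stays open until the next change of status is reported.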
We were discussing the best way to represent availability over
time in a graph. The representation in RHQ [1] is very decent IMO and
can be extended with more colors to reflect how often/long the website
was down for each "brick" (if the line represents a year with 52
blocks, one block can be more or less red depending on how long the
website was down during that week).
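
A rough sketch of how the "more or less red" value of each block could
be computed, assuming the down periods are available as (start, end)
pairs. The function name, the 52 one-week blocks and the sample dates
are assumptions for illustration only; mapping the fraction to an
actual color is left out.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

WEEK = timedelta(weeks=1)


def weekly_down_fraction(
    down_periods: List[Tuple[datetime, datetime]],
    line_start: datetime,
    blocks: int = 52,
) -> List[float]:
    """Return, for each one-week block, the fraction of the week spent down."""
    fractions = []
    for i in range(blocks):
        block_start = line_start + i * WEEK
        block_end = block_start + WEEK
        down = timedelta(0)
        for start, end in down_periods:
            # Keep only the part of the down period inside this block.
            overlap = min(end, block_end) - max(start, block_start)
            if overlap > timedelta(0):
                down += overlap
        fractions.append(down / WEEK)  # 0.0 = never down, 1.0 = down all week
    return fractions


# One 24-hour outage that straddles the boundary between the first two weeks.
outage = (datetime(2015, 1, 11, 12), datetime(2015, 1, 12, 12))
fractions = weekly_down_fraction([outage], line_start=datetime(2015, 1, 5))
```

A down period that crosses a block boundary is split between the two
blocks, so each block only gets redder for the time it actually covers.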
But thinking about it more, an availability graph is not that
interesting by itself IMO; it is more interesting in the context of
other values.
I attached a mockup of what I mean: a red area is displayed on the
response time graph, which means that the system is down; obviously
no response time is reported during that period. Earlier
there is an erratic area, which seems related to higher response time ;)
The rest of the time the system is just up and running...
Additionally I would like to see availability reports:
- overall availability over a period of time (a day, a month,
a year...), e.g. "99.99% available in the past month"
- a list of the down periods with start dates and durations for
a particular resource or set of resources (filtering options)
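
Both reports fall out of the period model fairly directly. A minimal
sketch, assuming periods are stored as (status, start, end) tuples and
that only "up" counts as available (how "unknown" and "erratic" should
be counted is an open question). The function names and the date are
hypothetical, and filtering by resource is left out.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

PeriodTuple = Tuple[str, datetime, datetime]  # (status, start, end)


def availability_percent(
    periods: List[PeriodTuple], window_start: datetime, window_end: datetime
) -> float:
    """Percentage of the window spent in the "up" status (assumption)."""
    up = timedelta(0)
    for status, start, end in periods:
        if status != "up":
            continue
        # Count only the part of the period that falls inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > timedelta(0):
            up += overlap
    return 100.0 * (up / (window_end - window_start))


def down_periods(
    periods: List[PeriodTuple], window_start: datetime, window_end: datetime
) -> List[Tuple[datetime, timedelta]]:
    """Start date and duration of every "down" period touching the window."""
    return [
        (start, end - start)
        for status, start, end in periods
        if status == "down" and start < window_end and end > window_start
    ]


day = datetime(2015, 1, 1)  # arbitrary date, for the example only
periods = [
    ("up", day.replace(hour=8), day.replace(hour=11, minute=30)),
    ("down", day.replace(hour=11, minute=30), day.replace(hour=11, minute=32)),
    ("unknown", day.replace(hour=11, minute=32), day.replace(hour=12)),
    ("erratic", day.replace(hour=12), day.replace(hour=12, minute=30)),
]
window = (day.replace(hour=8), day.replace(hour=12, minute=30))
percent = availability_percent(periods, *window)
outages = down_periods(periods, *window)
```

With the example periods from earlier in this mail, 210 of the 270
minutes in the window are "up", and the report lists a single down
period starting at 11:30am and lasting 2 minutes.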
Thoughts?
[1]
http://3.bp.blogspot.com/-0MsmG5h5i5E/TfjTMZlvx3I/AAAAAAAAABU/6PKDs0RlzuI/s1600/ProblemManagement-RHQ.png
Thomas
_______________________________________________
hawkular-dev mailing list
hawkular-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev