On 7 Apr 2015, at 19:40, mike thompson wrote:
I found this view of availability from the open source
monitoring project CachetHQ (https://cachethq.io/) interesting.
Perhaps we want to incorporate these ideas of component status
versus incident status.
I think this is another example of computed resource state,
where some function analyzes incoming data and then computes
the state.
Component Status
Operational
This probably corresponds to "UP" in our terms.
Performance Issues
The component is experiencing some slowness.
Partial Outage
Both are sort of "WARN", which we do not yet have on the
radar for a single resource, only in the sense of a mixed
state. I agree that it could be good to give the admin some
sub-variants of "WARN" to convey more information
directly in the state.
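Here is a rough Python sketch that makes the computed-state idea and the WARN sub-variants concrete. Everything in it is hypothetical and is not CachetHQ's or our actual logic. That covers the enum and function names, the 500 ms slowness threshold, and the rule that any failed check means a partial outage. Only the three component statuses named above are modelled.

```python
from enum import Enum
from typing import Optional, Tuple


class ComponentStatus(Enum):
    """Component statuses named in the CachetHQ list above."""
    OPERATIONAL = "Operational"
    PERFORMANCE_ISSUES = "Performance Issues"
    PARTIAL_OUTAGE = "Partial Outage"


class Availability(Enum):
    """Our coarse availability states (hypothetical names)."""
    UP = "UP"
    WARN = "WARN"


def compute_status(latency_ms: float, failed_checks: int,
                   slow_threshold_ms: float = 500.0) -> ComponentStatus:
    """Hypothetical computed resource state derived from incoming data.

    Assumption: any failed check counts as a partial outage; a full
    outage is not one of the statuses discussed here, so it is not modelled.
    """
    if failed_checks > 0:
        return ComponentStatus.PARTIAL_OUTAGE
    if latency_ms > slow_threshold_ms:
        return ComponentStatus.PERFORMANCE_ISSUES
    return ComponentStatus.OPERATIONAL


def to_availability(status: ComponentStatus) -> Tuple[Availability, Optional[str]]:
    """Collapse degraded statuses to WARN, keeping the original as a sub-variant."""
    if status is ComponentStatus.OPERATIONAL:
        return Availability.UP, None
    return Availability.WARN, status.value


if __name__ == "__main__":
    # A slow but fully responding resource ends up as WARN / "Performance Issues".
    print(to_availability(compute_status(latency_ms=800.0, failed_checks=0)))
```

Keeping the sub-variant next to the coarse WARN value means existing UP/WARN consumers keep working, while the admin still sees which kind of degradation it is.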
Incident Status
This is completely orthogonal to the above. "Scheduled", for example, sort of
"overrides" the component status, in the sense that we know a resource may be
down or affected during maintenance.
Scheduled
=> Maintenance mode
Investigating
You have reports of a problem and you're currently looking into them.
Sub-division of "acknowledged"
Identified
You've found the issue and you're working on a fix.
Another sub-variant of "acknowledged"
Watching
You've since deployed a fix and you're currently watching the
situation.
Ditto, in my view
Fixed
The fix has worked, you're happy to close the incident.
This is probably more like "closed", and could even be triggered by
recovery alerts (I forget what we call them now; safety mode?)
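Along the same lines, here is a rough Python sketch of the incident side. It maps the incident statuses onto our terms. Scheduled maintenance overrides the computed state, and a recovery alert closes an acknowledged incident. "Watching" is treated as another sub-variant of "acknowledged". The class and function names, the "DOWN" state string, and the exact auto-close rule are assumptions for illustration, not an existing API.

```python
from enum import Enum
from typing import Optional


class IncidentStatus(Enum):
    """CachetHQ incident statuses as listed above."""
    SCHEDULED = "Scheduled"
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    WATCHING = "Watching"
    FIXED = "Fixed"


# Hypothetical mapping onto the terms used in this thread.
OUR_TERM = {
    IncidentStatus.SCHEDULED: "maintenance",
    IncidentStatus.INVESTIGATING: "acknowledged",
    IncidentStatus.IDENTIFIED: "acknowledged",
    IncidentStatus.WATCHING: "acknowledged",
    IncidentStatus.FIXED: "closed",
}


class Incident:
    """Minimal incident record, kept separate from the computed component state."""

    def __init__(self, status: IncidentStatus = IncidentStatus.INVESTIGATING):
        self.status = status

    @property
    def term(self) -> str:
        return OUR_TERM[self.status]

    def on_recovery_alert(self) -> None:
        # Assumption: a recovery alert closes an acknowledged incident.
        # Scheduled maintenance is left untouched, since downtime is expected.
        if self.term == "acknowledged":
            self.status = IncidentStatus.FIXED


def effective_state(computed: str, incident: Optional[Incident]) -> str:
    """Scheduled maintenance "overrides" the computed state (e.g. an expected DOWN)."""
    if incident is not None and incident.status is IncidentStatus.SCHEDULED:
        return "maintenance"
    return computed
```

Because the incident is a separate object, the computed state itself never changes. Only the view presented to the admin is overridden while maintenance is scheduled.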