[Hawkular-dev] Availability (Service and UI)

Heiko W.Rupp hrupp at redhat.com
Tue Feb 17 02:46:36 EST 2015


Hi,

> The SLA behaviour requires that some system noticed something happening at certain point of time. And if there's issues with system responsibility say at 11:05, there needs to be a event showing that the system really did report up.

I think we may need to look at two things:

- determining availability
- storing/displaying it.

In RHQ/JON, the plugin does periodic checks and reports the results.
If the agent/server connection went down, we recorded that at some point in time
and did a "backfill".
RHQ does not allow a resource to say "Hey, I am going down". Nor does it allow
e.g. a 2nd agent to also check availability, for example by pinging the resource.
Another drawback in RHQ is the sometimes very large reporting intervals,
which cause the uncertainty that you mention.

In Hawkular we need to change the logic in a way that allows multiple sources to
contribute to the overall picture (see for example the diagram at https://developer.jboss.org/thread/251479 ).
And with that much better information and smaller check intervals, the uncertainty
decreases.

The whole of availability recording and alerting on it only makes sense if our
system notifies the admin before the customer does.

I am still convinced that some sort of run length encoding for the stored data
< start time, end time (?), state > is the way to go, especially when the system
is not going up and down every other second, but is stable for long periods.
In RHQ we made the mistake of using a triple
<start, end, state> where end was null for an ongoing interval, so we had to
not only insert a new <start, end, state> triple, but also update the previous one
with the end time.
This is something we should do better this time. The question of how to get the time range
still remains even with only tuples of <start, state>, for questions like "what was
the state last midnight" or "how was availability on Dec 15th at noon", or for computations
like "give me the percentage of each availability state in the month of January".
Here, storing many points in time with their state makes queries easier at the expense of
storage usage.
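
To make the <start, state> idea a bit more concrete, here is a minimal Python sketch.
It is only an illustration, not Hawkular code: the class name AvailabilityLog, the
method names, the UNKNOWN marker and the use of plain numbers as timestamps are all
assumptions of mine. A report only creates a new tuple when the state changes, so
nothing already stored ever has to be updated. The end of an interval is implied by
the start of the next tuple.

```python
from bisect import bisect_right
from collections import defaultdict

UNKNOWN = "UNKNOWN"  # assumed marker for time before the first stored tuple


class AvailabilityLog:
    """Append-only, run-length encoded list of (start, state) tuples."""

    def __init__(self):
        self._starts = []  # start timestamps in ascending order
        self._states = []  # state that begins at the matching start

    def record(self, timestamp, state):
        """Take a report; only a change of state stores a new tuple."""
        if self._starts and timestamp < self._starts[-1]:
            raise ValueError("reports must arrive in time order")
        if self._states and self._states[-1] == state:
            return  # still inside the running interval, nothing to store
        self._starts.append(timestamp)
        self._states.append(state)

    def state_at(self, timestamp):
        """State in effect at the given time ("what was the state last midnight")."""
        i = bisect_right(self._starts, timestamp) - 1
        return self._states[i] if i >= 0 else UNKNOWN

    def percentages(self, begin, end):
        """Percentage of the range [begin, end) spent in each state."""
        if end <= begin:
            raise ValueError("end must be after begin")
        totals = defaultdict(float)
        cursor = begin
        i = bisect_right(self._starts, begin) - 1  # tuple covering begin, or -1
        while cursor < end:
            state = self._states[i] if i >= 0 else UNKNOWN
            # The interval ends where the next tuple starts, or at the range end.
            if i + 1 < len(self._starts):
                stop = min(self._starts[i + 1], end)
            else:
                stop = end
            totals[state] += stop - cursor
            cursor = stop
            i += 1
        return {s: 100.0 * d / (end - begin) for s, d in totals.items()}


log = AvailabilityLog()
log.record(0, "UP")
log.record(60, "UP")     # no change, so no new tuple is stored
log.record(100, "DOWN")
log.record(150, "UP")
print(log.state_at(120))
print(log.percentages(0, 200))
```

Note that this sketch treats the last tuple as still valid up to the end of the
queried range. If reports stop arriving, that assumption is wrong, which is exactly
the uncertainty discussed above, so a real implementation would need some notion of
"last seen" as well.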

