[Hawkular-dev] Availability revisited

Mon Jul 20 05:13:45 EDT 2015

On 16 Jul 2015, at 3:45, John Sanda wrote:
> Are there any docs, notes, etc. on the feed-comm system? I am not 
> familiar with this.

Mazz should have talked / will talk about it. But in the end this has only
little to do with this topic, as any sort of availability should work with
"normal" REST-based feeds too.

>
> Why do we need to store the last seen availability in memory?

Because we can? :-)
Seriously: in RHQ we had huge issues with all the availability records
that were never cached, so everything working with "current availability"
had to go to the database with its latencies and costs.
Availability is by nature something run-length encoded, unlike
counter or gauge metrics.
We could of course store each incoming new availability record, so that
its (start) timestamp would reflect the last seen time, but querying for
"since when was it up" would result in a pretty heavy backtracking query
(with some luck we have a limit like "in the last hour/day", but what if
we want an absolute date or a range over the last year?).

This is why I am thinking about keeping the RLE with start/stop/state,
but augmented by "last seen" for the in-memory version.
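
To make the idea concrete, here is a minimal sketch in Python of what I mean.
It is not Hawkular code: the names Interval, CachedAvail, AvailCache and
up_since are made up for illustration, and the backend_writes list only
stands in for whatever store we end up using. It shows that a repeated
report of the same state only touches memory, while a state change is the
only thing that reaches the backend.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Interval:
    """One run-length-encoded availability run (what would be persisted)."""
    state: str                  # e.g. "UP" or "DOWN"
    start: int                  # timestamp at which this state began
    stop: Optional[int] = None  # timestamp at which it ended; None while open


@dataclass
class CachedAvail:
    """In-memory view: the open interval plus the last time it was reported."""
    interval: Interval
    last_seen: int


class AvailCache:
    """Hypothetical in-memory cache keyed by resource id."""

    def __init__(self) -> None:
        self.current: Dict[str, CachedAvail] = {}
        # Stand-in for the backend: every append here counts as one write.
        self.backend_writes: List[Tuple[str, Interval]] = []

    def report(self, resource_id: str, state: str, ts: int) -> None:
        cached = self.current.get(resource_id)
        if cached is not None and cached.interval.state == state:
            # Same state as before: only refresh last_seen, no backend hit.
            cached.last_seen = ts
            return
        if cached is not None:
            # State changed: close the previous run. In a real store this
            # would be one update of the already persisted interval.
            cached.interval.stop = ts
        new = Interval(state=state, start=ts)
        self.backend_writes.append((resource_id, new))
        self.current[resource_id] = CachedAvail(new, last_seen=ts)

    def up_since(self, resource_id: str) -> Optional[int]:
        """Answer "since when was it up" without any backtracking query."""
        cached = self.current.get(resource_id)
        if cached is None or cached.interval.state != "UP":
            return None
        return cached.interval.start
```

With this layout, "since when was it up" is a single lookup of the open
interval's start, no matter how many identical reports came in since.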

Keeping last seen in memory prevents all the expensive backend hits
(either getting the same value over and over again, or doing in-place updates)
and still allows jobs to check if the last-seen is e.g. within the last minute
and react accordingly (RHQ term: "backfill").
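
Such a periodic job could look roughly like the following Python sketch.
Again this is only an illustration under assumptions: find_stale and
backfill are hypothetical names, the two dicts stand in for the in-memory
cache, and both the 60 second threshold and the "UNKNOWN" fill state are
placeholders, not a decision about what backfill should set.

```python
from typing import Dict, List


def find_stale(last_seen: Dict[str, int], now: int, max_age: int = 60) -> List[str]:
    """Return ids of resources whose last report is older than max_age seconds."""
    return sorted(rid for rid, seen in last_seen.items() if now - seen > max_age)


def backfill(
    states: Dict[str, str],
    last_seen: Dict[str, int],
    now: int,
    max_age: int = 60,
    fill_state: str = "UNKNOWN",  # assumption: which state to fill in is open
) -> List[str]:
    """Mark stale resources with fill_state; return the ids that changed."""
    changed = []
    for rid in find_stale(last_seen, now, max_age):
        if states.get(rid) != fill_state:
            # Only a real state change would have to reach the backend.
            states[rid] = fill_state
            changed.append(rid)
    return changed
```

The job only reads the in-memory last-seen values; a backend write would be
needed only for the resources returned in the changed list.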

> If you are talking about correlation, then I am +1. When I think about
> RHQ, the user could easily see an availability state change, but he would
> have to go hunting around to see what precipitated it.

This is certainly another aspect of this "root cause analysis".

   Heiko