On Feb 16, 2015, at 11:16 PM, mike thompson
<mithomps(a)redhat.com> wrote:
>
> On 16 Feb 2015, at 12:48, John Mazzitelli <mazz(a)redhat.com> wrote:
>
> I will add the following to the discussion:
>
> If history with JON customers, RHQ users, and salespeople is any indication, we
> will get requests to support collecting availability every second (if not every
> sub-second), even AFTER telling people, documenting, and shouting from the mountaintop
> that we are not a real-time profiler. So, do we have the throughput/infrastructure to
> support one "UP" datapoint coming in every second for N resources? And do we want to
> support this?
>
> Resource #1: 11:00:01 UP
> Resource #1: 11:00:02 UP
> Resource #1: 11:00:03 UP
> Resource #1: 11:00:04 UP
> ...56 datapoints later....
> Resource #1: 11:01:00 UP
> Resource #1: 11:01:01 UP
> Resource #1: 11:01:02 UP
> ... and on ... and repeat for Resource #3, #5, etc. (not every resource, but many of
> them).
>
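For a rough sense of scale, one datapoint per second per resource is 86,400 datapoints per resource per day. The sketch below just does that arithmetic; the resource counts it loops over are hypothetical examples, not numbers from this thread.

```python
# Back-of-envelope load for per-second availability collection.
# The resource counts used below are hypothetical examples.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def datapoints_per_day(num_resources: int, interval_seconds: int = 1) -> int:
    """Raw availability datapoints stored per day if each resource
    reports once every interval_seconds."""
    return num_resources * (SECONDS_PER_DAY // interval_seconds)


for n in (1, 1_000, 10_000):
    print(f"{n:>6} resources: {datapoints_per_day(n):>13,}/day at 1s, "
          f"{datapoints_per_day(n, 60):>11,}/day at 60s")
```

At 10,000 resources that is 864,000,000 raw rows per day at one-second granularity versus 14,400,000 at one-minute granularity, before any aggregation.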
> Perhaps we store data like this, but aggregate it quickly (say, every hour,
> collapse the data into a tuple like "UP, 11:00:01, 11:01:02", meaning
> "avail state, begin time, end time").
>
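The hourly aggregation idea is essentially run-length encoding. A minimal sketch, assuming samples arrive as (timestamp, state) pairs sorted by time; the type aliases and the `aggregate` name are made up for illustration:

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import Iterable, List, Tuple

Sample = Tuple[datetime, str]              # (timestamp, avail state)
Interval = Tuple[str, datetime, datetime]  # (avail state, begin time, end time)


def aggregate(samples: Iterable[Sample]) -> List[Interval]:
    """Collapse each run of consecutive samples that share a state into a
    single (state, begin, end) tuple. Assumes samples are sorted by time."""
    intervals: List[Interval] = []
    for state, run in groupby(samples, key=lambda s: s[1]):
        points = list(run)
        intervals.append((state, points[0][0], points[-1][0]))
    return intervals


# 62 per-second "UP" samples, 11:00:01 through 11:01:02 (made-up data).
start = datetime(2015, 2, 16, 11, 0, 1)
samples = [(start + timedelta(seconds=i), "UP") for i in range(62)]
for state, begin, end in aggregate(samples):
    print(state, begin.time(), end.time())  # UP 11:00:01 11:01:02
```

A real hourly job would also have to merge a batch's first interval with the last stored interval when both have the same state; this sketch only handles a single batch.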
> I also think the typical pattern of availability states should be considered
> when designing the storage schema for availability data: changes are very
> infrequent, separated by long stretches where the state stays the same ("UP
> UP UP UP UP ... UP DOWN DOWN DOWN DOWN ..."). What's the best way to store
> that? Storing just state changes has problems, as we know (e.g., with alerts we can
> only alert on state changes like "Going UP" or "Going DOWN"), not to mention
> admin states like "maintenance".
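To make the "UP UP UP ... DOWN DOWN" pattern concrete, here is a minimal sketch of what a changes-only view of a sample stream contains. The state names and the (index, previous, new) tuple layout are hypothetical; an admin state such as MAINTENANCE is modeled as just another value. It does not settle the alerting question raised above, it only shows how much the stream shrinks.

```python
from typing import Iterable, List, Optional, Tuple

# (sample index, previous state, new state); previous is None for the first sample.
Transition = Tuple[int, Optional[str], str]


def transitions(states: Iterable[str]) -> List[Transition]:
    """Reduce a stream of availability samples to its state changes.
    An admin state such as "MAINTENANCE" is treated as just another value."""
    events: List[Transition] = []
    prev: Optional[str] = None
    for i, state in enumerate(states):
        if state != prev:
            events.append((i, prev, state))
            prev = state
    return events


stream = ["UP", "UP", "UP", "DOWN", "DOWN", "MAINTENANCE", "UP"]
for i, old, new in transitions(stream):
    print(f"sample {i}: {old} -> {new}")
```

Seven samples reduce to four events; the longer the runs of identical states, the larger the saving.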
As a UI (or any other client), we only want to know when the availability state changes,
so why store redundant data? We have to process it anyway (CPU cycles) to extract meaningful
information from it. So for this first MVP (we can revisit this later), why not skip the
redundant data and just store the deltas? Let's just work with what we need now.
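A minimal sketch of "just store the deltas", assuming integer timestamps and a hypothetical in-memory `DeltaStore` class (not any real RHQ or JON API). Redundant datapoints are dropped on write, and a client can still ask what the state was at any time by finding the last change at or before it. Note that the redundancy check is itself a read of the last stored state; it is cheap here only because that state lives in memory.

```python
import bisect
from typing import Dict, List, Optional


class DeltaStore:
    """Hypothetical in-memory store that keeps only availability changes
    per resource, as parallel lists of timestamps and states."""

    def __init__(self) -> None:
        self._times: Dict[str, List[int]] = {}
        self._states: Dict[str, List[str]] = {}

    def record(self, resource: str, ts: int, state: str) -> bool:
        """Keep the datapoint only if its state differs from the last stored
        one; return True if it was stored. Assumes ts increases per resource."""
        times = self._times.setdefault(resource, [])
        states = self._states.setdefault(resource, [])
        if states and states[-1] == state:
            return False  # redundant: state has not changed
        times.append(ts)
        states.append(state)
        return True

    def state_at(self, resource: str, ts: int) -> Optional[str]:
        """State in effect at ts: the last change at or before ts, if any."""
        times = self._times.get(resource, [])
        i = bisect.bisect_right(times, ts)
        return self._states[resource][i - 1] if i > 0 else None


store = DeltaStore()
stream = [(1, "UP"), (2, "UP"), (3, "UP"), (4, "DOWN"), (5, "DOWN"), (6, "UP")]
written = [store.record("res1", ts, state) for ts, state in stream]
print(written)  # [True, False, False, True, False, True]
print(store.state_at("res1", 5))  # DOWN
```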
It makes sense to focus on what we need now, especially since requirements are still being
fleshed out. That said, the first and maybe easiest way I can think of to avoid
storing redundant data is to implement a read-before-write pattern, which is common and
easy enough with an RDBMS. The problem is that read-before-write does not scale, since every
incoming datapoint costs an extra read before the write, and I think scalability should be
a primary consideration.
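A minimal read-before-write sketch, using Python's built-in `sqlite3` module as a stand-in RDBMS; the `avail` table and its columns are hypothetical. Every incoming datapoint pays for a SELECT before the possible INSERT, and the read and the write are not atomic, so two concurrent writers for the same resource could both see the old state. Both costs grow with the ingestion rate.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical availability table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE avail ("
        " resource TEXT NOT NULL, ts INTEGER NOT NULL, state TEXT NOT NULL,"
        " PRIMARY KEY (resource, ts))"
    )
    return conn


def write_if_changed(conn: sqlite3.Connection, resource: str, ts: int, state: str) -> bool:
    """Read-before-write: look up the latest stored state for the resource
    and insert the new datapoint only if the state differs."""
    # The extra read that every incoming datapoint pays for.
    row = conn.execute(
        "SELECT state FROM avail WHERE resource = ? ORDER BY ts DESC LIMIT 1",
        (resource,),
    ).fetchone()
    if row is not None and row[0] == state:
        return False  # redundant datapoint, skip the write
    with conn:  # commits the INSERT (rolls back on error)
        conn.execute(
            "INSERT INTO avail (resource, ts, state) VALUES (?, ?, ?)",
            (resource, ts, state),
        )
    return True


conn = open_db()
for ts, state in [(1, "UP"), (2, "UP"), (3, "UP"), (4, "DOWN"), (5, "DOWN"), (6, "UP")]:
    write_if_changed(conn, "res1", ts, state)
print(conn.execute("SELECT COUNT(*) FROM avail").fetchone()[0])  # 3
```

Six datapoints cost six SELECTs to produce three stored rows; the reads scale with incoming traffic, not with the number of rows actually kept.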