On 13 Mar 2015, at 8:57, Thomas Heute wrote:
But that opens a can of worms and mix of options...
For uptime/downtime for instance, the embedded in-process, can only say
"I'm up", if not up -> it's down or unknown (network issue ?). From a
separate process you can tell if it's down, separate process is better
in that case (but it may not be installed, so do we fallback on embedded
process info ?).
While this may at first sound like a disadvantage, it can also be seen
as an advantage.
In RHQ, when we lost the connection to the agent, all the resources
behind it were considered down during backfilling.
Now when we have e.g. the pinger in a different location and it sees
that the application is reachable, we could "overwrite" the unavailable
data from the agent. (And yes, I know that the previous
sentence/scenario was oversimplified.)
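
To make the "overwrite" idea a bit more concrete, here is a minimal
Python sketch. The names (Avail, effective_availability) and the rule
itself are assumptions for illustration only, not anything that exists
in RHQ or Hawkular: the agent's own report wins while the agent is
connected, and the pinger's view is only used when the agent is silent.

```python
from enum import Enum


class Avail(Enum):
    UP = "UP"
    DOWN = "DOWN"


def effective_availability(agent_value, pinger_value):
    """Combine two availability sources (hypothetical rule).

    agent_value  -- Avail from the agent, or None if the agent is unreachable
    pinger_value -- Avail from the external pinger, or None if there is none
    """
    if agent_value is not None:
        # The agent is connected, so trust what it reports.
        return agent_value
    if pinger_value is not None:
        # Agent is silent: use the pinger instead of backfilling DOWN.
        return pinger_value
    # No information at all: fall back to RHQ-style backfilling.
    return Avail.DOWN
```

With this rule, effective_availability(None, Avail.UP) yields Avail.UP
where plain backfilling would have recorded DOWN.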
Similarly, when the embedded agent is not able to send, perhaps due to
high load on the app server, the platform agent can still see whether
the process is up and report it as up.
Or the other way around: the embedded agent sends an "UP" report every
minute, but the external agent can detect that the PID has changed and
can either report a "DOWN" for the time between two ups (which may be
too much) or at least report a "tag" that the server process was
restarted, which can later be displayed to the user in the charts. This
may then explain why some resource usage is higher/lower than before.
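
As a rough sketch of the PID-change idea (Python, with a made-up
function name and sample format, so purely an assumption about how an
external agent might do it): the agent records the PID it sees at each
check and emits a restart "tag" whenever the PID differs from the
previous sample.

```python
def detect_restarts(samples):
    """Return the timestamps at which the observed PID changed.

    samples -- list of (timestamp, pid) tuples, sorted by timestamp,
               as a hypothetical external agent might collect them.
    """
    restarts = []
    previous_pid = None
    for timestamp, pid in samples:
        if previous_pid is not None and pid != previous_pid:
            # Same server, new PID: the process was restarted in between.
            restarts.append(timestamp)
        previous_pid = pid
    return restarts
```

For example, the samples [(0, 100), (60, 100), (120, 245), (180, 245)]
would produce one restart tag at timestamp 120, even though the
embedded agent reported "UP" at every one of those points.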
We have started this discussion (from a different angle) in two
different places already - the Alert on URL return code mail
http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000402.html
and "Computed Resource State"
http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html
The point is that the very strict 1:1 mapping that we had in RHQ is not
enough. In domains like aircraft systems, they often even use a
2-out-of-3 voting principle.
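
Just to illustrate what such a voting scheme could look like for
availability (a Python sketch; the function name, the string states and
the "UNKNOWN" fallback are my assumptions, not an existing API): each
source reports a state, and a state is only accepted if at least two
sources agree on it.

```python
from collections import Counter


def vote(reports, quorum=2):
    """Return the state that reaches the quorum, or "UNKNOWN".

    reports -- list of state strings, one per source
               (e.g. embedded agent, platform agent, pinger)
    quorum  -- number of agreeing sources required (2 for 2-out-of-3)
    """
    counts = Counter(reports)
    winners = [state for state, n in counts.items() if n >= quorum]
    # Accept a state only if exactly one state reached the quorum.
    return winners[0] if len(winners) == 1 else "UNKNOWN"
```

So vote(["UP", "UP", "DOWN"]) gives "UP", while three sources that all
disagree give "UNKNOWN" instead of blindly trusting one of them.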