Hey,
What is the scope of the agent? Will it gather only data about the
environment, like memory consumption, number of worker threads, CPU
consumption, GC time, ...? Or business metrics as well, like "requests
per second", "number of exceptions in the last X minutes", "average
time spent in the database", and so on?
There is no "one size fits all" answer, as we will most likely see more
than one kind of "agent" in the future. These are currently called
feeds. For example, the wildfly-monitor is an agent that runs inside a
WildFly instance.
The agent you mention probably corresponds most closely to the
"platform agent" in RHQ: it gathers data about the environment and will
most likely also act as a proxy for feeds such as the WildFly monitor.
Those feeds can then simply connect to the local agent to deliver their
data, which reduces configuration complexity. Resource-specific metrics
would then probably be handled by plugins.
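To make the proxy idea a bit more concrete, here is a minimal sketch,
assuming a local agent that buffers data points from local feeds and is
the only component that knows how to reach the server. All names
(PlatformAgent, DataPoint, the send callable, batch_size) are
hypothetical and not part of any existing Hawkular or RHQ API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataPoint:
    feed_id: str   # which feed produced the value, e.g. "wildfly-monitor"
    resource: str  # resource the value belongs to
    metric: str
    value: float


@dataclass
class PlatformAgent:
    """Hypothetical local agent that proxies data from local feeds.

    Only the agent knows how to reach the server (modelled here as the
    `send` callable), so local feeds need no server URL or credentials.
    """
    send: Callable[[list[DataPoint]], None]
    batch_size: int = 3
    _buffer: list[DataPoint] = field(default_factory=list)

    def collect_environment(self) -> None:
        # Placeholder for platform metrics (CPU, memory, disk, network, ...).
        self.accept(DataPoint("platform", "localhost", "cpu.load", 0.42))

    def accept(self, point: DataPoint) -> None:
        # Entry point used by local feeds such as an in-process monitor.
        self._buffer.append(point)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Forward everything buffered so far in one batch, then reset.
        if self._buffer:
            self.send(list(self._buffer))
            self._buffer.clear()


# Usage: an in-process feed hands its data to the local agent.
sent: list[list[DataPoint]] = []
agent = PlatformAgent(send=sent.append)
agent.collect_environment()
agent.accept(DataPoint("wildfly-monitor", "eap-1", "requests.per.second", 120.0))
agent.flush()
```

The point of the design is that adding another local feed only means
calling accept() on the agent; the server-side configuration lives in
one place.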
Also, is the same agent responsible for capturing non-JVM-related
data, like disk space utilization, network traffic, ...?
Isn't this roughly what you describe above as the environment?
Some major differences between RHQ and Hawkular will most likely be
the following. In the past we had a very generic framework plus a plugin
descriptor that hard-coded the metadata, and one platform agent was the
only source of data for a given resource. Those traits made it extremely
hard to, e.g., take arbitrary MBeans into inventory, because no
ResourceType was defined for them. Likewise, servers that report their
data only over the REST interface were hard to support.
In Hawkular, we need to make it easy to import resources such as the
MBeans mentioned above, or "arbitrary" SNMP MIB subtrees. Requiring a
hard-coded (i.e. defined at compile time) plugin descriptor is no longer
an option, although such a descriptor could still serve as a baseline.
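One way to read "a descriptor as a baseline" is sketched below, assuming
that a resource type is built at runtime from whatever attributes were
discovered (e.g. read from an MBean), with an optional baseline type
whose definitions take precedence. ResourceType, MetricDef and
type_from_discovery are hypothetical names for illustration only, and
"numeric attribute becomes a metric" is an assumed rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class MetricDef:
    name: str
    unit: str = "none"


@dataclass
class ResourceType:
    name: str
    metrics: dict[str, MetricDef] = field(default_factory=dict)


def type_from_discovery(name: str,
                        attributes: dict[str, object],
                        baseline: Optional[ResourceType] = None) -> ResourceType:
    """Build a resource type at runtime from discovered attributes.

    Definitions from the baseline (e.g. units) win; any numeric attribute
    not covered by the baseline is added as a new metric with unit "none".
    The baseline itself is never modified.
    """
    rtype = ResourceType(name, dict(baseline.metrics) if baseline else {})
    for attr, value in attributes.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and attr not in rtype.metrics:
            rtype.metrics[attr] = MetricDef(attr)
    return rtype


# Usage: a baseline knows one metric; discovery finds an extra one.
baseline = ResourceType("ThreadPool", {"ActiveCount": MetricDef("ActiveCount", "threads")})
discovered = {"ActiveCount": 7, "QueueSize": 3, "Name": "default"}
pool_type = type_from_discovery("ThreadPool", discovered, baseline)
```

Nothing here has to be known at compile time: an unknown MBean simply
yields a type with generic metric definitions, which a baseline can
refine later.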
Similarly, it must be possible to incorporate feeds that are not tied to
the (platform) agent, and/or to let feeds cooperate: an agent inside a
process reports data to the server, while the (platform) agent provides
additional information (e.g. paging info or network stats for an EAP
instance running the WildFly-monitor). This information may even
override other information. For example, if the WildFly-monitor cannot
communicate with the Hawkular server due to high load, it is degraded,
but the platform agent can still see the process as running. The
availability of the process would then be "up" even though the
WildFly-monitor is not sending availability information.
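The override case above could be handled roughly like this. The sketch
assumes one possible policy that is not specified here: feeds are ranked
from most to least specific, reports older than max_age seconds count as
stale, and the first feed with a fresh report decides. Avail,
AvailReport and merge_availability are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Avail(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class AvailReport:
    feed: str
    state: Avail
    timestamp: float  # seconds


def merge_availability(reports: list[AvailReport], now: float,
                       priority: list[str], max_age: float = 60.0) -> Avail:
    """Pick the availability of one resource from several feeds.

    Assumed policy: keep only the newest report per feed, then walk the
    feeds in `priority` order (most specific first) and return the state
    of the first one whose report is at most `max_age` seconds old.
    """
    latest: dict[str, AvailReport] = {}
    for r in reports:
        if r.feed not in latest or r.timestamp > latest[r.feed].timestamp:
            latest[r.feed] = r
    for feed in priority:
        report = latest.get(feed)
        if report is not None and now - report.timestamp <= max_age:
            return report.state
    return Avail.UNKNOWN


# Usage: the in-process monitor went silent, the platform agent did not.
reports = [
    AvailReport("wildfly-monitor", Avail.UP, timestamp=0.0),   # stale
    AvailReport("platform-agent", Avail.UP, timestamp=290.0),  # process running
]
state = merge_availability(reports, now=300.0,
                           priority=["wildfly-monitor", "platform-agent"])
```

With this ranking a fresh report from the in-process monitor still wins
when it is available, and the platform agent only fills the gap when the
monitor has stopped reporting.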