> So, what do we plan on having the agent do? What is its scope?
I take it that the agent is a "metrics collector". In my mind, it has
no power to influence the state of the application or machine: it just
collects and reports.
Some time ago there was talk of having the agent also do some preliminary alert processing
(which raises more questions: for example, how does the agent get alert definitions from
the server?). In addition, are we going to have the agent perform operations such as
"start", "stop", or "restart" on managed resources? Can an agent reconfigure a managed
resource?
So an agent is potentially more than just a "metrics collector". If an agent is going to
be nothing more than a metrics collector, we should decide that now, because the agent
will be much easier to implement in that case.
Collecting to me means: take a snapshot of the moment and store it
somewhere locally.
Reporting to me means: take the stored snapshots and send them to a
server somewhere. Once the server ACKs that it received them, those
snapshots are removed from the local storage.
This is roughly how we envision it. Collecting means spooling the metrics until it's
time to send the data to the server (what you call "reporting"). I originally
envisioned this as being metric messages we send to the server over the bus - but since it
appears no one likes having a bus infrastructure as the core messaging infrastructure
between components, it would have to be a REST request (with a load balancer in front to
support scalability :-D). But that's getting into implementation details.
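
To make the collect/report split concrete, here is a minimal sketch of a local spool that only drops snapshots once the server has acknowledged them. Everything in it is an assumption for illustration: the `MetricSpool` class, the snapshot fields, and the `send` callback (which stands in for whatever transport we end up with, REST or otherwise) are hypothetical names, not an existing Hawkular API.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List


class MetricSpool:
    """Hypothetical local spool: keeps snapshots until the server ACKs them."""

    def __init__(self) -> None:
        self._pending: Deque[Dict] = deque()

    def collect(self, name: str, value: float) -> None:
        # "Collecting": take a snapshot of the moment and store it locally.
        self._pending.append(
            {"name": name, "value": value, "timestamp": time.time()}
        )

    def report(self, send: Callable[[List[Dict]], bool]) -> int:
        # "Reporting": send the stored snapshots; `send` returns True on ACK.
        batch = list(self._pending)
        if not batch or not send(batch):
            # Nothing to send, or no ACK: keep everything for the next attempt.
            return 0
        # ACK received: remove exactly the snapshots that were sent.
        for _ in batch:
            self._pending.popleft()
        return len(batch)

    def pending(self) -> int:
        return len(self._pending)
```

With this shape, a failed or unacknowledged send leaves the spool untouched, so the same snapshots are retried on the next reporting cycle.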
> How much knowledge of the inventory does the agent need to
> know?
> How does it gain this knowledge of its inventory?
Is "inventory" also a definition from RHQ? Or can I assume that
"inventory" is "a thing that can be measured"? Like, "CPU" is an item
on the inventory, as well as "Memory" and "Web Application 'Acme
Travel System'".
Yes, "inventory" is a definition we used from RHQ-land.
Inventory just means those resources that are known to be installed/running and are being
managed/monitored by Hawkular. They can be CPUs, file systems, or things like EAP
instances, or other running products like that.
It can get very complicated to keep each agent's view of the inventory synchronized with
the inventory that the server has.
> Does the agent auto-discover resources? How?
Yes.
For web applications, via a Java Agent, as this would allow it to detect
"everything" that runs on the JVM. With that, it would be possible to
detect deployments and their capabilities (JAX-RS, EJBs, Batch jobs,
...) and apply specific metrics to specific facets. For instance, for
a JAX-RS application, there could be a collection of metrics per
endpoint (average/mean/max/min response times). For EJBs, the same but
per "business method". For JPA, it could collect the time spent on the
DB (and perhaps even aggregate that by prepared statement).
For system resources, I think it would need to be something that would
run with (high?) privileges, so that it can take data like disk space
in all partitions, CPU/memory/swap usage, network traffic, ...
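
As a rough illustration of the per-endpoint metrics idea, the sketch below keeps response-time samples keyed by endpoint and summarizes them on demand. It is written in Python only to keep it short; the `ResponseTimeStats` class and the endpoint strings are hypothetical, and a real Java Agent would gather the samples through instrumentation rather than explicit `record` calls.

```python
from collections import defaultdict
from typing import Dict, List


class ResponseTimeStats:
    """Hypothetical per-endpoint aggregation of response times (milliseconds)."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, endpoint: str, millis: float) -> None:
        # One sample per handled request, keyed by endpoint
        # (or by "business method" for EJBs).
        self._samples[endpoint].append(millis)

    def summary(self, endpoint: str) -> Dict[str, float]:
        samples = self._samples.get(endpoint)
        if not samples:
            # No requests seen for this endpoint yet.
            return {}
        return {
            "count": len(samples),
            "min": min(samples),
            "max": max(samples),
            "mean": sum(samples) / len(samples),
        }
```

The same structure would work for EJB business methods or prepared statements by changing what is used as the key.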
> Can the agent be told about resources that it cannot
> auto-discover?
Do you have a sample use case for this? What I can imagine is: web
application 'Acme Travel System' is deployed but the agent can't
detect it automatically. Something tells the agent that there's a web
application deployed.
Exactly. That's right.
But still, if it cannot see it, how can it measure it?
A prime example is when we know an EAP instance is there, but we don't have the
credentials to probe it. So the agent could know an EAP instance is there, but what's
the admin password? Without it, we can't get that EAP's metrics; with it, we can.
Or, another example is that there is an EAP instance but it is not running. Agent has no
idea where it is (or even that it exists). We can tell the agent "There is an EAP in
directory '/my/app/foo' --- and oh, by the way, its admin credentials are
'admin/foo'".
Or, another example is that it's running on a non-default port that the agent wasn't
expecting and therefore didn't see. So we'd be able to say "Acme Travel
System is installed on port 9091 (not the default 8080)."
So, there could be many reasons why an agent cannot auto-detect something but, with a
little help from a human administrator to give it some hints, the agent can be told about
a resource and be able to manage that resource.
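
One way to picture these administrator hints is as a small overlay that is merged onto whatever the agent discovered by itself. The sketch below is only that, a sketch: the `Resource` fields, the `apply_hints` function, and the resource ids are hypothetical, and it ignores how hints would actually reach the agent.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory entry; the field names are assumptions."""
    resource_id: str
    resource_type: str
    location: Optional[str] = None
    port: Optional[int] = None
    username: Optional[str] = None
    password: Optional[str] = None
    source: str = "discovered"


def apply_hints(discovered: Dict[str, Resource],
                hints: Dict[str, dict]) -> Dict[str, Resource]:
    """Merge administrator hints into the auto-discovered inventory."""
    inventory = dict(discovered)
    for resource_id, hint in hints.items():
        if resource_id in inventory:
            # Known resource: fill in what discovery could not find out,
            # such as credentials or a non-default port.
            inventory[resource_id] = replace(inventory[resource_id], **hint)
        else:
            # Unknown resource: the hint must carry at least resource_type.
            inventory[resource_id] = Resource(
                resource_id=resource_id, source="manual", **hint
            )
    return inventory
```

For example, a hint for an already discovered EAP could supply just `username` and `password`, while a hint for a stopped EAP would supply `resource_type` and `location` so the agent learns that it exists at all.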
> Does the agent need to be pluggable so it can load and interface
> with plugins to handle monitoring of different resources? What are
> these plugins?
Not sure I understand the question, but I'd say yes: as an application
developer, I want to be able to add custom metrics, such as "# of
logged in users", "# of registrations", "# of checkouts" and so
on.
And not just add new, custom metrics, but add entirely new resource types, too. For
example, we'd like Hawkular to be able to monitor all JBoss middleware products.
Suppose we do that - Hawkular 1.0 is a success and people are using it to monitor every
JBoss middleware product in existence. But the following year, JBoss might introduce a
completely new "JBoss Awesome Product version 1.0" - we'd like an easy way
to plug in new capabilities to an existing and running Hawkular system so it can quickly
start managing JBoss Awesome Product instances without having to upgrade the entire
Hawkular system.
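
To sketch what "plugging in" a new resource type at runtime could look like, the example below uses a simple registry that maps a resource type name to a collector function. This is an assumption about the shape of the mechanism, not a design we have agreed on: `PluginRegistry`, the `Collector` signature, and the metric names are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical plugin contract: given a resource description,
# return a mapping of metric name to value.
Collector = Callable[[dict], Dict[str, float]]


class PluginRegistry:
    """Hypothetical registry of collectors, keyed by resource type name."""

    def __init__(self) -> None:
        self._collectors: Dict[str, Collector] = {}

    def register(self, resource_type: str, collector: Collector) -> None:
        # Can be called while the agent is running; no restart or upgrade.
        self._collectors[resource_type] = collector

    def supports(self, resource_type: str) -> bool:
        return resource_type in self._collectors

    def collect(self, resource_type: str, resource: dict) -> Dict[str, float]:
        if resource_type not in self._collectors:
            raise KeyError(f"no plugin registered for {resource_type!r}")
        return self._collectors[resource_type](resource)


def awesome_product_collector(resource: dict) -> Dict[str, float]:
    # Placeholder metrics; a real plugin would probe the product itself.
    return {"logged_in_users": float(resource.get("users", 0))}
```

Registering `awesome_product_collector` under "JBoss Awesome Product" on a running registry is all it takes in this sketch for the agent to start collecting from that type; the same hook would cover custom application metrics such as "# of logged in users".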