[Hawkular-dev] scope of the agent design

Wed Mar 11 16:12:52 EDT 2015

> > So, what do we plan on having the agent do? What is its scope?
> 
> I take it that the agent is a "metrics collector". In my mind, it has
> no power to influence the state of the application or machine: it just
> collects and reports.

Some time ago there was talk of having the agent also do some preliminary alert processing (which raises more questions; for example, how does the agent get alert definitions from the server?). In addition, will the agent be able to perform operations such as "start", "stop", or "restart" on managed resources? Can an agent reconfigure a managed resource?
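
If the agent did take on preliminary alert processing, one possible shape is for it to hold a set of alert definitions and check each collected sample against them before reporting. The Java sketch below assumes simple "value above threshold" definitions. `AlertDefinition` and `LocalAlertEvaluator` are hypothetical names, and how the definitions reach the agent is left out on purpose, since that is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical threshold alert definition; assumed to be obtained from the server somehow.
record AlertDefinition(String metricName, double threshold) {
    // Fires when the sampled value is strictly above the threshold.
    boolean firesOn(double value) {
        return value > threshold;
    }
}

// Hypothetical evaluator the agent could run locally on each collected sample.
class LocalAlertEvaluator {
    private final List<AlertDefinition> definitions;

    LocalAlertEvaluator(List<AlertDefinition> definitions) {
        this.definitions = List.copyOf(definitions);
    }

    // Returns the definitions that fire for the given sample (metric name -> value).
    List<AlertDefinition> evaluate(Map<String, Double> sample) {
        List<AlertDefinition> fired = new ArrayList<>();
        for (AlertDefinition def : definitions) {
            Double value = sample.get(def.metricName());
            if (value != null && def.firesOn(value)) {
                fired.add(def);
            }
        }
        return fired;
    }
}
```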

So an agent is potentially more than just a "metrics collector". If the agent is going to be nothing more than a metrics collector, that is important to decide now, because it would make the agent much easier to implement.

> Collecting to me means: take a snapshot of the moment and store it
> somewhere locally.
> 
> Reporting to me means: take the stored snapshots and send them to a
> server somewhere. Once the server ACKs that it received, those
> snapshots are then removed from the local storage.

This is roughly how we envision it. Collecting means spooling the metrics until it's time to send the data to the server (what you call "reporting"). I originally envisioned these as metric messages sent to the server over the bus, but since it appears no one likes having a bus as the core messaging infrastructure between components, it would have to be a REST request (with a load balancer in front to support scalability :-D). But that's getting into implementation details.
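
The collect/report cycle described above (spool locally, send, and delete only after the server acknowledges) could look roughly like this minimal in-memory sketch. `MetricSample`, `MetricSpool`, and the `Sender` interface standing in for the REST call are hypothetical names; a real agent would presumably persist the spool to disk so data survives a restart.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One collected data point: a snapshot of a metric at a moment in time.
record MetricSample(String metricName, long timestampMillis, double value) {}

// Hypothetical transport (e.g. a REST POST to the server); returns true if the server ACKed.
interface Sender {
    boolean send(List<MetricSample> batch);
}

class MetricSpool {
    private final Deque<MetricSample> pending = new ArrayDeque<>();

    // "Collecting": store the snapshot locally until it is time to report.
    synchronized void collect(MetricSample sample) {
        pending.addLast(sample);
    }

    // "Reporting": send up to maxBatch of the oldest samples; remove them only after an ACK.
    synchronized int report(Sender sender, int maxBatch) {
        List<MetricSample> batch = new ArrayList<>();
        for (MetricSample s : pending) {
            if (batch.size() == maxBatch) break;
            batch.add(s);
        }
        if (batch.isEmpty() || !sender.send(batch)) {
            return 0; // nothing to send, or no ACK: keep everything for the next attempt
        }
        for (int i = 0; i < batch.size(); i++) {
            pending.removeFirst();
        }
        return batch.size();
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Because a sample leaves the spool only after a successful `send`, an unacknowledged request simply leaves it in place for the next cycle. The trade-off is that a lost ACK can make the server receive the same batch twice.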

> > How much knowledge of the inventory does the agent need to know?
> > How does it gain this knowledge of its inventory?
> 
> Is "inventory" also a definition from RHQ? Or can I assume that
> "inventory" is "a thing that can be measured"? Like, "CPU" is an item
> on the inventory, as well as "Memory" and "Web Application 'Acme
> Travel System'".

Yes, "inventory" is a term we carried over from RHQ-land.

Inventory just means the resources that are known to be installed or running and that are managed/monitored by Hawkular. They can be CPUs, file systems, EAP instances, or other running products like that.

Keeping each agent's view of the inventory synchronized with what the server knows can get very complicated.
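
A naive way to see what synchronization involves is to compare the resource IDs the agent knows with those the server knows and derive what each side is missing. This sketch assumes every resource has a stable string ID (itself one of the hard parts); `InventoryDiff` is a hypothetical name, and it ignores property changes, update ordering, and concurrent edits on both sides, which is where most of the real complexity would come from.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical result of comparing the agent's inventory with the server's.
record InventoryDiff(Set<String> unknownToServer, Set<String> unknownToAgent) {

    // Compares two snapshots of resource IDs (e.g. "cpu-0", "fs:/var").
    static InventoryDiff compare(Set<String> agentIds, Set<String> serverIds) {
        Set<String> onlyOnAgent = new HashSet<>(agentIds);
        onlyOnAgent.removeAll(serverIds);   // discovered locally, not yet known to the server
        Set<String> onlyOnServer = new HashSet<>(serverIds);
        onlyOnServer.removeAll(agentIds);   // known to the server, e.g. added by an administrator
        return new InventoryDiff(onlyOnAgent, onlyOnServer);
    }
}
```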

> > Does the agent auto-discover resources? How?
> 
> Yes.
> 
> For web applications, via a Java Agent, as this would allow us to detect
> "everything" that runs on the JVM. With that, it would be possible to
> detect deployments and their capabilities (JAX-RS, EJBs, Batch jobs,
> ...) and apply specific metrics to specific facets. For instance, for
> a JAX-RS application, there could be a collection of metrics per
> endpoint (average/mean/max/min response times). For EJBs, the same but
> per "business method". For JPA, it could collect the time spent on the
> DB (and perhaps even aggregate that by prepared statement).
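
The per-endpoint idea can be illustrated independently of how the Java agent intercepts calls. This sketch only covers the aggregation side: given measured response times keyed by endpoint, it keeps count, min, max, and mean. `EndpointStats` and the endpoint key format are hypothetical, and the interception itself (e.g. bytecode instrumentation through `java.lang.instrument`) is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-endpoint aggregator; keys such as "GET /bookings" are illustrative.
class EndpointStats {
    // Running statistics for one endpoint.
    static final class Stats {
        long count;
        long minNanos = Long.MAX_VALUE;
        long maxNanos = Long.MIN_VALUE;
        long totalNanos;

        synchronized void record(long nanos) {
            count++;
            totalNanos += nanos;
            minNanos = Math.min(minNanos, nanos);
            maxNanos = Math.max(maxNanos, nanos);
        }

        synchronized double meanNanos() {
            return count == 0 ? 0.0 : (double) totalNanos / count;
        }
    }

    private final Map<String, Stats> byEndpoint = new ConcurrentHashMap<>();

    // Called by whatever intercepts the request, with the measured response time.
    void record(String endpoint, long responseNanos) {
        byEndpoint.computeIfAbsent(endpoint, k -> new Stats()).record(responseNanos);
    }

    Stats get(String endpoint) {
        return byEndpoint.get(endpoint);
    }
}
```
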
> 
> For system resources, I think it would need to be something that would
> run with (high?) privileges, so that it can take data like disk space
> in all partitions, CPU/memory/swap usage, network traffic, ...
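
For the disk-space part, the JDK can already report per-file-store space through `java.nio.file.FileStore` without native code. This sketch covers only disk space; CPU, memory, swap, and network figures generally need OS-specific sources. `DiskSpaceCollector` is a hypothetical name, and stores that cannot be read (for example, for lack of privileges) are simply skipped.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: total and usable bytes for every file store the JVM can see.
class DiskSpaceCollector {
    record Usage(long totalBytes, long usableBytes) {}

    static Map<String, Usage> collect() {
        Map<String, Usage> result = new LinkedHashMap<>();
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            try {
                result.put(store.toString(),
                           new Usage(store.getTotalSpace(), store.getUsableSpace()));
            } catch (IOException e) {
                // Skip stores we cannot read, e.g. due to insufficient privileges.
            }
        }
        return result;
    }
}
```
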
> 
> > Can the agent be told about resources that it cannot
> > auto-discover?
> 
> Do you have a sample use case for this? What I can imagine is: web
> application 'Acme Travel System' is deployed but the agent can't
> detect it automatically. Something tells the agent that there's a web
> application deployed.

Exactly. That's right.

> But still, if it cannot see it, how can it measure it?

A prime example is that we know an EAP instance is there, but we don't have the credentials to probe it. The agent could know the EAP instance exists, but what's the admin password? Without it, we can't get that EAP's metrics; with it, we can.

Another example is an EAP instance that is not running. The agent has no idea where it is (or even that it exists). We can tell the agent "There is an EAP in directory '/my/app/foo', and, by the way, its admin credentials are 'admin/foo'".

Yet another example is an application running on a non-default port that the agent wasn't expecting and therefore didn't see. We'd then be able to say "Acme Travel System is installed on port 9091 (not the default 8080)."

So there could be many reasons why an agent cannot auto-detect something, but with a few hints from a human administrator, the agent can be told about a resource and then manage it.
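
All three examples (credentials for a known EAP, an EAP that is not running at a known directory, an application on a non-default port) suggest that a hint is a partial resource description from an administrator that fills in or overrides what discovery found. A hypothetical sketch: `ResourceHint`, `ManagedResource`, and their fields are illustrative, and a real design would need to store credentials securely rather than as plain strings.

```java
import java.util.Optional;

// Hypothetical administrator-supplied hint about a resource the agent could not fully discover.
// Credentials are plain strings here only for illustration.
record ResourceHint(String resourceType,
                    Optional<String> installDir,
                    Optional<Integer> port,
                    Optional<String> username,
                    Optional<String> password) {}

// Hypothetical view of what the agent manages after combining discovery with a hint.
record ManagedResource(String resourceType, String installDir, int port,
                       String username, String password) {

    static final int DEFAULT_PORT = 8080; // the port the agent would assume without a hint

    // Hint values win; missing hint values fall back to discovery results or defaults.
    static ManagedResource fromHint(ResourceHint hint, String discoveredDir) {
        return new ManagedResource(
            hint.resourceType(),
            hint.installDir().orElse(discoveredDir),
            hint.port().orElse(DEFAULT_PORT),
            hint.username().orElse(null),
            hint.password().orElse(null));
    }
}
```

For instance, the EAP example becomes a hint with `installDir` "/my/app/foo" and credentials "admin"/"foo", while the Acme Travel System example is a hint carrying only `port` 9091.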

> > Does the agent need to be pluggable so it can load and interface
> > with plugins to handle monitoring of different resources? What are
> > these plugins?
> 
> Not sure I understand the question, but I'd say yes: as an application
> developer, I want to be able to add custom metrics, such as "# of
> logged in users", "# of registrations", "# of checkouts" and so on.
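
For application-defined metrics like those, one plausible shape is for the application to register named gauges that the agent reads on each collection. `CustomMetricRegistry` is a hypothetical name; the sketch uses only `java.util.function.Supplier` and a concurrent map, and says nothing about how the agent would find the registry inside the application.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry an application could use to expose its own metrics to the agent.
class CustomMetricRegistry {
    private final Map<String, Supplier<? extends Number>> gauges = new ConcurrentHashMap<>();

    // Application side: register a named metric whose value is read on demand.
    void register(String name, Supplier<? extends Number> gauge) {
        gauges.put(name, gauge);
    }

    // Agent side: take a snapshot of every registered metric, sorted by name.
    Map<String, Double> snapshot() {
        Map<String, Double> values = new TreeMap<>();
        gauges.forEach((name, gauge) -> values.put(name, gauge.get().doubleValue()));
        return values;
    }
}
```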

And not just add new, custom metrics, but add entirely new resource types, too. For example, we'd like Hawkular to be able to monitor all JBoss middleware products. Suppose we do that: Hawkular 1.0 is a success and people are using it to monitor every JBoss middleware product in existence. But the following year, JBoss might introduce a completely new "JBoss Awesome Product version 1.0". We'd like an easy way to plug new capabilities into an existing, running Hawkular system so it can quickly start managing JBoss Awesome Product instances without having to upgrade the entire Hawkular system.
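
One way to read "plug in new capabilities to a running system" is a registry of resource-type plugins that can be added while the agent runs, rather than compiled into it. The sketch assumes a hypothetical `ResourceTypePlugin` interface and `PluginRegistry`; how a plugin's code actually reaches the agent (a jar dropped into a directory, `java.util.ServiceLoader`, a separate class loader per plugin) is left open.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract every resource-type plugin implements.
interface ResourceTypePlugin {
    String resourceType();                          // e.g. "JBoss Awesome Product 1.0"
    Set<String> discover();                         // IDs of instances this plugin can find
    Map<String, Double> collect(String resourceId); // current metric values for one instance
}

// Hypothetical registry; the concurrent map lets plugins be added while the agent is running.
class PluginRegistry {
    private final Map<String, ResourceTypePlugin> plugins = new ConcurrentHashMap<>();

    // Returns false if a plugin for this resource type is already registered.
    boolean register(ResourceTypePlugin plugin) {
        return plugins.putIfAbsent(plugin.resourceType(), plugin) == null;
    }

    Optional<ResourceTypePlugin> forType(String resourceType) {
        return Optional.ofNullable(plugins.get(resourceType));
    }

    Set<String> supportedTypes() {
        return Set.copyOf(plugins.keySet());
    }
}
```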