On Wednesday, March 11, 2015 16:12:52 John Mazzitelli wrote:
> > So, what do we plan on having the agent do? What is its scope?
>
> I take it that the agent is a "metrics collector". In my mind, it has
> no power to influence the state of the application or machine: it just
> collects and reports.
There was talk some time ago about having the agent also do some
preliminary alert processing (which raises more questions; for example,
how does the agent get alert definitions from the server?)
Alerting has a CRUD interface for it, doesn't it? Since we assume only
agent-server traffic, the agent will have to know the server's address one
way or another.
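To make the alert-preprocessing idea concrete, here is a minimal Python sketch of the agent evaluating simple threshold definitions locally before forwarding anything. It assumes the definitions have already been fetched from the server (the open question above); `AlertDefinition`, its fields, and the metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertDefinition:
    # Hypothetical shape of a definition fetched from the alerting CRUD API.
    metric: str
    threshold: float


def evaluate(definitions: List[AlertDefinition], sample: Dict[str, float]) -> List[str]:
    """Return the metrics in `sample` whose value exceeds the defined threshold."""
    fired = []
    for d in definitions:
        value = sample.get(d.metric)
        # Metrics absent from this sample cannot fire.
        if value is not None and value > d.threshold:
            fired.append(d.metric)
    return fired


definitions = [AlertDefinition("cpu.usage", 90.0), AlertDefinition("heap.used.mb", 2048.0)]
print(evaluate(definitions, {"cpu.usage": 97.5, "heap.used.mb": 512.0}))  # ['cpu.usage']
```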
Also, I am not sure that we want to consider the agent a "monolith" doing
"agenty" stuff and *also* something else like alert preprocessing. For the
agent, too, I think we should think about how we could slice it into
individual (semi-)independent pieces that could potentially live on their
own. Imagine this topology:

    agent -> preprocessor -> inventory
                          -> metrics
                          -> alerting
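A rough sketch of that topology, assuming the preprocessor is a separate stage that routes each message to whichever component registered for its (hypothetical) `kind` field. The class and field names are illustrative only, and the sinks here just collect messages in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, object]


class Preprocessor:
    """Hypothetical middle stage: receives messages from the agent and routes
    each one to the downstream component registered for its kind."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[Message], None]] = {}

    def route(self, kind: str, sink: Callable[[Message], None]) -> None:
        self._routes[kind] = sink

    def accept(self, message: Message) -> bool:
        sink = self._routes.get(str(message.get("kind")))
        if sink is None:
            return False  # no component claims this kind; drop it
        sink(message)
        return True


# In-memory stand-ins for the inventory, metrics and alerting components.
received: Dict[str, List[Message]] = defaultdict(list)
pre = Preprocessor()
for kind in ("inventory", "metrics", "alerting"):
    pre.route(kind, received[kind].append)

pre.accept({"kind": "metrics", "name": "cpu.usage", "value": 12.0})
print(len(received["metrics"]))  # 1
```

Because each stage only sees messages, any of them could later be moved into its own process without the agent noticing.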
In addition, are we going to have the agent be able to perform operations
such as "start", "stop", or "restart" on managed resources? Can an agent
reconfigure a managed resource?
Yeah, I am not sure if operations and config management have been scoped in.
If they are, though, the agent becomes an active player in the environment.
So an agent is potentially more than just a "metrics collector". If an agent
is going to be nothing more than a metrics collector, we should decide that
now, because the agent will be much easier to implement in that case.
+1.
Also, because the agent's metrics collection is inherently polling-based, we
need to figure out how we are going to define how often the metrics are
collected.
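One possible way to express "how often", sketched in Python under the assumption that each metric has its own collection interval in seconds and the agent polls on a simple integer clock. The metric names and the `due_metrics` helper are hypothetical.

```python
from typing import Dict, List


def due_metrics(now: int, intervals: Dict[str, int], last_run: Dict[str, int]) -> List[str]:
    """Return the metrics whose collection interval (seconds) has elapsed since
    they were last collected; metrics never collected are always due."""
    due = []
    for name, interval in sorted(intervals.items()):
        last = last_run.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return due


intervals = {"cpu.usage": 10, "disk.free": 300}  # cheap metric often, costly one rarely
last_run = {"cpu.usage": 0, "disk.free": 0}
print(due_metrics(10, intervals, last_run))  # ['cpu.usage']
```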
> Collecting to me means: take a snapshot of the moment and store it
> somewhere locally.
>
> Reporting to me means: take the stored snapshots and send them to a
> server somewhere. Once the server ACKs that it received them, those
> snapshots are removed from the local storage.
This is roughly how we envision this. Collecting means spooling the
metrics until it's time to send the data to the server (what you call
"reporting"). I originally envisioned this as metric messages we send to
the server over the bus, but since it appears no one likes having a bus as
the core messaging infrastructure between components, it would have to be a
REST request (with a load balancer in front to support scalability :-D).
But that's getting into implementation details.
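The collect/report split with ACK-based removal could look roughly like the Python sketch below. `send` stands in for whatever transport is chosen (bus message or REST request) and returns True when the server ACKs; no actual Hawkular endpoint is assumed, and `Spool` is a hypothetical name.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Snapshot = Dict[str, float]


class Spool:
    """Local store for collected snapshots. Snapshots are removed only after
    the server acknowledges a report containing them."""

    def __init__(self) -> None:
        self._pending: Deque[Snapshot] = deque()

    def collect(self, snapshot: Snapshot) -> None:
        # "Collecting": store a point-in-time snapshot locally.
        self._pending.append(snapshot)

    def report(self, send: Callable[[List[Snapshot]], bool]) -> int:
        # "Reporting": send everything pending as one batch. On failure
        # (no ACK) nothing is discarded, so the next report retries it.
        if not self._pending:
            return 0
        batch = list(self._pending)
        if not send(batch):
            return 0
        # Drop only what was sent; snapshots collected meanwhile stay queued.
        for _ in batch:
            self._pending.popleft()
        return len(batch)

    def __len__(self) -> int:
        return len(self._pending)


spool = Spool()
spool.collect({"cpu.usage": 12.0})
spool.collect({"cpu.usage": 14.5})
print(spool.report(lambda batch: False), len(spool))  # 0 2  (no ACK: kept)
print(spool.report(lambda batch: True), len(spool))   # 2 0  (ACKed: removed)
```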
> > How much knowledge of the inventory does the agent need?
> > How does it gain this knowledge of its inventory?
>
> Is "inventory" also a definition from RHQ? Or can I assume that
> "inventory" is "a thing that can be measured"? Like, "CPU" is an item
> on the inventory, as well as "Memory" and "Web Application 'Acme
> Travel System'".
Yes, "inventory" is a definition we used from RHQ-land.
Also, it is the name of a lesser-known Hawkular component ;)
Inventory just means those resources that are known to be installed/running
and are being managed/monitored by Hawkular. They can be CPUs, file systems,
or things like EAP instances or other running products.
It can get very complicated to keep each agent's view of the inventory in
sync with what the server knows.
I don't think this needs to be too complex in Hawkular, though. The agent
just needs to be told what NOT to collect.
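Telling the agent what NOT to collect can be as simple as a list of exclusion patterns applied to whatever was discovered. This sketch assumes glob-style patterns (Python's `fnmatch`) over hypothetical dotted metric names; the real naming scheme is not defined here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_excluded(metrics: Iterable[str], excludes: Iterable[str]) -> List[str]:
    """Keep every discovered metric except those matching an exclude pattern."""
    patterns = list(excludes)
    return [m for m in metrics if not any(fnmatchcase(m, p) for p in patterns)]


discovered = ["cpu.usage", "disk./boot.free", "disk./home.free", "mem.used"]
print(filter_excluded(discovered, ["disk./boot.*"]))
# ['cpu.usage', 'disk./home.free', 'mem.used']
```

With an empty exclusion list everything discovered is collected, which matches the "collect by default, opt out explicitly" idea.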
> > Does the agent auto-discover resources? How?
>
> Yes.
In RHQ ;) It is both a cool feature and a curse, because discovery can
literally bring the machine to its knees with the RHQ agent on extremely
"densely populated" machines. We used to have problems with EAP5, for which
discovery was so heavyweight (not actually RHQ's fault, but hey) that we
could spin the CPU at 100% for tens of minutes just to discover the 20 EAP5s
running on the box.
>
> For web applications, via a Java Agent, as this would allow us to detect
> "everything" that runs on the JVM. With that, it would be possible to
> detect deployments and their capabilities (JAX-RS, EJBs, Batch jobs,
> ...) and apply specific metrics to specific facets. For instance, for
> a JAX-RS application, there could be a collection of metrics per
> endpoint (average/mean/max/min response times). For EJBs, the same but
> per "business method". For JPA, it could collect the time spent on the
> DB (and perhaps even aggregate that by prepared statement).
>
> For system resources, I think it would need to be something that would
> run with (high?) privileges, so that it can collect data like disk space
> in all partitions, CPU/memory/swap usage, network traffic, ...
>
You just described RHQ ;)
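For the per-endpoint metrics in the quoted proposal, the aggregation step (as opposed to the Java Agent instrumentation that would capture the samples) might look like this. The endpoint names are made up for the Acme Travel System example, and the sample data is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def per_endpoint_stats(samples: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Aggregate (endpoint, response_time_ms) samples into avg/min/max per endpoint."""
    by_endpoint: Dict[str, List[float]] = defaultdict(list)
    for endpoint, millis in samples:
        by_endpoint[endpoint].append(millis)
    return {
        ep: {"avg": sum(ts) / len(ts), "min": min(ts), "max": max(ts)}
        for ep, ts in by_endpoint.items()
    }


samples = [("GET /bookings", 120.0), ("GET /bookings", 80.0), ("POST /bookings", 200.0)]
print(per_endpoint_stats(samples)["GET /bookings"])
# {'avg': 100.0, 'min': 80.0, 'max': 120.0}
```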
> > Can the agent be told about resources that it cannot
> > auto-discover?
>
> Do you have a sample use case for this? What I can imagine is: web
> application 'Acme Travel System' is deployed but the agent can't
> detect it automatically. Something tells the agent that there's a web
> application deployed.
Exactly. That's right.
> But still, if it cannot see it, how can it measure it?
A prime example is that we know an EAP instance is there, but we don't have
the credentials to probe it. So the agent could know an EAP instance is
there, but what's the admin password? Without it, we can't get that EAP's
metrics; with it, we can.
Or, another example: there is an EAP instance, but it is not running. The
agent has no idea where it is (or even that it exists).
That is true unless we do discovery by filesystem scan (which I DO NOT
propose; I'm just pointing out the possibility).
We can tell the agent "There is an EAP in directory '/my/app/foo' --- and
oh, by the way, its admin credentials are 'admin/foo'".
Or, another example is that it's running on a non-default port that the
agent wasn't expecting and therefore didn't see. So we'd be able to say
"Acme Travel System is installed on port 9091 (not the default 8080)."
So, there could be many reasons why an agent cannot auto-detect something,
but with a little help from a human administrator giving it some hints, the
agent can be told about a resource and be able to manage that resource.
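The hints described above (install directory, credentials, non-default port) could be modelled as records merged into the discovered inventory, with a hint overriding a discovered entry for the same resource. `Resource` and `merge_inventory` are hypothetical names in this sketch, and keying resources by (kind, location) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Resource:
    kind: str                                      # e.g. "EAP", "WebApp"
    location: str                                  # install directory or application name
    port: Optional[int] = None                     # set only when non-default
    credentials: Optional[Tuple[str, str]] = None  # (user, password), if known


def merge_inventory(discovered: List[Resource], hints: List[Resource]) -> List[Resource]:
    """Combine auto-discovered resources with administrator-supplied hints.

    A hint with the same (kind, location) replaces the discovered entry,
    e.g. to add credentials; hints for undiscovered resources are added."""
    merged: Dict[Tuple[str, str], Resource] = {(r.kind, r.location): r for r in discovered}
    for hint in hints:
        merged[(hint.kind, hint.location)] = hint
    return list(merged.values())


discovered = [Resource("EAP", "/my/app/foo"), Resource("EAP", "/opt/eap2")]
hints = [
    Resource("EAP", "/my/app/foo", credentials=("admin", "foo")),
    Resource("WebApp", "Acme Travel System", port=9091),
]
for r in merge_inventory(discovered, hints):
    print(r)
```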
> > Does the agent need to be pluggable so it can load and interface
> > with plugins to handle monitoring of different resources? What are
> > these plugins?
>
> Not sure I understand the question, but I'd say yes: as an application
> developer, I want to be able to add custom metrics, such as "# of
> logged in users", "# of registrations", "# of checkouts" and so on.
And not just add new, custom metrics, but add entirely new resource types,
too. For example, we'd like Hawkular to be able to monitor all JBoss
middleware products. Suppose we do that - Hawkular 1.0 is a success and
people are using it to monitor every JBoss middleware product in existence.
But the following year, JBoss might introduce a completely new "JBoss
Awesome Product version 1.0" - we'd like an easy way to plug in new
capabilities to an existing and running Hawkular system so it can quickly
start managing JBoss Awesome Product instances without having to upgrade
the entire Hawkular system.
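A minimal sketch of the pluggable part: an in-process registry mapping resource types to collector functions, so a new type can be registered while the agent is running. This only illustrates the registration model; actually loading new plugin code (for example new jars) at runtime is not shown, and all names and metric values are hypothetical.

```python
from typing import Callable, Dict

# A collector takes a resource location and returns metric name -> value.
Collector = Callable[[str], Dict[str, float]]


class PluginRegistry:
    """Maps resource types to collector functions; new types can be
    registered at any time without touching existing ones."""

    def __init__(self) -> None:
        self._collectors: Dict[str, Collector] = {}

    def register(self, resource_type: str, collector: Collector) -> None:
        self._collectors[resource_type] = collector

    def collect(self, resource_type: str, location: str) -> Dict[str, float]:
        collector = self._collectors.get(resource_type)
        if collector is None:
            raise KeyError(f"no plugin for resource type {resource_type!r}")
        return collector(location)


registry = PluginRegistry()
registry.register("EAP", lambda loc: {"heap.used.mb": 512.0})
# Later, a new product ships and its plugin is registered on the running agent.
registry.register("JBoss Awesome Product", lambda loc: {"widgets.per.sec": 42.0})
print(registry.collect("JBoss Awesome Product", "/opt/awesome"))  # {'widgets.per.sec': 42.0}
```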
Well, here I think we need to answer a crucial question. Do we want to
continue down the RHQ path of "writing an agent plugin for X", or should we
rather focus on "providing X with easy-to-use tools to report to Hawkular",
i.e. an easily consumable REST API, plus possibly some other niceties for
Java-based applications, like annotations for declarative monitoring, or
some kind of Wildfly extension that could be configured to inject itself
into deployments and collect configured metrics (byteman-based bytecode
enhancements?), etc.
I think the latter is preferable, if for no other reason than that it
potentially creates less work for us and lets the "domain expert", i.e. the
developer of X, think about what and how to report, instead of us learning X
in haste and second-guessing.
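As a loose analogue of "annotations for declarative monitoring", here is a Python decorator that records call durations under a metric name; in Java this role would be played by an annotation plus an interceptor. The `timed` decorator and the `RECORDED` store are hypothetical, and shipping the recorded values to Hawkular (e.g. over REST) is left out.

```python
import functools
import time
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-memory store: metric name -> recorded durations in seconds.
RECORDED: Dict[str, List[float]] = defaultdict(list)


def timed(metric: str) -> Callable:
    """Decorator recording the wall-clock duration of every call under `metric`."""
    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even if the call raises.
                RECORDED[metric].append(time.perf_counter() - start)
        return wrapper
    return decorate


@timed("checkout.duration")
def checkout(items: List[float]) -> float:
    return sum(items)


print(checkout([10.0, 2.5]), len(RECORDED["checkout.duration"]))  # 12.5 1
```

The application developer only declares what to measure; the "domain expert" decides the metric names, which is the point being argued here.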
IMHO the agent exists only for the cases where embedding the "feed" into the
managed software itself is not possible.
Note that the embedded feed has other benefits, too. For one, it is (or at
least has a chance of being) in-process with the data, and more importantly,
if it is in-process, it doesn't create any security concerns.
_______________________________________________
hawkular-dev mailing list
hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev