agent implementation

John Mazzitelli

Wednesday, 11 March 2015

10:53 a.m.

Time to start thinking about the agent.

As you probably know, the "nest" started out being the "agent" - but it worked out to be something we could use for the server components and it ended up being used for the "kettle" - hosting everything (console, server components, data storage, etc). We can still use the "nest" for agents - that's not a problem. My question is - what do people think about that?

So, what do you think the Hawkular agent should "be"? I see three options - feel free to suggest others. (I mention only one PRO/CON for each, though I'm sure there are many more.)

1) It could be our own homegrown Java app (similar to what RHQ did, which means we build out our own agent container).
   * PRO: We put in and control everything that goes into it - however big it gets (code size and runtime footprint size) is based on what we explicitly put in it.
   * CON: We put in and control everything that goes into it :) It is not trivial to implement classloading isolation and plugin mechanisms.

2) Or it could be built on top of WildFly (either use the "nest" or use WildFly Core).
   * PRO: We don't worry about things that WildFly gives us for free (e.g. JBoss Modules for plugin/deployments and classloading isolation; if we want, we can get EE capabilities).
   * CON: The runtime footprint might be larger than we want or what customers expect.

3) Or it could be built on top of something else (Vert.x? Something else?).
   * PRO: Depending on the technology, might provide us with a smaller runtime footprint than a full WildFly container.
   * CON: Depending on the technology, might not provide us all the capabilities we want.

What are your ideas for what you think the "agent" should look like?

--John Mazz

Juraci Paixão Kröhling

Wednesday, 11 March

11:28 a.m.

John,

On 03/11/2015 04:53 PM, John Mazzitelli wrote:

...

So, what do you think the Hawkular agent should "be"?

I'm sorry for asking this, but I'm not familiar with the details of how the RHQ agent works.

What is the scope of the agent? Will it gather only data about the environment, like, memory consumption, span of worker threads, CPU consumption, GC-time, ...? Or business metrics as well, like "requests per second", "number of exceptions in the last X minutes", "average time spent on database", and so on?

Also, is the same agent also responsible for capturing non-JVM related data, like, disk space utilization, network traffic, ...?

- Juca.

[re-sending to the list, as it seems I sent this only to John]

Heiko W.Rupp

Thursday, 12 March

8:20 a.m.

Hey,

...

What is the scope of the agent? Will it gather only data about the environment, like, memory consumption, span of worker threads, CPU consumption, GC-time, ...? Or business metrics as well, like "requests per second", "number of exceptions in the last X minutes", "average time spent on database", and so on?

There is no "one size fits all" answer, as we will most likely see more than one kind of "agent" in the future. Those are currently called feeds; for example, wildfly-monitor is an agent inside a WildFly instance.

The agent mentioned above probably corresponds best to the "platform agent" in RHQ, which gathers data about the environment and most probably acts as a proxy for feeds such as the WildFly monitor. Those feeds can just connect to the local agent to deliver their data, which reduces configuration complexity. On top of that there would probably be plugins for resource-specific metrics.

...

Also, is the same agent also responsible for capturing non-JVM related data, like, disk space utilization, network traffic, ...?

Isn't this sort of what you describe above as environment?

Some major differences between RHQ and Hawkular will most probably come from the fact that in RHQ we had a very generic framework and then a plugin descriptor hard-coding metadata, and one platform agent was the only source of data for a certain resource. Those traits made it extremely hard to, for example, take arbitrary MBeans into inventory, because no ResourceType was defined for them, or to have servers that report their data over the REST interface only.

In Hawkular, we need to make sure that it gets easy to import resources like said MBeans, or "arbitrary" SNMP-MIB subtrees. Requiring a hard-coded (i.e. compile-time) plugin descriptor is no longer an option (but such a descriptor could be used as a baseline).

Similarly, it must be possible to incorporate feeds that are not tied to the (platform) agent, and/or to have cooperation, where an agent inside a process reports data to the server and the (platform) agent provides additional information (e.g. paging info or network stats for an EAP with the WildFly-monitor). It may even be possible that this information overrides other information. For example, the WildFly-monitor may be unable to communicate with the Hawkular server due to high load, so it is degraded, but the platform agent could still see the process as running. The availability of the process would then be "up" despite the WildFly-monitor not sending availability information.

John Mazzitelli

Wednesday, 11 March

12:57 p.m.

New subject: scope of the agent design

...

Juraci Paixão Kröhling

2:10 p.m.

Hi John,

On 03/11/2015 06:57 PM, John Mazzitelli wrote:

...

Regardless of how the agent is to be implemented, we need to answer the question "WHAT will the agent DO?" It is an existential question - kind of like asking "What is the purpose to my life?" :)

I think the "what" needs to be defined before the "how", hence my questions.

...

So, what do we plan on having the agent do? What is its scope?

I take it that the agent is a "metrics collector". In my mind, it has no power to influence the state of the application or machine: it just collects and reports.

Collecting to me means: take a snapshot of the moment and store it somewhere locally.

Reporting to me means: take the stored snapshots and send them to a server somewhere. Once the server ACKs that it received them, those snapshots are then removed from the local storage.
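A minimal sketch of the collect/report cycle described here, written in Python only for brevity. The `Spool` class, its method names, and the `send` callback are hypothetical and not part of any Hawkular API; an in-memory queue stands in for the "local storage", and `send` returning `True` stands in for the server's ACK.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List, Tuple

# (metric name, timestamp, value) - a hypothetical sample shape
Sample = Tuple[str, float, float]


class Spool:
    """Holds collected samples locally until the server acknowledges them."""

    def __init__(self) -> None:
        self._pending: Deque[Sample] = deque()

    def collect(self, name: str, timestamp: float, value: float) -> None:
        """Take a snapshot and store it locally."""
        self._pending.append((name, timestamp, value))

    def report(self, send: Callable[[List[Sample]], bool], batch_size: int = 100) -> int:
        """Send the oldest samples; remove them only if send() returns True (ACK)."""
        batch = list(islice(self._pending, batch_size))
        if not batch:
            return 0
        if not send(batch):
            return 0  # no ACK: keep everything for the next attempt
        for _ in batch:
            self._pending.popleft()
        return len(batch)

    def __len__(self) -> int:
        return len(self._pending)
```

With this shape, a failed or unacknowledged send leaves the spool untouched, so samples are retried rather than lost. A real agent would also need a bound on spool size and durable storage, which this sketch leaves out.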

...

How much knowledge of the inventory does the agent need to know? How does it gain this knowledge of its inventory?

Is "inventory" also a definition from RHQ? Or can I assume that "inventory" is "a thing that can be measured"? Like, "CPU" is an item on the inventory, as well as "Memory" and "Web Application 'Acme Travel System'".

...

Does the agent auto-discover resources? How?

Yes. For web applications, via a Java Agent, as this would allow it to detect "everything" that runs on the JVM. With that, it would be possible to detect deployments and their capabilities (JAX-RS, EJBs, Batch jobs, ...) and apply specific metrics to specific facets. For instance, for a JAX-RS application, there could be a collection of metrics per endpoint (average/mean/max/min response times). For EJBs, the same but per "business method". For JPA, it could collect the time spent on the DB (and perhaps even aggregate that by prepared statement).

For system resources, I think it would need to be something that would run with (high?) privileges, so that it can take data like disk space in all partitions, CPU/memory/swap usage, network traffic, ...

...

Can the agent be told about resources that it cannot auto-discover?

Do you have a sample use case for this? What I can imagine is: web application 'Acme Travel System' is deployed but the agent can't detect it automatically. Something tells the agent that there's a web application deployed. But still, if it cannot see it, how can it measure it?

...

Does the agent need to be pluggable so it can load and interface with plugins to handle monitoring of different resources? What are these plugins?

Not sure I understand the question, but I'd say yes: as an application developer, I want to be able to add custom metrics, such as "# of logged in users", "# of registrations", "# of checkouts" and so on.

- Juca.

John Mazzitelli

3:12 p.m.

...

> So, what do we plan on having the agent do? What is its scope?

I take it that the agent is a "metrics collector". In my mind, it has no power to influence the state of the application or machine: it just collects and reports.

There was talk some time ago to perhaps think about the agent also doing some preliminary alert processing (which brings up more questions - for example, how does the agent get alert definitions from the server?) In addition, are we going to have the agent be able to perform operations such as "start" or "stop" or "restart" on managed resources? Can an agent reconfigure a managed resource? So an agent is potentially more than just a "metrics collector". If an agent is going to be nothing more than a metrics collector, that would be something important to decide now because the agent will be much easier to implement if so.

...

Collecting to me means: take a snapshot of the moment and store it somewhere locally. Reporting to me means: take the stored snapshots and send it to a server somewhere. Once the server ACKs that it received, those snapshots are then removed from the local storage.

This is roughly how we envision this. Collecting means spooling the metrics until it's time to send the data to the server (what you call "reporting"). I originally envisioned this as being metric messages we send to the server over the bus - but since it appears no one likes having a bus infrastructure as the core messaging infrastructure between components, it would have to be a REST request (with a load balancer in front to support scalability :-D). But that's getting into implementation details.

...

> How much knowledge of the inventory does the agent need to know?
> How does it gain this knowledge of its inventory?

Is "inventory" also a definition from RHQ? Or can I assume that "inventory" is "a thing that can be measured"? Like, "CPU" is an item on the inventory, as well as "Memory" and "Web Application 'Acme Travel System'".

Yes, "inventory" is a definition we used from RHQ-land. Inventory just means those resources that are known to be installed/running and are being managed/monitored by Hawkular. They can be CPUs, file systems, or things like EAP instances, or other running products like that. It can get very complicated to ensure agents synchronize the known inventory that the server has.

...

> Does the agent auto-discover resources? How?

Yes. For web applications, via a Java Agent, as this would allow it to detect "everything" that runs on the JVM. With that, it would be possible to detect deployments and their capabilities (JAX-RS, EJBs, Batch jobs, ...) and apply specific metrics to specific facets. For instance, for a JAX-RS application, there could be a collection of metrics per endpoint (average/mean/max/min response times). For EJBs, the same but per "business method". For JPA, it could collect the time spent on the DB (and perhaps even aggregate that by prepared statement).

For system resources, I think it would need to be something that would run with (high?) privileges, so that it can take data like disk space in all partitions, CPU/memory/swap usage, network traffic, ...

> Can the agent be told about resources that it cannot auto-discover?

Do you have a sample use case for this? What I can imagine is: web application 'Acme Travel System' is deployed but the agent can't detect it automatically. Something tells the agent that there's a web application deployed.

Exactly. That's right.

...

But still, if it cannot see it, how can it measure it?

A prime example is we know an EAP instance is there, but we don't have the credentials to probe it. So the agent could know an EAP instance is there, but what's the admin password? Without it, we can't get that EAP's metrics; with it, we can.

Or, another example is that there is an EAP instance but it is not running. The agent has no idea where it is (or even that it exists). We can tell the agent "There is an EAP in directory '/my/app/foo' --- and oh, by the way, its admin credentials are 'admin/foo'".

Or, another example is that it's running on a non-default port that the agent wasn't expecting and therefore didn't see. So we'd be able to say "Acme Travel System is installed on port 9091 (not the default 8080)."

So, there could be many reasons why an agent cannot auto-detect something but, with a little help from a human administrator to give it some hints, the agent can be told about a resource and be able to manage that resource.
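One way to picture the "hints from an administrator" idea is as a merge between what the agent discovered and what it was told. The sketch below is an illustration only: the `Resource` fields and the `apply_hints` function are hypothetical names, not an existing Hawkular or RHQ structure.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory entry; every field name here is an assumption."""
    resource_id: str
    install_dir: Optional[str] = None
    port: Optional[int] = None
    credentials: Optional[str] = None
    source: str = "discovered"


def _pick(hinted, known):
    """Prefer the administrator's value when one was given."""
    return hinted if hinted is not None else known


def apply_hints(discovered: Iterable[Resource], hints: Iterable[Resource]) -> Dict[str, Resource]:
    """Merge auto-discovered resources with manually supplied hints."""
    inventory = {r.resource_id: r for r in discovered}
    for hint in hints:
        known = inventory.get(hint.resource_id)
        if known is None:
            # The agent never saw this resource: add it as a manual entry.
            inventory[hint.resource_id] = replace(hint, source="manual")
        else:
            # The agent saw it but lacked details such as credentials or port.
            inventory[hint.resource_id] = replace(
                known,
                install_dir=_pick(hint.install_dir, known.install_dir),
                port=_pick(hint.port, known.port),
                credentials=_pick(hint.credentials, known.credentials),
            )
    return inventory
```

For the examples in this message, a hint carrying only `credentials="admin/foo"` would complete an already-discovered EAP entry, while a hint with `port=9091` for an unseen application would create a new entry marked as manual.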

...

> Does the agent need to be pluggable so it can load and interface with plugins to handle monitoring of different resources? What are these plugins?

Not sure I understand the question, but I'd say yes: as an application developer, I want to be able to add custom metrics, such as "# of logged in users", "# of registrations", "# of checkouts" and so on.

And not just add new, custom metrics, but add entirely new resource types, too. For example, we'd like Hawkular to be able to monitor all JBoss middleware products. Suppose we do that - Hawkular 1.0 is a success and people are using it to monitor every JBoss middleware product in existence. But the following year, JBoss might introduce a completely new "JBoss Awesome Product version 1.0" - we'd like an easy way to plug in new capabilities to an existing and running Hawkular system so it can quickly start managing JBoss Awesome Product instances without having to upgrade the entire Hawkular system.

Lukas Krejci

4:27 p.m.

On Wednesday, March 11, 2015 16:12:52 John Mazzitelli wrote:

...

> > So, what do we plan on having the agent do? What is its scope?
>
> I take it that the agent is a "metrics collector". In my mind, it has no power to influence the state of the application or machine: it just collects and reports.

There was talk some time ago to perhaps think about the agent also doing some preliminary alert processing (which brings up more questions - for example, how does the agent get alert definitions from the server?)

Alerting has a CRUD interface for it, doesn't it? Since we assume only agent-server traffic, the agent will have to know the server's address one way or another.

Also, I am not sure that we want to consider the agent a "monolith" doing "agenty" stuff and *also* something else like alert preprocessing. For the agent, too, I think we should think about how we could slice it into individual (semi-)independent pieces that could potentially live on their own (imagine this topology: agent -> preprocessor -> inventory -> metrics -> alerting).

...

In addition, are we going to have the agent be able to perform operations such as "start" or "stop" or "restart" on managed resources? Can an agent reconfigure a managed resource?

Yeah, I am not sure if operations and config management have been scoped in. If they end up in, though, the agent is an active player in the environment.

...

So an agent is potentially more than just a "metrics collector". If an agent is going to be nothing more than a metrics collector, that would be something important to decide now because the agent will be much easier to implement if so.

+1. Also, because the agent's metrics collection is inherently polling-based, we need to figure out how we are going to define how often the metrics are going to be collected.

...

> Collecting to me means: take a snapshot of the moment and store it somewhere locally.
>
> Reporting to me means: take the stored snapshots and send them to a server somewhere. Once the server ACKs that it received them, those snapshots are then removed from the local storage.

This is roughly how we envision this. Collecting means spooling the metrics until it's time to send the data to the server (what you call "reporting"). I originally envisioned this as being metric messages we send to the server over the bus - but since it appears no one likes having a bus infrastructure as the core messaging infrastructure between components, it would have to be a REST request (with a load balancer in front to support scalability :-D). But that's getting into implementation details.

> > How much knowledge of the inventory does the agent need to know?
> > How does it gain this knowledge of its inventory?
>
> Is "inventory" also a definition from RHQ? Or can I assume that "inventory" is "a thing that can be measured"? Like, "CPU" is an item on the inventory, as well as "Memory" and "Web Application 'Acme Travel System'".

Yes, "inventory" is a definition we used from RHQ-land.

Also, it is a name of a lesser-known Hawkular component ;)

...

Inventory just means those resources that are known to be installed/running and are being managed/monitored by Hawkular. They can be CPUs, file systems, or things like EAP instances, or other running products like that. It can get very complicated to ensure agents synchronize the known inventory that the server has.

I don't think this needs to be too complex in Hawkular though. The agent just needs to be told what NOT to collect.

...

> > Does the agent auto-discover resources? How?
>
> Yes.

In RHQ ;) It is both a cool feature and a curse, because with the RHQ agent discovery can literally bring an extremely "densely populated" machine to its knees. We used to have problems with EAP5, for which the discovery was so heavyweight (not RHQ's fault actually, but hey) that we could spin the CPU at 100% for tens of minutes just to discover the 20 EAP5s running on the box.

...

> For web applications, via a Java Agent, as this would allow it to detect "everything" that runs on the JVM. With that, it would be possible to detect deployments and their capabilities (JAX-RS, EJBs, Batch jobs, ...) and apply specific metrics to specific facets. For instance, for a JAX-RS application, there could be a collection of metrics per endpoint (average/mean/max/min response times). For EJBs, the same but per "business method". For JPA, it could collect the time spent on the DB (and perhaps even aggregate that by prepared statement).
>
> For system resources, I think it would need to be something that would run with (high?) privileges, so that it can take data like disk space in all partitions, CPU/memory/swap usage, network traffic, ...

You just described RHQ ;)

...

> > Can the agent be told about resources that it cannot auto-discover?
>
> Do you have a sample use case for this? What I can imagine is: web application 'Acme Travel System' is deployed but the agent can't detect it automatically. Something tells the agent that there's a web application deployed.

Exactly. That's right.

> But still, if it cannot see it, how can it measure it?

A prime example is we know an EAP instance is there, but we don't have the credentials to probe it. So the agent could know an EAP instance is there, but what's the admin password? Without it, we can't get that EAP's metrics; with it, we can. Or, another example is that there is an EAP instance but it is not running. The agent has no idea where it is (or even that it exists).

That is true unless we do discovery by filesystem scan (which I DO NOT propose, just pointing out the possibility).

...

We can tell the agent "There is an EAP in directory '/my/app/foo' --- and oh, by the way, its admin credentials are 'admin/foo'". Or, another example is that it's running on a non-default port that the agent wasn't expecting and therefore didn't see. So we'd be able to say "Acme Travel System is installed on port 9091 (not the default 8080)." So, there could be many reasons why an agent cannot auto-detect something but, with a little help from a human administrator to give it some hints, the agent can be told about a resource and be able to manage that resource.

> > Does the agent need to be pluggable so it can load and interface with plugins to handle monitoring of different resources? What are these plugins?
>
> Not sure I understand the question, but I'd say yes: as an application developer, I want to be able to add custom metrics, such as "# of logged in users", "# of registrations", "# of checkouts" and so on.

And not just add new, custom metrics, but add entirely new resource types, too. For example, we'd like Hawkular to be able to monitor all JBoss middleware products. Suppose we do that - Hawkular 1.0 is a success and people are using it to monitor every JBoss middleware product in existence. But the following year, JBoss might introduce a completely new "JBoss Awesome Product version 1.0" - we'd like an easy way to plug in new capabilities to an existing and running Hawkular system so it can quickly start managing JBoss Awesome Product instances without having to upgrade the entire Hawkular system.

Well, here I think we need to answer a crucial question. Do we want to continue the RHQ path of "writing agent plugin for X", or should we rather try to focus on "providing X with easy to use tools to report to Hawkular"? That would mean an easily consumable REST API, possibly some other niceties for Java-based applications, like annotations for declarative monitoring, or some kind of WildFly extension that could be configured to inject itself into deployments and collect configured metrics (byteman-based bytecode enhancements?), etc.

The latter I think is favorable, if for nothing else than for the fact that it creates potentially less work for us and lets the "domain expert", i.e. the developer of X, think about what and how to report, instead of us learning X in haste and second-guessing.

IMHO the agent exists only for the cases where embedding the "feed" into the managed software itself is not possible.

Note that the embedded feed has other benefits, too. For one, it is (or at least has a chance of being) in-process with the data, and more importantly, if it is in-process, it doesn't create any security concerns.
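To make "annotations for declarative monitoring" concrete, here is a rough analogue using a Python decorator (the thread has Java annotations in mind; Python is used only to keep the sketch short). The `monitored` decorator, the `REPORTED` dictionary, and the metric name are all hypothetical; a real feed would push the recorded value to the server's REST API instead of keeping it in a dictionary.

```python
import time
from functools import wraps
from typing import Callable, Dict, List

# Stand-in for "report to Hawkular": durations are just kept in memory here.
REPORTED: Dict[str, List[float]] = {}


def monitored(metric_name: str) -> Callable:
    """Declare that calls to the decorated function should be timed."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raised.
                elapsed = time.perf_counter() - start
                REPORTED.setdefault(metric_name, []).append(elapsed)
        return wrapper
    return decorator


@monitored("checkout.duration_seconds")
def checkout(cart_total: float) -> float:
    """Example business method owned by the application developer."""
    shipping = 5.0
    return cart_total + shipping
```

The point of the design is the one made in this message: the developer of the application decides what is worth measuring and only declares it, while the reporting mechanics live in the supplied tooling.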

...

_______________________________________________ hawkular-dev mailing list hawkular-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/hawkular-dev

John Mazzitelli

Thursday, 12 March

1:34 p.m.

So what I heard today was we don't want a standalone agent, but we do want something that can be embedded in Wildfly/EAP so it can monitor things running in Wildfly (not just monitor Wildfly itself, but applications running inside wildfly). That lends itself to supporting customizable modules that can be deployed in Wildfly/EAP as hawkular subsystems. Can someone give me a quick summary of what Wildfly-Monitor does? https://github.com/hawkular/wildfly-monitor

Thomas Heute

1:50 p.m.

We still need:

- something to monitor the platform, the container... (what is it if not a "standalone agent"?)
- monitoring non-WF/EAP based servers: Fuse, AMQ... (vert.x...)

Thomas

On 03/12/2015 07:34 PM, John Mazzitelli wrote:

...

Heiko W.Rupp

2:57 p.m.

On 12 Mar 2015, at 19:34, John Mazzitelli wrote:

...

Can someone give me a quick summary of what Wildfly-Monitor does? https://github.com/hawkular/wildfly-monitor

http://pilhuhn.blogspot.de/2014/10/wildfly-subsystem-for-rhq-metrics_10.html

The current one is an improved version where Heiko Braun and Harald Pehl have contributed some local scheduling and other stuff. I think this needs some more love to get it into a state where it can be used with Hawkular.

John Sanda

9:56 p.m.

I think we need to consider having the following types of agents:

* embedded, in-process
* co-located in separate process
* agent-less, where we do management/monitoring from the server side

There are different scenarios in which each of the above has advantages. For (only) collecting metrics, an embedded collector would be best as it involves the least overhead.

Based on our experience of managing Cassandra in RHQ, I think that there are a lot of situations where agent-less makes things much easier. Consider running repair. It is a cluster-wide operation. In RHQ, we go through a complex workflow of server to agent to Cassandra node to agent to server to next agent to next Cassandra node to agent to server and so on. Managing the repair directly from server to Cassandra would make things a lot easier.

With respect to monitoring, the agent, particularly one that runs co-located in a separate process, should be capable of being more than just a collector. I am seeing that more and more with other systems I am looking at. With an Open TSDB extension for example, the agent/client side piece can convert data into a blob format that is optimized for storage. This blob conversion can be done on the server side but it causes a hit on performance. I believe that the DataDog agent is capable of performing client-side aggregations, like computing percentiles. This could be efficient and useful for computing method execution times, response times, etc.

With respect to alerting, I think it is something we should consider for an agent that runs its own process. The agent could push alerts onto the bus for instance. And if we look more closely at ISPN, we could do the same with distributed caches. The agent fires an alert by putting an entry into its alert cache to which the server is subscribed for notifications. This could also be used for the agent to obtain alert definitions. Something to consider...
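As an illustration of what a client-side aggregation could look like, the sketch below reduces a batch of raw response times to a small summary before anything is sent. It uses the nearest-rank percentile definition; the function names and the summary layout are hypothetical and do not describe how DataDog or any Hawkular component actually does it.

```python
from typing import Dict, Iterable, Sequence, Tuple


def percentile(sorted_values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    if not sorted_values:
        raise ValueError("no samples to aggregate")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    # Integer ceiling of pct/100 * n avoids floating point rounding surprises.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize(samples: Iterable[float], pcts: Tuple[int, ...] = (50, 95, 99)) -> Dict[str, float]:
    """Collapse raw samples into count/min/max plus the requested percentiles."""
    values = sorted(samples)
    if not values:
        raise ValueError("no samples to aggregate")
    summary = {"count": len(values), "min": values[0], "max": values[-1]}
    for pct in pcts:
        summary[f"p{pct}"] = percentile(values, pct)
    return summary
```

The trade-off is the one hinted at in this message: the agent spends a little CPU and the server receives a handful of numbers per interval instead of every raw sample, at the cost of not being able to recompute other statistics later.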

Thomas Heute

Friday, 13 March

2:57 a.m.

On 03/13/2015 03:56 AM, John Sanda wrote:

...

I think we need to consider having the following types of agents

* embedded, in-process
* co-located in separate process
* agent-less where we do management/monitoring from the server side

+1

But that opens a can of worms and a mix of options...

For uptime/downtime for instance, the embedded in-process agent can only say "I'm up"; if it is not up, it's down or unknown (network issue?). From a separate process you can tell if it's down, so a separate process is better in that case (but it may not be installed, so do we fall back on embedded process info?).

Agent-less works *if* the network is open enough to allow it...

Also, for the embedded one, we may be bound to product releases, unless we instrument the server ourselves and update as we wish.

Thomas

Michael Burman

3:28 a.m.

On 13.03.2015 09:57, Thomas Heute wrote:

...

For uptime/downtime for instance, the embedded in-process, can only say "I'm up", if not up -> it's down or unknown (network issue ?). From a separate process you can tell if it's down, separate process is better in that case (but it may not be installed, so do we fall back on embedded process info ?).

A network issue is just as likely to affect the embedded agent as the separate-process agent; there's no difference in that case.

Also, a separate process can't really tell if something is down or not. The example mentioned here was that the CPU of the process is overloaded and it can't report anything, but the system agent can see the PID is up. This definitely is not the usual case: having a PID up doesn't mean the software is still alive (who here hasn't booted their Cassandra with kill -9 more than once?).

As an example of an agent system, BMC's Control-M handles this differently than we're planning. While it's a job monitoring system, it has two statuses, agent status and job status. If the agent cannot be contacted for a certain period of time, the agent is marked down (and alerted), while the job itself is marked with unknown state. If the agent is up but can't read the job's status, the job is again marked unknown. Only if it really knows something has failed is the job marked as failed.
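The two-status idea can be sketched as two small functions: one for the agent, one for what the agent monitors. This is an adaptation of the Control-M description in this message to a generic resource, with "failed" mapped to `DOWN`; the names `Status`, `agent_status`, and `resource_status`, and the timeout parameter, are hypothetical.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"


def agent_status(seconds_since_contact: float, timeout_seconds: float) -> Status:
    """The agent is marked down once it has been silent longer than the timeout."""
    return Status.UP if seconds_since_contact <= timeout_seconds else Status.DOWN


def resource_status(agent: Status, observed_up: Optional[bool]) -> Status:
    """Status of the monitored resource, kept separate from the agent's status.

    observed_up is None when the agent could not read the resource's state.
    """
    if agent is not Status.UP or observed_up is None:
        # Without a trustworthy observation nothing is claimed either way.
        return Status.UNKNOWN
    return Status.UP if observed_up else Status.DOWN
```

The resource is reported as down only when a reachable agent positively observed that; silence from the agent degrades the resource to unknown rather than down.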

...

Agent-less works *if* the network is open enough to allow it... Also for embedded one, we may be bound to product releases, unless we instrument the server ourselves and update as we wish.

I don't think the product releases are an issue if we have working versioning in our APIs. We should just log the version of the agents in our UI. As long as the API versioning is handled correctly, we should be able to support older versions quite fine. Sure, those services wouldn't get new features, but I don't think this is an issue if the new features are marked as product features (e.g. EAP monitoring features) instead of our updated features.

My wish is that we would support multiple approaches. If we have a platform agent, it should take care of connecting to these product 'agents' and dispatching stuff to the server, but otherwise there could be smaller agents doing just a simple job. It's not unheard of that some enterprise wants to monitor infra with different tools than what's used to monitor running applications, as they could be monitored by different departments with different responsibilities. For containers, we'll probably want to do something like cAdvisor, i.e. run a container that monitors other containers.

Just one wish - a very low overhead agent. In a past job we had a test machine with slightly reduced resources, yet infra first installed IBM's TSM agent (Storage Manager), whose memory usage increased with the number of files, then BMC's Patrol with plugins for TSM etc., and of course we needed the CTM agent to run our jobs. In the end the machine couldn't actually run any tests because it had no memory left for those jobs. Sounds like a horror story but it's very real (in the end I solved the issue by killing every agent except CTM before running the tests and then restarting them after the tests, as I knew the root password and infra's SLA).

- Micke

Thomas Heute

5:02 a.m.

New subject: scope of the agent design

On 03/13/2015 09:28 AM, Michael Burman wrote:

...

On 13.03.2015 09:57, Thomas Heute wrote: > For uptime/downtime for instance, the embedded in-process, can only say > "I'm up", if not up -> it's down or unknown (network issue ?). From a > separate process you can tell if it's down, separate process is better > in that case (but it may not be installed, so do we fallback on embedded > process info ?). The network issue will affect the embedded and process agent just as likely, there's no difference in that case.

The difference is that:
- if embedded, it can only say "I'm up", "I'm up", "I'm up"; if you receive nothing, you can't tell whether it is down or the up message simply didn't arrive
- if non-embedded, it can say "Resource is up" or "Resource is down"; if you receive nothing, you can tell that the agent is messed up and the state of the resource is unknown.
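To make the distinction concrete, here is a minimal sketch in Python of how a server might interpret the two kinds of reports. The names (`Availability`, `interpret_embedded`, `interpret_external`) are made up for illustration and are not part of any Hawkular API; it only encodes the two cases described above.

```python
from enum import Enum
from typing import Optional, Tuple


class Availability(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"


def interpret_embedded(heartbeat_received: bool) -> Availability:
    """An embedded agent can only ever say "I'm up".

    Silence is ambiguous: the resource may be down, or the message may
    simply not have arrived, so the result is UNKNOWN rather than DOWN.
    """
    return Availability.UP if heartbeat_received else Availability.UNKNOWN


def interpret_external(resource_is_up: Optional[bool]) -> Tuple[bool, Availability]:
    """A separate agent reports the resource state explicitly.

    `None` means no report arrived at all. The result is a pair of
    (agent reachable?, resource availability).
    """
    if resource_is_up is None:
        # Nothing received: the agent is in trouble, the resource state is unknown.
        return (False, Availability.UNKNOWN)
    return (True, Availability.UP if resource_is_up else Availability.DOWN)
```

Only the external variant can ever produce DOWN, which is the point being argued.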

...

Also, separate process can't really tell if something is down or not, the example mentioned here was that the CPU of the process is overloaded and can't report anything but the system agent can see the pid is up. This definitely is not the case usually, having a PID up doesn't mean the software is alive anymore (who here hasn't booted their Cassandra with kill -9 more than once?).

There are various ways to tell if something is down. We can find many exceptions, but let's not let the exceptions make the rule. If there is no process running, an agent can tell that it is down, but the process can't tell that itself.

...

As an example of agent system, BMC's Control-M handles this differently than we're planning. While it's a job monitoring system, it has two statuses, agent status and job status. If the agent can not be contacted for certain period of time, the agent is marked down (and alerted), while the job itself is marked with unknown state. If the agent is up, but can't read the job's status, job is again marked unknown. Only if it really knows something has failed, the job is marked as failed.

So they still have an agent

...

> Agent-less works *if* the network is open enough to allow it... > > Also for embedded one, we may be bound to product releases, unless we > instrument ourself the server and update as we wish. > I don't think the product releases are an issue, if we have working versioning in our APIs. We should just log the version of the agents in our UI. As long as the API versioning is handled correctly, we should able to support older versions quite fine. Sure, those services wouldn't get new features, but I don't think this is an issue, if the new features are marked as product features (eg. EAP monitoring features) instead of our updated features.

I raised it because it has been a big issue for JON updates. So even if it is not a blocker, it is a concern and needs to be taken into account.
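As a rough illustration of the API-versioning idea quoted above, the server side could accept any agent whose API major version is still supported and keep the reported version around for display. This is only a sketch in Python; `SUPPORTED_API_MAJORS`, `parse_version` and `accept_agent` are hypothetical names, and the "major.minor" version format is an assumption.

```python
from typing import Iterable, Tuple

# Hypothetical: the API major versions this server still understands.
SUPPORTED_API_MAJORS = (1, 2)


def parse_version(version: str) -> Tuple[int, int]:
    """Split a "major.minor[.patch]" string into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def accept_agent(agent_api_version: str,
                 supported_majors: Iterable[int] = SUPPORTED_API_MAJORS) -> bool:
    """Accept an agent if its API major version is still supported.

    Older minor versions are accepted as-is; they just don't get newer features.
    """
    major, _minor = parse_version(agent_api_version)
    return major in tuple(supported_majors)
```

An embedded agent tied to an older product release would then keep working until its major API version is retired, which is where the JON-style upgrade concern would still bite.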

...

My wish is that we would support multiple approaches.

We need to be careful: multiple approaches = increased QE, and since QE capacity is not infinite, this leads to decreased quality.

...

If we have platform agent, it should take care of connecting to this product 'agents' and dispatching stuff to the server, but otherwise there could be smaller agents doing just a simple job. It's not unheard of that some enterprise wants to monitor infra with different tools than what's used to monitor running applications, as they could be monitored by different departments and with different responsibilities. For containers, we'll probably want to do something like cAdvisor, so running a container monitoring other containers.

I am not sure I understand. In any case, "smaller" agents need to be easily manageable.

...

Just one wish - very low overhead agent. From my past we had a test machine with slightly reduced resources, yet the infra installed first IBM's TSM agent (Storage Manager) that had increasing memory usage based on how many files there were, then BMC's Patrol with plugins for TSM etc and of course we needed CTM agent to run our jobs. In the end the machine couldn't actually run any tests because it had no memory left for those jobs. Sounds like horror story but it's very real (in the end I solved the issue by killing every agent except CTM before running the tests and then restarting them after the tests as I knew the root password and infra's SLA).

I think we'll all agree on the req, but we need to agree on what that means for the implementation... There will be a tradeoff to make between low overhead and some capabilities (like generating alerts itself)... Will likely need to be configurable... Thomas

...

- Micke _______________________________________________ hawkular-dev mailing list hawkular-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/hawkular-dev

Heiko W.Rupp

4:34 a.m.

New subject: scope of the agent design

On 13 Mar 2015, at 8:57, Thomas Heute wrote:

...

But that opens a can of worms and mix of options... For uptime/downtime for instance, the embedded in-process, can only say "I'm up", if not up -> it's down or unknown (network issue ?). From a separate process you can tell if it's down, separate process is better in that case (but it may not be installed, so do we fallback on embedded process info ?).

While this may at first sound like a disadvantage, it can also be seen as an advantage. In RHQ, when we lost the connection to the agent, all the resources behind it were considered down at backfilling. Now when we have, e.g., the pinger in a different location and it sees that the application is reachable, we could "overwrite" the unavailable data from the agent (and yes, I know that the previous sentence/scenario was oversimplified).

Similarly, when the embedded agent is not able to send, perhaps due to high usage of the app server, the platform agent can still see if the process is up and report it as up. Or the other way around: the embedded agent sends an "UP" report every minute, but the external agent can detect that the PID has changed and can either report a "DOWN" for the time in between two ups (which may be too much) or at least report a "tag" that the server process was rebooted, which can then later be displayed to the user in the charts. This may then explain why some resource usage is higher/lower than before.

We have started this discussion (from a different angle) in two different places already: the "Alert on URL return code" mail http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000402.html and "Computed Resource State" http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html

The point is that the very strict 1:1 mapping that we had in RHQ is not enough. In scenarios like aircraft, they often even have a 2-of-3 voting principle.
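A small Python sketch of what such a computed state could look like, assuming three independent observers (say embedded agent, platform agent and pinger) that each report "up", "down" or "unknown". The function names and the string states are illustrative assumptions, not an existing Hawkular interface.

```python
from collections import Counter
from typing import Iterable, Optional


def vote(observations: Iterable[str], quorum: int = 2) -> str:
    """Return the state that at least `quorum` observers agree on.

    "unknown" observations do not count as votes. If no state reaches the
    quorum, or two states tie, the computed state is "unknown".
    """
    counts = Counter(o for o in observations if o != "unknown")
    ranked = counts.most_common(2)
    if not ranked:
        return "unknown"
    top_state, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if top_count >= quorum and not tied:
        return top_state
    return "unknown"


def restart_tag(previous_pid: Optional[int], current_pid: Optional[int]) -> Optional[str]:
    """Return a "restarted" tag when the PID changed between two observations."""
    if previous_pid is not None and current_pid is not None and previous_pid != current_pid:
        return "restarted"
    return None
```

With three observers and a quorum of two, a single silent or confused feed cannot flip the computed state on its own, and the restart tag can be attached to the charts without having to invent a DOWN interval.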

Gary Brown

Monday, 16 March

3:13 a.m.

New subject: scope of the agent design

This embedded 'agent' would also be useful for collecting the activity information for RTGov. I assume information will be routed depending on type at the backend? Regards Gary ----- Original Message -----

...

John Sanda

2:07 p.m.

New subject: scope of the agent design

For monitoring purposes, do we really need to write an agent? Should we just leverage existing tools/libraries? I previously cited three common architectures for monitoring agents:

1) embedded, in-process
2) separate process but co-located on the same host
3) remote monitoring from a different host

Let’s consider monitoring the JVM and applications running on it. Coda Hale Metrics is a widely used metrics library that is becoming ubiquitous. It provides reporters for exporting the metrics that are collected. The core metrics library provides several reporters: console, JMX, and CSV, to name a few. There are plenty of 3rd party reporters as well, like the Graphite reporter. We could implement a Hawkular reporter, which would make it very easy for any application, library, etc. that uses Coda Hale Metrics to collect and report to Hawkular.

The in-process collector might not always be possible or desirable. For those situations the co-located agent is a better fit. jmxtrans could be an excellent option. It can query and collect metrics from external JVMs and then write them to other systems like Graphite, Ganglia, OpenTSDB, and more. We could implement a Hawkular metrics writer. Maybe we take a similar approach for platform metrics, with collectd for example. We are already doing something similar by seeing how we can integrate more directly with cAdvisor. Is it worth considering doing the same with some of the tools/libraries that are already in widespread use?

...

On Mar 16, 2015, at 4:13 AM, Gary Brown <gbrown(a)redhat.com> wrote: This embedded 'agent' would also be useful for collecting the activity information for RTGov. I assume information will be routed depending on type at the backend? Regards Gary ----- Original Message ----- > So what I heard today was we don't want a standalone agent, but we do want > something that can be embedded in Wildfly/EAP so it can monitor things > running in Wildfly (not just monitor Wildfly itself, but applications > running inside wildfly). > > That lends itself to supporting customizable modules that can be deployed in > Wildfly/EAP as hawkular subsystems. > > Can someone give me a quick summary of what Wildfly-Monitor does? > https://github.com/hawkular/wildfly-monitor

Heiko W.Rupp

2:24 p.m.

New subject: scope of the agent design

On 16 Mar 2015, at 20:07, John Sanda wrote:

...

For monitoring purposes, do we really need to write an agent? Should we just leverage existing tools/libraries? I previously cited three

(Re)using all those tools is fine and certainly desired, but the issue is less about what some random tool uses to collect metrics inside an app, but rather how to access and transport them. Using JMX like in the good ol' days is certainly a way. Or using the Jolokia Java agent. But still someone needs to talk to them. Writing a subsystem for inside Wildfly to actively report/submit data is in fact an (embedded) agent. Not a general purpose one. We already have converters from collectd, gmon and a few other protocols into Hawkular(-metrics). So yes, they should all be allowable as input. And then we will have more specialized use cases that most probably go much further than just submitting some metrics to the Hawkular(-metrics) server. In this case some more specialized code may be needed too.

John Sanda

2:42 p.m.

New subject: scope of the agent design

...

On Mar 16, 2015, at 3:24 PM, Heiko W.Rupp <hrupp(a)redhat.com> wrote: On 16 Mar 2015, at 20:07, John Sanda wrote: > For monitoring purposes, do we really need to write an agent? Should > we just leverage existing tools/libraries? I previously cited three (Re)using all those tools is fine and certainly desired, but the issue is less about what some random tool uses to collect metrics inside an app, but rather how to access and transport them. Using JMX like in the good ol' days is certainly a way. Or using the Jolokia Java agent. But still someone needs to talk to them.

Accessing and transporting the data is already answered to a large degree. That is the primary reason I brought it up in the first place. Another aspect to consider is that different types of monitoring agents lend themselves better to different scenarios. I think that focusing more on integration with existing agents/collectors gives us a better chance of being able to use the best tool for a particular situation.

...


Lukas Krejci

2:58 p.m.

New subject: scope of the agent design

On Monday, March 16, 2015 15:42:19 John Sanda wrote:

...

> On Mar 16, 2015, at 3:24 PM, Heiko W.Rupp <hrupp(a)redhat.com> wrote: > > On 16 Mar 2015, at 20:07, John Sanda wrote: >> For monitoring purposes, do we really need to write an agent? Should >> we just leverage existing tools/libraries? I previously cited three > > (Re)using all those tools is fine and certainly desired, but the issue > is less > about what some random tool uses to collect metrics inside an app, but > rather how to access and transport them. Using JMX like in the good ol' > days is certainly a way. Or using the Jolokia Java agent. But still > someone > needs to talk to them. Accessing and transporting the data is already answered to a large degree. That is the primary reason I brought it up in the first place. Another aspect to consider is that different types of monitoring agents lend themselves better to different scenarios. I think that focusing more on integration with existing agents/collectors gives us a better chance of being able to use the best tool for a particular situation.

+1 That's why I was advocating for "no agent at all" a couple of days ago. Frankly, that was a little bit too radical and not thought through. You articulated the point I wanted to make much better. On the "agent side" there are more than plenty of tools that are already in use. We should first try to find ways of integrating with these tools, and only when none of the pre-existing stuff implements our use case (in a good enough way) should we look to implement an "agent" of our own. It is my current (most probably naive) understanding that those agents should be small tools for particular jobs (or maybe extensions to existing tools), not some "heavy" agent in the RHQ sense. Better yet, in addition to integration with existing metric-collection tools, I think we could write some "integration libraries" for some popular languages (bash, java, python, ...) that would enable easy integration of hawk metrics into existing software or enable users to write new "feeds" (because "agent" is too overloaded a term) on their own. These feeds could be any of the 3 types you mentioned and we wouldn't actually care on the server side.

...


John Sanda

3:19 p.m.

New subject: scope of the agent design

...

On Mar 16, 2015, at 3:58 PM, Lukas Krejci <lkrejci(a)redhat.com> wrote: On Monday, March 16, 2015 15:42:19 John Sanda wrote: >> On Mar 16, 2015, at 3:24 PM, Heiko W.Rupp <hrupp(a)redhat.com> wrote: >> >> On 16 Mar 2015, at 20:07, John Sanda wrote: >>> For monitoring purposes, do we really need to write an agent? Should >>> we just leverage existing tools/libraries? I previously cited three >> >> (Re)using all those tools is fine and certainly desired, but the issue >> is less >> about what some random tool uses to collect metrics inside an app, but >> rather how to access and transport them. Using JMX like in the good ol' >> days is certainly a way. Or using the Jolokia Java agent. But still >> someone >> needs to talk to them. > > Accessing and transporting the data is already answered to a large degree. > That is the primary reason I brought it up in the first place. Another > aspect to consider is that different types of monitoring agents lend > themselves better to different scenarios. I think that focusing more on > integration with existing agents/collectors gives us a better chance of > being able to use the best tool for a particular situation. +1 That's why I was advocating for "no agent at all" a couple of days ago. Frankly, that was a little bit too radical and not thought through. You articulated the point I wanted to make much better. On the "agent side" there are more than plenty of tools that are already in use. We should first try to find ways of integrating with these tools and only when none of pre-existing stuff implements our usecase (in a good enough way) we should look to implement an "agent" of our own. It is my current (most probably naive) understanding that those agents should be small tools for particular jobs (or maybe extensions to existing tools), not some "heavy" agent in the RHQ sense. 
Better yet, in addition to integration with existing metric-collection tools, I think we could write some "integration libraries" for some popular languages (bash, java, python, ...) that would enable easy integration of hawk metrics into existing software or enable users to write new "feeds" (because "agent" is a too overloaded term) on their own. These feeds could be any of the 3 types you mentioned and we wouldn't actually care on the server side.

I definitely agree that we should be thinking about integration libraries for different languages. We have a clear use case for Go already, and I think Python makes sense too.
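For a feel of how small such an integration library could be, here is a sketch of a Python "feed" helper that buffers samples and renders them as one JSON batch. Everything here is an assumption for illustration: the `Feed` class, the field names and the payload shape are invented and do not reflect the actual Hawkular(-metrics) REST format, and the HTTP transport is deliberately left out.

```python
import json
import time
from typing import Dict, List, Optional


class Feed:
    """Buffers metric samples and renders them as a single JSON batch."""

    def __init__(self, feed_id: str) -> None:
        self.feed_id = feed_id
        self._buffer: List[Dict[str, object]] = []

    def record(self, metric: str, value: float,
               timestamp_ms: Optional[int] = None) -> None:
        """Add one sample; the timestamp defaults to the current time in ms."""
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        self._buffer.append({"id": metric, "timestamp": timestamp_ms, "value": value})

    def flush(self) -> str:
        """Return the buffered samples as a JSON string and clear the buffer."""
        payload = json.dumps({"feed": self.feed_id, "data": self._buffer}, sort_keys=True)
        self._buffer = []
        return payload
```

An application would call `record()` wherever it already measures something and hand the result of `flush()` to whatever transport the server ends up exposing; the server does not need to know whether the feed is embedded, co-located or remote.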

Thomas Heute

Tuesday, 17 March

3:35 a.m.

New subject: scope of the agent design

On 03/16/2015 08:58 PM, Lukas Krejci wrote:

...

On Monday, March 16, 2015 15:42:19 John Sanda wrote: >> On Mar 16, 2015, at 3:24 PM, Heiko W.Rupp <hrupp(a)redhat.com> wrote: >> >> On 16 Mar 2015, at 20:07, John Sanda wrote: >>> For monitoring purposes, do we really need to write an agent? Should >>> we just leverage existing tools/libraries? I previously cited three >> (Re)using all those tools is fine and certainly desired, but the issue >> is less >> about what some random tool uses to collect metrics inside an app, but >> rather how to access and transport them. Using JMX like in the good ol' >> days is certainly a way. Or using the Jolokia Java agent. But still >> someone >> needs to talk to them. > Accessing and transporting the data is already answered to a large degree. > That is the primary reason I brought it up in the first place. Another > aspect to consider is that different types of monitoring agents lend > themselves better to different scenarios. I think that focusing more on > integration with existing agents/collectors gives us a better chance of > being able to use the best tool for a particular situation. +1 That's why I was advocating for "no agent at all" a couple of days ago. Frankly, that was a little bit too radical and not thought through. You articulated the point I wanted to make much better. On the "agent side" there are more than plenty of tools that are already in use. We should first try to find ways of integrating with these tools and only when none of pre-existing stuff implements our usecase (in a good enough way) we should look to implement an "agent" of our own. It is my current (most probably naive) understanding that those agents should be small tools for particular jobs (or maybe extensions to existing tools), not some "heavy" agent in the RHQ sense. Better yet, in addition to integration with existing metric-collection tools, I think we could write some "integration libraries" for some popular languages (bash, java, python, ...) 
that would enable easy integration of hawk metrics into existing software or enable users to write new "feeds" (because "agent" is a too overloaded term) on their own. These feeds could be any of the 3 types you mentioned and we wouldn't actually care on the server side.

We also need auto-discovery to fill in the inventory. It's not just a flow of data; we need to know how those "feeds" are related. Thomas

...


Heiko W.Rupp

5:13 a.m.

New subject: scope of the agent design

On 16 Mar 2015, at 20:58, Lukas Krejci wrote:

...

On the "agent side" there are more than plenty of tools that are already in use. We should first try to find ways of integrating with these tools and only when none of pre-existing stuff implements our usecase (in a good enough way) we should look to implement an "agent" of our own.

What if the user does not have any of those tools installed? Do we tell them "install Ganglia, but not the graphing, only the monitoring"? Ah, and as this does not cope well with WildFly 94, please install collectd on top?

...

not some "heavy" agent in the RHQ sense.

Running many of the small tools in parallel also has a cost. Similar to forking hundreds and thousands of shell commands.

Michael Burman

5:41 a.m.

New subject: scope of the agent design

Hi, Not to mention all the integration weight we'll have to carry. Keeping up with the third party software versions and backwards compatibility is going to incur a huge cost in terms of development hours. The same issue we had with building RHQ plugins for every product. Supporting software where we can send plugins for the third-party to "keep up" is going to be easier for us (assuming we get some users for those - in which case there might be community updates to some of those plugins to keep them working) than building compatible APIs to our core. And in any case, those third-party agents will not provide us the USP we want. - Micke ----- Original Message ----- From: "Heiko W.Rupp" <hrupp(a)redhat.com> To: hawkular-dev(a)lists.jboss.org Sent: Tuesday, March 17, 2015 12:13:32 PM Subject: Re: [Hawkular-dev] scope of the agent design On 16 Mar 2015, at 20:58, Lukas Krejci wrote:

...

not some "heavy" agent in the RHQ sense.

Running many of the small tools in parallel also has a cost. Similar to forking hundreds and thousands of shell commands.

Lukas Krejci

10:07 a.m.

New subject: scope of the agent design

On Tuesday, March 17, 2015 06:41:38 Michael Burman wrote:

...

I don't see a difference here. IMHO, integration with a 3rd party monitoring tool is the same as writing a plugin for monitoring an application. Both can change incompatibly outside of our control, and we'll always be just catching up. The reason I think integration with 3rd party monitoring is better is that, more probably than not, people will already be using some monitoring solution by the time they decide to use Hawkular. If we offer them integration with their existing monitoring solution they will love us. If we tell them they need to toss everything they have and instead deploy our stuff that might or might not do what they were used to, they might not love us at all ;)

...

Supporting software where we can send plugins for the third-party to "keep up" is going to be easier for us (assuming we get some users for those - in which case there might be community updates to some of those plugins to keep them working) than building compatible APIs to our core. And in any case, those third-party agents will not provide us the USP we want.

Well, that is a big question. How are you going to force people to write Hawkular plugins? Sure, this can be done inside Red Hat, where we follow the common goal and divide the work across the teams, but I doubt the community will be chuffed to learn yet another way of monitoring. But then again, that is what we propose, too ;) Integrate with existing monitoring solutions AND provide libraries for people to easily write their own feeds to Hawkular.

...


Lukas Krejci

10:17 a.m.

New subject: scope of the agent design

On Tuesday, March 17, 2015 11:13:32 Heiko W.Rupp wrote:

...

On 16 Mar 2015, at 20:58, Lukas Krejci wrote: > On the "agent side" there are more than plenty of tools that are > already in > use. We should first try to find ways of integrating with these tools > and only > when none of pre-existing stuff implements our usecase (in a good > enough way) > we should look to implement an "agent" of our own. What if the users does not have any of those tools installed? Do we tell them "install Ganglia, but not the graphing, only the monitoring".

I think that is going to just depend on the usecase. It might be easier to tell them "install Ganglia" if it provides just the monitoring capabilities they need. It also might be easier to tell them use Hawkular agent. Or they might have requirements that neither can meet.

...

Ah and as this does not cope well with WildFly 94 please install collectd on top?

I am pretty much sure Hawkular will have the same kind of problems. RHQ had them for sure because incompatibilities are pretty much unavoidable IMHO.

...

> not some "heavy" agent in the RHQ sense. Running many of the small tools in parallel also has a cost. Similar to forking hundreds and thousands of shell commands.

In my mind, the optimal feed is an in-process feed: no problems with security, inter-process/network comms overhead, etc. The way people do "small" monitoring, IMHO, is to run many small scripts on a schedule. Even if there are a thousand of them, they run for a short period of time, so the load spreads nicely (usually; of course everything is a function of scale, as we know from RHQ ;) ). Having 1 external process to do all monitoring, as we had in RHQ, also has its pros: for one, it is easier to estimate the monitoring "cost" by looking at the memory and CPU consumption of the external agent.
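One way to help the load of many small scheduled checks actually spread out is to give each check a stable offset inside the collection interval, so they don't all fire at the same second. The Python sketch below is just one possible approach and an assumption on my part, not something RHQ or Hawkular is known to do; `spread_offsets` and the check names are made up.

```python
import zlib
from typing import Dict, Iterable


def spread_offsets(check_names: Iterable[str], interval_s: int) -> Dict[str, int]:
    """Assign each check a start offset (in seconds) within the interval.

    The offset is derived from a CRC32 of the check name, so it stays the
    same across restarts and needs no shared state between hosts.
    """
    return {
        name: zlib.crc32(name.encode("utf-8")) % interval_s
        for name in check_names
    }
```

For example, `spread_offsets(["disk", "cpu", "jvm-heap"], 60)` gives every check a fixed second within each minute at which to run. How evenly the offsets land depends on the names, so this reduces bursts rather than guaranteeing a flat load.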

...


Thomas Heute

10:50 a.m.

New subject: scope of the agent design

...

On Tuesday, March 17, 2015 11:13:32 Heiko W.Rupp wrote: > On 16 Mar 2015, at 20:58, Lukas Krejci wrote: >> On the "agent side" there are more than plenty of tools that are >> already in >> use. We should first try to find ways of integrating with these tools >> and only >> when none of pre-existing stuff implements our usecase (in a good >> enough way) >> we should look to implement an "agent" of our own. > What if the users does not have any of those tools installed? > Do we tell them "install Ganglia, but not the graphing, only the > monitoring". I think that is going to just depend on the usecase. It might be easier to tell them "install Ganglia" if it provides just the monitoring capabilities they need. It also might be easier to tell them use Hawkular agent. Or they might have requirements that neither can meet. > Ah and as this does not cope well with WildFly 94 please install > collectd on top? > I am pretty much sure Hawkular will have the same kind of problems. RHQ had them for sure because incompatibilities are pretty much unavoidable IMHO. >> not some "heavy" agent in the RHQ sense. > Running many of the small tools in parallel also has a cost. Similar > to forking hundreds and thousands of shell commands. In my mind, the most optimal feed is an in-process feed - no problems with security, inter-process/network comms overhead, etc. The way people do "small" monitoring IMHO, is to run many small scripts on schedule. Even if there is a thousand of them, they run for a short period of time so the load spreads nicely (usually, of course everything is a function of scale as we know from RHQ ;) ). Having 1 external process to do all monitoring, as we had in RHQ, has also its pros - for one it is easier to estimate the monitoring "cost" by looking at the memory and CPU consumption of the external agent. 

John Doyle

11:57 a.m.

New subject: scope of the agent design

+1, we want to be up and running very rapidly, even if that means in a limited fashion. ~jd ----- Original Message -----

...

I don't think we are on the same page regarding the focus of the project/product. Hawkular is targeted toward Red Hat Middleware management; for infrastructure management there are a gazillion other solutions. We need "some" basic infrastructure management, and for those basic needs a user shouldn't need to maintain a solution made of multiple puzzle pieces. Look at NewRelic or ruxit: those are dead simple to install and to start monitoring with. Thomas

Lukas Krejci

12:14 p.m.

New subject: scope of the agent design

On Tuesday, March 17, 2015 16:50:59 Thomas Heute wrote:

...

As great as ruxit is, it requires running their agent as root, forcing itself into ld.so.preload and modifying selinux policies. That gave me a bit of shivers (https://answers.ruxit.com/questions/286/).

I thought Red Hat does things the UNIX/microservice way - integrating tools that do 1 thing and do it well. Hence my focus on integration.

I do hope we are on the same page, though. I was maybe a bit loud at stressing the integration aspect because I saw the tendencies to write yet another RHQ agent. IMHO, that is something we should try to avoid/minimize if possible (and I am not saying there won't be usecases requiring it - I just think it should not be the first choice when trying to implement something).

The good thing about integrating with 3rd party monitoring is that we get what they are able to monitor "for free" (where free is the cost of keeping ourselves compatible with the 3rd party tool ;) ), which enables our users to reuse the monitoring they most probably have already set up and only upgrade the server-side to a hopefully much more capable thing that Hawkular will be.

As I said already below, I think it is worth pursuing the in-process management where possible as that should avoid the pitfalls/usability problems we ran into with RHQ agent.

The proportion of work on in-process management versus the integration with 3rd party tools vs out-of-process "full" agent is of course a function of priorities and project/product focus.

I just wanted to try and persuade people that relying on there being a "Hawkular agent" is IMHO not the best solution to our overall "agent-side" architecture. It probably will be a part of the picture, but it should not be required, IMHO.


Thomas Segismont

Friday, 20 March

12:42 p.m.

New subject: scope of the agent design

On 17/03/2015 16:50, Thomas Heute wrote:

...

Look at NewRelic or ruxit those are dead simple to install and to start monitoring.

AFAIK, with NewRelic you have to install:
* their _sysmond_ for OS level monitoring
* a Java agent (a true JVM agent, not an external process written in Java) for each JVM you want to monitor

I don't know about ruxit

Juraci Paixão Kröhling

Monday, 23 March

4:50 a.m.

New subject: scope of the agent design

On 03/20/2015 06:42 PM, Thomas Segismont wrote:

...

On 17/03/2015 16:50, Thomas Heute wrote:
> Look at NewRelic or ruxit those are dead simple to install and to
> start monitoring.

AFAIK, with NewRelic you have to install:
* their _sysmond_ for OS level monitoring

Their system monitoring tool is provided in RPM format (and probably other formats) which includes their own repository, meaning, it installs everything it needs + auto updates using the system's mechanism. No need to check a special website to see if there are updates.

...

* a Java agent (a true JVM agent, not an external process written in Java) for each JVM you want to monitor

For this, they have an "installer" that auto-detects most application servers and adds whatever is needed wherever it is needed (including changing standalone.conf). So, this is somewhat transparent to the system's administrator.

- Juca.

Thomas Segismont

5:16 a.m.

New subject: scope of the agent design

On 23/03/2015 10:50, Juraci Paixão Kröhling wrote:

...

On 03/20/2015 06:42 PM, Thomas Segismont wrote:
> On 17/03/2015 16:50, Thomas Heute wrote:
>> Look at NewRelic or ruxit those are dead simple to install and to
>> start monitoring.
>
> AFAIK, with NewRelic you have to install:
> * their _sysmond_ for OS level monitoring

Their system monitoring tool is provided in RPM format (and probably other formats) which includes their own repository, meaning, it installs everything it needs + auto updates using the system's mechanism. No need to check a special website to see if there are updates.

Thanks for emphasizing the value of distribution through package managers.

...

> * a Java agent (a true JVM agent, not an external process written
> in Java) for each JVM you want to monitor

For this, they have an "installer" that auto detects most application servers and add whatever is needed wherever is needed (including changing standalone.conf). So, this is somewhat transparent to the system's administrator.

- Juca.

I didn't know about that.

Thomas Segismont

Friday, 20 March

1:07 p.m.

New subject: scope of the agent design

On 17/03/2015 16:50, Thomas Heute wrote:

...

So in this thread we talked about what we want the user experience to be (simple install, works across different systems) and what we don't want the developer experience to be (with the RHQ agent pain points in mind). That's important, but could we elaborate on the use cases we want to solve in the next milestones?

Thomas Heute

Monday, 23 March

4:07 a.m.

New subject: scope of the agent design

On 03/20/2015 07:07 PM, Thomas Segismont wrote:

...

On 17/03/2015 16:50, Thomas Heute wrote:
> I don't think we are on the same page regarding the focus of the
> project/product.
>
> Hawkular is targeted toward Red Hat Middleware management, for
> infrastructure management there are gazillion other solutions.
> We need "some" basic infrastructure management, for those basic needs a
> user shouldn't need to maintain a solution made of multiple puzzle pieces.

So in this thread we talked about what we want the user experience to be (simple install, works across different systems) and what we don't want the developer experience to be (with the RHQ agent pain points in mind). That's important, but could we elaborate on the use cases we want to solve in the next milestones?

- Monitor Middleware servers (Metrics/alerts) running either in OpenShift (Online/Enterprise) or "on premise".
- "Basic" monitoring of the underlying infrastructure (CPU/Memory/Disk of the host or kube/docker environment)
- Have an inventory that relates "resources": what runs where?
- Operate/Provision servers: the "mutable kind" a la JON and the "immutable" kind (with F8)

For the "kube/docker/OpenShift" environment, we'll go through this kadvisor thing; this should collect CPU/Memory/disk about a docker image (cAdvisor) + CPU/Memory/disk at a cluster level (heapster) + any metric exposed through JMX or DMR.

IMO we should be able to read the same data when our servers are running outside of that environment.

Thomas


Thomas Segismont

Friday, 20 March

12:17 p.m.

New subject: scope of the agent design

On 17/03/2015 11:13, Heiko W.Rupp wrote:

...

From my perspective, there's no point in writing collectd/ganglia/... plugins for JBoss middleware projects. The value I see in integrating with third party collectors is that users may have their own scripts/plugins or use one of their many community plugins.

...

Running many of the small tools in parallel also has a cost. Similar to forking hundreds and thousands of shell commands.

I'm not sure about others, but you could run a few collectd processes in parallel and still have a lower memory footprint than a single JVM :)

Heiko W.Rupp

Monday, 23 March

3:19 a.m.

New subject: scope of the agent design

On 20 Mar 2015, at 18:17, Thomas Segismont wrote:

...

On 17/03/2015 11:13, Heiko W.Rupp wrote:
> What if the users does not have any of those tools installed?
> Do we tell them "install Ganglia, but not the graphing, only the
> monitoring".
> Ah and as this does not cope well with WildFly 94 please install
> collectd on top?

From my perspective, there's no point in writing collectd/ganglia/... plugins for JBoss middleware projects.

Absolutely - which means we (= larger JBoss teams) need to do that.

...

The value I see in integrating with third party collectors is that users may have their own scripts/plugins or use one of their many community plugins.

Yes.

Heiko W.Rupp

6:19 a.m.

New subject: scope of the agent design

On 23 Mar 2015, at 9:19, Heiko W.Rupp wrote:

...

> From my perspective, there's no point in writing collectd/ganglia/...
> plugins for JBoss middleware projects.

Absolutely - which means we (= larger JBoss teams) need to do that.

As I re-read, this can be misunderstood -- I am not advocating to write collectd/ganglia plugins, but "agents" that do the job for monitoring/management of JBoss Middleware products.

Heiko W.Rupp

Tuesday, 17 March

5:10 a.m.

New subject: scope of the agent design

On 16 Mar 2015, at 20:42, John Sanda wrote:

...

Accessing and transporting the data is already answered to a large degree. That is the primary reason I brought it up in the first

Still, someone needs to go out and talk to Jolokia and fetch the data periodically. Past experience has shown that users like the fact that the (RHQ/Hawkular) server is not talking to all the end-resources / - processes individually, but that this is channeled through the agent.

Lukas Krejci

10:19 a.m.

New subject: scope of the agent design

On Tuesday, March 17, 2015 11:10:22 Heiko W.Rupp wrote:

...

On 16 Mar 2015, at 20:42, John Sanda wrote:
> Accessing and transporting the data is already answered to a large
> degree. That is the primary reason I brought it up in the first

Still, someone needs to go out and talk to Jolokia and fetch the data periodically. Past experience has shown that users like the fact that the (RHQ/Hawkular) server is not talking to all the end-resources / - processes individually, but that this is channeled through the agent.

They also complained about it because it made filesystem permissions and other security aspects more complex.


Thomas Segismont

Friday, 20 March

12:10 p.m.

New subject: scope of the agent design

Sorry, very late reply. Others to come.

On 17/03/2015 11:10, Heiko W.Rupp wrote:

...

Past experience has shown that users like the fact that the (RHQ/Hawkular) server is not talking to all the end-resources / - processes individually, but that this is channeled through the agent.

We could enhance ptrans to turn it into some sort of proxy for the Metrics API.

Thomas Segismont

12:23 p.m.

New subject: scope of the agent design

On 20/03/2015 18:10, Thomas Segismont wrote:

...

Sorry, very late reply. Others to come.

On 17/03/2015 11:10, Heiko W.Rupp wrote:
> Past experience has shown that users like the fact that the (RHQ/Hawkular)
> server is not talking to all the end-resources / - processes individually, but
> that this is channeled through the agent.

We could enhance ptrans to turn it into some sort of proxy for the Metrics API.

Just realized you were talking about the server to managed resource use case.

Lukas Krejci

Monday, 16 March

3:02 p.m.

New subject: scope of the agent design

On Monday, March 16, 2015 15:07:10 John Sanda wrote:

...

I think re-using existing stuff is desirable but 1 question needs to be addressed first: How are we going to configure these tools? We could just say that that is out of the scope for Hawkular, but I don't find that too user friendly. Things like collection intervals or disabling/enabling should be configurable from inside Hawkular if the external tool has such capability.

...

Let’s consider monitoring the JVM and applications running on it. Coda Hale Metrics is a widely used metrics library that is becoming ubiquitous. It provides reporters for exporting metrics that are collected. The core metrics library provides several reporters: console, JMX, and CSV, to name a few. There are plenty of 3rd party reporters as well, like the Graphite reporter. We could implement a hawkular reporter, which would make it very easy for any application, library, etc. that uses Coda Hale Metrics to collect and report to hawkular.

The in-process collector might not always be possible or desirable. For those situations the co-located agent is a better fit. jmxtrans could be an excellent option. It can query and collect metrics from external JVMs and then write them to other systems like Graphite, Ganglia, OpenTSDB, and more. We could implement a hawkular metrics writer.

Maybe we take a similar approach for platform metrics with collectd for example. We are already doing something similar by seeing how we can integrate more directly with cadvisor. Is it worth considering doing the same with some of the tools/libraries that are already in widespread use?

> On Mar 16, 2015, at 4:13 AM, Gary Brown <gbrown(a)redhat.com> wrote:
>
> This embedded 'agent' would also be useful for collecting the activity
> information for RTGov. I assume information will be routed depending on
> type at the backend?
>
> Regards
> Gary
>
> ----- Original Message -----
>
>> So what I heard today was we don't want a standalone agent, but we do want
>> something that can be embedded in Wildfly/EAP so it can monitor things
>> running in Wildfly (not just monitor Wildfly itself, but applications
>> running inside wildfly).
>>
>> That lends itself to supporting customizable modules that can be deployed
>> in Wildfly/EAP as hawkular subsystems.
>>
>> Can someone give me a quick summary of what Wildfly-Monitor does?
>> https://github.com/hawkular/wildfly-monitor
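To make the reporter idea in the quoted message a bit more concrete, here is a minimal Python sketch of an in-process reporter that periodically hands a snapshot of collected metrics to a pluggable sink. It is only an illustration of the pattern: `MetricRegistry`, `ScheduledReporter`, and the sink callback are hypothetical names, not the Coda Hale Metrics, jmxtrans, or Hawkular APIs, and the sink merely stands in for whatever would eventually write to Hawkular.

```python
import threading
import time
from typing import Callable, Dict

# Hypothetical type: a sink receives (metrics snapshot, unix timestamp).
# In a real integration this is where data would be sent to a server.
Sink = Callable[[Dict[str, float], float], None]


class MetricRegistry:
    """Thread-safe map of metric name -> latest value (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, float] = {}

    def set(self, name: str, value: float) -> None:
        with self._lock:
            self._values[name] = value

    def snapshot(self) -> Dict[str, float]:
        # Return a copy so the sink never sees later mutations.
        with self._lock:
            return dict(self._values)


class ScheduledReporter:
    """Hands a registry snapshot to the sink every `interval_seconds`."""

    def __init__(self, registry: MetricRegistry, sink: Sink,
                 interval_seconds: float) -> None:
        self._registry = registry
        self._sink = sink
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def report_once(self) -> None:
        self._sink(self._registry.snapshot(), time.time())

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() was called.
        while not self._stop.wait(self._interval):
            self.report_once()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
```

An application would update the registry from its own code paths and start one reporter per destination; swapping the sink is what would turn this into a "hawkular reporter" or a "Graphite reporter" in the sense discussed above.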

John Sanda

3:30 p.m.

New subject: scope of the agent design

...

On Mar 16, 2015, at 4:02 PM, Lukas Krejci <lkrejci(a)redhat.com> wrote:

On Monday, March 16, 2015 15:07:10 John Sanda wrote:
> For monitoring purposes, do we really need to write an agent? Should we just
> leverage existing tools/libraries? I previously cited three common
> architectures for monitoring agents,
>
> 1) embedded, in-process
> 2) separate process but co-located on same host
> 3) remote monitoring from different host

I think re-using existing stuff is desirable but 1 question needs to be addressed first:

How are we going to configure these tools? We could just say that that is out of the scope for Hawkular, but I don't find that too user friendly. Things like collection intervals or disabling/enabling should be configurable from inside Hawkular if the external tool has such capability.

Configuration is an open question, but I think it is a question that applies equally whether we are talking integration with existing tools or our own, custom monitoring agent(s). I agree that stuff like collection intervals and enabling/disabling metrics should be configurable via Hawkular.

On one end of the spectrum, the integration could be completely exposed. This places the largest burden on the user because the user has to understand the various tools used in/by Hawkular. For users familiar with those tools though, it provides benefits because it gives them greater flexibility in how to set up and configure things.

At the other end of the spectrum, the integration is an implementation detail to the greatest extent possible. Users only need to know how to set up/configure Hawkular. This is more along the lines of the approach we took with integrating Cassandra into RHQ.

I think that answer should be somewhere in the middle so that we can give users the flexibility when they want it, but at the same time not force it on users.
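One way to picture the "somewhere in the middle" option is a layered configuration: Hawkular manages defaults for things like collection intervals and enabled/disabled metrics, and users who know the underlying tool may override individual values. The Python sketch below is only an illustration under that assumption; `MetricSettings` and `effective_settings` are hypothetical names and do not correspond to any existing Hawkular API.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class MetricSettings:
    """Per-metric settings that the server side would manage (hypothetical)."""
    enabled: bool = True
    interval_seconds: int = 60


def effective_settings(
    managed: Dict[str, MetricSettings],
    overrides: Optional[Dict[str, dict]] = None,
) -> Dict[str, MetricSettings]:
    """Merge optional user overrides on top of the managed defaults.

    Overrides for metric names that are not in `managed` are ignored; an
    unknown field name inside an override raises TypeError, which surfaces
    typos instead of silently dropping them.
    """
    overrides = overrides or {}
    return {
        name: replace(settings, **overrides.get(name, {}))
        for name, settings in managed.items()
    }
```

A user who never touches the overrides gets exactly the managed defaults, which matches the "implementation detail" end of the spectrum; passing something like `{"cpu.load": {"interval_seconds": 10}}` moves one metric toward the "fully exposed" end without affecting the others.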


Thomas Heute

Tuesday, 17 March

3:48 a.m.

New subject: scope of the agent design

On 03/16/2015 09:30 PM, John Sanda wrote:

...


Using/Reusing those tools should be done (there is no reason to rewrite cAdvisor for instance, which should work on any platform that supports Docker today), but we need a way to easily be able to configure them "from the server".

For some "tools" we may also need to check if they are available and if not, use an alternative solution, install the missing piece, or inform the user that he's missing something.

Thomas


Thomas Heute

Thursday, 12 March

7:34 a.m.

New subject: scope of the agent design

IMO for requirements:

- Easy to install on a single host...
  - Download a package (from the server, so that it can be preconfigured, so that the server always delivers something compatible with itself?), run it with no other dependency but Java (assuming this is Java based), have an interactive questionnaire if needed.
- Easy to install on multiple hosts...
  - Configuration can be passed in CLI, RPM?
- Auto-upgradable (getting updates from the hawkular server) (without data loss)
- (Interactive) shell to troubleshoot/configure
- Multi-OS support
- "low" footprint

If we have that, the rest of the requirements are a bit less of an issue IMO as we can upgrade/reconfigure...

Then:

- Auto-discovery capability (this is a very high added value)
- Collect and report host data:
  - CPU usage/Memory usage/Disk usage metrics
  - OS / Kernel version / CPU&cores / Disks / Memory

For the servers monitoring, I am still unclear if it would be better to:

- collect by the agent (like JON 3)
- collect by the monitored server, such as an EAP module (needs the server to come with the piece of agent + configuration *or* be provisioned by the server or the agent...) talking straight to the server.
- collect by the monitored server through the agent.
- a mix depending on server capabilities...

We may not need to look into pluggable plugins right now. I'd rather have a solid monolithic solution that we can split apart if/when needed. Since the UI is made to be much more focused on the scenarios we want to support, it reduces the need for pluggable plugins IMO. Of course this limits the project to what we can do out of the box and is not ideal, but it shouldn't prevent someone from contributing extensions and the UI for it as part of the solution.

(From a user point of view, he should be able to configure what he wants to collect or not unless this is dictated by the server)

From a release perspective, we need to be able to release the agent much more often than the server.

The dev team has much more experience with agent possible issues than me :)

PS: we should also "discover" kubernetes/docker and what they contain.

Thomas

On 03/11/2015 06:57 PM, John Mazzitelli wrote:

...

I was going to wait to post this, but since it was brought up already, might as well start this thread now.

Regardless of how the agent is to be implemented, we need to answer the question "WHAT will the agent DO?" It is an existential question - kind of like asking "What is the purpose to my life?" :)

So, what do we plan on having the agent do? What is its scope?

How much knowledge of the inventory does the agent need to know? How does it gain this knowledge of its inventory? Does the agent auto-discover resources? How? Can the agent be told about resources that it cannot auto-discover?

Does the agent need to be pluggable so it can load and interface with plugins to handle monitoring of different resources? What are these plugins?

Additionally, we can add (in Juca's words):

> What is the scope of the agent? Will it gather only data about the
> environment, like, memory consumption, span of worker threads, CPU
> consumption, GC-time, ...? Or business metrics as well, like "requests
> per second", "number of exceptions in the last X minutes", "average
> time spent on database", and so on?
>
> Also, is the same agent also responsible for capturing non-JVM related
> data, like, disk space utilization, network traffic, ...?
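As a rough illustration of the "collect and report host data" requirement from the reply above (OS, kernel version, CPU count, disk usage), the following Python sketch gathers a few of those values using only the standard library. The function name `collect_host_data` and the field names are made up for this example. Memory usage is deliberately left out because the standard library has no portable call for it; a real agent would need a platform-specific source or a third-party library for that.

```python
import os
import platform
import shutil
from typing import Any, Dict


def collect_host_data(path: str = "/") -> Dict[str, Any]:
    """Return a small snapshot of host facts and metrics (illustrative only)."""
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    data: Dict[str, Any] = {
        "os": platform.system(),        # e.g. "Linux"
        "kernel": platform.release(),   # kernel / OS release string
        "cpu_count": os.cpu_count(),    # may be None if undeterminable
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_ratio": disk.used / disk.total if disk.total else 0.0,
    }
    try:
        # 1-minute load average; not available on every platform.
        data["load_avg_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        data["load_avg_1m"] = None
    return data
```

Something this small could run either from a scheduled script or from inside a long-lived agent process, so it does not by itself settle the in-process versus external agent question raised in the thread.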


hawkular-dev@lists.jboss.org


44 comments


participants (10)

Gary Brown
Heiko W.Rupp
John Doyle
John Mazzitelli
John Sanda
Juraci Paixão Kröhling
Lukas Krejci
Michael Burman
Thomas Heute
Thomas Segismont
