Inventory and 'transient' servers
by Heiko W. Rupp
Hey,
when WildFly connects to Inventory for the first time, we sync
the EAP information into Inventory, including the information
about which metrics are available.
Now when WildFly is deployed into Kubernetes or OpenShift, we
will see that WildFly starts, syncs, and then dies at some point
in time. K8s will not re-use the existing instance but will start
a new one, which will have a different FeedId.
This leaves a WildFly entry in Inventory that is later detected as
down in Metrics/Alerting, but the entry in Inventory will stay
around forever. The consequences are:
- Inventory will fill up with no-longer-needed information
- Clients will retrieve data about servers that are no longer usable
We need to figure out how to deal with this situation, e.g.:
- have Inventory listen for k8s events and remove the server
when k8s removes it (not when it goes down; stopped pods
can stay around for some time)
- have a different way of creating the feed id / server id so that
the state is "re-used". Something like the feedId/server name could
be the deployment config name + version + k8s tenant (a rough
sketch of this is below).
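To make the second option concrete, here is a minimal Go sketch of what a
deterministic feed id could look like. The function name and the exact
inputs are assumptions for illustration, not an agreed scheme:

package main

import "fmt"

// stableFeedID builds a feed id from attributes that survive pod
// restarts, so a re-created pod maps back onto the same Inventory
// entry instead of creating a new one.
func stableFeedID(deploymentConfig, version, tenant string) string {
	return fmt.Sprintf("%s-%s-%s", tenant, deploymentConfig, version)
}

func main() {
	// Two pods of the same deployment config produce the same feed id:
	fmt.Println(stableFeedID("my-wildfly", "v3", "myproject"))
	// Output: myproject-my-wildfly-v3
}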
Thoughts?
Hawkular drill down on calls
by Kavin Kankeshwar
Hi,
I am using the Hawkular JVM agent to send metrics to the Hawkular controller,
but the one thing I cannot do is drill down into where the time was
spent, i.e. drill down to the class that was taking the time (something like
the AppDynamics agent does).
So I wanted to check whether I am missing something or whether the feature is
not yet available in Hawkular.
Regards,
--
Kavin.Kankeshwar
Ability to group by datapoint tag in Grafana
by Gareth Healy
When monitoring a Prometheus endpoint, the OpenShift Agent creates a single
metric with tagged datapoints, i.e.:
https://github.com/coreos/etcd/blob/master/Documentation/v2/metrics.md#http-requests
I1228 21:02:01.820530 1 metrics_storage.go:155] TRACE: Stored [3]
[counter] datapoints for metric named
[pod/fa32a887-cd08-11e6-ab2e-525400c583ad/custom/etcd_http_received_total]:
[
{2016-12-28 21:02:01.638767339 +0000 UTC 622 map[method:DELETE]}
{2016-12-28 21:02:01.638767339 +0000 UTC 414756 map[method:GET]}
{2016-12-28 21:02:01.638767339 +0000 UTC 33647 map[method:PUT]}
]
But when trying to view this via the Grafana datasource, only one metric with
the aggregated counts is shown. What I'd like to do is something like the
below:
{
    "start": 1482999755690,
    "end": 1483000020093,
    "order": "ASC",
    "tags": "pod_namespace:etcd-testing",
    "groupDatapointsByTagKey": "method"
}
Search via tags or name (as-is) and group the datapoints by a tag key,
which would give you 3 lines instead of 1 (a rough sketch of the grouping
is below).
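For what it's worth, the grouping itself should be cheap to do in the
datasource once the datapoints carry their tags. A minimal Go sketch of
the idea (types and names are made up for illustration; this is not the
actual agent or datasource code):

package main

import "fmt"

// DataPoint mirrors the tagged datapoints shown in the trace above.
type DataPoint struct {
	Timestamp int64
	Value     float64
	Tags      map[string]string
}

// groupByTagKey splits one metric's datapoints into one series per
// distinct value of the given tag key ("method" in the example).
func groupByTagKey(points []DataPoint, key string) map[string][]DataPoint {
	series := make(map[string][]DataPoint)
	for _, p := range points {
		series[p.Tags[key]] = append(series[p.Tags[key]], p)
	}
	return series
}

func main() {
	// Values taken from the trace above; the timestamp is illustrative.
	points := []DataPoint{
		{1482958921638, 622, map[string]string{"method": "DELETE"}},
		{1482958921638, 414756, map[string]string{"method": "GET"}},
		{1482958921638, 33647, map[string]string{"method": "PUT"}},
	}
	// Three series (DELETE, GET, PUT) instead of one aggregated line:
	for method, pts := range groupByTagKey(points, "method") {
		fmt.Println(method, pts)
	}
}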
Does that sound possible?
Cheers.
Fwd: [hawkular-dev] Grafana querying usability
by Joel Takvorian
Hi everybody,
I would like to get some opinions on the hawkular-grafana-datasource
querying usability, especially if you had the opportunity to create
dashboards recently and had to juggle with querying by metric name and by
tag.
Currently the panel displays different elements depending on whether you're
querying by metric name (see picture:
https://raw.githubusercontent.com/hawkular/hawkular-grafana-datasource/ma...
)
Querying by name is quite straightforward, but it can be cumbersome when
there are a lot of available metrics and you have to scroll through the
suggestions to find the one you want.
Or by tag (see picture:
https://raw.githubusercontent.com/hawkular/hawkular-grafana-datasource/ma...
)
The "query by tag" interface is not very intuitive IMO (you define a list
of key-value pairs), moreover to this date there is no auto-completion on
tag name.
There have been some features in Metrics recently that, I think, can enable a
better UI on the Grafana side. First, there is Micke's feature "Allow fetching
of available tagNames" [1], which enables suggestions (auto-completion) on
tag names. And most importantly, there's the new "tag query language" that
could (should) have its place in Grafana.
So I would have several suggestions for improving the UI and queries.
*1*: First of all I think we can remove the "Search by name" / "Search by
tag" selector, and allow searching by name AND by tag at the same time:
providing tags would refine the available metrics in the metrics text field
suggestions (auto-completion). If this text field is left blank, all
available metrics are displayed.
Then, there are several scenarios for taking advantage of the new Hawkular
Metrics features:
*2a*: keep the current key/value pairs system, but improve it by adding
suggestions on tag names.
*2b*: replace the key-value pairs with the new tags query, via a simple text
field (an example is sketched below):
We may or may not include syntax validation here. We must provide some
documentation on that.
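As a rough illustration of what one would type in such a field - the exact
syntax belongs to the server-side tag query language, so treat this
expression as an assumption built from the operators listed below, not a
verified query:

pod_namespace = 'etcd-testing' AND method IN ['GET', 'PUT']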
*2c*: replace key-value pairs with the new tags query, with a dedicated
"builder" UI:
Each of the boxes in the tags query builder would have a contextual
auto-completion feature:
- suggestions on 1: list of tag keys
- suggestions on 2: list of operators (=, !=, IN, NOT IN)
- suggestions on 3: list of tag values for the given key (with some slight
differences on brackets depending on whether it's the first element; a
closing bracket is offered as a choice if it's not the first element)
- suggestions on 4: operators AND / OR
The 2b option is obviously simpler and very fast to implement. It has the
downside of losing all auto-completion capabilities, even compared to the
current version.
2c looks nicer and more intuitive in its usage; people won't have to read
the docs to use it. However, there are several downsides:
- We need to implement the logic => development time, and added complexity
in our Grafana plugin.
- It introduces a dependency on the server-side language. When the
language evolves, we'll have to maintain this as well.
Ideas & thoughts? I think it's preferable to proceed in several steps. In a
first time I could implement *1* and *2a*, and later (and maybe giving some
time to the language to be "stabilized") going with *2c*.
[1] https://issues.jboss.org/browse/HWKMETRICS-532
Thanks
Joel
what to name metrics in HOSA?
by John Mazzitelli
HOSA has its own Prometheus endpoint that emits its own metrics. Right now, we just have one custom agent metric but we plan to add more. Before I get too far, I'm trying to figure out a good prefix to use for metric names.
I was looking over the Prometheus naming conventions for metric names here: https://prometheus.io/docs/practices/naming/
In addition, I found additional naming conventions listed in the Prom Go client comments here: https://github.com/prometheus/client_golang/blob/master/prometheus/metric...
Right now the one custom agent metric is called:
hawkular_openshift_agent_metric_data_points_collected_total
I think it's too long :) And the "subsystem" is two words (openshift_agent), whereas the Go comment (and every other Prometheus metric I've seen) uses a one-word subsystem with no underscore.
I think starting it with "hawkular_" is good because looking at the metric you immediately know it is from a Hawkular component. But I don't know what the subsystem should be.
I was thinking:
hawkular_openshiftagent_<metric-name>
That is a one-word subsystem, "openshiftagent", but it's still too long IMO. Maybe:
hawkular_agent_<metric-name>
But then, if our other agents emit their own metrics in the future, this will be confusing (think vert.x agent, fuse agent, whatever).
How about using the HOSA abbreviation?
hawkular_hosa_<metric-name>
That seems smaller and is more specific to the OpenShift Agent. But will "HOSA" make sense to people?
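For reference, the Go client builds the full metric name by joining the
Namespace, Subsystem and Name fields with underscores, so this whole
question boils down to what we put in the first two fields. A minimal
sketch with the hawkular/hosa choice (the Help text and port are made up):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// The client joins these fields with underscores, so this counter is
// exposed as hawkular_hosa_metric_data_points_collected_total.
var dataPointsCollected = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "hawkular",
	Subsystem: "hosa",
	Name:      "metric_data_points_collected_total",
	Help:      "Number of data points collected over the agent's lifetime.",
})

func main() {
	prometheus.MustRegister(dataPointsCollected)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}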
Thoughts? Suggestions?
Hawkular in OpenShift
by Thomas Heute
When I use:
oc cluster up --metrics=true
Hawkular Metrics fails until I do:
oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra
Should that last command be part of the --metrics=true magic?
Thomas
hosa and its own role
by John Mazzitelli
Playing around with OpenShift roles, I found the agent doesn't need the vast majority of permissions the cluster-reader role provides.
So, rather than assigning the agent the cluster-reader role, I instead created a single role for the agent that provides only the permissions the agent actually needs to do its job and no others:
https://github.com/hawkular/hawkular-openshift-agent/pull/87/files#diff-e...
So far, this looks to be working. Heiko, feel free to try it out. It's part of that use-secrets PR/branch.
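I can't paste the exact rules from the diff link here, but to give a feel
for the shape of such a role, here is a sketch using the Kubernetes RBAC Go
types. The resource and verb lists are assumptions for illustration, not
the actual contents of the PR:

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// A minimal read-only role: the agent only needs to discover pods and
// read its configmaps, a tiny subset of what cluster-reader grants.
var agentRole = rbacv1.ClusterRole{
	ObjectMeta: metav1.ObjectMeta{Name: "hawkular-openshift-agent"},
	Rules: []rbacv1.PolicyRule{{
		APIGroups: []string{""},
		Resources: []string{"pods", "namespaces", "configmaps"},
		Verbs:     []string{"get", "list", "watch"},
	}},
}

func main() {
	fmt.Printf("%+v\n", agentRole)
}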
some new hawkular openshift agent stuff
by John Mazzitelli
FYI: some new things went into HOSA (that's the Hawkular OpenShift Agent for the uninitiated).
1. The agent now emits its own metrics and can monitor itself. Right now it just emits some basic "go" metrics like memory usage, CPU usage, etc., along with one agent-specific one: a counter that counts the number of data points it has collected in its lifetime. We'll add more metrics as we figure out what people want to see, but the infrastructure is in place now.
2. The agent is deployed as a daemonset. This means that as new nodes are brought online, an agent will go along with them (or so I'm told :)
3. The agent has changed the way it discovers what to monitor - it no longer looks at annotations on pods to determine where the configmaps are for those pods. Instead, it looks at volume declarations to see if an agent configmap is defined (see the sketch below). This was done to be ready for the future, when new security constraints will be introduced in OpenShift that would have broken our annotation approach. The approach using volumes should not hit that issue.
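To make item 3 concrete, the discovery now amounts to scanning each pod's
volume declarations for a configmap-backed volume. A rough sketch with the
Kubernetes core types (the matching rule below is simplified for
illustration and is not the agent's actual logic):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// findAgentConfigMap returns the name of the first configmap-backed
// volume in the pod spec; the agent uses volume declarations like this
// instead of pod annotations to find its configuration.
func findAgentConfigMap(pod *corev1.Pod) (string, bool) {
	for _, v := range pod.Spec.Volumes {
		if v.ConfigMap != nil {
			return v.ConfigMap.Name, true
		}
	}
	return "", false
}

func main() {
	pod := &corev1.Pod{}
	pod.Spec.Volumes = []corev1.Volume{{
		Name: "hawkular-openshift-agent",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{
					Name: "my-agent-config",
				},
			},
		},
	}}
	if name, ok := findAgentConfigMap(pod); ok {
		fmt.Println("agent configmap:", name)
	}
}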
NOTE: If you are building the latest agent from master, we added some dependencies, so you have to update your dependencies via Glide using the "make update-deps" target before building from source.