Hi,
Right, the PR for HWKMETRICS-180 currently implements the following query language
(inspired by OpenTSDB/Prometheus):
tags=tagName:tagValue,tagName2:tagValue2 etc., as previously in the REST API query,
split at the ",". All of the given names and values are combined with
"AND", so it is read as:
tagName:tagValue AND tagName2:tagValue2 (both must match)
There are some identifiers which allow modifications to the query, namely at this point:
"*" as the value and "|" in the value:
tagName:*,tagName2:tagValue2|tagValue3
That one would be read as:
tagName:ANY AND (tagName2:tagValue2 OR tagName2:tagValue3)
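To make the reading above concrete, here is a minimal sketch in Python of how such a tags string could be parsed and checked against a metric's tags. It is not the code from the PR; the function names parse_tag_query and matches are made up for illustration, and it assumes that tag names and values never contain ",", ":" or "|".

```python
def parse_tag_query(query):
    """Parse 'name:value,name2:v1|v2' into a dict of clauses.

    Hypothetical helper, not the HWKMETRICS-180 implementation.
    A value of None stands for "*" (tag must exist, any value);
    a list holds the OR alternatives. All clauses are ANDed.
    """
    parsed = {}
    for part in query.split(","):
        name, sep, value = part.partition(":")
        if not sep or not name or not value:
            raise ValueError("expected tagName:tagValue, got %r" % part)
        parsed[name] = None if value == "*" else value.split("|")
    return parsed


def matches(parsed, tags):
    """Return True if a metric's tags (a dict) satisfy every clause."""
    for name, allowed in parsed.items():
        if name not in tags:
            return False  # the tag must exist, even for "*"
        if allowed is not None and tags[name] not in allowed:
            return False  # none of the OR alternatives matched
    return True
```

With this sketch, parse_tag_query("tagName:*,tagName2:tagValue2|tagValue3") gives one "any value" clause and one clause with two alternatives, and matches() only accepts metrics that satisfy both.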
ANY means that the tag must exist for the metric definition, but the value can be
anything. The query against metrics_tags_idx is done with the values for these OR operations
and exact matches, and against only the clustering column tname in the case of a "*"
query. Now, one feature that both of the systems mentioned earlier also offer is
regexp querying. The Prometheus-style syntax for that would be:
tagName:~regexpValue
The implementation for that one is relatively simple: we skip fetching by tvalue,
just get the values by tag name (as with the "*" query), and then apply the given regexp
as a filter. However, what sort of features do we want from this query language, and what
should it look like? So far it has been designed to be usable from the REST API (and to
mimic those two systems, as some might already be familiar with them).
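As a rough illustration of the regexp idea, the sketch below (Python, not the PR code) takes rows already fetched by tag name only and filters them on the client side. The row shape (metric_id, tvalue) and both function names are assumptions, and so is the choice to require the regexp to match the whole value, which is how Prometheus treats its regexp matchers.

```python
import re


def split_regexp_clause(clause):
    """Split 'tagName:~regexpValue' into (tagName, regexpValue).

    Hypothetical helper for illustration only.
    """
    name, _, value = clause.partition(":")
    if not name or not value.startswith("~"):
        raise ValueError("expected tagName:~regexp, got %r" % clause)
    return name, value[1:]


def filter_by_regexp(rows, pattern):
    """Keep (metric_id, tvalue) rows whose tvalue fully matches pattern.

    `rows` is assumed to be the result of querying by tname only,
    i.e. every value stored for that tag name.
    """
    compiled = re.compile(pattern)
    return [(metric_id, tvalue) for metric_id, tvalue in rows
            if compiled.fullmatch(tvalue)]
```

The cost of this approach is that every value for the tag name is read before filtering, so it behaves like the "*" query plus a filtering step.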
- Micke
PS. The IRC nickname is "Yak", but someone else registered it on Freenode so I can't
use it there, unlike on every other major IRC network. So I'm using the name of one of my
prehistoric bots there.
----- Original Message -----
From: "John Sanda" <jsanda(a)redhat.com>
To: "Discussions around Hawkular development"
<hawkular-dev(a)lists.jboss.org>
Sent: Wednesday, July 22, 2015 5:38:28 PM
Subject: Re: [Hawkular-dev] [Metrics] Unified query support brainstorming
On Jul 22, 2015, at 10:13 AM, Thomas Segismont
<tsegismo(a)redhat.com> wrote:
Hi everyone,
Right now, when you query data in Metrics, you can, given a time range:
- get raw data
- get raw data having some tags
- get a "bucketed" view of raw data with on-the-fly aggregates
- get periods for which a condition on the value holds true
But you can't mix these capabilities.
For example, you can't ask for periods where the avg of a gauge,
computed over 1 min buckets, is greater than a threshold.
It's not possible either to express AND/OR in tags or condition queries.
gayak has already added support for OR and is working on support for AND in
HWKMETRICS-180. He is also looking into a mini query language for tag filtering as part of
this work.
Searching by tags is useful, but users have to tag data manually (unless
the collector adds some predefined tags). It would be useful to be able
to operate on data coming from different metrics having similar names:
like tell me the average cpu usage over all my web hosts, provided the
metrics are _webhost*.cpu.usage_
We added support for tagging data very early on in the project, but I am becoming more of
the mindset that we do not want this. Tagging metrics/time series and subsequently
searching for metric data based on those tags are common use cases. It might be worth
thinking about events in place of tagging individual data points.
When aggregating data, users should be able to provide the name of the
function, be it a builtin or user defined function.
I think user defined functions sound nice but not that critical. I think that we can cover
most use cases with built-in functions.
Last but not least, when rollups are implemented, users should
probably not have to care about where the data is stored, whether they ask for
the 1-month average of a metric over the past year, or the 30-second
average over the past five minutes (think about the UI zooming graphs).
Agreed. The fact that we might be generating and storing multiple time series for a metric
is in large part an implementation detail to make querying more efficient and to reduce
storage footprint on disk. In RHQ we never exposed the generated time series. The date
range in request determined which time series we queried.
Whether we generate aggregated metrics in an ad hoc fashion at query time or they are
pre-computed, we are still dealing with the same data types/structures. There should not
be two separate APIs.
The examples are not completely fabricated; they are feedback I got while
presenting Hawkular. I understood Stefan heard similar comments during
Summit.
I'm starting this thread to gather inputs so that we can build a
powerful, unified query API. Feel free to provide more use cases and/or
ideas for implementation.
Thanks!
Thomas
_______________________________________________
hawkular-dev mailing list
hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev