Hi,

I just want to share some ideas about a possible language for performing arithmetic and aggregations on metrics, before they slip my mind...

Here's an example of what I would personally love to see in Hawkular:

----------------------------------------------
Example:
sum(stats(rate((id(my_metric), tags(a=foo AND b=bar), regexp(something_.+)), 5m), 10m))
|
|=> "id", "tags" and "regexp" all return a set of raw metrics (0-1 for id, 0-n for tags and regexp)
|==> "(a,b,c)" takes n parameters, all sets of metrics, and flattens them into a single set
|===> rate(set_of_raw_metrics, rate_period) computes the rate for each of them and returns a set of metrics (map n=>n)
|====> stats(set_of_raw_metrics, bucket_size) bucketizes the raw metrics, returning the same number of bucketized metrics (map n=>n)
|=====> sum(set_of_stats_metrics) sums the metrics bucket by bucket, returning a single bucketized metric (fold n=>1)
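To make the semantics above a bit more concrete, here is a minimal Python sketch of how such an expression could be evaluated in memory. Everything in it is hypothetical: the RawMetric / StatsMetric / Bucket model, the function names (by_id, by_tags, by_regexp, union, rate, stats, sum_stats) and the sample data are made up for illustration, not Hawkular code. It also assumes that rate(..., 5m) means "change per 5 minutes between consecutive points" and that summing stats metrics means adding up the per-bucket averages; both are open questions for the real language.

```python
import re
from dataclasses import dataclass

# Hypothetical in-memory model, not Hawkular's API.
@dataclass
class RawMetric:
    id: str
    tags: dict
    points: list  # [(timestamp_ms, value), ...], strictly increasing timestamps

@dataclass
class Bucket:
    start: int
    count: int
    min: float
    max: float
    avg: float

@dataclass
class StatsMetric:
    id: str
    buckets: dict  # bucket start (ms) -> Bucket

def by_id(store, metric_id):
    """id(x): returns 0..1 metrics."""
    return [m for m in store if m.id == metric_id]

def by_tags(store, **wanted):
    """tags(a=foo AND b=bar): every listed tag must match."""
    return [m for m in store if all(m.tags.get(k) == v for k, v in wanted.items())]

def by_regexp(store, pattern):
    """regexp(...): metrics whose id fully matches the pattern."""
    rx = re.compile(pattern)
    return [m for m in store if rx.fullmatch(m.id)]

def union(*metric_sets):
    """(a, b, c): flatten several sets into one, dropping duplicates by id."""
    merged = {}
    for metric_set in metric_sets:
        for m in metric_set:
            merged.setdefault(m.id, m)
    return list(merged.values())

def rate(metrics, period_ms):
    """Map n => n: change per period_ms between consecutive points (assumed semantics)."""
    return [
        RawMetric(m.id, m.tags,
                  [(t2, (v2 - v1) * period_ms / (t2 - t1))
                   for (t1, v1), (t2, v2) in zip(m.points, m.points[1:])])
        for m in metrics
    ]

def stats(metrics, bucket_ms):
    """Map n => n: group each metric's points into fixed-size time buckets."""
    out = []
    for m in metrics:
        groups = {}
        for t, v in m.points:
            groups.setdefault(t - t % bucket_ms, []).append(v)
        out.append(StatsMetric(m.id, {
            start: Bucket(start, len(vs), min(vs), max(vs), sum(vs) / len(vs))
            for start, vs in groups.items()}))
    return out

def sum_stats(metrics):
    """Fold n => 1: add up the per-bucket averages of all metrics (assumed semantics)."""
    totals = {}
    for m in metrics:
        for start, b in m.buckets.items():
            totals[start] = totals.get(start, 0.0) + b.avg
    return dict(sorted(totals.items()))

MIN = 60_000  # one minute in ms

def counter(*values):
    """Hypothetical helper: counter samples taken every 5 minutes from t=0."""
    return [(i * 5 * MIN, v) for i, v in enumerate(values)]

store = [
    RawMetric("my_metric", {"a": "foo", "b": "bar"}, counter(0, 10, 20, 30, 40)),
    RawMetric("other", {"a": "foo", "b": "bar"}, counter(0, 5, 10, 15, 20)),
    RawMetric("something_1", {}, counter(100, 101, 103, 106, 110)),
    RawMetric("unrelated", {"a": "foo"}, counter(0, 1, 2, 3, 4)),
]

# sum(stats(rate((id(my_metric), tags(a=foo AND b=bar), regexp(something_.+)), 5m), 10m))
selected = union(by_id(store, "my_metric"),
                 by_tags(store, a="foo", b="bar"),
                 by_regexp(store, r"something_.+"))
result = sum_stats(stats(rate(selected, 5 * MIN), 10 * MIN))
print(result)  # {0: 16.0, 600000: 17.5, 1200000: 19.0}
```

Note that "my_metric" is matched both by id and by tags but appears only once after the flattening step, and "unrelated" is excluded because it lacks the b=bar tag.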


Other:
Functions like "sum" that take stats_metrics could have an overloaded shortcut, "sum(set_of_raw_metrics, bucket_size)", that performs the bucketing.
In other words, the example above could be rewritten as:
sum(rate((id(my_metric), tags(a=foo AND b=bar), regexp(something_.+)), 5m), 10m)
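A minimal sketch of that overload, assuming the two forms are told apart by the presence of the bucket_size argument, and using a simplified dict-based model (metric id -> raw points, or metric id -> {bucket start: average}). The names bucketize and sum_metrics, the per-bucket-average semantics and the sample data are hypothetical.

```python
def bucketize(raw, bucket_ms):
    """stats(): {id: [(timestamp_ms, value), ...]} -> {id: {bucket_start: average}}."""
    out = {}
    for metric_id, points in raw.items():
        groups = {}
        for t, v in points:
            groups.setdefault(t - t % bucket_ms, []).append(v)
        out[metric_id] = {start: sum(vs) / len(vs) for start, vs in groups.items()}
    return out

def sum_metrics(metrics, bucket_ms=None):
    """sum(set_of_stats_metrics), or the shortcut sum(set_of_raw_metrics, bucket_size)."""
    if bucket_ms is not None:
        # Shortcut form: the extra argument says the input is raw, so bucketize first.
        metrics = bucketize(metrics, bucket_ms)
    totals = {}
    for buckets in metrics.values():
        for start, avg in buckets.items():
            totals[start] = totals.get(start, 0.0) + avg
    return dict(sorted(totals.items()))

raw = {"a": [(0, 1), (300_000, 3)], "b": [(0, 10), (600_000, 20)]}
# Both forms give {0: 12.0, 600000: 20.0}
print(sum_metrics(raw, 600_000))
print(sum_metrics(bucketize(raw, 600_000)))
```

Resolving the overload on arity keeps the two forms unambiguous; a typed implementation could dispatch on the argument type instead.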

Note: we can do aggregations like "sum" on raw data if necessary; it just means we have to interpolate, since raw points from different metrics generally don't share the same timestamps.
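One way to do the raw-data case is linear interpolation: take every timestamp that appears in any input series, estimate each series' value at that timestamp, and add them up. A minimal sketch, assuming series are sorted lists of (timestamp_ms, value) pairs; interpolate and sum_raw are hypothetical names, and skipping timestamps not covered by every series (rather than extrapolating) is a design assumption.

```python
from bisect import bisect_left

def interpolate(points, t):
    """Linearly interpolate a sorted [(timestamp_ms, value), ...] series at t.
    Returns None outside the series' time range (no extrapolation)."""
    times = [p[0] for p in points]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return points[i][1]          # exact sample, no interpolation needed
    if i == 0 or i == len(times):
        return None                  # before the first or after the last sample
    (t1, v1), (t2, v2) = points[i - 1], points[i]
    return v1 + (v2 - v1) * (t - t1) / (t2 - t1)

def sum_raw(series):
    """Sum raw series at every timestamp seen in any of them,
    keeping only timestamps that every series covers."""
    all_times = sorted({t for points in series for t, _ in points})
    result = []
    for t in all_times:
        values = [interpolate(points, t) for points in series]
        if all(v is not None for v in values):
            result.append((t, sum(values)))
    return result

a = [(0, 0.0), (10_000, 10.0)]
b = [(5_000, 100.0), (15_000, 200.0)]
print(sum_raw([a, b]))  # [(5000, 105.0), (10000, 160.0)]
```

Whether uncovered timestamps should be dropped, extrapolated, or summed with the remaining series is a choice the language would have to make explicit.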

Scalar operations:
sum((id(a_metric_in_milliseconds), 1000*id(a_metric_in_seconds)), 10m)
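Scalar operations can be seen as a map over the values of each metric, applied before the aggregation. A minimal sketch assuming the same hypothetical bucket-average semantics for "sum"; scale, bucket_avg and sum_bucketed are illustrative names, not an existing API, and the sample data is made up.

```python
def scale(points, factor):
    """Scalar * metric: multiply every value, keep timestamps."""
    return [(t, factor * v) for t, v in points]

def bucket_avg(points, bucket_ms):
    """Average the points of one series per fixed-size time bucket."""
    groups = {}
    for t, v in points:
        groups.setdefault(t - t % bucket_ms, []).append(v)
    return {start: sum(vs) / len(vs) for start, vs in groups.items()}

def sum_bucketed(series, bucket_ms):
    """Shortcut sum(set_of_raw_metrics, bucket_size): bucketize, then add bucket by bucket."""
    totals = {}
    for points in series:
        for start, avg in bucket_avg(points, bucket_ms).items():
            totals[start] = totals.get(start, 0.0) + avg
    return dict(sorted(totals.items()))

# Hypothetical samples: (timestamp_ms, value)
a_metric_in_milliseconds = [(0, 250), (300_000, 350), (600_000, 100)]
a_metric_in_seconds = [(60_000, 0.5), (420_000, 0.25), (660_000, 0.125)]

# sum((id(a_metric_in_milliseconds), 1000*id(a_metric_in_seconds)), 10m)
result = sum_bucketed([a_metric_in_milliseconds, scale(a_metric_in_seconds, 1000)], 600_000)
print(result)  # {0: 675.0, 600000: 225.0}
```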
----------------------------------------------

Of course, many other functions could be added to grow the library.

Now, if we want to do such a thing, I suppose the big question is: "are we going to invent our own language?" I don't know whether there are standards for this, nor whether they are any good.

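The Prometheus query language cannot be transposed as-is, because a Prometheus label is not a Hawkular tag (in Prometheus, labels are part of a series' identity). It makes no sense for us to write something like "my_metric{tag=foo}": we select either by id ("my_metric") or by tag ("tag=foo").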

The same language could be used both for read-time on-the-fly aggregations and write-time / cron-based rollups.

WDYT?