On Oct 31, 2016, at 5:59 PM, Matt Wringe <mwringe@redhat.com> wrote:

----- Original Message -----
From: "John Sanda" <jsanda@redhat.com>
To: "Discussions around Hawkular development" <hawkular-dev@lists.jboss.org>
Sent: Monday, 31 October, 2016 10:22:19 AM
Subject: Re: [Hawkular-dev] Labeling needs ?

On Oct 31, 2016, at 3:39 AM, Heiko W.Rupp <hrupp@redhat.com> wrote:

Hey,

we have labels in Hawkular-metrics right now, but apparently there are
use cases that are not yet covered (and I know Matt has more)

* Listing keys of tags. Currently one has to know the available keys to
be able to list the available values

hmm, interesting, we can list all the values of a key, but not get the list of keys itself?

That is correct. We simply have not had a need for it thus far. Adding endpoints to query tags is straightforward. In fact, it would be a great ticket to work on for anyone looking to get involved in the project.
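As a rough illustration of what such an endpoint would have to compute, here is a minimal Python sketch. It is not the Hawkular Metrics implementation or API; the in-memory METRIC_TAGS map and the function names are made up for the example, and a real version would read from the tags storage instead.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for metric definitions and their tags.
METRIC_TAGS = {
    "pod-a/cpu": {"type": "pod", "namespace": "dev"},
    "pod-b/cpu": {"type": "pod", "namespace": "prod"},
    "node-1/memory": {"type": "node"},
}

def build_tag_index(metric_tags):
    """Map each tag key to the set of values seen across all metrics."""
    index = defaultdict(set)
    for tags in metric_tags.values():
        for key, value in tags.items():
            index[key].add(value)
    return index

def list_tag_keys(index):
    """Return every known tag key, sorted."""
    return sorted(index)

def list_tag_values(index, key):
    """Return the known values for one tag key, sorted (empty if unknown)."""
    return sorted(index.get(key, ()))

index = build_tag_index(METRIC_TAGS)
print(list_tag_keys(index))
print(list_tag_values(index, "namespace"))
```

With an index like this, listing keys no longer requires knowing them up front, and listing values for a key stays the same lookup as today.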

   * tag values are currently a comma-separated string and not an array,
   which may have implications on allowed characters and escaping of
   separators

The problem with OpenShift is that we have hierarchical data that we would like to store as tags so that we can perform queries on them.

Most of the data we store as tags is simple key:value pairs, where the value is a string. This works nicely with how tags work.

But the more important tag we want to keep track of is labels, and in this case the value is an array of key:value pairs, which doesn't work too nicely. We store this as a comma-separated string, and it's possible to use regex in this case, but it gets really messy to write and is prone to errors. It's really not a great solution, and it looks really bad when people have to write complex queries for something that should be really easy.
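To make the difference concrete, here is a small Python sketch comparing the two approaches. The label string, its "key:value" layout, and the helper names are assumptions for illustration only; they are not taken from the OpenShift or Hawkular code.

```python
import re

# Hypothetical value of a "labels" tag, flattened into one string.
labels_tag = "app:frontend,tier:web,version:1.2"

def label_regex(key, value):
    """Build a regex that matches key:value only as a whole entry."""
    return re.compile(r"(^|,)" + re.escape(key) + ":" + re.escape(value) + r"(,|$)")

def parse_labels(flat):
    """Split the flattened string into a dict.

    A value that itself contains ',' would be split incorrectly,
    which is the escaping problem raised earlier in the thread.
    """
    return dict(item.split(":", 1) for item in flat.split(",") if item)

# Regex approach: the anchors are needed so that "app:front"
# does not accidentally match "app:frontend".
print(bool(label_regex("app", "frontend").search(labels_tag)))
print(bool(label_regex("app", "front").search(labels_tag)))

# Structured approach: a plain lookup once the labels are a real map.
labels = parse_labels(labels_tag)
print(labels.get("tier") == "web")
```

The regex version has to get the anchoring and escaping right for every query, while the structured version reduces to an ordinary map lookup.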

   * Post-tagging of data points (tagging could be provoked by another
   system that e.g. parses log files and sees an anomaly) The idea behind
   it is that one should be able to create tags at certain points in time
   for a single metric or a list of metrics to uniquely identify that point
   in time for queries (relative performance of two versions of a
   deployment).

This feels like an 'event' that could be marked. I don't know if it needs to be applied to a datapoint directly, or stored in a separate list, or if it even needs to be stored in Hawkular Metrics.

The main thing I think is to be able to query the list of events which have occurred over a specific time frame.
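A minimal sketch of that kind of query, assuming events are kept in a separate list sorted by timestamp. The EventLog class and its methods are hypothetical and only meant to show the shape of a time-range lookup, not where the events would actually live.

```python
import bisect

class EventLog:
    """Hypothetical store of (timestamp_ms, text) events kept sorted by time."""

    def __init__(self):
        self._timestamps = []
        self._texts = []

    def add(self, timestamp_ms, text):
        # Insert at the position that keeps both lists ordered by timestamp.
        pos = bisect.bisect_right(self._timestamps, timestamp_ms)
        self._timestamps.insert(pos, timestamp_ms)
        self._texts.insert(pos, text)

    def query(self, start_ms, end_ms):
        """Return events with start_ms <= timestamp < end_ms, oldest first."""
        lo = bisect.bisect_left(self._timestamps, start_ms)
        hi = bisect.bisect_left(self._timestamps, end_ms)
        return list(zip(self._timestamps[lo:hi], self._texts[lo:hi]))

log = EventLog()
log.add(3000, "pod restarted")
log.add(1000, "deployment v2 rolled out")
log.add(2000, "anomaly seen in logs")
print(log.query(1000, 3000))
```

A graph could then call query() with the same time range it uses for the data points and draw each returned event as a mark.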

OpenShift has a lot of events it can gather, and it would be really cool if we could display these events as marks in the graphs. I don't know if we can use Hawkular Metrics string metrics to store these, or something in Alerts to do it. Heapster has a component to gather these events for us (https://github.com/kubernetes/heapster/tree/master/events).

Data points are immutable by design. Once a row is written it is never
updated. This allows for optimizations with compaction and deletions. We
need to take that into consideration with post-tagging. Using a different
table or tables might be a better option.
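One way to picture the separate-table option is the Python sketch below. It is only an illustration under assumed names (DataPoint, point_tags, tag_point, tags_for), not the actual schema: the data points stay frozen, and tags added later go into a side map keyed by metric id and timestamp.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPoint:
    """A data point that cannot be modified after it is created."""
    metric_id: str
    timestamp_ms: int
    value: float

# Written once, never updated.
points = [DataPoint("cpu", 1000, 0.4), DataPoint("cpu", 2000, 0.9)]

# Hypothetical separate "table": (metric_id, timestamp_ms) -> tags added later.
point_tags = {}

def tag_point(metric_id, timestamp_ms, key, value):
    """Attach a tag to a point in time without touching the data point row."""
    point_tags.setdefault((metric_id, timestamp_ms), {})[key] = value

def tags_for(point):
    """Look up any tags that were added after the point was written."""
    return point_tags.get((point.metric_id, point.timestamp_ms), {})

tag_point("cpu", 2000, "anomaly", "log-spike")
print(tags_for(points[1]))
print(tags_for(points[0]))
```

The data rows are never rewritten, so the compaction and deletion optimizations are unaffected; the cost is an extra lookup when tags are wanted alongside the points.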

Do individual data points have tags applied to them? Or only the overall metric?

We may want to keep track of historic changes to tags. I know in OpenShift we store values in tags which can change at any point in time, but I don't believe we can query what the value was at time X.
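A minimal sketch of what "value at time X" could look like, assuming each tag change is recorded together with the time it took effect. The TagHistory class is hypothetical and keeps everything in memory purely for illustration.

```python
import bisect

class TagHistory:
    """Hypothetical history of one tag: each value with the time it took effect."""

    def __init__(self):
        self._times = []
        self._values = []

    def set(self, timestamp_ms, value):
        # Assumes changes are recorded in increasing time order.
        self._times.append(timestamp_ms)
        self._values.append(value)

    def value_at(self, timestamp_ms):
        """Return the value in effect at timestamp_ms, or None if not yet set."""
        pos = bisect.bisect_right(self._times, timestamp_ms)
        return self._values[pos - 1] if pos else None

history = TagHistory()
history.set(1000, "version:1.0")
history.set(5000, "version:1.1")
print(history.value_at(4999))
print(history.value_at(5000))
```

Instead of overwriting the tag, each change is appended, and a query for time X picks the latest change at or before X.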