[Hawkular-dev] Metrics and use of metrics_idx

John Sanda jsanda at redhat.com
Tue Jul 5 15:14:32 EDT 2016


I think the key question which Micke raised earlier in the thread is whether or not we even need the metrics_idx table at all. The points raised about searching via tags are all valid, and querying the data table for unique ids should be sufficient for those use cases in which we need to fetch all ids. There is one case I am not so sure about. Today we store all data points for a metric within a single partition (bad). We have had plans for a while now to do date partitioning, by day for example. If/when we implement date partitioning, the performance of fetching all metric ids would probably be a bit worse; however, we can also introduce an index if needed.
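
To make that concrete (the column names below are from memory, so treat this as a sketch rather than the actual schema), the two read paths for "give me all ids" look roughly like this:

    -- today, via the index (partition key is roughly (tenant_id, type)):
    SELECT metric FROM metrics_idx WHERE tenant_id = ? AND type = ?;

    -- without the index, walking the partition keys of the data table:
    SELECT DISTINCT tenant_id, type, metric, dpart FROM data;

The second query cannot be restricted to a single tenant, so the tenant filtering and the collapsing of per-dpart duplicates would have to happen in the server, and that cost grows once date partitioning makes each metric show up under many partitions.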

We can introduce a separate index for rollups. Rollups will be configurable, so we will need to look somewhere in the database to determine which metrics support which rollups. Originally I was thinking that this would be metrics_idx, but a separate index would be cleaner.
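
Just to have something concrete to poke at - the table and column names below are purely a strawman I am making up for illustration, not a design - a dedicated rollup index could be as simple as:

    CREATE TABLE rollup_idx (
        tenant_id text,
        rollup    int,      -- rollup interval in seconds, e.g. 60 or 3600
        metric    text,
        PRIMARY KEY ((tenant_id, rollup), metric)
    );

The aggregation jobs would then read one partition per tenant and interval (SELECT metric FROM rollup_idx WHERE tenant_id = ? AND rollup = ?) without ever touching metrics_idx.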


> On Jul 4, 2016, at 12:23 PM, Michael Burman <miburman at redhat.com> wrote:
> 
> Hi,
> 
> The basic problem with supporting anything based on the metricId is that our data storage is not designed for metricId matching. We have exact match or no match, nothing in between. We really should avoid supporting features we can't actually support; otherwise we'll need an inverted index for full-text search over all the metricIds.
> 
> Even with that, it's like encoding metadata in a filename. That has never been a good idea.
> 
>  -  Micke
> 
> ----- Original Message -----
> From: "Thomas Segismont" <tsegismo at redhat.com>
> To: hawkular-dev at lists.jboss.org
> Sent: Monday, July 4, 2016 5:29:23 PM
> Subject: Re: [Hawkular-dev] Metrics and use of metrics_idx
> 
> I don't agree with the philosophy. I would err on the side of making it 
> possible to use a server-side filter and explaining in the documentation 
> why tagging your metrics is a better option.
> 
> On 04/07/2016 at 15:54, Michael Burman wrote:
>> Hi,
>> 
>> Sure, but we should discourage users from using metricIds for anything. The best approach would be to randomize them and force users to use tagging to find their metrics. Otherwise what we'll get is silly integrations where someone has "hostname.wildfly.metric.name", then wants to search for all the "metric.name" by doing idFilter="*.metric\.name" and complains "your metrics db is slow!". What they should do instead is always "/raw/query?tags=hostname=X,app=wildfly,metric=metric.name" and so on.
>> 
>>  - Micke
>> 
>> ----- Original Message -----
>> From: "Thomas Segismont" <tsegismo at redhat.com>
>> To: hawkular-dev at lists.jboss.org
>> Sent: Monday, July 4, 2016 4:19:30 PM
>> Subject: Re: [Hawkular-dev] Metrics and use of metrics_idx
>> 
>> 
>> 
>> On 04/07/2016 at 13:11, Michael Burman wrote:
>>> Hi,
>>> 
>>> The id filter does not work without a tags filter because we can't do the id filtering on the Cassandra side. It was done this way for performance reasons, as otherwise you have to fetch all the available metrics into HWKMETRICS and then do the client-side filtering.
>> 
>> I understand we can't filter in the database. But then wouldn't it be
>> better to filter on the server, in order to spare the client that work
>> at least? I mean, if you need to get metrics by name, as a user you will
>> have to load everything anyway. Couldn't we spare the user that?
>> 
>>> 
>>> Full-text indexing could certainly be an interesting option to investigate for many of our use cases, at least. I had forgotten about the whole thing; we should look into it before making any bigger changes. Mm..
>>> 
>>>  - Micke
>>> 
>>> ----- Original Message -----
>>> From: "Thomas Segismont" <tsegismo at redhat.com>
>>> To: hawkular-dev at lists.jboss.org
>>> Sent: Monday, July 4, 2016 1:27:41 PM
>>> Subject: Re: [Hawkular-dev] Metrics and use of metrics_idx
>>> 
>>> 
>>> 
>>>> On 04/07/2016 at 11:59, Michael Burman wrote:
>>>> Hi,
>>>> 
>>>> If you consider a use case of syncing with Grafana, then sending a million metricIds to a select list is probably a bad use case. If, on the other hand, it is syncing all the metrics daily to some external system, it would not matter if the listing takes minutes (as it will probably take a while to transfer everything in JSON anyway). And both of those use cases should use tag filtering to find something, not list all the metrics blindly.
>>> 
>>> There is no bad use case, only bad solutions :) Joke aside, it is true
>>> that the current solution for autocomplete in Grafana is far from
>>> perfect. It does not query all metrics of a tenant, but all metrics of
>>> the same type for a tenant, and then does the filtering on the client
>>> side. For some reason the Metrics API does not allow the name filter if
>>> no tag filter is set. Should I open a JIRA?
>>> 
>>>> 
>>>> Currently, listing 50k metrics takes 200ms with the current metrics_idx (to localhost) and 400ms with the other approach. For millions, even the current method already takes a long time without filtering. A billion metricIds is no problem if you really want to find them. But why would you list a billion metricIds to choose from instead of actually finding them with something relevant (and in that case there's no slowdown compared to the current behavior)?
>>>> 
>>>> Metric name autocomplete can't use non-filtered processing in our current setup either, as we can't check for partial keys in Cassandra (Cassandra does not support prefix querying of partition keys).
>>> 
>>> Could the new full-text index capabilities in C* 3.x help here?
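>>> 
>>> For reference, and purely as an unverified sketch: the SASI indexes that landed in 3.4 support prefix/contains matching on an indexed column, something along the lines of
>>> 
>>>     CREATE CUSTOM INDEX metric_name_sasi ON metrics_idx (metric)
>>>         USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>         WITH OPTIONS = { 'mode': 'CONTAINS' };
>>> 
>>>     SELECT metric FROM metrics_idx
>>>         WHERE tenant_id = ? AND type = ? AND metric LIKE '%metric.name%';
>>> 
>>> The index name and the exact columns are my invention here, and I haven't checked whether SASI can actually be created on the metric name column in metrics_idx or what it costs at our scale, so this would need validation before we rely on it.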
>>> 
>>>> 
>>>>  -  Micke
>>>> 
>>>> ----- Original Message -----
>>>> From: "Thomas Segismont" <tsegismo at redhat.com>
>>>> To: hawkular-dev at lists.jboss.org
>>>> Sent: Monday, July 4, 2016 12:44:07 PM
>>>> Subject: Re: [Hawkular-dev] Metrics and use of metrics_idx
>>>> 
>>>> Hi,
>>>> 
>>>> First, a note about the number of metrics per tenant. A million metrics
>>>> per tenant would be easy to reach IMO. Let's take a simple example: a
>>>> machine running a Wildfly/EAP server. Combine OS (CPU, memory, network,
>>>> disk) metrics with container (Undertow, Datasource, ActiveMQ Artemis,
>>>> EJB, JPA) metrics and you get close to a thousand metrics. Then multiply
>>>> by a thousand machines and you reach the million. And of course there are
>>>> users with more machines and more complex setups (multiple Wildfly/EAP
>>>> servers with hundreds of apps deployed).
>>>> Keep in mind that one of the promises of Metrics was the ability to
>>>> store huge numbers of metrics, instead of disabling metric collection.
>>>> 
>>>> That being said, do you have absolute numbers for the response time
>>>> when querying for all metrics of a tenant? Twice as slow may not be
>>>> that bad if we're going from 10ms to 20ms :) Especially considering
>>>> the use cases for such queries: a daily sync with an external system,
>>>> for example, or metric name autocomplete in Grafana/ManageIQ.
>>>> 
>>>> Regards,
>>>> 
>>>>> On 01/07/2016 at 22:56, Michael Burman wrote:
>>>>> Hi,
>>>>> 
>>>>> Well, I've done some testing of the performance as well as an implementation that mimics the current behavior but reads from the partition keys. The performance is just fine with a fairly large number of metricIds (at 50k the difference is ~2x, so on my dev machine I can still read all the possible tenants about 3 times per second, whereas the current behavior manages about 6 times per second). I started to see performance "issues" when the number of unique metricIds reached millions (although I'm not sure it's a real performance issue in that case either, given that such a query is probably not on any performance-critical path). The amount of datapoints did not make a difference (I tested with 8668 metricIds and a total of >86M stored datapoints, which confirmed my expectations).
>>>>> 
>>>>> So unless there's a known use case involving millions of metricIds with stored datapoints (IIRC, once all the rows for a partition key are deleted, the partition key disappears.. John can confirm), I think we should move forward with this. Although we could add an enhancement to fetching metricIds by allowing a query against only the metrics_idx table (something like "fetch only the registered metrics"); in that case someone who registers all their metrics could also fetch them quickly.
>>>>> 
>>>>>  -  Micke
>>>>> 
>>>>> ----- Original Message -----
>>>>> From: "Stefan Negrea" <snegrea at redhat.com>
>>>>> To: "Discussions around Hawkular development" <hawkular-dev at lists.jboss.org>
>>>>> Sent: Friday, July 1, 2016 9:14:43 PM
>>>>> Subject: Re: [Hawkular-dev] Metrics and use of metrics_idx
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> For Hawkular Metrics the speed of writes is always more important than the speed of reads (for a variety of reasons). But that only holds to a certain extent, in the sense that we cannot totally neglect the read side. Let me see if I can narrow down the impact of your proposal ...
>>>>> 
>>>>> You made a very good point that the performance of reads is not affected if we discard metrics_idx for endpoints that require the metric id. We only need to consider the impact of querying for metrics and tenants, since both use metrics_idx. Querying the list of tenants is not very important because it is an admin feature that we will soon secure via the newly proposed "admin framework". So only querying for metric definitions will be truly affected by removing metrics_idx completely. And only a portion of those requests is affected, because tag queries use the tags index.
>>>>> 
>>>>> To conclude, metrics_idx is only important in cases where the user wants a full list of all metrics ever stored for a tenant id. If we can profile the performance impact on a larger set of metric definitions and find that the time difference without metrics_idx is negligible, then we should go forward with your proposal.
>>>>> 
>>>>> 
>>>>> John Sanda, do you foresee using metrics_idx in the context of metric aggregation and the job scheduling framework that you've been working on?
>>>>> 
>>>>> Micke, what do you think are the next steps to move forward with your proposal?
>>>>> 
>>>>> 
>>>>> Thank you,
>>>>> Stefan Negrea
>>>>> 
>>>>> 
>>>>> On Thu, Jun 30, 2016 at 8:46 AM, Michael Burman < miburman at redhat.com > wrote:
>>>>> 
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> This sparked my interest after the discussions in PR #523 (adding a cache to avoid metrics_idx writes). Stefan commented that he still wants to write to this table to keep metrics available instantly, while jsanda wants to write them asynchronously. Maybe we should instead just stop writing there?
>>>>> 
>>>>> Why? We already do the same thing for tenants: we don't write a tenant entry when someone writes a metric to a new tenant; we fetch the partition keys from the metrics_idx table instead. Now, the same approach could be applied to the metrics_idx writes: read the partition keys from the data table. There's a small performance penalty, but the main thing is that we don't really need that information often - in most use cases never.
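>>>>> 
>>>>> For the tenant case this means (column names from memory, so just a sketch) effectively running
>>>>> 
>>>>>     SELECT DISTINCT tenant_id, type FROM metrics_idx;
>>>>> 
>>>>> and collapsing the result to the unique tenant ids; the proposal is to do the equivalent against the partition keys of the data table to get the metricIds.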
>>>>> 
>>>>> If we want to search for something with, for example, tags, we search with tags - that metricId has been manually added to the metrics_idx table. There's no need to know about metrics which were never initialized. This should be the preferred way of doing things in any case - use tags instead of pushing metadata into the metricId.
>>>>> 
>>>>> If we need to find out whether an id exists, fetching that from the PK (partition key) index is fast. The only place where we could slow down is if there are lots of tenants with lots of metricIds each and we want to fetch all the metricIds of a single tenant. In that case the fetching of definitions could slow down. How often do users fetch all of a tenant's metricIds without any filtering? How performance-critical is that sort of behavior? And what use case does a list of ids serve (without any information associated with them)?
>>>>> 
>>>>> If you need to fetch datapoints for a known metricId, there's no need to write to or read from the metrics_idx table. So this index writing only applies to listing metrics.
>>>>> 
>>>>> - Micke
>>>>> 
>>>> 
>>> 
>> 
> 
> -- 
> Thomas Segismont
> JBoss ON Engineering Team



