Hi,
First a note about the number of metrics per tenant. A million metrics per
tenant would be easy to reach IMO. Let's take a simple example: a
machine running a WildFly/EAP server. Combine OS (CPU, memory, network,
disk) with container (Undertow, Datasource, ActiveMQ Artemis, EJB, JPA)
metrics and you get close to a thousand metrics. Then multiply by a
thousand machines and you reach a million. And of course there are
users with more machines, and more complex setups (multiple WildFly/EAP
servers with hundreds of apps deployed).
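As a back-of-the-envelope check of the estimate above, here is a minimal sketch. The two constants are only the round figures from the example, not measured values, and the function name is made up for illustration.

```python
# Rough estimate only: round figures taken from the example, not measurements.
METRICS_PER_MACHINE = 1_000  # OS + WildFly/EAP subsystem metrics ("close to a thousand")
MACHINES = 1_000             # machines reporting under a single tenant


def metrics_per_tenant(metrics_per_machine: int, machines: int) -> int:
    """Number of distinct metric ids one tenant would own."""
    return metrics_per_machine * machines


total = metrics_per_tenant(METRICS_PER_MACHINE, MACHINES)
print(f"{total:,} metrics for one tenant")
```

Doubling either figure (more machines, or several servers per machine) doubles the total, so the million mark is not an upper bound.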
Keep in mind that one of the promises of Metrics was the ability to
store a huge number of metrics, instead of disabling metric collection.
That being said, do you have absolute numbers for the response time
when querying for all metrics of a tenant? Twice as slow may not be
that bad if we're going from 10ms to 20ms :) Especially considering
the use cases for such queries: a daily sync with an external system,
for example, or metric name autocomplete in Grafana/ManageIQ.
Regards,
On 1 July 2016 at 22:56, Michael Burman wrote:
Hi,
Well, I've done some performance testing with an implementation that
mimics the current behavior but reads from the partition key. The performance is
just fine with a fairly large number of metricIds (at 50k the difference is ~2x, so on my dev
machine I can still read all the possible tenants about 3 times per second, where the current
behavior manages about 6 times per second). I started to see performance
"issues" only when the number of unique metricIds reached millions (although I'm
not sure it's a performance issue even then, given that such a query is
probably not on any performance-critical path). The number of datapoints did not make a
difference (I tested with 8668 metricIds with a total of >86M stored datapoints),
which confirms my expectations.
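For a rough sense of per-read time, the throughput figures above can be inverted. This is only a sketch: it assumes the reads were issued one after another on a single thread, which the numbers above do not state, and the function name is made up for illustration.

```python
def latency_ms(reads_per_second: float) -> float:
    """Approximate time per read, assuming strictly serial reads."""
    return 1000.0 / reads_per_second


current_ms = latency_ms(6.0)        # current behavior: about 6 reads per second
partition_key_ms = latency_ms(3.0)  # reading from partition keys: about 3 per second
slowdown = partition_key_ms / current_ms

print(f"current: ~{current_ms:.0f} ms, "
      f"partition keys: ~{partition_key_ms:.0f} ms, "
      f"slowdown: ~{slowdown:.1f}x")
```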
So unless there's a known use case of using millions of metricIds with stored
datapoints (IIRC, once all the rows for a partition key are deleted, the partition key
disappears; John can confirm), I think we should move forward with this. We could also
make an enhancement to fetching metricIds by allowing queries against only the metrics_idx table
(something like "fetch only the registered metrics") - in that case someone who registers
all their metrics could still fetch them quickly.
- Micke
----- Original Message -----
From: "Stefan Negrea" <snegrea(a)redhat.com>
To: "Discussions around Hawkular development"
<hawkular-dev(a)lists.jboss.org>
Sent: Friday, July 1, 2016 9:14:43 PM
Subject: Re: [Hawkular-dev] Metrics and use of metrics_idx
Hello,
For Hawkular Metrics the speed of writes is always more important than the speed of reads
(due to a variety of reasons). But that only works up to a certain extent, in the sense
that we cannot totally neglect the read part. Let me see if I can narrow the impact of
your proposal ...
You made a very good point that the performance of reads is not affected if we discard
metrics_idx for endpoints that require the metric id. We only need to consider the impact
on querying for metrics and tenants, since both use metrics_idx. Querying the list of
tenants is not very important because it is an admin feature that we will soon secure via
the newly proposed "admin framework". So only querying for metric definitions
will be truly affected by removing metrics_idx completely. And only a portion of those
requests are affected, because tag queries use the tags index.
To conclude, metrics_idx is only important in cases where the user wants a full list of
all metrics ever stored for a tenant id. If we can profile the performance impact on a
larger set of metric definitions and we find the time difference without the metrics_idx
is negligible then we should go forward with your proposal.
John Sanda, do you foresee using metrics_idx in the context of metric aggregation and the
job scheduling framework that you've been working on?
Micke, what do you think are the next steps to move forward with your proposal?
Thank you,
Stefan Negrea
On Thu, Jun 30, 2016 at 8:46 AM, Michael Burman < miburman(a)redhat.com > wrote:
Hi,
This sparked my interest after the discussions in PR #523 (adding a cache to avoid
metrics_idx writes). Stefan commented that he still wants to write to this table to keep
metrics available instantly, while jsanda wants to write them asynchronously. Maybe we should
instead just stop writing there?
Why? We already do the same thing for tenants: we don't write there when
someone writes a metric to a new tenant; we fetch the partition keys from the metrics_idx
table instead. The same approach could be applied to the metrics_idx writing: read the
partition keys from data. There's a small performance penalty, but the main thing is
that we don't really need that information often - in most use cases never.
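To illustrate the idea of deriving the listings from partition keys instead of a separately written index, here is a minimal in-memory sketch. It does not talk to Cassandra, and the key layout (tenant id, metric type, metric id) and all names in it are assumptions made for illustration; the real schema may differ.

```python
from typing import Iterable, List, NamedTuple


class PartitionKey(NamedTuple):
    # Hypothetical key layout, for illustration only.
    tenant_id: str
    metric_type: str
    metric_id: str


def list_tenants(partition_keys: Iterable[PartitionKey]) -> List[str]:
    """Derive the tenant list by scanning partition keys; no index is written."""
    return sorted({key.tenant_id for key in partition_keys})


def list_metric_ids(partition_keys: Iterable[PartitionKey], tenant_id: str) -> List[str]:
    """Derive one tenant's metric ids by scanning every partition key."""
    return sorted({key.metric_id for key in partition_keys if key.tenant_id == tenant_id})


# Keys that exist only because datapoints were written for them.
keys = [
    PartitionKey("acme", "gauge", "cpu.load"),
    PartitionKey("acme", "gauge", "mem.used"),
    PartitionKey("globex", "counter", "requests"),
]

print(list_tenants(keys))
print(list_metric_ids(keys, "acme"))
```

Both functions visit every key, so in this sketch the cost of a listing grows with the total number of stored keys rather than with the size of the result. That is the read-side penalty being traded for cheaper writes.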
If we want to search by, for example, tags, we search with tags - in that case the
metricId has been manually added to the metrics_idx table. There's no need to know whether there are
metrics which were not initialized. This should be the preferred way of doing things in
any case - use tags instead of pushing metadata into the metricId.
If we need to find out whether an id exists, fetching that from the PK (partition key) index is
fast. The only place where we could slow down is if there are lots of tenants with lots
of metricIds each and we want to fetch all the metricIds of a single tenant. In that case
the fetching of definitions could slow down. How often do users fetch all of a tenant's
metricIds without any filtering? And how performance critical is this sort of behavior?
And what use case does a list of ids serve (without any information associated with them)?
If you need to fetch datapoints from a known metricId, there's no need for
metrics_idx table writing or reading. So this index writing only applies to listing
metrics.
- Micke
_______________________________________________
hawkular-dev mailing list
hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev
--
Thomas Segismont
JBoss ON Engineering Team