Hi Jay
Thanks - yes I think the complete set of span durations will only be required while
performance analysis is relevant.
Tagging specific points in the business process may be more about business KPIs, which may
be required for much longer-term analysis - possibly related to duration but also
cardinality. We may need to understand specific use cases better to determine the best way
to deal with this.
Anyone got some good use cases, i.e. examples of useful business KPIs?
Regards
Gary
----- Original Message -----
For those of us not overly familiar with APM let me see if I can summarize
what I've read so far and see if I'm close...
Trace: Made up of a set of spans, a trace represents a high level unit of
work, probably a complete piece of business logic or a service endpoint. Its
duration is likely more reflective of user experience, like a
response time. As such, it would seem that if a trace duration is within
expectation then there isn't too much to worry about, and the spans making up
that trace are likely not overly relevant unless you want to look at
response-time tuning and improvement. But a "kpi" (Key Performance
Indicator, right?) tag on a trace could in general be useful, as you may
want to be able to establish a baseline and perform alerting on durations
that reflect user experience.
If the above is fairly on target it would seem like storing trace duration
metrics would be a good idea but storing span duration metrics would be a
waste, until such time as the trace is not well behaved. One of
the replies said that all spans are stored in ES; I assume this is just for a
brief time to support some ad-hoc analysis in the short term.
As for long term storage I'd agree that it should be user-directed, tags on
the APM side to decide what data to store seems like a good idea, especially
at the trace level. And if the trace behaves badly (perhaps based on an
alert) then spans for that trace perhaps should get tagged (at least until
the issue is resolved).
I don't really know why the solution requires ES but likely it was just
available and easy to use. If we can start to incorporate Hmetrics that
seems good, a complete migration to Hmetrics seems better ;-) As Gary
mentioned, certainly there is a benefit of having Halerting built in for use
with metrics being stored. From an alerting perspective we're also working
with Gary/APM to perform alerting on APM events, which could also
potentially trigger some automated tagging.
On 1/9/2017 5:10 AM, Gary Brown wrote:
Apologies I didn't set the context better :)
There are a couple of reasons to investigate integration with H-Metrics.
1) Long term storage of metrics, as mentioned - primarily because we are
currently collecting all information for use in short term analysis. If some
of that information is of long term interest to an end user, then having the
ability to identify the relevant information is necessary. Two options are
(a) mark the data at the point it is generated (i.e. tag it), or (b) define
server side configuration to extract the relevant data. Option (a) just
seems the most straightforward.
2) Integration with H-Alerts for free. So although we can (and have)
integrated directly, it may be better from an integrated architecture
perspective to feed the metrics through H-Metrics.
Regards
Gary
----- Original Message -----
On Mon, Jan 9, 2017 at 10:05 AM, Gary Brown < gbrown(a)redhat.com > wrote:
----- Original Message -----
Hello,
let me start by explaining the terminology:
A span is one unit of work. It can be a method or REST endpoint invocation, or
just any unit of work.
A list of spans forms a trace, which is currently a tree-like structure (every
span has a parent, except the root span).
A span captures its start and end, so it is possible to calculate its
duration.
By default all spans reported to the server are persisted in elasticsearch.
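To make the terminology above concrete, here is a minimal sketch of a trace as a tree of spans, each with a duration derived from its start and end. The class, field names and sample operations are assumptions for illustration only; they are not the Hawkular APM or OpenTracing data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One unit of work. Field names are hypothetical, not an APM schema."""
    operation: str
    start_ms: int
    end_ms: int
    # parent is excluded from repr/compare to avoid recursing through the tree
    parent: Optional["Span"] = field(default=None, repr=False, compare=False)
    children: List["Span"] = field(default_factory=list)

    def child(self, operation, start_ms, end_ms):
        """Create a span whose parent is this span."""
        span = Span(operation, start_ms, end_ms, parent=self)
        self.children.append(span)
        return span

    @property
    def duration_ms(self):
        """Duration is simply end minus start."""
        return self.end_ms - self.start_ms


def walk(span, depth=0):
    """Yield (depth, span) pairs for the whole trace tree, root first."""
    yield depth, span
    for c in span.children:
        yield from walk(c, depth + 1)


# A made-up trace: the root span has no parent, every other span has one.
root = Span("GET /register", 0, 120)
db = root.child("INSERT user", 10, 45)
mail = root.child("POST /mail/send", 50, 110)
smtp = mail.child("smtp.send", 60, 100)

for depth, span in walk(root):
    print("  " * depth + f"{span.operation}: {span.duration_ms} ms")
```

The duration of the root span (120 ms here) is what the thread refers to as the trace duration.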
I think there are two different things in this email:
1. OpenTracing standard "kpi" tag. Tags are key-value pairs, so I guess the
value would be a metric name (e.g. "user-registration": the duration of user
registration, which consists of multiple service calls). I think that any
duration can be considered a KPI. Typically, when users want to show durations
for a particular service they can filter by tags (custom/standard). Is
there really a need for a KPI tag?
This is about long term metric storage and analysis, rather than short term
user based filtering. So if we want to be able to identify the key metrics
that are of long term interest, it would need to either be done through some
configuration on the server, or as suggested in this proposal by allowing
the application to tag the relevant spans.
Long term storage wasn't mentioned in the original email. I am asking for a better
explanation of the "KPI" tag, and whether there is really a need for it.
2. Integration with h-metrics. What are the benefits for users? Integration
with Grafana? Can't Grafana be connected directly to the Elasticsearch instance APM is using?
All durations (span, trace) are stored in our backend and published to JMS,
so integration with other systems should be possible.
Long term storage of specific metrics. APM is collecting a large amount of
information, so it is likely that it will only be retained for a limited
time frame.
I think that a tracing tool is mostly used for real-time analysis and
debugging, or as a tool for solving performance problems and testing
"canary"/"blue-green" deployments, but long term storage can be useful, as can
access to aggregated results from older periods of time, e.g. statistics
(average/median) of service response times. We should probably look at the
Zipkin ML to see whether anybody has been looking for this feature.
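As a rough illustration of the aggregated statistics mentioned above, the sketch below computes the average and median response time per service from raw duration samples. The service names and sample values are made up, and the `summarize` helper is hypothetical; it is not part of APM or Zipkin.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(samples):
    """Group (service, duration_ms) samples and compute per-service statistics."""
    by_service = defaultdict(list)
    for service, duration_ms in samples:
        by_service[service].append(duration_ms)
    return {
        service: {
            "count": len(durations),
            "mean": mean(durations),
            "median": median(durations),
        }
        for service, durations in by_service.items()
    }


# Made-up response times in milliseconds.
samples = [
    ("orders", 100), ("orders", 140), ("orders", 300),
    ("users", 20), ("users", 40),
]

for service, stats in summarize(samples).items():
    print(service, stats)
```

Only the summary per period would need to be kept long term, rather than every individual span.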
It would actually make sense to provide some kind of reports, as APM does
higher level aggregations. The question is whether there is interest in this
feature. I'm not very familiar with Grafana, but from what I read it can
connect to Elasticsearch, so maybe it can be used with APM to show dashboards
and reports.
Regards
Gary
Regards,
On Sat, Jan 7, 2017 at 5:09 PM, John Sanda < jsanda(a)redhat.com > wrote:
I want to make sure I understand the difference between trace and span. A
trace captures the call between two services whether those are REST
endpoints, EJBs, etc. Let’s say we want a trace of A calling B. Within that
A calls C which in turn calls D, etc. and then finally B is called. Would a
span measure the duration of the call between C and D?
Are all spans captured/recorded by default?
Would this be optional functionality? I am wondering because it obviously
requires having Hawkular Metrics running which also means Cassandra.
Under what tenant would you store these metrics?
On Jan 7, 2017, at 7:36 AM, Gary Brown < gbrown(a)redhat.com > wrote:
Hi
Wanted to discuss a proposal for recording some metric data captured from
Hawkular APM in Hawkular Metrics.
For those not familiar with Hawkular APM, it captures the end to end trace
instance (think of it as a distributed call stack) for each invocation of
an application. This trace can include information about the
communications between services, but can also include details about
internal components used within the services (e.g. EJBs, database calls,
etc).
First point is that if we were to record duration metrics for each 'span'
captured (i.e. scope within which a task is performed), for each
invocation of an application, then it would result in a large amount of
data that may be of no interest. So we need to find a way for end
users/developers to express which key points within an application they do
want recorded as metrics.
The proposal is to allow the application/services to define a tag/property
on the spans of interest, e.g. 'kpi', that would indicate to the server
that the duration value for the span should be stored in H-Metrics. The
value for the tag should define the name/description of the KPI.
If considered a suitable solution, then we can also propose it as a
standard tag in the OpenTracing standard.
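The server-side selection described in the proposal could look roughly like the sketch below: only spans carrying a 'kpi' tag produce a metric, named after the tag value. The dictionary layout of a span and the `kpi_metrics` helper are assumptions for illustration; they are not the Hawkular APM or H-Metrics API, and the actual write to H-Metrics is left out.

```python
def kpi_metrics(spans, tag="kpi"):
    """Return (metric_name, duration_ms) pairs for spans carrying the KPI tag."""
    metrics = []
    for span in spans:
        name = span.get("tags", {}).get(tag)
        if name:  # untagged spans are skipped, so they produce no metric
            metrics.append((name, span["end_ms"] - span["start_ms"]))
    return metrics


# Made-up spans as the server might receive them; only the first is tagged.
reported = [
    {"operation": "POST /users", "start_ms": 0, "end_ms": 180,
     "tags": {"kpi": "user-registration"}},
    {"operation": "INSERT user", "start_ms": 20, "end_ms": 60, "tags": {}},
    {"operation": "POST /mail/send", "start_ms": 70, "end_ms": 170},
]

print(kpi_metrics(reported))  # [('user-registration', 180)]
```

Tagging at the point of generation keeps the decision with the application developer, which is the trade-off the proposal argues for.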
There are a couple of metrics that we could record by default, the first being
the trace instance completion time, and the second possibly being the
individual service response times (although this could potentially also be
governed by the 'kpi' tag).
Thoughts?
Regards
Gary
_______________________________________________
hawkular-dev mailing list hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev
--
Pavol Loffay
Cell: +421948286055
Red Hat
Purkyňova 111 TPB-B
612 45 Brno