Auto-testing client integrations with Hawkular APM
by Neil Okamoto
I'm in the process of developing a Clojure library that wraps the
opentracing-java client (the abstract interface) and provides concrete
implementations for Hawkular APM and Zipkin.
I'm considering ways to test the concrete Hawkular implementation in
Travis. One thought is to configure Travis to launch a hawkular dev server
(from the docker image) and then execute a series of test cases to collect
traces, and finally use the REST API to read back what hawkular received.
The problem is that I haven't found documentation for the query
parameters you pass to "GET /traces/fragments/search":
https://hawkular.gitbooks.io/hawkular-apm-user-guide/content/restapi/#GET__traces_fragments_search
Any pointers on how to do this? How does the Hawkular development team
run integration tests?
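One common pattern for this kind of read-back integration test is to poll the server's REST API until the expected data shows up, since trace reporting is asynchronous and a single immediate GET often races the collector. Here is a minimal, library-agnostic sketch of such a poll helper; the names, timeouts, and structure are my own, not from Hawkular:

```python
import time

def poll_until(fetch, predicate, timeout=30.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call fetch() repeatedly until predicate(result) is true.

    Returns the matching result, or raises TimeoutError once `timeout`
    seconds have elapsed. `clock` and `sleep` are injectable so the
    helper itself can be unit-tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if predicate(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.1f}s")
        sleep(interval)
```

In a Travis job, `fetch` would be a small function that GETs the fragments endpoint of the dev server started from the docker image and parses the JSON body, and `predicate` would check that the traces sent by the test cases are present.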
A little research help with collection times...
by Michael Burman
Hi,
I could use some help if someone has access to this sort of data. I'm
trying to characterize our collection interval fluctuations when using
our collectors. The actual collection interval is not interesting; what
matters is the amount it's delayed per run and how that delay changes.
Currently we use the defaults from the Gorilla paper (with the typo
the paper had), which gives us the ranges [-63,64] (7 bits),
[-255,256] (9 bits), [-2047,2048] (12 bits), and the rest. The first
range needs 2 bits of metadata, the second needs 3, and the third
already needs 4 bits.
Those ranges were derived by the Facebook guys from their own data, but
that data used second precision. We use milliseconds, as for example
Prometheus does, and they use ranges for 14, 17 and 20 meaningful bits
(IIRC). I don't think their values are useful to us, however: from what
I've gathered from my small set of data, our best ranges are somewhere
around:
[-256,256] millisecond difference in delta of delta,
D = (t_n - t_(n-1)) - (t_(n-1) - t_(n-2))
[-512,512] milliseconds
[-2048,2048] milliseconds
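To make the encoding concrete, here is a small sketch (my own, not the actual Hawkular code) of the delta-of-delta computation and the per-value bit cost under the current Gorilla default ranges quoted above; the single-bit zero case and the 4+32-bit fallback width are assumptions for illustration:

```python
def delta_of_delta(timestamps):
    """Second differences of a timestamp sequence (milliseconds)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# Gorilla-paper default buckets: (metadata bits, value bits, inclusive range)
BUCKETS = [
    (2, 7,  (-63, 64)),
    (3, 9,  (-255, 256)),
    (4, 12, (-2047, 2048)),
]

def encoded_bits(d, buckets=BUCKETS):
    """Bits needed to store one delta-of-delta value.

    A zero is assumed to cost a single control bit; a value outside all
    ranges falls through to a full-width slot (assumed 4 + 32 bits here).
    """
    if d == 0:
        return 1
    for meta, width, (lo, hi) in buckets:
        if lo <= d <= hi:
            return meta + width
    return 4 + 32
```

Plugging a slightly jittery 15-second collection interval into `delta_of_delta` shows why the bucket boundaries matter: a jitter change of 100 ms costs 12 bits under the defaults but would cost only the merged-range width under the proposed millisecond ranges.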
The first two are probably wiser to merge into a single range, ignoring
the first one: while we'd store one extra bit of info per value, we save
one bit of metadata. The last one could technically mean something like
"missed one datapoint", but then that would depend on the collection
interval, and I'm not sure if we have any target value for that in the
long range.
But, I'm guessing too much. From the hawkular-openshift-agent I can see
that values are fetched in sequential mode, so a delay in fetching one
metric will cause the next one to be delayed, and so on. I can see from
an empty instance how much this time is, but when running in a congested
environment, what can we expect?
Saving a bit or two might not sound like much, but let's say with 30 000
pods in OpenShift and 32 metrics per pod at a 15-second interval, we're
talking about 5 529 600 000 datapoints per day, and it adds up (659 MB
per day of saved space with a single bit). So getting the ranges correct
nets us a lot of savings.
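The arithmetic behind that estimate, spelled out with the numbers from the paragraph above:

```python
pods = 30_000
metrics_per_pod = 32
interval_s = 15

samples_per_metric_per_day = 24 * 3600 // interval_s   # 5760 samples/day
points_per_day = pods * metrics_per_pod * samples_per_metric_per_day
# 30 000 * 32 * 5760 = 5 529 600 000 datapoints per day

bytes_saved_per_bit = points_per_day / 8               # one bit saved per point
mib_saved = bytes_saved_per_bit / (1024 * 1024)        # ~659 MiB per day
```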
- Micke
Requirement of Aerogear Unified Push Server Instance
by Anuj Garg
Hello team,
We need an instance of the Aerogear Unified Push Server to set up the
push messaging feature in the Android client of Hawkular.
I can think of 2 ways this can be achieved:
1. Creating an instance on OpenShift.
2. Asking the Aerogear team if they can provide an account for one application.
I'd welcome suggestions on which of these, or any other option, we
should pursue.
Regards
Anuj Garg
Rolling upgrades (on Kubernetes)
by Heiko W.Rupp
Hey,
on Kubernetes a common use case (demonstrated by Thomas and Juca
recently) is rolling upgrades (blue/green, canary, ...), where a new
version of a service comes into play and gets scaled up; as it scales
up, the old version of the service gets scaled down and eventually
disappears.
It can also happen, though, that the new version is buggy and the
previous version gets scaled up again to take the full load until a fix
is available.
We need to make sure to support those rolling upgrades and also the
rollback to the previous version of a service.
E.g. if a (No)SQL schema update happens in version N+1, a still-running
instance of version N must not be broken by it while N+1 is rolling out
and N is still active. Similarly, when N+1 has upgraded the schema and
the application gets rolled back to version N, it must still be able to
cope with the upgraded schema version.
Same applies to APIs, but this is something we have discussed and
mastered in the past.
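As an illustration of the schema side of this (a hypothetical sketch, not Hawkular code): version N can survive both the N+1 rollout and the rollback case if its readers treat columns added by N+1 as ignorable and values N+1 might have stopped writing as defaultable:

```python
# Hypothetical "tolerant reader" for version N of a metrics service.
def read_metric_row(row):
    """Parse a metric row dict coming from a shared (No)SQL store.

    Version N knows 'id', 'value' and 'ts'. Rows written by version N+1
    may carry extra columns (silently ignored here), and N+1 must keep
    the old columns compatible, so a missing key yields a default rather
    than an error.
    """
    return {
        "id": row["id"],                 # required in every schema version
        "value": row.get("value", 0.0),  # default if absent in a newer row
        "ts": row.get("ts", 0),
        # any unknown keys added by N+1 are simply not copied over
    }
```

The same tolerant-reader idea is what usually makes the API side of a blue/green rollout and rollback survivable as well.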
We should perhaps investigate how we can automatically test
this scenario in our Travis builds.
[metrics] Internal stats?
by Heiko W.Rupp
Hey,
what internal stats of Hawkular Metrics do we currently collect?
I think Joel did some work for the C* part.
What I think we need is
- number of data points stored on a per-tenant basis.
Resolution could be something like "last minute" or
"last 5 minutes", i.e. not realtime updates in the table.
- Total number of data points (i.e. sum over all tenants)
- Query stats. This is probably more complicated, as
querying metrics that are still in some buffer is
cheaper than querying over 3 years of raw data.
To get started I'd go with # of queries per tenant and globally.
Those could perhaps be differentiated by
- raw endpoint
- stats endpoint
- What about alerting? More alert definitions certainly
need more CPU, so the number of alert definitions per tenant
and in total would be another pair.
- Does the number of fired alerts also make sense?
The idea behind those is to get some usage figures for the
shared resource "Hawkular Metrics" and then be able to
charge them back to individual tenants, e.g. inside of
OpenShift.
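A hypothetical shape for such an accumulator (names are mine, not from the codebase; a real version would flush to C* at the chosen resolution rather than update tables in realtime):

```python
from collections import Counter

class TenantStats:
    """In-memory usage counters for a shared metrics service."""

    def __init__(self):
        self.datapoints = Counter()   # tenant -> data points stored
        self.queries = Counter()      # (tenant, endpoint) -> query count

    def record_insert(self, tenant, count=1):
        self.datapoints[tenant] += count

    def record_query(self, tenant, endpoint):
        # endpoint differentiates e.g. "raw" from "stats" queries
        self.queries[(tenant, endpoint)] += 1

    def total_datapoints(self):
        # the global figure is just the sum over all tenants
        return sum(self.datapoints.values())
```

Alert-definition and fired-alert counters would slot into the same structure as two more `Counter`s keyed by tenant.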
Cross-Tenant endpoints in Alerting on OS
by Jay Shaughnessy
On 2/23/2017 6:05 PM, Matt Wringe wrote:
> Is there any reason why this being sent in private emails and not to a mailing list?
Matt, Not really, sending to dev-list for anyone interested in the
discussion...
> ----- Original Message -----
>> There was an IRC discussion today about $SUBJECT. Here is a summary of
>> a conversation Matt and I had to drill down into whether there was a
>> cross-tenant security concern with the Alerting API in OS. In short,
>> the answer seems to be no. Alerting (1.4+) offers two endpoints for
>> fetching cross-tenant: /alerts/admin/alerts and /alerts/admin/events.
>> Note that the 'admin' is just in the path, and was chosen just to group
>> what we deemed were admin-level endpoints, the first two of which are
>> these cross-tenant fetches. The 'admin' does not mean anything else in
>> this context, it does not reflect a special user or tenant. The way
>> these endpoints work is that they accept a Hawkular-Tenant HTTP
>> header that can be a comma-separated list of tenantIds, as with any of
>> the alerting endpoints. Alerting does not perform any security in the
>> request handling. But in OS the HAM deployments both have the OS
>> security filtering in place. That filtering does two things: for a
>> cluster-admin user it's basically a pass-thru (the CSL Hawkular-Tenant
>> header is passed on and the endpoints work). For all other users the
>> Hawkular-Tenant header is validated. Because each project name is a
>> tenant name, the value must match a project name. As such, the
>> validation fails if a CSL is supplied. This is decent behavior for now
>> as it prevents any undesired access. Note that as a corner-case, these
>> endpoints will work fine if the header just supplies a single tenant, in
>> which case they are basically the same as the typical single-tenant
>> fetch endpoints.
> What has happened is that Alerts now considers the Hawkular-tenant header to contain not just a string, but a comma-separated list of strings.
>
> eg "Hawkular-tenant: projectA,projectB"
Note, not in general: comma-separated lists are handled only for the two
cross-tenant endpoints mentioned above.
> The OpenShift filter still considers this to be a string, so it will check with OpenShift if the user has permission to access the project named with a string value of "projectA,projectB". Since a project cannot have a ',' within its name, this check will always fail and return an access denied error.
>
> If the user is a cluster level user they are given access to everything, even impossibly named projects. So a cluster level user will happen to be able to use the current setup just due to how this works.
>
> So there doesn't appear to be any security issue that we need to deal with immediately, but we probably do want to handle this properly in the future. It might not be too difficult to add support for treating the tenant header as a CSL.
>
>> I'm not totally familiar with the Metrics approach to cross-tenant
>> handling but going forward we (Metrics and Alerting) should probably
>> look for some consistency, if possible. Moreover, any solution should
>> reflect what best serves OS. The idea of a CSL for the header is fairly
>> simple and flexible. It may be something to consider, for the OS filter
>> it would mean validating that the bearer has access to each of the
>> individual tenants before forwarding the request.
> I don't recall any meetings about adding multitenancy to Metrics. From what I recall, there are no plans at all to introduce multitenancy for metrics.
>
> If I was aware of this discussion when this was brought up for alerts, I would have probably objected to the endpoint being called 'admin' since I don't think that reflects what the true purpose of this is supposed to be. It's not really an admin endpoint, but an endpoint for cross-tenancy. I could have access to projectA and projectB, but not be an 'admin'.
>
> If we are making changes like this which affect security, I would really like to be notified so that I can make sure our security filters will function properly. Even if I am in the meeting when it's being discussed, it would be good to ping me on the PR with the actual implementation.
Of course. This stuff went in in mid-November, and at that time we (in
alerting) were really just getting settled with the initial integration
into metrics for OS. Going forward I think we have a better idea of
what is relevant to OS and can more easily flag items of import.
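The per-tenant validation discussed in this thread (the OS filter checking the bearer's access to each tenant named in the CSL before forwarding the request) might look roughly like this; the function name, parameters, and error type are assumptions for illustration, not the actual filter code:

```python
def authorize_tenants(header_value, user_projects, is_cluster_admin):
    """Validate a Hawkular-Tenant header that may be a comma-separated list.

    Cluster admins pass through unchecked (today's behavior); every other
    user must have access to each project/tenant named in the list, since
    in OS each project name is a tenant name.
    """
    tenants = [t.strip() for t in header_value.split(",") if t.strip()]
    if is_cluster_admin:
        return tenants
    if all(t in user_projects for t in tenants):
        return tenants
    raise PermissionError("access denied for one or more tenants")
```

Under the current filter, a non-admin sending "projectA,projectB" is denied because the whole string is checked as one project name; splitting first, as above, is the CSL-aware behavior Matt suggests adding.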