Auto-testing client integrations with Hawkular APM
by Neil Okamoto
I'm in the process of developing a Clojure library that wraps the
opentracing-java client (the abstract interface) and provides concrete
implementations for Hawkular APM and Zipkin.
I'm considering ways to test the concrete Hawkular implementation in
Travis. One thought is to configure Travis to launch a hawkular dev server
(from the docker image) and then execute a series of test cases to collect
traces, and finally use the REST API to read back what hawkular received.
The problem is that I haven't found documentation for the query
parameters you pass to "GET /traces/fragments/search":
https://hawkular.gitbooks.io/hawkular-apm-user-guide/content/restapi/#GET__traces_fragments_search
Any pointers on how to do this? How does the Hawkular development team
run integration tests?
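One common pattern for this kind of read-back integration test is to poll the server's REST API until the expected data shows up, since trace reporting is asynchronous and a single immediate GET often races the collector. Here is a minimal, library-agnostic sketch of such a poll helper; the names, timeouts, and structure are my own, not from Hawkular:

```python
import time

def poll_until(fetch, predicate, timeout=30.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call fetch() repeatedly until predicate(result) is true.

    Returns the matching result, or raises TimeoutError once `timeout`
    seconds have elapsed. `clock` and `sleep` are injectable so the
    helper itself can be unit-tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if predicate(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.1f}s")
        sleep(interval)
```

In a Travis job, `fetch` would be a small function that GETs the fragments endpoint of the dev server started from the docker image and parses the JSON body, and `predicate` would check that the traces sent by the test cases are present.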
A little research help with collection times...
by Michael Burman
Hi,
I could use some help if someone has access to this sort of data. I'm
trying to characterize our collection interval fluctuations when using
our collectors. The actual collection interval is not interesting; what
matters is the amount it's delayed per run and how that delay changes.
Currently we use the defaults from the Gorilla paper (with the typo
the paper had), which gives us the ranges [-63,64] (7 bits),
[-255,256] (9 bits), [-2047,2048] (12 bits), and the rest. The first
range needs 2 bits of metadata, the second needs 3, and the third
already needs 4 bits.
Those ranges were derived by the Facebook guys from their own data, but
that data used second precision. We use milliseconds, as for example
Prometheus does, and they use ranges for 14, 17 and 20 meaningful bits
(IIRC). I don't think their values are useful to us, however: from what
I've gathered from my small set of data, our best ranges are somewhere
around:
[-256,256] millisecond difference in delta of delta,
D = (t_n - t_(n-1)) - (t_(n-1) - t_(n-2))
[-512,512] milliseconds
[-2048,2048] milliseconds
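To make the encoding concrete, here is a small sketch (my own, not the actual Hawkular code) of the delta-of-delta computation and the per-value bit cost under the current Gorilla default ranges quoted above; the single-bit zero case and the 4+32-bit fallback width are assumptions for illustration:

```python
def delta_of_delta(timestamps):
    """Second differences of a timestamp sequence (milliseconds)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# Gorilla-paper default buckets: (metadata bits, value bits, inclusive range)
BUCKETS = [
    (2, 7,  (-63, 64)),
    (3, 9,  (-255, 256)),
    (4, 12, (-2047, 2048)),
]

def encoded_bits(d, buckets=BUCKETS):
    """Bits needed to store one delta-of-delta value.

    A zero is assumed to cost a single control bit; a value outside all
    ranges falls through to a full-width slot (assumed 4 + 32 bits here).
    """
    if d == 0:
        return 1
    for meta, width, (lo, hi) in buckets:
        if lo <= d <= hi:
            return meta + width
    return 4 + 32
```

Plugging a slightly jittery 15-second collection interval into `delta_of_delta` shows why the bucket boundaries matter: a jitter change of 100 ms costs 12 bits under the defaults but would cost only the merged-range width under the proposed millisecond ranges.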
The first two are probably wiser to merge into a single range, ignoring
the first one: while we'd store one extra bit of info per value, we save
one bit of metadata. The last one could technically mean something like
"missed one datapoint", but then that would depend on the collection
interval, and I'm not sure if we have any target value for that in the
long range.
But, I'm guessing too much. From the hawkular-openshift-agent I can see
that values are fetched in sequential mode, so a delay in fetching one
metric will cause the next one to be delayed, and so on. I can see from
an empty instance how much this time is, but when running in a congested
environment, what can we expect?
Saving a bit or two might not sound like much, but let's say with 30 000
pods in OpenShift and 32 metrics per pod at a 15-second interval, we're
talking about 5 529 600 000 datapoints per day, and it adds up (659 MB
per day of saved space with a single bit). So getting the ranges correct
nets us a lot of savings.
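The arithmetic behind that estimate, spelled out with the numbers from the paragraph above:

```python
pods = 30_000
metrics_per_pod = 32
interval_s = 15

samples_per_metric_per_day = 24 * 3600 // interval_s   # 5760 samples/day
points_per_day = pods * metrics_per_pod * samples_per_metric_per_day
# 30 000 * 32 * 5760 = 5 529 600 000 datapoints per day

bytes_saved_per_bit = points_per_day / 8               # one bit saved per point
mib_saved = bytes_saved_per_bit / (1024 * 1024)        # ~659 MiB per day
```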
- Micke
Requirement of Aerogear Unified Push Server Instance
by Anuj Garg
Hello team,
We need an instance of the Aerogear Unified Push Server to set up the
push messaging feature in the Android client of Hawkular.
I can think of 2 ways this can be achieved:
1. Creating an instance on OpenShift.
2. Asking the Aerogear team if they can provide an account for one application.
I'd welcome suggestions on which of these, or any other option, we
should pursue.
Regards
Anuj Garg
Rolling upgrades (on Kubernetes)
by Heiko W.Rupp
Hey,
on Kubernetes a common use case (demonstrated by Thomas and Juca
recently) is rolling upgrades (blue/green, canary, ...), where a new
version of a service comes into play and gets scaled up; as it scales
up, the old version of the service gets scaled down and eventually
disappears.
It can also happen, though, that the new version is buggy and the
previous version gets scaled up again to take the full load until a fix
is available.
We need to make sure to support those rolling upgrades and also the
rollback to the previous version of a service.
E.g. if a (No)SQL schema update happens in version N+1, a still-running
instance of version N must not be broken by it while N+1 is rolling out
and N is still active. Similarly, when N+1 has upgraded the schema and
the application gets rolled back to version N, it must still be able to
cope with the upgraded schema version.
Same applies to APIs, but this is something we have discussed and
mastered in the past.
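As an illustration of the schema side of this (a hypothetical sketch, not Hawkular code): version N can survive both the N+1 rollout and the rollback case if its readers treat columns added by N+1 as ignorable and values N+1 might have stopped writing as defaultable:

```python
# Hypothetical "tolerant reader" for version N of a metrics service.
def read_metric_row(row):
    """Parse a metric row dict coming from a shared (No)SQL store.

    Version N knows 'id', 'value' and 'ts'. Rows written by version N+1
    may carry extra columns (silently ignored here), and N+1 must keep
    the old columns compatible, so a missing key yields a default rather
    than an error.
    """
    return {
        "id": row["id"],                 # required in every schema version
        "value": row.get("value", 0.0),  # default if absent in a newer row
        "ts": row.get("ts", 0),
        # any unknown keys added by N+1 are simply not copied over
    }
```

The same tolerant-reader idea is what usually makes the API side of a blue/green rollout and rollback survivable as well.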
We should perhaps investigate how we can automatically test
this scenario in our Travis builds.
[metrics] Internal stats?
by Heiko W.Rupp
Hey,
what internal stats of Hawkular Metrics do we currently collect?
I think Joel did some work for the C* part.
What I think we need is
- number of data points stored on a per-tenant basis.
Resolution could be something like "last minute" or
"last 5 minutes", i.e. not realtime updates in the table.
- Total number of data points (i.e. sum over all tenants)
- Query stats. This is probably more complicated, as
querying metrics that are still in some buffer is
cheaper than querying over 3 years of raw data.
To get started I'd go with # of queries per tenant and globally.
Those could perhaps be differentiated by
- raw endpoint
- stats endpoint
- What about alerting? More alert definitions certainly
need more CPU, so the number of alert definitions per tenant
and in total would be another pair.
- Does the number of fired alerts also make sense?
The idea behind those is to get some usage figures for the
shared resource "Hawkular Metrics" and then be able to
charge them back to individual tenants, e.g. inside of
OpenShift.
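A hypothetical shape for such an accumulator (names are mine, not from the codebase; a real version would flush to C* at the chosen resolution rather than update tables in realtime):

```python
from collections import Counter

class TenantStats:
    """In-memory usage counters for a shared metrics service."""

    def __init__(self):
        self.datapoints = Counter()   # tenant -> data points stored
        self.queries = Counter()      # (tenant, endpoint) -> query count

    def record_insert(self, tenant, count=1):
        self.datapoints[tenant] += count

    def record_query(self, tenant, endpoint):
        # endpoint differentiates e.g. "raw" from "stats" queries
        self.queries[(tenant, endpoint)] += 1

    def total_datapoints(self):
        # the global figure is just the sum over all tenants
        return sum(self.datapoints.values())
```

Alert-definition and fired-alert counters would slot into the same structure as two more `Counter`s keyed by tenant.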
Cross-Tenant endpoints in Alerting on OS
by Jay Shaughnessy
On 2/23/2017 6:05 PM, Matt Wringe wrote:
> Is there any reason why this being sent in private emails and not to a mailing list?
Matt, Not really, sending to dev-list for anyone interested in the
discussion...
> ----- Original Message -----
>> There was an IRC discussion today about $SUBJECT. Here is a summary of
>> a conversation Matt and I had to drill down into whether there was a
>> cross-tenant security concern with the Alerting API in OS. In short,
>> the answer seems to be no. Alerting (1.4+) offers two endpoints for
>> fetching cross-tenant: /alerts/admin/alerts and /alerts/admin/events.
>> Note that the 'admin' is just in the path, and was chosen just to group
>> what we deemed were admin-level endpoints, the first two of which are
>> these cross-tenant fetches. The 'admin' does not mean anything else in
>> this context, it does not reflect a special user or tenant. The way
>> these endpoints work is that they accept a Hawkular-Tenant HTTP
>> header that can be a comma-separated list of tenantIds, as with any of
>> the alerting endpoints. Alerting does not perform any security in the
>> request handling. But in OS the HAM deployments both have the OS
>> security filtering in place. That filtering does two things: for a
>> cluster-admin user it's basically a pass-thru (the CSL Hawkular-Tenant
>> header is passed on and the endpoints work). For all other users the
>> Hawkular-Tenant header is validated. Because each project name is a
>> tenant name, the value must match a project name. As such, the
>> validation fails if a CSL is supplied. This is decent behavior for now
>> as it prevents any undesired access. Note that as a corner-case, these
>> endpoints will work fine if the header just supplies a single tenant, in
>> which case they are basically the same as the typical single-tenant
>> fetch endpoints.
> What has happened is that Alerts now considers the Hawkular-tenant header to contain not just a string, but a comma-separated list of strings.
>
> eg "Hawkular-tenant: projectA,projectB"
Note, not in general: comma-separated lists are handled only for the two
cross-tenant endpoints mentioned above.
> The OpenShift filter still considers this to be a string, so it will check with OpenShift if the user has permission to access the project named with a string value of "projectA,projectB". Since a project cannot have a ',' within its name, this check will always fail and return an access denied error.
>
> If the user is a cluster level user they are given access to everything, even impossibly named projects. So a cluster level user will happen to be able to use the current setup just due to how this works.
>
> So there doesn't appear to be any security issue that we need to deal with immediately, but we probably do want to handle this properly in the future. It might not be too difficult to add support for treating the tenant header as a CSL.
>
>> I'm not totally familiar with the Metrics approach to cross-tenant
>> handling but going forward we (Metrics and Alerting) should probably
>> look for some consistency, if possible. Moreover, any solution should
>> reflect what best serves OS. The idea of a CSL for the header is fairly
>> simple and flexible. It may be something to consider, for the OS filter
>> it would mean validating that the bearer has access to each of the
>> individual tenants before forwarding the request.
> I don't recall any meetings about adding multitenancy to Metrics. From what I recall, there are no plans at all to introduce multitenancy for metrics.
>
> If I was aware of this discussion when this was brought up for alerts, I would have probably objected to the endpoint being called 'admin' since I don't think that reflects what the true purpose of this is supposed to be. It's not really an admin endpoint, but an endpoint for cross-tenancy. I could have access to projectA and projectB, but not be an 'admin'.
>
> If we are making changes like this which affect security, I would really like to be notified so that I can make sure our security filters will function properly. Even if I am in the meeting when it's being discussed, it would be good to ping me on the PR with the actual implementation.
Of course. This stuff went in in mid-November, and at that time we (in
alerting) were really just getting settled with the initial integration
into metrics for OS. Going forward I think we have a better idea of
what is relevant to OS and can more easily flag items of import.
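The per-tenant validation discussed in this thread (the OS filter checking the bearer's access to each tenant named in the CSL before forwarding the request) might look roughly like this; the function name, parameters, and error type are assumptions for illustration, not the actual filter code:

```python
def authorize_tenants(header_value, user_projects, is_cluster_admin):
    """Validate a Hawkular-Tenant header that may be a comma-separated list.

    Cluster admins pass through unchecked (today's behavior); every other
    user must have access to each project/tenant named in the list, since
    in OS each project name is a tenant name.
    """
    tenants = [t.strip() for t in header_value.split(",") if t.strip()]
    if is_cluster_admin:
        return tenants
    if all(t in user_projects for t in tenants):
        return tenants
    raise PermissionError("access denied for one or more tenants")
```

Under the current filter, a non-admin sending "projectA,projectB" is denied because the whole string is checked as one project name; splitting first, as above, is the CSL-aware behavior Matt suggests adding.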