Avail-percentage query in metrics?
by Heiko W. Rupp
Hey,
is there a way to query Hawkular Metrics like this:
"give me the distribution of availability (up, down, ...) over a given
time range"?
The result could be the aggregated time per value or relative
percentages - this does not matter.
Heiko
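In case it helps clarify what is being asked for, here is a minimal client-side sketch of that aggregation, assuming availability comes back as (timestamp, state) points - the data layout is an assumption, not the actual Hawkular Metrics response format:

```python
# Sketch of the requested aggregation: given availability points
# (timestamp, state), compute the time spent in each state over a
# range and the relative percentages. Layout is assumed, not the
# real Hawkular Metrics API.

def availability_distribution(points, range_start, range_end):
    """points: list of (timestamp, state), sorted by timestamp.
    Each state is assumed to hold until the next point (or range_end)."""
    totals = {}
    for i, (ts, state) in enumerate(points):
        start = max(ts, range_start)
        end = points[i + 1][0] if i + 1 < len(points) else range_end
        end = min(end, range_end)
        if end > start:
            totals[state] = totals.get(state, 0) + (end - start)
    span = sum(totals.values())
    percentages = {s: 100.0 * t / span for s, t in totals.items()}
    return totals, percentages

points = [(0, "up"), (60, "down"), (75, "up")]
totals, pct = availability_distribution(points, 0, 100)
# totals: {"up": 85, "down": 15} -> 85% up, 15% down
```

Either form of the result (absolute time per state or percentages) falls out of the same pass over the data.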
7 years, 9 months
Want to try hawkular-services via Docker?
by Heiko W. Rupp
If you want to give it a try, save the following into a file named
docker-compose.yml and then run: docker-compose up hawkular
This image has the -Pdev user jdoe/password.
hawkular:
  image: "pilhuhn/hawkular-services:latest"
  ports:
    - "8080:8080"
    - "8443:8443"
    - "9990:9990"
  volumes:
    - /tmp/opt/hawkular/data:/opt/data
  links:
    - myCassandra
  environment:
    - HAWKULAR_BACKEND=remote
    - CASSANDRA_NODES=myCassandra
myCassandra:
  image: cassandra:3.7
  environment:
    - CASSANDRA_START_RPC=true
  volumes:
    - /tmp/opt/hawkular/cassandra:/var/lib/cassandra
--
Registered address: Red Hat GmbH, Technopark II, Haus C,
Werner-von-Siemens-Ring 14, D-85630 Grasbrunn
Commercial register: Amtsgericht München HRB 153243
Managing directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
Eric Shander
7 years, 9 months
Can’t run inventory hawkular-service
by Austin Kuo
Hi all,
I started a Cassandra cluster with:
> ccm create test -v 2.2.6 -n 1 -s && ccm start test
and then I tried to start the service with:
> ./dist/target/hawkular-services-dist-0.0.3.Final-SNAPSHOT/bin/standalone.sh
It showed a warning, and when I checked the server status at
localhost:8080, it said that inventory is unavailable:
WARN [org.hawkular.inventory.cdi] (ServerService Thread Pool -- 69)
HAWKINV003501: Inventory backend failed to initialize in an attempt 14 of
15 with message: Could not instantiate implementation:
com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.
Anyone who has encountered this?
Thanks!
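One possible cause (an assumption here, not confirmed from the log alone): Titan's CassandraThriftStoreManager connects over Thrift, and the Thrift RPC server is disabled by default in some Cassandra setups. The docker-compose setup above enables it via CASSANDRA_START_RPC=true; for a ccm cluster the equivalent cassandra.yaml setting would be:

```yaml
# cassandra.yaml - enable the Thrift RPC server that Titan's
# CassandraThriftStoreManager connects to (off by default in
# some Cassandra versions/setups)
start_rpc: true
rpc_port: 9160   # default Thrift port
```

With ccm, this can be applied to all nodes via `ccm updateconf 'start_rpc: true'` before starting the cluster.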
7 years, 9 months
Request for PR review - build-tools - https://github.com/hawkular/hawkular-build-tools/pull/25
by Lukas Krejci
Hi all,
I've made some adjustments to our swagger -> asciidoc script so that it is
able to emit information about inheritance of the data definitions.
I'd like to use this for the next Inventory release, because the REST docs are
borderline useless without it.
I made big updates to the REST docs for the new inventory REST API and would
like to have them in shape before its imminent release.
Therefore I humbly ask for a review of the above-mentioned PR, so that we
can release new build-tools and a new parent that I could then consume in
inventory for the new release.
Thanks,
--
Lukas Krejci
7 years, 9 months
Has to provide tenant header to look up tenant
by Austin Kuo
I found something weird when I was playing with the REST API.
When I GET 'inventory/tenant', the request has to provide a tenant header.
That does not make sense to me: I look up my tenants precisely because I
don't know them.
Austin.
7 years, 9 months
Metrics performance testing PR#520
by Thomas Segismont
Hi,
Today I've been looking at Metrics insertion performance.
My setup is the following:
- on my laptop I run the Gatling load scenario (which is very similar to
the perf test job scenario)
- I also run a Metrics standalone instance
- on another machine connected to my LAN, I run a single node C* cluster
In order to avoid problems due to memory constraints, I set the min and max
heap size to 2048m. Below this value, the server spends significant time
in garbage collection (with a high number of concurrent clients).
I ran Gatling 3 times with different numbers of clients (think agents):
100, 1000, 2000:
----
mvn gatling:execute -Dclients=X -Dramp=0 -Dloops=50
----
From 100 to 1000 virtual clients, the throughput rose accordingly
(x10). From 1000 to 2000, I hit the same kind of plateau Filip was
observing: I got barely a 15% throughput increase. None of the machines
had reached its CPU/IO/memory limits.
So I ran the 2000-virtual-client test again and captured a jstack dump
(result attached). As you can see, most of the task handler threads are in
state WAITING inside the
com.datastax.driver.core.HostConnectionPool.awaitAvailableConnection method.
Then, following the "Monitoring and tuning the pool" section of the
driver doc [1], I added some code to print the number of open connections,
active requests, and maximum capacity. It confirmed that the maximum
capacity was being reached.
Note that by default with the V3 protocol, the driver creates just one
connection per host and allows a maximum of 1024 concurrent requests.
After that I added new options in the code to let the user define the
max number of connections per host as well as requests per connection.
In my tests, I set the maximum connection limit to 10 and the maximum
concurrency to 5000. These are arbitrary values; they are simply bigger
than the defaults. The best values depend on the environment and should
be determined by testing.
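The plateau is consistent with a simple capacity model - a sketch using the numbers above (the formula is an assumption based on the pooling doc, not taken from driver code):

```python
# Rough capacity model for the driver's connection pool: the maximum
# number of in-flight requests is hosts x connections-per-host x
# max-requests-per-connection. Numbers below come from the test setup.

def pool_capacity(hosts, connections_per_host, requests_per_connection):
    return hosts * connections_per_host * requests_per_connection

# V3 protocol defaults: 1 connection per host, 1024 concurrent requests.
default_cap = pool_capacity(1, 1, 1024)   # 1024: ~2000 clients must queue
# Values used in the tests: 10 connections, 5000 requests each.
tuned_cap = pool_capacity(1, 10, 5000)    # 50000: well above 2000 clients
```

With the defaults, 2000 clients each holding one in-flight request exceed the 1024-request ceiling, which matches the threads seen blocked in awaitAvailableConnection.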
Then I ran Gatling 3 times again (100, 1000, 2000 virtual clients). This
time, throughput increased linearly with the number of clients, from 100
to 1000 and then to 2000. I also tried 3000, but this time throughput
did not increase as much. This was almost certainly because both machines
were now using more than 80% CPU.
I sent a pull request [2] with the changes needed for the new
configuration parameters.
Hopefully it can help in Filip's tests as well.
Regards,
--
Thomas Segismont
JBoss ON Engineering Team
[1]
http://datastax.github.io/java-driver/manual/pooling/#monitoring-and-tuni...
[2] https://github.com/hawkular/hawkular-metrics/pull/520
7 years, 9 months
JMS/MDB related PR to be reviewed
by Gary Brown
https://github.com/hawkular/hawkular-apm/pull/456
Hawkular APM uses JMS as its event processing backbone, with events being passed around in batches. When some or all of a batch of events fails to be processed by a particular component, it has a retry mechanism that resubmits the failed events to be processed again (subject to a max retry count).
The problem with the current approach is that it resubmits the events to the source topic on which they were received. However, this topic may also be used by other processing components that had already processed the events successfully. For example, processing components A and B subscribe to a particular topic T. A processes all of the events in batch B1 successfully, while B only processes half, retrying the other half. When the failed half is published back to topic T, A receives those events a second time, even though it processed them successfully the first time.
Therefore the retry mechanism needs to be more targeted, so that the resubmitted events are only processed by the component that failed to process them before. There are two possible ways to do this:
1) Define a retry queue - this means doubling up on all of the subscribers, so that each subscriber (i.e. processing capability) would have a topic subscriber to receive the initial events and a retry queue to handle failed attempts. It's possible a single retry queue could be used, with a message selector to distinguish the target processing unit.
2) Use a message selector on the original topic - in this case, no new destinations need to be added. The initial message has no property defined, so it is received by all topic subscribers; when the message is resubmitted, it names a target subscriber and is routed to the correct subscriber by a message selector.
This PR implements the second approach. In initial tests, the performance hit of using a message selector appears negligible, although that was my initial concern. If it does prove to be an issue in the future, the single-retry-queue-with-selector approach from (1) could be used instead.
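The routing rule in option (2) can be sketched as follows - names are illustrative, not APM code; the real implementation would use a JMS message selector along the lines of `targetSubscriber IS NULL OR targetSubscriber = '<name>'`:

```python
# Sketch of option (2): a message with no target property goes to every
# subscriber; a resubmitted message names its target and is delivered
# only to that subscriber. Names here are hypothetical.

def matches(subscriber, message):
    """Mimics a JMS selector like:
    targetSubscriber IS NULL OR targetSubscriber = '<subscriber>'"""
    target = message.get("targetSubscriber")
    return target is None or target == subscriber

def deliver(subscribers, message):
    # Return the subscribers that would receive this message.
    return [s for s in subscribers if matches(s, message)]

subs = ["A", "B"]
deliver(subs, {"body": "batch-1"})                           # -> ["A", "B"]
deliver(subs, {"body": "batch-1", "targetSubscriber": "B"})  # -> ["B"]
```

This is why no extra destinations are needed: the absence of the property on the initial publish preserves the original fan-out behavior.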
Anyone able to review/merge the PR for me?
Thanks in advance.
Regards
Gary
7 years, 9 months