Metrics performance testing PR#520
by Thomas Segismont
Hi,
Today I've been looking at Metrics insertion performance.
My setup is the following:
- on my laptop I run the Gatling load scenario (which is very similar to
the perf test job scenario)
- I also run a Metrics standalone instance
- on another machine connected to my LAN, I run a single node C* cluster
In order to avoid problems due to memory constraints, I set the min and max
heap size to 2048m. Below this value the server spends significant time
in garbage collection (with a high number of concurrent clients).
I ran Gatling 3 times with different numbers of clients (think agents):
100, 1000, and 2000:
----
mvn gatling:execute -Dclients=X -Dramp=0 -Dloops=50
----
From 100 to 1000 virtual clients, the throughput rose accordingly
(x10). From 1000 to 2000, I hit the same kind of plateau Filip was
observing: I got barely a 15% throughput increase, yet none of the
machines had reached its CPU/IO/memory limits.
So I ran the 2000 virtual clients test again and took a jstack dump
(result attached). As you can see, most of the task handler threads are
in state WAITING inside the
com.datastax.driver.core.HostConnectionPool.awaitAvailableConnection method.
Then, following the "Monitoring and tuning the pool" section of the
driver doc [1], I added some code to print the number of open connections,
active requests, and maximum capacity. It confirmed that the maximum
capacity was being reached.
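For reference, a minimal sketch of that kind of monitoring code, modeled on the driver documentation (DataStax Java driver 3.x API; `session` and `poolingOptions` are assumed to come from the application's existing setup):

```java
import com.datastax.driver.core.Host;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

// Print, for each connected host, the number of open connections,
// the in-flight (active) requests, and the pool's maximum capacity.
void printPoolState(Session session, PoolingOptions poolingOptions) {
    Session.State state = session.getState();
    for (Host host : state.getConnectedHosts()) {
        int openConnections = state.getOpenConnections(host);
        int inFlight = state.getInFlightQueries(host);
        // Maximum capacity = max connections per host * max requests per connection
        int maxCapacity = poolingOptions.getMaxConnectionsPerHost(HostDistance.LOCAL)
                * poolingOptions.getMaxRequestsPerConnection(HostDistance.LOCAL);
        System.out.printf("%s: open=%d, in-flight=%d, max capacity=%d%n",
                host, openConnections, inFlight, maxCapacity);
    }
}
```

When the in-flight count sits at the maximum capacity, threads block in awaitAvailableConnection, which matches the jstack output.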
Note that by default with the V3 protocol, the driver creates just one
connection per host and allows a maximum of 1024 concurrent requests.
After that I added new options in the code to let the user define the
max number of connections per host as well as requests per connection.
In my tests, I set the maximum connection limit to 10 and maximum
concurrency to 5000. These are arbitrary values; they are simply bigger
than the defaults. The best values depend on the environment and should
be determined by testing.
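A hedged sketch of what such a configuration could look like with the DataStax Java driver (3.x API; the contact point is made up, and the exact option names exposed by the PR may differ):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

// Raise the per-host connection and per-connection request limits
// above the V3 protocol defaults (1 connection, 1024 requests).
PoolingOptions poolingOptions = new PoolingOptions()
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 5000);

Cluster cluster = Cluster.builder()
        .addContactPoint("cassandra.example.org") // assumed contact point
        .withPoolingOptions(poolingOptions)
        .build();
```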
Then I ran Gatling 3 times again (100, 1000, 2000 virtual clients). This
time throughput increased linearly with the number of clients, from 100
to 1000 and then 2000. I also tried with 3000, but this time throughput
did not increase as much, almost certainly because both machines were
now using more than 80% CPU.
I sent a pull request [2] with the changes needed for the new
configuration parameters.
Hopefully it can help in Filip's tests as well.
Regards,
--
Thomas Segismont
JBoss ON Engineering Team
[1]
http://datastax.github.io/java-driver/manual/pooling/#monitoring-and-tuni...
[2] https://github.com/hawkular/hawkular-metrics/pull/520
9 years, 8 months
JMS/MDB related PR to be reviewed
by Gary Brown
https://github.com/hawkular/hawkular-apm/pull/456
Hawkular APM uses JMS as its event processing backbone, with events being passed around in batches. When some or all of a batch of events fail to be processed by a particular component, a retry mechanism resubmits the failed events to be processed again (subject to a max retry count).
The problem with the current approach is that it resubmits the events to the source topic on which they were originally received - but that topic may also be used by other processing components that had processed the events successfully. For example, suppose processing components A and B both subscribe to a topic T. A processes all of the events in a batch successfully, while B processes only half and retries the other half. When the failed half is published back to topic T, A receives those events a second time, even though it already processed them successfully.
Therefore the retry mechanism needs to be more targeted, so that resubmitted events are only processed by the component that failed to process them before. There are two possible ways to do this:
1) Define a retry queue - this means doubling up on all of the subscribers, so that each subscriber (i.e. processing capability) would have a topic subscriber to receive the initial events and a retry queue to handle failed attempts. It's possible a single retry queue could be used, with a message selector to distinguish the target processing unit.
2) Use a message selector on the original topic - in this case no new destinations need to be added. The initial message has no property defined, so it is received by all topic subscribers; when the message is resubmitted, it names a target subscriber and is routed to the correct subscriber by a message selector.
The PR implements the second approach. In initial tests the performance hit of using a message selector appears negligible, although this had been my initial concern. If it does prove to be an issue in the future, the single retry queue with a message selector from option (1) could be used instead.
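A rough sketch of the selector idea (the property name `targetSubscriber` and the subscriber name `ComponentB` are made up for illustration; the actual property names are in the PR):

```java
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.Topic;

// Subscriber side: receive messages that either have no target set
// (initial delivery) or that explicitly name this subscriber (a retry).
MessageConsumer createSubscriber(Session session, Topic topic) throws JMSException {
    String selector = "targetSubscriber IS NULL OR targetSubscriber = 'ComponentB'";
    return session.createConsumer(topic, selector);
}

// Retry side: resubmit the failed events with the target subscriber named,
// so the other subscribers on the topic skip the retried batch.
void resubmit(Session session, MessageProducer producer, String failedEvents,
              int previousRetryCount) throws JMSException {
    Message retry = session.createTextMessage(failedEvents);
    retry.setStringProperty("targetSubscriber", "ComponentB");
    retry.setIntProperty("retryCount", previousRetryCount + 1);
    producer.send(retry);
}
```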
Anyone able to review/merge the PR for me?
Thanks in advance.
Regards
Gary
hawkular services with embedded c profile - need PR reviewed/merged
by John Mazzitelli
I just submitted a new PR for hawkular services:
https://github.com/hawkular/hawkular-services/pull/22
This re-introduces the ability to build Hawkular Services with embedded Cassandra if you want.
This will ONLY embed Cassandra IF you build with the special Maven profile -Pembeddedc.
If you never build with that profile, Hawkular Services will build as it does now (that is, requires you to install and start your own Cassandra separately).
So this will allow developers to continue doing what they've been doing (that is, no need to run a standalone C*) if they want.
Again, the default build is NOT to embed Cassandra. You must explicitly tell Maven you want to embed C* via -Pembeddedc to get this feature.
Hawkular Community - Distribution
by Stefan Negrea
Hello Everybody,
I want to give an update on the Hawkular Community distribution since
Hawkular Services is now on a good track.
The Hawkular Community distribution will be built on top of Hawkular
Services with two additions: Embedded Cassandra and the Hawkular UI. The
Embedded Cassandra was removed from Hawkular Services but will make its
way into the community distribution. The Hawkular UI will be the original
web interface from the previous Hawkular bundle, updated to work with
Hawkular Services.
In the short term, the Hawkular Community distributions will be just a
rebundling of Hawkular Services with the Embedded Cassandra. The repository
for this effort is: https://github.com/hawkular/hawkular-community
The Hawkular UI updates are now in the planning stages and I will provide
an update as soon as the timeline becomes clearer. The code has been
relocated to this new repository: https://github.com/hawkular/hawkular-ui
Lastly, I renamed the old hawkular/hawkular repository to
hawkular/hawkular-dist-old, marked it as archived, and removed access to
it. The new URL is: https://github.com/hawkular/hawkular-dist-old . My
initial reaction was to update hawkular/hawkular for the new community
distribution, but after looking at the commit and pull request history it
did not make sense to continue using that repository. It would also have
been really confusing to keep the repository under the old name knowing it
will probably never be updated again; if somebody took a look at the
org and spotted that repository, they might wonder whether the community
is still active.
A note about the hawkular/hawkular rename: GitHub has automatic redirects
for renamed repositories. If somebody navigates to the old URL, they will
be redirected to the new name. So as long as we do not have another
hawkular/hawkular, the old URL will keep working (which is good, given
existing links and forks). But we should never name anything
hawkular/hawkular again, because it is a very misleading name.
Thank you,
Stefan Negrea
Hawkular Services 0.0.3.Final
by Juraci Paixão Kröhling
Team,
Hawkular Services 0.0.3.Final has just been released.
This version includes Agent 0.19.0.Final and features the removal of the
embedded Cassandra. You'll need a local installation of Cassandra
running before you start the server.
The distribution can be downloaded here:
https://repository.jboss.org/nexus/service/local/repositories/releases/co...
As with the previous distributions, the Agent has to be configured with a
user. This can be accomplished by:
- Adding a user via bin/add-user.sh, like:
./bin/add-user.sh \
-a \
-u <theusername> \
-p <thepassword> \
-g read-write,read-only
- Changing the Agent's credentials in standalone.xml to the credentials
from the previous step, or passing hawkular.rest.user /
hawkular.rest.password as system properties (-Dhawkular.rest.user=jdoe)
Known issue:
https://issues.jboss.org/browse/HWKINVENT-186
- Juca.
Performance testing (and PR #519)
by Michael Burman
Hi,
So the discussion on #hawkular pointed out that our current metrics performance testing is seeing a performance regression after the merging of PR #519. I found this odd, as I had done a lot of testing for this feature, including multiple runs of JMH (OpenJDK's Java benchmarking tool) and using a profiler to verify reduced CPU usage as well as reduced GC activity. The issue centers on RxJava's concatWith vs. mergeWith: concatWith creates internal queues and other bookkeeping to preserve ordering, whereas mergeWith does not.
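To illustrate the difference, a minimal RxJava 1.x sketch (not the actual Metrics code):

```java
import rx.Observable;

Observable<Integer> first = Observable.just(1, 2);
Observable<Integer> second = Observable.just(3, 4);

// concatWith subscribes to the second source only after the first
// completes, and queues items internally to preserve ordering.
first.concatWith(second).subscribe(i -> System.out.print(i + " ")); // 1 2 3 4

// mergeWith subscribes to both sources eagerly; with asynchronous
// sources the emissions may interleave, and no ordering queue is kept.
first.mergeWith(second).subscribe(i -> System.out.print(i + " "));
```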
So, Filip's tools say mergeWith is slower, while my intuition, profiler and benchmarks say otherwise. Odd situation, but there's a small difference in testing methodologies here. I've done testing against hawkular-core-service (that is, only the internal handling), while Filip uses the whole WildFly in his testing. Although the performance degradation is quite severe with JAX-RS involved, it shouldn't change the end results.
The test results for my runs with JMH are here (with first the end results and then individual runs and statistics).
https://gist.github.com/burmanm/e3b1ab0e67bcae189072718d39b66b6c
The test that is run here is:
https://github.com/hawkular/hawkular-metrics/blob/master/integration-test...
It mimics the behavior of multiple concurrent senders each sending 1 metric per call. With our backend, there's not a huge difference between sending everything in one request or in single requests (this is more of a JAX-RS issue). The tests use both 1 datapoint per metric and 10 datapoints per metric. The performance is reported as inserted metrics per second (so multiply the second result by 10: 9012 metrics/second -> 90120 datapoints/second).
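For anyone unfamiliar with JMH, the benchmarks follow the usual annotated shape, roughly like this (class and method names here are illustrative, not the actual test code):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class MetricsInsertBenchmark {

    // Shared fixtures set up once per run would go here, e.g. the
    // metrics service and pre-built datapoints.

    @Benchmark
    public void insertSingleDatapoint() {
        // insert 1 datapoint for 1 metric and await the result;
        // JMH reports invocations per second, i.e. metrics per second
    }
}
```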
Let the flaming begin.. or something ;)
Other notes:
There are other important optimization strategies for the 1 metric / 1 datapoint scenario. The first is changing the behavior of index updates: removing unnecessary index updates doubles the performance, from 13k to 26k metrics per second. The second, for single-node Cassandra deployments, is to force datapoint writes to use batch statements (across different partitions, which isn't normally recommended); that increases the performance from 26k to >60k. I'm trying to find a better solution for the latter case, but haven't found one so far.
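The batch idea could look roughly like this with the DataStax driver (`insertStmt` and the `DataPoint` accessors are placeholders; an UNLOGGED batch, since atomicity across partitions is not the goal):

```java
import java.util.List;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Group single-datapoint inserts into one UNLOGGED batch. Batching
// across different partitions is normally discouraged because it loads
// the coordinator node - but on a single-node cluster the coordinator
// owns all the data anyway, which is presumably why it helps here.
void writeBatch(Session session, PreparedStatement insertStmt,
                List<DataPoint> dataPoints) {
    BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
    for (DataPoint p : dataPoints) {
        batch.add(insertStmt.bind(p.getMetricId(), p.getTimestamp(), p.getValue()));
    }
    session.execute(batch);
}
```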
- Micke
Hawkular Commons 0.7.3.Final released
by Juraci Paixão Kröhling
Team,
Hawkular Commons 0.7.3.Final was just released. This release includes
the JAX-RS filter for requiring the Hawkular-Tenant header, as well as
the annotation `@TenantRequired`, with which you can mark a specific
class or method as requiring (or not requiring) a tenant.
PRs to Alerts and Inventory will follow shortly. I also expect to be
able to send a PR to Metrics soon.
- Juca.
Prepare for Hawkular Services 0.0.3.Final
by Juraci Paixão Kröhling
Team,
We'll have a release of Hawkular Services tomorrow, for version
0.0.3.Final.
This release will include the removal of embedded Cassandra. There
should not be anything required from individual components, but it would
be good if you'd test the current Hawkular Services master to see if
your component is working as expected.
If there are version changes required, send a PR by EOD today.
- Juca.