[Hawkular-dev] Metrics performance testing PR#520

Tue Jun 21 17:59:49 EDT 2016

Hi,

Today I've been looking at Metrics insertion performance.

My setup is the following:
- on my laptop I run the Gatling load scenario (which is very similar to 
the perf test job scenario)
- I also run a Metrics standalone instance
- on another machine connected to my LAN, I run a single node C* cluster

To avoid problems due to memory constraints, I set both the min and max
heap size to 2048m. Below this value, the server spends significant time
in garbage collection when the number of concurrent clients is high.

I ran Gatling 3 times with different numbers of clients (think agents):
100, 1000 and 2000
----
mvn gatling:execute -Dclients=X -Dramp=0 -Dloops=50
----

From 100 to 1000 virtual clients, the throughput rose accordingly
(x10). From 1000 to 2000, I hit the same kind of plateau Filip was
observing: throughput increased by barely 15%. None of the machines
had reached its CPU, I/O or memory limits.

So I ran the 2000 virtual clients test again and took a thread dump with
jstack (result attached). As you can see, most of the task handler
threads are in state WAITING inside the
com.datastax.driver.core.HostConnectionPool.awaitAvailableConnection method.
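
To repeat this check on another dump without reading it by eye, something
like the following could count the threads parked in a given method. It is
only a sketch: the class and method names are made up for illustration, and
it assumes plain jstack output where thread entries are separated by a
blank line (no -l option).

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Hawkular Metrics or the Cassandra driver.
class ThreadDumpCounter {

    /**
     * Counts the thread entries of a jstack dump that are in the given state
     * and mention the given text somewhere in their stack trace.
     */
    static long countThreads(String dump, String state, String frameText) {
        // Assumption: thread entries are separated by at least one blank line.
        return Arrays.stream(dump.split("\\R\\s*\\R"))
                // "State: WAITING" does not match "State: TIMED_WAITING".
                .filter(entry -> entry.contains("java.lang.Thread.State: " + state))
                .filter(entry -> entry.contains(frameText))
                .count();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a file produced with: jstack <pid> > dump.txt
        String dump = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        long waiting = countThreads(dump, "WAITING",
                "HostConnectionPool.awaitAvailableConnection");
        System.out.println("Threads waiting for a connection: " + waiting);
    }
}
```

A high count relative to the number of task handler threads points at the
connection pool rather than at CPU, I/O or memory.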

Then, following the "Monitoring and tuning the pool" section of the
driver doc [1], I added some code to print the number of open
connections, active requests, and maximum capacity. It confirmed that
the maximum capacity was reached.

Note that by default with the V3 protocol, the driver creates just one 
connection per host and allows a maximum of 1024 concurrent requests.

After that I added new options in the code to let the user define the 
max number of connections per host as well as requests per connection.

In my tests, I set the maximum number of connections per host to 10 and
the maximum number of concurrent requests per connection to 5000. These
are arbitrary values, simply bigger than the defaults. The best values
depend on the environment and should be determined by testing.
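
To make the difference between the two setups concrete, here is a rough
sketch of the arithmetic: the per-host ceiling on in-flight requests is the
number of connections multiplied by the requests allowed per connection.
The class name and the system property names below are hypothetical. They
are not the option names introduced by the pull request.

```java
// Hypothetical helper illustrating the pool capacity arithmetic.
class PoolCapacity {

    /** Upper bound on in-flight requests the pool accepts for a single host. */
    static long maxInFlightPerHost(int maxConnectionsPerHost, int maxRequestsPerConnection) {
        if (maxConnectionsPerHost < 1 || maxRequestsPerConnection < 1) {
            throw new IllegalArgumentException("both limits must be positive");
        }
        // Widen to long before multiplying so large settings cannot overflow.
        return (long) maxConnectionsPerHost * maxRequestsPerConnection;
    }

    public static void main(String[] args) {
        // Made-up property names; the fallbacks are the values used in the tests.
        int connections = Integer.getInteger("example.max-connections-per-host", 10);
        int requests = Integer.getInteger("example.max-requests-per-connection", 5000);

        // Driver defaults described earlier: 1 connection, 1024 requests.
        System.out.println("default capacity:    " + maxInFlightPerHost(1, 1024));
        System.out.println("configured capacity: " + maxInFlightPerHost(connections, requests));
    }
}
```

With the defaults the ceiling is 1024 in-flight requests per host, and with
the values used in the tests it is 50000.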

Then I ran Gatling 3 times again (100, 1000, 2000 virtual clients). This
time throughput increased linearly with the number of clients, from 100
to 1000 and then 2000. I also tried with 3000, but this time throughput
did not increase as much, certainly because both machines were now
using more than 80% CPU.

I sent a pull request [2] with the changes needed for the new 
configuration parameters.

Hopefully it can help in Filip's tests as well.

Regards,

-- 
Thomas Segismont
JBoss ON Engineering Team

[1] 
http://datastax.github.io/java-driver/manual/pooling/#monitoring-and-tuning-the-pool
[2] https://github.com/hawkular/hawkular-metrics/pull/520
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stack.tar.gz
Type: application/gzip
Size: 16416 bytes
Desc: not available
Url : http://lists.jboss.org/pipermail/hawkular-dev/attachments/20160621/8f63b78c/attachment-0001.bin