Hello,
as discussed at the last hawkular perf meeting, I tried to generate load from more generators.
The motivation was to find out if it's possible to max out the hw resources on the tested
machine (PowerEdge R630, 16 cores) when generating heavy load from multiple load generators.
The quick answer is NO. Even though the load was generated from 5 machines with 1500 threads
in total, CPU usage never went above 50% and the maximum disk write speed was 15 MB/sec.
Maximum throughput was ~19000 requests/sec when using ~900 threads. When adding more
threads, throughput and CPU usage actually went down.
When bypassing the cassandra calls in hawkular metrics and returning a 200 response directly
(see
https://github.com/hawkular/hawkular-metrics/pull/411), a single generator with 300
threads was able to generate a load of 45000 requests/sec.
All this means that the problem is really not on the load generator side, i.e. it is not a
matter of being unable to generate sufficient load. Hawkular is not able to max out the
resources on that machine in its current configuration, no matter how many threads are
used for load generation.
An important note about perfcake (the load generator we are using) is that it's not trying
to break the server. Each thread sends a new request only after the previous request has
completed (response returned). So what it reports is the maximum throughput that
hawkular metrics is able to serve for a given number of threads.
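For illustration, that closed-loop, per-thread behavior can be sketched like this (a minimal Python sketch under my own naming, not perfcake's actual code; `send_request` stands in for the blocking HTTP call):

```python
import threading
import time

def closed_loop_worker(send_request, duration_s, counts, lock):
    """One load-generator thread: issue the next request only after
    the previous response has returned (closed-loop behavior)."""
    done = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        send_request()  # blocks until the response arrives
        done += 1
    with lock:
        counts.append(done)

def run_load(send_request, threads, duration_s):
    """Spawn `threads` closed-loop workers and report total throughput
    (requests/sec) -- i.e. what the server managed to serve."""
    counts, lock = [], threading.Lock()
    workers = [
        threading.Thread(
            target=closed_loop_worker,
            args=(send_request, duration_s, counts, lock),
        )
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(counts) / duration_s
```

The key point is that throughput in this model is bounded by thread count divided by server response time, so a slow server shows up as low reported throughput rather than as errors or dropped requests.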
Results:
1 datapoint per request:
600 threads -> throughput ~18100 requests/sec
900 threads -> throughput ~19000 requests/sec
1200 threads -> throughput ~10300 requests/sec
1500 threads -> throughput ~10000 requests/sec
10 datapoint per request:
300 threads -> throughput ~5000 requests/sec
600 threads -> throughput ~3500 requests/sec
900 threads -> throughput ~3100 requests/sec
1200 threads -> throughput ~3100 requests/sec
1500 threads -> throughput ~3100 requests/sec
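Since the generator is closed-loop with essentially no think time, Little's law (threads ≈ throughput × latency) lets us back out the average per-request latency implied by the numbers above. A quick check (the function name is mine, purely illustrative):

```python
def implied_latency_ms(threads, throughput_rps):
    """Little's law for a closed-loop generator with negligible think time:
    threads = throughput * latency  =>  latency = threads / throughput."""
    return threads / throughput_rps * 1000.0

# 1 datapoint per request: implied latency more than doubles as the
# thread count goes from 900 to 1200, matching the throughput drop.
print(round(implied_latency_ms(900, 19000), 1))   # ~47.4 ms
print(round(implied_latency_ms(1200, 10300), 1))  # ~116.5 ms
```

This suggests that adding threads beyond ~900 mostly inflates per-request latency rather than extracting more work from the server.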
In all cases garbage collection was happening at regular intervals of ~3s. (WF is running
with -Xms64m -Xmx512m -XX:MaxPermSize=256m)
I tried a few runs with increased memory -Xms2048m -Xmx2048m -XX:MaxPermSize=256m:
1 datapoint per request:
900 threads -> throughput ~19800 requests/sec
1500 threads -> throughput ~19800 requests/sec
So for the larger number of threads the increased memory helped a lot; for 900 threads it
helped only a little. Throughput stabilized before the first garbage collection occurred,
so it seems a further increase of memory doesn't make sense.
See attached txt for more detailed results.
Filip