[Hawkular-dev] Performance testing (and PR #519)

Mon Jun 20 09:46:21 EDT 2016

Hi,

So, the discussion on #hawkular pointed out that our current metrics performance testing shows a performance regression after the merging of PR #519. I found this odd, as I had done a lot of testing for this feature, including multiple runs of JMH (OpenJDK's Java benchmarking tool) and using a profiler to confirm reduced CPU usage as well as reduced GC activity. The issue revolves around RxJava's concatWith vs. mergeWith. concatWith creates internal queues and other bookkeeping structures to keep track of its state, whereas mergeWith does not.
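For readers less familiar with the two operators, here is a minimal sketch of the ordering difference only. It is an analogy built on plain java.util.concurrent rather than RxJava, and the class and method names (ConcatVsMergeSketch, concatLike, mergeLike) are made up for illustration. It says nothing about the internal cost of either operator. A concat-style composition starts the second source only after the first has completed, while a merge-style composition starts both up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; not part of hawkular-metrics or RxJava.
final class ConcatVsMergeSketch {

    // concat-like: task B is not started until task A has completed.
    static List<String> concatLike(ExecutorService pool) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture
                .runAsync(() -> { events.add("start A"); events.add("end A"); }, pool)
                .thenCompose(done -> CompletableFuture.runAsync(
                        () -> { events.add("start B"); events.add("end B"); }, pool))
                .join();
        return events;
    }

    // merge-like: A and B are both started up front. The latch makes each
    // task wait until the other has started, so the overlap is deterministic.
    // The pool must have at least two threads, otherwise this would block.
    static List<String> mergeLike(ExecutorService pool) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch bothStarted = new CountDownLatch(2);
        CompletableFuture<Void> a =
                CompletableFuture.runAsync(() -> run("A", events, bothStarted), pool);
        CompletableFuture<Void> b =
                CompletableFuture.runAsync(() -> run("B", events, bothStarted), pool);
        CompletableFuture.allOf(a, b).join();
        return events;
    }

    private static void run(String name, List<String> events, CountDownLatch bothStarted) {
        events.add("start " + name);
        bothStarted.countDown();
        try {
            bothStarted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        events.add("end " + name);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println("concat-like: " + concatLike(pool));
            System.out.println("merge-like:  " + mergeLike(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In the concat-like case the event order is always A then B. In the merge-like case both "start" events are recorded before either "end" event, and the relative order of A and B is not fixed.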

So, Filip's tools say mergeWith is slower, while my intuition, profiler, and benchmarking say otherwise. It's an odd situation, but there's a small difference in testing methodologies here. I've tested against hawkular-core-service (that is, only the internal handling), while Filip uses the whole WildFly in his testing. Although the performance degradation is quite severe with JAX-RS involved, it shouldn't cause different end results.

The test results for my JMH runs are here (first the end results, then the individual runs and statistics):

https://gist.github.com/burmanm/e3b1ab0e67bcae189072718d39b66b6c

The test that is run here is:

https://github.com/hawkular/hawkular-metrics/blob/master/integration-tests/jmh-benchmark/src/main/java/org/hawkular/metrics/benchmark/jmh/InsertBenchmark.java#L141

It mimics the behavior in which multiple concurrent senders each send 1 metric per call. With our backend, there's not a huge difference whether we send everything in one request or in single requests (this is more of a JAX-RS issue). The tests use both 1 datapoint per metric and 10 datapoints per metric. The performance is reported as inserted metrics per second (so multiply the second result by 10: 9012 metrics / second -> 90120 datapoints / second).
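To make the unit conversion explicit, here is a small sketch of the arithmetic. The class and method names are made up for illustration; only the figures (9012 metrics per second, 10 datapoints per metric) come from the results above.

```java
// Hypothetical helper; not part of the JMH benchmark itself.
final class ThroughputConversion {

    // Converts a metrics-per-second score into datapoints per second,
    // given how many datapoints each inserted metric carries.
    static long datapointsPerSecond(long metricsPerSecond, int datapointsPerMetric) {
        return Math.multiplyExact(metricsPerSecond, (long) datapointsPerMetric);
    }

    public static void main(String[] args) {
        // Second test case: 10 datapoints per metric at 9012 metrics / second.
        System.out.println(datapointsPerSecond(9012, 10)); // prints 90120
    }
}
```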

Let the flaming begin.. or something ;)

Other notes:

There are other important optimization strategies involved with the 1 metric, 1 datapoint scenario. The first is changing the behavior of indexUpdates: removing unnecessary index updates doubles the performance from 13k to 26k metrics per second. The second, for single Cassandra node deployments, is to force datapoint writes to use batch statements (across different partitions, which isn't normally recommended); that increases the performance from 26k to >60k. I'm trying to find a better solution for the latter case, but so far haven't found one.

  - Micke