Hi,
The discussion on #hawkular pointed out that our current metrics performance testing
shows a performance regression after the merge of PR #519. I found this odd, as
I had done a lot of testing for this feature, including multiple runs of JMH
(OpenJDK's Java benchmarking tool) and using a profiler to witness reduced CPU usage
as well as reduced GC activity. The issue centers on RxJava's concatWith vs.
mergeWith. concatWith creates internal queues and other bookkeeping structures to keep
track of its state, whereas mergeWith does not.
So, Filip's tools say mergeWith is slower; my intuition, profiler, and benchmarking say
otherwise. An odd situation, but there's a small difference in testing methodologies here.
I've done testing against hawkular-core-service (that is, only the internal handling),
while Filip tests against a full WildFly instance. Although the performance degradation is
quite severe with JAX-RS involved, it shouldn't cause different end results.
The test results for my JMH runs are here (the end results first, then the
individual runs and statistics):
https://gist.github.com/burmanm/e3b1ab0e67bcae189072718d39b66b6c
The test that is run here is:
https://github.com/hawkular/hawkular-metrics/blob/master/integration-test...
It mimics the behavior in which multiple concurrent senders each send 1 metric per
call. With our backend, there's not a huge difference whether we send everything in
one request or in single requests (this is more of a JAX-RS issue). The tests use both 1
datapoint per metric and 10 datapoints per metric. The performance is reported as
inserted metrics per second (so multiply the second result by 10: 9012 metrics /
second -> 90120 datapoints / second).
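To spell out that conversion, here is a minimal sketch. The class and method names are hypothetical and not part of hawkular-metrics; it only assumes the benchmark score is in metrics per second and that every metric carries the same number of datapoints.

```java
// Hypothetical helper for reading the benchmark scores; not hawkular-metrics code.
final class Throughput {
    private Throughput() {}

    /**
     * Converts a score in metrics/second into datapoints/second,
     * assuming every metric carries the same number of datapoints.
     */
    static long datapointsPerSecond(long metricsPerSecond, int datapointsPerMetric) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping
        return Math.multiplyExact(metricsPerSecond, (long) datapointsPerMetric);
    }

    public static void main(String[] args) {
        // The 10-datapoints-per-metric figure quoted above
        System.out.println(datapointsPerSecond(9012, 10)); // prints 90120
    }
}
```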
Let the flaming begin.. or something ;)
Other notes:
There are other important optimization strategies involved in the 1 metric, 1 datapoint
scenario. The first is changing the behavior of indexUpdates: removing unnecessary index
updates doubles the performance from 13k to 26k metrics per second. The second
strategy, for single-node Cassandra deployments, is to force datapoint writing
to use batch statements (with different partitions, which isn't normally
recommended); that increases the performance from 26k to >60k. I'm trying to
find a better solution for the latter case, but so far haven't found one.
- Micke