Replied inline.
----- Original Message -----
Thank you for the detailed summary. It is very insightful. I have a few
comments. First, I am not all that interested in the tests in which clients
are submitting one data point per request. We have two primary feeds right
now with which we are concerned - Heapster and the Hawkular agent. Neither
will be submitting a single data point per request. It will more likely be
in the hundreds or even thousands.
It's very easy to increase the number of data points per request up to ~700.
Bigger messages are currently a problem because of a PerfCake limitation -
https://github.com/PerfCake/PerfCake/issues/260
I will update the test configurations to use higher values.
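For reference, a bigger message is just the existing sample message (see
the original mail below) with more entries in its data array; a
hypothetical 3-datapoint request:
[{"id":"gaugeID","data":[{"timestamp":"@{CurrentTimestamp}","value":10.12},{"timestamp":"@{CurrentTimestamp}","value":10.13},{"timestamp":"@{CurrentTimestamp}","value":10.14}]}]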
Secondly, we perform an additional write per metric on each request and
wait for all of those writes to finish before replying. Whether a request
contains 1 or 100 data points for a metric, we perform that additional
write, so its overhead is probably offset more when there are more data
points per metric. We could make those additional writes completely async,
potentially sending a response before they complete, and see if it yields
any gains.
If you want to experiment with this, please create a PR. We can easily compare results
then.
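To frame the comparison, here is a minimal sketch of the fire-and-forget
idea using the DataStax driver directly; the AsyncIndexWrite class and the
metricIndexInsert statement are made up for illustration and this is not
the actual H-Metrics write path:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    class AsyncIndexWrite {
        // Blocking variant (today): session.execute(metricIndexInsert)
        // holds the response until the extra per-metric write finishes.
        // The async variant below issues the write and lets the response
        // go out immediately; a failed background write is only logged.
        static void writeIndexAsync(Session session, Statement metricIndexInsert) {
            ResultSetFuture f = session.executeAsync(metricIndexInsert);
            Futures.addCallback(f, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) { /* nothing to do */ }
                public void onFailure(Throwable t) {
                    System.err.println("index write failed: " + t);
                }
            });
        }
    }

Whether it wins anything would show up directly in the req/sec numbers, so
comparing it in a PR as suggested seems like the right way to find out.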
Third, it doesn’t sound like we are even coming close to maxing out
resources on the Cassandra nodes in most of the test configurations. It
sounds more like H-Metrics is the limiting factor. When the numbers plateau
or drop and adding more C* nodes doesn’t help, I would be interested to see
how things look if we add an additional H-Metrics node. Is it possible with
the load generator tool to submit requests to multiple hosts in a
round-robin manner?
This is possible. Added to my todo list.
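The round-robin selection itself is simple; a plain-Java sketch (the host
names, port, and endpoint below are placeholders, and this is not PerfCake
configuration):

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    class RoundRobinTargets {
        private final List<String> hosts;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobinTargets(List<String> hosts) {
            this.hosts = hosts;
        }

        // Each sender thread asks for a target before building its
        // request, so load spreads evenly across all H-Metrics nodes.
        String nextUrl() {
            int i = Math.floorMod(next.getAndIncrement(), hosts.size());
            return "http://" + hosts.get(i) + ":8080/hawkular/metrics/gauges/data";
        }
    }

    // e.g. new RoundRobinTargets(Arrays.asList("metrics1", "metrics2")).nextUrl()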
Filip
> On Jan 14, 2016, at 8:48 AM, Filip Brychta <fbrychta(a)redhat.com> wrote:
>
> Hello,
> I did a first quick round of perf testing of Hawkular Metrics STANDALONE
> with more Cassandra nodes and it showed some interesting results.
>
> An important note is that Hawkular and the Cassandra cluster were running
> on VMs with shared storage, which is a very poor design for a Cassandra
> cluster, but it still showed some patterns which will hold true for every
> setup. With a proper Cassandra cluster (dedicated disks, CommitLog and
> SSTables on different disks, ...) the results should definitely be better.
>
> Summary of what was found (some of it is obvious even without testing):
> - small messages (1 datapoint per request) heavily utilize the CPU on the
> Hawkular host, while the Cassandra hosts are utilized only gently
> - bigger messages (100 datapoints per request) are less demanding on the
> Hawkular host's CPU; the Cassandra hosts are utilized a little bit more
> - with a weak CPU on the Hawkular host, adding more Cassandra nodes makes
> performance even worse
> - for small messages (1 datapoint per request), even with sufficient CPU
> on the Hawkular host, the performance improvement was only ~ 25% when the
> number of nodes in the cluster was increased from 1 to 2
> - for bigger messages (100 datapoints per request), with sufficient CPU on
> the Hawkular host, the performance improvement was ~ 75% when the number
> of nodes in the cluster was increased from 1 to 2
> - for small messages (1 datapoint per request), even with sufficient CPU
> on the Hawkular host, performance does NOT scale up with more Cassandra
> nodes (see results - performance dropped when the 4th node was added)
> - for bigger messages (100 datapoints per request), with sufficient CPU on
> the Hawkular host, performance scales up but not linearly (this could be
> caused by the shared storage; with a proper Cassandra cluster the results
> should be better)
>
> Questions:
> - why is performance getting worse when adding the 3rd and 4th storage
> nodes when sending small messages and having sufficient CPU on the
> Hawkular host?
>
>
> About the test:
> - the load generator was hitting the following endpoint:
> http://${server.host}:${server.port}/hawkular/metrics/gauges/data
> - one test run takes 4 minutes
> - a message with one datapoint looks like this:
> [{"id":"gaugeID","data":[{"timestamp":"@{CurrentTimestamp}","value":10.12}]}]
> - the load generator was using 300 threads (each thread acts as a single
> client) and was sending messages containing 1 or 100 datapoints (a sketch
> of one request follows this list)
> - hawkular metrics is deployed on wildfly-9.0.2.Final
> - metrics version:
> "Implementation-Version":"0.12.0-SNAPSHOT","Built-From-Git-SHA1":"c35deda5d6d03429e97f1ed4a6e4ef12cf7f3a00"
>
>
> Results:
>
> ===================================================
> VMs with 2 cores and shared storage, 4GB of memory.
> ===================================================
> 300 threads, 1 datapoint, hawkular_metrics |
> org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
> ++++++++++++++++++++++++++++++++
> 1 cassandra node ~ 3945 req/sec
> 2 cassandra nodes ~ 3751 req/sec
> 3 cassandra nodes ~ 3318 req/sec
> 4 cassandra nodes ~ 2726 req/sec
>
> In this case the CPU on the Hawkular VM was fully used and adding more
> Cassandra nodes actually made performance worse.
> CPU on the Cassandra nodes was never fully used.
>
>
> 300 threads, 100 datapoints, hawkular_metrics |
> org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
> ++++++++++++++++++++++++++++++++
> 1 cassandra node  ~ 102 req/sec
> 2 cassandra nodes ~ 138 req/sec
> 3 cassandra nodes ~ 188 req/sec
> 4 cassandra nodes ~ 175 req/sec
>
>
> With a weak CPU on the Hawkular VM and big messages (100 datapoints each)
> there is still some improvement when adding more Cassandra nodes.
> CPU on the Cassandra nodes was never fully used.
>
> ===================================================
> Hawkular VM with 4 cores, Cassandra VMs with 2 cores and shared storage,
> 4GB of memory.
> ===================================================
> 300 threads, 1 datapoint, hawkular_metrics |
> org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
> ++++++++++++++++++++++++++++++++
> 1 cassandra node ~ 5150 req/sec
> 2 cassandra nodes ~ 5667 req/sec
> 3 cassandra nodes ~ 5799 req/sec
> 4 cassandra nodes ~ 5476 req/sec
>
> With a stronger CPU on the Hawkular VM, adding more Cassandra nodes
> improves performance, but there is a drop when the 4th node is added.
> CPU on the Cassandra nodes was never fully used.
>
> 300 threads, 100 datapoints, hawkular_metrics |
> org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
> ++++++++++++++++++++++++++++++++
> 1 cassandra node  ~ 111 req/sec
> 2 cassandra nodes ~ 173 req/sec
> 3 cassandra nodes ~ 206 req/sec
> 4 cassandra nodes ~ 211 req/sec
>
> With a stronger CPU on the Hawkular VM, adding more Cassandra nodes
> improves performance.
> CPU on the Cassandra nodes was never fully used.
>
> ===================================================
> Hawkular VM with 8 cores, Cassandra VMs with 2 cores and shared storage,
> 4GB of memory. CPU on the Hawkular machine was used at 30-40%.
> ===================================================
> 300 threads, 1 datapoint, hawkular_metrics |
> org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
> ++++++++++++++++++++++++++++++++
> 1 cassandra node ~ 5424 req/sec
> 2 cassandra nodes ~ 6810 req/sec
> 3 cassandra nodes ~ 6576 req/sec
> 4 cassandra nodes ~ 6094 req/sec
>
> Why is there a drop with the 3rd and 4th nodes?
>
> 300 threads, 100 datapoints, hawkular_metrics |
> org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
> ++++++++++++++++++++++++++++++++
> 1 cassandra node  ~ 97 req/sec
> 2 cassandra nodes ~ 168 req/sec
> 3 cassandra nodes ~ 222 req/sec
> 4 cassandra nodes ~ 241 req/sec
>
>
> Please let me know what you would like to see in the next rounds of
> testing.
>
> Filip
_______________________________________________
hawkular-dev mailing list
hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev