Hi,
I suspect this might be one of the reasons for the scalability wall I've seen in my
performance tests. No matter what I did while testing JAX-RS 2.0 / Servlet 3.1 async I/O
performance, I would always hit a limit of around 3200 inserts per second. That was with
parallel requests each sending just one metric, though. Neither Cassandra nor EAP used all
the CPU, I/O was not a bottleneck, and neither ran out of memory. But no matter the
implementation, that was the ceiling (the servlet used less CPU, but performance was still
stuck there). Or was it hitting some concurrency limit in the Datastax driver or Netty?
Either way, it's clearly limiting our performance.
And even if it didn't matter there, the catch is that if such a concurrency limit was the
reason, it would become very apparent once we switch to async writes, since we would then
do more in parallel.
- Micke
----- Original Message -----
From: "John Sanda" <jsanda(a)redhat.com>
To: "Discussions around Hawkular development"
<hawkular-dev(a)lists.jboss.org>
Sent: Wednesday, September 30, 2015 5:49:41 AM
Subject: [Hawkular-dev] batch vs async inserts
Many of you have probably seen warnings in the Hawkular server log like,
--------------
WARN 15:55:59 Batch of prepared statements for [hawkular_metrics.data,
hawkular_metrics.metrics_idx] is of size 5665, exceeding specified threshold of 5120 by
545.
--------------
This warning is generated when a batch statement is larger than a threshold defined in
cassandra.yaml, which defaults to 5 KB (5120 bytes). Note that the threshold is based on
the actual size of the payload, not the number of statements in the batch. We should stop
seeing this warning in the 0.7.0 release of Metrics. See HWKMETRICS-252[1] for details.
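To make the numbers in the warning concrete, here is a minimal sketch of the arithmetic.
It is not Cassandra's actual code; the class and method names are made up for illustration,
and I am assuming the setting involved is batch_size_warn_threshold_in_kb from the
cassandra.yaml of Cassandra releases of this era.

```java
// Sketch only: mirrors the arithmetic behind the warning, not Cassandra's implementation.
final class BatchSizeCheck {
    // Assumed to correspond to batch_size_warn_threshold_in_kb in cassandra.yaml (default 5).
    static final int WARN_THRESHOLD_KB = 5;

    /** Returns how many bytes the batch payload is over the threshold, or 0 if within it. */
    static long bytesOverThreshold(long batchSizeBytes, int thresholdKb) {
        long thresholdBytes = thresholdKb * 1024L; // 5 KB -> 5120 bytes
        return Math.max(0L, batchSizeBytes - thresholdBytes);
    }

    public static void main(String[] args) {
        // Values taken from the log line above: size 5665, threshold 5120.
        System.out.println(bytesOverThreshold(5665, WARN_THRESHOLD_KB)); // prints 545
    }
}
```

The check depends only on the payload size in bytes, so a batch with a handful of large
statements can trip it while a batch with many small ones does not.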
The general advice in the Cassandra community is to favor async writes in parallel over
batch inserts when you are trying to improve or optimize write performance. Unlogged
batches across multiple partitions are almost always a bad idea. The one exception is with
unlogged batches in which all of the mutations are for the same partition. In that case,
Cassandra performs the writes atomically. This is how we use batch inserts in metrics.
Interestingly, I have seen threads on the Cassandra mailing list that still discourage the
use of batch inserts even in this case. This thread[2] provides some really interesting
insights and analysis on unlogged batch inserts vs async inserts. The thread references a
document with some performance analysis that is worth a look.
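As a rough illustration of what "async writes in parallel" could look like, here is a
sketch that issues one asynchronous write per data point while capping the number of writes
in flight. It uses only the JDK: writeAsync is a hypothetical stand-in for an asynchronous
driver call, and the class name, method name, and maxInFlight parameter are my own
assumptions, not anything in Hawkular Metrics or the Datastax driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class AsyncInsertSketch {
    /**
     * Issues one async write per item with at most maxInFlight writes outstanding.
     * writeAsync is a hypothetical placeholder for an asynchronous driver call.
     */
    static <T> CompletableFuture<Void> writeAll(List<T> items,
                                                Function<T, CompletableFuture<Void>> writeAsync,
                                                int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (T item : items) {
            permits.acquireUninterruptibly(); // wait until a slot frees up
            CompletableFuture<Void> write;
            try {
                write = writeAsync.apply(item);
            } catch (RuntimeException e) {
                permits.release(); // do not leak the permit if the call fails to start
                throw e;
            }
            // Release the permit when the write finishes, whether it succeeded or failed.
            pending.add(write.whenComplete((result, error) -> permits.release()));
        }
        // Completes when every write has completed; fails if any write failed.
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
    }

    public static void main(String[] args) {
        AtomicInteger written = new AtomicInteger();
        List<Integer> dataPoints = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // Simulated write: just counts the data point on a background thread.
        writeAll(dataPoints, p -> CompletableFuture.runAsync(written::incrementAndGet), 3).join();
        System.out.println(written.get()); // prints 8
    }
}
```

The cap on in-flight writes is there because unbounded parallelism can run into the
client's own per-connection request limits; the right value would have to come from
testing.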
[1] https://issues.jboss.org/browse/HWKMETRICS-252
[2] http://www.mail-archive.com/user%40cassandra.apache.org/msg43976.html