[Hawkular-dev] batch vs async inserts

Michael Burman miburman at redhat.com
Wed Sep 30 07:24:42 EDT 2015


Hi,

I assume this might be one of the reasons for the scalability wall I've seen in my performance tests. No matter what I did (while testing the performance of JAX-RS 2.0 / Servlet 3.1 async I/O), I would always hit a limit of around 3200 inserts per second. That said, this happened with parallel requests each sending just one metric. Neither Cassandra nor EAP used all of the CPU, I/O was not a bottleneck, and we were not running out of memory. But no matter the implementation, that was the most it would do (the servlet used less CPU, but the throughput was still stuck there). Or was it hitting some concurrency limit in the Datastax driver or Netty? Either way, it's clearly limiting our performance.

And even if it didn't matter there, the catch is that if such a limit was the reason, it would become very apparent with this change to async writes, since we would be doing more in parallel.

   - Micke

----- Original Message -----
From: "John Sanda" <jsanda at redhat.com>
To: "Discussions around Hawkular development" <hawkular-dev at lists.jboss.org>
Sent: Wednesday, September 30, 2015 5:49:41 AM
Subject: [Hawkular-dev] batch vs async inserts

Many of you have probably seen warnings in the Hawkular server log like, 

-------------- 
WARN 15:55:59 Batch of prepared statements for [hawkular_metrics.data, hawkular_metrics.metrics_idx] is of size 5665, exceeding specified threshold of 5120 by 545. 
-------------- 

This warning is generated when a batch statement is larger than a threshold defined in cassandra.yaml, which defaults to 5 KB. Note that the threshold is based on the actual size of the payload, not the number of statements in the batch. We should stop seeing this warning in the 0.7.0 release of Metrics. See HWKMETRICS-252[1] for details. 
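
To make the numbers in the warning concrete, here is a minimal sketch of the arithmetic, assuming the 5 KB default (5 * 1024 = 5120 bytes; in the cassandra.yaml files I have looked at the setting is named batch_size_warn_threshold_in_kb). The class and method names are made up for illustration and are not part of Cassandra or Metrics.

```java
// Hypothetical helper, for illustration only; not Cassandra or Hawkular code.
final class BatchWarnThreshold {
    // Assumed default from the message above: 5 KB.
    static final int DEFAULT_WARN_THRESHOLD_KB = 5;

    /** Bytes by which a batch payload exceeds the threshold, or 0 if it fits. */
    public static long excessBytes(long batchSizeBytes, int thresholdKb) {
        long thresholdBytes = thresholdKb * 1024L;
        return Math.max(0L, batchSizeBytes - thresholdBytes);
    }

    public static void main(String[] args) {
        // The batch from the log line above: 5665 bytes against a 5120 byte threshold.
        System.out.println(excessBytes(5665, DEFAULT_WARN_THRESHOLD_KB)); // prints 545
    }
}
```

The check depends only on the serialized size, so a batch with a handful of large statements can trigger the warning while a batch with many small ones does not.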

The general advice in the Cassandra community is to favor async writes in parallel over batch inserts when you are trying to improve or optimize write performance. Unlogged batches across multiple partitions are almost always a bad idea. The one exception is an unlogged batch in which all of the mutations are for the same partition. In that case, Cassandra performs the writes atomically. This is how we use batch inserts in Metrics. Interestingly, I have seen threads on the Cassandra mailing list that still discourage the use of batch inserts even in this case. This thread[2] provides some really interesting insights and analysis on unlogged batch inserts vs async inserts. The thread references a document with some performance analysis that is worth a look. 
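
As a rough illustration of "async writes in parallel", here is a sketch that issues one asynchronous write per item while capping the number of writes in flight. It uses only the JDK. The asyncWrite function is a hypothetical stand-in for whatever performs the real write (for example, a wrapper around the driver's executeAsync), and the cap of 4 is an arbitrary value chosen for the demo. This is not how Metrics implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative sketch only; names and the in-flight cap are assumptions.
final class BoundedAsyncWriter {
    /**
     * Starts one async write per item, with at most maxInFlight writes
     * outstanding at any time. The returned future completes when every
     * write has finished (exceptionally if any write failed).
     */
    public static <T> CompletableFuture<Void> writeAll(
            List<T> items,
            Function<T, CompletableFuture<Void>> asyncWrite,
            int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (T item : items) {
            // Blocks the caller once maxInFlight writes are outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> write;
            try {
                write = asyncWrite.apply(item);
            } catch (RuntimeException e) {
                permits.release(); // do not leak the permit if submission fails
                throw e;
            }
            // Release the permit when the write completes, successfully or not.
            pending.add(write.whenComplete((result, error) -> permits.release()));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
    }

    public static void main(String[] args) {
        AtomicInteger written = new AtomicInteger();
        List<Integer> dataPoints = new ArrayList<>();
        for (int n = 0; n < 100; n++) {
            dataPoints.add(n);
        }
        // Stand-in for a real insert: just counts the write on a pool thread.
        writeAll(dataPoints,
                 point -> CompletableFuture.runAsync(() -> written.incrementAndGet()),
                 4).join();
        System.out.println(written.get()); // prints 100
    }
}
```

Capping the in-flight count is a design choice rather than a requirement. It keeps a fast producer from queuing an unbounded number of requests, which matters if the driver or the server has a concurrency limit of its own.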

[1] https://issues.jboss.org/browse/HWKMETRICS-252 
[2] http://www.mail-archive.com/user%40cassandra.apache.org/msg43976.html 
