[Hawkular-dev] Hawkular-metrics resource requirements questions

John Sanda jsanda at redhat.com
Wed Dec 14 08:50:09 EST 2016


I have some numbers to share for compression. I ran a test simulation that stored 7 days of data points for 5,000 metrics at a 10 second sampling rate. I use a 10 second sampling rate a lot since that is what has been used in OpenShift. I used only gauge metrics, with values drawn from a uniform distribution. The size of the raw data on disk *without* gorilla compression was 192 MB; we are still using Cassandra’s default compression, LZ4.
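
For a rough sense of scale, the back-of-the-envelope arithmetic behind that raw number looks like this (a sketch only; the exact bytes-per-sample figure depends on compaction state and on whether MB means 10^6 or 2^20 bytes):

    # Sizing sketch for the raw (LZ4-only) data set described above.
    metrics = 5000
    sampling_interval_s = 10
    days = 7

    samples_per_metric = days * 24 * 3600 // sampling_interval_s   # 60,480
    total_samples = metrics * samples_per_metric                   # 302,400,000

    raw_on_disk_bytes = 192 * 1024**2              # 192 MB measured on disk
    print(raw_on_disk_bytes / total_samples)       # roughly 0.67 bytes/sample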

With gorilla compression, the size of the live data on disk is 4.2 MB, which is in stark contrast with the size of the raw data. There are a few things to note. First, we compress data in 2 hour blocks by default, which makes it a bit difficult to say how big a compressed sample is. The data set itself affects the overall compression as well. It would still be nice if we published something about the compressed sample size, even with some caveats.
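
Treating the two measurements as comparable, and keeping the caveats above about block boundaries and data-set dependence in mind, the ratio works out roughly like this:

    raw_on_disk_bytes = 192 * 1024**2        # raw data, LZ4 only
    live_compressed_bytes = 4.2 * 1024**2    # live data with gorilla compression
    print(raw_on_disk_bytes / live_compressed_bytes)   # roughly a 45x reduction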

I mentioned that the 4.2 MB is the size of the *live* data. As hawkular-metrics ingests data, it does not cache and compress it in memory. Instead, the data points are written to the data table just as they were before gorilla compression was introduced. A background job that runs every 2 hours does the compression. When a 2 hour block of data points for a time series has been compressed and persisted, the corresponding raw data is deleted. If you are familiar with Cassandra, you might be aware that deletes do not happen immediately. If hawkular-metrics is running with a single Cassandra node and/or the replication_factor is 1, gc_grace_seconds is set to zero, so deleted data should get purged pretty quickly. For a multi-node C* cluster with replication, gc_grace_seconds is set to 7 days.
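
Purely as an illustration of the 2 hour blocking (this is not Hawkular’s actual code), aligning a data point’s timestamp to its compression block looks something like this:

    BLOCK_MILLIS = 2 * 60 * 60 * 1000   # 2 hour compression block, in milliseconds

    def block_start(timestamp_ms):
        # Align a timestamp to the start of its 2 hour block.
        return timestamp_ms - (timestamp_ms % BLOCK_MILLIS)

    # Raw data points whose block has been compressed and persisted become
    # eligible for deletion; how quickly the deleted rows are actually purged
    # then depends on gc_grace_seconds as described above.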

There are a lot of little details here, but they can impact the actual numbers you see, which is why I stressed that I was measuring the size of the live data. I hope this is helpful.

- John
> On Dec 12, 2016, at 11:41 PM, John Sanda <jsanda at redhat.com> wrote:
> 
> While this is anecdotal, it offers a bit of perspective. I have done testing on my laptop with sustained ingestion rates of 5,000 and 6,000 samples / 10 seconds without a single dropped mutation. This is with a single Cassandra 3.0.9 node created with ccm on a laptop running Fedora 24 with 16 GB of RAM, 8 cores, and an SSD. We also use a test environment with virtual machines and shared storage, where we might have a tough time sustaining those ingestion rates on a single node, depending on the time of day.
> 
>> On Dec 12, 2016, at 11:22 PM, John Sanda <jsanda at redhat.com> wrote:
>> 
>> Hey Daniel,
>> 
>> Sorry for the late reply. The person who did all the compression work (gayak on freenode) probably will not be around much for the rest of the year. He would be the best person to answer questions on compression; however, I should have some numbers to report back to you tomorrow.
>> 
>> With respect to performance, handling 100 samples/second is not a problem, but just like with any other Cassandra-backed TSDB, your hardware configuration is going to be a big factor. If you do not have good I/O performance for the commit log, ingestion is going to suffer. I will let Stefan chime in with some thoughts on EC2 instance types.
>> 
>> Lastly, we welcome and appreciate community involvement. Your use case sounds really interesting, and we’ll do our best to get your questions answered and get you up and running.
>> 
>> - John
>> 
>>> On Dec 8, 2016, at 12:05 PM, Daniel Miranda <danielkza2 at gmail.com> wrote:
>>> 
>>> Forgot the links. The uncompressed storage estimates are actually for NewTS, but they should not be much different for any other Cassandra-backed TSDB without compression.
>>> 
>>> [1] https://www.adventuresinoss.com/2016/01/22/opennms-at-scale/
>>> [2] https://prometheus.io/docs/operating/storage/
>>> 
>>> On Thu, Dec 8, 2016 at 3:00 PM, Daniel Miranda <danielkza2 at gmail.com> wrote:
>>> Greetings,
>>> 
>>> I'm looking for a distributed time-series database, preferably backed by Cassandra, to help monitor about 30 instances in AWS (with the prospect of quick growth in the future). Hawkular Metrics seems interesting due to its native clustering support and use of compression, since naively using Cassandra is quite inefficient: KairosDB seems to need about 12 bytes/sample [1], which is *way* higher than other systems with custom storage backends (Prometheus can do ~1 byte/sample [2]).
>>> 
>>> I would like to know if there are any existing benchmarks for how Hawkular's ingestion and compression perform, and what kind of resources I would need to handle something like 100 samples/producer/second, hopefully with retention for 7 and 30 days (the latter with reduced precision).
>>> 
>>> My planned setup is Collectd -> Riemann -> Hawkular (?) with Grafana for visualization.
>>> 
>>> Thanks in advance,
>>> Daniel
