I have some numbers to share for compression. I ran a test simulation that stored 7 days
of data points for 5,000 metrics at a 10 second sampling rate. I use a 10 second sampling
rate a lot since that is what has been used in OpenShift. I used only gauge metrics and a
uniform distribution for the values. The size of the raw data on disk *without* gorilla
compression was 192 MB. We are still using Cassandra’s default compression, LZ4. With
gorilla compression the size of the live data on disk is 4.2 MB, which is in stark
contrast with the size of the raw data. There are several things to note. First, we
compress data in 2 hour blocks by default, which makes it a bit difficult to say how big a
compressed sample is. The data set affects the overall compression as well. It would still
be nice if we published something about the compressed sample size, even if it comes with
some caveats.
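As a rough back-of-envelope check, the figures above work out to the per-sample sizes
below (a Python sketch; the exact numbers depend on the data set and on the 2 hour block
boundaries, so treat them as averages, not exact costs):

```python
# Back-of-envelope math for the test above: 7 days of 10-second samples
# for 5,000 gauge metrics. Megabytes are taken as decimal (10^6 bytes).
days, metrics, interval_s = 7, 5_000, 10

samples = metrics * (days * 86_400 // interval_s)

raw_mb = 192.0      # on disk with only Cassandra's default LZ4
gorilla_mb = 4.2    # live data on disk with gorilla compression

raw_bytes_per_sample = raw_mb * 1e6 / samples
gorilla_bytes_per_sample = gorilla_mb * 1e6 / samples

print(samples)                             # 302400000
print(round(raw_bytes_per_sample, 3))      # 0.635
print(round(gorilla_bytes_per_sample, 4))  # 0.0139
print(round(raw_mb / gorilla_mb, 1))       # 45.7x reduction
```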
I mentioned that the 4.2 MB is the size of the *live* data. As hawkular-metrics ingests
data, it does not cache and compress it in memory. Instead, the data points are written to
the data table just as they were before gorilla compression was introduced. A background
job that runs every 2 hours does the compression. When a 2 hour block of data points for a
time series has been compressed and persisted, the corresponding raw data is deleted. If
you are familiar with Cassandra, you may be aware that deletes do not happen immediately.
If hawkular-metrics is running with a single Cassandra node and/or the replication_factor
is 1, then gc_grace_seconds is set to zero, so deleted data should get purged pretty
quickly. For a multi-node Cassandra cluster with replication, gc_grace_seconds is set to
7 days.
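For illustration, the 2 hour block alignment can be sketched like this (hypothetical
Python, not the actual hawkular-metrics code, which may compute boundaries differently):

```python
# Hypothetical sketch of 2-hour block alignment for the compression job.
BLOCK_MS = 2 * 60 * 60 * 1000  # one compression block, in milliseconds

def block_start(timestamp_ms: int) -> int:
    """Start of the 2-hour block that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % BLOCK_MS)

# 2016-12-12T23:41:00Z falls in the 22:00-24:00 UTC block.
ts = 1_481_586_060_000
print(block_start(ts))  # 1481580000000, i.e. 2016-12-12T22:00:00Z
```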
These are a lot of little details, but they can affect the actual numbers you see, which
is why I stressed that I was measuring the size of the live data. I hope this is helpful.
- John
On Dec 12, 2016, at 11:41 PM, John Sanda <jsanda@redhat.com> wrote:
While this is anecdotal, it offers a bit of perspective. I have done testing on my laptop
with sustained ingestion rates of 5,000 and 6,000 samples / 10 seconds without a single
dropped mutation. This is with a single Cassandra 3.0.9 node created with ccm on a laptop
running Fedora 24 with 16 GB RAM, 8 cores, and an SSD. We also use a test environment with
virtual machines and shared storage where we might have a tough time sustaining those
ingestion rates on a single node, depending on the time of day.
> On Dec 12, 2016, at 11:22 PM, John Sanda <jsanda@redhat.com> wrote:
>
> Hey Daniel,
>
> Sorry for the late reply. The person who did all the compression work (gayak on
freenode) probably will not be around much for the rest of the year. He would be the best
person to answer questions on compression; however, I should have some numbers to report
back to you tomorrow.
>
> With respect to performance, handling 100 samples/second is not a problem, but just
like with any other Cassandra-backed TSDB, your hardware configuration is going to be a
big factor. If you do not have good I/O performance for the commit log, ingestion is
going to suffer. I will let Stefan chime in with some thoughts on EC2 instance types.
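As an illustration of that point, cassandra.yaml lets you isolate the commit log from
data file I/O by putting the two on separate devices (the paths below are illustrative,
not a recommendation for any specific deployment):

```yaml
# cassandra.yaml (illustrative paths): keep the commit log on its own
# fast device so its sequential writes do not compete with data file I/O.
commitlog_directory: /var/lib/cassandra/commitlog   # ideally a dedicated SSD
data_file_directories:
    - /var/lib/cassandra/data
```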
>
> Lastly, we welcome and appreciate community involvement. Your use case sounds really
interesting, and we’ll do our best to get your questions answered and get you up and
running.
>
> - John
>
>> On Dec 8, 2016, at 12:05 PM, Daniel Miranda <danielkza2@gmail.com> wrote:
>>
>> Forgot the links. The uncompressed storage estimates are actually for NewTS, but
they should not be much different for any other Cassandra-backed TSDB without
compression.
>>
>> [1] https://www.adventuresinoss.com/2016/01/22/opennms-at-scale/
>> [2] https://prometheus.io/docs/operating/storage/
>>
>> On Thu, Dec 8, 2016 at 3:00 PM, Daniel Miranda <danielkza2@gmail.com> wrote:
>> Greetings,
>>
>> I'm looking for a distributed time-series database, preferably backed by Cassandra,
to help monitor about 30 instances in AWS (with a prospect of quick growth in the
future). Hawkular Metrics seems interesting due to its native clustering support and use
of compression, since naively using Cassandra is quite inefficient - KairosDB seems to
need about 12 B/sample [1], which is *way* higher than other systems with custom storage
backends (Prometheus can do ~1 B/sample [2]).
>>
>> I would like to know if there are any existing benchmarks for how Hawkular's
ingestion and compression perform, and what kind of resources I would need to handle
something like 100 samples/producer/second, hopefully with retention for 7 and 30 days
(the latter with reduced precision).
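For a rough sense of scale, and assuming each of the 30 instances is a single producer
(an assumption, since the question does not say how many producers per instance), the
quoted per-sample figures imply:

```python
# Rough sizing sketch: 30 producers at 100 samples/s each, taking the
# ~12 B/sample [1] and ~1 B/sample [2] figures at face value.
producers, rate, retention_days = 30, 100, 7

samples = producers * rate * retention_days * 86_400
print(samples)  # 1814400000 samples over 7 days

for name, bytes_per_sample in [("KairosDB-style", 12), ("Prometheus-style", 1)]:
    gb = samples * bytes_per_sample / 1e9
    print(f"{name}: ~{gb:.1f} GB")
# KairosDB-style: ~21.8 GB
# Prometheus-style: ~1.8 GB
```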
>>
>> My planned setup is Collectd -> Riemann -> Hawkular (?) with Grafana for
visualization.
>>
>> Thanks in advance,
>> Daniel
>> _______________________________________________
>> hawkular-dev mailing list
>> hawkular-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>