C* schema design

Thursday, 18 June 2015

I had been looking at the alerts schema briefly and wanted to share some thoughts on
schema design in general. My understanding is that there can be many alerts for a single
trigger. And based on my understanding and experience with RHQ, it is possible for there
to be a large number of alerts.  Here is the schema for the alerts table,

CREATE TABLE alerts (
    tenantid text,
    alertid text,
    payload text,
    PRIMARY KEY (tenantid, alertid)
)

Simple enough. Here is a short test driver[1] I wrote to insert 100 million alerts into a
3 node cluster. Note that only a single tenant is used. As the test ran, I periodically
ran nodetool status to check the load on the cluster. You can see the output of that
here[2]. You will that only of the nodes owns virtually all of load. 127.0.0.1 has over 2
GB of data while the other two have less than 500 KB each.

All of the alerts for a single tenant are stored in a single partition. In other words,
all alerts are stored in one physical row on disk. This does not scale. Regardless of the
number of nodes, only one node owns all of the data for a given tenant. This will also
probably increasingly negative impact on read performance. 

The first change I might consider is to partition by trigger, but that alone is probably
not sufficient. What if a single trigger generates thousands of alerts per day? That is
still going to lead to hot spots (i.e., partition becoming too large). I would also
consider partitioning by some time slice. Something like this,

CREATE TABLE alerts (
    tenantid text,
    triggerid text,
    date timestamp,
    alertid text,
    payload text,
    PRIMARY KEY ((tenantid, triggerid, date), alertid)
)

Now data is partitioned by tenant, trigger, and date. The date column might be rounded
down to week, day, hour, etc. It really depends on the frequency of writes as well as the
read patterns. Now the data will be spread across nodes in a more scalable way. And when
you have to fetch data across multiple time slices, you can execute the queries in
parallel.

[1] https://gist.github.com/jsanda/ce07905fa9f16f13c661
<https://gist.github.com/jsanda/ce07905fa9f16f13c661>
[2] https://gist.github.com/jsanda/6d2ee6cd5bcab07331d2
<https://gist.github.com/jsanda/6d2ee6cd5bcab07331d2>

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015