I had been looking at the alerts schema briefly and wanted to share some thoughts on schema design in general. My understanding is that there can be many alerts for a single trigger, and based on my experience with RHQ, the total number of alerts can grow quite large. Here is the schema for the alerts table,

CREATE TABLE alerts (
    tenantid text,
    alertid text,
    payload text,
    PRIMARY KEY (tenantid, alertid)
)

Simple enough. Here is a short test driver[1] I wrote to insert 100 million alerts into a 3-node cluster. Note that only a single tenant is used. As the test ran, I periodically ran nodetool status to check the load on the cluster. You can see the output of that here[2]. You will see that only one of the nodes owns virtually all of the load: 127.0.0.1 has over 2 GB of data while the other two have less than 500 KB each.

All of the alerts for a single tenant are stored in a single partition. In other words, all alerts are stored in one physical row on disk. This does not scale. Regardless of the number of nodes, only one node owns all of the data for a given tenant. This will also likely have an increasingly negative impact on read performance as the partition grows.

The first change I might consider is to partition by trigger, but that alone is probably not sufficient. What if a single trigger generates thousands of alerts per day? That will still lead to hot spots (i.e., partitions becoming too large). I would also partition by some time slice. Something like this,

CREATE TABLE alerts (
    tenantid text,
    triggerid text,
    date timestamp,
    alertid text,
    payload text,
    PRIMARY KEY ((tenantid, triggerid, date), alertid)
)

Now data is partitioned by tenant, trigger, and date. The date column might be rounded down to the week, day, hour, etc. That really depends on the frequency of writes as well as the read patterns. With this design the data is spread across nodes in a much more scalable way. And when you have to fetch data across multiple time slices, you can execute the queries in parallel. I have sketched what the write and read sides might look like after the links below.

[1] https://gist.github.com/jsanda/ce07905fa9f16f13c661
[2] https://gist.github.com/jsanda/6d2ee6cd5bcab07331d2
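To make the time slicing concrete, here is a minimal sketch of the write path against the revised schema, assuming day-level buckets. The tenant, trigger, and alert ids and the payloads are made up for illustration, and the client would be responsible for computing the date column by truncating each alert's timestamp,

-- Two alerts fired on the same day by the same trigger. The date column is
-- the alert's timestamp truncated to midnight, so both rows land in the
-- same partition.
INSERT INTO alerts (tenantid, triggerid, date, alertid, payload)
VALUES ('t1', 'trigger-1', '2014-07-01', 'alert-1001', '...');

INSERT INTO alerts (tenantid, triggerid, date, alertid, payload)
VALUES ('t1', 'trigger-1', '2014-07-01', 'alert-1002', '...');

-- An alert from the next day goes to a different partition, which is what
-- spreads a busy trigger's data across the cluster over time.
INSERT INTO alerts (tenantid, triggerid, date, alertid, payload)
VALUES ('t1', 'trigger-1', '2014-07-02', 'alert-1003', '...');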
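And on the read side, fetching a window that spans several buckets turns into one single-partition query per slice, which the client can issue concurrently (e.g., with the driver's async execute) and then merge. Again a sketch with made-up ids and day buckets,

-- Three single-partition queries covering July 1-3. Each hits exactly one
-- partition, so a client can fire them in parallel and combine the results.
SELECT alertid, payload FROM alerts
WHERE tenantid = 't1' AND triggerid = 'trigger-1' AND date = '2014-07-01';

SELECT alertid, payload FROM alerts
WHERE tenantid = 't1' AND triggerid = 'trigger-1' AND date = '2014-07-02';

SELECT alertid, payload FROM alerts
WHERE tenantid = 't1' AND triggerid = 'trigger-1' AND date = '2014-07-03';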