[Hawkular-dev] C* schema design
Jay Shaughnessy
jshaughn at redhat.com
Tue Jun 30 16:42:30 EDT 2015
Before getting to Alerts, maybe we should touch a little on the
persisted data in the model. At the moment everything is pretty much
partitioned on TenantId. For example:
CREATE TABLE ${keyspace}.triggers (
    tenantId text,
    // other fields...
    id text,
    PRIMARY KEY (tenantId, id)
)
In general the number of Triggers will be relatively small, likely
maxing out in the thousands. It could feasibly be a very small number.
Queries are usually going to be on a specific Trigger, but queries on
all triggers or some cross-section (likely based on the results of a
prior Tag search) are possible. All searches will likely be within
a tenant, so leaving it as is would give us single-partition searches.
Data would only be spread across partitions if multiple tenants are in
play. I'd say this is likely OK as is, or we could try to spread
things across partitions by manufacturing an additional field, like
"idbucket", that is the id hash modulo a small bucket count. That would
ensure a few partitions even if there is only one tenant.
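To make the bucket idea concrete, here is a rough sketch of what it might look like. The bucket count of 4, the way the client computes the hash, and the sample values ('acme', 'cpu-high') are all assumptions for illustration, not anything in the current schema:

```cql
-- Sketch only: 'idbucket' is a manufactured column, computed by the client
-- (for example hash(id) % 4). The bucket count of 4 is an assumption.
CREATE TABLE triggers (
    tenantId text,
    idbucket int,
    id text,
    -- other fields...
    PRIMARY KEY ((tenantId, idbucket), id)
);

-- Single trigger: the client recomputes the bucket from the id,
-- so this still hits exactly one partition.
SELECT * FROM triggers
 WHERE tenantId = 'acme' AND idbucket = 2 AND id = 'cpu-high';

-- All triggers for a tenant: now touches every bucket.
SELECT * FROM triggers
 WHERE tenantId = 'acme' AND idbucket IN (0, 1, 2, 3);
```

The trade-off is that the "all triggers for a tenant" query stops being single-partition, which is the main reason leaving it as is may be good enough.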
Whether we leave it as is or tweak it with a bucket, I would think the
trigger-actions, conditions, dampening and tags that hang off a trigger
should probably have the same partition key as the trigger. They would
get the same distribution and should only hit one partition on a query,
which typically goes through the owning trigger.
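As a rough illustration using conditions, a child table sharing the trigger's partition key might look like the following. The column names triggerId and conditionId and the sample values are assumptions here:

```cql
-- Sketch only: same partition key (tenantId) as the triggers table,
-- with the owning trigger as the first clustering column.
CREATE TABLE conditions (
    tenantId text,
    triggerId text,
    conditionId text,
    -- other fields...
    PRIMARY KEY (tenantId, triggerId, conditionId)
);

-- Everything hanging off one trigger comes from a single partition.
SELECT * FROM conditions
 WHERE tenantId = 'acme' AND triggerId = 'cpu-high';
```

If the triggers table gained a bucket column, the same column would go into the partition key here so that a trigger and its children stay together.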
Actions are queried by tenantId and actionPlugin. Again, the low
cardinality would probably make the current tenantId partition key OK,
but we could also use (tenantId,pluginName) and get distribution +
single-partition querying.
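A minimal sketch of the composite partition key option, assuming the plugin column is called actionPlugin and that actions are identified by an actionId (both names are guesses for illustration):

```cql
-- Sketch only: partition on (tenantId, actionPlugin) so each plugin's
-- actions live in their own partition.
CREATE TABLE actions (
    tenantId text,
    actionPlugin text,
    actionId text,
    -- other fields...
    PRIMARY KEY ((tenantId, actionPlugin), actionId)
);

-- The usual lookup supplies the full partition key, so it is single-partition.
SELECT * FROM actions
 WHERE tenantId = 'acme' AND actionPlugin = 'email';
```

The cost is that listing every action for a tenant regardless of plugin would need one query per plugin.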
Tags are a little funny because the category field is optional but is
also something that could be queried. When searching for Triggers the
queries are likely (tenantId,name) or (tenantId,name,category). When
displaying a Trigger it would be (tenantId,triggerId). In this case
it's probably best to store full tags in two ways: once with a clustering
key of name, and once with a clustering key of triggerId. This is
basically what we have already: we have 'name' and 'triggerId,name'. We
may be able to drop the 'name' segment from that last one. We have a
secondary index on category for both; I think we may be able to drop the
one on the tags table and keep it only on the tags_triggers table.
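Roughly, the "store tags two ways" idea could be sketched as below. The table names tags_by_name and tags_by_trigger are hypothetical and deliberately differ from the existing tables, and the column layout is an assumption rather than the current schema:

```cql
-- Sketch only: the same tag data written twice, clustered for each access path.

-- Searching for triggers by tag name.
CREATE TABLE tags_by_name (
    tenantId text,
    name text,
    triggerId text,
    category text,      -- optional
    PRIMARY KEY (tenantId, name, triggerId)
);

-- Displaying the tags of one trigger.
CREATE TABLE tags_by_trigger (
    tenantId text,
    triggerId text,
    name text,
    category text,      -- optional
    PRIMARY KEY (tenantId, triggerId, name)
);

SELECT * FROM tags_by_name    WHERE tenantId = 'acme' AND name = 'environment';
SELECT * FROM tags_by_trigger WHERE tenantId = 'acme' AND triggerId = 'cpu-high';
```

Both lookups stay within the tenant's partition. How category is best queried (secondary index or otherwise) is left open here.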
Alerts are of course the big thing. Although millions of alerts are
probably unlikely (certainly unwanted), the number is hard to predict.
But it will be the highest cardinality by far, as single triggers can
generate many alerts. The thing about alert querying, I think, is that
the most recent, unresolved alerts are very likely the most
interesting. Once an alert is resolved, and certainly once it is
resolved and old, its usefulness typically goes down considerably. At
that point queries would be more for aggregate reporting, etc.
I don't think in a well-behaved system that a trigger would generate
thousands of alerts in a day. It certainly could generate thousands, or
more, in a lifetime.
Queries for alerts are often across triggers. Report-style queries
will likely be in a time range.
For pure distribution John's suggestion of PRIMARY KEY ((tenantid,
triggerid, date), alertid) may be good but would suffer multi-partition
hits for a time-based query. We may also want to consider PRIMARY KEY
((tenantid, date), triggerId, alertid) with 'date' being some length of
time after which most alerts become less interesting, like maybe a week.
I don't know whether all the alerts generated in an entire week would be
too much for one partition; maybe a day, or 3 days, or something would
be safer.
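Here is a rough sketch of the ((tenantid, date), triggerId, alertid) variant. It assumes the client truncates 'date' to the start of whatever bucket we settle on (a week in the sample values below); the tenant, trigger and dates are made up for illustration:

```cql
-- Sketch only: 'date' holds the start of the time bucket, not the alert time.
CREATE TABLE alerts (
    tenantid text,
    date timestamp,
    triggerid text,
    alertid text,
    payload text,
    PRIMARY KEY ((tenantid, date), triggerid, alertid)
);

-- Recent alerts across all triggers: one partition for the current bucket.
SELECT * FROM alerts
 WHERE tenantid = 'acme' AND date = '2015-06-29 00:00:00+0000';

-- One trigger's alerts within that bucket: same partition, narrowed by clustering key.
SELECT * FROM alerts
 WHERE tenantid = 'acme' AND date = '2015-06-29 00:00:00+0000'
   AND triggerid = 'cpu-high';

-- A report over a time range: one partition per bucket in the range.
SELECT * FROM alerts
 WHERE tenantid = 'acme'
   AND date IN ('2015-06-15 00:00:00+0000',
                '2015-06-22 00:00:00+0000',
                '2015-06-29 00:00:00+0000');
```

A shorter bucket keeps partitions smaller but means a time-range report touches more of them, so the bucket length is really a choice between those two costs.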
We have several secondary tables for search. These should likely be
partitioned similarly.
Let's keep discussing. I'm new at this and may have missed the mark...
On 6/18/2015 12:41 PM, Lucas Ponce wrote:
>
>> CREATE TABLE alerts (
>> tenantid text,
>> triggerid text,
>> date timestamp,
>> alertid text,
>> payload text,
>> PRIMARY KEY ((tenantid, triggerid, date), alertid)
>> )
>>
> Hey John,
>
> Thanks for the tip, I had exactly this approach in my todo-list for adding triggerid in the partition key.
> Also the date is interesting.
>
> The main idea is to keep all info related to a specific trigger close together, to avoid stressing the full partition.