For whatever reason, I just saw this for the first time earlier today. If additional
capacity is needed, the primary solution should be to deploy additional Cassandra nodes in
order to decrease the load per node. Running compactions manually, i.e., major
compactions, should be considered a last resort. Cassandra runs compactions automatically.
These are referred to as minor compactions. A major compaction merges all of a table's
SSTables into one. A minor compaction under size-tiered compaction, which we use for the
data table, merges four SSTables of similar size. The settings are configurable, but we
can stick with the defaults for this discussion. The reason running major compactions is
discouraged is that, because of the way size-tiered compaction works, they often result in
minor compactions no longer being executed: the single large SSTable a major compaction
produces rarely has peers of similar size, so it is seldom compacted again. Users would
then have to set up a job to run compactions manually. Not the end of the world, but
something that definitely needs to be taken into consideration.
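To make the minor/major distinction concrete, here is a rough sketch in Python (DataStax
driver plus nodetool; the hawkular/data keyspace and table names below are placeholders,
not our actual schema) of where each knob lives:

    # Sketch only: illustrates the compaction knobs discussed above. The
    # keyspace/table names are placeholders, not the real Hawkular schema.
    import subprocess
    from cassandra.cluster import Cluster  # DataStax Python driver

    session = Cluster(["127.0.0.1"]).connect()

    # Size-tiered compaction drives the automatic (minor) compactions:
    # Cassandra merges SSTables of similar size once min_threshold of them
    # exist; 4 is the default.
    session.execute("""
        ALTER TABLE hawkular.data
        WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                           'min_threshold': '4'}
    """)

    # A major compaction, by contrast, is triggered manually and merges all
    # SSTables of the table into one; last resort only, as noted above.
    subprocess.run(["nodetool", "compact", "hawkular", "data"], check=True)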
Another thing to consider is how Cassandra handles deletes. Deleted data is not
immediately purged from disk. There is a grace period, configured per table
(gc_grace_seconds), that defaults to 10 days, and deleted data is only purged after that
grace period has passed. So if the data files on disk consist only of non-deleted data
and/or deleted data for which the grace period has not yet passed, running a major
compaction probably won't reclaim much space. With HWKMETRICS-367, we have reduced the
grace period from the default of 10 days to one day.
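For reference, since the grace period is a per-table setting, the HWKMETRICS-367 change
boils down to something like the following (the table name here is just a placeholder):

    # Sketch: adjusting the per-table grace period. The default is 864000
    # seconds (10 days); one day is 86400 seconds.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("ALTER TABLE hawkular.data WITH gc_grace_seconds = 86400")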
We need documentation that explains all of this, as well as docs that provide some sizing
guidelines. I have also floated the idea of a trigger that could either reject writes or
send an email notification when disk usage exceeds a threshold, which should be well
before the disk is full. If the disk is close to full and the data set is large enough,
running a compaction could fail, since compaction requires additional free disk space.
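To illustrate the trigger idea (this is only a sketch, not a proposed implementation; the
data directory, threshold, and mail addresses are made up), a per-node check could look
roughly like this:

    # Warn well before the disk is full, since compaction itself needs free
    # space to run. All values below are placeholders.
    import shutil
    import smtplib
    from email.message import EmailMessage

    DATA_DIR = "/var/lib/cassandra/data"   # assumed Cassandra data directory
    THRESHOLD = 0.80                       # alert at 80% used

    usage = shutil.disk_usage(DATA_DIR)
    used_fraction = usage.used / usage.total

    if used_fraction > THRESHOLD:
        msg = EmailMessage()
        msg["Subject"] = "Cassandra disk usage at {:.0%}".format(used_fraction)
        msg["From"] = "cassandra@example.org"
        msg["To"] = "admin@example.org"
        msg.set_content("{}: {} of {} bytes used.".format(
            DATA_DIR, usage.used, usage.total))
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)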
On Mar 11, 2016, at 11:55 AM, Heiko W.Rupp <hrupp(a)redhat.com> wrote:
Hey,
<captain_obvious>
so for Hawkular-metrics (and Hawkular) we store the data in a Cassandra
database that puts files on a disk,
which can get full earlier than expected (and usually on weekends). And
when the disk is full, Metrics does not like it.
</captain_obvious>
What can we do in this case?
I could imagine that on the C* nodes we run a script that uses df to
figure out the available space and tries
to run some compaction if space gets tight.
Of course that does not solve the issue per se, but should give some air
to breathe.
Right now I fear we are not able to reduce the ttl of a metric/tenant on
the fly and have metrics do the right thing - at least if I understood
yak correctly.
That script should possibly also send an email to an admin.
In case that we run Hawkular-full, we can determine the disk space and
feed that into Hawkular for Alerts to pick it up and then have the
machinery trigger the compaction and send the email.