On Sep 6, 2016, at 3:43 AM, Thomas Heute <theute(a)redhat.com> wrote:
Agreed.
What user interface do you have in mind here? CLI? JMX? WebUI?
On Fri, Sep 2, 2016 at 5:34 PM, John Sanda <jsanda(a)redhat.com> wrote:
To date we haven’t really done anything by way of managing/monitoring the Cassandra
cluster. We need to monitor Cassandra in order to know things like:
* When additional nodes are needed
* When disk space is low
* When I/O is too slow
* When more heap space is needed
Cassandra exposes a lot of metrics. I created HWKMETRICS-448, which briefly talks about
collecting metrics from Cassandra. In terms of managing the cluster, here are a few
concrete examples that have come up recently in OpenShift.
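For illustration, here is a rough sketch of pulling a single Cassandra metric over JMX
(the storage Load counter). The host, the default JMX port 7199, and the MBean/attribute
names are assumptions and may vary by Cassandra version and JMX configuration; this is
just to show the shape of it, not a proposed implementation.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch: read one Cassandra metric (live data size on disk) over JMX.
    // Assumes JMX is enabled on port 7199 with no authentication.
    public class CassandraMetricProbe {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                // Storage load = bytes of live SSTable data managed by this node
                // (object name/attribute assumed; check the version you run)
                ObjectName load = new ObjectName(
                        "org.apache.cassandra.metrics:type=Storage,name=Load");
                Object bytes = mbsc.getAttribute(load, "Count");
                System.out.println(host + " load (bytes): " + bytes);
            }
        }
    }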
Scenario 1: User deploys additional node(s) to reduce the load on the cluster
After the new node has bootstrapped and is running, we need to run nodetool cleanup on
each node (or run it via JMX) in order to remove keys/data that each node no longer
owns; otherwise, disk space won’t be freed up. The cleanup operation can potentially be
resource intensive as it triggers compactions. Given this, we probably want to run it one
node at a time. Right now the user is left to do this manually.
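Something like this sketch is what I mean by running cleanup one node at a time. The
host names are placeholders, and it assumes nodetool is on the PATH of whatever drives
the operation.

    import java.util.Arrays;
    import java.util.List;

    // Sketch: run "nodetool cleanup" serially so only one node is compacting
    // at a time. Host names are illustrative.
    public class SerialCleanup {
        public static void main(String[] args) throws Exception {
            List<String> nodes = Arrays.asList("cassandra-1", "cassandra-2", "cassandra-3");
            for (String node : nodes) {
                ProcessBuilder pb = new ProcessBuilder("nodetool", "-h", node, "cleanup");
                pb.inheritIO();
                int exit = pb.start().waitFor();
                if (exit != 0) {
                    throw new IllegalStateException(
                            "cleanup failed on " + node + " (exit " + exit + ")");
                }
            }
        }
    }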
Scenario 2: User deploys additional node(s) to get replication and fault tolerance
I connect to Cassandra directly via cqlsh and update replication_factor. I then need to
run repair on each node, which can be tricky because 1) it is resource intensive, 2) it can
take a long time, 3) it is prone to failure, and 4) Cassandra does not give progress indicators.
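A rough sketch of what scenario 2 could look like if we automate it. The keyspace name,
replication factor, and hosts are placeholders, and it assumes cqlsh and nodetool are
available; it does nothing about progress reporting or resuming a failed repair.

    import java.util.Arrays;
    import java.util.List;

    // Sketch: bump replication_factor once, then run repair one node at a time.
    public class IncreaseReplication {
        static void run(String... cmd) throws Exception {
            ProcessBuilder pb = new ProcessBuilder(cmd);
            pb.inheritIO();
            int exit = pb.start().waitFor();
            if (exit != 0) {
                throw new IllegalStateException(String.join(" ", cmd) + " exited with " + exit);
            }
        }

        public static void main(String[] args) throws Exception {
            // Update replication_factor (done once, against any node)
            run("cqlsh", "cassandra-1", "-e",
                "ALTER KEYSPACE hawkular_metrics WITH replication = "
              + "{'class': 'SimpleStrategy', 'replication_factor': 2}");

            // Repair must then run on every node; keep it serial because repair
            // is resource intensive and long running.
            List<String> nodes = Arrays.asList("cassandra-1", "cassandra-2", "cassandra-3");
            for (String node : nodes) {
                run("nodetool", "-h", node, "repair", "hawkular_metrics");
            }
        }
    }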
Scenario 3: User sets up regularly scheduled repair to ensure data is consistent across
the cluster
Once replication_factor > 1, repair needs to be run on a regular basis. More
specifically, it should be run within gc_grace_seconds, which is configured per table and
defaults to 10 days. The data table in metrics has gc_grace_seconds reduced to 1 day, and we
will probably reduce it to zero since the data is append-only. The value of gc_grace_seconds
might vary per table based on access patterns, which means the frequency of repair should vary
as well.
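A very rough sketch of scheduled repair driven off gc_grace_seconds. The interval, hosts,
and keyspace are placeholders; a real component would read per-table gc_grace_seconds and
track repair progress and failures rather than hard-coding anything.

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Sketch: schedule repair so every node is repaired well within gc_grace_seconds.
    public class ScheduledRepair {
        private static final List<String> NODES =
                Arrays.asList("cassandra-1", "cassandra-2", "cassandra-3");
        private static final long GC_GRACE_SECONDS = TimeUnit.DAYS.toSeconds(10);

        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Repair twice per gc_grace window to leave headroom for failures/retries.
            long intervalSeconds = GC_GRACE_SECONDS / 2;
            scheduler.scheduleAtFixedRate(ScheduledRepair::repairAllNodes,
                    0, intervalSeconds, TimeUnit.SECONDS);
        }

        private static void repairAllNodes() {
            for (String node : NODES) {
                try {
                    ProcessBuilder pb = new ProcessBuilder("nodetool", "-h", node,
                            "repair", "hawkular_metrics");
                    pb.inheritIO();
                    int exit = pb.start().waitFor();
                    if (exit != 0) {
                        System.err.println("repair failed on " + node + " (exit " + exit + ")");
                    }
                } catch (Exception e) {
                    System.err.println("repair failed on " + node + ": " + e.getMessage());
                }
            }
        }
    }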
There has already been some discussion of these things for Hawkular Metrics in the
context of OpenShift, but it applies to all of Hawkular Services as well. Initially I was
thinking about building some management components directly into metrics, but this probably
makes more sense as a separate, shared component (or components) that can be reused both by
standalone metrics in OpenShift and by a full Hawkular Services deployment in MiQ, for
example.
We are already running into these scenarios in OpenShift and probably need to start
putting something in place sooner rather than later.
_______________________________________________
hawkular-dev mailing list
hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev