Re: [Hawkular-dev] managing cassandra cluster

Tuesday, 6 September 2016

----- Original Message -----
...
 From: "Michael Burman" <miburman(a)redhat.com>
 To: "Discussions around Hawkular development"
 <hawkular-dev(a)lists.jboss.org>
 Sent: Tuesday, 6 September, 2016 11:09:45 AM
 Subject: Re: [Hawkular-dev] managing cassandra cluster

 Hi,

 Well, actually I would say for OpenShift we should aim for a single-container
 strategy, with both HWKMETRICS & Cassandra deployed together. This is
 to ensure that some of the operations we're going to do in the future can
 take advantage of data locality.
Hawkular Metrics and Cassandra are not cheap components to run in a cluster; they take up
a lot of resources. If we can run multiple Cassandras for every 1 Hawkular (or vice versa)
then we can use far fewer resources than if we always need to scale both up because one
has become a bottleneck.

I would really not want to bundle these together if we don't have to.

What operations would require bundling them together exactly?

...
 So unless we run a separate service inside the Cassandra containers, there's
 no easy single metric to get from Cassandra that provides "needs scaling".

This is what I was expecting. I did not expect Cassandra itself to provide a ready-made
metric that tells us whether to scale based on a specific value. We would have some service
running alongside Cassandra, monitoring the node and the cluster, to determine whether
scaling is required.

...

   - Micke

 ----- Original Message -----
 From: "Matt Wringe" <mwringe(a)redhat.com>
 To: "Discussions around Hawkular development"
 <hawkular-dev(a)lists.jboss.org>
 Sent: Tuesday, September 6, 2016 5:26:07 PM
 Subject: Re: [Hawkular-dev] managing cassandra cluster

 ----- Original Message -----
 > From: "John Sanda" <jsanda(a)redhat.com>
 > To: "Discussions around Hawkular development"
 > <hawkular-dev(a)lists.jboss.org>
 > Sent: Friday, 2 September, 2016 11:34:07 AM
 > Subject: [Hawkular-dev] managing cassandra cluster
 > 
 > To date we haven’t really done anything by way of managing/monitoring the
 > Cassandra cluster. We need to monitor Cassandra in order to know things
 > like:
 > 
 > * When additional nodes are needed
 > * When disk space is low
 > * When I/O is too slow
 > * When more heap space is needed
 > 
 > Cassandra exposes a lot of metrics. I created HWKMETRICS-448. It briefly
 > talks about collecting metrics from Cassandra. In terms of managing the
 > cluster, I will provide a few concrete examples that have come up recently
 > in OpenShift.
 > 
 > Scenario 1: User deploys additional node(s) to reduce the load on cluster
 > After the new node has bootstrapped and is running, we need to run nodetool
 > cleanup on each node (or run it via JMX) in order to remove keys/data that
 > each node no longer owns; otherwise, disk space won’t be freed up. The
 > cleanup operation can potentially be resource intensive as it triggers
 > compactions. Given this, we probably want to run it one node at a time.
 > Right now the user is left to do this manually.
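
A minimal sketch of the one-node-at-a-time cleanup described in Scenario 1, assuming nodetool is on the PATH and can reach each node with `-h`. The function names and the injectable `run` parameter are hypothetical, not part of Hawkular Metrics or Cassandra.

```python
import subprocess
from typing import Callable, Iterable, List


def cleanup_command(host: str) -> List[str]:
    """Build the nodetool invocation that runs cleanup against one node."""
    return ["nodetool", "-h", host, "cleanup"]


def rolling_cleanup(
    hosts: Iterable[str],
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Run cleanup on one node at a time, stopping at the first failure.

    Returns the hosts that were cleaned up successfully.
    """
    done = []
    for host in hosts:
        # The default runner, subprocess.call, blocks until nodetool exits,
        # so only one node is running cleanup at any given moment.
        if run(cleanup_command(host)) != 0:
            break
        done.append(host)
    return done
```

Stopping at the first non-zero exit code is a design assumption: it leaves the remaining nodes untouched so the failure can be investigated before more compactions are triggered.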
 > 
 > Scenario 2: User deploys additional node(s) to get replication and fault
 > tolerance
 > I connect to Cassandra directly via cqlsh and update replication_factor. I
 > then need to run repair on each node, which can be tricky because 1) it is
 > resource intensive, 2) it can take a long time, 3) it is prone to failure,
 > and 4) Cassandra does not give progress indicators.
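
The two steps in Scenario 2 can be sketched as follows. The CQL statement and the `nodetool repair` invocation are standard Cassandra, but the helper names, the use of SimpleStrategy, and the keyspace name passed in by the caller are assumptions made for illustration.

```python
from typing import List


def alter_replication_cql(keyspace: str, replication_factor: int) -> str:
    """Build the CQL statement to run in cqlsh to change replication_factor.

    Assumes the keyspace uses SimpleStrategy; a multi-datacenter keyspace
    would use NetworkTopologyStrategy with per-datacenter factors instead.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


def repair_command(host: str, keyspace: str) -> List[str]:
    """Build the nodetool invocation that repairs one keyspace on one node."""
    return ["nodetool", "-h", host, "repair", keyspace]
```

The repair commands would then be run on each node in turn after the ALTER KEYSPACE statement has been applied.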
 > 
 > Scenario 3: User sets up regularly scheduled repair to ensure data is
 > consistent across cluster
 > Once replication_factor > 1, repair needs to be run on a regular basis.
 > More specifically, it should be run within gc_grace_seconds, which is
 > configured per table and defaults to 10 days. The data table in metrics
 > has reduced gc_grace_seconds to 1 day, and we will probably reduce it to
 > zero since it is append-only. The value for gc_grace_seconds might vary
 > per table based on access patterns, which means the frequency of repair
 > should vary as well.
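
As a rough illustration of how the repair frequency in Scenario 3 follows from gc_grace_seconds, the sketch below maps each table's setting to the longest allowed gap between repairs. The function names are hypothetical, and treating a value of zero as "no deadline from gc_grace_seconds" is an assumption, not something stated in the thread.

```python
from datetime import timedelta
from typing import Dict, Optional

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default, 10 days


def max_repair_interval(gc_grace_seconds: int) -> Optional[timedelta]:
    """Longest gap between repairs that still falls within gc_grace_seconds.

    Returns None when gc_grace_seconds is zero (assumption: no deadline
    can be derived from it in that case).
    """
    if gc_grace_seconds <= 0:
        return None
    return timedelta(seconds=gc_grace_seconds)


def repair_intervals(tables: Dict[str, int]) -> Dict[str, Optional[timedelta]]:
    """Map each table name to its maximum repair interval."""
    return {name: max_repair_interval(secs) for name, secs in tables.items()}
```

For example, a table left at the default would need repair at least every 10 days, while a table with gc_grace_seconds set to 1 day would need it at least daily.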
 > 
 > 
 > There has already been some discussion of these things for Hawkular Metrics
 > in the context of OpenShift. It applies to all of Hawkular Services as well.
 > Initially I was thinking about building some management components directly
 > in metrics, but it probably makes more sense as a separate, shared component
 > (or components) that can be reused in both standalone metrics in OpenShift
 > and a full Hawkular Services deployment in MiQ for example.

 On OpenShift, the ideal situation here would be to have the Cassandra
 instances themselves expose a metric that we can use to determine when the
 Cassandra cluster is under too much load and needs to scale up. The HPA
 would then read this metric and automatically scale the cluster up if
 needed.

 If we determine that this cannot be done for whatever reason and that Hawkular
 Metrics needs to determine when to scale or not, there are ways we can do
 this, but it gets a little bit more tricky. If given the right permissions,
 we can go out to the cluster and do things like scale up components, perform
 operations on the Cassandra containers directly, etc. Ideally the HPA should
 be handling this, but we could get around it if absolutely needed.

 > 
 > We are already running into these scenarios in OpenShift and probably need
 > to start putting something in place sooner rather than later.
 > _______________________________________________
 > hawkular-dev mailing list
 > hawkular-dev(a)lists.jboss.org
 > https://lists.jboss.org/mailman/listinfo/hawkular-dev
 > 

