Re: [Hawkular-dev] managing cassandra cluster

Monday, 12 September 2016

...
 On Sep 6, 2016, at 11:45 AM, Matt Wringe <mwringe(a)redhat.com> wrote:

 ----- Original Message -----
> From: "Michael Burman" <miburman(a)redhat.com>
> To: "Discussions around Hawkular development" <hawkular-dev(a)lists.jboss.org>
> Sent: Tuesday, 6 September, 2016 11:09:45 AM
> Subject: Re: [Hawkular-dev] managing cassandra cluster
> 
> Hi,
> 
> Well, actually I would say that for OpenShift we should aim for a single-container
> strategy, with both HWKMETRICS and Cassandra deployed together. This is to ensure
> that some of the operations we're going to do in the future can take advantage
> of data locality.

 Hawkular Metrics and Cassandra are not cheap components to run in a cluster; they take up
a lot of resources. If we can run several Cassandra nodes for every Hawkular Metrics instance
(or vice versa), then we use far fewer resources than if we always have to scale both up
whenever one of them becomes a bottleneck.

 I would rather not bundle these together if we don't have to.

 Which operations, exactly, would require bundling them together?

> So unless we run a separate service inside the Cassandra containers, there's
> no easy single metric to get from Cassandra that provides "needs scaling".

 This is what I expected. I was not expecting Cassandra itself to provide a convenient
default metric with a specific value that tells us whether to scale. We would have some
service running alongside Cassandra, monitoring it and the cluster to determine whether
scaling is required.

Here’s what I had in mind, at least for an initial effort: we package our own metrics
reporter [1] with Cassandra, and it pushes data to Hawkular Metrics. We then provide a
management endpoint that indicates whether scaling is necessary, based on the metrics we
collect.

[1] http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra...
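To make the proposed management endpoint more concrete, here is a minimal Python sketch of the decision it might make. Everything in it is an assumption for illustration. The NodeMetrics fields, the threshold values, and the scaling_decision function are hypothetical and are not part of Cassandra or Hawkular Metrics. The sketch compares cluster-wide averages of a few collected readings against configurable limits.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class NodeMetrics:
    """Hypothetical per-node readings, e.g. as pushed by a metrics reporter."""
    disk_used_fraction: float   # fraction of the data volume in use, 0.0 to 1.0
    read_latency_p99_ms: float  # 99th percentile read latency in milliseconds
    pending_compactions: int    # compaction tasks waiting to run


@dataclass(frozen=True)
class Thresholds:
    """Illustrative limits only; real values would need tuning per deployment."""
    disk_used_fraction: float = 0.70
    read_latency_p99_ms: float = 50.0
    pending_compactions: int = 100


def scaling_decision(nodes: List[NodeMetrics],
                     limits: Thresholds = Thresholds()) -> dict:
    """Return {"needs_scaling": bool, "reasons": [...]} for the whole cluster."""
    if not nodes:
        return {"needs_scaling": False, "reasons": []}
    reasons = []
    for field in ("disk_used_fraction", "read_latency_p99_ms", "pending_compactions"):
        # Average across nodes: adding a node spreads load over the whole ring,
        # so a cluster-wide average is what scaling out is expected to lower.
        value = mean(getattr(n, field) for n in nodes)
        limit = getattr(limits, field)
        if value > limit:
            reasons.append(f"average {field} {value:.2f} exceeds {limit}")
    return {"needs_scaling": bool(reasons), "reasons": reasons}
```

The endpoint would then just serialize this dict. Returning the reasons alongside the boolean makes it easier for an operator to see why scaling was recommended. Heap usage is deliberately left out, since a shortage of heap is usually fixed by resizing the heap rather than by adding nodes.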
...

> 
>  - Micke
> 
> ----- Original Message -----
> From: "Matt Wringe" <mwringe(a)redhat.com>
> To: "Discussions around Hawkular development" <hawkular-dev(a)lists.jboss.org>
> Sent: Tuesday, September 6, 2016 5:26:07 PM
> Subject: Re: [Hawkular-dev] managing cassandra cluster
> 
> 
> 
> ----- Original Message -----
>> From: "John Sanda" <jsanda(a)redhat.com>
>> To: "Discussions around Hawkular development"
>> <hawkular-dev(a)lists.jboss.org>
>> Sent: Friday, 2 September, 2016 11:34:07 AM
>> Subject: [Hawkular-dev] managing cassandra cluster
>> 
>> To date we haven’t really done anything by way of managing/monitoring the
>> Cassandra cluster. We need to monitor Cassandra in order to know things
>> like:
>> 
>> * When additional nodes are needed
>> * When disk space is low
>> * When I/O is too slow
>> * When more heap space is needed
>> 
>> Cassandra exposes a lot of metrics. I created HWKMETRICS-448, which briefly
>> discusses collecting metrics from Cassandra. In terms of managing the
>> cluster, here are a few concrete examples that have come up recently in
>> OpenShift.
>> 
>> Scenario 1: User deploys additional node(s) to reduce the load on the cluster
>> After the new node has bootstrapped and is running, we need to run nodetool
>> cleanup on each node (or run it via JMX) in order to remove keys/data that
>> each node no longer owns; otherwise, disk space won’t be freed up. The
>> cleanup operation can be resource intensive because it triggers
>> compactions, so we probably want to run it one node at a time.
>> Right now the user is left to do this manually.
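A minimal Python sketch of running cleanup one node at a time follows. It assumes nodetool is on the PATH and that each node's JMX port is reachable with `nodetool -h <host>` (no JMX authentication). The host names and the rolling_cleanup helper are hypothetical. The runner is injectable so the ordering logic can be exercised without a real cluster.

```python
import subprocess
from typing import Callable, List, Sequence


def run_nodetool(host: str, *args: str) -> None:
    """Invoke nodetool against a remote node; raises CalledProcessError on failure."""
    subprocess.run(["nodetool", "-h", host, *args], check=True)


def rolling_cleanup(hosts: Sequence[str],
                    run: Callable[..., None] = run_nodetool) -> List[str]:
    """Run `nodetool cleanup` on each node in turn, never in parallel.

    Cleanup triggers compactions, so running it serially limits the extra
    I/O load to one node at a time. An exception from a failed node stops
    the loop, so an operator can investigate before the rest are touched.
    """
    completed = []
    for host in hosts:
        run(host, "cleanup")  # returns only after this node's cleanup ends
        completed.append(host)
    return completed
```

Because `nodetool cleanup` normally does not return until the operation has finished on that node, a plain loop is enough to keep the work serial.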
>> 
>> Scenario 2: User deploys additional node(s) to get replication and fault
>> tolerance
>> I connect to Cassandra directly via cqlsh and update replication_factor. I
>> then need to run repair on each node, which can be tricky because 1) it is
>> resource intensive, 2) it can take a long time, 3) it is prone to failure,
>> and 4) Cassandra does not give progress indicators.
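The two steps in this scenario could be sketched in Python as follows. The sketch assumes a single data center (hence SimpleStrategy) and nodetool reachable via `nodetool -h <host>`. The keyspace name hawkular_metrics used in the usage is itself an assumption, so substitute the real one. The helper names and the retry count are hypothetical. Retrying copes with repairs that fail part way through, but does nothing about the missing progress reporting.

```python
import subprocess
from typing import Callable, List, Sequence


def alter_replication_cql(keyspace: str, replication_factor: int) -> str:
    """CQL to run in cqlsh; SimpleStrategy assumes a single data center."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};")


def run_nodetool(host: str, *args: str) -> None:
    """Invoke nodetool against a remote node; raises CalledProcessError on failure."""
    subprocess.run(["nodetool", "-h", host, *args], check=True)


def repair_each_node(hosts: Sequence[str], keyspace: str,
                     run: Callable[..., None] = run_nodetool,
                     attempts: int = 3) -> List[str]:
    """Repair one node at a time, retrying a failing node up to `attempts` times.

    Returns the hosts whose repair never succeeded, so they can be retried later.
    """
    failed = []
    for host in hosts:
        for attempt in range(1, attempts + 1):
            try:
                run(host, "repair", keyspace)
                break  # this node is done; move on to the next one
            except subprocess.CalledProcessError:
                if attempt == attempts:
                    failed.append(host)
    return failed
```

For example, `alter_replication_cql("hawkular_metrics", 2)` produces the statement to paste into cqlsh. After that, `repair_each_node(hosts, "hawkular_metrics")` walks the nodes serially.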
>> 
>> Scenario 3: User sets up regularly scheduled repair to ensure data is
>> consistent across the cluster
>> Once replication_factor > 1, repair needs to be run on a regular basis.
>> More specifically, it should be run within gc_grace_seconds, which is
>> configured per table and defaults to 10 days. The data table in Metrics
>> already has gc_grace_seconds reduced to 1 day, and we can probably reduce
>> it to zero since the table is append-only. The value of gc_grace_seconds
>> might vary per table based on access patterns, which means the frequency
>> of repair should vary as well.
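Because the repair frequency has to follow each table's gc_grace_seconds, the scheduling rule can be sketched in Python as below. The 0.5 safety factor and the function names are assumptions for illustration. In the usage, the data table uses the 1 day value from the text, and other_table is a hypothetical table left at the default.

```python
from datetime import timedelta
from typing import Dict, Optional

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's per-table default: 10 days


def repair_interval(gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
                    safety_factor: float = 0.5) -> Optional[timedelta]:
    """Interval between repair runs so one completes in every gc_grace window.

    safety_factor (an assumption, not a Cassandra setting) leaves headroom for
    repairs that run long or have to be retried. Returns None when
    gc_grace_seconds is 0, because there is no grace window to schedule
    against; such tables need a separately chosen policy.
    """
    if gc_grace_seconds < 0:
        raise ValueError("gc_grace_seconds cannot be negative")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    if gc_grace_seconds == 0:
        return None
    return timedelta(seconds=gc_grace_seconds * safety_factor)


def repair_schedule(tables: Dict[str, int],
                    safety_factor: float = 0.5) -> Dict[str, Optional[timedelta]]:
    """Map each table name to its repair interval, given its gc_grace_seconds."""
    return {name: repair_interval(gc, safety_factor) for name, gc in tables.items()}
```

With these assumptions, `repair_schedule({"data": 86_400, "other_table": 864_000})` yields a 12 hour interval for data and 5 days for other_table. That illustrates why a single cluster-wide repair cadence would not fit every table.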
>> 
>> 
>> There has already been some discussion of these things for Hawkular Metrics
>> in the context of OpenShift, but it applies to all of Hawkular Services as
>> well. Initially I was thinking about building some management components
>> directly into Metrics, but it probably makes more sense to build them as a
>> separate, shared component (or components) that can be reused both by
>> stand-alone Metrics in OpenShift and by a full Hawkular Services deployment
>> in MiQ, for example.
> 
> On OpenShift, the ideal situation would be for the Cassandra instances
> themselves to expose a metric that we can use to determine when the
> Cassandra cluster is under too much load and needs to scale up. The HPA
> (Horizontal Pod Autoscaler) would then read this metric and automatically
> scale the cluster up if needed.
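For reference, the Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It skips changes while that ratio is within a tolerance (0.1 by default). The small Python model below uses hypothetical function and parameter names and is not the real controller. It shows how a Cassandra-exposed load metric would translate into a replica count.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_value: float,
                         target_value: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Model of the HPA scaling rule for a single metric."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave replicas alone
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (max_replicas=10 is an arbitrary example).
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas reporting 90 against a target of 60 would be scaled to 5. One caveat specific to Cassandra is that this rule also scales down when load drops. Removing a Cassandra node safely requires decommissioning it (streaming its data to the remaining nodes) rather than simply deleting the pod.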
> 
> If we determine that this cannot be done for whatever reason, and that Hawkular
> Metrics needs to decide when to scale, there are ways we can do this, but it
> gets a little trickier. Given the right permissions, we can reach out to the
> cluster and do things like scale up components, perform operations on the
> Cassandra containers directly, etc. Ideally the HPA should be handling this,
> but we could work around it if absolutely needed.
> 
>> 
>> We are already running into these scenarios in OpenShift and probably need
>> to start putting something in place sooner rather than later.
>> _______________________________________________
>> hawkular-dev mailing list
>> hawkular-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>> 
> 
