Re: [Hawkular-dev] metrics on the bus

Thursday, 18 August 2016

It's possible the solution proposed by John should be the ultimate 
goal.  It has the advantage of creating a distribution of 
Metrics+Alerting while also solving an H Services performance issue.  
But I don't really buy this argument, which distills down to "what if
there are bugs?".  And I don't really buy into John's argument that the
JMS use in H Services has an inherent performance issue.  Without the 
full publishing of all metrics, and the subsequent filtering by 
alerting, I'm sure it can handle what we throw at it and plenty more.  
Finally, I don't really agree with Lucas that John's approach is a major
architectural change; it presents a packaging issue and adds some
integration code.  It doesn't propose removal of JMS/bus in H Services.

So personally, I don't have an issue with PR-568 as an immediate 
solution to the existing performance issue. I also don't mind if we 
close it and immediately implement John's approach as it potentially 
gives us two wins for the price of one.  And I also don't mind if PR-568 
is an interim solution if we decide to defer the co-packaging approach.  
I think we need a decision from Heiko and Thomas as to how to proceed, 
and minimally, I appreciate that Lucas has been proactive in providing 
at least one solution.

On 8/17/2016 6:57 PM, Stefan Negrea wrote:
...
 One of the contentions that I have with PR-568 is that it introduces
 more failure points for data prior to reaching Alerts. Publishing all
 data directly is a simple proposition: data comes in, is persisted to
 Cassandra, and at the same time sent via JMS. The PR introduces
 multiple additional failure points, and a failure in Metrics will
 go unnoticed. For example, what if the filtering mechanism all of a
 sudden crashed, what then? What if the data being filtered does not
 match the expectations from Alerts, as in Alerts requested data for a
 metric id to be sent but Metrics lost track of that and does not
 report data for that metric id?

 Going back to the replies from Randall, in order for PR-568 to be an 
 alternative to what is done today, we will need to design a lot of 
 additional features to get the same level of delivery confidence and 
 guarantee that we have today (without the PR).

 https://github.com/hawkular/hawkular-metrics/pull/568

 Thank you,
 Stefan Negrea

 On Wed, Aug 17, 2016 at 4:45 PM, Jay Shaughnessy <jshaughn(a)redhat.com> wrote:

     +1.  Although Randall is right that there is definitely a chance of
     inconsistency between what is persisted and what is processed by
     alerting, I think it's acceptable for our purposes.  In general users
     have historically accepted that server downtime can result in missed
     alerts.  Moreover, almost all of the alerting scenarios involve
     behavior over time.

     On 8/17/2016 5:44 AM, Michael Burman wrote:
     > Hi,
     >
     > Storing to Cassandra and JMS is not atomic, as Cassandra does not
     > provide transactions and especially not 2PC. So they're two
     > different writes and can always result in inconsistency, no matter
     > the secondary transport protocol. Also, is alerts even capable of
     > handling all the possible crash scenarios? And do we even care
     > about such a small window of potential data loss to the alerting
     > engine in the case of a crash (which will take down both metrics &
     > alerts on that node)? We don't provide strict consistency with the
     > default metrics settings either, which default to acknowledgement
     > from a single Cassandra node. There are multiple theoretical
     > scenarios where we could lose data or get inconsistencies in a
     > multi-node setup.
     >
     > I think these are acceptable for our use case, however. Even
     > assuming we lose one "node down" datapoint, that same situation
     > probably persists for the next datapoint, so the alert still
     > triggers. If you lose one metric datapoint from a bucket, the
     > calculated averages, percentiles, etc. only suffer a minor loss of
     > precision. Not to mention that almost everything in monitoring is
     > already discrete information sampled at a certain point in time and
     > not a continuous real value, so precision is lost before it even
     > arrives to us.
     >
     > For those reasons I'd say these "problems" are more academic,
     > without any real-world implications in this domain.
     >
     >    - Micke
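
To put a rough number on the precision point above, here is a small sketch with invented values (the sample data and the 60-point bucket size are assumptions for illustration, not Hawkular output). It computes a bucket's mean and a nearest-rank 95th percentile with and without one lost datapoint.

```python
import statistics

# Hypothetical bucket of 60 samples; the values are invented for illustration.
bucket = [100.0 + (i % 7) for i in range(60)]


def summarize(points):
    """Return (mean, nearest-rank 95th percentile) for a list of samples."""
    ordered = sorted(points)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integer math.
    rank = max(1, -(-95 * len(ordered) // 100))
    return statistics.fmean(ordered), ordered[rank - 1]


full_mean, full_p95 = summarize(bucket)
# Simulate losing a single datapoint from the middle of the bucket.
lossy_mean, lossy_p95 = summarize(bucket[:30] + bucket[31:])

relative_error = abs(full_mean - lossy_mean) / full_mean
print(f"mean {full_mean:.3f} -> {lossy_mean:.3f} (relative error {relative_error:.5%})")
print(f"p95  {full_p95} -> {lossy_p95}")
```

With this particular data the mean moves by well under a tenth of a percent. Real buckets with few samples or large outliers would of course be more sensitive.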
     >
     > ----- Original Message -----
     > From: "Randall Hauch" <rhauch(a)redhat.com>
     > To: "Discussions around Hawkular development" <hawkular-dev(a)lists.jboss.org>
     > Sent: Tuesday, August 16, 2016 7:42:56 PM
     > Subject: Re: [Hawkular-dev] metrics on the bus
     >
     > I agree that the distributed system is probably more fault
     > tolerant when using JMS than putting everything into a single app
     > and forgoing JMS.
     >
     > BTW, does metrics write data to Cassandra and publish to JMS
     > atomically? If not, that’s also a window for failure that might
     > result in data loss. Something to consider if Hawkular requires
     > complete consistency and can’t afford data loss.
     >
     >> On Aug 16, 2016, at 11:08 AM, John Sanda <jsanda(a)redhat.com> wrote:
     >>
     >> With the JMS solution we have in place right now, data points
     >> are published after they have been persisted in Cassandra. We can
     >> certainly keep that same behavior.
     >>
     >>> On Aug 16, 2016, at 11:49 AM, Randall Hauch <rhauch(a)redhat.com> wrote:
     >>>
     >>> Sorry, I’ve been lurking. One thing to consider is how each
     >>> approach handles failures. For example, what happens if the system
     >>> crashes after data is processed by metrics but before alerts picks
     >>> it up? Will the system become inconsistent or will some events be
     >>> lost before alerts sees them?
     >>>
     >>> Really, in order for the system to be completely fault
     >>> tolerant, each component has to be completely atomic. Components
     >>> that use “dual writes” (e.g., write to one system, then write to
     >>> another outside of a larger transaction) will always be subject to
     >>> losing data/events during a very inopportune failure. Not only
     >>> that, a system comprised of multiple components that individually
     >>> are safe might still be subject to losing data/events.
     >>>
     >>> I hope this is helpful.
     >>>
     >>> Randall
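
The "dual write" window described above can be illustrated with a minimal simulation. Everything here is a stand-in: the two lists play the roles of the Cassandra write and the JMS publish, and `SimulatedCrash`, `ingest`, and the sample points are hypothetical names, not Hawkular code. The ordering (persist, then publish) follows what the thread says the current JMS solution does.

```python
class SimulatedCrash(Exception):
    """Stands in for a process dying at an inopportune moment."""


store = []  # stand-in for the Cassandra write (in-memory, hypothetical)
topic = []  # stand-in for the JMS publish (in-memory, hypothetical)


def ingest(point, crash_after_persist=False):
    """Dual write: persist first, then publish, with no enclosing transaction."""
    store.append(point)
    if crash_after_persist:
        # The first write succeeded; the second one never happens.
        raise SimulatedCrash(point)
    topic.append(point)


ingest(("cpu.load", 1, 0.42))
try:
    ingest(("cpu.load", 2, 0.97), crash_after_persist=True)
except SimulatedCrash:
    pass  # the node restarts; nothing replays the unpublished point
ingest(("cpu.load", 3, 0.95))

# Points that were persisted but never reached the alerting side.
missed = [p for p in store if p not in topic]
print("persisted:", len(store), "published:", len(topic), "missed:", missed)
```

In this sketch the second point is stored but never published, which is the inconsistency window under discussion. Whether that window matters in practice is the question the rest of the thread debates.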
     >>>
     >>>> On Aug 16, 2016, at 10:25 AM, John Sanda <jsanda(a)redhat.com> wrote:
     >>>>
     >>>> I considered clustering before making the suggestion.
     >>>> MetricDataListener listens to a JMS topic for data points. When it
     >>>> receives data points, it passes those data points to AlertsEngine,
     >>>> which in turn writes the data points into an ISPN distributed
     >>>> cache. And then it looks like those data points get processed via a
     >>>> cache entry listener in AlertsEngineImpl. If I understand this
     >>>> data flow correctly, then I think it will work just as well, if not
     >>>> better, in a single WAR. Rather than getting notifications from a
     >>>> JMS topic, MetricDataListener can receive notifications from an
     >>>> Observable that pushes data points as they are received in client
     >>>> requests. Metrics will also subscribe to that same Observable so
     >>>> that it can persist the data points. The fact that alerts is using
     >>>> a distributed cache works to our advantage here because it
     >>>> provides a mechanism for distributing data across nodes.
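
The single-WAR data flow described above can be sketched roughly as follows. This is not RxJava and not Hawkular code: `Subject` is a hand-rolled stand-in for an Rx subject, and the `DataPoint` shape, the two lists, and the metric names are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

DataPoint = Tuple[str, int, float]  # (metric id, timestamp, value); assumed shape


class Subject:
    """Tiny stand-in for an Rx subject: pushes each item to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[DataPoint], None]] = []

    def subscribe(self, on_next: Callable[[DataPoint], None]) -> None:
        self._subscribers.append(on_next)

    def on_next(self, point: DataPoint) -> None:
        for subscriber in self._subscribers:
            subscriber(point)


persisted: List[DataPoint] = []    # what the metrics side would write to storage
alert_cache: List[DataPoint] = []  # what the alerts side would put in its cache

ingested = Subject()
ingested.subscribe(persisted.append)    # metrics subscribes in order to persist
ingested.subscribe(alert_cache.append)  # alerts subscribes instead of using JMS

for point in [("heap.used", 1, 512.0), ("heap.used", 2, 640.0)]:
    ingested.on_next(point)  # would be called as client requests arrive

print(persisted == alert_cache)
```

Both subscribers see every point without either knowing about the other, which is the decoupling the proposal relies on. Note that the sketch is synchronous and in-process, so it says nothing about ordering or failure handling across nodes.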
     >>>>
     >>>>> On Aug 16, 2016, at 3:29 AM, Lucas Ponce <lponce(a)redhat.com> wrote:
     >>>>>
     >>>>> This is a big point.
     >>>>>
     >>>>> I can see pros and cons to it.
     >>>>>
     >>>>> The first thing that comes to mind is that metrics is stateless
     >>>>> by nature while alerts is stateful.
     >>>>>
     >>>>> So an initial coupling would work for a single node, but when we
     >>>>> want to scale our troubles can start, as the design in clustered
     >>>>> scenarios is completely different and a single .war won't help IMO.
     >>>>>
     >>>>> I don't think our current design is bad. In the context of
     >>>>> HAWKULAR-1102, and by working on a demand publishing draft, we are
     >>>>> addressing the business issues that triggered this discussion.
     >>>>>
     >>>>> But I would like to hold this topic for a future
     >>>>> architecture face-to-face meeting, to discuss it from all angles
     >>>>> as we did in Madrid.
     >>>>>
     >>>>> (Counting on a face-to-face meeting in a reasonable
     >>>>> timeframe, of course).
     >>>>>
     >>>>> Lucas
     >>>>>
     >>>>> ----- Original Message -----
     >>>>>> From: "John Sanda" <jsanda(a)redhat.com>
     >>>>>> To: "Discussions around Hawkular development" <hawkular-dev(a)lists.jboss.org>
     >>>>>> Sent: Monday, August 15, 2016 16:45:28
     >>>>>> Subject: Re: [Hawkular-dev] metrics on the bus
     >>>>>>
     >>>>>> We use JMS in large part because metrics and alerts are in
     >>>>>> separate WARs (I realize JMS is used for other purposes, but I
     >>>>>> am speaking strictly about this scenario). Why not deploy
     >>>>>> metrics and alerts in the same WAR and bypass JMS altogether?
     >>>>>> As data points are ingested, we broadcast them using an Rx
     >>>>>> subject to which both metrics and alerts subscribe. We could do
     >>>>>> this in a way that still keeps metrics and alerts decoupled as
     >>>>>> they are today. We would also have the added benefit of having
     >>>>>> a standalone deployment for metrics and alerts.
     >>>>>>
     >>>>>>
     >>>>>>
     >>>>>>
     >>>>>> On Aug 10, 2016, at 9:37 AM, Jay Shaughnessy <jshaughn(a)redhat.com> wrote:
     >>>>>>
     >>>>>>
     >>>>>> Yes, in fact I should have made it more clear that this
     >>>>>> whole discussion is bounded by H Metrics and H Alerting in the
     >>>>>> H Services context, so limiting this to HS/Bus integration
     >>>>>> code is what we'd want to do.
     >>>>>>
     >>>>>> On 8/10/2016 4:06 AM, Heiko W.Rupp wrote:
     >>>>>>
     >>>>>>
     >>>>>>
     >>>>>> Someone remind me please.
     >>>>>>
     >>>>>> That bus-sender in/or hawkular-metrics is not an
     >>>>>> internal detail of metrics, but rather sort of
     >>>>>> 'external add-on'?
     >>>>>>
     >>>>>> If so, the logic to filter (or create many subscriptions)
     >>>>>> could go into it and would not touch the core metrics.
     >>>>>> Metrics would (as it does today) forward all new data-
     >>>>>> points into this sender and the sender can then decide
     >>>>>> how to proceed.
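
One way to picture the idea above, with the filtering living in the sender rather than in core metrics, is the following sketch. `BusSender`, `request`, `on_data_point`, and the `dropped` counter are hypothetical names invented for this illustration; they are not taken from hawkular-metrics or from PR-568.

```python
class BusSender:
    """Hypothetical add-on: receives every new data point, forwards a subset."""

    def __init__(self, publish):
        self._publish = publish   # callable standing in for the bus/JMS publish
        self._wanted_ids = set()  # metric ids that alerting has asked for
        self.dropped = 0          # number of points filtered out

    def request(self, metric_id):
        """Register interest in a metric id (would be driven by alerting)."""
        self._wanted_ids.add(metric_id)

    def on_data_point(self, metric_id, timestamp, value):
        """Called by metrics for every new data point, as it is today."""
        if metric_id in self._wanted_ids:
            self._publish((metric_id, timestamp, value))
        else:
            self.dropped += 1


bus = []  # in-memory stand-in for the bus
sender = BusSender(bus.append)
sender.request("cpu.load")

# Metrics forwards everything; only the sender decides what goes on the bus.
sender.on_data_point("cpu.load", 1, 0.42)
sender.on_data_point("disk.free", 1, 73.0)
sender.on_data_point("cpu.load", 2, 0.97)
print(len(bus), "published,", sender.dropped, "dropped")
```

The sketch also shows the concern raised elsewhere in the thread: if the set of wanted ids falls out of sync with what alerting expects, points are dropped silently unless something like the counter is monitored.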
     >>>>>>
     >>>>>> _______________________________________________
     >>>>>> hawkular-dev mailing list
     >>>>>> hawkular-dev(a)lists.jboss.org
     >>>>>> https://lists.jboss.org/mailman/listinfo/hawkular-dev
