[Hawkular-dev] metrics on the bus

Fri Aug 19 09:25:24 EDT 2016

I won't force the technical decision, but I don't think we necessarily need
to split Alerts and Metrics into two separate servers. I have found hardly
any examples where Metrics is needed without Alerting, or vice versa.

So whatever offers the best performance for the two together (and doesn't
require a rewrite) would be great.

(Today we have Metrics only in OpenShift, but there are already requests to
add alerting).

On Thu, Aug 18, 2016 at 3:46 PM, Jay Shaughnessy <jshaughn at redhat.com>
wrote:

>
> It's possible the solution proposed by John should be the ultimate goal.
> It has the advantage of creating a distribution of Metrics+Alerting while
> also solving an H Services performance issue.  But I don't really buy this
> argument which distills down to, "what if there are bugs?".  And I don't
> really buy into John's argument that the JMS use in H Services has an
> inherent performance issue.  Without the full publishing of all metrics,
> and the subsequent filtering by alerting, I'm sure it can handle what we
> throw at it and plenty more.  Finally, I don't really agree with Lucas that
> John's approach is a major architectural change; it presents a packaging
> issue and adds some integration code.  It doesn't propose removal of
> JMS/bus in H Services.
>
> So personally, I don't have an issue with PR-568 as an immediate solution
> to the existing performance issue.  I also don't mind if we close it and
> immediately implement John's approach as it potentially gives us two wins
> for the price of one.  And I also don't mind if PR-568 is an interim
> solution if we decide to defer the co-packaging approach.  I think we need
> a decision from Heiko and Thomas as to how to proceed, and minimally, I
> appreciate that Lucas has been proactive in providing at least one solution.
>
>
> On 8/17/2016 6:57 PM, Stefan Negrea wrote:
>
> One of the contentions that I have with PR-568 is that it introduces more
> failure points for data prior to reaching Alerts. Publishing all data
> directly is a simple proposition: data comes in, is persisted to Cassandra,
> and at the same time is sent via JMS. The PR introduces multiple additional
> failure points, and a failure in Metrics will go unnoticed. For example,
> what if the filtering mechanism suddenly crashed, what then? What if
> the data being filtered does not match the expectations of Alerts, as when
> Alerts has requested data for a metric id but Metrics has lost track of
> that request and does not report data for that metric id?
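
The "lost track of a metric id" scenario can be illustrated with a minimal sketch. Everything here is hypothetical: the FilteredPublisher class, its method names, and the metric id are made up for illustration and are not taken from PR-568. It only shows how a filter that forgets a request drops data without any signal.

```python
class FilteredPublisher:
    """Hypothetical on-demand publisher: forwards only requested metric ids."""

    def __init__(self):
        self.requested_ids = set()
        self.published = []  # stands in for messages sent on the bus

    def request(self, metric_id):
        self.requested_ids.add(metric_id)

    def on_data(self, metric_id, value):
        if metric_id in self.requested_ids:
            self.published.append((metric_id, value))
        # Otherwise the point is dropped silently; nothing signals the gap.


publisher = FilteredPublisher()
publisher.request("cpu.load")
publisher.on_data("cpu.load", 0.7)

# Simulate the filter losing track of the request (e.g. state lost on restart).
publisher.requested_ids.clear()
publisher.on_data("cpu.load", 0.99)

print(publisher.published)
```

The second data point never reaches the bus, and neither side raises an error, which is the kind of unnoticed failure described above.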
>
> Going back to the replies from Randall, in order for PR-568 to be an
> alternative to what is done today, we will need to design a lot of
> additional features to get the same level of delivery confidence and
> guarantee that we have today (without the PR).
>
>
> https://github.com/hawkular/hawkular-metrics/pull/568
>
>
> Thank you,
> Stefan Negrea
>
>
> On Wed, Aug 17, 2016 at 4:45 PM, Jay Shaughnessy <jshaughn at redhat.com>
> wrote:
>
>>
>> +1.  Although Randall is right that there is definitely a chance of
>> inconsistency between what is persisted and what is processed by
>> alerting, I think it's acceptable for our purposes.  In general, users
>> have historically accepted that server downtime can result in missed
>> alerts.  Moreover, almost all of the alerting scenarios involve behavior
>> over time.
>>
>>
>> On 8/17/2016 5:44 AM, Michael Burman wrote:
>> > Hi,
>> >
>> > Storing to Cassandra and JMS is not atomic, as Cassandra does not
>> > provide transactions and especially not 2PC. So they're two different
>> > writes and can always result in inconsistency, no matter the secondary
>> > transport protocol. Also, is Alerts even capable of handling all the
>> > possible crash scenarios? And do we even care about such a small window
>> > of potential data loss to the alerting engine in the case of a crash
>> > (which will take down both metrics and alerts on that node)? We don't
>> > provide strict consistency with the default metrics settings either,
>> > since they default to a single node acknowledging each write in
>> > Cassandra. There are multiple theoretical scenarios where we could lose
>> > data or get inconsistencies in a multi-node deployment.
>> >
>> > I think these are acceptable for our use case, however. Even assuming
>> > we lose one "node down" datapoint, that same situation probably
>> > persists for the next datapoint, so the alert still triggers. If you
>> > lose one metric datapoint from a bucket, the calculated averages,
>> > percentiles, etc. only suffer a minor loss of precision. Not to mention
>> > that almost everything in monitoring is already discrete information
>> > sampled at a certain point in time and not a continuous real value, so
>> > precision is lost before it even reaches us.
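
To put a rough number on the bucket argument, here is a minimal sketch with made-up values. The 60-sample bucket and the readings are assumptions for illustration, not Hawkular data. It compares the mean of a bucket with and without one lost datapoint.

```python
from statistics import mean

# Hypothetical bucket: 60 samples of a gauge cycling through 100..104.
bucket = [100.0 + (i % 5) for i in range(60)]

full_avg = mean(bucket)
# Simulate losing a single datapoint (here the last one) before aggregation.
lossy_avg = mean(bucket[:-1])

relative_error = abs(full_avg - lossy_avg) / full_avg
print(f"full={full_avg:.4f} lossy={lossy_avg:.4f} rel_err={relative_error:.6f}")
```

With these assumed values the relative error of the average stays well below 0.1%. A bucket with far fewer samples or with extreme outliers would be more sensitive to a lost point.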
>> >
>> > For those reasons I'd say these "problems" are more academic, without
>> > any real-world implications in this domain.
>> >
>> >    - Micke
>> >
>> > ----- Original Message -----
>> > From: "Randall Hauch" <rhauch at redhat.com>
>> > To: "Discussions around Hawkular development" <hawkular-dev at lists.jboss.org>
>> > Sent: Tuesday, August 16, 2016 7:42:56 PM
>> > Subject: Re: [Hawkular-dev] metrics on the bus
>> >
>> > I agree that the distributed system is probably more fault tolerant when
>> > using JMS than putting everything into a single app and forgoing JMS.
>> >
>> > BTW, does metrics write data to Cassandra and publish to JMS atomically?
>> > If not, that’s also a window for failure that might result in data loss.
>> > Something to consider if Hawkular requires complete consistency and can’t
>> > afford data loss.
>> >
>> >> On Aug 16, 2016, at 11:08 AM, John Sanda <jsanda at redhat.com> wrote:
>> >>
>> >> With the JMS solution we have in place right now, data points are
>> >> published after they have been persisted in Cassandra. We can certainly
>> >> keep that same behavior.
>> >>
>> >>> On Aug 16, 2016, at 11:49 AM, Randall Hauch <rhauch at redhat.com> wrote:
>> >>>
>> >>> Sorry, I’ve been lurking. One thing to consider is how each approach
>> >>> handles failures. For example, what happens if the system crashes after
>> >>> a data point is processed by metrics but before alerts picks it up? Will
>> >>> the system become inconsistent, or will some events be lost before
>> >>> alerts sees them?
>> >>>
>> >>> Really, in order for the system to be completely fault tolerant, each
>> >>> component has to be completely atomic. Components that use “dual writes”
>> >>> (e.g., write to one system, then write to another outside of a larger
>> >>> transaction) will always be subject to losing data/events during a very
>> >>> inopportune failure. Not only that, a system comprised of multiple
>> >>> components that individually are safe might still be subject to losing
>> >>> data/events.
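
The "dual write" window can be sketched in a few lines. This is a simulation under stated assumptions: the two lists stand in for the Cassandra write and the JMS send, and the ingest function and the exception are hypothetical names, not Hawkular code.

```python
class CrashBeforePublish(Exception):
    """Simulated process crash between the two writes."""


def ingest(point, store, bus, crash_between_writes=False):
    # Write 1: persist the data point (stands in for the Cassandra write).
    store.append(point)
    if crash_between_writes:
        # The process dies here: the point is stored but never published.
        raise CrashBeforePublish(point)
    # Write 2: publish the same point (stands in for the JMS send).
    bus.append(point)


store, bus = [], []
ingest(("cpu", 1, 0.42), store, bus)
try:
    ingest(("cpu", 2, 0.97), store, bus, crash_between_writes=True)
except CrashBeforePublish:
    pass

# Points that were persisted but that the alerting side never saw.
missed = [p for p in store if p not in bus]
print(missed)
```

Swapping the order of the two writes does not close the window. It only changes which side ends up holding a point the other side never received.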
>> >>>
>> >>> I hope this is helpful.
>> >>>
>> >>> Randall
>> >>>
>> >>>> On Aug 16, 2016, at 10:25 AM, John Sanda <jsanda at redhat.com> wrote:
>> >>>>
>> >>>> I considered clustering before making the suggestion.
>> >>>> MetricDataListener listens to a JMS topic for data points. When it
>> >>>> receives data points, it passes them to AlertsEngine, which in turn
>> >>>> writes them into an ISPN distributed cache. It then looks like those
>> >>>> data points get processed via a cache entry listener in
>> >>>> AlertsEngineImpl. If I understand this data flow correctly, then I
>> >>>> think it will work just as well, if not better, in a single WAR. Rather
>> >>>> than getting notifications from a JMS topic, MetricDataListener can
>> >>>> receive notifications from an Observable that pushes data points as
>> >>>> they are received in client requests. Metrics will also subscribe to
>> >>>> that same Observable so that it can persist the data points. The fact
>> >>>> that alerts is using a distributed cache works to our advantage here
>> >>>> because it provides a mechanism for distributing data across nodes.
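
The single-WAR data flow described here can be sketched roughly as follows. This is a language-neutral illustration written in Python rather than RxJava. The Subject class is a hand-rolled stand-in for an Rx subject, and persist, feed_alerts, and the dict used as a "cache" are hypothetical placeholders for the metrics persistence path and the alerts distributed cache.

```python
class Subject:
    """Tiny stand-in for an Rx subject: pushes each item to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        self._subscribers.append(on_next)

    def on_next(self, item):
        for callback in list(self._subscribers):
            callback(item)


persisted = []    # stands in for the metrics persistence path
alert_cache = {}  # stands in for the distributed cache written by alerts


def persist(point):
    persisted.append(point)


def feed_alerts(point):
    metric_id, timestamp, value = point
    alert_cache[(metric_id, timestamp)] = value


# Both components subscribe to the same stream of ingested data points.
data_points = Subject()
data_points.subscribe(persist)
data_points.subscribe(feed_alerts)

for point in [("heap.used", 1, 512.0), ("heap.used", 2, 640.0)]:
    data_points.on_next(point)

print(persisted)
print(alert_cache)
```

The two subscribers know nothing about each other, which is the sense in which the components could stay decoupled while sharing one deployment.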
>> >>>>
>> >>>>> On Aug 16, 2016, at 3:29 AM, Lucas Ponce <lponce at redhat.com> wrote:
>> >>>>>
>> >>>>> This is a big point.
>> >>>>>
>> >>>>> I can see pros and cons to it.
>> >>>>>
>> >>>>> The first thing that comes to mind is that metrics is stateless by
>> >>>>> nature, whereas alerts is stateful.
>> >>>>>
>> >>>>> So a first coupling would work for a single node, but when we want to
>> >>>>> scale, our troubles can start, as the design in clustered scenarios is
>> >>>>> completely different and a single .war won't help IMO.
>> >>>>>
>> >>>>> I don't think our current design is bad. In the context of
>> >>>>> HAWKULAR-1102, and by working on an on-demand publishing draft, we are
>> >>>>> addressing the business issues that triggered this discussion.
>> >>>>>
>> >>>>> But I would like to hold this topic for a future face-to-face
>> >>>>> architecture meeting, to discuss it from all angles as we did in Madrid.
>> >>>>>
>> >>>>> (Assuming we have a face-to-face meeting in a reasonable timeframe, of
>> >>>>> course).
>> >>>>>
>> >>>>> Lucas
>> >>>>>
>> >>>>> ----- Original Message -----
>> >>>>>> From: "John Sanda" <jsanda at redhat.com>
>> >>>>>> To: "Discussions around Hawkular development" <hawkular-dev at lists.jboss.org>
>> >>>>>> Sent: Monday, August 15, 2016 16:45:28
>> >>>>>> Subject: Re: [Hawkular-dev] metrics on the bus
>> >>>>>>
>> >>>>>> We use JMS in large part because metrics and alerts are in separate
>> >>>>>> WARs (I realize JMS is used for other purposes, but I am speaking
>> >>>>>> strictly about this scenario). Why not deploy metrics and alerts in
>> >>>>>> the same WAR and bypass JMS altogether? As data points are ingested,
>> >>>>>> we broadcast them using an Rx subject to which both metrics and alerts
>> >>>>>> subscribe. We could do this in a way that still keeps metrics and
>> >>>>>> alerts decoupled as they are today. We would also have the added
>> >>>>>> benefit of having a standalone deployment for metrics and alerts.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Aug 10, 2016, at 9:37 AM, Jay Shaughnessy <jshaughn at redhat.com> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> Yes, in fact I should have made it more clear that this whole
>> >>>>>> discussion is bounded by H Metrics and H Alerting in the H Services
>> >>>>>> context, so limiting this to HS/Bus integration code is what we'd
>> >>>>>> want to do.
>> >>>>>>
>> >>>>>> On 8/10/2016 4:06 AM, Heiko W.Rupp wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Someone remind me please.
>> >>>>>>
>> >>>>>> That bus-sender in/or hawkular-metrics is not an
>> >>>>>> internal detail of metrics, but rather sort of
>> >>>>>> 'external add-on'?
>> >>>>>>
>> >>>>>> If so, the logic to filter (or create many subscriptions)
>> >>>>>> could go into it and would not touch the core metrics.
>> >>>>>> Metrics would (as it does today) forward all new data-
>> >>>>>> points into this sender and the sender can then decide
>> >>>>>> how to proceed.
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> hawkular-dev mailing list hawkular-dev at lists.jboss.org
>> >>>>>> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>> >>>>>>