Excellent. Accepting losses or (scoped) inconsistencies is very reasonable. Might I
suggest a “What happens when ...” section somewhere in the documentation or website? Not
only is it really nice to be able to see how a system is expected to behave during various
kinds of faults, it’s also really helpful to think about all the things that can go wrong
and work out what the behavior actually is. And IMO the transparency helps show users that
you’ve actually thought about these situations. See
for one such
example — it’s not perfect, but it’s a start.
Keep up the great work, folks!
On Aug 17, 2016, at 4:45 PM, Jay Shaughnessy
<jshaughn(a)redhat.com> wrote:
+1. Although Randall is right that there is definitely a chance of
inconsistency between what is persisted and what is processed by
alerting, I think it's acceptable for our purposes. In general, users
have historically accepted that server downtime can result in missed
alerts. Moreover, almost all of the alerting scenarios involve behavior
over time.
On 8/17/2016 5:44 AM, Michael Burman wrote:
> Hi,
>
> Storing to Cassandra and publishing to JMS is not atomic, as Cassandra does not provide
transactions, and especially not 2PC. So they are two different writes and can always result in
inconsistency, no matter the secondary transport protocol. Also, is alerts even capable of
handling all the possible crash scenarios? And do we even care about such a small window
of potential data loss to the alerting engine in the case of a crash (which will take down
both metrics and alerts on that node)? We don't provide strict consistency with the
default metrics settings either, defaulting to a single-node acknowledgement in Cassandra. There
are multiple theoretical scenarios where, in a multi-node setup, we could lose data or end up
with inconsistencies.
>
> I think these are acceptable for our use case, however. Even if we lost
one "node down" datapoint, that same situation would probably persist for the next
datapoint, so the alert still triggers; and if you lose one metric datapoint from a bucket, the
calculated averages, percentiles, etc. suffer only a minor loss of precision. Not to
mention that almost everything in monitoring is already discrete information sampled at a
certain point in time rather than a continuous real value, so precision is lost before it even
reaches us.
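Micke's precision argument can be sketched numerically. The following is a minimal illustration with made-up sample values, not Hawkular code: dropping a single datapoint from a bucket shifts the computed average by only a tiny amount.

```java
import java.util.Arrays;

// Illustrative only: shows how little one lost datapoint moves a bucket average.
public class BucketPrecision {
    static double average(double[] points) {
        return Arrays.stream(points).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] full = {98.0, 101.0, 99.5, 100.5, 100.0}; // all samples arrive
        double[] lossy = {98.0, 101.0, 99.5, 100.5};       // one sample lost in a crash
        double error = Math.abs(average(full) - average(lossy));
        // error is 0.05 on a signal near 100, i.e. well below sampling noise
        System.out.printf("full=%.3f lossy=%.3f error=%.3f%n",
                average(full), average(lossy), error);
    }
}
```

The same reasoning applies to percentiles and other bucket aggregates: one missing sample is a minor precision imperfection, not a correctness failure.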
>
> For those reasons I'd say these "problems" are more academic, without
any real-world implications in this domain.
>
> - Micke
>
> ----- Original Message -----
> From: "Randall Hauch" <rhauch(a)redhat.com>
> To: "Discussions around Hawkular development"
<hawkular-dev(a)lists.jboss.org>
> Sent: Tuesday, August 16, 2016 7:42:56 PM
> Subject: Re: [Hawkular-dev] metrics on the bus
>
> I agree that the distributed system is probably more fault tolerant when using JMS
than when putting everything into a single app and forgoing JMS.
>
> BTW, does metrics write data to Cassandra and publish to JMS atomically? If not,
that’s also a window for failure that might result in data loss. Something to consider if
Hawkular requires complete consistency and can’t afford data loss.
>
>> On Aug 16, 2016, at 11:08 AM, John Sanda <jsanda(a)redhat.com> wrote:
>>
>> With the JMS solution we have in place right now, data points are published after
they have been persisted in Cassandra. We can certainly keep that same behavior.
>>
>>> On Aug 16, 2016, at 11:49 AM, Randall Hauch <rhauch(a)redhat.com> wrote:
>>>
>>> Sorry, I’ve been lurking. One thing to consider is how each approach handles
failures. For example, what happens if the system crashes after a data point has been processed
by metrics but before alerts picks it up? Will the system become inconsistent, or will some
events be lost before alerts sees them?
>>>
>>> Really, in order for the system to be completely fault tolerant, each
component has to be completely atomic. Components that use “dual writes” (e.g., write to
one system, then write to another outside of a larger transaction) will always be subject
to losing data/events during an inopportune failure. Not only that, a system composed
of multiple components that are individually safe might still be subject to losing
data/events.
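The dual-write hazard described above can be shown with a toy simulation. This is not Hawkular code; the two lists are stand-ins for Cassandra and the JMS topic, and the exception stands in for a process crash between the two writes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of dual writes: two independent stores, no shared transaction.
// A crash between the writes leaves them permanently diverged.
public class DualWriteDemo {
    static class CrashException extends RuntimeException {}

    final List<String> store = new ArrayList<>(); // plays the role of Cassandra
    final List<String> bus = new ArrayList<>();   // plays the role of the JMS topic

    void ingest(String dataPoint, boolean crashBetweenWrites) {
        store.add(dataPoint);                                // write 1 succeeds
        if (crashBetweenWrites) throw new CrashException();  // process dies here
        bus.add(dataPoint);                                  // write 2 never happens
    }

    public static void main(String[] args) {
        DualWriteDemo demo = new DualWriteDemo();
        demo.ingest("cpu=0.42", false);
        try {
            demo.ingest("cpu=0.99", true); // crash in the window between the writes
        } catch (CrashException expected) {}
        // The stores have diverged: alerts would never see "cpu=0.99".
        System.out.println("persisted=" + demo.store.size()
                + " published=" + demo.bus.size());
    }
}
```

Running this prints `persisted=2 published=1`: the persisted and published views disagree, which is exactly the window Randall is pointing at.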
>>>
>>> I hope this is helpful.
>>>
>>> Randall
>>>
>>>> On Aug 16, 2016, at 10:25 AM, John Sanda <jsanda(a)redhat.com>
wrote:
>>>>
>>>> I considered clustering before making the suggestion. MetricDataListener
listens to a JMS topic for data points. When it receives data points, it passes them
to AlertsEngine, which in turn writes them into a distributed Infinispan (ISPN)
cache. And then it looks like those data points get processed via a cache entry listener in
AlertsEngineImpl. If I understand this data flow correctly, then I think it will work just
as well, if not better, in a single WAR. Rather than getting notifications from a JMS topic,
MetricDataListener can receive notifications from an Observable that pushes data points as
they are received in client requests. Metrics will also subscribe to that same Observable so
that it can persist the data points. The fact that alerts is using a distributed cache
works to our advantage here, because it provides a mechanism for distributing data across
nodes.
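The proposed in-process flow can be sketched as below. This is a hand-rolled stand-in for an Rx Subject (to keep it dependency-free), not the actual Hawkular code, and the class and subscriber names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal stand-in for an Rx Subject: a single in-process stream that pushes
// each incoming data point to every subscriber, replacing the JMS topic.
public class DataPointSubject {
    private final List<Consumer<Double>> subscribers = new ArrayList<>();

    public void subscribe(Consumer<Double> subscriber) {
        subscribers.add(subscriber);
    }

    // Called on ingestion; every subscriber sees every data point, in order.
    public void onNext(double dataPoint) {
        subscribers.forEach(s -> s.accept(dataPoint));
    }

    public static void main(String[] args) {
        List<Double> persisted = new ArrayList<>(); // metrics: would write to Cassandra
        List<Double> evaluated = new ArrayList<>(); // alerts: would feed the engine / ISPN cache

        DataPointSubject subject = new DataPointSubject();
        subject.subscribe(persisted::add);
        subject.subscribe(evaluated::add);

        subject.onNext(0.42);
        subject.onNext(0.99);
        // Both components saw the same stream, no broker in between.
    }
}
```

The point of the sketch is that metrics and alerts stay decoupled (each just subscribes), while the hand-off becomes an in-process method call rather than a broker round trip.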
>>>>
>>>>> On Aug 16, 2016, at 3:29 AM, Lucas Ponce <lponce(a)redhat.com>
wrote:
>>>>>
>>>>> This is a big point.
>>>>>
>>>>> I can see pros and cons on it.
>>>>>
>>>>> The first thing that comes to mind is that metrics is stateless by
nature, while alerts is stateful.
>>>>>
>>>>> So an initial coupling would work for a single node, but our troubles can
start when we want to scale, as the design in clustered scenarios is completely different
and a single .war won't help, IMO.
>>>>>
>>>>> I don't think our current design is bad; in the context of
HAWKULAR-1102 and the demand-publishing draft we are working on, we are addressing the
business issues that triggered this discussion.
>>>>>
>>>>> But I would like to hold this topic for a future architecture face-to-face
meeting, to discuss it from all angles as we did in Madrid.
>>>>>
>>>>> (Counting on a face-to-face meeting in a reasonable timeframe, of
course.)
>>>>>
>>>>> Lucas
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "John Sanda" <jsanda(a)redhat.com>
>>>>>> To: "Discussions around Hawkular development"
<hawkular-dev(a)lists.jboss.org>
>>>>>> Sent: Monday, August 15, 2016 16:45:28
>>>>>> Subject: Re: [Hawkular-dev] metrics on the bus
>>>>>>
>>>>>> We use JMS in large part because metrics and alerts are in separate WARs (I
>>>>>> realize JMS is used for other purposes, but I am speaking strictly about
>>>>>> this scenario). Why not deploy metrics and alerts in the same WAR and
>>>>>> bypass JMS altogether? As data points are ingested, we broadcast them using
>>>>>> an Rx subject to which both metrics and alerts subscribe. We could do this
>>>>>> in a way that still keeps metrics and alerts decoupled, as they are today.
>>>>>> We would also have the added benefit of a standalone deployment for
>>>>>> metrics and alerts.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Aug 10, 2016, at 9:37 AM, Jay Shaughnessy <
jshaughn(a)redhat.com > wrote:
>>>>>>
>>>>>>
>>>>>> Yes, in fact I should have made it more clear that this whole
discussion is
>>>>>> bounded by H Metrics and H Alerting in the H Services context, so
limiting
>>>>>> this to HS/Bus integration code is what we'd want to do.
>>>>>>
>>>>>> On 8/10/2016 4:06 AM, Heiko W.Rupp wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Someone remind me please.
>>>>>>
>>>>>> That bus-sender in/or hawkular-metrics is not an
internal detail of metrics, but rather a sort of
'external add-on'?
>>>>>>
>>>>>> If so, the logic to filter (or create many subscriptions)
>>>>>> could go into it and would not touch the core metrics.
>>>>>> Metrics would (as it does today) forward all new data-
>>>>>> points into this sender and the sender can then decide
>>>>>> how to proceed.
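Heiko's idea of keeping the decision logic inside the sender can be sketched as follows. The class and method names here are illustrative stand-ins, not real Hawkular classes: metrics forwards every new data point, and the sender alone decides what actually goes onto the bus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch: a bus-sender that filters on behalf of its subscriptions, so the
// core metrics code keeps forwarding everything and stays untouched.
public class FilteringBusSender {
    private final List<String> bus = new ArrayList<>(); // stand-in for the JMS topic
    private final Predicate<String> subscriptionFilter; // what subscribers asked for

    FilteringBusSender(Predicate<String> subscriptionFilter) {
        this.subscriptionFilter = subscriptionFilter;
    }

    // Metrics calls this for every new data point, exactly as it does today.
    void forward(String metricId) {
        if (subscriptionFilter.test(metricId)) {
            bus.add(metricId); // only subscribed metrics reach the bus
        }
    }

    List<String> published() {
        return bus;
    }

    public static void main(String[] args) {
        // Only cpu.* metrics are subscribed to; mem.* traffic never hits the bus.
        FilteringBusSender sender = new FilteringBusSender(id -> id.startsWith("cpu."));
        sender.forward("cpu.load");
        sender.forward("mem.free");
        sender.forward("cpu.idle");
        System.out.println(sender.published()); // [cpu.load, cpu.idle]
    }
}
```

Swapping the predicate for per-subscription filters (or many subscriptions) is then a change local to the sender, which is the point Heiko is making.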
>>>>>>
>>>>>> _______________________________________________
>>>>>> hawkular-dev mailing list hawkular-dev(a)lists.jboss.org
>>>>>>
https://lists.jboss.org/mailman/listinfo/hawkular-dev