New Hawkular Blog Post: Hawkular Alerts with Prometheus, ElasticSearch, Kafka
by Thomas Heute
New Hawkular blog post from noreply(a)hawkular.org (John Mazzitelli): http://ift.tt/2vYe1WG
Federated Alerts
Hawkular Alerts aims to be a federated alerting system. That is to say, it can fire alerts and send notifications that are triggered by data coming from a number of third-party external systems.
Thus, Hawkular Alerts is more than just an alerting system for use with Hawkular Metrics. In fact, Hawkular Alerts can be used independently of Hawkular Metrics. This means you do not even have to be using Hawkular Metrics to take advantage of the functionality provided by Hawkular Alerts.
This is a key differentiator between Hawkular Alerts and other alerting systems. Most alerting systems only alert on data coming from their respective storage systems (e.g. the Prometheus Alert Engine alerts only on Prometheus data). Hawkular Alerts, on the other hand, can trigger alerts based on data from various systems.
Alerts vs. Events
Before we begin, a quick clarification is in order. When it is said that Hawkular Alerts fires an "alert", it means some data came into Hawkular Alerts that matched some conditions, which triggered the creation of an alert in the Hawkular Alerts backend storage (and that can in turn trigger additional actions such as sending emails or calling a webhook). An "alert" typically refers to a problem that has been detected and that someone should take action to fix. An alert has a lifecycle attached to it: alerts are opened, then acknowledged by some user who will hopefully fix the problem, then resolved when the problem can be considered closed.
However, there can be conditions that occur that do not represent problems but are nevertheless occurrences you want recorded. There is no lifecycle associated with events, and no additional actions are triggered by events, but "events" are fired by Hawkular Alerts in the same general manner as "alerts" are.
In this document, when it is said that Hawkular Alerts can fire "alerts" based on data coming from external third-party systems such as Prometheus, ElasticSearch, and Kafka, this means events can be fired as well as alerts. In other words, you can record any event (not just a "problem", a.k.a. an "alert") that can be gleaned from the data coming from these external systems.
See alerting philosophy for more.
Demo
There is a recorded demo found here that illustrates what this document describes. After you read this document, you should watch the demo to gain further clarity on what has been explained. The demo uses the multiple-sources example, which you can run yourself, found here (note: at the time of writing, this example is only found in the next branch, to be merged into master soon).
Prometheus
Hawkular Alerts can take the results of Prometheus metric queries and use the queried data for triggers that can fire alerts.
This Hawkular Alerts trigger will fire an alert (and send an email) when a Prometheus metric indicates our store’s inventory of widgets is consistently low (as defined by the Prometheus query you see in the "expression" field of the condition):
"trigger": {
  "id": "low-stock-prometheus-trigger",
  "name": "Low Stock",
  "description": "The number of widgets in stock is consistently low.",
  "severity": "MEDIUM",
  "enabled": true,
  "tags": {
    "prometheus": "Prometheus"
  },
  "actions": [
    {
      "actionPlugin": "email",
      "actionId": "email-notify-owner"
    }
  ]
},
"conditions": [
  {
    "type": "EXTERNAL",
    "alerterId": "prometheus",
    "dataId": "prometheus-dataid",
    "expression": "rate(products_in_inventory{product=\"widget\"}[30s])<2"
  }
]
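The PromQL expression computes the per-second rate of increase of the products_in_inventory counter over a 30-second window and matches when that rate is below 2. The following Python sketch illustrates that semantics only; the sample data and helper names are illustrative and are not part of Hawkular or Prometheus:

```python
# Illustrative sketch of what the expression
#   rate(products_in_inventory{product="widget"}[30s]) < 2
# evaluates: the per-second rate of increase of a counter over a
# time window, compared against a threshold.

def rate(samples):
    """Per-second increase across the window; samples are (timestamp, value)."""
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)

# Counter samples spanning 30 seconds: only 30 widgets added in total.
samples = [(0, 100), (10, 110), (20, 120), (30, 130)]
low_stock = rate(samples) < 2  # 30 widgets / 30 s = 1.0 per second -> True
```

When the query result matches, the Prometheus alerter hands the data to the trigger's EXTERNAL condition, which is what ultimately fires the alert.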
Integration with Prometheus Alert Engine
As a side note, though not demonstrated in the example, Hawkular Alerts also has an integration with Prometheus' own Alert Engine. This means alerts generated by Prometheus itself can be forwarded to Hawkular Alerts, which can, in turn, use them for additional processing, perhaps combining them with data that is unavailable to Prometheus in order to fire other alerts. For example, Hawkular Alerts can take Prometheus alerts as input and feed them into other conditions that trigger on the Prometheus alert along with ElasticSearch logs.
ElasticSearch
Hawkular Alerts can examine logs stored in ElasticSearch and trigger alerts based on patterns that match within the ElasticSearch log messages.
This Hawkular Alerts trigger will fire an alert (and send an email) when ElasticSearch logs indicate sales are being lost due to items being out of stock (as defined by the condition, which looks for a log category of "FATAL" - which happens to mean a lost sale in the case of the store’s logs). Notice that dampening is enabled on this trigger: the alert will only fire when the logs indicate a lost sale three consecutive times.
"trigger": {
  "id": "lost-sale-elasticsearch-trigger",
  "name": "Lost Sale",
  "description": "A sale was lost due to inventory out of stock.",
  "severity": "CRITICAL",
  "enabled": true,
  "tags": {
    "Elasticsearch": "Localhost instance"
  },
  "context": {
    "timestamp": "@timestamp",
    "filter": "{\"match\":{\"category\":\"inventory\"}}",
    "interval": "10s",
    "index": "store",
    "mapping": "level:category,@timestamp:ctime,message:text,category:dataId,index:tags"
  },
  "actions": [
    {
      "actionPlugin": "email",
      "actionId": "email-notify-owner"
    }
  ]
},
"dampenings": [
  {
    "triggerMode": "FIRING",
    "type": "STRICT",
    "evalTrueSetting": 3
  }
],
"conditions": [
  {
    "type": "EVENT",
    "dataId": "inventory",
    "expression": "category == 'FATAL'"
  }
]
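The STRICT dampening with "evalTrueSetting": 3 means the trigger fires only after three consecutive true evaluations of its condition; a false evaluation resets the count. This Python sketch models that behavior for illustration only; it is not Hawkular's actual implementation:

```python
class StrictDampening:
    """Fire only after N consecutive true evaluations; any false resets."""

    def __init__(self, eval_true_setting):
        self.required = eval_true_setting
        self.consecutive = 0

    def evaluate(self, condition_true):
        """Return True (fire) only on the Nth consecutive true evaluation."""
        if condition_true:
            self.consecutive += 1
            if self.consecutive >= self.required:
                self.consecutive = 0  # reset after firing
                return True
        else:
            self.consecutive = 0  # a false evaluation breaks the streak
        return False

d = StrictDampening(3)
# Two trues, a reset, then three consecutive trues: fires on the last one.
results = [d.evaluate(c) for c in [True, True, False, True, True, True]]
```

So with the trigger above, two FATAL log entries followed by a normal evaluation would not fire the alert; only an unbroken run of three does.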
Kafka
Hawkular Alerts can examine data retrieved from Kafka message streams and trigger alerts based on that Kafka data.
This Hawkular Alerts trigger will fire an alert when data on a Kafka topic indicates a large purchase was made to fill the store’s inventory (as defined by the condition, which evaluates to true when any number over 17 is received on the Kafka topic):
"trigger": {
  "id": "large-inventory-purchase-kafka-trigger",
  "name": "Large Inventory Purchase",
  "description": "A large purchase was made to restock inventory.",
  "severity": "LOW",
  "enabled": true,
  "tags": {
    "Kafka": "Localhost instance"
  },
  "context": {
    "topic": "store",
    "kafka.bootstrap.servers": "localhost:9092",
    "kafka.group.id": "hawkular-alerting"
  },
  "actions": [ ]
},
"conditions": [
  {
    "type": "THRESHOLD",
    "dataId": "store",
    "operator": "GT",
    "threshold": 17
  }
]
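The THRESHOLD condition with "operator": "GT" evaluates to true for any numeric datum on the topic strictly greater than 17. The sketch below simulates that check; the message list stands in for a real Kafka consumer subscribed via kafka.bootstrap.servers, and the names are illustrative:

```python
# Simulated payloads from the "store" Kafka topic; a real deployment
# would read these with a Kafka consumer (group "hawkular-alerting").
messages = ["5", "12", "20", "17", "42"]

THRESHOLD = 17  # mirrors "operator": "GT", "threshold": 17

def condition_matches(raw_value):
    """THRESHOLD/GT condition: true when the numeric datum exceeds 17."""
    return float(raw_value) > THRESHOLD

fired = [m for m in messages if condition_matches(m)]
# "20" and "42" exceed 17; "17" itself does not, since GT is strictly greater.
```

Each matching datum would cause the trigger to fire (here with an empty actions list, so the alert is simply recorded).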
But, Wait! There’s More!
The above only mentions the different ways Hawkular Alerts retrieves data for use in determining what alerts to fire. What is not covered here is that Hawkular Alerts can stream data in the other direction as well: it can send alert and event data to things like an ElasticSearch server or a Kafka broker. There are additional examples (mentioned below) that demonstrate this capability.
The point is that Hawkular Alerts should be seen as a shared, common alerting engine usable by multiple third-party systems, acting as both a consumer and a producer: a consumer of the data from external third-party systems (which is used to fire alerts and events), and a producer that sends notifications of alerts and events to external third-party systems.
More Examples
Take a look at the Hawkular Alerts examples for more examples of using data from external systems to trigger alerts. (Note: at the time of writing, some examples, such as the Kafka ones, are currently in the next branch.)
A convention for metrics (short) name
by Joel Takvorian
Hi,
What would you say about having a convention of a special tag (let's say
"_name") that would point to a (short) intelligible name for a metric. That
convention wouldn't be mandatory in any case of course, but the UI could
check if that tag exists and use that name, instead of the full metric id,
for better display.
WDYT?
Fwd: [jboss-community #455658] Problem with Nexus for a Hawkular artifact
by Lucas Ponce
FYI
---------- Forwarded message ----------
From: dhladky(a)redhat.com via RT <jboss-community(a)redhat.com>
Date: Wed, Aug 2, 2017 at 5:00 PM
Subject: [jboss-community #455658] Problem with Nexus for a Hawkular
artifact
To: lponce(a)redhat.com
Ticket #455658
It can be accessed online at: https://engineering.redhat.com/rt/Ticket/Display.html?id=455658
On Tue Aug 01 10:07:12 2017, dhladky(a)redhat.com wrote:
> Hi,
>
> I can not tell anything about the Apache repository, however regarding
> Maven Central I created this ticket:
> https://issues.sonatype.org/browse/MVNCENTRAL-2567
Response from Sonatype:
One of our sync jobs was hung and never recovered. We terminated the job and
restarted, and I'm already seeing various 0.9.7 artifacts on Central. We're
updating our jobs to ensure that they're not hung indefinitely.
Re: [Hawkular-dev] Dynamic UI PoC Presentation
by Caina Costa
On Mon, Jul 17, 2017 at 9:39 AM, Alissa Bonas <abonas(a)redhat.com> wrote:
> 1. a diagram that shows an example of architecture components and data
> format of entities (json/code/configuration) will help to understand the
> proposal.
>
This is an example of an Entity hierarchy; it shows what kinds of entities
we can create, as well as other patterns that we can use. The farther a
key is from Entity, the more specialized it is, which means that the
.applicable? method on those entities is a lot pickier about what kind of
data it matches. Views follow the same hierarchy, the engine matches the
same way, and the first defined view is used. So, let's say:
If we have a WildFlyDomainControllerServer to render, it will first try to
find WildFlyDomainControllerServerView, then WildFlyDomainServerView, then
WildFlyServerView, then WildFlyServerBaseView, then ServerView. That means
that adding new entities is not going to break the views being used.
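The resolution order described above (try the most specific view class first, then fall back along the hierarchy) can be sketched as follows. Python is used for illustration; the PoC itself is in Ruby, the class names come from the example in this thread, and the registry of defined views is hypothetical:

```python
# Walk an entity's ancestry, most specific first, and pick the first
# view class that is actually defined.
HIERARCHY = [
    "WildFlyDomainControllerServer",
    "WildFlyDomainServer",
    "WildFlyServer",
    "WildFlyServerBase",
    "Server",
]

# Views we happen to have defined (illustrative registry).
DEFINED_VIEWS = {"WildFlyServerView", "ServerView"}

def resolve_view(entity_name):
    """Return the first defined view, walking from entity_name to the root."""
    start = HIERARCHY.index(entity_name)
    for ancestor in HIERARCHY[start:]:
        candidate = ancestor + "View"
        if candidate in DEFINED_VIEWS:
            return candidate
    raise LookupError("no view defined for " + entity_name)

# A new, very specific entity still renders with an existing generic view.
view = resolve_view("WildFlyDomainControllerServer")  # "WildFlyServerView"
```

This is why adding a new specialized entity cannot break rendering: it simply falls through to the closest view already defined.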
Also, WildFlyServerBase is an abstract entity, in the sense that it only
provides implementation and is not meant to be matched. This can be achieved
by setting .applicable? to return false. Entities are just normal Ruby
objects; there is nothing special about them. They just need to respond to
the .applicable? method and take one argument in their initializer.
> What I mean is - which component should define/work with which format for
> the example entities? what should be defined on hawkular side, what is
> fetched and defined in ruby gem, what parts are defined and how they are
> processed in miq ui, what parts in miq server + how does that work (or not
> :)) with persisting objects in miq, etc. Can/should 2 completely different
> entities (such as middleware server and a deployment) use your proposal,
> given that they might have some similar common fields? (for example,
> "name", "status")
>
Entities define the "canonical truth" of the responses from the server, and
views define how to represent them as JSON. They don't tackle how we're
going to present the data, nor how to fetch it.
To persist data on MiQ: this PoC does not take any action on persisting
stuff, it only cares about representation. To do that, we just need a new
JSON field on every middleware table, to save the response from the server,
and then we can use it. Something like this:
entity = Entity.constantize(MiddlewareServer.first.data)
render json: View.for(entity)
>
> 2. I noticed that the discussion moved to jbmc list although it originated
> in hawkular-dev. the tech discussion is definitely more suitable in
> hawkular-dev as a continue to the original thread on the topic.
>
>
>
> On Mon, Jul 17, 2017 at 3:27 PM, Caina Costa <cacosta(a)redhat.com> wrote:
>
>> Yes, that's exactly what it does, with some caveats: we have a hierarchy
>> of server types, as in, we first have to implement a generic server type
>> that implements the default attributes to all the servers. From there, we
>> can subclass for more specific server types, and the view/entity runtime
>> takes care of matching the Hawkular data with the defined entities. So let's
>> say we have a hierarchy like that:
>>
>> MiddlewareServer > WildFlyServer > WildFly11Server
>>
>> For this example, let's say that MiddlewareServer includes only the
>> summary, WildFlyServer includes metrics, and WildFly11Server includes power
>> operations.
>>
>> When matching the data with Entity.constantize(data), we match first the
>> more specialized server, so WildFly11Server, and then WildFlyServer, then
>> the more generic MiddlewareServer. This is automatic on the runtime, and if
>> we add new server types, it will try to match in the reverse of the order
>> provided, first the most specific, then going forward for less specific
>> entities.
>>
>> So, in summary:
>>
>> It does enable us to add new server types with no code change on the
>> ManageIQ side, by providing more generic interfaces that we can match upon,
>> which means that while we might not have all information by default, we
>> will have a representation that makes sense. It also enables us to expand
>> new server types easily with small changes.
>>
>>
>>
>> On Mon, Jul 17, 2017 at 8:31 AM, Thomas Heute <theute(a)redhat.com> wrote:
>>
>>> I just watched the recording, it was not clear to me the benefits it
>>> brings (or if it's just internal).
>>>
>>> I was hoping to see how one can add a new server type with no code
>>> change on the ManageIQ side, maybe I misinterpreted the current work.
>>>
>>> Can you explain ?
>>>
>>> Thomas
>>>
>>> On Thu, Jul 13, 2017 at 6:13 PM, Caina Costa <cacosta(a)redhat.com> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> Thank you all for joining the presentation, lots of great questions!
>>>> For those of you that could not join, here's the recording:
>>>>
>>>> https://bluejeans.com/s/hnR7@/
>>>>
>>>> And the slides are attached.
>>>>
>>>> As always, if you have any questions, please do not hesitate to get in
>>>> touch. I'm available on IRC and e-mail.
>>>>
>>>
>>>
>>
>