[Hawkular-dev] Alerts - Response Time and Downtime requirements

Thomas Heute theute at redhat.com
Fri Feb 13 08:07:41 EST 2015


On 02/13/2015 01:32 PM, Catherine Robson wrote:
>
>
>> Thomas Heute <mailto:theute at redhat.com>
>> February 13, 2015 at 4:22 AM
>>
>> On 02/12/2015 11:25 PM, Catherine Robson wrote:
>>> Hi -
>>>
>>> We see that alerting on response time and downtime is part of what 
>>> we hope to provide in the first iteration of Hawkular.  We'd like to 
>>> get started on the designs related to alert 
>>> definition/configuration.  I'm hoping that you can all share some of 
>>> the requirements around alert definitions that you think we need to 
>>> have for Hawkular at this point. I don't want to overdo it by 
>>> looking at JON - I'd like to start simple. 
>> +1
>>> Here are the requirements for the web console as I currently see 
>>> them; I'd like the team to comment on them and add or remove 
>>> requirements as you see necessary.
>>>
>>> Overall Alerts
>>> As an administrator of a website, I would like to have all alerts 
>>> sent to me through e-mail.
>>> As an administrator of a website, I would like to have all alerts 
>>> sent to me via text message.
>>
>> That may be implicit but:
>>     - As an administrator of a website, I would like to have all 
>> alerts listed only in the console.
> Ha! Right :)
>>
>> We may not implement SMS right from the beginning, but having two 
>> channels may be good for starting the design. (Later we should embed 
>> AeroGear UPS, the UnifiedPush Server, and have a small app for push 
>> notifications on phones. Note to potential readers: this would make a 
>> good student project or contribution.)
>>
>> One important thing:
>>     - we need some alert "profiles". If I look at my thousands of 
>> resources, I may want them all to follow the same profile, and when I 
>> want to change who receives an email I should do that in a single 
>> place (and not go through the thousands of resources). There would be 
>> several profiles.
> Great - this makes sense.  Do we need to enter user information 
> (e-mails/phones) by hand, or could some of this information be 
> gathered through Keycloak? 
I think a "contact list" backed by Keycloak would make sense for email 
and SMS (and later for mobile push), though not necessarily in the 
first iteration.
> To clarify exactly what we think a "profile" contains - please verify 
> below.
>
> An alert profile is a place where users can set up alerting contact 
> information and rules for many resources.  An alert profile contains:
>
>   * A name & description
>
Yes. A user may choose to have several profiles depending on the 
severity of the problem or the set of machines (different people in 
charge), so they need to be able to identify each profile easily.
>
>   * Contact information of everyone associated with this profile (auto
>     or manual?)
>
This would be explicitly listed. We may need to add "shortcuts" at some 
point: for instance, if a resource has an "owner", we may want to be 
able to send an email to the "owner" of the affected resource rather 
than to a fixed person, but that's already more advanced.
>
>   * A group of resources this profile applies to
>

> Another alternative is that the resources are not mentioned in the 
> profile, and you just assign the profile when you're working in the 
> resources.  In that case this feels much more like an "Alert contact 
> group" than a profile to me, so I think it is just a terminology 
> change to make it clearer what to expect from this capability.

A user could be interested in checking which resources are affected 
before making a change, but that doesn't need to be prominent.

I took the example of changing the email ("Alert contact group"), but 
it could just as well be changing the acceptable response time for all 
servers. So an additional point would be:

  * Alert conditions


So one example of an "alert profile":
     - Name: "Neuchatel Datacenter Critical issue"
     - Description: "blah blah"
     - Condition: "Down for 10min"
     - Alerts: "Email bob immediately", "SMS mary after 20min if still down"

That profile would normally be applied to all EAP servers in the 
Neuchatel datacenter. If Bob gets fired, someone comes in and changes 
the email to someone else in that alert profile. If the Neuchatel 
datacenter becomes more critical, someone comes in and changes "Down 
for 10min" to "Down for 2min".
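To make the profile idea concrete, here is a minimal Python sketch of the example above. The names `AlertProfile`, `Action` and `due_actions`, the resource IDs, and the rule that each action's delay is counted from the moment the "down" condition is first met are all assumptions for illustration, not an agreed Hawkular design. Setting `down_for_min` to 0 would cover the "alert every time the system goes down" requirement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Action:
    """One notification: channel, recipient, and escalation delay (hypothetical model)."""
    channel: str        # e.g. "email" or "sms"
    recipient: str
    delay_min: int = 0  # minutes after the condition is met before this action fires


@dataclass
class AlertProfile:
    name: str
    description: str
    down_for_min: int  # condition: resource down for at least this many minutes
    actions: List[Action] = field(default_factory=list)


def due_actions(profile: AlertProfile, down_for: Optional[int]) -> List[Action]:
    """Return the actions that should have fired for a resource that has been
    down for `down_for` minutes (None means the resource is currently up)."""
    if down_for is None or down_for < profile.down_for_min:
        return []
    # Minutes elapsed since the "down" condition was first met.
    elapsed = down_for - profile.down_for_min
    return [a for a in profile.actions if elapsed >= a.delay_min]


neuchatel = AlertProfile(
    name="Neuchatel Datacenter Critical issue",
    description="blah blah",
    down_for_min=10,
    actions=[Action("email", "bob"), Action("sms", "mary", delay_min=20)],
)

# Hypothetical resource IDs: every EAP server references the same profile
# object, so editing the profile once changes the behavior for all of them.
assignments: Dict[str, AlertProfile] = {f"eap-{i}": neuchatel for i in range(3)}
```

With this model, a server down for 5 minutes triggers nothing, one down for 30 minutes triggers both the email to bob and the SMS to mary, and setting `neuchatel.down_for_min = 2` takes effect for every assigned server at once.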



>>
>> 3 potential improvements that we may want to think about in the 
>> design right now (or not):
>>     - Different addressee: support for sending email/SMS to someone 
>> other than the owner
> Let me see if I can expand on this use case to make sure we're all on 
> the same page.
>
> Precondition:  An alert fired.  It was sent to person A.
> Step 1: User sees the alert, and wants to "share" this alert with 
> person B.
> Step 2: User manually enters Person B's e-mail or SMS information.
> Step 2 alternate:  User selects from a dropdown list of existing known 
> users to find Person B, and Person B's preferred contact method is 
> used for sending the alert.
> End Goal:  Alert is sent to Person B based on the method chosen above.

I really just meant that we can send the email to someone other than 
the logged-in user.

>>     - Escalamation: if resource is down for 5 min, send me (or 
>> someone else) an email; if still down after 30min, send me an SMS
> Could this use those alert profiles too?

Yes definitely.
(Sorry for that "Escalamation" typo :) I really meant escalation)

>>     - Multiple alerts for 1 particular event: if resource is down for 
>> 5 min, send me an email, send my boss an email and send the IT guy an SMS
>
>>
>>
>>> Downtime
>>> As an administrator of a website, I would like to configure Hawkular 
>>> so an alert is sent to me every time the system goes down.
>>> As an administrator of a website, I would like to configure Hawkular 
>>> so an alert is only sent to me after the system is down for a 
>>> certain length of time, so I'm not alerted if there is a very minor 
>>> downtime event.
>> +1
>>>
>>> Response time
>>> As an administrator of a website, I would like to configure Hawkular 
>>> to alert me when my website's response time is slower than a 
>>> threshold I have set so I know there may be performance problems.
>> It would have to be over some configurable period of time.
> OK, so you would never want to alert as soon as we go over the 
> threshold *at all* for this metric; you would only ever want to alert 
> based on how long it stayed above the threshold.

Right. Unless we look at a percentile (or an average), a single response 
time value outside the norm doesn't mean anything, and alerting on it 
would only frustrate the person notified. So this needs to include some 
period of time.
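A minimal Python sketch of the "period of time" idea: the rule below keeps a trailing window of samples, waits until a full window has been observed, and then compares either the average or a nearest-rank percentile against the threshold, so one slow response is weighed against the rest of the window instead of firing immediately. `ResponseTimeRule`, its parameters, and the choice of a nearest-rank percentile are assumptions for illustration, not a Hawkular API.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Tuple


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


class ResponseTimeRule:
    """Fires when the aggregate response time over a trailing window
    exceeds a threshold (hypothetical rule, not a Hawkular API)."""

    def __init__(self, threshold_ms: float, window_s: float, pct: Optional[float] = None):
        self.threshold_ms = threshold_ms
        self.window_s = window_s
        self.pct = pct  # None -> compare the average; e.g. 95 -> compare p95
        self.samples: Deque[Tuple[float, float]] = deque()
        self.start: Optional[float] = None

    def observe(self, t_s: float, response_ms: float) -> bool:
        """Record one sample taken at t_s seconds; return True if the rule fires."""
        if self.start is None:
            self.start = t_s
        self.samples.append((t_s, response_ms))
        # Keep only samples inside the trailing window (t_s - window_s, t_s].
        while self.samples and self.samples[0][0] <= t_s - self.window_s:
            self.samples.popleft()
        # Do not judge anything until a full window of data has been seen.
        if t_s - self.start < self.window_s or not self.samples:
            return False
        values = [ms for _, ms in self.samples]
        if self.pct is None:
            agg = sum(values) / len(values)
        else:
            agg = percentile(values, self.pct)
        return agg > self.threshold_ms
```

For example, with a 60 s window and a 500 ms threshold, a lone 2000 ms spike among 100 ms samples averages out to about 417 ms and does not fire, while a full minute of 900 ms responses does. Using `pct=50` (the median) makes the rule even less sensitive to isolated spikes.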

Thomas

>>>
>>> Are there any other "settings" to the alerts that we should be 
>>> considering at this point?
>>
>> At some point in the future we may want to have a warning state, but 
>> I don't want to overload this thread :)
>>
>> Thomas
>>> Thanks,
>>> Catherine
>>>
>>>
>>> _______________________________________________
>>> hawkular-dev mailing list
>>> hawkular-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/hawkular-dev
>>
>> Catherine Robson <mailto:crobson at redhat.com>
>> February 12, 2015 at 5:25 PM
>
> -- 
> Catherine Robson
> User Experience Design
> Red Hat JBoss Middleware
> c: 978-944-3825
>
