Re: [Hawkular-dev] Availability revisited

Wednesday, 15 July 2015

...
 On Jul 13, 2015, at 8:36 AM, Heiko W. Rupp <hrupp(a)redhat.com> wrote:

 Hey,

 we did talk about Availability and computed state in the past

 Now triggered by https://issues.jboss.org/browse/HAWKULAR-401
 and also https://issues.jboss.org/browse/HAWKULAR-407
 we need to revisit this and finally start including it in the code base.

 In -407 we have the issue that the server currently cannot detect that
 a feed is down. For the WF-agent, this is likely to be solved with the
 new feed-comm system, which can see disconnect messages [1] and act
 accordingly
Are there any docs, notes, etc. on the feed-comm system? I am not familiar with this.

...
 (i.e. the server side adds a synthetic "down" event into the
 availability data stream).
 Of course other feeds can also use that mechanism.

 A generic feed, though, that sends availability records from time to
 time will most probably not send a "down" event when it is going
 down or crashing. So we need a periodic job that looks for feeds
 that have not talked to us for a longer period of time.
 This also implies that at least the in-memory state for feed
 availability needs to be updated with a last-seen record, as Micke
 described some time ago (that last-seen record should probably be
 flushed to C* from time to time).
Why do we need to store the last seen availability in memory?

...
 Also, we would need to require "generic" feeds to send heartbeats by
 reporting their availability at least once per minute.
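
To make the last-seen idea above concrete, here is a minimal sketch of an in-memory tracker plus the sweep a periodic job could run. Everything in it is an assumption for illustration: the class name, the method names, and the 120-second timeout (two missed one-minute heartbeats) are hypothetical and not part of Hawkular.

```python
import time


class FeedLastSeen:
    """In-memory last-seen tracker for feeds (hypothetical, not Hawkular code)."""

    def __init__(self, timeout_seconds=120.0, clock=time.monotonic):
        # Assumed timeout: a feed that misses two one-minute heartbeats is stale.
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # feed id -> time of the last availability record
        self._down = set()    # feeds already reported as stale

    def record_seen(self, feed_id):
        """Call whenever a feed sends an availability record (its heartbeat)."""
        self._last_seen[feed_id] = self._clock()
        self._down.discard(feed_id)

    def sweep(self):
        """Periodic job: return feeds that newly exceeded the timeout.

        The caller would write a synthetic "down" record for each of them.
        """
        now = self._clock()
        newly_down = [
            feed_id
            for feed_id, seen_at in self._last_seen.items()
            if feed_id not in self._down and now - seen_at > self.timeout_seconds
        ]
        self._down.update(newly_down)
        return sorted(newly_down)
```

The sweep reports each stale feed only once, so a feed that stays silent does not produce a new "down" record on every run. Flushing `_last_seen` to C* from time to time is left out of the sketch.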

 Now for -401, which is trickier. If e.g. a WildFly is in state
 'reload-needed', it is technically up, but its configuration has
 pending changes.

 So we would need "up" availability, and then another (sub) state
 indicating the pending change.
 And then we may have state like "maintenance mode", where a resource
 may be up or down without impacting e.g. alerting or any SLA
 computation.

 From those raw input variables we would then compute the resource state
 http://lists.jboss.org/pipermail/hawkular-dev/2015-March/000413.html

 While this could be up/down/unknown/(mixed for groups), it will also
 mean that we need to convey the other information to the user. If e.g.
 a resource is in maintenance mode, the user should be informed why
 alerts on the resource do not fire.
 Likewise for reload-needed: the user needs to know why the recent
 changes he or she made did not change the way the appserver works.
 Treating reload-needed as just "down" is wrong, as the server continues
 to work and serve requests.
If you are talking about correlation, then I am +1. When I think about RHQ, the user could
easily see an availability state change, but he would have to go hunting around to see what
precipitated it.
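
As a rough illustration of computing a resource state from the raw inputs discussed above (availability, a sub-state such as reload-needed, and a maintenance flag), here is a minimal sketch. All type and function names are hypothetical, and the rule that maintenance mode suppresses alerting is an assumption drawn from the description, not an existing Hawkular API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Availability(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class RawState:
    availability: Availability
    sub_state: Optional[str] = None  # e.g. "reload-needed"
    maintenance: bool = False        # orthogonal flag on the resource


@dataclass(frozen=True)
class ComputedState:
    availability: Availability
    alerting_enabled: bool
    notes: Tuple[str, ...]           # explanations to show to the user


def compute_state(raw: RawState) -> ComputedState:
    """Derive the user-visible state without folding sub-states into "down"."""
    notes = []
    if raw.sub_state == "reload-needed":
        notes.append("configuration changes are pending until the server is reloaded")
    if raw.maintenance:
        notes.append("maintenance mode: alerts are suppressed")
    return ComputedState(raw.availability, not raw.maintenance, tuple(notes))
```

A server in reload-needed stays "up" here and only gains a note, which is the kind of correlation hint that would save the user from hunting for the cause.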

...

 The above of course has an impact on storage. Right now we only store
 up/down/unknown (as text) for availability, but we certainly would need
 to also store sub-state.
 For the maintenance-mode, this is orthogonal to all the above and should
 probably be a "flag" on a graph of resources.

   Heiko

 [1] @OnClose is called with a code of 1006 on client crash/abnormal
 termination.
 See http://tools.ietf.org/html/rfc6455#section-7.4
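
A minimal sketch of how a close code could be turned into the synthetic "down" record mentioned for the feed-comm case. The close codes are the ones defined in RFC 6455; the helper name and the record fields are hypothetical and do not reflect the actual Hawkular data format. Treating every disconnect as "down" is also an assumption of this sketch.

```python
# Close codes defined in RFC 6455, section 7.4.1.
NORMAL_CLOSURE = 1000
ABNORMAL_CLOSURE = 1006  # connection dropped without a Close frame


def synthetic_down_event(feed_id, close_code, timestamp_ms):
    """Build the availability record a server could insert on disconnect.

    Hypothetical helper: the field names are illustrative only.
    """
    if close_code == ABNORMAL_CLOSURE:
        reason = "abnormal-termination"
    else:
        reason = "disconnect"
    return {
        "feedId": feed_id,
        "timestamp": timestamp_ms,
        "value": "down",
        "synthetic": True,  # marks the record as server-generated
        "reason": reason,
    }
```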
 _______________________________________________
 hawkular-dev mailing list
 hawkular-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hawkular-dev 
