Re: [wildfly-dev] Domain Overview design

Friday, 25 July 2014

On 7/25/14, 7:49 AM, Liz Clayton wrote:
...
 Hi,

 ----- Original Message -----
> From: "Brian Stansberry" <brian.stansberry(a)redhat.com>
> To: "Liz Clayton" <lclayton(a)redhat.com>
> Cc: wildfly-dev(a)lists.jboss.org
> Sent: Thursday, July 24, 2014 3:58:15 PM
> Subject: Re: [wildfly-dev] Domain Overview design
>
> +100 on Jason's comment thanking you for posting this.

 Thanks for looking it over, and for the great feedback! I have some follow-up questions
inline.

> On 7/24/14, 12:05 PM, Jason Greene wrote:
>>
>> On Jul 24, 2014, at 11:45 AM, Jason Greene <jason.greene(a)redhat.com> wrote:
>>
>>>
>>> On Jul 24, 2014, at 11:22 AM, Liz Clayton <lclayton(a)redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm sketching out some ideas for the Domain Overview screen. I'd like to
>>>> find a visualization that makes it easier to scan the page to determine
>>>> server availability, and possibly alerts.
>>>>
>>>> Given that the domain could be large, the visualization needs to scale. I
>>>> started by looking at heatmap visualizations, which worked pretty well.
>>>> Although I didn't feel like they helped in describing the overall
>>>> relationships of servers, server groups and hosts... So I decided to
>>>> break the heat maps into individual (stacked) heatmaps, ordered by
>>>> server group. My hope is that this helps to define groupings and such.
>>>>
>>>> I posted the current design proposal at:
>>>> https://community.jboss.org/wiki/DomainOverview070114pdf
>>>>
>>>> It would be great to get feedback on the designs. Some questions I have
>>>> are:
>>>> - Is it difficult/easy to understand that the boxes, in the server
>>>> groupings, are intended to represent servers?
>
> That seemed intuitive to me. I don't get though why some boxes are
> different sizes from others on "Second Iteration: Stacked heatmap". On
> "First draft heatmap – Server Group view for 'availability'" I could
> somewhat get that, as different server groups can vary in size based on
> # of servers.

 So it sounds like the uniformly sized boxes (pg 5) are working better for you? Followed
by the standard heatmap (pg15), and then not so much for the irregular ones on pg 16?

Yes. Aesthetically I like the pg16 approach, but once I asked myself 
what the size meant, I had no answer. :)

BTW, I just realized that pg 5 is the more critical page vs pg 16. :)

...
 The boxes are irregular on 16 because they were intended to display
mini heatmaps, stacked. And unlike the domain-view version (pg15) where the size of the
box would be driven by # of servers - the mini ones could be scaled by some other metric
(throughput or etc.). But I didn't really have that information, so I made the boxes
uniformly sized.

I see; so the thought was perhaps the user could choose a scaling factor 
or something?

...
>>>> - Should the servers be laid out in the visualization by level of
>>>> availability/status (as illustrated), or by some other ordering (A-Z,
>>>> Z-A...)?
>
> My instinct is something like alphabetical is better. The color already
> lets me easily find and identify things where availability/status is
> relevant to what I want to do. But when it's not relevant to what I want
> (say I want to work with server-X for reasons completely unrelated to
> availability) then I need to rely on location to find what I want quickly.

 That makes sense. For the "everything is OK" scenario, maybe there could be some kind of
search to help locate/highlight a specific server in the display.

> Note that multiple servers in a domain can have the same name; it's the
> host (as in Host Controller) + server name combination that must be unique.

 Good to know.
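
The ordering and uniqueness points above can be sketched roughly as follows. This is only an illustration: the `Server` record and the `layout` function are hypothetical names, not part of WildFly or the console. It keys each box by the (host, server name) pair and lays servers out A-Z within each server group, so position stays stable regardless of status.

```python
from collections import defaultdict
from typing import NamedTuple


class Server(NamedTuple):
    host: str   # Host Controller name
    name: str   # server name; unique only within its host
    group: str  # server group the server belongs to


def layout(servers):
    """Group servers by server group, each group ordered A-Z by (name, host)."""
    seen = set()
    groups = defaultdict(list)
    for s in servers:
        key = (s.host, s.name)  # the combination that must be unique
        if key in seen:
            raise ValueError(f"duplicate host/server pair: {key}")
        seen.add(key)
        groups[s.group].append(s)
    # Alphabetical by server name, with host as the tie-breaker for
    # servers that share a name on different hosts.
    return {
        g: sorted(groups[g], key=lambda s: (s.name, s.host))
        for g in sorted(groups)
    }
```

Two servers named `server-one` on different hosts are both accepted and sit next to each other in the layout; a repeated (host, name) pair is rejected.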

>>>> - Is it difficult/easy to understand that when a box is a different
>>>> color, that it is indicating its availability status?
>
> If the colors are green and grey, maybe not. But in a number of examples
> you use green<->red, with grey too, and I think that's pretty intuitive.
> Red = bad, green = good, yellowish = in between, grey = the good/bad
> aspect is out-of-scope.

 Is there info to help figure out those bands - when something changes from yellow to
red...?

Not really, no. Having that gets into the "alerts" discussion. Some of 
the other notions I mentioned -- suspended, reload-required, 
restart-required -- could possibly be 'yellow', but as we discuss below 
those things are not quite the same as server "health".

The "suspended" state could logically be yellow, but if later we want to 
treat some sort of alert-based error criteria as triggering yellow, then 
we'd be mixing two different things into the "yellow" concept.
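
One way to avoid mixing the two, as a minimal sketch: keep the fill color for availability only, and show admin-chosen or config-related states as separate badges on the box. Everything here is an assumption for illustration (the enum names, the badge idea, and the choice to leave a suspended server green); none of it is an existing WildFly or console API.

```python
from dataclasses import dataclass
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    SUSPENDING = "suspending"
    SUSPENDED = "suspended"
    STOPPED = "stopped"  # administrative shutdown
    FAILED = "failed"    # crash


class Pending(Enum):
    NONE = "none"
    RELOAD_REQUIRED = "reload-required"
    RESTART_REQUIRED = "restart-required"


@dataclass(frozen=True)
class ServerStatus:
    run_state: RunState
    pending: Pending = Pending.NONE


# Fill color reflects availability only; yellow is deliberately left
# unused so a later alert-based criterion could claim it.
_FILL = {RunState.FAILED: "red", RunState.STOPPED: "grey"}


def box_style(status):
    """Return (fill color, list of badge labels) for one server box."""
    fill = _FILL.get(status.run_state, "green")
    badges = []
    if status.run_state in (RunState.SUSPENDING, RunState.SUSPENDED):
        badges.append(status.run_state.value)
    if status.pending is not Pending.NONE:
        badges.append(status.pending.value)
    return fill, badges
```

With this split, a suspended server that also needs a reload would render as a green box carrying two badges, and "yellow" stays free for whatever health criteria come later.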

...
>>>> - What do you expect to be the relationship between (Availability) Status
>>>> and Alerts? Would “x” alerts equate to a change in availability status,
>>>> or can they function independently? For example: Could you have an error
>>>> on a server and it still be “available?”
>
> I think we need a better understanding of what alerts are. In general, I
> like the idea that the occurrence of some kind of negative event tends to
> shift the color away from green and toward red. But what constitutes a
> negative event? How much control does the user have over that
> definition? How much control does the user have over how much different
> events shift the color?

 Yes exactly. I was hoping that there could be some type of relationship
(events=shifting), and that the user could drill-down to learn more about the event(s).
But lots of hand-waving in the design, because I'm not sure how do-able that is?

It's a very big task.

...
> Please be cautious about the Alerts notion in your design. WildFly
> doesn't actually have the kind of alerting system that many might be
> thinking of when they imagine this kind of thing. So it would have to be
> developed. That's something we want to do, and Jeff Mesnil is doing some
> of the foundational work, but it's not there yet and it's a big job
> competing with lots of other big tasks. So, we want it, but the more a
> UI design depends on a complex alerting system, the riskier it is that a
> needed feature won't be there.

 Great to know, thanks! I'm principally trying to find a way for users to drill-down
to get more information about an availability issue. Perhaps there are other ways that
steer clear of alerts.

> Re: some of the questions, on the "Questions" page:
>
> "What are the states for server availability?"
>
> On "Not available" and "Failed" we have no notion of why a server was
> taken down; i.e. the admin takes it down, but whether they did so due to
> issues or for some other reason, we have no idea. We can distinguish a
> crash from an administrative shutdown.

 So it sounds like there is shutdown ("N/A") and "failed."

> Also, there are other aspects of a server's running state that complicate
> things.
>
> We are adding the ability to put a server in "suspending" and
> "suspended" states where it moves to not accepting normal end user
> requests but is still running. This isn't an "error" state; the admin
> has chosen to put the server in that state.

 So it sounds like there is: shutdown ("N/A"), "suspended", and "failed."

> There's also a similar notion regarding how consistent the server's
> running state is with its persistent configuration. Admins can make
> configuration changes that will not take effect until the server is
> reloaded or restarted.

 Would that directly affect its availability status?

No, it wouldn't. The server goes into this state when the admin makes a 
config change that the server can't apply to its running services 
without affecting their ability to handle end-user requests. The server 
doesn't "just do it", it goes into this state which lets the admin know 
they need to take an action that *will* temporarily affect availability.

...
> I'm not sure if those things are well represented on a green-yellow-red
> color continuum because they are somewhat different from server health.
> But they are important pieces of data to visualize.

 yes, great to know about.

> "How does the Alerts tab fit in with the *current* Notification message
> queue?"
>
> Heiko Braun knows better, but I don't see a close fit. The current queue
> isn't really based on any sort of server-push of events. The console
> makes administrative requests and gets responses; if relevant that
> request/response results in stuff in the queue. But anything that
> happens outside of those requests/responses is unknown to the console.

 So there are events happening on the system, that could affect availability, which will
not show up in the message queue?

Absolutely. The console only knows what it specifically asks or the 
effect of changes it makes, plus a small bit of status information that 
gets piggy-backed in the response to requests (i.e. that the server is 
in reload/restart-required state.) But the user's app could be throwing 
errors all over the place, resources like memory or thread pools could 
be overtaxed, etc., and the console would have no clue unless it checks 
those specific things.

...
 Thanks!
 Liz

> Cheers,
>
>
> --
> Brian Stansberry
> Senior Principal Software Engineer
> JBoss by Red Hat
> 

-- 
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat
