Hi,
----- Original Message -----
From: "Brian Stansberry" <brian.stansberry(a)redhat.com>
To: "Liz Clayton" <lclayton(a)redhat.com>
Cc: wildfly-dev(a)lists.jboss.org
Sent: Thursday, July 24, 2014 3:58:15 PM
Subject: Re: [wildfly-dev] Domain Overview design
+100 on Jason's comment thanking you for posting this.
Thanks for looking it over, and for the great feedback! I have some follow-up questions
inline.
On 7/24/14, 12:05 PM, Jason Greene wrote:
>
> On Jul 24, 2014, at 11:45 AM, Jason Greene <jason.greene(a)redhat.com> wrote:
>
>>
>> On Jul 24, 2014, at 11:22 AM, Liz Clayton <lclayton(a)redhat.com> wrote:
>>
>>> Hi,
>>>
>>> I'm sketching out some ideas for the Domain Overview screen. I'd like to
>>> find a visualization that makes it easier to scan the page to determine
>>> server availability, and possibly alerts.
>>>
>>> Given that the domain could be large, the visualization needs to scale. I
>>> started by looking at heatmap visualizations, which worked pretty well.
>>> Although I didn't feel like they helped in describing the overall
>>> relationships of servers, server groups and hosts... So I decided to
>>> break the heat maps into individual (stacked) heatmaps, ordered by
>>> server group. My hope is that this helps to define groupings and such.
>>>
>>> I posted the current design proposal at:
>>>
>>> https://community.jboss.org/wiki/DomainOverview070114pdf
>>>
>>> It would be great to get feedback on the designs. Some questions I have
>>> are:
>>> - Is it difficult/easy to understand that the boxes, in the server
>>> groupings, are intended to represent servers?
That seemed intuitive to me. I don't get though why some boxes are
different sizes from others on "Second Iteration: Stacked heatmap". On
"First draft heatmap – Server Group view for 'availability'" I could
somewhat get that, as different server groups can vary in size based on
# of servers.
So it sounds like the uniformly sized boxes (pg 5) are working better for you? Followed by
the standard heatmap (pg 15), and then not so much for the irregular ones on pg 16?
The boxes are irregular on pg 16 because they were intended to display mini heatmaps,
stacked. And unlike the domain-view version (pg 15), where the size of the box would be
driven by # of servers, the mini ones could be scaled by some other metric (throughput,
etc.). But I didn't really have that information, so I made the boxes uniformly sized.
>>> - Should the servers be laid out in the visualization by level of
>>> availability/status (as illustrated), or by some other ordering (A-Z,
>>> Z-A...)?
My instinct is something like alphabetical is better. The color already
lets me easily find and identify things where availability/status is
relevant to what I want to do. But when it's not relevant to what I want
(say I want to work with server-X for reasons completely unrelated to
availability) then I need to rely on location to find what I want quickly.
That makes sense. For the "everything is OK" scenario, maybe there could be some kind of
search to help locate/highlight a specific server in the display.
Note that multiple servers in a domain can have the same name; it's the
host (as in Host Controller) + server name combination that must be unique.
Good to know.
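A minimal sketch of what that uniqueness rule could mean for lookup and search in the overview. All names here (ServerKey, index_servers, find_by_name, and the host/server names in the usage note) are hypothetical illustrations, not WildFly or console API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerKey:
    """Hypothetical identity for a server box: host + server name."""
    host: str
    server: str


def index_servers(pairs):
    """Index (host, server) pairs by the combined key.

    The server name alone is not a safe key, because two hosts may
    each run a server with the same name.
    """
    index = {}
    for host, server in pairs:
        key = ServerKey(host, server)
        if key in index:
            # Same host + same server name twice breaks the uniqueness rule.
            raise ValueError(f"duplicate server {server!r} on host {host!r}")
        index[key] = {"host": host, "server": server}
    return index


def find_by_name(index, name):
    """Return every key matching a server name, ordered by host.

    A name search can legitimately produce several hits, so the
    display would need to highlight all of them.
    """
    matches = [key for key in index if key.server == name]
    return sorted(matches, key=lambda key: key.host)
```

For example, indexing ("host-a", "server-one"), ("host-b", "server-one") and ("host-a", "server-two") gives three distinct entries, and a search for "server-one" returns two hits, one per host.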
>>> - Is it difficult/easy to understand that when a box is a different
>>> color, that it is indicating its availability status?
If the colors are green and grey, maybe not. But in a number of examples
you use green<->red, with grey too, and I think that's pretty intuitive.
Red = bad, green = good, yellowish = in between, grey = the good/bad
aspect is out-of-scope.
Is there info to help figure out those bands - when something changes from yellow to
red...?
>>> - What do you expect to be the relationship between (Availability) Status
>>> and Alerts? Would “x” alerts equate to a change in availability status,
>>> or can they function independently? For example: Could you have an error
>>> on a server and it still be “available?”
I think we need a better understanding of what alerts are. In general, I
like the idea that the occurrence of some kind of negative event tends to
shift the color away from green and toward red. But what constitutes a
negative event? How much control does the user have over that
definition? How much control does the user have over how much different
events shift the color?
Yes, exactly. I was hoping that there could be some type of relationship (events=shifting),
and that the user could drill-down to learn more about the event(s). But there is lots of
hand-waving in the design, because I'm not sure how doable that is.
Please be cautious about the Alerts notion in your design. WildFly
doesn't actually have the kind of alerting system that many might be
thinking of when they imagine this kind of thing. So it would have to be
developed. That's something we want to do, and Jeff Mesnil is doing some
of the foundational work, but it's not there yet and it's a big job
competing with lots of other big tasks. So, we want it, but the more a
UI design depends on a complex alerting system, the riskier it is that a
needed feature won't be there.
Great to know, thanks! I'm principally trying to find a way for users to drill-down to
get more information about an availability issue. Perhaps there are other ways that steer
clear of alerts.
Re: some of the questions, on the "Questions" page:
"What are the states for server availability?"
On "Not available" and "Failed" we have no notion of why a server was
taken down; i.e. the admin takes it down, but whether they did so due to
issues or for some other reason, we have no idea. We can distinguish a
crash from an administrative shutdown.
So it sounds like there is shutdown ("N/A") and "failed."
Also, there are other aspects of a server's running state that complicate
things.
We are adding the ability to put a server in "suspending" and
"suspended" states where it moves to not accepting normal end user
requests but is still running. This isn't an "error" state; the admin
has chosen to put the server in that state.
So it sounds like there is: shutdown ("N/A"), "suspended", and "failed."
There's also a similar notion regarding how consistent the server's
running state is with its persistent configuration. Admins can make
configuration changes that will not take effect until the server is
reloaded or restarted.
Would that directly affect its availability status?
I'm not sure if those things are well represented on a green-yellow-red
color continuum because they are somewhat different from server health.
But they are important pieces of data to visualize.
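One way to picture the point above is to keep these aspects as separate fields rather than folding them into a single health color. This is only a sketch: the class and function names are hypothetical, "running" and "in sync" are assumed labels for the normal case, and none of it is taken from the WildFly management model:

```python
from dataclasses import dataclass
from enum import Enum


class RunState(Enum):
    """Hypothetical run states, following the ones named in this thread."""
    RUNNING = "running"        # assumed label for the normal case
    SUSPENDING = "suspending"
    SUSPENDED = "suspended"    # chosen by the admin, not an error
    STOPPED = "stopped"        # administrative shutdown ("N/A")
    FAILED = "failed"          # crash


class ConfigSync(Enum):
    """Hypothetical labels for running state vs. persistent configuration."""
    IN_SYNC = "in sync"
    RELOAD_REQUIRED = "reload required"
    RESTART_REQUIRED = "restart required"


@dataclass(frozen=True)
class ServerStatus:
    run_state: RunState
    config_sync: ConfigSync = ConfigSync.IN_SYNC


def describe(status):
    """Build a label that reports both aspects without merging them."""
    label = status.run_state.value
    if status.config_sync is not ConfigSync.IN_SYNC:
        label += " (" + status.config_sync.value + ")"
    return label
```

With this shape, a suspended server that also has pending configuration changes reads as "suspended (reload required)", and the color scale can stay reserved for health.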
Yes, great to know about.
"How does the Alerts tab fit in with the *current* Notification message
queue?"
Heiko Braun knows better, but I don't see a close fit. The current queue
isn't really based on any sort of server-push of events. The console
makes administrative requests and gets responses; if relevant that
request/response results in stuff in the queue. But anything that
happens outside of those requests/responses is unknown to the console.
So there are events happening on the system that could affect availability but will
not show up in the message queue?
Thanks!
Liz
Cheers,
--
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat