<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Yes, we had some bz/jiras opened about this before.  We get cases
    where a customer's application is failing and has pages upon pages of
    dependency errors and the customer cannot easily determine the
    issue.  Even support has difficulty: we usually try searching
    for common things like datasources or other JNDI references that
    might be missing, but I have seen several where it was not a
    datasource and it took a while of tearing the apps apart to resolve. 
    It looks like there was some improvement in EAP 7.1 [2], but it
    sounds like Stuart's PR may be even better.<br>
    <br>
    I found one example deployment on [1] that we could try and see what
    the logging looks like with the new PR.<br>
    I figure the service dump would show all of the failed dependencies
    in case there was a need to look at the others?<br>
    <br>
    [1] <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1283294">https://bugzilla.redhat.com/show_bug.cgi?id=1283294</a><br>
    [2] <a class="moz-txt-link-freetext" href="https://issues.jboss.org/browse/JBEAP-5311">https://issues.jboss.org/browse/JBEAP-5311</a><br>
    <br>
    <div class="moz-cite-prefix">On 2/15/18 5:15 PM, Stuart Douglas
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAAoo=c6zt2wX22tKHCLsEE+S3StMXWvWXLSp8o5PhpGY9_TvOQ@mail.gmail.com">
      <div dir="ltr">I have opened <a
          href="https://github.com/wildfly/wildfly-core/pull/3114"
          moz-do-not-send="true">https://github.com/wildfly/wildfly-core/pull/3114</a>
        to allow for testing/further review.
        <div><br>
        </div>
        <div>Stuart</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Thu, Feb 15, 2018 at 11:32 PM,
          Stuart Douglas <span dir="ltr">&lt;<a
              href="mailto:stuart.w.douglas@gmail.com" target="_blank"
              moz-do-not-send="true">stuart.w.douglas@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr"><br>
              <div class="gmail_extra"><br>
                <div class="gmail_quote">
                  <div>
                    <div class="h5">On Thu, Feb 15, 2018 at 6:51 PM,
                      Brian Stansberry <span dir="ltr">&lt;<a
                          href="mailto:brian.stansberry@redhat.com"
                          target="_blank" moz-do-not-send="true">brian.stansberry@redhat.com</a>&gt;</span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0px
                        0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                        <div dir="ltr">
                          <div class="gmail_extra">
                            <div class="gmail_quote"><span
                                class="m_-8471000774253265312gmail-">On
                                Wed, Feb 14, 2018 at 9:37 PM, Stuart
                                Douglas <span dir="ltr">&lt;<a
                                    href="mailto:stuart.w.douglas@gmail.com"
                                    target="_blank"
                                    moz-do-not-send="true">stuart.w.douglas@gmail.com</a>&gt;</span>
                                wrote:<br>
                                <blockquote class="gmail_quote"
                                  style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                  <div dir="ltr"><br>
                                    <div class="gmail_extra"><br>
                                      <div class="gmail_quote"><span>On
                                          Wed, Feb 14, 2018 at 4:43 PM,
                                          Brian Stansberry <span
                                            dir="ltr">&lt;<a
                                              href="mailto:brian.stansberry@redhat.com"
                                              target="_blank"
                                              moz-do-not-send="true">brian.stansberry@redhat.com</a>&gt;</span>
                                          wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                            <div dir="ltr">
                                              <div class="gmail_extra">
                                                <div class="gmail_quote"><span
class="m_-8471000774253265312gmail-m_1451192004882818368m_273302915848364358gmail-">On
                                                    Tue, Feb 13, 2018 at
                                                    8:24 PM, Stuart
                                                    Douglas <span
                                                      dir="ltr">&lt;<a
                                                        href="mailto:stuart.w.douglas@gmail.com"
                                                        target="_blank"
moz-do-not-send="true">stuart.w.douglas@gmail.com</a>&gt;</span> wrote:<br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0px
                                                      0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                                      <div dir="ltr">Hi
                                                        Everyone,
                                                        <div><br>
                                                        </div>
                                                        <div>I have been
                                                          thinking a bit
                                                          about the way
                                                          we report
                                                          errors in
                                                          WildFly, and I
                                                          think this is
                                                          something that
                                                          we can improve
                                                          on. At the
                                                          moment I think
                                                          we are way too
                                                          liberal with
                                                          what we
                                                          report, which
                                                          results in a
                                                          ton of
                                                          services being
                                                          listed in the
                                                          error report
                                                          that have
                                                          nothing to do
                                                          with the
                                                          actual
                                                          failure.</div>
                                                        <div><br>
                                                        </div>
                                                        <div>As an
                                                          example to
                                                          work from I
                                                          have created
                                                          [1], which is
                                                          a simple EJB
                                                          application.
                                                          This consists
                                                          of 10 EJBs,
                                                          one of which
                                                          has a
                                                          reference to a
                                                          non-existent
                                                          data source,
                                                          the rest are
                                                          simply empty
                                                          no-op EJBs
                                                          (just
                                                          @Stateless on
                                                          an empty
                                                          class).</div>
                                                        <div><br>
                                                        </div>
                                                        <div>This app
                                                          fails to
                                                          deploy because
                                                          the
                                                          java:global/NonExistant
                                                          data source is
                                                          missing, which
                                                          gives the
                                                          failure
                                                          description in
                                                          [2]. This is
                                                          ~120 lines
                                                          long and lists
                                                          multiple
                                                          services for
                                                          every single
                                                          component in
                                                          the
                                                          application
                                                          (part of the
                                                          reason this is
                                                          so long is
                                                          because the
                                                          failures are
                                                          reported
                                                          twice, once
                                                          when the
                                                          deployment
                                                          fails and once
                                                          when the
                                                          server
                                                          starts).</div>
                                                        <div><br>
                                                        </div>
                                                        <div>I think we
                                                          can improve on
                                                          this. I think
                                                          in every
                                                          failure case
                                                          there will be
                                                          some root
                                                          causes that
                                                          are all the
                                                          end user cares
                                                          about, and we
                                                          should limit
                                                          our reporting
                                                          to just these
                                                          cases, rather
                                                          than listing
                                                          every internal
                                                          service that
                                                          can no longer
                                                          start due to
                                                          missing
                                                          transitive
                                                          deps.</div>
                                                        <div><br>
                                                        </div>
                                                        <div>In
                                                          particular
                                                          these root
                                                          causes are:</div>
                                                        <div>1) A
                                                          service threw
                                                          an exception
                                                          in its start()
                                                          method and
                                                          failed to
                                                          start</div>
                                                        <div>2) A
                                                          dependency is
                                                          actually
                                                          missing (i.e.
                                                          not installed,
                                                          not just not
                                                          started)</div>
                                                        <div><br>
                                                        </div>
                                                        <div>I think
                                                          that one or
                                                          both of these
                                                          two cases will
                                                          be the root
                                                          cause of any
                                                          failure, and
                                                          as such that
                                                          is all we
                                                          should be
                                                          reporting on.</div>
                                                        <div><br>
                                                        </div>
                                                        <div>We already
                                                          do an OK job
                                                          of handling
                                                          case 1),
                                                          services that
                                                          have failed,
                                                          as they get
                                                          their own line
                                                          item in the
                                                          error report,
                                                          however case
                                                          2) results in
                                                          a huge report
                                                          that lists
                                                          every service
                                                          that has not
                                                          come up, no
                                                          matter how far
                                                          removed they
                                                          are from the
                                                          actual
                                                          problem.</div>
                                                      </div>
                                                    </blockquote>
                                                    <div><br>
                                                    </div>
                                                  </span>
                                                  <div>If the 2) case
                                                    can be correctly
                                                    determined, then +1
                                                    to reporting some
                                                    new section and not
                                                    reporting the
                                                    current "WFLYCTL0180:
                                                    Services with
                                                    missing/unavailable
                                                    dependencies"
                                                    section. The
                                                    WFLYCTL0180 section
                                                    could only be
                                                    reported as a
                                                    fallback if for some
                                                    reason the 1) and 2)
                                                    stuff is empty.</div>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                        </span>
                                        <div>I have adjusted this a bit
                                          so a service with mode NEVER
                                          is treated the same as if it
                                          is missing. I am pretty sure
                                          that with this change 1) and
                                          2) will cover 100% of cases.</div>
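                                        <div><br>
                                        </div>
                                        <div>As a rough illustration of
                                          the two root causes plus the
                                          NEVER adjustment, here is a
                                          minimal Python sketch. It is
                                          not the code in the branch: the
                                          service table, its field names
                                          and the state strings are all
                                          hypothetical stand-ins for what
                                          MSC tracks. It also shows the
                                          fallback idea of only printing
                                          the full list when neither root
                                          cause list has anything in it.</div>
<pre>
```python
# Hypothetical service table; the names, fields, and states are illustrative
# and are not MSC types or real WildFly service names.
SERVICES = {
    "ejb.ErrorEjb.env.nonExistant": {
        "state": "DOWN", "mode": "ACTIVE", "deps": ["java.global.NonExistant"]},
    "ejb.ErrorEjb.START": {
        "state": "DOWN", "mode": "ACTIVE", "deps": ["ejb.ErrorEjb.env.nonExistant"]},
    "ejb.Other.START": {
        "state": "DOWN", "mode": "ACTIVE", "deps": ["feature.Disabled"]},
    "feature.Disabled": {
        "state": "DOWN", "mode": "NEVER", "deps": []},
    "cache.Broken": {
        "state": "START_FAILED", "mode": "ACTIVE", "deps": []},
}


def root_causes(services):
    """Return (failed, missing): the two root-cause lists from the thread."""
    # Case 1: the service threw an exception in start().
    failed = sorted(
        name for name, svc in services.items() if svc["state"] == "START_FAILED"
    )
    # Case 2: a dependency is not installed, or is installed with mode NEVER.
    missing = set()
    for svc in services.values():
        for dep in svc["deps"]:
            target = services.get(dep)
            if target is None or target["mode"] == "NEVER":
                missing.add(dep)
    return failed, sorted(missing)


def build_report(services):
    """Report root causes only; fall back to the full list if none are found."""
    failed, missing = root_causes(services)
    if failed or missing:
        return {"failed": failed, "missing": missing}
    not_up = sorted(n for n, s in services.items() if s["state"] != "UP")
    return {"fallback": not_up}


report = build_report(SERVICES)
print(report)
```
</pre>
                                        <div>In this toy table
                                          ejb.ErrorEjb.START never shows
                                          up in the report, because it is
                                          only down transitively.</div>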
                                        <span>
                                          <div><br>
                                          </div>
                                          <div> </div>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                            <div dir="ltr">
                                              <div class="gmail_extra">
                                                <div class="gmail_quote"><span
class="m_-8471000774253265312gmail-m_1451192004882818368m_273302915848364358gmail-">
                                                    <div> </div>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0px
                                                      0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                                      <div dir="ltr">
                                                        <div><br>
                                                        </div>
                                                        <div>I think we
                                                          could make a
                                                          change to the
                                                          way this is
                                                          reported so
                                                          that only
                                                          direct
                                                          problems are
                                                          reported [3],
                                                          so the error
                                                          report would
                                                          look something
                                                          like [4] (note
                                                          that this
                                                          commit only
                                                          changes the
                                                          operation
                                                          report, the
                                                          container
                                                          state
                                                          reporting
                                                          after boot is
                                                          still quite
                                                          verbose).</div>
                                                      </div>
                                                    </blockquote>
                                                    <div><br>
                                                    </div>
                                                  </span>
                                                  <div>I think the
                                                    container state
                                                    reporting is ok.
                                                    IMHO the proper fix
                                                    to the container
                                                    state reporting is
                                                    to roll back and fail
                                                    boot if
                                                    Stage.RUNTIME
                                                    failures occur.
                                                    Configurable, but
                                                    rollback by default.
                                                    If we did that there
                                                    would be no
                                                    container state
                                                    reporting. If you
                                                    deploy your broken
                                                    app post-boot you
                                                    shouldn't see the
                                                    container state
                                                    reporting because by
                                                    the time the report
                                                    is written the op
                                                    should have rolled
                                                    back and the
                                                    services are no
                                                    longer "missing".
                                                    It's only because we
                                                    don't roll back on
                                                    boot that this is
                                                    reported.</div>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                        </span>
                                        <div>I don't think it is
                                          necessary to report on
                                          services that are only down
                                          because their dependencies are
                                          down. It basically just adds
                                          noise, as they are not really
                                          related to the underlying
                                          issue. I have expanded my
                                          branch to also do this:</div>
                                        <div><br>
                                        </div>
                                        <div><a
href="https://github.com/wildfly/wildfly-core/compare/master...stuartwdouglas:error-reporting?expand=1"
                                            target="_blank"
                                            moz-do-not-send="true">https://github.com/wildfly/wil<wbr>dfly-core/compare/master...stu<wbr>artwdouglas:error-reporting?ex<wbr>pand=1</a><br>
                                        </div>
                                        <div> </div>
                                        <div>This ends up with very
                                          concise reports that just
                                          detail the services that are
                                          the root cause of the
                                          problem: <a
href="https://gist.github.com/stuartwdouglas/42a68aaaa130ceee38ca5f66d0040de3"
                                            target="_blank"
                                            moz-do-not-send="true">https://gist.github.c<wbr>om/stuartwdouglas/42a68aaaa130<wbr>ceee38ca5f66d0040de3</a></div>
                                        <div><br>
                                        </div>
                                        <div>Does this approach seem
                                          reasonable? If a user really
                                          does want a complete dump of
                                          all services that are down
                                          that information is still
                                          available directly from MSC
                                          anyway.</div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                                <div><br>
                                </div>
                              </span>
                              <div>It seems reasonable.</div>
                              <div><br>
                              </div>
                              <div>I'm going to get all lawyerly now.
                                This is because while we don't treat our
                                failure messages as "API" requiring
                                compatibility, for these particular ones
                                I think we should be as careful as
                                possible.</div>
                              <div><br>
                              </div>
                              <div>1)  "WFLYCTL0180: Services with
                                missing/unavailable dependencies" =&gt;
                                ["jboss.naming.context.java.comp.\"error-reporting-1.0-SNAPSHOT\".\"error-reporting-1.0-SNAPSHOT\".ErrorEjb.env.\"com.stuartdouglas.ErrorEjb\".nonExistant
                                is missing
                                [jboss.naming.context.java.global.NonExistant]"]</div>
                              <div><br>
                              </div>
                              <div>Here you've somewhat repurposed an
                                existing message. That can be quite ok
                                IMHO so long as what's gone is just
                                noise and the English meaning of the
                                message is still correct. Basically,
                                what did "missing/unavailable
                                dependencies" mean before, what does it
                                mean now, and is there a clear story
                                behind the shift from A to B.  The
                                "missing" part is pretty clear -- not
                                installed or NEVER is "missing". For
                                "unavailable" now we've dropped the
                                installed but unstarted ones. If we're
                                including the ones that directly depend
                                on *failed* services then that's a
                                coherent definition of "unavailable". If
                                we're not then "unavailable" is
                                misleading. Sorry, I'm juggling stuff so
                                I haven't checked the code. :(</div>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <div><br>
                      </div>
                    </div>
                  </div>
                  <div>Previously this section would display every
                    service that was down due to its dependencies being
                    down. This would include services that were many
                    levels away from the actual problem (e.g. if A
                    depends on B, which depends on C, which depends on D,
                    which is down, then A, B and C would all be listed in
                    this section). This change displays the same
                    information, but only for direct dependents, so in
                    the example above only C would be listed in this
                    section.</div>
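                  <div>As a rough illustration of the difference, the
                    A/B/C/D example could be modelled as below. This is
                    only a sketch in Python over a made-up dependency
                    map, not the actual wildfly-core code; the names
                    DEPENDS_ON, FAILED, all_affected and
                    directly_affected are hypothetical.</div>
<pre>
```python
# Hypothetical model: each service maps to the services it depends on.
# The names A-D mirror the example in the text; none of this is WildFly API.
DEPENDS_ON = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
FAILED = {"D"}  # services that are themselves the actual problem


def all_affected(depends_on, failed):
    """Old report: every service that is down because a direct or
    transitive dependency is down."""
    affected = set()
    changed = True
    while changed:  # repeat until no new service gets marked
        changed = False
        for service, deps in depends_on.items():
            if service in failed or service in affected:
                continue
            if any(d in failed or d in affected for d in deps):
                affected.add(service)
                changed = True
    return sorted(affected)


def directly_affected(depends_on, failed):
    """New report: only services that depend directly on a problem service."""
    return sorted(
        service
        for service, deps in depends_on.items()
        if service not in failed and any(d in failed for d in deps)
    )


print(all_affected(DEPENDS_ON, FAILED))       # ['A', 'B', 'C']
print(directly_affected(DEPENDS_ON, FAILED))  # ['C']
```
</pre>
                  <div>In this model the services that appear in the
                    first list but not the second (A and B) are the ones
                    that would only be counted in the "additional
                    services are down" summary line.</div>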
                  <div><br>
                  </div>
                  <div>The 'New missing/unsatisfied dependencies:'
                    section in the container state report is similar.
                    Previously it would list every service that had
                    failed to come up; now it will only list services
                    that are directly affected by a problem.</div>
                  <span class="">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div><br>
                            </div>
                            <div>2) I think "38 additional services are
                              down due to their dependencies being
                              missing or failed" should have a message
                              code, not NONE. It's a separate message
                              that may or may not appear. Plus it's new.
                              And I think we're better off in these
                              complex message structures to be precise
                              vs trying to avoid codes for cosmetic
                              reasons.</div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                  </span>
                  <div>Ok.</div>
                  <span class="HOEnZb"><font color="#888888">
                      <div><br>
                      </div>
                      <div>Stuart</div>
                    </font></span><span class="">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote"><span
                              class="m_-8471000774253265312gmail-">
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                <div dir="ltr">
                                  <div class="gmail_extra">
                                    <div class="gmail_quote"><span
                                        class="m_-8471000774253265312gmail-m_1451192004882818368HOEnZb"><font
                                          color="#888888">
                                          <div><br>
                                          </div>
                                          <div>Stuart</div>
                                        </font></span><span>
                                        <div><br>
                                        </div>
                                        <blockquote class="gmail_quote"
                                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                          <div dir="ltr">
                                            <div class="gmail_extra">
                                              <div class="gmail_quote"><span
class="m_-8471000774253265312gmail-m_1451192004882818368m_273302915848364358gmail-">
                                                  <div> </div>
                                                  <blockquote
                                                    class="gmail_quote"
                                                    style="margin:0px
                                                    0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                                                    <div dir="ltr">
                                                      <div><br>
                                                      </div>
                                                      <div>I am guessing
                                                        that this is not
                                                        as simple as it
                                                        sounds,
                                                        otherwise it
                                                        would have
                                                        already been
                                                        addressed, but I
                                                        think we can do
                                                        better than the
                                                        current state of
                                                        affairs so I
                                                        thought I would
                                                        get a discussion
                                                        started.</div>
                                                    </div>
                                                  </blockquote>
                                                  <div><br>
                                                  </div>
                                                </span>
                                                <div>It sounds pretty
                                                  simple. Any "problem"
                                                  ServiceController
                                                  exposes its
                                                  ServiceContainer, and
                                                  if relying on that
                                                  registry to check if a
                                                  missing dependency is
                                                  installed is not
                                                  correct for some
                                                  reason, the
                                                  ModelControllerImpl
                                                  exposes its
                                                  ServiceRegistry via a
                                                  package protected
                                                  getter. So
                                                  AbstractOperationContext
                                                  can provide that to
                                                  the SVH.</div>
                                                <div><br>
                                                </div>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0px 0px
                                                  0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span
class="m_-8471000774253265312gmail-m_1451192004882818368m_273302915848364358gmail-">
                                                    <div dir="ltr">
                                                      <div><br>
                                                      </div>
                                                      <div>Stuart</div>
                                                      <div><br>
                                                      </div>
                                                      <div>[1] <a
                                                          href="https://github.com/stuartwdouglas/errorreporting"
target="_blank" moz-do-not-send="true">https://github.com/stuartwdoug<wbr>las/errorreporting</a></div>
                                                      <div>[2] <a
href="https://gist.github.com/stuartwdouglas/b52a85813913f3304301eeb1f389fae8"
target="_blank" moz-do-not-send="true">https://gist.github.com/stuart<wbr>wdouglas/b52a85813913f3304301e<wbr>eb1f389fae8</a> </div>
                                                      <div>[3] <a
href="https://github.com/stuartwdouglas/wildfly-core/commit/a1fbc831edf290971d54c13dd1c5d15719454f85"
target="_blank" moz-do-not-send="true">https://github.com/stuartw<wbr>douglas/wildfly-core/commit/a1<wbr>fbc831edf290971d54c13dd1c5d157<wbr>19454f85</a></div>
                                                      <div>[4] <a
href="https://gist.github.com/stuartwdouglas/14040534da8d07f937d02f2f08099e8d"
target="_blank" moz-do-not-send="true">https://gist.github.com/st<wbr>uartwdouglas/14040534da8d07f93<wbr>7d02f2f08099e8d</a></div>
                                                    </div>
                                                    <br>
                                                  </span>______________________________<wbr>_________________<br>
                                                  wildfly-dev mailing
                                                  list<br>
                                                  <a
                                                    href="mailto:wildfly-dev@lists.jboss.org"
                                                    target="_blank"
                                                    moz-do-not-send="true">wildfly-dev@lists.jboss.org</a><br>
                                                  <a
                                                    href="https://lists.jboss.org/mailman/listinfo/wildfly-dev"
                                                    rel="noreferrer"
                                                    target="_blank"
                                                    moz-do-not-send="true">https://lists.jboss.org/mailma<wbr>n/listinfo/wildfly-dev</a><span
class="m_-8471000774253265312gmail-m_1451192004882818368m_273302915848364358gmail-HOEnZb"><font
                                                      color="#888888"><br>
                                                    </font></span></blockquote>
                                              </div>
                                              <span
class="m_-8471000774253265312gmail-m_1451192004882818368m_273302915848364358gmail-HOEnZb"><font
                                                  color="#888888"><br>
                                                  <br clear="all">
                                                  <div><br>
                                                  </div>
                                                  -- <br>
                                                  <div
class="m_-8471000774253265312gmail-m_1451192004882818368m_273302915848364358gmail-m_7631090312607809384gmail_signature">
                                                    <div dir="ltr">Brian
                                                      Stansberry
                                                      <div>Manager,
                                                        Senior Principal
                                                        Software
                                                        Engineer</div>
                                                      <div>Red Hat</div>
                                                    </div>
                                                  </div>
                                                </font></span></div>
                                          </div>
                                        </blockquote>
                                      </span></div>
                                    <br>
                                  </div>
                                </div>
                              </blockquote>
                            </span></div>
                          <span class="m_-8471000774253265312gmail-"><br>
                            <br clear="all">
                            <div><br>
                            </div>
                            -- <br>
                            <div
                              class="m_-8471000774253265312gmail-m_1451192004882818368gmail_signature">
                              <div dir="ltr">Brian Stansberry
                                <div>Manager, Senior Principal Software
                                  Engineer</div>
                                <div>Red Hat</div>
                              </div>
                            </div>
                          </span></div>
                      </div>
                    </blockquote>
                  </span></div>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>