[wildfly-dev] Error reporting on deployment failure
Bob McWhirter
bmcwhirt at redhat.com
Thu Feb 15 11:20:05 EST 2018
Agreed. I’ve had to track giant errors from A to B to C etc only to figure
out Z was missing.
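
As a rough sketch of what "report only Z" could look like, here is a toy model of the two root causes listed in the quoted thread below. Everything in it is hypothetical: RootCauseReport and its name-keyed maps are made up for illustration and are not the MSC or WildFly API. It reports services whose start() failed and dependencies that are not installed at all, and it skips services that are merely down transitively.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not the MSC API: services are identified by plain names.
final class RootCauseReport {

    /**
     * @param dependencies every installed service, mapped to the names it depends on
     * @param failed       installed services whose start() threw an exception
     * @return one line per root cause; transitively unavailable services are omitted
     */
    static Set<String> rootCauses(Map<String, List<String>> dependencies, Set<String> failed) {
        Set<String> causes = new TreeSet<>();
        // Case 1: the service itself failed to start.
        for (String service : failed) {
            causes.add("failed: " + service);
        }
        // Case 2: a dependency is not installed at all (not merely not started).
        for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
            for (String dep : entry.getValue()) {
                if (!dependencies.containsKey(dep)) {
                    causes.add("missing: " + dep + " (required by " + entry.getKey() + ")");
                }
            }
        }
        return causes;
    }

    public static void main(String[] args) {
        // A depends on B, B on C, C on Z; Z was never installed.
        Map<String, List<String>> deps = Map.of(
                "A", List.of("B"),
                "B", List.of("C"),
                "C", List.of("Z"));
        // prints [missing: Z (required by C)]
        System.out.println(rootCauses(deps, Set.of()));
    }
}
```

A and B never show up in the output, because each of their dependencies is installed and simply cannot start; only the C to Z edge points at something that does not exist.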
On Wed, Feb 14, 2018 at 10:38 PM Stuart Douglas <stuart.w.douglas at gmail.com>
wrote:
> On Wed, Feb 14, 2018 at 4:43 PM, Brian Stansberry <
> brian.stansberry at redhat.com> wrote:
>
>> On Tue, Feb 13, 2018 at 8:24 PM, Stuart Douglas <
>> stuart.w.douglas at gmail.com> wrote:
>>
>>> Hi Everyone,
>>>
>>> I have been thinking a bit about the way we report errors in WildFly,
>>> and I think this is something that we can improve on. At the moment I think
>>> we are way too liberal with what we report, which results in a ton of
>>> services being listed in the error report that have nothing to do with the
>>> actual failure.
>>>
>>> As an example to work from I have created [1], which is a simple EJB
>>> application. This consists of 10 EJBs, one of which has a reference to a
>>> non-existent data source, the rest are simply empty no-op EJBs (just
>>> @Stateless on an empty class).
>>>
>>> This app fails to deploy because the java:global/NonExistant data source
>>> is missing, which gives the failure description in [2]. This is ~120 lines
>>> long and lists multiple services for every single component in the
>>> application (part of the reason this is so long is because the failures are
>>> reported twice, once when the deployment fails and once when the server
>>> starts).
>>>
>>> I think we can improve on this. I think in every failure case there will
>>> be some root causes that are all the end user cares about, and we should
>>> limit our reporting to just these cases, rather than listing every internal
>>> service that can no longer start due to missing transitive deps.
>>>
>>> In particular these root causes are:
>>> 1) A service threw an exception in its start() method and failed to
>>> start
>>> 2) A dependency is actually missing (i.e. not installed, not just not
>>> started)
>>>
>>> I think that one or both of these two cases will be the root cause of
>>> any failure, and as such that is all we should be reporting on.
>>>
>>> We already do an OK job of handling case 1), services that have failed,
>>> as they get their own line item in the error report, however case 2)
>>> results in a huge report that lists every service that has not come up, no
>>> matter how far removed they are from the actual problem.
>>>
>>
>> If the 2) case can be correctly determined, then +1 to reporting some new
>> section and not reporting the current "WFLYCTL0180: Services with
>> missing/unavailable dependencies" section. The WFLYCTL0180 section could
>> only be reported as a fallback if for some reason the 1) and 2) stuff is
>> empty.
>>
>
> I have adjusted this a bit so a service with mode NEVER is treated the
> same as if it is missing. I am pretty sure that with this change 1) and 2)
> will cover 100% of cases.
>
>
>
>>
>>
>>>
>>> I think we could make a change to the way this is reported so that only
>>> direct problems are reported [3], so the error report would look something
>>> like [4] (note that this commit only changes the operation report, the
>>> container state reporting after boot is still quite verbose).
>>>
>>
>> I think the container state reporting is ok. IMHO the proper fix to the
>> container state reporting is to rollback and fail boot if Stage.RUNTIME
>> failures occur. Configurable, but rollback by default. If we did that there
>> would be no container state reporting. If you deploy your broken app
>> post-boot you shouldn't see the container state reporting because by the
>> time the report is written the op should have rolled back and the services
>> are no longer "missing". It's only because we don't rollback on boot that
>> this is reported.
>>
>
> I don't think it is necessary to report on services that are only down
> because their dependencies are down. It basically just adds noise, as they
> are not really related to the underlying issue. I have expanded my branch
> to also do this:
>
>
> https://github.com/wildfly/wildfly-core/compare/master...stuartwdouglas:error-reporting?expand=1
>
> This ends up with very concise reports that just detail the services that
> are the root cause of the problem:
> https://gist.github.com/stuartwdouglas/42a68aaaa130ceee38ca5f66d0040de3
>
> Does this approach seem reasonable? If a user really does want a complete
> dump of all services that are down that information is still available
> directly from MSC anyway.
>
> Stuart
>
>
>>
>>>
>>> I am guessing that this is not as simple as it sounds, otherwise it
>>> would have already been addressed, but I think we can do better than the
>>> current state of affairs so I thought I would get a discussion started.
>>>
>>
>> It sounds pretty simple. Any "problem" ServiceController exposes its
>> ServiceContainer, and if relying on that registry to check if a missing
>> dependency is installed is not correct for some reason, the
>> ModelControllerImpl exposes its ServiceRegistry via a package protected
>> getter. So AbstractOperationContext can provide that to the SVH.
>>
>>
>>> Stuart
>>>
>>> [1] https://github.com/stuartwdouglas/errorreporting
>>> [2]
>>> https://gist.github.com/stuartwdouglas/b52a85813913f3304301eeb1f389fae8
>>> [3]
>>> https://github.com/stuartwdouglas/wildfly-core/commit/a1fbc831edf290971d54c13dd1c5d15719454f85
>>> [4]
>>> https://gist.github.com/stuartwdouglas/14040534da8d07f937d02f2f08099e8d
>>>
>>> _______________________________________________
>>> wildfly-dev mailing list
>>> wildfly-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
>>>
>>
>>
>>
>> --
>> Brian Stansberry
>> Manager, Senior Principal Software Engineer
>> Red Hat
>>