[wildfly-dev] Error reporting on deployment failure

Tomaž Cerar tomaz.cerar at gmail.com
Thu Feb 15 11:28:09 EST 2018


Hey,

One of the things we talked about at the f2f, but never got into the
details of, that would help with this is adding capabilities to
deployments.

This way, on a failure you would get an error message telling you which
capability is not available: for example, that a datasource is missing
and where you can define it, or that the datasource defined at address
xyz is in an error state, so you know where to look to fix it.

For a start we would need to expose the capability registry and a few
other things to DUPs, and continue from there.
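
As a rough illustration of the sort of check a DUP could make once the
capability registry is exposed, here is a sketch against the existing
CapabilityServiceSupport attachment (the processor class, capability
name, and error text are made up for the example):

    import org.jboss.as.controller.capability.CapabilityServiceSupport;
    import org.jboss.as.server.deployment.Attachments;
    import org.jboss.as.server.deployment.DeploymentPhaseContext;
    import org.jboss.as.server.deployment.DeploymentUnit;
    import org.jboss.as.server.deployment.DeploymentUnitProcessingException;
    import org.jboss.as.server.deployment.DeploymentUnitProcessor;

    public class CapabilityCheckProcessor implements DeploymentUnitProcessor {

        public void deploy(DeploymentPhaseContext ctx) throws DeploymentUnitProcessingException {
            DeploymentUnit unit = ctx.getDeploymentUnit();
            CapabilityServiceSupport support =
                    unit.getAttachment(Attachments.CAPABILITY_SERVICE_SUPPORT);
            String cap = "org.wildfly.data-source.ExampleDS";
            if (!support.hasCapability(cap)) {
                // Fail early, naming the missing capability and where to
                // define it, instead of dumping every dependent MSC service.
                throw new DeploymentUnitProcessingException(
                        "Missing capability '" + cap + "'; define a datasource at "
                                + "/subsystem=datasources/data-source=ExampleDS");
            }
        }

        public void undeploy(DeploymentUnit unit) {
            // nothing to clean up for a pure validation step
        }
    }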

--
tomaz

On Thu, Feb 15, 2018 at 4:37 AM, Stuart Douglas <stuart.w.douglas at gmail.com>
wrote:

>
>
> On Wed, Feb 14, 2018 at 4:43 PM, Brian Stansberry <
> brian.stansberry at redhat.com> wrote:
>
>> On Tue, Feb 13, 2018 at 8:24 PM, Stuart Douglas <
>> stuart.w.douglas at gmail.com> wrote:
>>
>>> Hi Everyone,
>>>
>>> I have been thinking a bit about the way we report errors in WildFly,
>>> and I think this is something that we can improve on. At the moment I think
>>> we are way too liberal with what we report, which results in a ton of
>>> services being listed in the error report that have nothing to do with the
>>> actual failure.
>>>
>>> As an example to work from I have created [1], which is a simple EJB
>>> application. This consists of 10 EJBs, one of which has a reference to a
>>> non-existent data source; the rest are simply empty no-op EJBs (just
>>> @Stateless on an empty class).
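>>>
>>> Roughly, the broken bean looks like this (a minimal sketch; the class
>>> name is made up, the lookup string matches the reproducer):
>>>
>>>     import javax.annotation.Resource;
>>>     import javax.ejb.Stateless;
>>>     import javax.sql.DataSource;
>>>
>>>     @Stateless
>>>     public class BrokenBean {
>>>
>>>         // No binding exists for this name, so the component's JNDI
>>>         // dependency can never be installed and deployment fails.
>>>         @Resource(lookup = "java:global/NonExistant")
>>>         DataSource dataSource;
>>>     }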
>>>
>>> This app fails to deploy because the java:global/NonExistant data source
>>> is missing, which gives the failure description in [2]. This is ~120 lines
>>> long and lists multiple services for every single component in the
>>> application (part of the reason this is so long is because the failures are
>>> reported twice, once when the deployment fails and once when the server
>>> starts).
>>>
>>> I think we can improve on this. In every failure case there will be a
>>> few root causes that are all the end user cares about, and we should
>>> limit our reporting to just those, rather than listing every internal
>>> service that can no longer start due to missing transitive deps.
>>>
>>> In particular these root causes are:
>>> 1) A service threw an exception in its start() method and failed to
>>> start
>>> 2) A dependency is actually missing (i.e. not installed, not just not
>>> started)
>>>
>>> I think that one or both of these two cases will be the root cause of
>>> any failure, and as such that is all we should be reporting on.
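>>>
>>> As a sketch of how the classification could look against the MSC API
>>> (the helper name is made up; it assumes ServiceController's
>>> getImmediateUnavailableDependencies() from MSC 1.2):
>>>
>>>     import org.jboss.msc.service.ServiceController;
>>>     import org.jboss.msc.service.ServiceName;
>>>     import org.jboss.msc.service.ServiceRegistry;
>>>
>>>     // True only for the two root-cause cases above.
>>>     static boolean isRootCause(ServiceController<?> c, ServiceRegistry registry) {
>>>         // Case 1: start() threw, leaving the controller in START_FAILED.
>>>         if (c.getState() == ServiceController.State.START_FAILED) {
>>>             return true;
>>>         }
>>>         // Case 2: an immediate dependency is not installed at all
>>>         // (as opposed to installed but not yet started).
>>>         for (ServiceName dep : c.getImmediateUnavailableDependencies()) {
>>>             if (registry.getService(dep) == null) {
>>>                 return true;
>>>             }
>>>         }
>>>         return false;
>>>     }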
>>>
>>> We already do an OK job of handling case 1), services that have failed,
>>> as they get their own line item in the error report. However, case 2)
>>> results in a huge report that lists every service that has not come up,
>>> no matter how far removed it is from the actual problem.
>>>
>>
>> If the 2) case can be correctly determined, then +1 to reporting some new
>> section and not reporting the current "WFLYCTL0180: Services with
>> missing/unavailable dependencies" section. The WFLYCTL0180 section would
>> only be reported as a fallback if, for some reason, the 1) and 2) sections
>> are empty.
>>
>
> I have adjusted this a bit so a service with mode NEVER is treated the
> same as if it is missing. I am pretty sure that with this change 1) and 2)
> will cover 100% of cases.
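>
> In terms of the earlier sketch, the "missing" check becomes something
> like (illustrative only; it reuses the registry/depName names from
> that sketch):
>
>     // A dependency counts as missing if it is not installed at all, or
>     // is installed but set to Mode.NEVER, so it can never come up.
>     ServiceController<?> dep = registry.getService(depName);
>     boolean effectivelyMissing = dep == null
>             || dep.getMode() == ServiceController.Mode.NEVER;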
>
>
>
>>
>>
>>>
>>> I think we could make a change to the way this is reported so that only
>>> direct problems are reported [3], so the error report would look something
>>> like [4] (note that this commit only changes the operation report, the
>>> container state reporting after boot is still quite verbose).
>>>
>>
>> I think the container state reporting is ok. IMHO the proper fix to the
>> container state reporting is to roll back and fail the boot if
>> Stage.RUNTIME failures occur. Configurable, but rollback by default. If we
>> did that there would be no container state reporting. If you deploy your
>> broken app post-boot you shouldn't see the container state reporting,
>> because by the time the report is written the op should have rolled back
>> and the services are no longer "missing". It's only because we don't roll
>> back on boot that this is reported.
>>
>
> I don't think it is necessary to report on services that are only down
> because their dependencies are down. It basically just adds noise, as they
> are not really related to the underlying issue. I have expanded my branch
> to also do this:
>
> https://github.com/wildfly/wildfly-core/compare/master...stuartwdouglas:error-reporting?expand=1
>
> This ends up with very concise reports that just detail the services that
> are the root cause of the problem:
> https://gist.github.com/stuartwdouglas/42a68aaaa130ceee38ca5f66d0040de3
>
> Does this approach seem reasonable? If a user really does want a complete
> dump of all services that are down that information is still available
> directly from MSC anyway.
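>
> e.g. something along these lines against plain MSC (a sketch; MSC's
> ServiceContainer also has a dumpServices() convenience, if I remember
> correctly):
>
>     import org.jboss.msc.service.ServiceContainer;
>     import org.jboss.msc.service.ServiceController;
>     import org.jboss.msc.service.ServiceName;
>
>     // Walk every registered service and print the ones that are not up.
>     static void dumpDownServices(ServiceContainer container) {
>         for (ServiceName name : container.getServiceNames()) {
>             ServiceController<?> c = container.getService(name);
>             if (c != null && c.getState() != ServiceController.State.UP) {
>                 System.out.println(name + " " + c.getState()
>                         + " (mode " + c.getMode() + ")");
>             }
>         }
>     }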
>
> Stuart
>
>
>>
>>>
>>> I am guessing that this is not as simple as it sounds, otherwise it
>>> would have already been addressed, but I think we can do better than the
>>> current state of affairs, so I thought I would get a discussion started.
>>>
>>
>> It sounds pretty simple. Any "problem" ServiceController exposes its
>> ServiceContainer, and if relying on that registry to check whether a
>> missing dependency is installed is not correct for some reason, the
>> ModelControllerImpl exposes its ServiceRegistry via a package-protected
>> getter. So AbstractOperationContext can provide that to the SVH.
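>>
>> i.e. something like this when building the failure description (a
>> sketch; the verification-handler plumbing is elided):
>>
>>     import java.util.LinkedHashSet;
>>     import java.util.Set;
>>     import org.jboss.msc.service.ServiceContainer;
>>     import org.jboss.msc.service.ServiceController;
>>     import org.jboss.msc.service.ServiceName;
>>
>>     // From a problem controller, reach its container and keep only the
>>     // dependencies that are truly uninstalled; installed-but-down deps
>>     // are transitive noise and stay out of the report.
>>     static Set<ServiceName> trulyMissing(ServiceController<?> problem) {
>>         ServiceContainer container = problem.getServiceContainer();
>>         Set<ServiceName> missing = new LinkedHashSet<>();
>>         for (ServiceName dep : problem.getImmediateUnavailableDependencies()) {
>>             if (container.getService(dep) == null) {
>>                 missing.add(dep);
>>             }
>>         }
>>         return missing;
>>     }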
>>
>>
>>> Stuart
>>>
>>> [1] https://github.com/stuartwdouglas/errorreporting
>>> [2] https://gist.github.com/stuartwdouglas/b52a85813913f3304301eeb1f389fae8
>>> [3] https://github.com/stuartwdouglas/wildfly-core/commit/a1fbc831edf290971d54c13dd1c5d15719454f85
>>> [4] https://gist.github.com/stuartwdouglas/14040534da8d07f937d02f2f08099e8d
>>>
>>
>>
>>
>> --
>> Brian Stansberry
>> Manager, Senior Principal Software Engineer
>> Red Hat
>>
>
>