[jboss-jira] [JBoss JIRA] (WFCORE-2089) Infinite wildfly boot on StartException in service start

Brian Stansberry (JIRA) issues at jboss.org
Wed Dec 7 14:08:00 EST 2016


    [ https://issues.jboss.org/browse/WFCORE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13335568#comment-13335568 ] 

Brian Stansberry edited comment on WFCORE-2089 at 12/7/16 2:07 PM:
-------------------------------------------------------------------

Two thread dumps attached. The first shows ControllerBootThread blocking in awaitContainerStability, which eventually times out as expected and logs. The boot then proceeds and reaches the state in the second thread dump, where ControllerBootThread is trying to write the audit log.

Both dumps show a number of MSC threads blocking, because the security code will now block forever. A blocking operation has been introduced into service start() calls in a way that the service author cannot know about. This is why MSC doesn't reach stability.

[~dlofthouse] your JMX change may help with some of this by getting rid of the JMX authz stuff that seems to be the main thing blocking various MSC threads. But the blocking call path in OperationContext.getCaller() needs to be addressed. Perhaps during boot the check for the security identity can be dropped somehow, but I don't think that's enough. We can't have a call path in management op execution that can block forever like this. There's a whole set of work (see BlockingTimeout) that's all about avoiding that.
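
To illustrate the general idea of a call path that cannot block forever, here is a minimal sketch in plain Java. It is not the WildFly BlockingTimeout API and not the actual OperationContext.getCaller() code; the class BoundedWaitSketch and the helper awaitBounded are hypothetical names used only for this example. The assumption is that the value being waited on (for example, a security identity) is delivered through a future that may never complete.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {

    /**
     * Hypothetical helper: waits for the pending value, but never longer
     * than the given timeout. Returns an empty Optional if the value did
     * not arrive in time or the producer failed.
     */
    static <T> Optional<T> awaitBounded(CompletableFuture<T> pending, long timeout, TimeUnit unit) {
        try {
            return Optional.ofNullable(pending.get(timeout, unit));
        } catch (TimeoutException | ExecutionException e) {
            // The caller decides how to react (fail the operation, retry, ...).
            return Optional.empty();
        } catch (InterruptedException e) {
            // Preserve the interrupt status for code further up the stack.
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a dependency that never becomes available.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();

        // An unbounded neverCompleted.get() would hang here; the bounded wait returns.
        Optional<String> identity = awaitBounded(neverCompleted, 200, TimeUnit.MILLISECONDS);
        System.out.println("identity present: " + identity.isPresent());
    }
}
```

Run as written, this prints "identity present: false" after roughly 200 ms instead of hanging. What the caller should do when no value arrives is a separate decision that this sketch leaves open.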


> Infinite wildfly boot on StartException in service start
> --------------------------------------------------------
>
>                 Key: WFCORE-2089
>                 URL: https://issues.jboss.org/browse/WFCORE-2089
>             Project: WildFly Core
>          Issue Type: Bug
>          Components: Domain Management, Security
>            Reporter: Jan Kalina
>            Assignee: Darran Lofthouse
>            Priority: Blocker
>         Attachments: threads-2.txt, threads.txt
>
>
> The following exception (and probably similar ones) causes WildFly to freeze during startup. This is especially visible during integration tests (which then never end), but it can also be reproduced directly (see steps).
> {code:java}
> 15:59:37,252 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC000001: Failed to start service org.wildfly.security.security-realm.ManagementRealm: org.jboss.msc.service.StartException in service org.wildfly.security.security-realm.ManagementRealm: WFLYELY00025: Referenced property file is invalid: ELY01006: No realm name found in password property file
> 	at org.wildfly.extension.elytron.PropertiesRealmDefinition$1$1.get(PropertiesRealmDefinition.java:185)
> 	at org.wildfly.extension.elytron.PropertiesRealmDefinition$1$1.get(PropertiesRealmDefinition.java:164)
> 	at org.wildfly.extension.elytron.TrivialService.start(TrivialService.java:53)
> 	at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1963)
> 	at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1896)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> *Update:* This infinite boot problem occurs every time the start() method of a service throws a StartException. It is not an Elytron problem.


