[JBoss JIRA] (WFLY-12815) Wildfly 13 - Thread local state corrupted by deployed application explosion during session timeout leading to WELD-001304 - More than one context active for scope type javax.enterprise.context.SessionScoped
by Matěj Novotný (Jira)
[ https://issues.redhat.com/browse/WFLY-12815?page=com.atlassian.jira.plugi... ]
Matěj Novotný commented on WFLY-12815:
--------------------------------------
Looks like this issue got snowed under, sorry for that.
Re-reading it, I think it also depends on what exactly is supposed to happen if a session lifecycle listener throws an exception - do you process the other listener(s) or not? Looking at the [servlet specification|https://jakarta.ee/specifications/servlet/5.0/servlet-spec-5.0-SNAPSHOT.html#listener-exceptions], I get the feeling you stop processing listeners at the first one that throws an exception (which is what you see happening). Weld registers the last listener, which tears down the context; if processing never reaches it, the context stays active, leading to the error you are seeing. Notably, the specification also says:
{quote}Developers wishing normal processing to occur after a listener generates an exception must handle their own exceptions within the notification methods.
{quote}
So the safest workflow here is to simply catch your exceptions inside listeners.
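That spec behavior can be modeled in plain Java (a minimal sketch with illustrative names, not the actual container or Weld classes): the notification loop stops at the first uncaught exception, so a trailing teardown listener only runs when the earlier listeners handle their own failures.

```java
import java.util.List;

// Plain-Java model of servlet listener notification: the loop aborts at the
// first uncaught exception, so a trailing teardown listener (like Weld's
// context teardown) is skipped unless earlier listeners catch their own errors.
public class ListenerChainDemo {

    static void fireSessionDestroyed(List<Runnable> listeners) {
        for (Runnable listener : listeners) {
            listener.run(); // an uncaught exception here stops the chain
        }
    }

    /** Returns true if the final teardown listener ran. */
    static boolean teardownRuns(boolean listenerCatchesItsException) {
        boolean[] teardownRan = {false};
        Runnable app = () -> {
            if (listenerCatchesItsException) {
                try { throw new IllegalStateException("app bug"); }
                catch (RuntimeException handled) { /* handled inside the listener */ }
            } else {
                throw new IllegalStateException("app bug");
            }
        };
        Runnable weldTeardown = () -> teardownRan[0] = true;
        try {
            fireSessionDestroyed(List.of(app, weldTeardown));
        } catch (RuntimeException propagated) { /* container stops processing */ }
        return teardownRan[0];
    }

    public static void main(String[] args) {
        System.out.println("teardown after broken listener: " + teardownRuns(false)); // false
        System.out.println("teardown after safe listener:   " + teardownRuns(true));  // true
    }
}
```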
What we can do on the Weld side is to check whether the destruction context is active while trying to activate a standard one (which happens [here|https://github.com/weld/core/blob/master/modules/web/src/ma...) and, if that's the case, deactivate it and log a warning. That should return the thread to a non-corrupted state, although I can't say whether it won't break anything else. I will try to compose a PR for that and test it with the CDI and then the WFLY test suites. If that passes, I will let you know so that you can try it with your setup which corrupts all threads - would that work for you [~nuno.godinhomatos]?
> Wildfly 13 - Thread local state corrupted by deployed application explosion during session timeout leading to WELD-001304 - More than one context active for scope type javax.enterprise.context.SessionScoped
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-12815
> URL: https://issues.redhat.com/browse/WFLY-12815
> Project: WildFly
> Issue Type: Bug
> Components: CDI / Weld
> Affects Versions: 13.0.0.Final
> Environment: Environment independent; the issue is purely a logical problem
> Reporter: NUNO GODINHO DE MATOS
> Assignee: Matěj Novotný
> Priority: Major
> Attachments: sourceCodeToSendToWildfly.7z
>
>
> The full description of the problem can be seen on Stack Overflow.
> Please consult the issue:
> https://stackoverflow.com/questions/58930939/wildflt-13-weld-001304-more-...
> SUMMARY:
> (1) Set up your WildFly to have a session timeout of 1 minute - so that you can easily make your HTTP sessions time out
> (2) Program into your WAR application a sessionDestroyed listener that will break.
> In our case, whenever the session is timing out, we have some code that explodes in WildFly but not in WebLogic, because it expected the RequestScope context to be active. Apparently in WildFly, when Undertow starts killing off a session, the request scope context is not made active, which caused our session-destroyed handling to break.
> (3) Do this a sufficient number of times to corrupt as many threads in the thread pool as possible
> (4) Now try to interact with your application, making use of some session scoped beans.
> If you navigate to any sort of view that makes use of a session scoped bean, that thread will be broken with the exception that multiple session scope context implementations are active.
> But this exception will only come out and apply if the thread handling the HTTP request is one of the threads that was used by Undertow in the past to handle the session timeout.
> The only threads that are corrupted forever are those that had a broken session timeout.
> Explanation for the issue:
> - When the session timeout is being orchestrated by Undertow, WildFly activates a special HttpSessionDestructionContext and makes it active.
> This ACTIVE TRUE/FALSE flag is a ThreadLocal variable.
> So the activation of the scope context is marked on the thread itself.
> - When the thread blows up, the thread context will remain for as long as the thread lives.
> - In a future request, that thread already has the thread local variable active.
> So when the BeanManagerImpl is hunting for the one and only active HTTP session context, it finds the traditional happy-path HTTP session context active plus the DestructionSession context that was activated in a previous call.
> All of the illustrative stack traces that facilitate comprehension of the issue are shown in the Stack Overflow thread.
> I am of the opinion that errors like this can happen in deployed applications.
> It would not hurt if WildFly could somehow ensure that a thread that had an explosion in a previous request is not corrupted when it is used to handle new requests.
> Many thanks for having a look.
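The thread-local mechanism described in the explanation above reduces to a small plain-Java model (all names here are illustrative stand-ins, not the real WildFly/Weld classes): a pooled thread whose "context active" flag is set but never cleared keeps that flag for its next, unrelated task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal model of the corruption: an "active context" flag kept in a
// ThreadLocal is set before listener work and only cleared on the happy path.
// When the listener throws, the pooled thread keeps the stale flag for its
// next, unrelated task. Names are illustrative, not WildFly/Weld classes.
public class StaleThreadLocalDemo {

    static final ThreadLocal<Boolean> destructionContextActive =
            ThreadLocal.withInitial(() -> false);

    static void handleSessionTimeout(Runnable listener) {
        destructionContextActive.set(true);  // activate the special context
        listener.run();                      // may throw...
        destructionContextActive.set(false); // ...so deactivation never happens
    }

    /** Returns what a later task on the same pooled thread sees. */
    static boolean reusedThreadSeesStaleFlag() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(); // one reused worker
        try {
            // 1) A session timeout whose listener explodes, skipping deactivation.
            pool.submit(() -> {
                try {
                    handleSessionTimeout(() -> { throw new IllegalStateException("boom"); });
                } catch (IllegalStateException swallowedByContainer) { }
            }).get();
            // 2) A later request on the same thread inherits the stale flag:
            //    the "more than one context active" situation.
            return pool.submit(destructionContextActive::get).get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stale flag on reused thread: " + reusedThreadSeesStaleFlag()); // true
    }
}
```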
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
4 years, 2 months
[JBoss JIRA] (SWSQE-1056) Try OCP 4.4 on IPv6 in PSI
by Filip Brychta (Jira)
[ https://issues.redhat.com/browse/SWSQE-1056?page=com.atlassian.jira.plugi... ]
Filip Brychta commented on SWSQE-1056:
--------------------------------------
Currently the only tested/working way is a bare metal IPI on single stack IPv6, docs here [1]. I know about two teams which have a CI for that. The openshift-kni dev team is using [2] and the ocp-edge-qe team is using [3]. Both approaches mock bare metal machines as VMs on one big hypervisor and use the IPI way.
I tried [2] on our bare metal machine (b24.jonqe.lab.eng.bos.redhat.com) in the BOS lab, which has 64GB of memory and 32 CPUs and should match the minimal resource requirements. The cluster was provisioned fine (after several tries) but any change which requires cluster reconciliation breaks it. Some cluster operators (authentication, dns, kube-storage-version-migrator, network, openshift-apiserver) get stuck in a progressing state or degraded and the cluster never comes back up. This is blocking the SM installation, as on IPv6 it's necessary to mirror SM operators from redhat-operators (the same way as in disconnected OCP), which starts the cluster reconciliation.
I've not tried [3] yet as it requires a machine with 144 GB of memory and I don't have such a big machine.
Possible next steps:
- try [2] on a bigger machine, as the minimal required resources might not be enough (a cluster with minimal resources seems pretty fragile even on IPv4); a bigger machine might be available in http://bagl.scalelab.redhat.com/ or https://beaker.engineering.redhat.com/
- wait for a working UPI installer and update our existing provisioning scripts for IPv6 (our scripts mock bare metal machines as VMs in PSI OpenStack and use the UPI way)
- wait for working templates for Flexy, e.g. [4], and use the One OCP deploy tool
Filip
[1] - https://openshift-kni.github.io/baremetal-deploy/
[2] - https://github.com/openshift-metal3/dev-scripts
[3] - https://gitlab.cee.redhat.com/ocp-edge-qe/ocp-edge
[4] - http://git.app.eng.bos.redhat.com/git/openshift-misc.git/plain/v3-launch-...
> Try OCP 4.4 on IPv6 in PSI
> --------------------------
>
> Key: SWSQE-1056
> URL: https://issues.redhat.com/browse/SWSQE-1056
> Project: Kiali QE
> Issue Type: QE Task
> Reporter: Filip Brychta
> Assignee: Filip Brychta
> Priority: Major
> Labels: infrastructure
>
[JBoss JIRA] (SWSQE-1056) Try OCP 4.4 on IPv6 in PSI
by Filip Brychta (Jira)
[ https://issues.redhat.com/browse/SWSQE-1056?page=com.atlassian.jira.plugi... ]
Filip Brychta commented on SWSQE-1056:
--------------------------------------
{quote}There was a discussion in the PM meeting this week on the IPv6 topic, and not being able to create such clusters. As you may know, the Jaeger team has shelved this testing effort until there is the ability to provision or such an environment is available.
Thus, seems best to hold off on this provisioning effort for the time being.
{quote}
+1 to holding this effort. To add a bit more context to what Matt said, it has been confirmed that for OCP 4.6, IPv6 will be supported only for Verizon's needs. The plan, at this moment, is for IPv6 GA on OCP in 4.7.
{quote}Thanks, Matt{quote}
> Try OCP 4.4 on IPv6 in PSI
> --------------------------
>
> Key: SWSQE-1056
> URL: https://issues.redhat.com/browse/SWSQE-1056
> Project: Kiali QE
> Issue Type: QE Task
> Reporter: Filip Brychta
> Assignee: Filip Brychta
> Priority: Major
> Labels: infrastructure
>
[JBoss JIRA] (WFLY-12815) Wildfly 13 - Thread local state corrupted by deployed application explosion during session timeout leading to WELD-001304 - More than one context active for scope type javax.enterprise.context.SessionScoped
by NUNO GODINHO DE MATOS (Jira)
[ https://issues.redhat.com/browse/WFLY-12815?page=com.atlassian.jira.plugi... ]
NUNO GODINHO DE MATOS edited comment on WFLY-12815 at 7/22/20 4:16 AM:
-----------------------------------------------------------------------
Hi,
Just to let you know, we have gotten more reports on this issue.
So we are providing our "hack" patch, which fixes the thread-local corruption of the thread, to projects experiencing this issue:
[^sourceCodeToSendToWildfly.7z]
The source code you get there simply does not contain the application-server identification util. I do not think it matters; not everybody needs to deploy code to more than one application server type.
This listener allows a project to work around the thread-local corruption until they find out how they got the thread corrupted during a session-destroyed event.
The goal is always to find the original "ground zero" exception that corrupted the WildFly thread-local state. Typically it ends up being some new logic that requires RequestScope to be active during a session-destroyed event and needs to be refactored ...
But for this WildFly issue here, the underlying reason for the corruption should be irrelevant.
WildFly base code should make sure a new thread never starts working with any scope context active.
It would be good not to need to implement such workarounds, since these hacky listeners will ultimately stop working on newer versions of WildFly. Guaranteed. So a new thread should never be launched with an HTTP session scope already active; it should simply not be allowed.
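The kind of defensive reset argued for here could look roughly like the following plain-Java sketch (a hypothetical pattern with made-up names; WildFly's real contexts are Weld-managed, not a bare ThreadLocal): wrap each unit of work so it starts and ends with clean thread-local state.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of a defensive reset: before a pooled thread takes on new work,
// clear any thread-local context flag a previous, failed task left behind.
// Purely illustrative names, not WildFly code.
public class ThreadStateSanitizer {

    static final ThreadLocal<Boolean> contextActive =
            ThreadLocal.withInitial(() -> false);

    // Wrap a unit of work so it starts (and ends) with clean thread-local state.
    static Runnable sanitized(Runnable work) {
        return () -> {
            contextActive.remove();     // reset at entry, whatever the last task did
            try {
                work.run();
            } finally {
                contextActive.remove(); // reset at exit, even on failure
            }
        };
    }

    /** Returns true if a sanitized task starts clean on a "corrupted" thread. */
    static boolean sanitizedTaskStartsClean() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> contextActive.set(true)).get(); // corrupt the worker thread
            boolean[] sawActive = {true};
            pool.submit(sanitized(() -> sawActive[0] = contextActive.get())).get();
            return !sawActive[0];
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sanitized task started clean: " + sanitizedTaskStartsClean()); // true
    }
}
```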
[JBoss JIRA] (WFLY-12815) Wildfly 13 - Thread local state corrupted by deployed application explosion during session timeout leading to WELD-001304 - More than one context active for scope type javax.enterprise.context.SessionScoped
by NUNO GODINHO DE MATOS (Jira)
[ https://issues.redhat.com/browse/WFLY-12815?page=com.atlassian.jira.plugi... ]
NUNO GODINHO DE MATOS edited comment on WFLY-12815 at 7/22/20 4:15 AM:
-----------------------------------------------------------------------
Hi,
Just to let you know, we have gotten more reports on this issue.
So we are providing our "hack" patch, which fixes the thread-local corruption of the thread, to projects experiencing this issue:
[^sourceCodeToSendToWildfly.7z]
The source code you get there simply does not contain the application-server identification util. I do not think it matters; not everybody needs to deploy code to more than one application server type.
This listener allows a project to work around the thread-local corruption until they find out how they got the thread corrupted during a session-destroyed event.
The goal is always to find the original "ground zero" exception that corrupted the WildFly thread-local state. Typically it ends up being some new logic that requires RequestScope to be active during a session-destroyed event and needs to be refactored ...
But for this WildFly issue here, the underlying reason for the corruption should be irrelevant.
WildFly base code should make sure a new thread never starts working with any scope context active.
It would be good not to need to implement such workarounds with these hacky listeners, which can stop working from one version of WildFly to the next.
[JBoss JIRA] (WFLY-12815) Wildfly 13 - Thread local state corrupted by deployed application explosion during session timeout leading to WELD-001304 - More than one context active for scope type javax.enterprise.context.SessionScoped
by NUNO GODINHO DE MATOS (Jira)
[ https://issues.redhat.com/browse/WFLY-12815?page=com.atlassian.jira.plugi... ]
NUNO GODINHO DE MATOS commented on WFLY-12815:
----------------------------------------------
Hi,
Just to let you know, we have gotten more reports on this issue.
So we are providing our "hack" patch, which fixes the thread-local corruption of the thread, to projects experiencing this issue:
[^sourceCodeToSendToWildfly.7z]
I am adding a sketch of the listener class that we use to work around this problem until the underlying "reasons" for our blow-ups during session destroyed are discovered.
It would be good not to need to implement such workarounds with these hacky listeners, which can stop working from one version of WildFly to the next.
[JBoss JIRA] (WFLY-12815) Wildfly 13 - Thread local state corrupted by deployed application explosion during session timeout leading to WELD-001304 - More than one context active for scope type javax.enterprise.context.SessionScoped
by NUNO GODINHO DE MATOS (Jira)
[ https://issues.redhat.com/browse/WFLY-12815?page=com.atlassian.jira.plugi... ]
NUNO GODINHO DE MATOS updated WFLY-12815:
-----------------------------------------
Attachment: sourceCodeToSendToWildfly.7z
[JBoss JIRA] (WFLY-13686) Deadlock on Wildfly startup
by K G (Jira)
K G created WFLY-13686:
--------------------------
Summary: Deadlock on Wildfly startup
Key: WFLY-13686
URL: https://issues.redhat.com/browse/WFLY-13686
Project: WildFly
Issue Type: Bug
Components: EJB
Affects Versions: 20.0.1.Final
Reporter: K G
Assignee: Cheng Fang
On WildFly startup there can be a deadlock related to EJB singleton access, and more specifically StartupAwaitInterceptor and ContainerManagedConcurrencyInterceptor. This can happen when there is a too-early client request (occurring during app startup) or a request caused by a thread running in a managed executor (that's what happened to me). A thread that is blocked by StartupAwaitInterceptor also holds a lock from ContainerManagedConcurrencyInterceptor and blocks other threads. This is related to the following pull request, link to the comment: [https://github.com/wildfly/wildfly/pull/9009#issuecomment-656147415] .
I guess a possible solution is to change the interceptor ordering. Another possibility is to add a "privileged" flag (see the pull request for an explanation) to threads from the managed thread factory, but in this case a too-early client request could also cause a deadlock.
Scenario of deadlock (description copied from pull request's comment):
* startup singleton A's initialization starts and completes successfully
* startup singleton B is initializing and during that it starts a task X via managedThreadExecutor
* X wants to access A and is blocked by StartupCountdown.await
* meanwhile B continues initializing and wants to access A, but X already holds a lock on A (I can see ContainerManagedConcurrencyInterceptor.processInvocation in the thread dump), hence after 5000ms B's initialization fails, as does the whole deployment
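The interleaving above can be modeled with plain java.util.concurrent primitives (a sketch under stated assumptions: the latch stands in for StartupCountdown, the lock for the container-managed concurrency lock on singleton A, and the real 5000 ms access timeout is shortened here):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Plain-Java model of the deadlock: task X locks A and then waits for startup
// to complete, while B's initializer (whose completion startup depends on)
// waits for A's lock. Names and timeouts are illustrative, not WildFly code.
public class StartupDeadlockSketch {

    static final CountDownLatch deploymentReady = new CountDownLatch(1); // StartupCountdown stand-in
    static final ReentrantLock singletonALock = new ReentrantLock();     // container-managed lock on A

    /** Returns whether B's initializer manages to lock A within the timeout. */
    static boolean bCanAccessA() throws InterruptedException {
        ExecutorService managedExecutor = Executors.newSingleThreadExecutor();
        try {
            // Task X, started during B's init: locks A, then blocks on startup await.
            Future<?> x = managedExecutor.submit(() -> {
                singletonALock.lock();
                try {
                    deploymentReady.await(); // never counted down: deployment can't finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    singletonALock.unlock();
                }
            });
            Thread.sleep(100); // let X acquire A's lock first
            // B's initializer now tries to access A and times out -> deployment fails.
            boolean acquired = singletonALock.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                singletonALock.unlock();
            }
            x.cancel(true); // interrupt X so the demo can exit
            return acquired;
        } finally {
            managedExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("B acquired A's lock: " + bCanAccessA()); // false: the deadlock
    }
}
```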