December 2015 - jboss-jira - Jboss List Archives

[JBoss JIRA] (WFLY-3884) Integrate mod_cluster subsystem with security subsystem for SSL configuration

by Radoslav Husar (JIRA)

[ https://issues.jboss.org/browse/WFLY-3884?page=com.atlassian.jira.plugin.... ]

Radoslav Husar updated WFLY-3884:
---------------------------------
Fix Version/s: 11.0.0.CR1

> Integrate mod_cluster subsystem with security subsystem for SSL configuration
> -----------------------------------------------------------------------------
>
> Key: WFLY-3884
> URL: https://issues.jboss.org/browse/WFLY-3884
> Project: WildFly
> Issue Type: Feature Request
> Components: Clustering
> Affects Versions: 9.0.0.Alpha1
> Reporter: Paul Ferraro
> Assignee: Radoslav Husar
> Fix For: 11.0.0.CR1
>
> Currently, the SSL certificate configuration is embedded within the mod_cluster subsystem configuration. Ideally, mod_cluster would reference this configuration from the security subsystem.

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)

[JBoss JIRA] (WFCORE-1225) Domain rollout operation timeouts

by Brian Stansberry (JIRA)

Brian Stansberry created WFCORE-1225:
----------------------------------------

Summary: Domain rollout operation timeouts
Key: WFCORE-1225
URL: https://issues.jboss.org/browse/WFCORE-1225
Project: WildFly Core
Issue Type: Feature Request
Components: Domain Management
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: 3.0.0.Alpha1

Portion of the parent task related to rolling out changes to a domain.

The expectation is that single process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts (e.g. if a single server hangs, preventing completion of the rollout, eventually that server will time out, allowing the domain wide rollout to continue.) Timeouts in the domain rollout code serve as a second line of defense:

1) In case of protocol or other problems that prevent the calling process learning about the timeout on the remote process
2) In case of bugs in the single process timeout handling on the remote process
3) In mixed domain cases where remote hosts are running previous versions and do not have the timeout function

Potential places to add timeouts:

DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
-- where the master HC waits for responses from slaves

RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
-- timeout here means 1 server didn't respond, but need to move on to next

ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
-- timeout here means none of the remaining servers have responded w/in the timeout

DomainRolloutStepHandler.finalizeOp() -> future.get()
---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the tx or a comms problem getting back the response
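The "second line of defense" idea above, in particular the bare future.get() in the last item, might be sketched with a bounded wait on a plain java.util.concurrent.Future. This is only an illustration: the class BoundedWait, the Outcome enum and the timeout values are hypothetical and are not part of WildFly Core.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not a WildFly Core class. */
public class BoundedWait {

    /** Hypothetical result type describing how the wait ended. */
    public enum Outcome { COMPLETED, TIMED_OUT, FAILED }

    /** Waits for the future, but never longer than the given timeout. */
    public static Outcome await(Future<?> future, long timeout, TimeUnit unit) {
        try {
            future.get(timeout, unit);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // The remote side never answered; stop waiting so the caller can move on.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The task itself reported a failure.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            // Preserve the interrupt status for callers further up the stack.
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        // A future that is never completed stands in for a hung server.
        CompletableFuture<String> hung = new CompletableFuture<>();
        System.out.println(await(hung, 100, TimeUnit.MILLISECONDS));
        System.out.println(await(CompletableFuture.completedFuture("ok"), 100, TimeUnit.MILLISECONDS));
    }
}
```

In a rolling update, a TIMED_OUT result for one server would let the caller proceed to the next server instead of blocking indefinitely.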

[JBoss JIRA] (WFCORE-378) Domain rollout operation timeouts

by Brian Stansberry (JIRA)

[ https://issues.jboss.org/browse/WFCORE-378?page=com.atlassian.jira.plugin... ]

Brian Stansberry updated WFCORE-378:
------------------------------------
Issue Type: Bug (was: Feature Request)

> Domain rollout operation timeouts
> ---------------------------------
>
> Key: WFCORE-378
> URL: https://issues.jboss.org/browse/WFCORE-378
> Project: WildFly Core
> Issue Type: Bug
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 3.0.0.Alpha1
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts (e.g. if a single server hangs, preventing completion of the rollout, eventually that server will time out, allowing the domain wide rollout to continue.) Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process learning about the timeout on the remote process
> 2) In case of bugs in the single process timeout handling on the remote process
> 3) In mixed domain cases where remote hosts are running previous versions and do not have the timeout function
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means 1 server didn't respond, but need to move on to next
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means none of the remaining servers have responded w/in the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> ---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the tx or a comms problem getting back the response

[JBoss JIRA] (WFCORE-1225) Domain rollout operation timeouts

by Brian Stansberry (JIRA)

[ https://issues.jboss.org/browse/WFCORE-1225?page=com.atlassian.jira.plugi... ]

Brian Stansberry updated WFCORE-1225:
-------------------------------------
Issue Type: Bug (was: Feature Request)

> Domain rollout operation timeouts
> ---------------------------------
>
> Key: WFCORE-1225
> URL: https://issues.jboss.org/browse/WFCORE-1225
> Project: WildFly Core
> Issue Type: Bug
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 3.0.0.Alpha1
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts (e.g. if a single server hangs, preventing completion of the rollout, eventually that server will time out, allowing the domain wide rollout to continue.) Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process learning about the timeout on the remote process
> 2) In case of bugs in the single process timeout handling on the remote process
> 3) In mixed domain cases where remote hosts are running previous versions and do not have the timeout function
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means 1 server didn't respond, but need to move on to next
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means none of the remaining servers have responded w/in the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> ---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the tx or a comms problem getting back the response

[JBoss JIRA] (WFCORE-1224) Try closing the channel if java.lang.Error prevents sending error responses to the client

by Brian Stansberry (JIRA)

[ https://issues.jboss.org/browse/WFCORE-1224?page=com.atlassian.jira.plugi... ]

Brian Stansberry deleted WFCORE-1224:
-------------------------------------

> Try closing the channel if java.lang.Error prevents sending error responses to the client
> -----------------------------------------------------------------------------------------
>
> Key: WFCORE-1224
> URL: https://issues.jboss.org/browse/WFCORE-1224
> Project: WildFly Core
> Issue Type: Sub-task
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
>
> Beyond the basic work on WFCORE-1134, look into further reaction when Errors occur and the server or HC cannot even send an error response to the caller. If we get to this point, the task has already failed to handle a problem and now we can't notify the remote side either. Most likely this is an OOME situation, although any Error here means a major issue.
> Options:
> 1) Try and close the channel to disconnect this process from the remote end so it doesn't disrupt the remote end. If this is an intra-HC or HC-server connection a successful close will remove this process from normal domain control. If this is a server the HC still has some control over the server via the ProcessController, e.g. to handle a 'kill' or 'destroy' management op. If this is a slave HC, the slave is disconnected from the domain. Either a server or a slave HC may try to reconnect, although it's likely better if that fails and the user just restarts the process.
> If the remote side is an end user client (e.g. CLI) then a successful close will be noticed by the client. The user can reconnect or take action to terminate this process.
> 2) Commit suicide via SystemExiter.exit. But I'm not certain complete termination of the process is how we want to deal with problems with management requests. Probably some sort of configurable policy would be better.

[JBoss JIRA] (WFCORE-1223) A java.lang.Error in Stage.DONE can lead to hangs

by Brian Stansberry (JIRA)

Brian Stansberry created WFCORE-1223:
----------------------------------------

Summary: A java.lang.Error in Stage.DONE can lead to hangs
Key: WFCORE-1223
URL: https://issues.jboss.org/browse/WFCORE-1223
Project: WildFly Core
Issue Type: Bug
Components: Domain Management
Affects Versions: 2.0.1.Final
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: 3.0.0.Alpha1

As part of investigating https://bugzilla.redhat.com/show_bug.cgi?id=1259767 I've looked into general handling of situations when java.lang.Error is thrown during management operation execution. Handling prior to Stage.DONE looks ok, with the error caught, logged and the failure-description on the response set. But if the error happens after commit or during rollback, it isn't always properly triggering a response.

In the intra-domain case the remote node that has the problem tries to send a failure response to the calling HC, but uses the wrong request id, resulting in that response being dropped and the calling HC continuing to wait.

A post-commit Error seems pretty unlikely. A rollback error seems more possible, as a rollback may involve installing significant numbers of services.
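Why a wrong request id leaves the caller waiting can be illustrated with a generic request/response correlation table. This is a simplified sketch, not the WildFly management protocol code: PendingRequests and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical correlation table; not a WildFly Core class. */
public class PendingRequests {

    // One future per outstanding request, keyed by the request id.
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers an outgoing request and returns the future the caller waits on. */
    public CompletableFuture<String> register(int requestId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(requestId, future);
        return future;
    }

    /**
     * Delivers a response. Returns false when no request with that id is
     * waiting, in which case the response is dropped.
     */
    public boolean onResponse(int requestId, String failureDescription) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future == null) {
            return false; // unknown id: nobody is notified
        }
        future.complete(failureDescription);
        return true;
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        CompletableFuture<String> waiting = requests.register(7);
        // A failure response sent with the wrong id is dropped and the caller keeps waiting.
        System.out.println(requests.onResponse(8, "rollback failed") + " " + waiting.isDone());
        // The same response with the correct id releases the caller.
        System.out.println(requests.onResponse(7, "rollback failed") + " " + waiting.isDone());
    }
}
```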

[JBoss JIRA] (WFCORE-1224) Try closing the channel if java.lang.Error prevents sending error responses to the client

by Brian Stansberry (JIRA)

Brian Stansberry created WFCORE-1224:
----------------------------------------

Summary: Try closing the channel if java.lang.Error prevents sending error responses to the client
Key: WFCORE-1224
URL: https://issues.jboss.org/browse/WFCORE-1224
Project: WildFly Core
Issue Type: Sub-task
Components: Domain Management
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: 3.0.0.Alpha1

Beyond the basic work on WFCORE-1134, look into further reaction when Errors occur and the server or HC cannot even send an error response to the caller. If we get to this point, the task has already failed to handle a problem and now we can't notify the remote side either. Most likely this is an OOME situation, although any Error here means a major issue.

Options:

1) Try and close the channel to disconnect this process from the remote end so it doesn't disrupt the remote end. If this is an intra-HC or HC-server connection a successful close will remove this process from normal domain control. If this is a server the HC still has some control over the server via the ProcessController, e.g. to handle a 'kill' or 'destroy' management op. If this is a slave HC, the slave is disconnected from the domain. Either a server or a slave HC may try to reconnect, although it's likely better if that fails and the user just restarts the process.

If the remote side is an end user client (e.g. CLI) then a successful close will be noticed by the client. The user can reconnect or take action to terminate this process.

2) Commit suicide via SystemExiter.exit. But I'm not certain complete termination of the process is how we want to deal with problems with management requests. Probably some sort of configurable policy would be better.
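Option 1 above could look roughly like the following fallback. It is a sketch under assumptions: ResponseSender is a hypothetical interface, and java.io.Closeable merely stands in for the management channel; neither reflects the actual WildFly Core types.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical fallback helper; not a WildFly Core class. */
public class ErrorResponseFallback {

    /** Hypothetical stand-in for whatever writes a failure response to the caller. */
    public interface ResponseSender {
        void sendFailure(String description) throws IOException;
    }

    /**
     * Tries to send the failure response. If even that fails, closes the
     * channel so the remote end notices the disconnect instead of waiting.
     * Returns true only if the response was sent.
     */
    public static boolean sendOrClose(ResponseSender sender, Closeable channel, String description) {
        try {
            sender.sendFailure(description);
            return true;
        } catch (Throwable t) {
            // Throwable is caught deliberately: the scenario is an Error such as an OOME.
            try {
                channel.close();
            } catch (Throwable ignored) {
                // Nothing more can be done from this process.
            }
            return false;
        }
    }

    public static void main(String[] args) {
        boolean sent = sendOrClose(
                description -> { throw new OutOfMemoryError("simulated"); },
                () -> System.out.println("channel closed"),
                "operation failed");
        System.out.println("response sent: " + sent);
    }
}
```

Option 2 (terminating the process) is intentionally left out of the sketch, since the text itself questions whether that should be the default policy.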

[JBoss JIRA] (WFCORE-1222) Graceful shutdown is interfering with the 'kill' and 'destroy' operations

by Brian Stansberry (JIRA)

Brian Stansberry created WFCORE-1222:
----------------------------------------

Summary: Graceful shutdown is interfering with the 'kill' and 'destroy' operations
Key: WFCORE-1222
URL: https://issues.jboss.org/browse/WFCORE-1222
Project: WildFly Core
Issue Type: Bug
Components: Domain Management
Affects Versions: 2.0.4.Final
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: 2.0.5.Final

The 'kill' and 'destroy' operations on the HC are meant to force shutdown of misbehaving servers. But the graceful shutdown work ([1]) has introduced a management op into the mix. I believe that should be removed, as it's not the intent of these operations to try and be graceful; the regular 'stop' ops are for that.

When experimenting with how domains react to OOME servers as part of my WFCORE-378 work I'm seeing 'kill' and 'destroy' no longer function because the OOME on the server means the graceful shutdown management op hangs.

[1] https://github.com/wildfly/wildfly-core/commit/6e95b5#diff-ecdfa997cd57af...

[JBoss JIRA] (WFCORE-1221) Operation headers not propagated to domain servers when 'composite' op is used

by Brian Stansberry (JIRA)

Brian Stansberry created WFCORE-1221:
----------------------------------------

Summary: Operation headers not propagated to domain servers when 'composite' op is used
Key: WFCORE-1221
URL: https://issues.jboss.org/browse/WFCORE-1221
Project: WildFly Core
Issue Type: Bug
Components: Domain Management
Affects Versions: 2.0.4.Final
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Priority: Critical
Fix For: 2.0.5.Final

When the user adds request headers to an op, they are not propagated to the servers during domain rollout if the 'composite' op is involved.

For example, if I add some stdout printing of what the headers are on the various processes and invoke this:

{code}
[domain@localhost:9990 /] deploy ~/tmp/helloworld.war --headers={blocking-timeout=5;rollback-on-runtime-failure=false} --all-server-groups
{code}

Then on a HC with two servers, this is logged:

[Host Controller] 10:53:40,697 INFO [stdout] (management-handler-thread - 3) "composite" headers: {
[Host Controller] 10:53:40,697 INFO [stdout] (management-handler-thread - 3) "blocking-timeout" => "5",
[Host Controller] 10:53:40,698 INFO [stdout] (management-handler-thread - 3) "rollback-on-runtime-failure" => "false",
[Host Controller] 10:53:40,698 INFO [stdout] (management-handler-thread - 3) "caller-type" => "user",
[Host Controller] 10:53:40,698 INFO [stdout] (management-handler-thread - 3) "access-mechanism" => "NATIVE"
[Host Controller] 10:53:40,698 INFO [stdout] (management-handler-thread - 3) }
[Host Controller] 10:53:40,727 INFO [org.jboss.as.repository] (management-handler-thread - 3) WFLYDR0001: Content added at location /Users/bstansberry/dev/wildfly/wildfly-core/dist/target/wildfly-core-2.0.5.Final-SNAPSHOT/domain/data/content/6f/cd9eae343ed6d5aa9fffa83012d155b1ef911c/content
[Server:server-one] 10:53:40,772 INFO [stdout] (ServerService Thread Pool -- 11) "composite" headers: null
[Server:server-two] 10:53:40,772 INFO [stdout] (ServerService Thread Pool -- 11) "composite" headers: null

The HC logs, then the servers report. The user-specified headers are not included.

Invoke the same op without the batch and this is logged:

{code}
[Host Controller] 10:43:50,400 INFO [stdout] (management-handler-thread - 4) "composite" headers: {
[Host Controller] 10:43:50,401 INFO [stdout] (management-handler-thread - 4) "blocking-timeout" => "5",
[Host Controller] 10:43:50,401 INFO [stdout] (management-handler-thread - 4) "rollback-on-runtime-failure" => "false",
[Host Controller] 10:43:50,401 INFO [stdout] (management-handler-thread - 4) "caller-type" => "user",
[Host Controller] 10:43:50,401 INFO [stdout] (management-handler-thread - 4) "access-mechanism" => "NATIVE"
[Host Controller] 10:43:50,401 INFO [stdout] (management-handler-thread - 4) }
[Host Controller] 10:43:50,425 INFO [org.jboss.as.repository] (management-handler-thread - 4) WFLYDR0001: Content added at location /Users/bstansberry/dev/wildfly/wildfly-core/dist/target/wildfly-core-2.0.5.Final-SNAPSHOT/domain/data/content/6f/cd9eae343ed6d5aa9fffa83012d155b1ef911c/content
[Server:server-two] 10:43:50,464 INFO [stdout] (ServerService Thread Pool -- 11) "composite" headers: {
[Server:server-two] 10:43:50,464 INFO [stdout] (ServerService Thread Pool -- 11) "blocking-timeout" => "5",
[Server:server-two] 10:43:50,464 INFO [stdout] (ServerService Thread Pool -- 11) "rollback-on-runtime-failure" => "false",
[Server:server-one] 10:43:50,464 INFO [stdout] (ServerService Thread Pool -- 11) "composite" headers: {
[Server:server-two] 10:43:50,464 INFO [stdout] (ServerService Thread Pool -- 11) "access-mechanism" => "NATIVE",
[Server:server-one] 10:43:50,465 INFO [stdout] (ServerService Thread Pool -- 11) "blocking-timeout" => "5",
[Server:server-two] 10:43:50,465 INFO [stdout] (ServerService Thread Pool -- 11) "domain-uuid" => "216d2e99-dba5-4c89-8020-b0c16bd553c5"
[Server:server-one] 10:43:50,465 INFO [stdout] (ServerService Thread Pool -- 11) "rollback-on-runtime-failure" => "false",
[Server:server-two] 10:43:50,465 INFO [stdout] (ServerService Thread Pool -- 11) }
[Server:server-one] 10:43:50,465 INFO [stdout] (ServerService Thread Pool -- 11) "access-mechanism" => "NATIVE",
[Server:server-one] 10:43:50,465 INFO [stdout] (ServerService Thread Pool -- 11) "domain-uuid" => "216d2e99-dba5-4c89-8020-b0c16bd553c5"
[Server:server-one] 10:43:50,465 INFO [stdout] (ServerService Thread Pool -- 11) }
{code}

Expected headers are present.

Note the CLI 'deploy' is far from the only time the 'composite' op is used. Among other places, the high level CLI 'batch' command in a domain involves use of 'composite'.
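The expected behaviour, where headers supplied on the outer 'composite' op also reach the ops sent on to the servers, can be modelled with plain maps. This is a toy model only: it does not use the real ModelNode API, the HeaderPropagation class is hypothetical, the "operation-headers" and "steps" keys are used here as assumed field names, and it is not a description of how the issue was actually fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of an operation as a map; not the WildFly ModelNode API. */
public class HeaderPropagation {

    // Assumed field names for this illustration.
    static final String HEADERS = "operation-headers";
    static final String STEPS = "steps";

    /** Returns copies of the composite's steps, each carrying the composite's headers. */
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> stepsWithHeaders(Map<String, Object> composite) {
        Object headers = composite.get(HEADERS);
        List<Map<String, Object>> result = new ArrayList<>();
        Object steps = composite.get(STEPS);
        if (!(steps instanceof List)) {
            return result;
        }
        for (Object step : (List<?>) steps) {
            // Copy so the caller's original step is left untouched.
            Map<String, Object> copy = new LinkedHashMap<>((Map<String, Object>) step);
            if (headers != null) {
                // Headers already set on a step take precedence over the composite's.
                copy.putIfAbsent(HEADERS, headers);
            }
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> headers = new LinkedHashMap<>();
        headers.put("blocking-timeout", "5");
        headers.put("rollback-on-runtime-failure", "false");

        Map<String, Object> step = new LinkedHashMap<>();
        step.put("operation", "add");

        Map<String, Object> composite = new LinkedHashMap<>();
        composite.put("operation", "composite");
        composite.put(HEADERS, headers);
        composite.put(STEPS, Arrays.asList(step));

        System.out.println(stepsWithHeaders(composite));
    }
}
```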

[JBoss JIRA] (WFCORE-1213) Graceful shutdown is interfering with the 'kill' and 'destroy' operations

by Brian Stansberry (JIRA)

[ https://issues.jboss.org/browse/WFCORE-1213?page=com.atlassian.jira.plugi... ]

Brian Stansberry updated WFCORE-1213:
-------------------------------------
Git Pull Request: https://github.com/wildfly/wildfly-core/pull/1308

> Graceful shutdown is interfering with the 'kill' and 'destroy' operations
> -------------------------------------------------------------------------
>
> Key: WFCORE-1213
> URL: https://issues.jboss.org/browse/WFCORE-1213
> Project: WildFly Core
> Issue Type: Bug
> Components: Domain Management
> Affects Versions: 2.0.4.Final
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 2.0.5.Final
>
> The 'kill' and 'destroy' operations on the HC are meant to force shutdown of misbehaving servers. But the graceful shutdown work ([1]) has introduced a management op into the mix. I believe that should be removed, as it's not the intent of these operations to try and be graceful; the regular 'stop' ops are for that.
> When experimenting with how domains react to OOME servers as part of my WFCORE-378 work I'm seeing 'kill' and 'destroy' no longer function because the OOME on the server means the graceful shutdown management op hangs.
> [1] https://github.com/wildfly/wildfly-core/commit/6e95b5#diff-ecdfa997cd57af...
