[JBoss JIRA] (WFCORE-378) Domain rollout operation timeouts
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-378?page=com.atlassian.jira.plugin... ]
Kabir Khan updated WFCORE-378:
------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta6)
> Domain rollout operation timeouts
> ---------------------------------
>
> Key: WFCORE-378
> URL: https://issues.jboss.org/browse/WFCORE-378
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 1.0.0.CR1
>
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single-process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts (e.g. if a single server hangs and prevents the rollout from completing, that server will eventually time out, allowing the domain-wide rollout to continue). Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process from learning about the timeout on the remote process
> 2) In case of bugs in the single-process timeout handling on the remote process
> 3) In mixed-domain cases where remote hosts are running previous versions and do not have the timeout function
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- a timeout here means one server didn't respond, but the rollout needs to move on to the next server
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- a timeout here means none of the remaining servers has responded within the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> ---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the tx or a comms problem getting back the response
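The rolling case above (one server fails to respond, and the rollout moves on to the next) can be sketched with a bounded `Future.get`. This is a minimal illustration using only `java.util.concurrent`, not the WildFly rollout code: the class `RollingRolloutSketch`, the `Outcome` enum, and the server names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from WildFly Core.
final class RollingRolloutSketch {

    /** Per-server outcome; the values are illustrative only. */
    enum Outcome { PREPARED, FAILED, TIMED_OUT }

    /**
     * Runs the per-server tasks one at a time, waiting at most timeoutMillis
     * for each. A server that does not answer in time is recorded as
     * TIMED_OUT and the loop moves on to the next server.
     */
    static Map<String, Outcome> rollOut(Map<String, Callable<Void>> serverTasks,
                                        long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        Map<String, Outcome> outcomes = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, Callable<Void>> entry : serverTasks.entrySet()) {
                Future<Void> future = pool.submit(entry.getValue());
                try {
                    // Bounded wait instead of an unbounded future.get()
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    outcomes.put(entry.getKey(), Outcome.PREPARED);
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupt the stuck task, then continue
                    outcomes.put(entry.getKey(), Outcome.TIMED_OUT);
                } catch (ExecutionException e) {
                    outcomes.put(entry.getKey(), Outcome.FAILED);
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return outcomes;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Callable<Void>> tasks = new LinkedHashMap<>();
        tasks.put("server-one", () -> null);
        tasks.put("server-two", () -> { Thread.sleep(60_000); return null; }); // simulated hang
        tasks.put("server-three", () -> null);
        System.out.println(rollOut(tasks, 200));
    }
}
```

In this sketch the simulated hang on `server-two` costs the caller only the timeout, and `server-three` is still processed. Whether a timed-out server should count as a failure for the purposes of the rollout plan is a policy decision the sketch leaves open.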
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
10 years, 8 months
[JBoss JIRA] (WFCORE-380) Operations to read content from the content repository
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-380?page=com.atlassian.jira.plugin... ]
Kabir Khan updated WFCORE-380:
------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta6)
> Operations to read content from the content repository
> ------------------------------------------------------
>
> Key: WFCORE-380
> URL: https://issues.jboss.org/browse/WFCORE-380
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: luck3y
> Priority: Minor
> Fix For: 1.0.0.CR1
>
>
> Ability to pull data out of the content repository. We can already do that for rollout plans, but not for deployments, nor for deployment content overlays if we add them.
> Basically, if the user loses the original of the deployment, let's let them get it back via the management API and remove the temptation for them to hack into the repository.
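As a rough illustration of what such a read operation has to do underneath, here is a sketch of looking up stored bytes by hash. It assumes a content-addressed layout of `<root>/<first two hex characters>/<remaining hex characters>/content` keyed by a SHA-1 hex string; that layout, the class `ContentReadSketch`, and both method names are assumptions for this example, not something the issue specifies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the directory layout is an assumption, see the lead-in.
final class ContentReadSketch {

    /** Maps a 40-character lowercase SHA-1 hex string to the assumed on-disk location. */
    static Path contentPath(Path repositoryRoot, String sha1Hex) {
        // Strict validation also keeps path-traversal input such as "../x" out
        if (sha1Hex == null || !sha1Hex.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("Not a SHA-1 hex string: " + sha1Hex);
        }
        return repositoryRoot
                .resolve(sha1Hex.substring(0, 2))
                .resolve(sha1Hex.substring(2))
                .resolve("content");
    }

    /** Reads the stored bytes so a caller could hand them back to the user. */
    static byte[] readContent(Path repositoryRoot, String sha1Hex) throws IOException {
        return Files.readAllBytes(contentPath(repositoryRoot, sha1Hex));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("content-repo");
        String hash = "0123456789abcdef0123456789abcdef01234567"; // placeholder, not a real digest
        Path stored = contentPath(root, hash);
        Files.createDirectories(stored.getParent());
        Files.write(stored, "deployment bytes".getBytes());
        System.out.println(readContent(root, hash).length + " bytes read back");
    }
}
```

Exposing this through the management API rather than the file system is the point of the request: the caller only needs the hash, and never has to know or depend on the layout.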
--
[JBoss JIRA] (WFCORE-366) Support timeout of management operations
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-366?page=com.atlassian.jira.plugin... ]
Kabir Khan updated WFCORE-366:
------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta6)
> Support timeout of management operations
> ----------------------------------------
>
> Key: WFCORE-366
> URL: https://issues.jboss.org/browse/WFCORE-366
> Project: WildFly Core
> Issue Type: Enhancement
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 1.0.0.CR1
>
>
> There have been various situations where bugs in WF or in user code have prevented management operations from completing, leaving the ModelController locked permanently unless the process is stopped. There should be a mechanism by which timeouts can be set.
> My proposal is to allow a timeout to be set via a management operation header. In addition, there will be a standard timeout that will apply if no header is present.
> The standard timeout will be quite lengthy, probably several minutes. The goal is to allow a management process to eventually auto-recover, not to detect failures promptly. Users who want prompt detection of failures can set the header. A short standard timeout runs the risk of false positives, particularly with large deployments.
> This timeout is not meant to be an overall maximum time for operation execution. There would be no valid default for such a timeout in a managed domain, where the time needed to roll out a change depends on the size of the domain and on the rollout plan.
> Rather, this timeout is meant to be the maximum period that operation execution threads can block at various points. Blocking for longer than the timeout would result in operation failure.
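The proposal can be sketched as two small pieces: resolving the timeout from an operation header with a fallback to a standard value, and applying that value at each individual blocking point rather than to the operation as a whole. The header name `blocking-timeout`, the 300-second default, and the class `BlockingTimeoutSketch` are assumptions made for this example; the issue text only says "a management operation header" and "several minutes".

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: header name and default value are assumptions.
final class BlockingTimeoutSketch {

    static final String TIMEOUT_HEADER = "blocking-timeout"; // assumed name
    static final long DEFAULT_TIMEOUT_SECONDS = 300;         // assumed "several minutes"

    /** Returns the header value in seconds, or the standard timeout if no header is present. */
    static long resolveTimeoutSeconds(Map<String, String> operationHeaders) {
        String raw = operationHeaders.get(TIMEOUT_HEADER);
        if (raw == null) {
            return DEFAULT_TIMEOUT_SECONDS;
        }
        long seconds = Long.parseLong(raw.trim()); // NumberFormatException on garbage input
        if (seconds <= 0) {
            throw new IllegalArgumentException(TIMEOUT_HEADER + " must be positive: " + raw);
        }
        return seconds;
    }

    /**
     * One blocking point. The timeout bounds this single wait, not the whole
     * operation; exceeding it fails the operation instead of blocking forever.
     */
    static void awaitOrFail(CountDownLatch latch, long timeout, TimeUnit unit, String what)
            throws InterruptedException {
        if (!latch.await(timeout, unit)) {
            throw new IllegalStateException(
                    "Timed out after " + timeout + " " + unit + " waiting for " + what);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(resolveTimeoutSeconds(Map.of(TIMEOUT_HEADER, "15"))); // 15
        System.out.println(resolveTimeoutSeconds(Map.of()));                     // 300

        CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            awaitOrFail(neverReleased, 100, TimeUnit.MILLISECONDS, "a server response");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the bound applies per wait, an operation that passes through several blocking points can legitimately run for longer than the configured value, which matches the distinction drawn in the last two paragraphs of the description.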
--
[JBoss JIRA] (WFCORE-364) Hangs in mixed domain testsuite
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-364?page=com.atlassian.jira.plugin... ]
Kabir Khan updated WFCORE-364:
------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta6)
> Hangs in mixed domain testsuite
> -------------------------------
>
> Key: WFCORE-364
> URL: https://issues.jboss.org/browse/WFCORE-364
> Project: WildFly Core
> Issue Type: Bug
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 1.0.0.CR1
>
> Attachments: testsuite-hang1-server.txt, testsuite-hang2-client.txt, testsuite-hang2-server.txt
>
>
> http://lightning.mw.lab.eng.bos.redhat.com/jenkins/job/wildfly-param-pull...
> https://dl.dropboxusercontent.com/u/712508/mixed-hang-debug.zip contains details.
> 30805.dump is the client and shows the problem is occurring in the @After cleanup of MixedDomainDeploymentTest as it removes deployments from the domain.
> "main" prio=10 tid=0xf7605c00 nid=0x7856 in Object.wait() [0xf776e000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0xe50275f0> (a org.jboss.as.protocol.mgmt.ActiveOperationSupport$ActiveOperationImpl)
>     at java.lang.Object.wait(Object.java:503)
>     at org.jboss.threads.AsyncFutureTask.await(AsyncFutureTask.java:192)
>     - locked <0xe50275f0> (a org.jboss.as.protocol.mgmt.ActiveOperationSupport$ActiveOperationImpl)
>     at org.jboss.threads.AsyncFutureTask.get(AsyncFutureTask.java:266)
>     - locked <0xe50275f0> (a org.jboss.as.protocol.mgmt.ActiveOperationSupport$ActiveOperationImpl)
>     at org.jboss.as.controller.client.impl.AbstractDelegatingAsyncFuture.get(AbstractDelegatingAsyncFuture.java:100)
>     at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:127)
>     at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:71)
>     at org.jboss.as.controller.client.helpers.domain.impl.DomainClientImpl.execute(DomainClientImpl.java:81)
>     at org.jboss.as.test.integration.domain.mixed.MixedDomainDeploymentTest.removeDeployment(MixedDomainDeploymentTest.java:441)
>     at org.jboss.as.test.integration.domain.mixed.MixedDomainDeploymentTest.cleanDeployments(MixedDomainDeploymentTest.java:343)
>     at org.jboss.as.test.integration.domain.mixed.MixedDomainDeploymentTest.confirmNoDeployments(MixedDomainDeploymentTest.java:163)
> 31894.dump shows the master HC. management-handler-thread is blocking on the way out (after sending the commit or rollback) waiting for a final result response from the slave.
> "management-handler-thread - 1" prio=10 tid=0xca0efc00 nid=0x7cdd in Object.wait() [0xc7469000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0xeb6d6558> (a org.jboss.as.protocol.mgmt.ActiveOperationSupport$ActiveOperationImpl)
>     at java.lang.Object.wait(Object.java:503)
>     at org.jboss.threads.AsyncFutureTask.await(AsyncFutureTask.java:192)
>     - locked <0xeb6d6558> (a org.jboss.as.protocol.mgmt.ActiveOperationSupport$ActiveOperationImpl)
>     at org.jboss.threads.AsyncFutureTask.get(AsyncFutureTask.java:266)
>     - locked <0xeb6d6558> (a org.jboss.as.protocol.mgmt.ActiveOperationSupport$ActiveOperationImpl)
>     at org.jboss.as.controller.client.impl.AbstractDelegatingAsyncFuture.get(AbstractDelegatingAsyncFuture.java:100)
>     at org.jboss.as.domain.controller.operations.coordination.DomainSlaveHandler.finalizeOp(DomainSlaveHandler.java:215)
>     at org.jboss.as.domain.controller.operations.coordination.DomainSlaveHandler.access$000(DomainSlaveHandler.java:58)
>     at org.jboss.as.domain.controller.operations.coordination.DomainSlaveHandler$1.handleResult(DomainSlaveHandler.java:179)
>     at org.jboss.as.controller.AbstractOperationContext$Step.handleRollback(AbstractOperationContext.java:805)
>     at org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:763)
>     at org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:738)
> (Note that the "handleRollback" method name is a red herring: the method should be renamed to "handleResult", since it is now called no matter what the result is.)
> 31986.dump is the slave HC. It shows it is blocking in stage DONE after having sent operationPrepared to the master, waiting for commit or rollback.
> "domain-connection-threads - 1" prio=10 tid=0xc91a8400 nid=0x7d1a waiting on condition [0xc87fe000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0xea1fd410> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
>     at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ProxyOperationControlProxy.operationPrepared(TransactionalProtocolOperationHandler.java:173)
>     at org.jboss.as.controller.AbstractOperationContext.doCompleteStep(AbstractOperationContext.java:337)
>     at org.jboss.as.controller.AbstractOperationContext.completeStep(AbstractOperationContext.java:211)
>     at org.jboss.as.domain.controller.operations.coordination.ServerOperationsResolverHandler.execute(ServerOperationsResolverHandler.java:128)
> The indication is that the commit or rollback message is getting lost. No threads are shown sending or receiving it.
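The slave-side dump shows an untimed `CountDownLatch.await()` parked while waiting for a commit-or-rollback decision that never arrives. The sketch below shows what the same wait looks like with a bound, so that a lost message surfaces as a result instead of a permanent hang. The class `PreparedOperationSketch` and its methods are hypothetical and are not the `TransactionalProtocolOperationHandler` code. Treating a missing decision as a rollback is an assumption made for illustration: in a real two-phase protocol, rolling back unilaterally is only safe if the coordinator cannot already have committed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a prepared participant waiting for the coordinator's decision.
final class PreparedOperationSketch {

    enum Decision { COMMIT, ROLLBACK }

    private final CountDownLatch decided = new CountDownLatch(1);
    private final AtomicReference<Decision> decision = new AtomicReference<>();

    /** Called when the coordinator's commit-or-rollback message arrives; the first call wins. */
    void complete(Decision d) {
        if (decision.compareAndSet(null, d)) {
            decided.countDown();
        }
    }

    /**
     * Blocks after "prepared" has been reported. If no decision arrives within
     * the timeout, this sketch falls back to ROLLBACK (an assumption, see above)
     * rather than parking the thread indefinitely.
     */
    Decision awaitDecision(long timeout, TimeUnit unit) throws InterruptedException {
        if (decided.await(timeout, unit)) {
            return decision.get();
        }
        return Decision.ROLLBACK;
    }

    public static void main(String[] args) throws InterruptedException {
        PreparedOperationSketch lost = new PreparedOperationSketch();
        // Nobody ever calls complete(): simulates the lost message
        System.out.println(lost.awaitDecision(100, TimeUnit.MILLISECONDS)); // ROLLBACK

        PreparedOperationSketch delivered = new PreparedOperationSketch();
        delivered.complete(Decision.COMMIT);
        System.out.println(delivered.awaitDecision(100, TimeUnit.MILLISECONDS)); // COMMIT
    }
}
```

A bound like this would not explain why the message is lost, but it would turn the test-suite hang into a reportable failure on the slave and release the threads that the other two dumps show waiting on it.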
--
[JBoss JIRA] (WFCORE-384) Server does wrongly reference a list of socket binding groups
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-384?page=com.atlassian.jira.plugin... ]
Kabir Khan updated WFCORE-384:
------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta6)
> Server does wrongly reference a list of socket binding groups
> -------------------------------------------------------------
>
> Key: WFCORE-384
> URL: https://issues.jboss.org/browse/WFCORE-384
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Heiko Braun
> Assignee: luck3y
> Fix For: 1.0.0.CR1
>
>
> hbraun: i see that a server (host=foo/server=bar) can have a list of socket-binding-groups
> [10:59am] hbraun: shouldn't that be a single socket binding instead?
> [11:00am] emuckenhuber: oh... yeah
> [11:00am] hbraun: maybe it's just the description that's wrong
> [11:00am] emuckenhuber: that sounds similar to the jvm thing we had
--
[JBoss JIRA] (WFCORE-392) Stack trace improvements, summarize the things that broke to make them not so intimidating
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-392?page=com.atlassian.jira.plugin... ]
Kabir Khan updated WFCORE-392:
------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta6)
> Stack trace improvements, summarize the things that broke to make them not so intimidating
> ------------------------------------------------------------------------------------------
>
> Key: WFCORE-392
> URL: https://issues.jboss.org/browse/WFCORE-392
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Jim Tyrrell
> Assignee: Brian Stansberry
> Priority: Critical
> Labels: eap6-ux
> Fix For: 1.0.0.CR1
>
> Attachments: log.txt
>
>
> See the typical attached stack dump:
> It would be nice if the stack dump could end with a summary of all of the information above, as shown below, with just the first line of each error rather than the whole stack dump:
> 11:51:45,391 INFO [org.jboss.as.server.controller] (HttpManagementService-threads - 5) Deployment of "SpringWAR.war" was rolled back with failure message {"Failed services" => {"jboss.web.deployment.default-host./SpringWAR" => "org.jboss.msc.service.StartException in service jboss.web.deployment.default-host./SpringWAR: failed to start context"}}
> 11:51:45,392 INFO [org.jboss.as.controller] (HttpManagementService-threads - 5) Service status report
> Services which failed to start:
> service jboss.web.deployment.default-host./SpringWAR: org.jboss.msc.service.StartException in service jboss.web.deployment.default-host./SpringWAR: failed to start context
> 11:51:45,417 INFO [org.jboss.as.server.deployment] (MSC service thread 1-2) Stopped deployment SpringWAR.war in 26ms
> 11:51:45,417 Error Summary of Exceptions in deployment:
> - [org.springframework.web.context.ContextLoader] (MSC service thread 1-5) Context initialization failed: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'jbossQueue' defined in class path resource [applicationContextServices.xml]: Invocation of init method failed; nested exception is javax.naming.NameNotFoundException: queue/B -- service jboss.naming.context.java.queue.B
> - [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/SpringWAR]] (MSC service thread 1-5) Exception sending context initialized event to listener instance of class org.springframework.web.context.ContextLoaderListener: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'jbossQueue' defined in class path resource [applicationContextServices.xml]: Invocation of init method failed; nested exception is javax.naming.NameNotFoundException: queue/B -- service jboss.naming.context.java.queue.B
> - [org.apache.catalina.core.StandardContext] (MSC service thread 1-5) Error listenerStart
> - ERROR [org.apache.catalina.core.StandardContext] (MSC service thread 1-5) Context [/SpringWAR] startup failed due to previous errors
> - [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/SpringWAR]] (MSC service thread 1-5) Closing Spring root WebApplicationContext
> - [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC00001: Failed to start service jboss.web.deployment.default-host./SpringWAR:
> ERROR Scroll up to see more info...
> Or
> 11:51:45,417 Error Summary of Exceptions in deployment:
> [1] [org.springframework.web.context.ContextLoader] (MSC service thread 1-5) Context initialization failed: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'jbossQueue' defined in class path resource [applicationContextServices.xml]: Invocation of init method failed; nested exception is javax.naming.NameNotFoundException: queue/B -- service jboss.naming.context.java.queue.B
> [2] [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/SpringWAR]] (MSC service thread 1-5) Exception sending context initialized event to listener instance of class org.springframework.web.context.ContextLoaderListener: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'jbossQueue' defined in class path resource [applicationContextServices.xml]: Invocation of init method failed; nested exception is javax.naming.NameNotFoundException: queue/B -- service jboss.naming.context.java.queue.B
> [3] [org.apache.catalina.core.StandardContext] (MSC service thread 1-5) Error listenerStart
> [4] ERROR [org.apache.catalina.core.StandardContext] (MSC service thread 1-5) Context [/SpringWAR] startup failed due to previous errors
> [5] [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/SpringWAR]] (MSC service thread 1-5) Closing Spring root WebApplicationContext
> [6] [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC00001: Failed to start service jboss.web.deployment.default-host./SpringWAR:
> ERROR Scroll up to see more info...
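One small building block for such a summary is reducing a single failure to one numbered line per exception in its cause chain. The sketch below does only that; it does not collect log records across threads, which the full request would also need. The class `ErrorSummarySketch` and the `[n]` numbering style (borrowed from the second mock-up above) are illustrative choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: summarizes one Throwable, not a whole deployment log.
final class ErrorSummarySketch {

    /** Returns one numbered line per exception in the cause chain, outermost first. */
    static List<String> summarize(Throwable failure) {
        List<String> lines = new ArrayList<>();
        // Identity-based set guards against cyclic cause chains
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        int index = 1;
        for (Throwable t = failure; t != null && seen.add(t); t = t.getCause()) {
            lines.add("[" + index++ + "] " + t.getClass().getName() + ": " + firstLine(t.getMessage()));
        }
        return lines;
    }

    /** Keeps only the first line of a possibly multi-line message. */
    private static String firstLine(String message) {
        if (message == null) {
            return "(no message)";
        }
        int newline = message.indexOf('\n');
        return newline < 0 ? message : message.substring(0, newline);
    }

    public static void main(String[] args) {
        Throwable cause = new IllegalStateException("queue/B not bound");
        Throwable top = new RuntimeException("failed to start context", cause);
        summarize(top).forEach(System.out::println);
        // [1] java.lang.RuntimeException: failed to start context
        // [2] java.lang.IllegalStateException: queue/B not bound
    }
}
```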
--
[JBoss JIRA] (WFCORE-390) Add reference description information to resource metadata
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-390?page=com.atlassian.jira.plugin... ]
Kabir Khan updated WFCORE-390:
------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta6)
> Add reference description information to resource metadata
> ----------------------------------------------------------
>
> Key: WFCORE-390
> URL: https://issues.jboss.org/browse/WFCORE-390
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Fix For: 1.0.0.CR1
>
>
> The AS's configuration model frequently includes attributes that are references to the name of some other resource in the model. The metadata describing such attributes must include information to help users and tooling to understand that reference.
> A simple approach would be to include a metadata attribute whose value is an absolute or relative path to the target resource.
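To make the "absolute or relative path" idea concrete, here is a sketch of how tooling might turn such a metadata value into the address of the referenced resource. Everything in it is hypothetical: the class `ReferenceMetadataSketch`, the `*` placeholder for the attribute's value, the slash-separated `key=value` address syntax, and the rule that relative paths are resolved against the address of the resource that owns the attribute.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: the path syntax and '*' placeholder are assumptions.
final class ReferenceMetadataSketch {

    /**
     * Resolves a reference path from attribute metadata to a target address.
     *
     * @param ownerAddress   address of the resource holding the attribute, e.g. "/server-group=main"
     * @param referencePath  absolute ("/type=*") or relative ("../type=*") path from the metadata
     * @param attributeValue the attribute's current value, substituted for '*'
     */
    static String resolve(String ownerAddress, String referencePath, String attributeValue) {
        Deque<String> segments = new ArrayDeque<>();
        if (!referencePath.startsWith("/")) {
            // Relative path: start from the owning resource's address
            for (String s : ownerAddress.split("/")) {
                if (!s.isEmpty()) {
                    segments.addLast(s);
                }
            }
        }
        for (String s : referencePath.split("/")) {
            if (s.isEmpty() || s.equals(".")) {
                continue;
            }
            if (s.equals("..")) {
                if (segments.isEmpty()) {
                    throw new IllegalArgumentException("Path escapes the root: " + referencePath);
                }
                segments.removeLast();
            } else {
                segments.addLast(s.replace("*", attributeValue));
            }
        }
        return "/" + String.join("/", segments);
    }

    public static void main(String[] args) {
        // Absolute reference
        System.out.println(resolve("/server-group=main",
                "/socket-binding-group=*", "standard-sockets"));
        // prints /socket-binding-group=standard-sockets

        // Relative reference to a sibling resource
        System.out.println(resolve("/socket-binding-group=standard-sockets/socket-binding=http",
                "../socket-binding=*", "https"));
        // prints /socket-binding-group=standard-sockets/socket-binding=https
    }
}
```

Splitting on `/` is a simplification that would break on values containing a slash; a real implementation would work on a structured address type rather than on strings.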
--