[jboss-jira] [JBoss JIRA] (WFCORE-378) Domain rollout operation timeouts
Brian Stansberry (JIRA)
issues at jboss.org
Mon Dec 1 16:15:48 EST 2014
[ https://issues.jboss.org/browse/WFCORE-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Stansberry moved WFLY-2742 to WFCORE-378:
-----------------------------------------------
Project: WildFly Core (was: WildFly)
Key: WFCORE-378 (was: WFLY-2742)
Component/s: Domain Management
(was: Domain Management)
Fix Version/s: 1.0.0.Beta1
(was: 9.0.0.Beta1)
> Domain rollout operation timeouts
> ---------------------------------
>
> Key: WFCORE-378
> URL: https://issues.jboss.org/browse/WFCORE-378
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 1.0.0.Beta1
>
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single-process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts (e.g. if a single server hangs and prevents the rollout from completing, that server will eventually time out, allowing the domain-wide rollout to continue). Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process from learning about the timeout on the remote process
> 2) In case of bugs in the single-process timeout handling on the remote process
> 3) In mixed-domain cases where remote hosts run previous versions that lack the timeout function
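As a rough illustration of the second-line-of-defense idea, the Java sketch below has the caller wait for a remote "prepared" response with its own backstop timeout. The backstop is set to the remote process's timeout plus a grace period, so a healthy remote always reports its own timeout first. The class and method names (`BackstopTimeout`, `awaitPrepared`) and the grace-period policy are hypothetical assumptions, not WildFly code. Only standard `java.util.concurrent` APIs are used.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: a caller-side backstop behind the remote single-process timeout. */
public final class BackstopTimeout {

    private BackstopTimeout() {}

    /**
     * Waits for a remote "prepared" response for at most remoteTimeoutMillis + graceMillis.
     * A healthy remote reports its own timeout before this backstop expires.
     *
     * @return the response, or empty if the backstop expired
     */
    public static <T> Optional<T> awaitPrepared(Future<T> prepared,
                                                long remoteTimeoutMillis,
                                                long graceMillis)
            throws InterruptedException, ExecutionException {
        try {
            return Optional.of(prepared.get(remoteTimeoutMillis + graceMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Covers the three cases in the issue: a protocol problem hid the remote timeout,
            // the remote timeout handling is buggy, or the remote host predates timeouts.
            // Stop waiting and treat the remote as failed so the rollout can proceed.
            prepared.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // A remote that never answers (e.g. an older host without timeout support).
        CompletableFuture<String> silent = new CompletableFuture<>();
        System.out.println(awaitPrepared(silent, 50, 50).isPresent()); // false

        // A remote that answers promptly.
        CompletableFuture<String> ok = CompletableFuture.completedFuture("prepared");
        System.out.println(awaitPrepared(ok, 50, 50).orElse("timed out")); // prepared
    }
}
```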
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- a timeout here means one server didn't respond, but the task needs to move on to the next one
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- a timeout here means none of the remaining servers have responded within the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> -- the ServerGroupUpdateTask should fail in the normal phase, so a timeout here would indicate a problem committing the transaction or a communication problem getting back the response
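This is a minimal sketch of how the rolling and concurrent cases differ. It assumes each server's prepared response is exposed as a `java.util.concurrent.Future`, which simplifies what `ServerTaskExecutor` actually does. `RolloutWaits`, `rolling` and `concurrent` are hypothetical names. In the rolling case each server gets a fresh timeout, and a timeout simply moves on to the next server. In the concurrent case one deadline is shared by all remaining servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch contrasting rolling and concurrent timeout semantics. */
public final class RolloutWaits {

    private RolloutWaits() {}

    /** Rolling: every server gets its own full timeout; on timeout, move on to the next server. */
    public static <T> List<Optional<T>> rolling(List<? extends Future<T>> servers, long perServerMillis)
            throws InterruptedException, ExecutionException {
        List<Optional<T>> results = new ArrayList<>();
        for (Future<T> server : servers) {
            results.add(await(server, TimeUnit.MILLISECONDS.toNanos(perServerMillis)));
        }
        return results;
    }

    /** Concurrent: one deadline shared by all servers; any server not answered by then times out. */
    public static <T> List<Optional<T>> concurrent(List<? extends Future<T>> servers, long totalMillis)
            throws InterruptedException, ExecutionException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalMillis);
        List<Optional<T>> results = new ArrayList<>();
        for (Future<T> server : servers) {
            // The remaining budget shrinks as earlier waits consume time; clamp at zero.
            long remaining = Math.max(0L, deadline - System.nanoTime());
            results.add(await(server, remaining));
        }
        return results;
    }

    /** Bounded wait; an empty result marks the server as timed out. The future is not cancelled. */
    private static <T> Optional<T> await(Future<T> response, long nanos)
            throws InterruptedException, ExecutionException {
        try {
            return Optional.of(response.get(nanos, TimeUnit.NANOSECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        List<CompletableFuture<String>> servers = Arrays.asList(
                CompletableFuture.completedFuture("server-a"),
                new CompletableFuture<String>(),              // a hung server that never responds
                CompletableFuture.completedFuture("server-c"));
        System.out.println(rolling(servers, 50));    // [Optional[server-a], Optional.empty, Optional[server-c]]
        System.out.println(concurrent(servers, 50)); // [Optional[server-a], Optional.empty, Optional[server-c]]
    }
}
```

With N hung servers, the rolling variant can wait up to N * perServerMillis in total, while the concurrent variant waits at most totalMillis overall. The same bounded `Future.get(long, TimeUnit)` call could plausibly serve as the backstop around `future.get()` in `DomainRolloutStepHandler.finalizeOp()`, though that placement is an assumption.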
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)