[jboss-jira] [JBoss JIRA] (WFLY-2742) Domain rollout operation timeouts
Brian Stansberry (JIRA)
issues at jboss.org
Mon Dec 1 16:04:39 EST 2014
[ https://issues.jboss.org/browse/WFLY-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Stansberry updated WFLY-2742:
-----------------------------------
Parent: (was: WFLY-2740)
Issue Type: Feature Request (was: Sub-task)
> Domain rollout operation timeouts
> ---------------------------------
>
> Key: WFLY-2742
> URL: https://issues.jboss.org/browse/WFLY-2742
> Project: WildFly
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 9.0.0.Beta1
>
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts (e.g. if a single server hangs, preventing completion of the rollout, eventually that server will time out, allowing the domain wide rollout to continue.) Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process learning about the timeout on the remote process
> 2) In case of bugs in the single process timeout handling on the remote process
> 3) In mixed domain cases where remote hosts are running previous versions and do not have the timeout function
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means 1 server didn't respond, but need to move on to next
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means none of the remaining servers have responded w/in the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> ---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the tx or a comms problem getting back the response
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
More information about the jboss-jira
mailing list