[jboss-jira] [JBoss JIRA] (WFCORE-378) Domain rollout operation timeouts

Brian Stansberry (JIRA) issues at jboss.org
Tue Apr 28 10:30:54 EDT 2015


     [ https://issues.jboss.org/browse/WFCORE-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Stansberry updated WFCORE-378:
------------------------------------
    Fix Version/s: 2.0.0.Beta1
                       (was: 1.0.0.CR1)


> Domain rollout operation timeouts
> ---------------------------------
>
>                 Key: WFCORE-378
>                 URL: https://issues.jboss.org/browse/WFCORE-378
>             Project: WildFly Core
>          Issue Type: Feature Request
>          Components: Domain Management
>            Reporter: Brian Stansberry
>            Assignee: Brian Stansberry
>             Fix For: 2.0.0.Beta1
>
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts (e.g. if a single server hangs, preventing completion of the rollout, eventually that server will time out, allowing the domain wide rollout to continue.) Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process learning about the timeout on the remote process
> 2) In case of bugs in the single process timeout handling on the remote process
> 3) In mixed domain cases where remote hosts are running previous versions and do not have the timeout function
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means 1 server didn't respond, but need to move on to next
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means none of the remaining servers have responded w/in the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> ---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the tx or a comms problem getting back the response



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


More information about the jboss-jira mailing list