[jboss-jira] [JBoss JIRA] (WFLY-2742) Domain rollout operation timeouts

Brian Stansberry (JIRA) issues at jboss.org
Fri Jan 10 16:26:32 EST 2014


Brian Stansberry created WFLY-2742:
--------------------------------------

             Summary: Domain rollout operation timeouts
                 Key: WFLY-2742
                 URL: https://issues.jboss.org/browse/WFLY-2742
             Project: WildFly
          Issue Type: Sub-task
      Security Level: Public (Everyone can see)
          Components: Domain Management
            Reporter: Brian Stansberry
            Assignee: Brian Stansberry
             Fix For: 9.0.0.CR1


This sub-task covers the portion of the parent task related to rolling out changes to a domain.

The expectation is that single-process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts. For example, if a single server hangs and prevents completion of the rollout, that server will eventually time out, allowing the domain-wide rollout to continue. Timeouts in the domain rollout code serve as a second line of defense:

1) In case of protocol or other problems that prevent the calling process from learning about the timeout on the remote process.
2) In case of bugs in the single-process timeout handling on the remote process.
3) In mixed-domain cases where remote hosts are running previous versions and do not have the timeout function.

Potential places to add timeouts:

DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
-- where the master HC waits for responses from slaves

RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
-- a timeout here means one server didn't respond, but the rollout needs to move on to the next server
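
The rolling case can be sketched as follows. This is only an illustration under stated assumptions, not the WildFly implementation: the class RollingTimeoutSketch, the method awaitInOrder, and the use of one BlockingQueue per server to stand in for the prepared-operation response are all hypothetical. It shows the intended behavior: each server gets its own bounded wait, and a timeout is recorded before moving on to the next server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in WildFly.
final class RollingTimeoutSketch {

    /**
     * Waits for each server's prepared response in iteration order,
     * giving each server at most timeoutMillis.
     * Returns the names of the servers that did not respond in time.
     */
    static List<String> awaitInOrder(Map<String, BlockingQueue<String>> responses,
                                     long timeoutMillis) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, BlockingQueue<String>> entry : responses.entrySet()) {
            try {
                // poll returns null if nothing arrives within the timeout
                String prepared = entry.getValue().poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (prepared == null) {
                    // This server didn't respond; record it and move on to the next
                    timedOut.add(entry.getKey());
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting on further servers
                Thread.currentThread().interrupt();
                timedOut.add(entry.getKey());
                break;
            }
        }
        return timedOut;
    }
}
```

Called with one queue that already holds a response and one empty queue, awaitInOrder returns only the second server's name once its timeout elapses. Pass a map with a defined iteration order (for example a LinkedHashMap) so the servers are visited in rollout order.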

ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
-- a timeout here means none of the remaining servers has responded within the timeout
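
For the concurrent case, a minimal sketch might look like the following. It assumes, purely for illustration, that each server announces its prepared response by putting its name on a shared BlockingQueue; the class ConcurrentTimeoutSketch and the method awaitAll are hypothetical and are not WildFly code. The timeout applies to each wait for "the next response from anyone", so hitting it means none of the remaining servers responded in that window.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in WildFly.
final class ConcurrentTimeoutSketch {

    /**
     * Waits until every server has reported, or until no remaining server
     * reports within timeoutMillis. Returns the servers still outstanding.
     */
    static Set<String> awaitAll(Set<String> servers,
                                BlockingQueue<String> responded,
                                long timeoutMillis) {
        Set<String> remaining = new LinkedHashSet<>(servers);
        while (!remaining.isEmpty()) {
            String name;
            try {
                name = responded.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up on the remaining servers
                Thread.currentThread().interrupt();
                break;
            }
            if (name == null) {
                // None of the remaining servers responded within the timeout
                break;
            }
            remaining.remove(name);
        }
        return remaining;
    }
}
```

An empty result means every server responded; a non-empty result lists the servers the caller would treat as timed out.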

DomainRolloutStepHandler.finalizeOp() -> future.get()
-- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the transaction or a communication problem getting the response back
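
A bounded wait on the future could be sketched with the standard Future.get(long, TimeUnit) overload, as below. The class FinalizeTimeoutSketch, the Outcome enum, and the method awaitFinal are hypothetical names chosen for this illustration; how DomainRolloutStepHandler would actually report or recover from the timeout is not specified here.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names exist in WildFly.
final class FinalizeTimeoutSketch {

    enum Outcome { COMPLETED, FAILED, TIMED_OUT, INTERRUPTED }

    /** Waits at most timeoutMillis for the task to finish and classifies the result. */
    static Outcome awaitFinal(Future<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // No result in time: likely a commit or communication problem
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException | CancellationException e) {
            // The task itself failed or was cancelled elsewhere
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the caller
            Thread.currentThread().interrupt();
            return Outcome.INTERRUPTED;
        }
    }
}
```

Cancelling the future on timeout is one possible design choice; a real implementation might prefer to leave the task running and only log the problem.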
