]
Stuart Douglas updated WFCORE-378:
----------------------------------
Fix Version/s: 1.0.0.CR1
(was: 1.0.0.Beta1)
Domain rollout operation timeouts
---------------------------------
Key: WFCORE-378
URL:
https://issues.jboss.org/browse/WFCORE-378
Project: WildFly Core
Issue Type: Feature Request
Components: Domain Management
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: 1.0.0.CR1
Portion of the parent task related to rolling out changes to a domain.
The expectation is that single process timeouts (WFLY-2741) will handle most failure
conditions related to domain rollouts (e.g. if a single server hangs, preventing
completion of the rollout, eventually that server will time out, allowing the domain wide
rollout to continue.) Timeouts in the domain rollout code serve as a second line of
defense:
1) In case of protocol or other problems that prevent the calling process learning about
the timeout on the remote process
2) In case of bugs in the single process timeout handling on the remote process
3) In mixed domain cases where remote hosts are running previous versions and do not have
the timeout function
Potential places to add timeouts:
DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
-- where the master HC waits for responses from slaves
RollingServerGroupUpdateTask.run() ->
ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
-- timeout here means 1 server didn't respond, but need to move on to next
ConcurrentServerGroupUpdateTask.run() ->
ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
-- timeout here means none of the remaining servers have responded w/in the timeout
DomainRolloutStepHandler.finalizeOp() -> future.get()
---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would
indicate a problem committing the tx or a comms problem getting back the response