[wildfly-dev] Graceful shutdown

Tue Aug 26 00:19:28 EDT 2014

Hi all,

The first graceful shutdown code has now gone into WildFly upstream, so 
it should now be possible to start implementing this for all endpoints 
(at the moment it is only implemented for Undertow and remote EJB).

Basically a server can be suspended by executing the :suspend operation 
in the CLI, and resumed using the :resume operation (there are 
corresponding operations in domain mode as well that can be executed at 
domain, server group and server level). Servers can also be suspended by 
doing a graceful shutdown, which involves passing a timeout parameter to 
the :shutdown command (so :shutdown(timeout=60) will suspend the server, 
wait up to 60 seconds for all current requests to finish, then shut 
down).

From a code point of view there are two main constructs: 
org.jboss.as.server.suspend.SuspendController, which is notified of 
suspend events, and 
org.wildfly.extension.requestcontroller.RequestController, which deals 
with the common case of tracking active requests (and also allows a 
global request limit to be put in place as a form of overload protection).

Subsystems that wish to use the SuspendController directly do this by 
registering ServerActivity callbacks. These callbacks notify the 
subsystem when the server is being suspended and resumed, and allow the 
subsystem to notify the server when the subsystem has suspended. This 
happens in two stages:

- preSuspend(): this is called first, and allows things like mod_cluster 
to notify the load balancer that the server is being suspended. During 
this phase the server processes requests normally.

- suspend(): this is called once the preSuspend() phase has completed. 
Once suspend has started the subsystem should stop accepting requests, 
and notify the server once it considers itself fully suspended.
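
As a rough illustration of the two stages, here is a minimal 
self-contained sketch. It does not use the real 
org.jboss.as.server.suspend types: PhaseCallback, Activity, 
ToySuspendController and ToyEndpoint are hypothetical stand-ins, and the 
sketch assumes every callback is invoked synchronously, which the real 
server does not require.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in: how a subsystem reports that a phase is done. */
interface PhaseCallback {
    void done();
}

/** Hypothetical stand-in for a ServerActivity-style callback. */
interface Activity {
    void preSuspend(PhaseCallback callback);
    void suspend(PhaseCallback callback);
    void resume();
}

/** Toy controller: runs preSuspend() on every activity, then suspend(). */
final class ToySuspendController {
    private final List<Activity> activities = new ArrayList<>();

    void register(Activity activity) {
        activities.add(activity);
    }

    /** Returns true if every activity acknowledged both phases. */
    boolean suspendAll() {
        AtomicInteger pending = new AtomicInteger(activities.size());
        for (Activity a : activities) {
            a.preSuspend(pending::decrementAndGet);
        }
        if (pending.get() != 0) {
            return false; // some activity has not finished preSuspend()
        }
        pending.set(activities.size());
        for (Activity a : activities) {
            a.suspend(pending::decrementAndGet);
        }
        return pending.get() == 0;
    }

    void resumeAll() {
        for (Activity a : activities) {
            a.resume();
        }
    }
}

/** Example subsystem that records the order of events it sees. */
final class ToyEndpoint implements Activity {
    final List<String> events = new ArrayList<>();
    boolean accepting = true;

    @Override
    public void preSuspend(PhaseCallback callback) {
        events.add("preSuspend"); // still accepting requests in this phase
        callback.done();
    }

    @Override
    public void suspend(PhaseCallback callback) {
        accepting = false;        // stop accepting new requests
        events.add("suspend");
        callback.done();          // report "fully suspended"
    }

    @Override
    public void resume() {
        accepting = true;
        events.add("resume");
    }
}
```

The point of the split is that nothing stops accepting requests until 
every activity has had its preSuspend() turn.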

Subsystems that wish to use RequestController do this by getting access 
to a ControlPoint, which is identified by (top level) deployment name + 
entry point name. When an external request starts the code calls 
beginRequest() and checks the return value to see if the request is 
allowed to proceed. If it is allowed then the code must call 
requestComplete() when it is finished.
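
The calling pattern looks roughly like the sketch below. This is not the 
real RequestController code: ToyRequestController, ToyControlPoint and 
ToyHandler are hypothetical stand-ins, beginRequest() is simplified to 
return a boolean, and the way the global limit is applied here is an 
assumption made for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in that counts active requests across all entry points. */
final class ToyRequestController {
    private final int maxRequests; // assumed global limit (overload protection)
    private int active;
    private boolean suspended;

    ToyRequestController(int maxRequests) {
        this.maxRequests = maxRequests;
    }

    synchronized boolean beginRequest() {
        if (suspended || active >= maxRequests) {
            return false; // rejected: server suspended or limit reached
        }
        active++;
        return true;
    }

    synchronized void requestComplete() {
        active--;
    }

    synchronized void suspend() {
        suspended = true;
    }

    synchronized void resume() {
        suspended = false;
    }

    /** True once suspended and all in-flight requests have finished. */
    synchronized boolean isFullySuspended() {
        return suspended && active == 0;
    }
}

/** Hypothetical stand-in: identified by deployment name + entry point name. */
final class ToyControlPoint {
    private final ToyRequestController controller;
    final String deployment;
    final String entryPoint;

    ToyControlPoint(ToyRequestController controller, String deployment, String entryPoint) {
        this.controller = controller;
        this.deployment = deployment;
        this.entryPoint = entryPoint;
    }

    boolean beginRequest() {
        return controller.beginRequest();
    }

    void requestComplete() {
        controller.requestComplete();
    }
}

/** How an endpoint would wrap one external request. */
final class ToyHandler {
    static String handle(ToyControlPoint cp, Supplier<String> work) {
        if (!cp.beginRequest()) {
            return "rejected";
        }
        try {
            return work.get();
        } finally {
            cp.requestComplete(); // must always run once beginRequest() succeeded
        }
    }
}
```

Putting requestComplete() in a finally block keeps the active count 
correct even if the request throws, which matters because suspend waits 
for that count to reach zero.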

Note that this can only be used for external requests, or it can break 
already running code (e.g. @PreDestroy calls that are running when the 
server is suspended).

Because the request controller tracks the deployment and entry point we 
may eventually use this information to also provide:

- deployment level suspend (so we can do 'graceful undeploy')
- entry point level suspend (e.g. suspend all web requests)
- statistics on active requests by deployment/entry point

Note that RequestController is a subsystem, and the request tracking does 
add a small amount of overhead. If the subsystem is removed then this 
overhead will disappear; however, graceful shutdown will then not track 
active requests.

All questions/comments are welcome. Now that the core is in place I am 
going to start creating JIRAs for all the subsystem level integration work.

Stuart