[wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Brian Stansberry brian.stansberry at redhat.com
Wed Jun 11 12:07:20 EDT 2014


On 6/9/14, 5:38 PM, Stuart Douglas wrote:
> Server suspend and resume is a feature that allows a running server to
> gracefully finish all running requests. The most common use case for
> this is graceful shutdown, where you would like a server to complete all
> running requests, reject any new ones, and then shut down. However, there
> are also plenty of other valid use cases (e.g. suspend the server,
> modify a data source or some other config, then resume).
>
> User View:
>
> From the user's point of view, two new operations will be added to the server:
>
> suspend(timeout)
> resume()
>
> A runtime-only attribute, suspend-state (is this a good name?), will also
> be added. It can take one of three possible values: RUNNING,
> SUSPENDING or SUSPENDED.
>
> A timeout parameter will also be added to the shutdown operation. If
> this is present then the server will first be suspended, and the server
> will not shut down until either the suspend is successful or the timeout
> occurs. If no timeout parameter is passed to the operation then a normal
> non-graceful shutdown will take place.
>
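
To make the shutdown(timeout) semantics above concrete, here is a rough Java sketch. Everything in it is an assumption for illustration: the class name, the hook signatures, and the choice of milliseconds (the proposal does not state a unit). A null timeout stands in for "no timeout parameter was passed".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the proposal.
final class GracefulShutdownSketch {
    /**
     * @param suspend       starts a suspend and runs the given callback once suspended
     * @param stop          performs the actual (non-graceful) shutdown
     * @param timeoutMillis null means no timeout parameter was passed
     * @return true if the suspend completed before the server was stopped
     */
    static boolean shutdown(Consumer<Runnable> suspend, Runnable stop, Long timeoutMillis) {
        boolean suspended = false;
        if (timeoutMillis != null) {
            CountDownLatch latch = new CountDownLatch(1);
            suspend.accept(latch::countDown);
            try {
                // Wait until the suspend succeeds or the timeout occurs.
                suspended = latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        stop.run(); // shut down in every case
        return suspended;
    }
}
```
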
> In domain mode these operations will be added to both individual servers
> and complete server groups.
>
> Implementation Details
>
> Suspend/resume operates on entry points to the server. Any request that
> is currently running must not be affected by the suspend state, but
> any new request should be rejected. In general, subsystems will track the
> number of outstanding requests, and when this hits zero they are
> considered suspended.
>
> We will introduce the notion of a global SuspendController that manages
> the server's suspend state. All subsystems that wish to do a graceful
> shutdown register callback handlers with this controller.
>
> When the suspend() operation is invoked the controller will invoke all
> these callbacks, letting the subsystem know that the server is suspending,
> and providing the subsystem with a SuspendContext object that the
> subsystem can then use to notify the controller that the suspend is
> complete.
>
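
For discussion, the controller/callback relationship described above could look roughly like the following. Only the names SuspendController and SuspendContext come from the proposal; the method names, the SuspendCallback interface and the SuspendState enum are assumptions, and timeout handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical API shapes; the proposal does not define these methods.
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

interface SuspendContext {
    void complete(); // called by a subsystem once it considers itself suspended
}

interface SuspendCallback {
    void suspending(SuspendContext context);
    void resumed();
}

final class SuspendControllerSketch {
    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private SuspendState state = SuspendState.RUNNING;
    private int pending;

    synchronized void register(SuspendCallback callback) { callbacks.add(callback); }

    synchronized SuspendState getState() { return state; }

    synchronized void suspend() {
        if (state != SuspendState.RUNNING) return;
        state = SuspendState.SUSPENDING;
        pending = callbacks.size();
        if (pending == 0) { state = SuspendState.SUSPENDED; return; }
        for (SuspendCallback callback : callbacks) {
            AtomicBoolean done = new AtomicBoolean(); // ignore repeated complete() calls
            callback.suspending(() -> {
                if (done.compareAndSet(false, true)) subsystemSuspended();
            });
        }
    }

    // Simplification: contexts handed out by an earlier suspend are not invalidated.
    private synchronized void subsystemSuspended() {
        if (state == SuspendState.SUSPENDING && --pending == 0) {
            state = SuspendState.SUSPENDED;
        }
    }

    synchronized void resume() {
        state = SuspendState.RUNNING;
        for (SuspendCallback callback : callbacks) callback.resumed();
    }
}
```
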
> What the subsystem does when it receives a suspend command, and when it
> considers itself suspended, will vary, but in the common case it will
> immediately start rejecting external requests (e.g. Undertow will start
> responding with a 503 to all new requests).

I think there will need to be some mechanism for coordination between 
subsystems here. For example, I doubt mod_cluster will want Undertow 
deciding to start sending 503s before it gets a chance to get the LB sorted.
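
One possible shape for that coordination, purely as a sketch and not something from the proposal: participants register in numbered phases, and a phase only starts once every participant in the previous phase has reported completion. All names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: a participant is told to suspend and calls done.run() when finished.
interface PhasedParticipant {
    void suspend(Runnable done);
}

final class PhasedSuspendSketch {
    // TreeMap keeps the phases sorted, so lower phase numbers run first.
    private final TreeMap<Integer, List<PhasedParticipant>> phases = new TreeMap<>();

    void register(int phase, PhasedParticipant participant) {
        phases.computeIfAbsent(phase, k -> new ArrayList<>()).add(participant);
    }

    void suspend(Runnable allDone) {
        runFrom(new ArrayList<>(phases.values()).iterator(), allDone);
    }

    private void runFrom(Iterator<List<PhasedParticipant>> remaining, Runnable allDone) {
        if (!remaining.hasNext()) { allDone.run(); return; }
        List<PhasedParticipant> current = remaining.next();
        AtomicInteger pending = new AtomicInteger(current.size());
        for (PhasedParticipant participant : current) {
            participant.suspend(() -> {
                // The last participant to finish a phase triggers the next one.
                if (pending.decrementAndGet() == 0) runFrom(remaining, allDone);
            });
        }
    }
}
```

With mod_cluster registered in phase 0 and Undertow in phase 1, Undertow would not be told to suspend until mod_cluster had reported that the load balancer was dealt with.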

> The subsystem will also
> track the number of outstanding requests, and when this hits zero
> the subsystem will notify the controller that it has successfully
> suspended.
> Some subsystems will obviously want to do other actions on suspend, e.g.
> clustering will likely want to fail over, mod_cluster will notify the
> load balancer that the node is no longer available, etc. In some cases we
> may want to make this configurable to an extent (e.g. Undertow could be
> configured to allow requests with an existing session, and not consider
> itself suspended until all sessions have either timed out or been
> invalidated, although this will obviously take a while).
>
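
The request tracking described above (reject new requests, count outstanding ones, notify on zero) might be sketched like this for a single entry point. The class and method names are hypothetical, and the drained callback stands in for whatever the SuspendContext ends up offering.

```java
// Hypothetical sketch of one entry point's bookkeeping; not an Undertow API.
final class RequestGateSketch {
    private int active;
    private boolean suspended;
    private Runnable onDrained; // pending notification, cleared once fired

    /** Returns false if the request must be rejected (e.g. with a 503). */
    synchronized boolean tryBegin() {
        if (suspended) return false;
        active++;
        return true;
    }

    /** Must be called exactly once for every request that tryBegin() accepted. */
    synchronized void end() {
        active--;
        if (active == 0 && suspended) notifyDrained();
    }

    synchronized void suspend(Runnable drainedCallback) {
        suspended = true;
        onDrained = drainedCallback;
        if (active == 0) notifyDrained(); // nothing in flight, suspended at once
    }

    synchronized void resume() {
        suspended = false;
        onDrained = null;
    }

    private void notifyDrained() {
        Runnable callback = onDrained;
        onDrained = null;
        if (callback != null) callback.run();
    }
}
```
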
> If anyone has any feedback, let me know. In terms of implementation, my
> basic plan is to get the core functionality and the Undertow
> implementation into WildFly, and then work with subsystem authors to
> implement subsystem-specific functionality once the core is in place.
>
> Stuart


-- 
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat

