Re: [wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Wednesday, 11 June 2014

On 6/9/14, 5:38 PM, Stuart Douglas wrote:
...
 Server suspend and resume is a feature that allows a running server to
 gracefully finish off all running requests. The most common use case for
 this is graceful shutdown, where you would like a server to complete all
 running requests, reject any new ones, and then shut down, however there
 are also plenty of other valid use cases (e.g. suspend the server,
 modify a data source or some other config, then resume).

 User View:

 From the user's point of view, two new operations will be added to the server:

 suspend(timeout)
 resume()

 A runtime only attribute suspend-state (is this a good name?) will also
 be added, that can take one of three possible values, RUNNING,
 SUSPENDING, SUSPENDED.

 A timeout attribute will also be added to the shutdown operation. If
 this is present then the server will first be suspended, and the server
 will not shut down until either the suspend is successful or the timeout
 occurs. If no timeout parameter is passed to the operation then a normal
 non-graceful shutdown will take place.
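The shutdown-with-timeout behaviour described above could be sketched roughly as follows. This is only an illustration: `Suspender`, `ShutdownOperation` and the millisecond unit are assumptions made for the example, not names or units taken from the proposal.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical hook: starts a suspend and runs the callback once suspended. */
interface Suspender {
    void suspend(Runnable onSuspended);
}

final class ShutdownOperation {
    private final Suspender suspender;
    private final Runnable stopServer;

    ShutdownOperation(Suspender suspender, Runnable stopServer) {
        this.suspender = suspender;
        this.stopServer = stopServer;
    }

    /**
     * @param timeoutMillis null when no timeout was passed (the unit is an assumption)
     * @return true if the server was fully suspended before it was stopped
     */
    boolean shutdown(Long timeoutMillis) {
        if (timeoutMillis == null) {
            // No timeout parameter: normal, non-graceful shutdown.
            stopServer.run();
            return false;
        }
        CountDownLatch suspended = new CountDownLatch(1);
        suspender.suspend(suspended::countDown);
        boolean clean = false;
        try {
            // Wait until the suspend succeeds or the timeout occurs.
            clean = suspended.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt and fall through to the shutdown.
            Thread.currentThread().interrupt();
        }
        stopServer.run();
        return clean;
    }
}
```

In this sketch the server is stopped in every case; the return value only reports whether the suspend finished before the timeout.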

 In domain mode these operations will be added to both individual servers
 and complete server groups.

 Implementation Details

 Suspend/resume operates on entry points to the server. Any request that
 is currently running must not be affected by the suspend state, however
 any new request should be rejected. In general subsystems will track the
 number of outstanding requests, and when this hits zero they are
 considered suspended.
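One way an entry point might track outstanding requests and reject new ones is sketched below. `RequestGate` and its method names are hypothetical, chosen only for this example. The counter is incremented before the suspended flag is checked so that a request racing with suspend() is either counted or rejected, never lost.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

final class RequestGate {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicReference<Runnable> onDrained = new AtomicReference<>();
    private volatile boolean suspended;

    /** Returns false if the request must be rejected (e.g. with a 503). */
    boolean tryBegin() {
        active.incrementAndGet();   // count first, then check the flag
        if (suspended) {
            end();                  // undo the count for the rejected request
            return false;
        }
        return true;
    }

    /** Called when an accepted request finishes. */
    void end() {
        if (active.decrementAndGet() == 0 && suspended) {
            fireDrained();
        }
    }

    /** Starts rejecting new requests; listener runs once the count hits zero. */
    void suspend(Runnable listener) {
        onDrained.set(listener);
        suspended = true;
        if (active.get() == 0) {
            fireDrained();
        }
    }

    void resume() {
        suspended = false;
        onDrained.set(null);
    }

    private void fireDrained() {
        // getAndSet ensures the listener runs at most once per suspend.
        Runnable listener = onDrained.getAndSet(null);
        if (listener != null) {
            listener.run();
        }
    }
}
```

Requests that were already running when suspend() was called are unaffected; they simply call end() when they complete, and the last one triggers the notification.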

 We will introduce the notion of a global SuspendController, that manages
 the server's suspend state. All subsystems that wish to do a graceful
 shutdown register callback handlers with this controller.

 When the suspend() operation is invoked the controller will invoke all
 these callbacks, letting the subsystem know that the server is suspending,
 and providing the subsystem with a SuspendContext object that the
 subsystem can then use to notify the controller that the suspend is
 complete.
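A minimal sketch of the controller and callback arrangement described above, to make the moving parts concrete. Only the names SuspendController, SuspendContext and the three state values come from the proposal; `SuspendCallback`, the method names and the `epoch` counter are assumptions for illustration, and the timeout is left out.

```java
import java.util.ArrayList;
import java.util.List;

enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

/** Handed to a subsystem so it can report that its suspend is complete. */
interface SuspendContext {
    void suspendComplete();
}

/** Callback registered by a subsystem (hypothetical name and shape). */
interface SuspendCallback {
    void suspendStarted(SuspendContext context);
    void resumed();
}

final class SuspendController {
    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private SuspendState state = SuspendState.RUNNING;
    private int pending;  // callbacks that have not yet reported completion
    private int epoch;    // bumped on suspend/resume so stale contexts are ignored

    synchronized void register(SuspendCallback callback) {
        callbacks.add(callback);
    }

    synchronized SuspendState getState() {
        return state;
    }

    synchronized void suspend() {
        if (state != SuspendState.RUNNING) {
            return;
        }
        final int current = ++epoch;
        pending = callbacks.size();
        state = pending == 0 ? SuspendState.SUSPENDED : SuspendState.SUSPENDING;
        for (SuspendCallback callback : new ArrayList<>(callbacks)) {
            callback.suspendStarted(new SuspendContext() {
                private boolean reported;

                @Override
                public void suspendComplete() {
                    synchronized (SuspendController.this) {
                        // Ignore duplicate reports and reports from an older suspend.
                        if (reported || current != epoch) {
                            return;
                        }
                        reported = true;
                        if (--pending == 0) {
                            state = SuspendState.SUSPENDED;
                        }
                    }
                }
            });
        }
    }

    synchronized void resume() {
        if (state == SuspendState.RUNNING) {
            return;
        }
        state = SuspendState.RUNNING;
        epoch++;
        for (SuspendCallback callback : new ArrayList<>(callbacks)) {
            callback.resumed();
        }
    }
}
```

For brevity the sketch invokes callbacks while holding the controller lock; a real implementation would probably want to avoid that.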

 What the subsystem does when it receives a suspend command, and when it
 considers itself suspended will vary, but in the common case it will
 immediately start rejecting external requests (e.g. Undertow will start
 responding with a 503 to all new requests). 
I think there will need to be some mechanism for coordination between 
subsystems here. For example, I doubt mod_cluster will want Undertow 
deciding to start sending 503s before it gets a chance to get the LB sorted.
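To make the concern concrete, one possible coordination mechanism is to group callbacks into ordered phases and only notify a later phase once every participant in the earlier phase has reported completion. The sketch below is purely illustrative: `PhasedSuspend`, the integer phase numbers and the `Consumer<Runnable>` shape are all assumptions, not anything in the proposal.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class PhasedSuspend {
    // phase number -> participants; lower phases are suspended first
    private final TreeMap<Integer, List<Consumer<Runnable>>> phases = new TreeMap<>();

    /** The participant receives a Runnable it must call exactly once when done. */
    void register(int phase, Consumer<Runnable> participant) {
        phases.computeIfAbsent(phase, k -> new ArrayList<>()).add(participant);
    }

    void suspend(Runnable onAllSuspended) {
        runPhase(new ArrayList<>(phases.values()).iterator(), onAllSuspended);
    }

    private void runPhase(Iterator<List<Consumer<Runnable>>> remaining, Runnable done) {
        if (!remaining.hasNext()) {
            done.run();
            return;
        }
        List<Consumer<Runnable>> group = new ArrayList<>(remaining.next());
        AtomicInteger pending = new AtomicInteger(group.size());
        for (Consumer<Runnable> participant : group) {
            participant.accept(() -> {
                // The last participant of this phase to finish starts the next phase.
                if (pending.decrementAndGet() == 0) {
                    runPhase(remaining, done);
                }
            });
        }
    }
}
```

With something like this, mod_cluster could register in an earlier phase than Undertow, so Undertow would not be told to start rejecting requests until the load balancer side had reported that it was done.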

...
 The subsystem will also
 track the number of outstanding requests, and when this hits zero then
 the subsystem will notify the controller that it has successfully
 suspended.
 Some subsystems will obviously want to do other actions on suspend, e.g.
 clustering will likely want to fail over, mod_cluster will notify the
 load balancer that the node is no longer available etc. In some cases we
 may want to make this configurable to an extent (e.g. Undertow could be
 configured to allow requests with an existing session, and not consider
 itself suspended until all sessions have either timed out or been
 invalidated, although this will obviously take a while).

 If anyone has any feedback let me know. In terms of implementation my
 basic plan is to get the core functionality and the Undertow
 implementation into WildFly, and then work with subsystem authors to
 implement subsystem specific functionality once the core is in place.

 Stuart

 _______________________________________________
 wildfly-dev mailing list
 wildfly-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/wildfly-dev

-- 
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat
