[wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Scott Marlow smarlow at redhat.com
Tue Jun 10 08:29:36 EDT 2014


On 06/09/2014 10:40 PM, Stuart Douglas wrote:
>
>
> Scott Marlow wrote:
>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>> Server suspend and resume is a feature that allows a running server to
>>> gracefully finish all running requests. The most common use case for
>>> this is graceful shutdown, where you would like a server to complete all
>>> running requests, reject any new ones, and then shut down. However, there
>>> are also plenty of other valid use cases (e.g. suspend the server,
>>> modify a data source or some other config, then resume).
>>>
>>> User View:
>>>
>>> From the user's point of view, two new operations will be added to the
>>> server:
>>>
>>> suspend(timeout)
>>> resume()
>>>
>>> A runtime-only attribute, suspend-state (is this a good name?), will also
>>> be added. It can take one of three possible values: RUNNING,
>>> SUSPENDING, SUSPENDED.
>>
>> The SuspendController "state" might be a shorter attribute name and just
>> as meaningful.
>
> This will be in the global server namespace (i.e. read from the CLI with
> :read-attribute(name="suspend-state")).
>
> I think the name 'state' is just too generic; which kind of state are we
> talking about?

I was thinking suspend-state was an attribute of SuspendController. 
Thanks for the explanation.

>
>>
>> When are we in the RUNNING state?  Is that simply the pre-state for
>> SUSPENDING?
>
> 99.99% of the time. Basically, servers are always running unless they
> have been explicitly suspended, in which case they go from SUSPENDING to
> SUSPENDED. Note that if resume is called at any time, the server goes to
> RUNNING again immediately, as when subsystems are notified they should
> be able to begin accepting requests again straight away.
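To make the state transitions described above concrete, here is a minimal Java sketch. SuspendState and SuspendStateMachine are hypothetical names, not WildFly classes; the only rules encoded are the ones stated in this thread: suspend() moves RUNNING to SUSPENDING, the server becomes SUSPENDED once every subsystem has reported in, and resume() returns to RUNNING immediately from any state.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the proposed suspend-state values (not WildFly code).
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

final class SuspendStateMachine {
    private final AtomicReference<SuspendState> state =
            new AtomicReference<>(SuspendState.RUNNING);

    SuspendState get() {
        return state.get();
    }

    // suspend(): RUNNING -> SUSPENDING; returns false if a suspend is already under way or done.
    boolean beginSuspend() {
        return state.compareAndSet(SuspendState.RUNNING, SuspendState.SUSPENDING);
    }

    // Called once every subsystem has reported it is suspended: SUSPENDING -> SUSPENDED.
    // If resume() ran first, the state is RUNNING and this is a no-op.
    boolean completeSuspend() {
        return state.compareAndSet(SuspendState.SUSPENDING, SuspendState.SUSPENDED);
    }

    // resume(): back to RUNNING immediately, whatever the current state.
    void resume() {
        state.set(SuspendState.RUNNING);
    }
}
```

Using compareAndSet for completion means a resume() that races ahead of the last subsystem's report always wins, which matches "the server goes to RUNNING again immediately".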
>
> We also have admin-only mode, which is a somewhat similar concept, so we
> need to make sure we document the differences.
>
>>
>>> A timeout attribute will also be added to the shutdown operation. If
>>> this is present then the server will first be suspended, and the server
>>> will not shut down until either the suspend is successful or the timeout
>>> occurs. If no timeout parameter is passed to the operation then a normal
>>> non-graceful shutdown will take place.
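A minimal sketch of the proposed shutdown(timeout) semantics, assuming a latch that the suspend controller counts down once every subsystem has drained. GracefulShutdown, onSuspendComplete and stopServices are hypothetical names; stopServices merely stands in for the existing shutdown path.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of shutdown(timeout); names are illustrative only.
final class GracefulShutdown {
    private final CountDownLatch suspended = new CountDownLatch(1);
    private volatile boolean stopped;

    // Assumed to be called by the suspend controller once all subsystems have drained.
    void onSuspendComplete() {
        suspended.countDown();
    }

    /**
     * With a timeout: suspend, wait until suspended or the timeout elapses, then stop.
     * Without one (null): stop straight away, as a non-graceful shutdown does today.
     * Returns true only if the suspend finished before the services were stopped.
     */
    boolean shutdown(Long timeoutMillis, Runnable startSuspend) {
        boolean clean = false;
        if (timeoutMillis != null) {
            startSuspend.run(); // subsystems begin rejecting new requests
            try {
                clean = suspended.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag, still shut down
            }
        }
        stopServices();
        return clean;
    }

    // Stand-in for the existing shutdown path (stop all services, then exit).
    private void stopServices() {
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }
}
```

Either way the services are stopped; the timeout only bounds how long the server waits for in-flight requests before doing so.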
>>
>> Will non-graceful shutdown wait for non-daemon threads or terminate
>> immediately (call System.exit())?
>
> It will execute the same way it does today (all services will shut down
> and then the server will exit).
>
> Stuart
>
>>
>>> In domain mode these operations will be available on both individual
>>> servers and complete server groups.
>>>
>>> Implementation Details
>>>
>>> Suspend/resume operates on entry points to the server. Any request that
>>> is currently running must not be affected by the suspend state, however
>>> any new request should be rejected. In general, subsystems will track the
>>> number of outstanding requests, and when this hits zero they are
>>> considered suspended.
>>>
>>> We will introduce the notion of a global SuspendController, which manages
>>> the server's suspend state. All subsystems that wish to do a graceful
>>> shutdown register callback handlers with this controller.
>>>
>>> When the suspend() operation is invoked, the controller will invoke all
>>> these callbacks, letting each subsystem know that the server is suspending
>>> and providing the subsystem with a SuspendContext object that the
>>> subsystem can then use to notify the controller that the suspend is
>>> complete.
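The controller/callback/context interaction could look roughly like this. SuspendController and SuspendContext are names from the proposal, but the SuspendCallback interface and every method signature here are assumptions for illustration. Each subsystem gets its own context so that a duplicate completion is counted once, and resume() discards the current round so a late completion cannot flip the state back to suspended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// SuspendController and SuspendContext are names from the proposal; the
// SuspendCallback interface and all signatures below are hypothetical.
interface SuspendContext {
    void suspendComplete(); // the subsystem has finished draining
}

interface SuspendCallback {
    void suspend(SuspendContext context);
    void resume();
}

final class SuspendController {
    private final List<SuspendCallback> callbacks = new CopyOnWriteArrayList<>();
    private Object round;      // identifies the current suspend attempt
    private int remaining;     // subsystems that have not reported yet
    private boolean suspended;

    void register(SuspendCallback callback) {
        callbacks.add(callback);
    }

    void suspend() {
        List<SuspendCallback> snapshot = new ArrayList<>(callbacks);
        Object thisRound = new Object();
        synchronized (this) {
            round = thisRound;
            remaining = snapshot.size();
            suspended = remaining == 0;
        }
        for (SuspendCallback callback : snapshot) {
            AtomicBoolean reported = new AtomicBoolean();
            // One context per subsystem, so a duplicate suspendComplete() counts once.
            callback.suspend(() -> {
                if (reported.compareAndSet(false, true)) {
                    completed(thisRound);
                }
            });
        }
    }

    private synchronized void completed(Object fromRound) {
        // Ignore completions that belong to a round cancelled by resume().
        if (fromRound == round && --remaining == 0) {
            suspended = true;
        }
    }

    void resume() {
        synchronized (this) {
            round = null;
            suspended = false;
        }
        for (SuspendCallback callback : callbacks) {
            callback.resume();
        }
    }

    synchronized boolean isSuspended() {
        return suspended;
    }
}
```

A subsystem with nothing in flight can call suspendComplete() synchronously from inside its callback; one that is still draining keeps the context and calls it later.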
>>>
>>> What the subsystem does when it receives a suspend command, and when it
>>> considers itself suspended, will vary, but in the common case it will
>>> immediately start rejecting external requests (e.g. Undertow will start
>>> responding with a 503 to all new requests). The subsystem will also
>>> track the number of outstanding requests, and when this hits zero
>>> the subsystem will notify the controller that it has successfully
>>> suspended.
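The common-case subsystem behaviour (reject new requests with a 503, count outstanding ones, report once the count reaches zero) can be sketched as follows. RequestGate is a hypothetical class, not Undertow's API, and a request is modelled as an IntSupplier that returns a status code. Incrementing the counter before checking the suspend flag means a request admitted just before suspend() is always counted, so the drain notification cannot fire while it is still running.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical entry-point gate for a subsystem such as the web container.
// This is not Undertow's API; it only shows the counting/rejection logic.
final class RequestGate {
    static final int SERVICE_UNAVAILABLE = 503;

    private final AtomicInteger active = new AtomicInteger();
    private final AtomicBoolean notified = new AtomicBoolean();
    private volatile boolean suspending;
    private volatile Runnable onDrained; // e.g. the subsystem's SuspendContext

    /** Runs the request, or answers 503 once a suspend has started. */
    int handle(IntSupplier request) {
        active.incrementAndGet(); // count first, then check the flag
        try {
            if (suspending) {
                return SERVICE_UNAVAILABLE;
            }
            return request.getAsInt();
        } finally {
            if (active.decrementAndGet() == 0 && suspending) {
                notifyDrained();
            }
        }
    }

    void suspend(Runnable onDrained) {
        this.onDrained = onDrained;
        notified.set(false);
        suspending = true;
        if (active.get() == 0) {
            notifyDrained(); // nothing in flight, so we are already suspended
        }
    }

    void resume() {
        suspending = false;
    }

    private void notifyDrained() {
        if (notified.compareAndSet(false, true)) {
            onDrained.run(); // report exactly once per suspend
        }
    }
}
```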
>>>
>>> Some subsystems will obviously want to perform other actions on suspend,
>>> e.g. clustering will likely want to fail over, and mod_cluster will notify
>>> the load balancer that the node is no longer available. In some cases we
>>> may want to make this configurable to an extent (e.g. Undertow could be
>>> configured to allow requests with an existing session, and not consider
>>> itself suspended until all sessions have either timed out or been
>>> invalidated, although this will obviously take a while).
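One possible shape for the optional session-aware Undertow behaviour is sketched below. SessionAwareSuspendPolicy is a hypothetical class, not an Undertow setting or API; it only tracks session IDs and, for brevity, ignores the in-flight request counting a real subsystem would also need.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical session-aware suspend policy (not an Undertow configuration or API).
final class SessionAwareSuspendPolicy {
    private final Set<String> activeSessions = ConcurrentHashMap.newKeySet();
    private volatile boolean suspending;

    void sessionCreated(String id) {
        activeSessions.add(id);
    }

    // Called when a session times out or is invalidated.
    void sessionEnded(String id) {
        activeSessions.remove(id);
    }

    void suspend() {
        suspending = true;
    }

    /** While suspending, only requests that belong to an existing session are admitted. */
    boolean admit(String sessionIdOrNull) {
        if (!suspending) {
            return true;
        }
        return sessionIdOrNull != null && activeSessions.contains(sessionIdOrNull);
    }

    /** Under this policy the subsystem only counts as suspended once no sessions remain. */
    boolean isSuspended() {
        return suspending && activeSessions.isEmpty();
    }
}
```

As the proposal notes, waiting for sessions to expire can take a long time, so in practice this would likely be paired with the suspend timeout.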
>>>
>>> If anyone has any feedback, let me know. In terms of implementation, my
>>> basic plan is to get the core functionality and the Undertow
>>> implementation into WildFly, and then work with subsystem authors to
>>> implement subsystem-specific functionality once the core is in place.
>>>
>>> Stuart
>>>
>>> _______________________________________________
>>> wildfly-dev mailing list
>>> wildfly-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
>>>
>>


