Re: [wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Tuesday, 10 June 2014

They are actually orthogonal: a server can be in RESTART_REQUIRED and
in any one of the suspend states at the same time.

RESTART_REQUIRED is very much tied to services and the management model, 
while suspend/resume is a runtime only thing that should not touch the 
state of services.
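
To make the distinction concrete, here is a minimal Java sketch that keeps the two pieces of state in separate fields, so changing one never touches the other. The names (ProcessState, SuspendState, ServerStatus) are assumptions made for illustration, not WildFly classes.

```java
// Hypothetical types for illustration; not the actual WildFly API.
enum ProcessState { STARTING, RUNNING, RESTART_REQUIRED } // tied to services and the management model
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }      // runtime only

final class ServerStatus {
    private ProcessState processState = ProcessState.RUNNING;
    private SuspendState suspendState = SuspendState.RUNNING;

    // Management-model side: only ever changes processState.
    void markRestartRequired() { processState = ProcessState.RESTART_REQUIRED; }

    // Runtime side: only ever changes suspendState.
    void suspendStarted()   { suspendState = SuspendState.SUSPENDING; }
    void suspendCompleted() { suspendState = SuspendState.SUSPENDED; }
    void resume()           { suspendState = SuspendState.RUNNING; }

    ProcessState processState() { return processState; }
    SuspendState suspendState() { return suspendState; }
}
```

With this shape a server can report RESTART_REQUIRED and SUSPENDED together, and resuming leaves RESTART_REQUIRED in place.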

Stuart

Dimitris Andreadis wrote:
...
 Why not extend the states of the existing 'server-state'
attribute to:

 (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)

 http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html

 On 10/06/2014 04:40, Stuart Douglas wrote:
>
> Scott Marlow wrote:
>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>> Server suspend and resume is a feature that allows a running server to
>>> gracefully finish off all running requests. The most common use case for
>>> this is graceful shutdown, where you would like a server to complete all
>>> running requests, reject any new ones, and then shut down, however there
>>> are also plenty of other valid use cases (e.g. suspend the server,
>>> modify a data source or some other config, then resume).
>>>
>>> User View:
>>>
>>> From the user's point of view, two new operations will be added to the server:
>>>
>>> suspend(timeout)
>>> resume()
>>>
>>> A runtime-only attribute suspend-state (is this a good name?) will also
>>> be added, which can take one of three possible values: RUNNING,
>>> SUSPENDING or SUSPENDED.
>> The SuspendController "state" might be a shorter attribute name and just
>> as meaningful.
> This will be in the global server namespace (i.e. from the CLI
> :read-attribute(name="suspend-state")).
>
> I think the name 'state' is just too generic: which kind of state are we
> talking about?
>
>> When are we in the RUNNING state?  Is that simply the pre-state for
>> SUSPENDING?
> 99.99% of the time. Basically servers are always running unless they
> have been explicitly suspended, and then they go from suspending to
> suspended. Note that if resume is called at any time the server goes to
> RUNNING again immediately, as when subsystems are notified they should
> be able to begin accepting requests again straight away.
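
The transitions described above can be sketched as a small state machine. This is only an illustration under assumed names (SuspendStateMachine and its methods are hypothetical): suspend moves RUNNING to SUSPENDING, completion moves SUSPENDING to SUSPENDED, and resume returns to RUNNING from any state.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical class for illustration; not the actual WildFly API.
final class SuspendStateMachine {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);

    /** RUNNING -> SUSPENDING. Returns false if the server was not running. */
    boolean suspend() {
        return state.compareAndSet(State.RUNNING, State.SUSPENDING);
    }

    /** SUSPENDING -> SUSPENDED. Returns false if a resume got in first. */
    boolean suspendComplete() {
        return state.compareAndSet(State.SUSPENDING, State.SUSPENDED);
    }

    /** Any state -> RUNNING, immediately. */
    void resume() {
        state.set(State.RUNNING);
    }

    State current() {
        return state.get();
    }
}
```

Using compare-and-set for the completion step means a late "suspend complete" that arrives after resume() cannot push the server back into SUSPENDED.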
>
> We also have admin only mode, which is a kinda similar concept, so we
> need to make sure we document the differences.
>
>>> A timeout parameter will also be added to the shutdown operation. If
>>> this is present then the server will first be suspended, and the server
>>> will not shut down until either the suspend is successful or the timeout
>>> occurs. If no timeout parameter is passed to the operation then a normal
>>> non-graceful shutdown will take place.
>> Will non-graceful shutdown wait for non-daemon threads or terminate
>> immediately (call System.exit())?
> It will execute the same way it does today (all services will shut down
> and then the server will exit).
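
A rough Java sketch of the two shutdown paths, assuming a hypothetical Server interface whose suspend() returns a latch that reaches zero once the server is suspended (neither the interface nor the method names come from WildFly):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class GracefulShutdown {
    /** Hypothetical view of the server; not a WildFly interface. */
    interface Server {
        /** Starts suspending; the returned latch reaches zero once suspended. */
        CountDownLatch suspend();

        /** Stops all services, as a normal shutdown does today. */
        void stopServices();
    }

    /** No timeout given: normal, non-graceful shutdown. */
    static void shutdown(Server server) {
        server.stopServices();
    }

    /**
     * Timeout given: suspend first, then shut down once the suspend has
     * completed or the timeout has elapsed, whichever comes first.
     * Returns true if the suspend completed before the timeout.
     */
    static boolean shutdown(Server server, long timeout, TimeUnit unit) {
        CountDownLatch suspended = server.suspend();
        boolean drained = false;
        try {
            drained = suspended.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the flag, shut down anyway
        }
        server.stopServices();
        return drained;
    }
}
```

Either way the services are stopped exactly once; the timeout only bounds how long the shutdown waits for outstanding requests.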
>
> Stuart
>
>>> In domain mode these operations will be added to both individual servers
>>> and complete server groups.
>>>
>>> Implementation Details
>>>
>>> Suspend/resume operates on entry points to the server. Any request that
>>> is currently running must not be affected by the suspend state, however
>>> any new request should be rejected. In general subsystems will track the
>>> number of outstanding requests, and when this hits zero they are
>>> considered suspended.
>>>
>>> We will introduce the notion of a global SuspendController, that manages
>>> the server's suspend state. All subsystems that wish to do a graceful
>>> shutdown register callback handlers with this controller.
>>>
>>> When the suspend() operation is invoked the controller will invoke all
>>> these callbacks, letting the subsystem know that the server is suspending,
>>> and providing the subsystem with a SuspendContext object that the
>>> subsystem can then use to notify the controller that the suspend is
>>> complete.
>>>
>>> What the subsystem does when it receives a suspend command, and when it
>>> considers itself suspended will vary, but in the common case it will
>>> immediately start rejecting external requests (e.g. Undertow will start
>>> responding with a 503 to all new requests). The subsystem will also
>>> track the number of outstanding requests, and when this hits zero then
>>> the subsystem will notify the controller that it has successfully
>>> suspended.
>>> Some subsystems will obviously want to do other actions on suspend, e.g.
>>> clustering will likely want to fail over, mod_cluster will notify the
>>> load balancer that the node is no longer available etc. In some cases we
>>> may want to make this configurable to an extent (e.g. Undertow could be
>>> configured to allow requests with an existing session, and not consider
>>> itself suspended until all sessions have either timed out or been
>>> invalidated, although this will obviously take a while).
>>>
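
The controller/callback design described above can be sketched in Java roughly as follows. SuspendController and SuspendContext are named in the proposal, but every signature here, plus SuspendCallback and RequestEntryPoint, is an assumption made for illustration and not the eventual WildFly API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical signatures for illustration; not the actual WildFly API.
interface SuspendContext {
    void suspendComplete();
}

interface SuspendCallback {
    void suspend(SuspendContext context);
    void resume();
}

final class SuspendController {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private State state = State.RUNNING;
    private int pending; // callbacks that have not yet reported completion
    private int epoch;   // bumped on every suspend/resume so stale reports are ignored

    synchronized void register(SuspendCallback callback) { callbacks.add(callback); }

    synchronized State getState() { return state; }

    synchronized void suspend() {
        if (state != State.RUNNING) return;
        state = State.SUSPENDING;
        pending = callbacks.size();
        final int myEpoch = ++epoch;
        for (SuspendCallback callback : new ArrayList<>(callbacks)) {
            AtomicBoolean reported = new AtomicBoolean(); // count each callback once
            callback.suspend(() -> {
                if (reported.compareAndSet(false, true)) completed(myEpoch);
            });
        }
        if (pending == 0 && state == State.SUSPENDING) {
            state = State.SUSPENDED; // no callbacks registered
        }
    }

    synchronized void resume() {
        if (state == State.RUNNING) return;
        state = State.RUNNING;
        epoch++;
        for (SuspendCallback callback : new ArrayList<>(callbacks)) callback.resume();
    }

    private synchronized void completed(int reportedEpoch) {
        if (reportedEpoch == epoch && state == State.SUSPENDING && --pending == 0) {
            state = State.SUSPENDED;
        }
    }
}

/** An entry point that counts outstanding requests, as a subsystem might. */
final class RequestEntryPoint implements SuspendCallback {
    private boolean suspended;
    private int active;
    private SuspendContext context;

    /** Returns false when the request must be rejected (for HTTP, a 503). */
    synchronized boolean beginRequest() {
        if (suspended) return false;
        active++;
        return true;
    }

    void endRequest() {
        SuspendContext toNotify;
        synchronized (this) {
            active--;
            toNotify = takeContextIfDrained();
        }
        // Notify outside our own lock to avoid lock-order inversion with the controller.
        if (toNotify != null) toNotify.suspendComplete();
    }

    @Override
    public void suspend(SuspendContext ctx) {
        SuspendContext toNotify;
        synchronized (this) {
            suspended = true;
            context = ctx;
            toNotify = takeContextIfDrained();
        }
        if (toNotify != null) toNotify.suspendComplete();
    }

    @Override
    public synchronized void resume() {
        suspended = false;
        context = null;
    }

    // Caller must hold this object's lock.
    private SuspendContext takeContextIfDrained() {
        if (!suspended || active != 0 || context == null) return null;
        SuspendContext drained = context;
        context = null;
        return drained;
    }
}
```

In this sketch a request that is already running is never interrupted: the entry point only refuses new work and reports completion when its counter reaches zero, and the controller becomes SUSPENDED once every registered callback has reported.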
>>> If anyone has any feedback let me know. In terms of implementation my
>>> basic plan is to get the core functionality and the Undertow
>>> implementation into WildFly, and then work with subsystem authors to
>>> implement subsystem specific functionality once the core is in place.
>>>
>>> Stuart
>>>
>>> _______________________________________________
>>> wildfly-dev mailing list
>>> wildfly-dev(a)lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
>>>
