[wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas stuart.w.douglas at gmail.com
Tue Jun 10 11:40:18 EDT 2014


I don't think so. RESTART_REQUIRED means the server is running, but it needs
a restart to apply management changes (that attribute can also be
RELOAD_REQUIRED; I think the description may be a bit out of date).

To accurately reflect all the possible states you would need something like:

RUNNING,
PAUSING,
PAUSED,
RESTART_REQUIRED,
PAUSING_RESTART_REQUIRED,
PAUSED_RESTART_REQUIRED,
RELOAD_REQUIRED,
PAUSING_RELOAD_REQUIRED,
PAUSED_RELOAD_REQUIRED

Which does not seem great, and may introduce compatibility problems for 
clients that are not expecting these new values.
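A minimal Java sketch of the alternative: keep the two concerns as separate, orthogonal fields rather than multiplying them into one enum. The names SuspendState, ConfigState (with its hypothetical OK value), ServerStatus and StatesDemo are illustrative assumptions, not WildFly's actual API; the sketch uses the proposal's SUSPENDING/SUSPENDED names instead of PAUSING/PAUSED, and the record needs Java 16+.

```java
import java.util.Objects;

/** Runtime-only suspend state; never touches services (names from the proposal). */
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

/** Management-model state; OK is a hypothetical "no pending changes" value. */
enum ConfigState { OK, RELOAD_REQUIRED, RESTART_REQUIRED }

/** The server's status is the pair of both states: no combined constants needed. */
record ServerStatus(SuspendState suspend, ConfigState config) {
    ServerStatus {
        Objects.requireNonNull(suspend, "suspend");
        Objects.requireNonNull(config, "config");
    }

    /** Human-readable label for display, e.g. "SUSPENDED + RESTART_REQUIRED". */
    String label() {
        return config == ConfigState.OK ? suspend.name() : suspend + " + " + config;
    }
}

class StatesDemo {
    public static void main(String[] args) {
        // 3 suspend states x 3 config states = the 9 combined values listed above
        System.out.println(SuspendState.values().length * ConfigState.values().length); // 9
        // prints "SUSPENDED + RESTART_REQUIRED"
        System.out.println(new ServerStatus(SuspendState.SUSPENDED, ConfigState.RESTART_REQUIRED).label());
    }
}
```

With this split, clients that only read the existing attribute see no new values, and the suspend state is exposed on its own.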

Stuart



Dimitris Andreadis wrote:
> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>
> On 10/06/2014 17:17, Stuart Douglas wrote:
>> They are actually orthogonal; a server can be in RESTART_REQUIRED and
>> in any one of the suspend states at the same time.
>>
>> RESTART_REQUIRED is very much tied to services and the management model,
>> while suspend/resume is a runtime-only thing that should not touch the
>> state of services.
>>
>>
>> Stuart
>>
>> Dimitris Andreadis wrote:
>>> Why not extend the states of the existing 'server-state' attribute to:
>>>
>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>
>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>>
>>> On 10/06/2014 04:40, Stuart Douglas wrote:
>>>>
>>>> Scott Marlow wrote:
>>>>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>>>>> Server suspend and resume is a feature that allows a running server
>>>>>> to gracefully finish off all running requests. The most common use
>>>>>> case for this is graceful shutdown, where you would like a server to
>>>>>> complete all running requests, reject any new ones, and then shut
>>>>>> down. However, there are also plenty of other valid use cases (e.g.
>>>>>> suspend the server, modify a data source or some other config, then
>>>>>> resume).
>>>>>>
>>>>>> User View:
>>>>>>
>>>>>> From the user's point of view, two new operations will be added to
>>>>>> the server:
>>>>>>
>>>>>> suspend(timeout)
>>>>>> resume()
>>>>>>
>>>>>> A runtime-only attribute suspend-state (is this a good name?) will
>>>>>> also be added, which can take one of three possible values: RUNNING,
>>>>>> SUSPENDING, SUSPENDED.
>>>>> The SuspendController "state" might be a shorter attribute name and
>>>>> just as meaningful.
>>>> This will be in the global server namespace (i.e. read from the CLI
>>>> with :read-attribute(name="suspend-state")).
>>>>
>>>> I think the name 'state' is just too generic; which kind of state are
>>>> we talking about?
>>>>
>>>>> When are we in the RUNNING state? Is that simply the pre-state for
>>>>> SUSPENDING?
>>>> 99.99% of the time. Basically, servers are always running unless they
>>>> have been explicitly suspended, in which case they go from SUSPENDING
>>>> to SUSPENDED. Note that if resume is called at any time, the server
>>>> goes back to RUNNING immediately, because subsystems should be able to
>>>> begin accepting requests again as soon as they are notified.
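The transitions described here can be sketched as a small compare-and-set state machine. This is an illustration only, assuming a single shared state object; SuspendStateMachine and its method names are hypothetical and not part of the proposal.

```java
import java.util.concurrent.atomic.AtomicReference;

enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

/** Hypothetical holder for the suspend-state attribute and its transitions. */
class SuspendStateMachine {
    private final AtomicReference<SuspendState> state =
            new AtomicReference<>(SuspendState.RUNNING);

    SuspendState state() {
        return state.get();
    }

    /** suspend(): only a RUNNING server starts suspending. */
    boolean beginSuspend() {
        return state.compareAndSet(SuspendState.RUNNING, SuspendState.SUSPENDING);
    }

    /** Called when every subsystem has reported zero outstanding requests. */
    boolean completeSuspend() {
        // Fails if resume() already moved the server back to RUNNING.
        return state.compareAndSet(SuspendState.SUSPENDING, SuspendState.SUSPENDED);
    }

    /** resume(): allowed in any state, and takes effect immediately. */
    void resume() {
        state.set(SuspendState.RUNNING);
    }
}
```

Calling resume() halfway through a suspend simply wins: the later completeSuspend() returns false. A real implementation would also need to discard completions left over from an earlier suspend/resume cycle, which this sketch does not attempt.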
>>>>
>>>> We also have admin only mode, which is a kinda similar concept, so we
>>>> need to make sure we document the differences.
>>>>
>>>>>> A timeout attribute will also be added to the shutdown operation. If
>>>>>> this is present, the server will first be suspended, and it will not
>>>>>> shut down until either the suspend is successful or the timeout
>>>>>> occurs. If no timeout parameter is passed to the operation, a normal
>>>>>> non-graceful shutdown will take place.
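A sketch of the proposed shutdown(timeout) semantics, assuming the suspend can be started through a function that returns a CompletableFuture which completes once the server reaches SUSPENDED. GracefulShutdown, the Supplier parameter and the stopServices callback are hypothetical stand-ins, not WildFly's management API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical sketch of shutdown(timeout); not WildFly's actual API. */
class GracefulShutdown {
    /**
     * @param startSuspend  starts suspending; the future completes when SUSPENDED is reached
     * @param timeoutMillis null means no timeout was given: plain non-graceful shutdown
     * @param stopServices  the existing shutdown path (stop all services, then exit)
     * @return true if the suspend completed before services were stopped
     */
    static boolean shutdown(Supplier<CompletableFuture<Void>> startSuspend,
                            Long timeoutMillis,
                            Runnable stopServices) {
        boolean drained = false;
        if (timeoutMillis != null) {
            CompletableFuture<Void> suspended = startSuspend.get();
            try {
                suspended.get(timeoutMillis, TimeUnit.MILLISECONDS);
                drained = true;                      // all requests finished in time
            } catch (TimeoutException | ExecutionException e) {
                // Timed out or the suspend failed: shut down anyway.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // keep interrupt status, then shut down
            }
        }
        stopServices.run();                          // same path as today's shutdown
        return drained;
    }
}
```

Either way the services are stopped; the timeout only bounds how long the server waits for in-flight requests first.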
>>>>> Will non-graceful shutdown wait for non-daemon threads or terminate
>>>>> immediately (call System.exit())?
>>>> It will execute the same way it does today (all services will shut down
>>>> and then the server will exit).
>>>>
>>>> Stuart
>>>>
>>>>>> In domain mode these operations will be added to both individual
>>>>>> servers and complete server groups.
>>>>>>
>>>>>> Implementation Details
>>>>>>
>>>>>> Suspend/resume operates on entry points to the server. Any request
>>>>>> that is currently running must not be affected by the suspend state;
>>>>>> however, any new request should be rejected. In general, subsystems
>>>>>> will track the number of outstanding requests, and when this hits
>>>>>> zero they are considered suspended.
>>>>>>
>>>>>> We will introduce the notion of a global SuspendController that
>>>>>> manages the server's suspend state. All subsystems that wish to do a
>>>>>> graceful shutdown register callback handlers with this controller.
>>>>>>
>>>>>> When the suspend() operation is invoked, the controller will invoke
>>>>>> all these callbacks, letting each subsystem know that the server is
>>>>>> suspending, and providing the subsystem with a SuspendContext object
>>>>>> that the subsystem can then use to notify the controller that the
>>>>>> suspend is complete.
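A minimal sketch of the controller/callback/context interaction, assuming each subsystem reports completion at most once per suspend through its context. The class names follow the proposal's SuspendController and SuspendContext, but the SuspendCallback interface and all method signatures are hypothetical, not the API that WildFly actually shipped. A generation counter makes contexts handed out before a resume() harmless.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

/** Handed to each subsystem so it can report that it has finished suspending. */
interface SuspendContext {
    void suspendComplete();
}

/** Hypothetical callback a subsystem registers to take part in graceful suspend. */
interface SuspendCallback {
    void suspend(SuspendContext context);
    void resume();
}

class SuspendController {
    private final List<SuspendCallback> callbacks = new CopyOnWriteArrayList<>();
    private int generation;     // bumped by every suspend()/resume(); guarded by this
    private int outstanding;    // subsystems that have not reported yet; guarded by this
    private boolean suspended;  // guarded by this

    void register(SuspendCallback cb) { callbacks.add(cb); }

    synchronized boolean isSuspended() { return suspended; }

    void suspend() {
        final int gen;
        List<SuspendCallback> snapshot = List.copyOf(callbacks);
        synchronized (this) {
            gen = ++generation;
            outstanding = snapshot.size();
            suspended = outstanding == 0;  // nothing registered: suspended at once
        }
        // Invoke subsystem code outside the lock; each gets its own context.
        for (SuspendCallback cb : snapshot) {
            AtomicBoolean reported = new AtomicBoolean();
            cb.suspend(() -> {
                if (reported.compareAndSet(false, true)) {  // count each subsystem once
                    onComplete(gen);
                }
            });
        }
    }

    private synchronized void onComplete(int gen) {
        // Ignore contexts from a suspend that was cancelled by resume().
        if (gen == generation && --outstanding == 0) {
            suspended = true;
        }
    }

    void resume() {
        synchronized (this) {
            ++generation;
            suspended = false;
        }
        callbacks.forEach(SuspendCallback::resume);
    }
}
```

Calling subsystem callbacks outside the lock is deliberate: a subsystem may report completion on another thread, or synchronously from inside its callback, without risking a deadlock on the controller.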
>>>>>>
>>>>>> What the subsystem does when it receives a suspend command, and when
>>>>>> it considers itself suspended, will vary, but in the common case it
>>>>>> will immediately start rejecting external requests (e.g. Undertow
>>>>>> will start responding with a 503 to all new requests). The subsystem
>>>>>> will also track the number of outstanding requests, and when this
>>>>>> hits zero the subsystem will notify the controller that it has
>>>>>> successfully suspended.
>>>>>>
>>>>>> Some subsystems will obviously want to take other actions on suspend,
>>>>>> e.g. clustering will likely want to fail over, mod_cluster will
>>>>>> notify the load balancer that the node is no longer available, etc.
>>>>>> In some cases we may want to make this configurable to an extent
>>>>>> (e.g. Undertow could be configured to allow requests with an existing
>>>>>> session, and not consider itself suspended until all sessions have
>>>>>> either timed out or been invalidated, although this will obviously
>>>>>> take a while).
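The common case above, rejecting new requests and reporting completion once the in-flight count reaches zero, can be sketched as a per-subsystem gate. This is a hypothetical illustration, not Undertow code: RequestGate is an invented name, handle() returns an HTTP-style status instead of writing a response, and the Runnable passed to suspend() stands in for the SuspendContext notification.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical entry-point guard for one subsystem (not Undertow's API). */
class RequestGate {
    private final AtomicLong active = new AtomicLong();
    private final AtomicReference<Runnable> pendingNotification = new AtomicReference<>();
    private volatile boolean suspending;

    /** Entry point: 200 if the work ran, 503 if rejected because of suspend. */
    int handle(Runnable work) {
        active.incrementAndGet();          // count first, so suspend() cannot miss us
        try {
            if (suspending) {
                return 503;                // new request after suspend: reject
            }
            work.run();                    // admitted request runs to completion
            return 200;
        } finally {
            if (active.decrementAndGet() == 0 && suspending) {
                fire();                    // last in-flight request just finished
            }
        }
    }

    /** Start rejecting; run suspendComplete once no requests are in flight. */
    void suspend(Runnable suspendComplete) {
        pendingNotification.set(suspendComplete);
        suspending = true;
        if (active.get() == 0) {
            fire();
        }
    }

    void resume() {
        suspending = false;
        pendingNotification.set(null);     // drop any notification not yet sent
    }

    private void fire() {
        Runnable r = pendingNotification.getAndSet(null);  // notify at most once
        if (r != null) {
            r.run();
        }
    }
}
```

Incrementing the counter before checking the flag is the design choice that matters here: either the request sees the flag and is rejected, or suspend() sees a non-zero count and leaves the notification to the last finishing request.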
>>>>>>
>>>>>> If anyone has any feedback, let me know. In terms of implementation,
>>>>>> my basic plan is to get the core functionality and the Undertow
>>>>>> implementation into Wildfly, and then work with subsystem authors to
>>>>>> implement subsystem-specific functionality once the core is in place.
>>>>>>
>>>>>> Stuart
>>>>>>
>>>>>> _______________________________________________
>>>>>> wildfly-dev mailing list
>>>>>> wildfly-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev

