[wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Tue Jun 10 12:57:44 EDT 2014

You could potentially eliminate the PAUSING_*_REQUIRED states, since you typically wouldn't want to recommend a restart in them.

> On Jun 10, 2014, at 10:41 AM, Stuart Douglas <stuart.w.douglas at gmail.com> wrote:
> 
> I don't think so, I think RESTART_REQUIRED means running, but I need to 
> restart to apply management changes (I think that attribute can also be 
> RELOAD_REQUIRED, I think the description may be a bit out of date).
> 
> To accurately reflect all the possible states you would need something like:
> 
> RUNNING
> PAUSING
> PAUSED
> RESTART_REQUIRED
> PAUSING_RESTART_REQUIRED
> PAUSED_RESTART_REQUIRED
> RELOAD_REQUIRED
> PAUSING_RELOAD_REQUIRED
> PAUSED_RELOAD_REQUIRED
> 
> Which does not seem great, and may introduce compatibility problems for 
> clients that are not expecting these new values.
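
To make the trade-off concrete, here is a minimal sketch of the alternative: keep the two dimensions as separate attributes rather than one combined enum. The class and enum names (ServerStatus, PendingChange, NONE) are hypothetical and only for illustration; the suspend values use the SUSPENDING/SUSPENDED naming from the original proposal rather than PAUSING/PAUSED.

```java
// Hypothetical sketch: two independent attributes instead of one nine-value enum.
final class ServerStatus {
    // "NONE" is an invented placeholder for "no restart or reload pending".
    enum PendingChange { NONE, RELOAD_REQUIRED, RESTART_REQUIRED }

    // The three values of the proposed runtime-only suspend-state attribute.
    enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

    private final PendingChange pendingChange;
    private final SuspendState suspendState;

    ServerStatus(PendingChange pendingChange, SuspendState suspendState) {
        this.pendingChange = pendingChange;
        this.suspendState = suspendState;
    }

    PendingChange pendingChange() { return pendingChange; }

    SuspendState suspendState() { return suspendState; }

    // Number of distinct (pendingChange, suspendState) pairs the two attributes cover.
    static int combinations() {
        return PendingChange.values().length * SuspendState.values().length;
    }

    public static void main(String[] args) {
        ServerStatus s = new ServerStatus(PendingChange.RESTART_REQUIRED, SuspendState.SUSPENDING);
        System.out.println(s.pendingChange() + " / " + s.suspendState());
        System.out.println("combinations covered: " + combinations());
    }
}
```

A client that only reads the first attribute never sees a new value, which is the compatibility concern raised above.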
> 
> Stuart
> 
> 
> 
> Dimitris Andreadis wrote:
>> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>> 
>>> On 10/06/2014 17:17, Stuart Douglas wrote:
>>> They are actually orthogonal, a server can be in both RESTART_REQUIRED
>>> and any one of the
>>> suspend states.
>>> 
>>> RESTART_REQUIRED is very much tied to services and the management
>>> model, while
>>> suspend/resume is a runtime only thing that should not touch the state
>>> of services.
>>> 
>>> 
>>> Stuart
>>> 
>>> Dimitris Andreadis wrote:
>>>> Why not extend the states of the existing 'server-state' attribute to:
>>>> 
>>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>> 
>>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>>> 
>>>>> On 10/06/2014 04:40, Stuart Douglas wrote:
>>>>> 
>>>>> Scott Marlow wrote:
>>>>>>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>>>>>> Server suspend and resume is a feature that allows a running
>>>>>>> server to
>>>>>>> gracefully finish off all running requests. The most common use
>>>>>>> case for
>>>>>>> this is graceful shutdown, where you would like a server to
>>>>>>> complete all
>>>>>>> running requests, reject any new ones, and then shut down. However,
>>>>>>> there
>>>>>>> are also plenty of other valid use cases (e.g. suspend the server,
>>>>>>> modify a data source or some other config, then resume).
>>>>>>> 
>>>>>>> User View:
>>>>>>> 
>>>>>>> From the user's point of view two new operations will be added to
>>>>>>> the server:
>>>>>>> 
>>>>>>> suspend(timeout)
>>>>>>> resume()
>>>>>>> 
>>>>>>> A runtime only attribute suspend-state (is this a good name?) will
>>>>>>> also
>>>>>>> be added, that can take one of three possible values, RUNNING,
>>>>>>> SUSPENDING, SUSPENDED.
>>>>>> The SuspendController "state" might be a shorter attribute name and
>>>>>> just
>>>>>> as meaningful.
>>>>> This will be in the global server namespace (i.e. from the CLI
>>>>> :read-attribute(name="suspend-state")).
>>>>> 
>>>>> I think the name 'state' is just too generic: which kind of state
>>>>> are we
>>>>> talking about?
>>>>> 
>>>>>> When are we in the RUNNING state? Is that simply the pre-state for
>>>>>> SUSPENDING?
>>>>> 99.99% of the time. Basically servers are always running unless they
>>>>> have been explicitly suspended, and then they go from suspending to
>>>>> suspended. Note that if resume is called at any time the server goes to
>>>>> RUNNING again immediately, as when subsystems are notified they should
>>>>> be able to begin accepting requests again straight away.
>>>>> 
>>>>> We also have admin only mode, which is a kinda similar concept, so we
>>>>> need to make sure we document the differences.
>>>>> 
>>>>>>> A timeout attribute will also be added to the shutdown operation. If
>>>>>>> this is present then the server will first be suspended, and the
>>>>>>> server
>>>>>>> will not shut down until either the suspend is successful or the
>>>>>>> timeout
>>>>>>> occurs. If no timeout parameter is passed to the operation then a
>>>>>>> normal
>>>>>>> non-graceful shutdown will take place.
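
A rough sketch of how I read the shutdown-with-timeout behaviour described above. Everything here is an assumption made for illustration: the class and method names, the use of a CompletableFuture to signal that suspend has finished, and milliseconds as the timeout unit (the proposal does not specify one). A failed or interrupted suspend is treated like a timeout, which the proposal also does not cover.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch of "suspend first, then shut down" with an optional timeout.
final class GracefulShutdownSketch {
    // Reported only so the caller can see which path was taken.
    enum Outcome { NOT_GRACEFUL, SUSPENDED_FIRST, TIMED_OUT }

    /**
     * @param startSuspend  starts a suspend and returns a future that completes once it succeeds
     * @param timeoutMillis null when no timeout parameter was passed to the operation
     * @param stopServices  stands in for the existing, unchanged shutdown path
     */
    static Outcome shutdown(Supplier<CompletableFuture<Void>> startSuspend,
                            Long timeoutMillis,
                            Runnable stopServices) {
        Outcome outcome = Outcome.NOT_GRACEFUL;
        if (timeoutMillis != null) {
            try {
                // Wait until either the suspend succeeds or the timeout elapses.
                startSuspend.get().get(timeoutMillis, TimeUnit.MILLISECONDS);
                outcome = Outcome.SUSPENDED_FIRST;
            } catch (TimeoutException | ExecutionException e) {
                outcome = Outcome.TIMED_OUT;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                outcome = Outcome.TIMED_OUT;
            }
        }
        // In every case the server then shuts down the way it does today.
        stopServices.run();
        return outcome;
    }
}
```

With no timeout the suspend is never started, which matches the "normal non-graceful shutdown" case.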
>>>>>> Will non-graceful shutdown wait for non-daemon threads or terminate
>>>>>> immediately (call System.exit())?
>>>>> It will execute the same way it does today (all services will shut down
>>>>> and then the server will exit).
>>>>> 
>>>>> Stuart
>>>>> 
>>>>>>> In domain mode these operations will be added to both individual
>>>>>>> servers
>>>>>>> and complete server groups.
>>>>>>> 
>>>>>>> Implementation Details
>>>>>>> 
>>>>>>> Suspend/resume operates on entry points to the server. Any request
>>>>>>> that
>>>>>>> is currently running must not be affected by the suspend state,
>>>>>>> however
>>>>>>> any new request should be rejected. In general subsystems will
>>>>>>> track the
>>>>>>> number of outstanding requests, and when this hits zero they are
>>>>>>> considered suspended.
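
The request-counting idea above could look something like the following at a single entry point. This is only a sketch under my own assumptions: RequestGate and its method names are made up, and plain synchronization is used for clarity rather than performance.

```java
// Hypothetical sketch of an entry-point gate that counts outstanding requests.
final class RequestGate {
    private int active;          // requests currently in flight
    private boolean rejecting;   // true once suspend has been requested
    private Runnable onDrained;  // invoked once when the count reaches zero

    /** Returns false if the request must be rejected because the server is suspending. */
    synchronized boolean tryBegin() {
        if (rejecting) {
            return false;
        }
        active++;
        return true;
    }

    /** Must be called exactly once for every successful tryBegin(). */
    void end() {
        Runnable notify = null;
        synchronized (this) {
            active--;
            if (rejecting && active == 0) {
                notify = onDrained;
                onDrained = null;
            }
        }
        if (notify != null) {
            notify.run(); // run outside the lock
        }
    }

    /** Stops admitting new requests; drained runs once no request is outstanding. */
    void suspend(Runnable drained) {
        boolean alreadyIdle;
        synchronized (this) {
            rejecting = true;
            alreadyIdle = active == 0;
            if (!alreadyIdle) {
                onDrained = drained;
            }
        }
        if (alreadyIdle) {
            drained.run();
        }
    }

    /** Starts admitting requests again straight away. */
    synchronized void resume() {
        rejecting = false;
        onDrained = null;
    }
}
```

Requests admitted before suspend() run to completion untouched; only new calls to tryBegin() are turned away.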
>>>>>>> 
>>>>>>> We will introduce the notion of a global SuspendController, that
>>>>>>> manages
>>>>>>> the server's suspend state. All subsystems that wish to do a graceful
>>>>>>> shutdown register callback handlers with this controller.
>>>>>>> 
>>>>>>> When the suspend() operation is invoked the controller will invoke
>>>>>>> all
>>>>>>> these callbacks, letting the subsystem know that the server is
>>>>>>> suspending,
>>>>>>> and providing the subsystem with a SuspendContext object that the
>>>>>>> subsystem can then use to notify the controller that the suspend is
>>>>>>> complete.
>>>>>>> 
>>>>>>> What the subsystem does when it receives a suspend command, and
>>>>>>> when it
>>>>>>> considers itself suspended will vary, but in the common case it will
>>>>>>> immediately start rejecting external requests (e.g. Undertow will
>>>>>>> start
>>>>>>> responding with a 503 to all new requests). The subsystem will also
>>>>>>> track the number of outstanding requests, and when this hits zero
>>>>>>> then
>>>>>>> the subsystem will notify the controller that it has successfully
>>>>>>> suspended.
>>>>>>> Some subsystems will obviously want to do other actions on
>>>>>>> suspend, e.g.
>>>>>>> clustering will likely want to fail over, mod_cluster will notify the
>>>>>>> load balancer that the node is no longer available etc. In some
>>>>>>> cases we
>>>>>>> may want to make this configurable to an extent (e.g. Undertow
>>>>>>> could be
>>>>>>> configured to allow requests with an existing session, and not
>>>>>>> consider
>>>>>>> itself suspended until all sessions have either timed out or been
>>>>>>> invalidated, although this will obviously take a while).
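
For discussion, a minimal sketch of the controller/callback handshake as I understand it from the description above. SuspendController and SuspendContext are the names used in the proposal; the method names, the callback interface, and the generation counter (used to ignore completions that arrive after a resume) are my own assumptions, and the suspend timeout is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a global suspend controller; not the actual WildFly API.
final class SuspendControllerSketch {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    /** Handed to each subsystem so it can report that it has finished suspending. */
    interface SuspendContext {
        void complete();
    }

    /** Registered by every subsystem that wants to take part in graceful shutdown. */
    interface SuspendCallback {
        void suspend(SuspendContext context);
        void resume();
    }

    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private State state = State.RUNNING;
    private int pending;     // subsystems that have not yet completed
    private int generation;  // bumped on every suspend and resume

    synchronized void register(SuspendCallback callback) {
        callbacks.add(callback);
    }

    synchronized State state() {
        return state;
    }

    void suspend() {
        final List<SuspendCallback> snapshot;
        final int gen;
        synchronized (this) {
            if (state != State.RUNNING) {
                return;
            }
            snapshot = new ArrayList<>(callbacks);
            pending = snapshot.size();
            gen = ++generation;
            state = pending == 0 ? State.SUSPENDED : State.SUSPENDING;
        }
        for (SuspendCallback callback : snapshot) {
            AtomicBoolean once = new AtomicBoolean();
            // Each context counts at most once, however often complete() is called.
            callback.suspend(() -> {
                if (once.compareAndSet(false, true)) {
                    completed(gen);
                }
            });
        }
    }

    private synchronized void completed(int gen) {
        if (gen != generation || state != State.SUSPENDING) {
            return; // stale notification from before a resume
        }
        if (--pending == 0) {
            state = State.SUSPENDED;
        }
    }

    void resume() {
        final List<SuspendCallback> snapshot;
        synchronized (this) {
            if (state == State.RUNNING) {
                return;
            }
            state = State.RUNNING; // back to RUNNING immediately
            generation++;
            snapshot = new ArrayList<>(callbacks);
        }
        for (SuspendCallback callback : snapshot) {
            callback.resume();
        }
    }
}
```

A subsystem such as Undertow would call complete() on its context once its outstanding request count reaches zero.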
>>>>>>> 
>>>>>>> If anyone has any feedback let me know. In terms of implementation my
>>>>>>> basic plan is to get the core functionality and the Undertow
>>>>>>> implementation into Wildfly, and then work with subsystem authors to
>>>>>>> implement subsystem specific functionality once the core is in place.
>>>>>>> 
>>>>>>> Stuart
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> wildfly-dev mailing list
>>>>>>> wildfly-dev at lists.jboss.org
>>>>>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev