[wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Dimitris Andreadis dandread at redhat.com
Tue Jun 10 11:47:43 EDT 2014


It seems to me RESTART_REQUIRED (or RELOAD_REQUIRED) should be a boolean on its own to 
simplify the state diagram.
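
A minimal sketch of that split, assuming the suspend state and the "required" flags are stored separately. `ServerStatus`, `SuspendState` and all method names here are hypothetical and only illustrate the idea; they are not WildFly's management API.

```java
/** Hypothetical suspend states, named after the proposed suspend-state attribute. */
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

/** Illustrative holder: the suspend state and the "required" flags vary independently. */
final class ServerStatus {
    private SuspendState suspendState = SuspendState.RUNNING;
    private boolean restartRequired; // set by a management change, cleared only by a restart
    private boolean reloadRequired;  // set by a management change, cleared only by a reload

    synchronized void setSuspendState(SuspendState newState) { suspendState = newState; }
    synchronized void markRestartRequired() { restartRequired = true; }
    synchronized void markReloadRequired() { reloadRequired = true; }

    synchronized SuspendState getSuspendState() { return suspendState; }
    synchronized boolean isRestartRequired() { return restartRequired; }
    synchronized boolean isReloadRequired() { return reloadRequired; }
}
```

With this shape, suspending a server never touches the flags, and setting a flag never changes the suspend state, so no combined values such as PAUSING_RESTART_REQUIRED are needed.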

On 10/06/2014 17:40, Stuart Douglas wrote:
> I don't think so. I think RESTART_REQUIRED means the server is running but needs a restart
> to apply management changes (I think that attribute can also be RELOAD_REQUIRED; the
> description may be a bit out of date).
>
> To accurately reflect all the possible states you would need something like:
>
> RUNNING
> PAUSING
> PAUSED
> RESTART_REQUIRED
> PAUSING_RESTART_REQUIRED
> PAUSED_RESTART_REQUIRED
> RELOAD_REQUIRED
> PAUSING_RELOAD_REQUIRED
> PAUSED_RELOAD_REQUIRED
>
> Which does not seem great, and may introduce compatibility problems for clients that are not
> expecting these new values.
>
> Stuart
>
>
>
> Dimitris Andreadis wrote:
>> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>>
>> On 10/06/2014 17:17, Stuart Douglas wrote:
>>> They are actually orthogonal, a server can be in both RESTART_REQUIRED
>>> and any one of the
>>> suspend states.
>>>
>>> RESTART_REQUIRED is very much tied to services and the management
>>> model, while
>>> suspend/resume is a runtime only thing that should not touch the state
>>> of services.
>>>
>>>
>>> Stuart
>>>
>>> Dimitris Andreadis wrote:
>>>> Why not extend the states of the existing 'server-state' attribute to:
>>>>
>>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>>
>>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>>>
>>>> On 10/06/2014 04:40, Stuart Douglas wrote:
>>>>>
>>>>> Scott Marlow wrote:
>>>>>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>>>>>> Server suspend and resume is a feature that allows a running
>>>>>>> server to gracefully finish off all running requests. The most
>>>>>>> common use case for this is graceful shutdown, where you would
>>>>>>> like a server to complete all running requests, reject any new
>>>>>>> ones, and then shut down. However, there are also plenty of other
>>>>>>> valid use cases (e.g. suspend the server, modify a data source or
>>>>>>> some other config, then resume).
>>>>>>>
>>>>>>> User View:
>>>>>>>
>>>>>>> From the user's point of view, two new operations will be added to
>>>>>>> the server:
>>>>>>>
>>>>>>> suspend(timeout)
>>>>>>> resume()
>>>>>>>
>>>>>>> A runtime only attribute suspend-state (is this a good name?) will
>>>>>>> also
>>>>>>> be added, that can take one of three possible values, RUNNING,
>>>>>>> SUSPENDING, SUSPENDED.
>>>>>> The SuspendController "state" might be a shorter attribute name and
>>>>>> just
>>>>>> as meaningful.
>>>>> This will be in the global server namespace (i.e. from the CLI,
>>>>> :read-attribute(name="suspend-state")).
>>>>>
>>>>> I think the name 'state' is just too generic: which kind of state
>>>>> are we talking about?
>>>>>
>>>>>> When are we in the RUNNING state? Is that simply the pre-state for
>>>>>> SUSPENDING?
>>>>> 99.99% of the time. Basically servers are always running unless they
>>>>> have been explicitly suspended, and then they go from suspending to
>>>>> suspended. Note that if resume is called at any time the server goes to
>>>>> RUNNING again immediately, as when subsystems are notified they should
>>>>> be able to begin accepting requests again straight away.
>>>>>
>>>>> We also have admin only mode, which is a kinda similar concept, so we
>>>>> need to make sure we document the differences.
>>>>>
>>>>>>> A timeout attribute will also be added to the shutdown operation. If
>>>>>>> this is present then the server will first be suspended, and the
>>>>>>> server
>>>>>>> will not shut down until either the suspend is successful or the
>>>>>>> timeout
>>>>>>> occurs. If no timeout parameter is passed to the operation then a
>>>>>>> normal
>>>>>>> non-graceful shutdown will take place.
>>>>>> Will non-graceful shutdown wait for non-daemon threads or terminate
>>>>>> immediately (call System.exit())?
>>>>> It will execute the same way it does today (all services will shut down
>>>>> and then the server will exit).
>>>>>
>>>>> Stuart
>>>>>
>>>>>>> In domain mode these operations will be added to both individual
>>>>>>> servers and complete server groups.
>>>>>>>
>>>>>>> Implementation Details
>>>>>>>
>>>>>>> Suspend/resume operates on entry points to the server. Any request
>>>>>>> that
>>>>>>> is currently running must not be affected by the suspend state,
>>>>>>> however
>>>>>>> any new request should be rejected. In general subsystems will
>>>>>>> track the
>>>>>>> number of outstanding requests, and when this hits zero they are
>>>>>>> considered suspended.
>>>>>>>
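The request tracking described above might look roughly like this. It is a minimal sketch, not the actual implementation: `RequestGate` and its method names are hypothetical, and a real subsystem would hook `tryBegin()`/`end()` into its own entry points.

```java
/** Hypothetical per-subsystem gate that counts in-flight requests. */
final class RequestGate {
    private int active;          // requests currently running
    private boolean suspended;   // true once a suspend has been requested
    private Runnable onDrained;  // non-null while a suspend is waiting for requests to finish

    /** Call at an entry point. Returns false if the request must be rejected. */
    synchronized boolean tryBegin() {
        if (suspended) {
            return false;
        }
        active++;
        return true;
    }

    /** Call when a request that was admitted by tryBegin() finishes. */
    void end() {
        Runnable callback = null;
        synchronized (this) {
            active--;
            if (suspended && active == 0 && onDrained != null) {
                callback = onDrained;
                onDrained = null;
            }
        }
        if (callback != null) {
            callback.run(); // run outside the lock
        }
    }

    /** Stop admitting requests; 'drained' runs once the outstanding count hits zero. */
    void suspend(Runnable drained) {
        boolean alreadyDrained;
        synchronized (this) {
            suspended = true;
            alreadyDrained = (active == 0);
            onDrained = alreadyDrained ? null : drained;
        }
        if (alreadyDrained) {
            drained.run();
        }
    }

    /** Start admitting requests again straight away. */
    synchronized void resume() {
        suspended = false;
        onDrained = null;
    }
}
```

Requests admitted before the suspend run to completion untouched; only new calls to `tryBegin()` are refused.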
>>>>>>> We will introduce the notion of a global SuspendController that
>>>>>>> manages the server's suspend state. All subsystems that wish to do a
>>>>>>> graceful shutdown register callback handlers with this controller.
>>>>>>>
>>>>>>> When the suspend() operation is invoked the controller will invoke
>>>>>>> all these callbacks, letting each subsystem know that the server is
>>>>>>> suspending, and providing the subsystem with a SuspendContext object
>>>>>>> that the subsystem can then use to notify the controller that the
>>>>>>> suspend is complete.
>>>>>>>
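To make the callback flow concrete, here is a minimal sketch under stated assumptions. The names SuspendController and SuspendContext come from the proposal; `SuspendCallback`, every method name, and the `generation` counter are hypothetical, and the timeout argument of suspend(timeout) is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Handed to each subsystem so it can report that it has finished suspending. */
interface SuspendContext {
    void complete();
}

/** Hypothetical callback a subsystem registers with the controller. */
interface SuspendCallback {
    void suspendStarted(SuspendContext context);
    void resumed();
}

final class SuspendController {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    private final List<SuspendCallback> callbacks = new CopyOnWriteArrayList<>();
    private State state = State.RUNNING;
    private int pending;     // callbacks that have not reported completion yet
    private int generation;  // bumped on every suspend/resume so stale completions are ignored

    void register(SuspendCallback callback) { callbacks.add(callback); }

    synchronized State getState() { return state; }

    void suspend() {
        final List<SuspendCallback> snapshot;
        final int gen;
        synchronized (this) {
            if (state != State.RUNNING) {
                return;
            }
            snapshot = new ArrayList<>(callbacks);
            gen = ++generation;
            pending = snapshot.size();
            state = snapshot.isEmpty() ? State.SUSPENDED : State.SUSPENDING;
        }
        for (SuspendCallback callback : snapshot) {
            callback.suspendStarted(contextFor(gen));
        }
    }

    /** The server goes back to RUNNING immediately, whatever the suspend progress was. */
    void resume() {
        synchronized (this) {
            if (state == State.RUNNING) {
                return;
            }
            generation++;
            state = State.RUNNING;
        }
        for (SuspendCallback callback : callbacks) {
            callback.resumed();
        }
    }

    private SuspendContext contextFor(final int gen) {
        return new SuspendContext() {
            private boolean done;

            @Override
            public void complete() {
                synchronized (SuspendController.this) {
                    if (done || gen != generation) {
                        return; // duplicate call, or the suspend was cancelled by resume()
                    }
                    done = true;
                    if (--pending == 0) {
                        state = State.SUSPENDED;
                    }
                }
            }
        };
    }
}
```

A subsystem would typically call `complete()` from the point where its outstanding request count reaches zero.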
>>>>>>> What the subsystem does when it receives a suspend command, and
>>>>>>> when it considers itself suspended, will vary, but in the common
>>>>>>> case it will immediately start rejecting external requests (e.g.
>>>>>>> Undertow will start responding with a 503 to all new requests). The
>>>>>>> subsystem will also track the number of outstanding requests, and
>>>>>>> when this hits zero the subsystem will notify the controller that
>>>>>>> it has successfully suspended.
>>>>>>> Some subsystems will obviously want to do other actions on
>>>>>>> suspend, e.g.
>>>>>>> clustering will likely want to fail over, mod_cluster will notify the
>>>>>>> load balancer that the node is no longer available etc. In some
>>>>>>> cases we
>>>>>>> may want to make this configurable to an extent (e.g. Undertow
>>>>>>> could be
>>>>>>> configured to allow requests with an existing session, and not
>>>>>>> consider
>>>>>>> itself suspended until all sessions have either timed out or been
>>>>>>> invalidated, although this will obviously take a while).
>>>>>>>
>>>>>>> If anyone has any feedback let me know. In terms of implementation my
>>>>>>> basic plan is to get the core functionality and the Undertow
>>>>>>> implementation into WildFly, and then work with subsystem authors to
>>>>>>> implement subsystem specific functionality once the core is in place.
>>>>>>>
>>>>>>> Stuart
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> wildfly-dev mailing list
>>>>>>> wildfly-dev at lists.jboss.org
>>>>>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
>>>>>>>
