[wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Dimitris Andreadis dandread at redhat.com
Tue Jun 10 12:21:01 EDT 2014


Sure. Which justifies trying to avoid those issues in the first place ;)

On 10/06/2014 17:50, Stuart Douglas wrote:
> We can't really change that now, as it is part of our existing API.
>
> Stuart
>
> Dimitris Andreadis wrote:
>> It seems to me RESTART_REQUIRED (or RELOAD_REQUIRED) should be a boolean
>> on its own to simplify the state diagram.
>>
>> On 10/06/2014 17:40, Stuart Douglas wrote:
>>> I don't think so. I think RESTART_REQUIRED means "running, but a restart
>>> is needed to apply management changes" (that attribute can also be
>>> RELOAD_REQUIRED; the description may be a bit out of date).
>>>
>>> To accurately reflect all the possible states you would need something
>>> like:
>>>
>>> RUNNING
>>> PAUSING
>>> PAUSED
>>> RESTART_REQUIRED
>>> PAUSING_RESTART_REQUIRED
>>> PAUSED_RESTART_REQUIRED
>>> RELOAD_REQUIRED
>>> PAUSING_RELOAD_REQUIRED
>>> PAUSED_RELOAD_REQUIRED
>>>
>>> Which does not seem great, and may introduce compatibility problems
>>> for clients that are not
>>> expecting these new values.
>>>
>>> Stuart
>>>
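
To make the state-explosion argument above concrete, here is a minimal Java sketch. All type and method names (SuspendState, PendingChange, ServerStatus, flattenedName) are hypothetical and not WildFly API, and it uses the SUSPENDING/SUSPENDED naming from the proposal rather than PAUSING/PAUSED. It models the two concerns as independent values and enumerates the combined names a single flattened attribute would have to expose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; none of this is WildFly API.
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

enum PendingChange { NONE, RELOAD_REQUIRED, RESTART_REQUIRED }

// Keeps the two orthogonal concerns as two separate values.
final class ServerStatus {
    final SuspendState suspendState;
    final PendingChange pendingChange;

    ServerStatus(SuspendState suspendState, PendingChange pendingChange) {
        this.suspendState = suspendState;
        this.pendingChange = pendingChange;
    }

    // Every combination a single flattened attribute would need to name.
    static List<ServerStatus> allCombinations() {
        List<ServerStatus> all = new ArrayList<>();
        for (PendingChange change : PendingChange.values()) {
            for (SuspendState state : SuspendState.values()) {
                all.add(new ServerStatus(state, change));
            }
        }
        return all;
    }

    // The name the combined value would carry in a flattened enum.
    String flattenedName() {
        if (pendingChange == PendingChange.NONE) {
            return suspendState.name();
        }
        if (suspendState == SuspendState.RUNNING) {
            return pendingChange.name();
        }
        return suspendState.name() + "_" + pendingChange.name();
    }
}
```

Two independent attributes need three values each, while the flattened form needs all nine, and every further state on either axis multiplies rather than adds.
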
>>>
>>>
>>> Dimitris Andreadis wrote:
>>>> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>>>>
>>>> On 10/06/2014 17:17, Stuart Douglas wrote:
>>>>> They are actually orthogonal, a server can be in both RESTART_REQUIRED
>>>>> and any one of the
>>>>> suspend states.
>>>>>
>>>>> RESTART_REQUIRED is very much tied to services and the management
>>>>> model, while
>>>>> suspend/resume is a runtime only thing that should not touch the state
>>>>> of services.
>>>>>
>>>>>
>>>>> Stuart
>>>>>
>>>>> Dimitris Andreadis wrote:
>>>>>> Why not extend the states of the existing 'server-state' attribute to:
>>>>>>
>>>>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>>>>
>>>>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>>>>>
>>>>>> On 10/06/2014 04:40, Stuart Douglas wrote:
>>>>>>>
>>>>>>> Scott Marlow wrote:
>>>>>>>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>>>>>>>> Server suspend and resume is a feature that allows a running
>>>>>>>>> server to gracefully finish off all running requests. The most
>>>>>>>>> common use case for this is graceful shutdown, where you would
>>>>>>>>> like a server to complete all running requests, reject any new
>>>>>>>>> ones, and then shut down. However, there are also plenty of other
>>>>>>>>> valid use cases (e.g. suspend the server, modify a data source or
>>>>>>>>> some other config, then resume).
>>>>>>>>>
>>>>>>>>> User View:
>>>>>>>>>
>>>>>>>>> From the user's point of view, two new operations will be added
>>>>>>>>> to the server:
>>>>>>>>>
>>>>>>>>> suspend(timeout)
>>>>>>>>> resume()
>>>>>>>>>
>>>>>>>>> A runtime-only attribute, suspend-state (is this a good name?),
>>>>>>>>> will also be added. It can take one of three possible values:
>>>>>>>>> RUNNING, SUSPENDING, or SUSPENDED.
>>>>>>>> The SuspendController "state" might be a shorter attribute name and
>>>>>>>> just
>>>>>>>> as meaningful.
>>>>>>> This will be in the global server namespace (i.e. from the CLI,
>>>>>>> :read-attribute(name="suspend-state")).
>>>>>>>
>>>>>>> I think the name 'state' is just too generic: which kind of state
>>>>>>> are we talking about?
>>>>>>>
>>>>>>>> When are we in the RUNNING state? Is that simply the pre-state for
>>>>>>>> SUSPENDING?
>>>>>>> 99.99% of the time. Basically, servers are always running unless they
>>>>>>> have been explicitly suspended, in which case they go from SUSPENDING
>>>>>>> to SUSPENDED. Note that if resume is called at any time, the server
>>>>>>> goes to RUNNING again immediately, since once subsystems are notified
>>>>>>> they should be able to begin accepting requests again straight away.
>>>>>>>
>>>>>>> We also have admin only mode, which is a kinda similar concept, so we
>>>>>>> need to make sure we document the differences.
>>>>>>>
>>>>>>>>> A timeout parameter will also be added to the shutdown
>>>>>>>>> operation. If this is present then the server will first be
>>>>>>>>> suspended, and the server will not shut down until either the
>>>>>>>>> suspend is successful or the timeout occurs. If no timeout
>>>>>>>>> parameter is passed to the operation then a normal non-graceful
>>>>>>>>> shutdown will take place.
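
The shutdown-with-timeout behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not the proposed implementation: ShutdownSketch and its parameters are hypothetical, the timeout unit (milliseconds here) is not specified in the proposal, and a CountDownLatch stands in for "the suspend is successful".

```java
import java.util.OptionalLong;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the names and the millisecond unit are assumptions.
final class ShutdownSketch {

    /**
     * Stops the server, suspending first when a timeout is supplied.
     *
     * @param timeoutMillis empty for a normal, non-graceful shutdown
     * @param suspend       starts the suspend (stand-in for the suspend operation)
     * @param suspended     released once the suspend has completed
     * @param stopServices  the existing shutdown path (stop services, then exit)
     * @return true only if the suspend completed before the timeout expired
     */
    static boolean shutdown(OptionalLong timeoutMillis, Runnable suspend,
                            CountDownLatch suspended, Runnable stopServices) {
        boolean graceful = false;
        if (timeoutMillis.isPresent()) {
            suspend.run();
            try {
                // Wait until the suspend succeeds or the timeout occurs.
                graceful = suspended.await(timeoutMillis.getAsLong(), TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                // Preserve the interrupt and fall through to the normal shutdown.
                Thread.currentThread().interrupt();
            }
        }
        // Either way the server then shuts down as it does today.
        stopServices.run();
        return graceful;
    }
}
```

Note that the services are stopped in both branches; the timeout only bounds how long the server waits for in-flight requests to finish first.
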
>>>>>>>> Will non-graceful shutdown wait for non-daemon threads or terminate
>>>>>>>> immediately (call System.exit())?
>>>>>>> It will execute the same way it does today (all services will shut
>>>>>>> down
>>>>>>> and then the server will exit).
>>>>>>>
>>>>>>> Stuart
>>>>>>>
>>>>>>>>> In domain mode these operations will be added to both individual
>>>>>>>>> servers and complete server groups.
>>>>>>>>>
>>>>>>>>> Implementation Details
>>>>>>>>>
>>>>>>>>> Suspend/resume operates on entry points to the server. Any request
>>>>>>>>> that
>>>>>>>>> is currently running must not be affected by the suspend state,
>>>>>>>>> however
>>>>>>>>> any new request should be rejected. In general subsystems will
>>>>>>>>> track the
>>>>>>>>> number of outstanding requests, and when this hits zero they are
>>>>>>>>> considered suspended.
>>>>>>>>>
>>>>>>>>> We will introduce the notion of a global SuspendController that
>>>>>>>>> manages the server's suspend state. All subsystems that wish to
>>>>>>>>> do a graceful shutdown register callback handlers with this
>>>>>>>>> controller.
>>>>>>>>>
>>>>>>>>> When the suspend() operation is invoked the controller will invoke
>>>>>>>>> all these callbacks, letting each subsystem know that the server
>>>>>>>>> is suspending, and providing the subsystem with a SuspendContext
>>>>>>>>> object that the subsystem can then use to notify the controller
>>>>>>>>> that the suspend is complete.
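
A minimal sketch of the controller/callback interaction described above, assuming a shape the proposal does not spell out. Only the names SuspendController and SuspendContext come from the proposal; SuspendCallback, the method names, and the generation counter used to ignore completions that arrive after a resume are all assumptions. Timeout handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Handed to each subsystem so it can report that it has finished suspending. */
interface SuspendContext {
    void complete();
}

/** Hypothetical callback interface; the method names are assumptions. */
interface SuspendCallback {
    void suspending(SuspendContext context);
    void resumed();
}

final class SuspendController {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private State state = State.RUNNING;
    private int pending;     // callbacks that have not yet reported completion
    private int generation;  // bumped on suspend/resume so stale completions are ignored

    synchronized void register(SuspendCallback callback) {
        callbacks.add(callback);
    }

    synchronized State getState() {
        return state;
    }

    void suspend() {
        final List<SuspendCallback> snapshot;
        final int myGeneration;
        synchronized (this) {
            if (state != State.RUNNING) {
                return;
            }
            snapshot = new ArrayList<>(callbacks);
            pending = snapshot.size();
            myGeneration = ++generation;
            state = pending == 0 ? State.SUSPENDED : State.SUSPENDING;
        }
        // Notify outside the lock; a callback may complete synchronously or later.
        for (SuspendCallback callback : snapshot) {
            AtomicBoolean done = new AtomicBoolean();
            callback.suspending(() -> {
                if (done.compareAndSet(false, true)) {
                    completed(myGeneration);
                }
            });
        }
    }

    void resume() {
        final List<SuspendCallback> snapshot;
        synchronized (this) {
            if (state == State.RUNNING) {
                return;
            }
            state = State.RUNNING; // goes back to RUNNING immediately
            generation++;
            snapshot = new ArrayList<>(callbacks);
        }
        for (SuspendCallback callback : snapshot) {
            callback.resumed();
        }
    }

    private synchronized void completed(int fromGeneration) {
        if (fromGeneration == generation && state == State.SUSPENDING && --pending == 0) {
            state = State.SUSPENDED;
        }
    }
}
```

The controller only reaches SUSPENDED once every registered callback has called complete() on its context, and a resume in the middle of a suspend discards any completions still in flight.
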
>>>>>>>>>
>>>>>>>>> What the subsystem does when it receives a suspend command, and
>>>>>>>>> when it considers itself suspended, will vary, but in the common
>>>>>>>>> case it will immediately start rejecting external requests (e.g.
>>>>>>>>> Undertow will start responding with a 503 to all new requests).
>>>>>>>>> The subsystem will also track the number of outstanding requests,
>>>>>>>>> and when this hits zero the subsystem will notify the controller
>>>>>>>>> that it has successfully suspended.
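
The common-case subsystem behaviour described above (reject new requests, count outstanding ones, report when the count hits zero) might look something like this. RequestTracker is a hypothetical name, and a plain Runnable stands in for the call back to the controller; this is a sketch of the counting idea, not Undertow's implementation.

```java
// Hypothetical per-subsystem request gate; not part of WildFly or Undertow.
final class RequestTracker {
    private int active;          // requests currently in flight
    private boolean suspended;   // true once a suspend has been requested
    private Runnable onDrained;  // stand-in for notifying the controller

    /** Called at the entry point; false means reject (e.g. respond with a 503). */
    synchronized boolean tryBegin() {
        if (suspended) {
            return false;
        }
        active++;
        return true;
    }

    /** Called when a previously accepted request finishes. */
    void end() {
        Runnable notify = null;
        synchronized (this) {
            active--;
            if (suspended && active == 0 && onDrained != null) {
                notify = onDrained;
                onDrained = null;
            }
        }
        if (notify != null) {
            notify.run(); // run outside the lock
        }
    }

    /** Stops accepting requests; drained runs once nothing is in flight. */
    void suspend(Runnable drained) {
        boolean alreadyIdle;
        synchronized (this) {
            suspended = true;
            alreadyIdle = active == 0;
            if (!alreadyIdle) {
                onDrained = drained;
            }
        }
        if (alreadyIdle) {
            drained.run();
        }
    }

    /** Starts accepting requests again straight away. */
    synchronized void resume() {
        suspended = false;
        onDrained = null;
    }
}
```

Checking the suspended flag and incrementing the counter under the same lock avoids a request slipping in between the suspend and the zero check.
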
>>>>>>>>> Some subsystems will obviously want to take other actions on
>>>>>>>>> suspend, e.g. clustering will likely want to fail over, mod_cluster
>>>>>>>>> will notify the load balancer that the node is no longer
>>>>>>>>> available, etc. In some cases we may want to make this
>>>>>>>>> configurable to an extent (e.g. Undertow could be configured to
>>>>>>>>> allow requests with an existing session, and not consider itself
>>>>>>>>> suspended until all sessions have either timed out or been
>>>>>>>>> invalidated, although this will obviously take a while).
>>>>>>>>>
>>>>>>>>> If anyone has any feedback let me know. In terms of
>>>>>>>>> implementation, my basic plan is to get the core functionality and
>>>>>>>>> the Undertow implementation into WildFly, and then work with
>>>>>>>>> subsystem authors to implement subsystem-specific functionality
>>>>>>>>> once the core is in place.
>>>>>>>>>
>>>>>>>>> Stuart
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> wildfly-dev mailing list
>>>>>>>>> wildfly-dev at lists.jboss.org
>>>>>>>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev