What I am bringing up is more subsystem-specific, but it might be valuable to think about.
If the graceful shutdown times out, what behavior would we consider correct
for an in-flight transaction?
Should it be a forced rollback, so that when the server is started back up, the
transaction manager will not find a transaction to be recovered in the log?
Or should it be treated the same as a crash, where transactions should be
recoverable, and the recovery manager would try to recover the transaction?
I would lean towards the first, as the administrator would consider this a graceful
shutdown, and leaving a transaction in a state where it has to be recovered on restart
doesn't seem graceful to me.
Andy
----- Original Message -----
From: "Stuart Douglas" <sdouglas(a)redhat.com>
To: "Wildfly Dev mailing list" <wildfly-dev(a)lists.jboss.org>
Sent: Monday, June 9, 2014 4:38:55 PM
Subject: [wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)
Server suspend and resume is a feature that allows a running server to
gracefully finish off all running requests. The most common use case for
this is graceful shutdown, where you would like a server to complete all
running requests, reject any new ones, and then shut down. However, there
are also plenty of other valid use cases (e.g. suspend the server,
modify a data source or some other config, then resume).
User View:
From the user's point of view, two new operations will be added to the
server:
suspend(timeout)
resume()
A runtime-only attribute suspend-state (is this a good name?) will also
be added, which can take one of three possible values: RUNNING,
SUSPENDING, SUSPENDED.
A timeout attribute will also be added to the shutdown operation. If
this is present then the server will first be suspended, and the server
will not shut down until either the suspend is successful or the timeout
occurs. If no timeout parameter is passed to the operation then a normal
non-graceful shutdown will take place.
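
As a rough sketch of the shutdown rule above (not the actual
implementation), the decision could look like this in Java. The class,
enum and method names are all made up for illustration, and treating an
interrupted wait as a timeout is a simplification. What happens to
in-flight work once the timeout occurs is left open here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from the proposal.
class ShutdownSketch {

    enum Outcome { IMMEDIATE, SUSPENDED_FIRST, TIMED_OUT }

    /**
     * timeoutMillis == null models "no timeout parameter was passed".
     * startSuspend kicks off the suspend; the latch is counted down by
     * whoever observes that the suspend has completed.
     */
    static Outcome shutdown(Long timeoutMillis, Runnable startSuspend, CountDownLatch suspended) {
        if (timeoutMillis == null) {
            return Outcome.IMMEDIATE; // normal non-graceful shutdown
        }
        startSuspend.run();
        try {
            // Wait until the suspend succeeds or the timeout occurs.
            boolean done = suspended.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return done ? Outcome.SUSPENDED_FIRST : Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.TIMED_OUT; // simplification: interrupt handled like a timeout
        }
    }
}
```

In every case the server shuts down afterwards; the outcome only records
which path was taken.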
In domain mode these operations will be added to both individual
servers and complete server groups.
Implementation Details
Suspend/resume operates on entry points to the server. Any request that
is currently running must not be affected by the suspend state, however
any new request should be rejected. In general subsystems will track the
number of outstanding requests, and when this hits zero they are
considered suspended.
We will introduce the notion of a global SuspendController that manages
the server's suspend state. All subsystems that wish to do a graceful
shutdown register callback handlers with this controller.
When the suspend() operation is invoked the controller will invoke all
these callbacks, letting the subsystem know that the server is suspending,
and providing the subsystem with a SuspendContext object that the
subsystem can then use to notify the controller that the suspend is
complete.
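
To make the callback flow above a bit more concrete, here is a minimal
sketch in Java. Only the names SuspendController and SuspendContext and
the three state values come from this proposal; the SuspendCallback
interface and every method name are assumptions made for illustration,
and the timeout is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: interface and method names are illustrative only.
class SuspendSketch {

    enum State { RUNNING, SUSPENDING, SUSPENDED }

    /** Handed to a subsystem so it can report that its suspend is complete. */
    interface SuspendContext {
        void complete();
    }

    /** Callback handler a subsystem registers with the controller. */
    interface SuspendCallback {
        void suspend(SuspendContext context);
        void resume();
    }

    static class SuspendController {
        private final List<SuspendCallback> callbacks = new CopyOnWriteArrayList<>();
        private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);

        void register(SuspendCallback callback) {
            callbacks.add(callback);
        }

        State getState() {
            return state.get();
        }

        void suspend() {
            List<SuspendCallback> snapshot = new ArrayList<>(callbacks);
            state.set(State.SUSPENDING);
            if (snapshot.isEmpty()) {
                state.set(State.SUSPENDED);
                return;
            }
            // One counter per suspend() call; assumes each subsystem calls
            // complete() at most once.
            AtomicInteger pending = new AtomicInteger(snapshot.size());
            for (SuspendCallback callback : snapshot) {
                callback.suspend(() -> {
                    if (pending.decrementAndGet() == 0) {
                        // Only move to SUSPENDED if nobody resumed in the meantime.
                        state.compareAndSet(State.SUSPENDING, State.SUSPENDED);
                    }
                });
            }
        }

        void resume() {
            state.set(State.RUNNING);
            for (SuspendCallback callback : callbacks) {
                callback.resume();
            }
        }
    }
}
```

The controller only reports SUSPENDED once every registered subsystem has
called back, which matches the idea that each subsystem decides for
itself when it is done.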
What the subsystem does when it receives a suspend command, and when it
considers itself suspended, will vary, but in the common case it will
immediately start rejecting external requests (e.g. Undertow will start
responding with a 503 to all new requests). The subsystem will also
track the number of outstanding requests, and when this hits zero the
subsystem will notify the controller that it has successfully
suspended.
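
The common case described above could be sketched roughly as follows.
This is not Undertow code: RequestGate and all of its methods are
hypothetical names, and the onDrained callback stands in for whatever
the subsystem uses to notify the controller.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an entry point that rejects new requests while
// suspended and reports once the outstanding request count hits zero.
class RequestGate {
    private final AtomicInteger active = new AtomicInteger();
    private volatile boolean suspended;
    private Runnable onDrained; // guarded by "this"

    /** Returns false if the request must be rejected (e.g. with a 503). */
    boolean tryBegin() {
        // Count first, then check the flag, so a concurrent suspend()
        // cannot miss a request that is just being admitted.
        active.incrementAndGet();
        if (suspended) {
            end();
            return false;
        }
        return true;
    }

    /** Must be called exactly once for every request that was admitted. */
    void end() {
        if (active.decrementAndGet() == 0 && suspended) {
            notifyDrained();
        }
    }

    void suspend(Runnable drainedCallback) {
        synchronized (this) {
            onDrained = drainedCallback;
        }
        suspended = true;
        if (active.get() == 0) {
            notifyDrained();
        }
    }

    synchronized void resume() {
        suspended = false;
        onDrained = null;
    }

    // Runs the callback at most once per suspend() call.
    private synchronized void notifyDrained() {
        Runnable callback = onDrained;
        if (callback != null) {
            onDrained = null;
            callback.run();
        }
    }
}
```

A request handler would call tryBegin() on entry, send the rejection if
it returns false, and otherwise call end() in a finally block.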
Some subsystems will obviously want to do other actions on suspend, e.g.
clustering will likely want to fail over, mod_cluster will notify the
load balancer that the node is no longer available, etc. In some cases we
may want to make this configurable to an extent (e.g. Undertow could be
configured to allow requests with an existing session, and not consider
itself suspended until all sessions have either timed out or been
invalidated, although this will obviously take a while).
If anyone has any feedback, let me know. In terms of implementation my
basic plan is to get the core functionality and the Undertow
implementation into WildFly, and then work with subsystem authors to
implement subsystem-specific functionality once the core is in place.
Stuart