[wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Andrig Miller anmiller at redhat.com
Tue Jun 10 08:50:52 EDT 2014



----- Original Message -----
> From: "Michael Musgrove" <mmusgrov at redhat.com>
> To: "Jason T. Greene" <jgreene at redhat.com>, "Stuart Douglas" <stuart.w.douglas at gmail.com>
> Cc: "Wildfly Dev mailing list" <wildfly-dev at lists.jboss.org>, "Stuart Douglas" <sdouglas at redhat.com>
> Sent: Tuesday, June 10, 2014 3:51:02 AM
> Subject: Re: [wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)
> 
> I agree with Stuart, it should wait for the transaction to finish
> before shutting down.
> 
> And yes (with caveats), Jason, when the timeout is reached our
> transaction reaper will abort the transaction. However, if the
> transaction was started with a timeout value of 0 it will never
> abort. Also, if the suspend happens when there are prepared
> transactions then it's too late to cancel the transaction and they
> will be recovered when the system is resumed.
> 

If the transaction will never abort, can we force a rollback?  Otherwise this could lead to a never-ending "graceful" shutdown.

Andy

> Note also that suspending before an in-flight transaction has
> prepared is probably safe since the resource will either:
> 
> - roll back the branch if all connections to the db are closed (when
> the system suspends); or
> - roll back the branch if the XAResource timeout (set via
> XAResource.setTransactionTimeout()) is reached
> 
> [And since it was never prepared we have no log record for it so we
> would not do anything on resume]
> 
> Mike
> 
> > IIRC the behavior for a tx timeout is a rollback, but we should
> > check that.
> >
> >> On Jun 9, 2014, at 6:50 PM, Stuart Douglas
> >> <stuart.w.douglas at gmail.com> wrote:
> >>
> >> Something I forgot to mention is that we will need a switch to
> >> turn this off, as there is a small but noticeable cost with
> >> tracking in flight requests.
> >>
> >>
> >>> On 9 Jun 2014, at 18:04, Andrig Miller <anmiller at redhat.com>
> >>> wrote:
> >>>
> >>> What I am bringing up is more subsystem specific, but it might be
> >>> valuable to think about.  If the graceful shutdown times out,
> >>> what behavior would we consider correct for an in-flight
> >>> transaction?
> >> It waits for the transaction to finish before shutting down.
> >>
> >> Stuart
> >>
> >>> Should it be a forced rollback, so that when the server is
> >>> started back up, the transaction manager will not find in the
> >>> log a transaction to be recovered?
> >>>
> >>> Or, should it be considered the same as a crashed state, where
> >>> transactions should be recoverable, and the recovery manager
> >>> would try to recover the transaction?
> >>>
> >>> I would lean towards the first, as this would be considered
> >>> graceful by the administrator, and leaving a transaction in a
> >>> state where it would be recovered on a restart doesn't seem
> >>> graceful to me.
> >>>
> >>> Andy
> >>>
> >>> ----- Original Message -----
> >>>> From: "Stuart Douglas" <sdouglas at redhat.com>
> >>>> To: "Wildfly Dev mailing list" <wildfly-dev at lists.jboss.org>
> >>>> Sent: Monday, June 9, 2014 4:38:55 PM
> >>>> Subject: [wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)
> >>>>
> >>>> Server suspend and resume is a feature that allows a running
> >>>> server to gracefully finish off all running requests. The most
> >>>> common use case for this is graceful shutdown, where you would
> >>>> like a server to complete all running requests, reject any new
> >>>> ones, and then shut down. However, there are also plenty of
> >>>> other valid use cases (e.g. suspend the server, modify a data
> >>>> source or some other config, then resume).
> >>>>
> >>>> User View:
> >>>>
> >>>> From the user's point of view, two new operations will be added
> >>>> to the server:
> >>>>
> >>>> suspend(timeout)
> >>>> resume()
> >>>>
> >>>> A runtime-only attribute suspend-state (is this a good name?)
> >>>> will also be added, which can take one of three possible values:
> >>>> RUNNING, SUSPENDING, or SUSPENDED.
> >>>>
> >>>> A timeout attribute will also be added to the shutdown
> >>>> operation. If this is present then the server will first be
> >>>> suspended, and the server will not shut down until either the
> >>>> suspend is successful or the timeout occurs. If no timeout
> >>>> parameter is passed to the operation then a normal non-graceful
> >>>> shutdown will take place.
> >>>>
> >>>> In domain mode these operations will be added both to
> >>>> individual servers and to complete server groups.
> >>>>
> >>>> Implementation Details
> >>>>
> >>>> Suspend/resume operates on entry points to the server. Any
> >>>> request that is currently running must not be affected by the
> >>>> suspend state, however any new request should be rejected. In
> >>>> general subsystems will track the number of outstanding
> >>>> requests, and when this hits zero they are considered suspended.
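
The request tracking described above could look roughly like the sketch below. This is only an illustration under assumptions: the class and method names (RequestTrackerSketch, Tracker, tryBegin, end) are hypothetical and are not taken from the proposal or from WildFly. An entry point would call tryBegin() before handling a request and end() when the request finishes.

```java
// Sketch only: all names here are hypothetical, not WildFly API.
final class RequestTrackerSketch {

    /** Counts in-flight requests and fires a callback once they drain after suspend. */
    static final class Tracker {
        private final Object lock = new Object();
        private int active;          // requests currently running
        private boolean suspended;   // true once suspend() has been called
        private Runnable onDrained;  // run once when the last request ends

        /** Entry-point check: returns false if the request must be rejected. */
        boolean tryBegin() {
            synchronized (lock) {
                if (suspended) {
                    return false;
                }
                active++;
                return true;
            }
        }

        /** Must be called exactly once for every successful tryBegin(). */
        void end() {
            Runnable toRun = null;
            synchronized (lock) {
                active--;
                if (suspended && active == 0 && onDrained != null) {
                    toRun = onDrained;
                    onDrained = null;
                }
            }
            if (toRun != null) {
                toRun.run(); // run outside the lock
            }
        }

        /** Stops accepting requests; 'drained' runs when none are outstanding. */
        void suspend(Runnable drained) {
            boolean alreadyIdle;
            synchronized (lock) {
                suspended = true;
                alreadyIdle = active == 0;
                if (!alreadyIdle) {
                    onDrained = drained;
                }
            }
            if (alreadyIdle) {
                drained.run();
            }
        }

        /** Accepts requests again and drops any pending drain callback. */
        void resume() {
            synchronized (lock) {
                suspended = false;
                onDrained = null;
            }
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        System.out.println(tracker.tryBegin()); // request accepted while running
        tracker.suspend(() -> System.out.println("suspended"));
        System.out.println(tracker.tryBegin()); // new request rejected
        tracker.end();                          // last request ends, callback fires
    }
}
```

A plain lock is used here rather than an atomic counter so that the "suspended" check and the increment cannot interleave with suspend(); whether that cost is acceptable on a hot path is exactly the kind of overhead a switch to disable tracking would address.
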
> >>>>
> >>>> We will introduce the notion of a global SuspendController that
> >>>> manages the server's suspend state. All subsystems that wish to
> >>>> do a graceful shutdown register callback handlers with this
> >>>> controller.
> >>>>
> >>>> When the suspend() operation is invoked the controller will
> >>>> invoke all these callbacks, letting each subsystem know that the
> >>>> server is suspending, and providing the subsystem with a
> >>>> SuspendContext object that the subsystem can then use to notify
> >>>> the controller that the suspend is complete.
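
One possible shape for the controller and callback contract described above, as a minimal sketch. SuspendController and SuspendContext are named in the proposal, but everything else here is an assumption made for illustration: the SuspendListener interface, all method names, and the choice to have suspend() block until every subsystem reports completion or the timeout elapses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: method names and SuspendListener are hypothetical.
final class SuspendControllerSketch {

    enum State { RUNNING, SUSPENDING, SUSPENDED }

    /** Handed to each subsystem so it can report that it has suspended. */
    interface SuspendContext {
        void complete();
    }

    /** Callback handler a subsystem registers with the controller. */
    interface SuspendListener {
        void suspendStarted(SuspendContext context);
        void resumed();
    }

    static final class Controller {
        private final List<SuspendListener> listeners = new CopyOnWriteArrayList<>();
        private volatile State state = State.RUNNING;

        void register(SuspendListener listener) {
            listeners.add(listener);
        }

        State state() {
            return state;
        }

        /** Returns true if every listener completed before the timeout. */
        boolean suspend(long timeout, TimeUnit unit) {
            state = State.SUSPENDING;
            List<SuspendListener> snapshot = new ArrayList<>(listeners);
            CountDownLatch pending = new CountDownLatch(snapshot.size());
            for (SuspendListener listener : snapshot) {
                // Guard so a listener that calls complete() twice counts once.
                AtomicBoolean reported = new AtomicBoolean();
                listener.suspendStarted(() -> {
                    if (reported.compareAndSet(false, true)) {
                        pending.countDown();
                    }
                });
            }
            try {
                if (pending.await(timeout, unit)) {
                    state = State.SUSPENDED;
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return false; // timed out or interrupted: still SUSPENDING
        }

        void resume() {
            state = State.RUNNING;
            for (SuspendListener listener : listeners) {
                listener.resumed();
            }
        }
    }

    public static void main(String[] args) {
        Controller controller = new Controller();
        controller.register(new SuspendListener() {
            @Override public void suspendStarted(SuspendContext context) {
                context.complete(); // nothing outstanding, report at once
            }
            @Override public void resumed() { }
        });
        System.out.println(controller.suspend(1, TimeUnit.SECONDS));
        System.out.println(controller.state());
        controller.resume();
        System.out.println(controller.state());
    }
}
```

In this sketch a subsystem that reports completion after the timeout has elapsed does not move the controller to SUSPENDED; a real implementation would have to decide how to handle that case.
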
> >>>>
> >>>> What the subsystem does when it receives a suspend command, and
> >>>> when it considers itself suspended, will vary, but in the common
> >>>> case it will immediately start rejecting external requests (e.g.
> >>>> Undertow will start responding with a 503 to all new requests).
> >>>> The subsystem will also track the number of outstanding
> >>>> requests, and when this hits zero the subsystem will notify the
> >>>> controller that it has successfully suspended.
> >>>>
> >>>> Some subsystems will obviously want to do other actions on
> >>>> suspend, e.g. clustering will likely want to fail over,
> >>>> mod_cluster will notify the load balancer that the node is no
> >>>> longer available, etc. In some cases we may want to make this
> >>>> configurable to an extent (e.g. Undertow could be configured to
> >>>> allow requests with an existing session, and not consider itself
> >>>> suspended until all sessions have either timed out or been
> >>>> invalidated, although this will obviously take a while).
> >>>>
> >>>> If anyone has any feedback let me know. In terms of
> >>>> implementation my basic plan is to get the core functionality
> >>>> and the Undertow implementation into WildFly, and then work with
> >>>> subsystem authors to implement subsystem-specific functionality
> >>>> once the core is in place.
> >>>>
> >>>> Stuart
> >>>>
> >>>> _______________________________________________
> >>>> wildfly-dev mailing list
> >>>> wildfly-dev at lists.jboss.org
> >>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
> 
> 
> --
> Michael Musgrove
> Transactions Team
> e: mmusgrov at redhat.com
> t: +44 191 243 0870
> 
> Registered in England and Wales under Company Registration No.
> 03798903
> Directors: Michael Cunningham (US), Charles Peters (US), Matt Parson
> (US), Michael O'Neill(Ireland)
> 
> 