Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas

Monday, 9 June 2014

5:38 p.m.

Server suspend and resume is a feature that allows a running server to gracefully finish all running requests. The most common use case for this is graceful shutdown, where you would like a server to complete all running requests, reject any new ones, and then shut down. However, there are also plenty of other valid use cases (e.g. suspend the server, modify a data source or some other config, then resume).

User View:

From the user's point of view two new operations will be added to the server:

suspend(timeout)
resume()

A runtime-only attribute suspend-state (is this a good name?) will also be added, which can take one of three possible values: RUNNING, SUSPENDING, SUSPENDED.

A timeout attribute will also be added to the shutdown operation. If this is present then the server will first be suspended, and the server will not shut down until either the suspend is successful or the timeout occurs. If no timeout parameter is passed to the operation then a normal non-graceful shutdown will take place.

In domain mode these operations will be added to both individual servers and complete server groups.

Implementation Details

Suspend/resume operates on entry points to the server. Any request that is currently running must not be affected by the suspend state, however any new request should be rejected. In general subsystems will track the number of outstanding requests, and when this hits zero they are considered suspended.

We will introduce the notion of a global SuspendController that manages the server's suspend state. All subsystems that wish to do a graceful shutdown register callback handlers with this controller.

When the suspend() operation is invoked the controller will invoke all these callbacks, letting the subsystem know that the server is suspending, and providing the subsystem with a SuspendContext object that the subsystem can then use to notify the controller that the suspend is complete.
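The controller/callback arrangement described above could be sketched roughly as follows. This is only an illustration under assumptions: the names SuspendController and SuspendContext and the three state values come from the proposal, but SuspendCallback, every method signature, and the generation counter are hypothetical and are not the eventual WildFly API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** The three values proposed for the suspend-state attribute. */
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

/** Handed to a subsystem so it can report that it has finished suspending (assumed shape). */
interface SuspendContext {
    void suspendComplete();
}

/** Hypothetical callback that a subsystem registers with the controller. */
interface SuspendCallback {
    void suspend(SuspendContext context);
    void resume();
}

final class SuspendController {
    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private SuspendState state = SuspendState.RUNNING;
    private int pending;     // callbacks that have not yet reported completion
    private int generation;  // lets us ignore reports from an earlier suspend cycle

    synchronized void register(SuspendCallback callback) { callbacks.add(callback); }

    synchronized SuspendState getState() { return state; }

    synchronized void suspend() {
        if (state != SuspendState.RUNNING) return;
        state = SuspendState.SUSPENDING;
        final int gen = ++generation;
        List<SuspendCallback> snapshot = new ArrayList<>(callbacks);
        pending = snapshot.size();
        if (pending == 0) { state = SuspendState.SUSPENDED; return; }
        for (SuspendCallback callback : snapshot) {
            // Each context is one-shot, so a double report cannot be counted twice.
            AtomicBoolean reported = new AtomicBoolean();
            callback.suspend(() -> {
                if (reported.compareAndSet(false, true)) completed(gen);
            });
        }
    }

    synchronized void resume() {
        if (state == SuspendState.RUNNING) return;
        generation++; // invalidates contexts handed out by the previous suspend
        state = SuspendState.RUNNING;
        for (SuspendCallback callback : new ArrayList<>(callbacks)) callback.resume();
    }

    private synchronized void completed(int gen) {
        if (gen != generation || state != SuspendState.SUSPENDING) return;
        if (--pending == 0) state = SuspendState.SUSPENDED;
    }
}
```

The sketch invokes callbacks while holding the controller lock for brevity; a real implementation would probably avoid that and would also need the timeout handling discussed in the proposal.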
What the subsystem does when it receives a suspend command, and when it considers itself suspended, will vary, but in the common case it will immediately start rejecting external requests (e.g. Undertow will start responding with a 503 to all new requests). The subsystem will also track the number of outstanding requests, and when this hits zero the subsystem will notify the controller that it has successfully suspended.

Some subsystems will obviously want to do other actions on suspend, e.g. clustering will likely want to fail over, mod_cluster will notify the load balancer that the node is no longer available, etc. In some cases we may want to make this configurable to an extent (e.g. Undertow could be configured to allow requests with an existing session, and not consider itself suspended until all sessions have either timed out or been invalidated, although this will obviously take a while).

If anyone has any feedback let me know. In terms of implementation my basic plan is to get the core functionality and the Undertow implementation into WildFly, and then work with subsystem authors to implement subsystem-specific functionality once the core is in place.

Stuart
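The "reject new requests, count outstanding ones, report when the count hits zero" behaviour described in this message might look something like the sketch below. RequestTracker and all of its methods are hypothetical names used only for illustration; this is not Undertow code, and the 503/200 return values merely stand in for whatever a real entry point would send.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-entry-point tracker; not part of Undertow or WildFly. */
final class RequestTracker {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicReference<Runnable> drainedListener = new AtomicReference<>();
    private volatile boolean suspended;

    /** Returns false if the request must be rejected because the server is suspending. */
    boolean tryBegin() {
        if (suspended) return false;
        active.incrementAndGet();
        if (suspended) { // suspend() raced with us: back out and reject
            end();
            return false;
        }
        return true;
    }

    /** Must be called exactly once for every successful tryBegin(). */
    void end() {
        if (active.decrementAndGet() == 0 && suspended) notifyDrained();
    }

    /** Starts rejecting new requests; the listener runs once when no requests remain. */
    void suspend(Runnable onDrained) {
        drainedListener.set(onDrained);
        suspended = true;
        if (active.get() == 0) notifyDrained();
    }

    void resume() {
        suspended = false;
        drainedListener.set(null);
    }

    /** Illustrative entry point: 503 while suspended, otherwise run the work and return 200. */
    int handle(Runnable work) {
        if (!tryBegin()) return 503;
        try {
            work.run();
            return 200;
        } finally {
            end();
        }
    }

    private void notifyDrained() {
        Runnable listener = drainedListener.getAndSet(null); // fire at most once
        if (listener != null) listener.run();
    }
}
```

In terms of the proposal, the listener passed to suspend() is where a subsystem would tell the controller, through its SuspendContext, that the suspend is complete.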

Andrig Miller

Monday, 9 June

6:04 p.m.

What I am bringing up is more subsystem specific, but it might be valuable to think about. If the graceful shutdown times out, what behavior would we consider correct for an in-flight transaction?

Should it be a forced rollback, so that when the server is started back up, the transaction manager will not find a transaction to be recovered in the log?

Or should it be considered the same as a crashed state, where transactions should be recoverable, and the recovery manager would try to recover the transaction?

I would lean towards the first, as this would be considered graceful by the administrator, and having a transaction in a state where it would be recovered on a restart doesn't seem graceful to me.

Andy

Stuart Douglas

6:48 p.m.

Something I forgot to mention is that we will need a switch to turn this off, as there is a small but noticeable cost to tracking in-flight requests.

On 9 Jun 2014, at 18:04, Andrig Miller <anmiller(a)redhat.com> wrote:

> What I am bringing up is more subsystem specific, but it might be valuable to think about. In case of the time out of the graceful shutdown, what behavior would we consider correct in terms of an inflight transaction?

It waits for the transaction to finish before shutting down.

Stuart

Jason T. Greene

9:01 p.m.

IIRC the behavior for a tx timeout is a rollback, but we should check that.

Michael Musgrove

Tuesday, 10 June

4:51 a.m.

I agree with Stuart, it should wait for the transaction to finish before shutting down. And yes (with caveats), Jason, when the timeout is reached our transaction reaper will abort the transaction. However, if the transaction was started with a timeout value of 0 it will never abort. Also, if the suspend happens when there are prepared transactions then it's too late to cancel them, and they will be recovered when the system is resumed.

Note also that suspending before an in-flight transaction has prepared is probably safe since the resource will either:

- roll back the branch if all connections to the db are closed (when the system suspends); or
- roll back the branch if the XAResource timeout value (set via XAResource.setTransactionTimeout()) is reached

[And since it was never prepared we have no log record for it so we would not do anything on resume]

Mike
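The cases in this message can be restated as a small decision table. The sketch below is only a model of what the message says; TxPhase, SuspendOutcome and TxSuspendModel are made-up names, not Narayana or WildFly API, and the "probably safe" case is encoded as a plain rollback even though the message itself hedges on it.

```java
/** Illustrative phases of a transaction at the moment the server suspends. */
enum TxPhase { ACTIVE, PREPARED }

/** Illustrative outcomes, named after the cases in the message above. */
enum SuspendOutcome {
    /** Never prepared: the resource rolls back the branch and no log record exists. */
    BRANCH_ROLLED_BACK_BY_RESOURCE,
    /** Already prepared: too late to cancel, so it is recovered when the system resumes. */
    RECOVERED_ON_RESUME
}

final class TxSuspendModel {
    private TxSuspendModel() { }

    /** What the message expects to happen to a transaction caught by a suspend. */
    static SuspendOutcome outcomeIfSuspended(TxPhase phase) {
        switch (phase) {
            case PREPARED:
                return SuspendOutcome.RECOVERED_ON_RESUME;
            case ACTIVE:
                return SuspendOutcome.BRANCH_ROLLED_BACK_BY_RESOURCE;
            default:
                throw new IllegalArgumentException("unknown phase: " + phase);
        }
    }

    /** A transaction started with a timeout of 0 is never aborted by the reaper. */
    static boolean reaperWillAbort(int timeoutSeconds) {
        return timeoutSeconds > 0;
    }
}
```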

-- Michael Musgrove Transactions Team e: mmusgrov(a)redhat.com t: +44 191 243 0870 Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham (US), Charles Peters (US), Matt Parson (US), Michael O'Neill(Ireland)

Andrig Miller

7:50 a.m.

From: "Michael Musgrove" <mmusgrov(a)redhat.com>
To: "Jason T. Greene" <jgreene(a)redhat.com>, "Stuart Douglas" <stuart.w.douglas(a)gmail.com>
Cc: "Wildfly Dev mailing list" <wildfly-dev(a)lists.jboss.org>, "Stuart Douglas" <sdouglas(a)redhat.com>
Sent: Tuesday, June 10, 2014 3:51:02 AM
Subject: Re: [wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

> I agree with Stuart, it should wait for the transaction to finish before shutting down. And yes (with caveats), Jason, when the timeout is reached our transaction reaper will abort the transaction. However, if the transaction was started with a timeout value of 0 it will never abort. Also, if the suspend happens when there are prepared transactions then it's too late to cancel the transaction and they will be recovered when the system is resumed.

If the transaction will never abort, can we force a rollback? This could lead to a never-ending "graceful" shutdown.

Andy

Stuart Douglas

8:13 a.m.

...

If the transaction will never abort, can we force a rollback? This could lead to a never ending "graceful" shutdown.

That is why we have a timeout: once the timeout expires, the server will shut down anyway. We should probably have some kind of post-timeout callback that gets invoked, so the TX subsystem could potentially take some action.

I'm not sure what the best action would be; the subsystem-specific details will be up to the team that maintains the subsystem.

Stuart
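The timeout plus post-timeout callback idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `TimedSuspend`, `PostTimeoutCallback` and `awaitSuspend` are hypothetical names invented here, not part of WildFly, and the latch stands in for whatever signal the controller uses to learn that every subsystem has suspended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from WildFly.
class TimedSuspend {
    /** Invoked only when the suspend did not complete before the timeout. */
    interface PostTimeoutCallback {
        void onSuspendTimeout();
    }

    private final List<PostTimeoutCallback> callbacks = new ArrayList<>();

    void register(PostTimeoutCallback callback) {
        callbacks.add(callback);
    }

    /**
     * Waits until allSuspended reaches zero or the timeout elapses.
     * Returns true if the suspend completed in time. Otherwise it runs the
     * post-timeout callbacks and returns false. An interrupted wait is
     * treated like a timeout. Either way the caller may then shut down.
     */
    boolean awaitSuspend(CountDownLatch allSuspended, long timeout, TimeUnit unit) {
        try {
            if (allSuspended.await(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (PostTimeoutCallback callback : callbacks) {
            callback.onSuspendTimeout();
        }
        return false;
    }
}
```

In this sketch a transaction subsystem would call `register(...)` with whatever last-resort action its maintainers choose; the thread leaves that choice open.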

...

Andy

> Note also that suspending before an in-flight transaction has
> prepared is probably safe since the resource will either:
>
> - rollback the branch if all connections to the db are closed (when
>   the system suspends); or
> - rollback the branch if the XAResource timeout (set via the
>   XAResource.setTransactionTimeout()) value is reached
>
> [And since it was never prepared we have no log record for it so we
> would not do anything on resume]
>
> Mike
>
>> IIRC the behavior for a tx timeout is a rollback, but we should
>> check that.
>>
>>> On Jun 9, 2014, at 6:50 PM, Stuart Douglas
>>> <stuart.w.douglas(a)gmail.com> wrote:
>>>
>>> Something I forgot to mention is that we will need a switch to
>>> turn this off, as there is a small but noticeable cost with
>>> tracking in flight requests.
>>>
>>>> On 9 Jun 2014, at 18:04, Andrig Miller <anmiller(a)redhat.com>
>>>> wrote:
>>>>
>>>> What I am bringing up is more subsystem specific, but it might be
>>>> valuable to think about. In case of the time out of the
>>>> graceful shutdown, what behavior would we consider correct in
>>>> terms of an inflight transaction?
>>> It waits for the transaction to finish before shutting down.
>>>
>>> Stuart
>>>
>>>> Should it be a forced rollback, so that when the server is
>>>> started back up, the transaction manager will not find in the
>>>> log a transaction to be recovered?
>>>>
>>>> Or, should it be considered the same as a crashed state, where
>>>> transactions should be recoverable, and the recover manager
>>>> would try to recover the transaction?
>>>>
>>>> I would lean towards the first, as this would be considered
>>>> graceful by the administrator, and having a transaction be in a
>>>> state where it would be recovered on a restart, doesn't seem
>>>> graceful to me.

Mark Little

Monday, 7 July

10:57 a.m.

Are we talking about really shutting down the app server or just suspending it, with a view to resuming activity later? The answer is different for these two scenarios.

If we just consider the former, then since the transaction system can cope with a crash failure and ensure consistency, technically we could just SIGKILL (or equivalent) and let it tidy up later. Of course we don't want to do that because we could end up with heuristics etc. So I assume what we're trying to achieve is as graceful a shutdown as possible, reducing the number of log recoveries and amount of tidying up we need to do.

We should wait for the transaction to terminate and prevent any new transactions from starting. There's a mechanism in Narayana which allows it to be started/stopped (or suspended/resumed) so that no new transactions can be created once the switch is flicked. Those inflight transactions that haven't been prepared *could* have setRollbackOnly called on them to force them to end quicker (potentially add a JBoss specific option to change the timeout value as well - speed up time, but that would be risky). Those transactions which get past prepare need to be waited upon, but obviously there's a possibility they could block. We have some interrupt code in the transaction reaper to help there and that should be sufficient.

Using a combination of the above would help to shorten the time to wait in the case there are a lot of inflight transactions.

Mark.

On 10 Jun 2014, at 14:13, Stuart Douglas <sdouglas(a)redhat.com> wrote:

...


--- Mark Little mlittle(a)redhat.com JBoss, by Red Hat Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parsons (USA) and Michael O'Neill (Ireland).

Stuart Douglas

11:34 p.m.

Mark Little wrote:

...

Are we talking about really shutting down the app server or just suspending it, with a view to resuming activity later? The answer is different for these two scenarios.

It could be either.

...

If we just consider the former, then since the transaction system can cope with a crash failure and ensure consistency, technically we could just SIGKILL (or equivalent) and let it tidy up later. Of course we don't want to do that because we could end up with heuristics etc. So I assume what we're trying to achieve is as graceful a shutdown as possible, reducing the number of log recoveries and amount of tidying up we need to do.

In the case of a fully graceful shutdown the transaction subsystem should not have to do anything: new requests will be rejected at the connector (or entry point) level, and existing requests will continue to run as normal. The question is what to do if the requests don't finish within some timeout. In this case, from a transactions point of view, I think recovery is inevitable.
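Rejecting new requests at the entry point while letting existing ones finish could look something like the sketch below. It is a minimal illustration, assuming a single in-flight counter per entry point; `EntryPointGate` and its methods are hypothetical names, not WildFly or Undertow API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of request tracking at one entry point.
class EntryPointGate {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicBoolean notified = new AtomicBoolean();
    private volatile boolean suspended;
    private volatile Runnable onSuspended = () -> { };

    /** Returns false if the request must be rejected (for HTTP, e.g. a 503). */
    boolean tryBegin() {
        if (suspended) {
            return false;
        }
        inFlight.incrementAndGet();
        if (suspended) {
            // A suspend raced with this request: undo the increment and reject.
            end();
            return false;
        }
        return true;
    }

    /** Must be called exactly once for every successful tryBegin(). */
    void end() {
        if (inFlight.decrementAndGet() == 0 && suspended) {
            notifyOnce();
        }
    }

    /** Starts rejecting new requests; listener runs once nothing is in flight. */
    void suspend(Runnable listener) {
        onSuspended = listener;
        suspended = true;
        if (inFlight.get() == 0) {
            notifyOnce();
        }
    }

    void resume() {
        suspended = false;
        notified.set(false);
    }

    private void notifyOnce() {
        if (notified.compareAndSet(false, true)) {
            onSuspended.run();
        }
    }
}
```

Requests already past `tryBegin()` are never touched by the suspend, so anything they do further down, including transactions, runs as normal until they call `end()`.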

...

We should wait for the transaction to terminate and prevent any new transactions from starting. There's a mechanism in Narayana which allows it to be started/stopped (or suspended/resumed) so that no new transactions can be created once the switch is flicked. Those inflight transactions that haven't been prepared *could* have setRollbackOnly called on them to force them to end quicker (potentially add a JBoss specific option to change the timeout value as well - speed up time, but that would be risky). Those transactions which get past prepare need to be waited upon, but obviously there's a possibility they could block. We have some interrupt code in the transaction reaper to help there and that should be sufficient. Using a combination of the above would help to shorten the time to wait in the case there are a lot of inflight transactions.

The problem is that this should only happen after the initial timeout has been hit. Before that the server has to work as normal for all currently running requests. You could potentially do this after the initial timeout has been hit, but then you would need a second timeout to see if it has worked. This makes me think it is worth the extra complexity, especially given that it is not going to be very graceful anyway.

Stuart
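The two-stage variant discussed here (leave everything alone until the first timeout, then mark unprepared transactions rollback-only and wait a second time) might be sketched like this. All names are assumptions made for illustration: `InFlightTx` is a stand-in interface, not a Narayana or JTA type, and `drained` represents whatever check reports that no requests remain. The polling loop is chosen only to keep the sketch short.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; InFlightTx is a stand-in, not a Narayana or JTA type.
class TwoStageShutdown {
    interface InFlightTx {
        boolean isPrepared();
        void setRollbackOnly();
    }

    enum Outcome { GRACEFUL, DRAINED_AFTER_ROLLBACK, NEEDS_RECOVERY }

    static Outcome run(BooleanSupplier drained, List<InFlightTx> inFlight,
                       long firstTimeoutMs, long secondTimeoutMs) {
        // Stage 1: running requests are left completely untouched.
        if (await(drained, firstTimeoutMs)) {
            return Outcome.GRACEFUL;
        }
        // Stage 2: only now hurry along transactions that have not prepared.
        for (InFlightTx tx : inFlight) {
            if (!tx.isPrepared()) {
                tx.setRollbackOnly();
            }
        }
        if (await(drained, secondTimeoutMs)) {
            return Outcome.DRAINED_AFTER_ROLLBACK;
        }
        // Whatever is still running is left to transaction recovery.
        return Outcome.NEEDS_RECOVERY;
    }

    // Polls the condition until it holds or the timeout elapses.
    private static boolean await(BooleanSupplier condition, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The sketch makes the cost visible: a second timeout value has to be chosen and configured, and the final outcome can still be recovery.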

...


Mark Little

Friday, 18 July

6:28 a.m.

Apologies for the delay in replying, but I forgot to check this folder (ping me directly in future too) ;)

On 8 Jul 2014, at 05:34, Stuart Douglas <stuart.w.douglas(a)gmail.com> wrote:

...

Mark Little wrote:
> Are we talking about really shutting down the app server or just suspending it, with a view to resuming activity later? The answer is different for these two scenarios.

It could be either.

> If we just consider the former, then since the transaction system can cope with a crash failure and ensure consistency, technically we could just SIGKILL (or equivalent) and let it tidy up later. Of course we don't want to do that because we could end up with heuristics etc. So I assume what we're trying to achieve is as graceful a shutdown as possible, reducing the number of log recoveries and amount of tidying up we need to do.

In the case of a fully graceful shutdown the transaction subsystem should not have to do anything, new requests will be rejected at the connector (or entry point) level, and existing requests will continue to run as normal. The question is what to do if the requests don't finish within some timeout, in this case from a transactions point of view I think recovery is inevitable.

The transaction system should be able to treat this just as a crash failure. Though of course a graceful shutdown should be attempted first.

...

> We should wait for the transaction to terminate and prevent any new transactions from starting. There's a mechanism in Narayana which allows it to be started/stopped (or suspended/resumed) so that no new transactions can be created once the switch is flicked. Those inflight transactions that haven't been prepared *could* have setRollbackOnly called on them to force them to end quicker (potentially add a JBoss specific option to change the timeout value as well - speed up time, but that would be risky). Those transactions which get past prepare need to be waited upon, but obviously there's a possibility they could block. We have some interrupt code in the transaction reaper to help there and that should be sufficient.
>
> Using a combination of the above would help to shorten the time to wait in the case there are a lot of inflight transactions.

The problem is that this should only happen after the initial timeout has been hit. Before that the server has to work as normal for all currently running requests. You could potentially do this after the initial timeout has been hit, but then you would need a second timeout to see if it has worked. This makes me think it is worth the extra complexity, especially given that it is not going to be very graceful anyway.

In HP-AS (HP Application Server) "back in the day", we simply prevented the creation of further transactions immediately the app server was going down, ran through termination of other services such as CORBA etc. and then allowed the process to terminate. Any inflight transactions were tidied up by recovery. The other stuff I mentioned, such as setRollbackOnly, would only be applicable if we wanted to provide a more graceful way for applications to terminate or to prevent heuristics, which could happen if there's a failure after the prepare phase. The former is a nice-to-have, whereas the latter is definitely something we should be worried about.

Mark.
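The approach described above (stop new transactions, mark unprepared ones rollback-only, wait only for prepared ones) can be sketched roughly as follows. This is a toy model under stated assumptions: TxGate, its Phase enum and all method names are invented for illustration and are not the Narayana or JTA API, and the check in begin() is not race-free against a concurrent close().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not the Narayana or JTA API. */
final class TxGate {
    enum Phase { ACTIVE, ROLLBACK_ONLY, PREPARED }

    private final Map<String, Phase> inFlight = new ConcurrentHashMap<>();
    private volatile boolean accepting = true;

    /** Returns false once the gate is closed, so no new transaction starts. */
    public boolean begin(String txId) {
        if (!accepting) {
            return false;
        }
        inFlight.put(txId, Phase.ACTIVE);
        return true;
    }

    /** Records that a transaction got past prepare; ignored if it is rollback-only. */
    public void prepared(String txId) {
        inFlight.replace(txId, Phase.ACTIVE, Phase.PREPARED);
    }

    /** Records that a transaction committed or rolled back. */
    public void finished(String txId) {
        inFlight.remove(txId);
    }

    /** Flicks the switch: from now on begin() is rejected. */
    public void close() {
        accepting = false;
    }

    /**
     * Marks every transaction that has not prepared as rollback-only and
     * returns how many prepared transactions still have to be waited for.
     */
    public int markUnpreparedRollbackOnly() {
        inFlight.replaceAll((id, phase) -> phase == Phase.ACTIVE ? Phase.ROLLBACK_ONLY : phase);
        int prepared = 0;
        for (Phase phase : inFlight.values()) {
            if (phase == Phase.PREPARED) {
                prepared++;
            }
        }
        return prepared;
    }

    public Phase phaseOf(String txId) {
        return inFlight.get(txId);
    }
}
```

In terms of the discussion, close() corresponds to what would happen as soon as shutdown starts, while markUnpreparedRollbackOnly() is the optional extra step whose timing (before or after the first timeout) is the point under debate.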


Scott Marlow

Monday, 9 June Mon, 9 Jun

9:35 p.m.

On 06/09/2014 06:38 PM, Stuart Douglas wrote:

...

The SuspendController "state" might be a shorter attribute name and just as meaningful. When are we in the RUNNING state? Is that simply the pre-state for SUSPENDING?

...

> A timeout attribute will also be added to the shutdown operation. If this is present then the server will first be suspended, and the server will not shut down until either the suspend is successful or the timeout occurs. If no timeout parameter is passed to the operation then a normal non-graceful shutdown will take place.

Will a non-graceful shutdown wait for non-daemon threads, or terminate immediately (call System.exit())?


Stuart Douglas

9:40 p.m.

Scott Marlow wrote:

...

> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>> A runtime only attribute suspend-state (is this a good name?) will also be added, that can take one of three possible values, RUNNING, SUSPENDING, SUSPENDED.
>
> The SuspendController "state" might be a shorter attribute name and just as meaningful.

This will be in the global server namespace (i.e. from the CLI, :read-attribute(name="suspend-state")). I think the name 'state' is just too generic: which kind of state are we talking about?

...

> When are we in the RUNNING state? Is that simply the pre-state for SUSPENDING?

99.99% of the time. Basically servers are always running unless they have been explicitly suspended, and then they go from suspending to suspended. Note that if resume is called at any time the server goes to RUNNING again immediately, as when subsystems are notified they should be able to begin accepting requests again straight away. We also have admin-only mode, which is a kinda similar concept, so we need to make sure we document the differences.
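The transitions described here can be written down as a small state machine. This is only a sketch: the class name SuspendStateMachine and its methods are made up for illustration and are not the proposed SuspendController API; only the three state names come from the proposal.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; not the proposed SuspendController. */
final class SuspendStateMachine {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    // Servers start out RUNNING and stay there unless explicitly suspended.
    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);

    /** RUNNING -> SUSPENDING; returns false if a suspend is already under way or done. */
    public boolean suspend() {
        return state.compareAndSet(State.RUNNING, State.SUSPENDING);
    }

    /** SUSPENDING -> SUSPENDED, once all outstanding requests have finished. */
    public boolean suspendComplete() {
        return state.compareAndSet(State.SUSPENDING, State.SUSPENDED);
    }

    /** From any state straight back to RUNNING. */
    public void resume() {
        state.set(State.RUNNING);
    }

    public State current() {
        return state.get();
    }
}
```

Using compare-and-set for the forward transitions means a late "suspend complete" notification that arrives after resume() has been called is simply ignored, so the server stays RUNNING.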

...

>> A timeout attribute will also be added to the shutdown operation. If this is present then the server will first be suspended, and the server will not shut down until either the suspend is successful or the timeout occurs. If no timeout parameter is passed to the operation then a normal non-graceful shutdown will take place.
>
> Will non-graceful shutdown wait for non-daemon threads or terminate immediately (call System.exit())?

It will execute the same way it does today (all services will shut down and then the server will exit).

Stuart


Scott Marlow

Tuesday, 10 June Tue, 10 Jun

7:29 a.m.

On 06/09/2014 10:40 PM, Stuart Douglas wrote:

...

I was thinking suspend-state was an attribute of SuspendController. Thanks for the explanation.


Dimitris Andreadis

9:40 a.m.

...


Stuart Douglas

10:17 a.m.

...

Dimitris Andreadis wrote:

> Why not extend the states of the existing 'server-state' attribute to:
>
> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>
> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html

Dimitris Andreadis

10:32 a.m.

Isn't RESTART_REQUIRED also orthogonal to RUNNING?

On 10/06/2014 17:17, Stuart Douglas wrote:


> They are actually orthogonal, a server can be in both RESTART_REQUIRED and any one of the suspend states.
>
> RESTART_REQUIRED is very much tied to services and the management model, while suspend/resume is a runtime only thing that should not touch the state of services.
>
> Stuart
>
> Dimitris Andreadis wrote:
>> Why not extend the states of the existing 'server-state' attribute to:
>>
>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>
>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
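To make the orthogonality argument concrete, here is a small Java sketch that keeps the two pieces of state in independent fields. The attribute names server-state and suspend-state and the state values come from the thread. The types `ServerState`, `SuspendState` and `ServerStatus` and their methods are hypothetical and are not WildFly classes.

```java
enum ServerState { STARTING, RUNNING, RESTART_REQUIRED }

enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

// Hypothetical holder: each attribute changes without touching the other.
final class ServerStatus {
    private ServerState serverState = ServerState.RUNNING;
    private SuspendState suspendState = SuspendState.RUNNING;

    // A management change that only takes effect after a restart.
    void markRestartRequired() { serverState = ServerState.RESTART_REQUIRED; }

    void suspend() { suspendState = SuspendState.SUSPENDING; }

    void suspendComplete() {
        if (suspendState == SuspendState.SUSPENDING) suspendState = SuspendState.SUSPENDED;
    }

    void resume() { suspendState = SuspendState.RUNNING; }

    ServerState serverState() { return serverState; }

    SuspendState suspendState() { return suspendState; }
}

class OrthogonalStateDemo {
    public static void main(String[] args) {
        ServerStatus status = new ServerStatus();
        status.markRestartRequired();
        status.suspend();
        status.suspendComplete();
        // prints "RESTART_REQUIRED / SUSPENDED"
        System.out.println(status.serverState() + " / " + status.suspendState());
        status.resume();
        // prints "RESTART_REQUIRED / RUNNING"
        System.out.println(status.serverState() + " / " + status.suspendState());
    }
}
```

With two fields, suspending and resuming never has to know whether a restart is pending, which is the separation argued for in the quoted reply.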

Stuart Douglas

10:40 a.m.


Dimitris Andreadis

10:47 a.m.

It seems to me RESTART_REQUIRED (or RELOAD_REQUIRED) should be a boolean on its own to simplify the state diagram.

On 10/06/2014 17:40, Stuart Douglas wrote:


> I don't think so, I think RESTART_REQUIRED means running, but I need to restart to apply management changes (I think that attribute can also be RELOAD_REQUIRED, I think the description may be a bit out of date).
>
> To accurately reflect all the possible states you would need something like:
>
> RUNNING
> PAUSING
> PAUSED
> RESTART_REQUIRED
> PAUSING_RESTART_REQUIRED
> PAUSED_RESTART_REQUIRED
> RELOAD_REQUIRED
> PAUSING_RELOAD_REQUIRED
> PAUSED_RELOAD_REQUIRED
>
> Which does not seem great, and may introduce compatibility problems for clients that are not expecting these new values.
>
> Stuart
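The list of combined states quoted above is the cross product of three suspend states and three management conditions. A short Java sketch that generates it, assuming the naming pattern used in that list; the class `CombinedStates` and its `combine` method are hypothetical helpers written only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;

class CombinedStates {
    // Builds one combined name per (management condition, suspend state) pair.
    // An empty condition stands for "no restart or reload pending".
    static List<String> combine(List<String> suspendStates, List<String> conditions) {
        List<String> result = new ArrayList<>();
        for (String condition : conditions) {
            for (String suspend : suspendStates) {
                if (condition.isEmpty()) {
                    result.add(suspend);
                } else if (suspend.equals("RUNNING")) {
                    result.add(condition); // e.g. RESTART_REQUIRED implies running
                } else {
                    result.add(suspend + "_" + condition);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = combine(
                List.of("RUNNING", "PAUSING", "PAUSED"),
                List.of("", "RESTART_REQUIRED", "RELOAD_REQUIRED"));
        // 3 x 3 = 9 values, in the same order as the quoted list.
        System.out.println(all.size() + " values: " + all);
    }
}
```

Every further independent condition multiplies the number of values a client has to understand, whereas a separate attribute adds only its own values.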

Stuart Douglas

10:50 a.m.

We can't really change that now, as it is part of our existing API.

Stuart

Dimitris Andreadis wrote:


> It seems to me RESTART_REQUIRED (or RELOAD_REQUIRED) should be a boolean on its own to simplify the state diagram.

Dimitris Andreadis

11:21 a.m.

Sure. Which justifies trying to avoid those issues in the first place ;)

On 10/06/2014 17:50, Stuart Douglas wrote:


> We can't really change that now, as it is part of our existing API.
>
> Stuart

Brian Stansberry

Wednesday, 11 June

11:21 a.m.

...

Sure. Which justifies trying to avoid those issues in the first place ;)

On 10/06/2014 17:50, Stuart Douglas wrote:
> We can't really change that now, as it is part of our existing API.
>
> Stuart
>
> Dimitris Andreadis wrote:
>> It seems to me RESTART_REQUIRED (or RELOAD_REQUIRED) should be a boolean
>> on its own to simplify the state diagram.
>>
>> On 10/06/2014 17:40, Stuart Douglas wrote:
>>> I don't think so, I think RESTART_REQUIRED means running, but I need
>>> to restart to apply management changes (I think that attribute can
>>> also be RELOAD_REQUIRED, I think the description may be a bit out of
>>> date).
>>>
>>> To accurately reflect all the possible states you would need something
>>> like:
>>>
>>> RUNNING
>>> PAUSING,
>>> PAUSED,
>>> RESTART_REQUIRED
>>> PAUSING_RESTART_REQUIRED
>>> PAUSED_RESTART_REQUIRED
>>> RELOAD_REQUIRED
>>> PAUSING_RELOAD_REQUIRED
>>> PAUSED_RELOAD_REQUIRED
>>>
>>> Which does not seem great, and may introduce compatibility problems
>>> for clients that are not expecting these new values.
>>>
>>> Stuart
>>>
>>> Dimitris Andreadis wrote:
>>>> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>>>>
>>>> On 10/06/2014 17:17, Stuart Douglas wrote:
>>>>> They are actually orthogonal, a server can be in both
>>>>> RESTART_REQUIRED and any one of the suspend states.
>>>>>
>>>>> RESTART_REQUIRED is very much tied to services and the management
>>>>> model, while suspend/resume is a runtime only thing that should not
>>>>> touch the state of services.
>>>>>
>>>>> Stuart
>>>>>
>>>>> Dimitris Andreadis wrote:
>>>>>> Why not extend the states of the existing 'server-state' attribute to:
>>>>>>
>>>>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>>>>
>>>>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html

-- Brian Stansberry Senior Principal Software Engineer JBoss by Red Hat
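The orthogonality argument in the quoted exchange can be made concrete with a small sketch. It assumes two independent enums instead of one combined attribute; the type names (`SuspendState`, `RestartRequirement`, `ServerStatus`) and the `NONE` value are hypothetical and do not come from the WildFly code base. Pairing every value of one enum with every value of the other yields the same nine combinations that Stuart's flattened list spells out by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type names: the thread only establishes that the two
// concepts vary independently, not how they would be modelled in code.
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

// NONE is an assumed name for "no reload or restart pending".
enum RestartRequirement { NONE, RELOAD_REQUIRED, RESTART_REQUIRED }

final class ServerStatus {
    final SuspendState suspendState;
    final RestartRequirement restartRequirement;

    ServerStatus(SuspendState suspendState, RestartRequirement restartRequirement) {
        this.suspendState = suspendState;
        this.restartRequirement = restartRequirement;
    }

    // Every pairing is valid because neither axis constrains the other.
    static List<ServerStatus> allCombinations() {
        List<ServerStatus> all = new ArrayList<>();
        for (RestartRequirement r : RestartRequirement.values()) {
            for (SuspendState s : SuspendState.values()) {
                all.add(new ServerStatus(s, r));
            }
        }
        return all;
    }

    @Override
    public String toString() {
        return suspendState + " + " + restartRequirement;
    }
}
```

Keeping the two axes as separate attributes means existing clients of 'server-state' never see a value they do not already know, which is the compatibility concern raised in the thread.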

Brian Stansberry

11:47 a.m.

The STARTING state in the existing attribute makes me think an equivalent thing is needed for this concept. STARTING in the existing attribute means the runtime services are possibly out of sync due to boot. Doesn't a similar problem exist with RUNNING, SUSPENDING, SUSPENDED? It's about how the server is reacting to external requests. There's some state during boot/reload when the server is not reacting normally to external requests. Perhaps that's just another condition where the server is SUSPENDED.

This leads to whether this whole mechanism can be used to provide "Graceful Startup". We have problems with this now: endpoints accept requests before everything is fully ready, leading to things like 404s because a deployment isn't installed yet.

On 6/11/14, 11:21 AM, Brian Stansberry wrote:

...

I do think these are orthogonal and should not be combined.

The existing attribute is fundamentally about how the state of the runtime services relates to the persistent configuration.

STARTING == out of sync due to still getting in sync during start
RUNNING == in sync
RELOAD_REQUIRED == out of sync, needs a reload to get in sync
RESTART_REQUIRED == out of sync, needs a full process restart to get in sync

There are two problems though with the existing attribute that exposes this:

1) It's named "server-state" on a server and "host-state" on a Host Controller. Really crappy name; way too broad.

That's fixable by creating a new attribute and making the old one an alias for compatibility purposes.

2) The RUNNING state is really poorly named.

This could perhaps be fixed by coming up with a new name and translating it back to "RUNNING" in the handlers for the legacy "server-state" and "host-state" attributes.

On 6/10/14, 11:21 AM, Dimitris Andreadis wrote:
> Sure. Which justifies trying to avoid those issues in the first place ;)

-- Brian Stansberry Senior Principal Software Engineer JBoss by Red Hat
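Brian's second fix (rename RUNNING, but translate the new name back for the legacy attributes) might look roughly like the sketch below. Everything here is an assumption made for illustration: `IN_SYNC` is an invented replacement name, and `RuntimeConfigState` and `LegacyStateAlias` are stand-ins rather than real WildFly classes or handlers. The value strings follow the spelling used in this thread.

```java
// Hypothetical enum: IN_SYNC is an invented replacement name for RUNNING.
enum RuntimeConfigState {
    STARTING("STARTING"),
    IN_SYNC("RUNNING"),
    RELOAD_REQUIRED("RELOAD_REQUIRED"),
    RESTART_REQUIRED("RESTART_REQUIRED");

    private final String legacyValue;

    RuntimeConfigState(String legacyValue) {
        this.legacyValue = legacyValue;
    }

    // Value reported through the legacy "server-state"/"host-state" attributes.
    String legacyValue() {
        return legacyValue;
    }
}

final class LegacyStateAlias {
    // Stand-in for a read handler of the legacy attribute; not a real WildFly API.
    static String readLegacyState(RuntimeConfigState current) {
        return current.legacyValue();
    }
}
```

With this shape only the new attribute exposes the new name, while a client reading "server-state" keeps receiving "RUNNING".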

Stuart Douglas

Thursday, 12 June

9:45 a.m.

Brian Stansberry wrote:

...

The STARTING state in the existing attribute makes me think an equivalent thing is needed for this concept.

This is a good idea. We could do this by just having a serverStarted() notification that gets sent to all subsystems. This would allow them to basically start in a paused state, and only allow access once the server is up.

Stuart

...

STARTING in the existing attribute means the runtime services are possibly out of sync due to boot. Doesn't a similar problem exist with RUNNING, SUSPENDING, SUSPENDED? It's about how the server is reacting to external requests. There's some state during boot/reload when the server is not reacting normally to external requests. Perhaps that's just another condition where the server is SUSPENDED.

This leads to whether this whole mechanism can be used to provide "Graceful Startup". We have problems with this now: endpoints accept requests before everything is fully ready, leading to things like 404s because a deployment isn't installed yet.
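Stuart's serverStarted() idea in this message can be sketched as an entry point that begins life paused and only opens once the notification arrives. This is a minimal illustration under stated assumptions, not the proposed implementation: `EntryPointGate` and `statusForNewRequest` are hypothetical names, and answering 503 before startup completes is an assumption borrowed from the Undertow suspend behaviour described in the original proposal.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical entry point guard; the class and method names other than
// serverStarted() are invented for this sketch.
final class EntryPointGate {
    // Starts closed so that no request is accepted during boot.
    private final AtomicBoolean accepting = new AtomicBoolean(false);

    // Called once the server has finished starting.
    void serverStarted() {
        accepting.set(true);
    }

    // Called when the server is suspended: stop accepting new requests.
    void suspend() {
        accepting.set(false);
    }

    // Called on resume: new requests are accepted again straight away.
    void resume() {
        accepting.set(true);
    }

    // Status a new request would receive: 200 when open, 503 when paused.
    int statusForNewRequest() {
        return accepting.get() ? 200 : 503;
    }
}
```

Treating "not started yet" as one more paused condition matches Brian's suggestion that boot could simply be another case in which the server is SUSPENDED.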

Jason T. Greene

Tuesday, 10 June

11:57 a.m.

You could potentially eliminate the PAUSING_*_REQUIRED states, since you typically wouldn't want to recommend restart in them

...

On Jun 10, 2014, at 10:41 AM, Stuart Douglas <stuart.w.douglas(a)gmail.com> wrote:

I don't think so. I think RESTART_REQUIRED means running, but I need to restart to apply management changes (I think that attribute can also be RELOAD_REQUIRED; the description may be a bit out of date).

To accurately reflect all the possible states you would need something like:

RUNNING
PAUSING
PAUSED
RESTART_REQUIRED
PAUSING_RESTART_REQUIRED
PAUSED_RESTART_REQUIRED
RELOAD_REQUIRED
PAUSING_RELOAD_REQUIRED
PAUSED_RELOAD_REQUIRED

Which does not seem great, and may introduce compatibility problems for clients that are not expecting these new values.

Stuart

Dimitris Andreadis wrote:
> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>
>> On 10/06/2014 17:17, Stuart Douglas wrote:
>> They are actually orthogonal, a server can be in both RESTART_REQUIRED and any one of the suspend states.
>>
>> RESTART_REQUIRED is very much tied to services and the management model, while suspend/resume is a runtime only thing that should not touch the state of services.
>>
>> Stuart
>>
>> Dimitris Andreadis wrote:
>>> Why not extend the states of the existing 'server-state' attribute to:
>>>
>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>
>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>>
>>>> On 10/06/2014 04:40, Stuart Douglas wrote:
>>>>
>>>> Scott Marlow wrote:
>>>>>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>>>>> User View:
>>>>>>
>>>>>> For the users point of view two new operations will be added to the server:
>>>>>>
>>>>>> suspend(timeout)
>>>>>> resume()
>>>>>>
>>>>>> A runtime only attribute suspend-state (is this a good name?) will also be added, that can take one of three possible values, RUNNING, SUSPENDING, SUSPENDED.
>>>>> The SuspendController "state" might be a shorter attribute name and just as meaningful.
>>>> This will be in the global server namespace (i.e. from the CLI :read-attribute(name="suspend-state")).
>>>>
>>>> I think the name 'state' is just too generic, which kind of state are we talking about?
>>>>
>>>>> When are we in the RUNNING state? Is that simply the pre-state for SUSPENDING?
>>>> 99.99% of the time. Basically servers are always running unless they have been explicitly suspended, and then they go from suspending to suspended. Note that if resume is called at any time the server goes to RUNNING again immediately, as when subsystems are notified they should be able to begin accepting requests again straight away.
>>>>
>>>> We also have admin only mode, which is a kinda similar concept, so we need to make sure we document the differences.
>>>>
>>>>>> A timeout attribute will also be added to the shutdown operation. If this is present then the server will first be suspended, and the server will not shut down until either the suspend is successful or the timeout occurs. If no timeout parameter is passed to the operation then a normal non-graceful shutdown will take place.
>>>>> Will non-graceful shutdown wait for non-daemon threads or terminate immediately (call System.exit()).
>>>> It will execute the same way it does today (all services will shut down and then the server will exit).
>>>>
>>>> Stuart
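The orthogonality argument above can be illustrated with a small sketch. It is not WildFly code: the type names `ServerState`, `SuspendState` and `StateCombinations` are hypothetical, and only the state values are taken from the thread. Keeping two independent attributes needs three values each, whereas folding them into a single attribute requires every pairing to be spelled out, which is where the nine-value list comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the existing server-state attribute
// (only the values mentioned in the thread).
enum ServerState { RUNNING, RESTART_REQUIRED, RELOAD_REQUIRED }

// Hypothetical stand-in for the proposed runtime-only suspend-state attribute.
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

final class StateCombinations {
    // Enumerates every pairing a single merged attribute would have to name.
    static List<String> all() {
        List<String> combined = new ArrayList<>();
        for (ServerState server : ServerState.values()) {
            for (SuspendState suspend : SuspendState.values()) {
                combined.add(server + "+" + suspend);
            }
        }
        return combined;
    }
}
```

With two separate attributes a client that only understands the existing server-state values keeps working, which is the compatibility concern raised above.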

Jaikiran Pai

Monday, 9 June

11:46 p.m.

...

Jaikiran Pai

Tuesday, 10 June

12:02 a.m.

One other question: when a server is put into suspend mode, is it going to trigger undeployment of certain deployments? And would that be considered the expected behaviour? More specifically, when the admin triggers a suspend, are the subsystems allowed to trigger operations which might stop the services that back the currently deployed deployments, or are they expected to keep those services in a started/UP state?

-Jaikiran

On Tuesday 10 June 2014 10:16 AM, Jaikiran Pai wrote:

...

This is more of a subsystem specific question - How are internal operations (i.e. something that doesn't exactly have an entry point /into/ the server) handled when the server is in a suspended state or when it is suspending? One such example is an EJB application which might have scheduled timer tasks associated with it. Are such timer tasks supposed to continue to run even when the server is suspended or when it is suspending? Or is the subsystem expected to shut those down too?

-Jaikiran

On Tuesday 10 June 2014 04:08 AM, Stuart Douglas wrote:

...

Stuart Douglas

7:22 a.m.

Nothing is undeployed and all services remain up. If the server is resumed it should be able to begin processing requests immediately.

Stuart

...


Stuart Douglas

7:20 a.m.

They still have an entry point: the subsystem tracks outstanding timers and will not execute new ones while suspended.

Stuart
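A minimal sketch of the entry-point tracking described here, assuming a hypothetical `EntryPointGate` class (none of these names come from WildFly). Each timer firing or request calls `tryBegin()` before running and `end()` afterwards. Once suspended, new work is refused, and the callback passed to `suspend` runs when the count of outstanding work reaches zero.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: counts in-flight work at one entry point.
final class EntryPointGate {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicReference<Runnable> onDrained = new AtomicReference<>();
    private volatile boolean suspended;

    // Returns false if the work must be rejected (e.g. skip the timer, send a 503).
    boolean tryBegin() {
        active.incrementAndGet();
        if (suspended) {
            end(); // roll back the increment; may complete the suspend
            return false;
        }
        return true;
    }

    // Must be called exactly once for every successful tryBegin().
    void end() {
        if (active.decrementAndGet() == 0 && suspended) {
            fireDrained();
        }
    }

    // Stops accepting new work; 'drained' runs once nothing is outstanding.
    void suspend(Runnable drained) {
        onDrained.set(drained);
        suspended = true;
        if (active.get() == 0) {
            fireDrained();
        }
    }

    // Accepts work again immediately; nothing was stopped, so nothing restarts.
    void resume() {
        suspended = false;
        onDrained.set(null);
    }

    // getAndSet(null) ensures the callback runs at most once per suspend.
    private void fireDrained() {
        Runnable callback = onDrained.getAndSet(null);
        if (callback != null) {
            callback.run();
        }
    }
}
```

Incrementing before checking the flag means a request that races with `suspend` is either counted as outstanding or rolled back, so the drained callback cannot fire while work is still running. Timeouts are left out of the sketch.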

...


Stuart Douglas

8:26 a.m.

Note that none of this really has anything to do with services. Suspending a server should not involve any services changing state, so that when a server is resumed it can start handling requests immediately.

Stuart

Stuart Douglas wrote:

...

Brian Stansberry

Wednesday, 11 June

11:07 a.m.

On 6/9/14, 5:38 PM, Stuart Douglas wrote:

...

I think there will need to be some mechanism for coordination between subsystems here. For example, I doubt mod_cluster will want Undertow deciding to start sending 503s before it gets a chance to get the LB sorted.

...


-- Brian Stansberry Senior Principal Software Engineer JBoss by Red Hat

Stuart Douglas

Thursday, 12 June

1:16 p.m.

...

> What the subsystem does when it receives a suspend command, and when it considers itself suspended will vary, but in the common case it will immediately start rejecting external requests (e.g. Undertow will start responding with a 503 to all new requests).

I think there will need to be some mechanism for coordination between subsystems here. For example, I doubt mod_cluster will want Undertow deciding to start sending 503s before it gets a chance to get the LB sorted.

Maybe this needs to be a two-step process. Basically preSuspend() gets called for each container, and they notify when this is done (which would be mod_cluster notifying the load balancer), then the actual suspend takes place. I don't think we need anything more complex.

Stuart
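The two-step idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `SuspendListener` and `TwoPhaseSuspendController` are hypothetical names, a plain `Runnable` stands in for the SuspendContext mentioned in the proposal, and timeouts are omitted. The controller calls `preSuspend` on every listener, waits until all of them have reported back, and only then calls `suspend`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical callback interface; 'done' stands in for a SuspendContext.
interface SuspendListener {
    void preSuspend(Runnable done); // e.g. mod_cluster tells the load balancer
    void suspend(Runnable done);    // e.g. Undertow starts answering 503
    void resume();
}

final class TwoPhaseSuspendController {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    private final List<SuspendListener> listeners = new CopyOnWriteArrayList<>();
    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);

    void register(SuspendListener listener) { listeners.add(listener); }

    State state() { return state.get(); }

    void suspend() {
        if (!state.compareAndSet(State.RUNNING, State.SUSPENDING)) {
            return; // already suspending or suspended
        }
        // Phase 1 must finish for every listener before phase 2 starts.
        runPhase(true, () -> {
            if (state.get() == State.SUSPENDING) { // skip if resumed meanwhile
                runPhase(false,
                        () -> state.compareAndSet(State.SUSPENDING, State.SUSPENDED));
            }
        });
    }

    void resume() {
        state.set(State.RUNNING);
        for (SuspendListener listener : listeners) {
            listener.resume();
        }
    }

    // Invokes one phase on all listeners; 'next' runs after the last one reports done.
    private void runPhase(boolean pre, Runnable next) {
        List<SuspendListener> snapshot = new ArrayList<>(listeners);
        if (snapshot.isEmpty()) {
            next.run();
            return;
        }
        AtomicInteger pending = new AtomicInteger(snapshot.size());
        Runnable done = () -> {
            if (pending.decrementAndGet() == 0) {
                next.run();
            }
        };
        for (SuspendListener listener : snapshot) {
            if (pre) {
                listener.preSuspend(done);
            } else {
                listener.suspend(done);
            }
        }
    }
}
```

With this shape, a load-balancer listener can hold its `done` callback until the balancer has acknowledged, and no web listener sees `suspend` before that point. The sketch assumes each listener calls `done` exactly once per phase.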

wildfly-dev@lists.jboss.org

participants (10)

Andrig Miller
Brian Stansberry
Dimitris Andreadis
Jaikiran Pai
Jason T. Greene
Mark Little
Michael Musgrove
Scott Marlow
Stuart Douglas
Stuart Douglas
