[wildfly-dev] Graceful startup

Thu Apr 21 07:50:47 EDT 2016

On 04/20/2016 09:05 PM, Stuart Douglas wrote:
> Hi,
>
> I have come across a few bug reports [1][2] (and a feature request from
> the Swarm team) recently that are essentially caused by an application
> being accessed before it is fully deployed. Basically even though we
> have service dependencies that make sure individual components
> dependencies are up, once a request has been accepted it can potentially
> programmatically access other parts of the deployment that are not up
> yet (basically the same problem we have with graceful shutdown, but in
> reverse).
>
> I propose we solve this using a 'graceful startup' mechanism, that holds
> or rejects new requests until a server or deployment is fully started.
> The specifics of how this would work are:
>
> - If the server is booting all external requests will be held or
> rejected until the the boot process is complete
> - When deploying a new deployment all requests for that deployment will
> be held or rejected until MSC has attained stability
>
> This would be implemented for the following endpoints/subsystems:
>
> - Undertow will hold requests until the deployment is done (so if you
> try and load a page while deployment is happening it could be a bit of a
> wait)
> - Remote EJB will hold requests until deployment is done
> - mod_cluster will not send availability messages until deployment is done
> - JMS will delay message delivery until deployment is done
> - EJB persistent timers will not fire until deployment is done
> - Possibly some other cases I can't think of right now
>
> One thing I am not really sure about is if we need a configuration
> switch for hold/reject behavior. e.g. for Undertow the request holding
> behavior is very developer friendly, as it means they can just hit
> refresh in their browser and as soon as the redeployment is done the
> page will display, however I am worried that it might not be ideal for
> load balancers that may prefer a quick error response that could then be
> attempted on another node (although if mod_cluster is not sending out
> availability till the deployment is 100% complete this may not be a big
> deal).

I think load balancers should not have a server that is being started up 
in the rotation anyway; this already probably doesn't work great today.

I like the idea overall.

-- 
- DML