[mod_cluster-dev] Failover failed

Thu Dec 18 15:42:37 EST 2008

On Mon, 2008-12-15 at 22:01 +0100, Bela Ban wrote:
> 
> Paul Ferraro wrote:
> >> Secondly, why is load-demo.war/WEB-INF/web.xml not marked as
> >> <distributable/> ?
> >>
> > Session replication was not part of our original scope for the demo.
> > Though - I see no reason not to enable it.
> 
> Got you
> 
> 
> >>
> >>
> >> #2
> >>
> >> * Start httpd
> >> * Start node1
> >> * Start node2
> > Which server.xml Listener are you using? ModClusterListener,
> > ModClusterService, or HAModClusterService? Only the latter two use the
> > config from mod-cluster-jboss-beans.xml.
> 
> 
> HAModClusterService
> 
> 
> >
> >> * Start the demo with 80 clients
> >> * Observe the same as above: node1 serves roughly 40 sessions and
> >> so does node2
> >> * Kill node2
> > Gracefully shutdown? or killed?
> 
> 
> CTRL-C.
> 
That will shut the server down gracefully, triggering all of the
appropriate shutdown events.
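
For context, CTRL-C delivers SIGINT to the JVM, which runs any
registered shutdown hooks before exiting.  A minimal Java sketch of that
mechanism follows.  ShutdownHookSketch and its register method are
hypothetical names, and the hook body only prints a message where a real
server would run its shutdown sequence.  I am assuming, not asserting,
that the server's graceful stop is wired up this way.

```java
// Hypothetical sketch, not JBoss or mod_cluster code.
final class ShutdownHookSketch {

    /** Registers a JVM shutdown hook and returns it so callers can remove it. */
    static Thread register(Runnable onShutdown) {
        Thread hook = new Thread(onShutdown, "graceful-shutdown-sketch");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // The hook runs on normal exit and on SIGINT (CTRL-C),
        // but not if the process is killed with SIGKILL.
        register(() -> System.out.println("hook ran: a server would stop gracefully here"));
        System.out.println("running; press CTRL-C or let main return");
    }
}
```

A hard kill (kill -9) skips the hook entirely, which is the case where
httpd has to detect the failure on its own.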

> 
> >> * Now I have 80 failed clients and 0 active clients !
> >
> > OK - that's weird. Let me try to reproduce this on my end.
> 
> 
> OK
> 
I could not reproduce this.  The only threads that should report errors
are the ones whose requests were being processed by that server when it
stopped - these will not fail over to the other server.  It is unlikely
that all 80 threads were being processed (i.e. not queued) by the server
that you stopped, and I'm not sure what else could cause all 80 clients
to fail.  Please send me all of the input values you used in the demo app.

> >
> >> * If I call Server.shutdown() on node2 (via JMX), then node1 serves
> >> 40 sessions. But why does node1 not pick up the other 40 sessions
> >> from the departed node2 ? Is this the client, which simply terminates
> >> threads which got a 500 ?
> > Yes
> 
> Why don't we change that, and ignore HTTP error responses ? This way, 
> eventually, all clients would fail over to the running node.
> 
I should clarify - not all requests fail over - only those requests not
yet accepted by the target server's connector.  It would be dangerous to
fail over requests already accepted by the target connector.
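
To make the rule above concrete, here is a minimal Java sketch.  It is
not mod_cluster code: RequestState and FailoverDecisionSketch are
hypothetical names used only to illustrate which requests are safe to
retry on another node.

```java
// Hypothetical model, not mod_cluster code.
enum RequestState {
    QUEUED_AT_PROXY,        // not yet handed to the target connector
    ACCEPTED_BY_CONNECTOR,  // the target connector has accepted the request
    COMPLETED               // a response has already been returned
}

final class FailoverDecisionSketch {

    /** Only requests the target connector has not yet accepted may be retried elsewhere. */
    static boolean mayFailOver(RequestState state) {
        return state == RequestState.QUEUED_AT_PROXY;
    }

    public static void main(String[] args) {
        for (RequestState s : RequestState.values()) {
            System.out.println(s + " -> mayFailOver=" + mayFailOver(s));
        }
    }
}
```

Presumably the danger is that an accepted request may already have been
partly executed, so replaying it on another node could repeat its side
effects.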

> 
> >> * I guess we don't send a DISABLE-NODE, STOP-SERVER or DISABLE-APP
> >> to httpd when we kill a server ? Can we do this, e.g. by adding a
> >> shutdown hook ? Or can I at least try this out through the command
> >> line ? What's the recommended way of doing this currently (JMX
> >> console) ?
> > The [STOP-APP, REMOVE-APP, REMOVE-APP *] message sequence is only sent
> > during graceful shutdown of a server.
> 
> So Server.shutdown() should have triggered this, right ?
> 
Yes.

> > If the node is killed, then
> > mod_cluster httpd modules will follow traditional failover logic (same
> > as mod_jk/mod_proxy).
> 
> OK
> 
> > I've a tentative plan to enhance this in a later release to utilize
> > the JGroups view change to allow the HA singleton master to send these
> > messages on behalf of the failed server. This sounds more useful than
> > it actually is, since with high volume, it is likely that httpd will
> > have already detected the failure before the master has the chance to
> > send the corresponding shutdown message sequence.
> 
> Agreed. But the graceful shutdown case (maybe complemented with a 
> shutdown hook) should allow us to tell httpd to redirect requests for 
> all apps of a given node before the node shuts down.
> 
We are doing that.  Currently, mod_cluster responds to context and
server STOP_EVENTs.  Really, we should be using BEFORE_STOP_EVENT to
notify mod_cluster before any threads are interrupted.  I've made this
change in trunk.
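
The ordering matters, so here is a self-contained Java sketch of the
idea.  It does not use the real Tomcat or mod_cluster classes: the two
event names mirror Tomcat's Lifecycle.BEFORE_STOP_EVENT and
Lifecycle.STOP_EVENT, while BeforeStopSketch, its log, and the messages
it records are hypothetical and stand in for whatever is actually sent
to httpd.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the change made in trunk.
final class BeforeStopSketch {
    static final String BEFORE_STOP_EVENT = "before_stop";
    static final String STOP_EVENT = "stop";

    /** Records, in order, what would happen during shutdown. */
    final List<String> log = new ArrayList<>();
    private boolean proxyNotified = false;

    /** Called for each lifecycle event of a server or context. */
    void lifecycleEvent(String type) {
        if (BEFORE_STOP_EVENT.equals(type) && !proxyNotified) {
            // Notify httpd first, while request threads are still running.
            log.add("notify httpd: stop routing to this node");
            proxyNotified = true;
        } else if (STOP_EVENT.equals(type)) {
            // By this point the container is interrupting its threads.
            log.add("container stops: threads interrupted");
        }
    }

    public static void main(String[] args) {
        BeforeStopSketch listener = new BeforeStopSketch();
        listener.lifecycleEvent(BEFORE_STOP_EVENT);
        listener.lifecycleEvent(STOP_EVENT);
        listener.log.forEach(System.out::println);
    }
}
```

Reacting only to the stop event would put the notification after the
point where threads are interrupted, which is the gap described above.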