[mod_cluster-dev] Failover failed

Mon Dec 15 14:06:56 EST 2008

On Mon, 2008-12-15 at 15:25 +0100, Bela Ban wrote:
> First off, shouldn't we set the maxThreads in server.xml in the AJP 
> connector to a higher value than the (default I think) value of 40 ? The 
> demo creates 80 clients and if we have only 1 node up and running, it 
> has to serve 80 connections !
> 
> Secondly, why is load-demo.war/WEB-INF/web.xml not marked as 
> <distributable/> ?
> 
Session replication was not part of our original scope for the demo.
Though - I see no reason not to enable it.

> Then, I ran 2 experiments and they didn't behave as I expected them. Can 
> you confirm/reject my assumptions ?
> 
> #1
> 
>     * Run httpd, then only node1. The only change I made from the
>       default is to set maxThreads=100 in the AJP connector, and to use
>       SimpleLoadbalanceFactorPolicy
>     * Run the demo app with 80 clients
>     * You'll see that node1 was serving 80 sessions
>     * Start node2, keep the demo running
>     * My expectation is that now node1's sessions will go down to 40 and
>       node2 will serve roughly 40 sessions
>     * Result: node2 picked up 46 sessions, and node1 went down to 34. I
>       waited for the session life timeout (120 secs), but the result was
>       still the same. Question: why didn't both nodes serve 40 sessions
>       each ?
>     * Update: the # of sessions in node1 and node2 got closer after some
>       time, I guess my session life of 120 secs was the reason for this !
>     * Update 2: I see that with a session life of 10, node1 and node2
>       take turns at allocating new sessions, sometimes node1 is on top,
>       sometimes node2, but on average both nodes serve 40 sessions !
>       Great ! Disregard this first item !
> 
> 
> #2
> 
>     * Start httpd
>     * Start node1
>     * Start node2
Which server.xml Listener are you using?  ModClusterListener,
ModClusterService, or HAModClusterService?  Only the latter 2 use the
config from mod-cluster-jboss-beans.xml.

>     * Start the demo with 80 clients
>     * Observe the same as above: node1 serves roughly 40 sessions and
>       so does node2
>     * Kill node2
Gracefully shut down, or killed?
>     * Now I have 80 failed clients and 0 active clients !
:(
OK - that's weird.  Let me try to reproduce this on my end.

>     * If I call Server.shutdown() on node2 (via JMX), then node1 serves
>       40 sessions. But why does node1 not pick up the other 40 sessions
>       from left node2 ? Is this the client, which simply terminates
>       threads which got a 500 ?
Yes.
>     * I guess we don't send a DISABLE-NODE, STOP-SERVER or DISABLE-APP
>       to httpd when we kill a server ? Can we do this, e.g. by adding a
>       shutdown hook ? Or can I at least try this out through the command
>       line ? What's the recommended way of doing this currently (JMX
>       console) ?
The [STOP-APP, REMOVE-APP, REMOVE-APP *] message sequence is only sent
during graceful shutdown of a server.  If the node is killed, then the
mod_cluster httpd modules will follow traditional failover logic (same
as mod_jk/mod_proxy).
I've a tentative plan to enhance this in a later release to utilize
JGroups view changes to allow the HA singleton master to send these
messages on behalf of the failed server.  This sounds more useful than it
actually is, since with high volume, it is likely that httpd will have
already detected the failure before the master has the chance to send
the corresponding shutdown message sequence.
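
To make the view-change idea concrete, here is a rough sketch of the logic.
Everything in it is an assumption for illustration: the class, the method
names, and the "<message> <node>" string format are all made up, and no real
mod_cluster or JGroups API is used.  The master compares the old and new
membership and queues the shutdown sequence for each member that disappeared;
non-master nodes queue nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these names are mod_cluster or JGroups API.
final class ViewChangeSketch {

    // Members present in the old view but absent from the new one.
    static Set<String> departed(List<String> oldView, List<String> newView) {
        Set<String> gone = new LinkedHashSet<>(oldView);
        gone.removeAll(newView);
        return gone;
    }

    // What the master would queue for httpd after a view change.
    // Non-master nodes queue nothing, so httpd gets each sequence once.
    static List<String> onViewChange(boolean isMaster,
                                     List<String> oldView,
                                     List<String> newView) {
        List<String> messages = new ArrayList<>();
        if (!isMaster) {
            return messages;
        }
        for (String node : departed(oldView, newView)) {
            // Same order as the graceful-shutdown sequence; the
            // "<message> <node>" string format is invented for illustration.
            messages.add("STOP-APP " + node);
            messages.add("REMOVE-APP " + node);
            messages.add("REMOVE-APP * " + node);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("node1", "node2");
        List<String> after = Arrays.asList("node1");
        for (String m : onViewChange(true, before, after)) {
            System.out.println(m);
        }
    }
}
```

Note that the sketch says nothing about timing.  Whether the master gets these
messages out before httpd notices the dead node on its own is exactly the open
question above.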