[mod_cluster-dev] Failover failed

Thu Dec 18 18:23:35 EST 2008

Paul Ferraro wrote:
> On Mon, 2008-12-15 at 22:01 +0100, Bela Ban wrote:
>> Paul Ferraro wrote:
>>>> Secondly, why is load-demo.war/WEB-INF/web.xml not marked as
>>>> <distributable/> ?
>>>>
>>> Session replication was not part of our original scope for the demo.
>>> Though - I see no reason not to enable it.
>> Got you
>>
>>
>>>>
>>>> #2
>>>>
>>>> * Start httpd
>>>> * Start node1
>>>> * Start node2
>>> Which server.xml Listener are you using? ModClusterListener,
>>> ModClusterService, or HAModClusterService? Only the latter 2 use the
>>> config from mod-cluster-jboss-beans.xml.
>>
>> HAModClusterService
>>
>>
>>>> * Start the demo with 80 clients
>>>> * Observe the same as above: node1 serves roughly 40 sessions and
>>>> so does node2
>>>> * Kill node2
>>> Gracefully shut down, or killed?
>>
>> CTRL-C.
>>
> That will shut down gracefully, triggering all of the appropriate
> shutdown events.
> 
>>>> * Now I have 80 failed clients and 0 active clients !
>>> OK - that's weird. Let me try to reproduce this on my end.
>>
>> OK
>>
> I could not reproduce this.  The only threads that should report errors
> are the ones that were currently being processed by that server - these
> will not fail over to the other server.  Though it is unlikely that all
> 80 threads were being processed (i.e. not queued) by the server that
> you stopped, I'm not sure what else could cause all 80 clients to fail.
> Please list all of the input values you used in the demo app.
> 

If it's a server shutdown, the connector should be stopped before any 
webapp is undeployed, and I'd thought in-flight requests weren't 
disrupted during the connector shutdown.  Sounds like something isn't 
right with that. Not really a mod_cluster issue but something to track 
down in AS/JBoss Web.

>>>> * If I call Server.shutdown() on node2 (via JMX), then node1 serves
>>>> 40 sessions. But why does node1 not pick up the other 40 sessions
>>>> from node2 after it has left ? Is this the client, which simply
>>>> terminates threads which got a 500 ?
>>> Yes
>> Why don't we change that, and ignore HTTP error responses ? This way, 
>> eventually, all clients would fail over to the running node.
>>
> I should clarify - not all requests fail over - only those requests not
> yet accepted by the target server's connector.  It would be dangerous to
> fail over requests already accepted by the target connector.
> 

In a clean shutdown or undeploy we should find a way to make sure those 
complete. Perhaps a valve in the Context pipeline that works with the 
deployer in the same way the stuff you added to EJB3 does.
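
To make that concrete, here is a minimal sketch of the bookkeeping such
a valve would need. It is only an illustration under assumptions:
InFlightTracker and its methods are hypothetical names, not part of
JBoss Web, the deployers, or mod_cluster. The idea is that the valve
would call tryEnter()/exit() around each request and the deployer would
call drain() before undeploying.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not an existing JBoss Web or mod_cluster class. */
final class InFlightTracker {
    private final Object lock = new Object();
    private int inFlight = 0;
    private boolean draining = false;

    /**
     * Called when a request arrives. Returns false once draining has
     * started, so the caller can reject the request instead of accepting
     * work that the shutdown would then interrupt.
     */
    boolean tryEnter() {
        synchronized (lock) {
            if (draining) {
                return false;
            }
            inFlight++;
            return true;
        }
    }

    /** Called when a previously admitted request has completed. */
    void exit() {
        synchronized (lock) {
            inFlight--;
            if (inFlight == 0) {
                lock.notifyAll();
            }
        }
    }

    /**
     * Stops admitting new requests and waits up to timeoutMillis for the
     * admitted ones to finish. Returns true if none remain in flight.
     */
    boolean drain(long timeoutMillis) {
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            draining = true;
            while (inFlight > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // timed out with requests still running
                }
                try {
                    TimeUnit.NANOSECONDS.timedWait(lock, remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

The timeout is there so that a stuck request cannot block a shutdown
forever; what to do when drain() returns false is a separate policy
decision.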

>>>> * I guess we don't send a DISABLE-NODE, STOP-SERVER or DISABLE-APP
>>>> to httpd when we kill a server ? Can we do this, e.g. by adding a
>>>> shutdown hook ? Or can I at least try this out through the command
>>>> line ? What's the recommended way of doing this currently (JMX
>>>> console) ?
>>> The [STOP-APP, REMOVE-APP, REMOVE-APP *] message sequence is only sent
>>> during graceful shutdown of a server.
>> So Server.shutdown() should have triggered this right ?
>>
> Yes.
> 
>>> If the node is killed, then
>>> mod_cluster httpd modules will follow traditional failover logic (same
>>> as mod_jk/mod_proxy).
>> OK
>>
>>> I've a tentative plan to enhance this in a later release to utilize
>>> JGroups view changes to allow the HA singleton master to send these
>>> messages on behalf of the failed server. This sounds more useful than
>>> it actually is, since with high volume, it is likely that httpd will
>>> have already detected the failure before the master has the chance to
>>> send the corresponding shutdown message sequence.
>> Agreed. But the graceful shutdown case (maybe complemented with a 
>> shutdown hook) should allow us to tell httpd to redirect requests for 
>> all apps of a given node before the node shuts down.
>>
> We are doing that.  Currently, mod_cluster responds to context and
> server STOP_EVENTs.  Really, we should be using BEFORE_STOP_EVENT to
> notify mod_cluster before any threads are interrupted.  I've made this
> change in trunk.
> 
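
The ordering argument above can be shown with a toy model. This is a
simplified sketch, not the real JBoss Web Lifecycle API: the Event enum,
Listener interface, ProxyNotifier and stop() below are all hypothetical
stand-ins, and the log strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of stop ordering; all names here are hypothetical. */
final class LifecycleOrderDemo {
    enum Event { BEFORE_STOP, STOP }

    interface Listener {
        void onEvent(Event event, List<String> log);
    }

    /** Reacts to BEFORE_STOP so the proxy hears about it first. */
    static final class ProxyNotifier implements Listener {
        public void onEvent(Event event, List<String> log) {
            if (event == Event.BEFORE_STOP) {
                log.add("proxy: stop routing to this node");
            }
        }
    }

    /**
     * Stand-in for the container: fire BEFORE_STOP, then stop the request
     * threads, then fire STOP. Returns the order in which things happened.
     */
    static List<String> stop(List<Listener> listeners) {
        List<String> log = new ArrayList<>();
        for (Listener l : listeners) {
            l.onEvent(Event.BEFORE_STOP, log);
        }
        log.add("container: stop request threads");
        for (Listener l : listeners) {
            l.onEvent(Event.STOP, log);
        }
        return log;
    }
}
```

A listener that only reacted to STOP would add its entry after the
"stop request threads" step, which is the window the change in trunk is
meant to close.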

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry at redhat.com