[mod_cluster-dev] Failover failed

Bela Ban bela at jboss.com
Wed Dec 17 11:50:45 EST 2008


No, kill 'em. I usually use the jbento user...

Paul Ferraro wrote:
> Bela,
>
> I'm seeing similar strange asymmetric load balancing when using 2 JBoss
> instances + httpd all on the same machine.  I'd like to run some tests
> on the cluster lab, but I see that you've got a process running.  Are
> you still using it?  If not, can you kill it?
>
> Paul
>
> On Mon, 2008-12-15 at 14:06 -0500, Paul Ferraro wrote:
>   
>> On Mon, 2008-12-15 at 15:25 +0100, Bela Ban wrote:
>>     
>>> First off, shouldn't we set maxThreads in server.xml's AJP 
>>> connector to a value higher than the default (I think 40) ? The 
>>> demo creates 80 clients, and if only 1 node is up and running, it 
>>> has to serve 80 connections !
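For reference, a minimal sketch of the change being proposed (attribute values other than maxThreads are illustrative and should match the stock JBoss Web server.xml):

```xml
<!-- server.xml: raise the AJP connector's thread pool above the demo's 80 clients -->
<Connector protocol="AJP/1.3" port="8009"
           maxThreads="100" redirectPort="8443" />
```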
>>>
>>> Secondly, why is load-demo.war/WEB-INF/web.xml not marked as 
>>> <distributable/> ?
>>>
>>>       
>> Session replication was not part of our original scope for the demo.
>> Though - I see no reason not to enable it.
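Enabling it is just a one-element change to the deployment descriptor (minimal sketch; the namespace/version header is illustrative):

```xml
<!-- load-demo.war/WEB-INF/web.xml: opt the webapp into session replication -->
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <distributable/>
</web-app>
```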
>>
>>     
>>> Then, I ran 2 experiments and they didn't behave as I expected. Can 
>>> you confirm/reject my assumptions ?
>>>
>>> #1
>>>
>>>     * Run httpd, then only node1. The only change I made from the
>>>       default is to set maxThreads=100 in the AJP connector, and to use
>>>       SimpleLoadbalanceFactorPolicy
>>>     * Run the demo app with 80 clients
>>>     * You'll see that node1 was serving 80 sessions
>>>     * Start node2, keep the demo running
>>>     * My expectation is that now node1's sessions will go down to 40 and
>>>       node2 will serve roughly 40 sessions
>>>     * Result: node2 picked up 46 sessions, and node1 went down to 34. I
>>>       waited for the session life timeout (120 secs), but the result was
>>>       still the same. Question: why didn't both nodes serve 40 sessions
>>>       each ?
>>>     * Update: the # of sessions in node1 and node2 got closer after some
>>>       time, I guess my session life of 120 secs was the reason for this !
>>>     * Update 2: I see that with a session life of 10, node1 and node2
>>>       take turns at allocating new sessions, sometimes node1 is on top,
>>>       sometimes node2, but on average both nodes serve 40 sessions !
>>>       Great ! Disregard this first item !
>>>
>>>
>>> #2
>>>
>>>     * Start httpd
>>>     * Start node1
>>>     * Start node2
>>>       
>> Which server.xml Listener are you using?  ModClusterListener,
>> ModClusterService, or HAModClusterService?  Only the latter 2 use the
>> config from mod-cluster-jboss-beans.xml.
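For reference, a rough sketch of the distinction (the class name is assumed from the mod_cluster package layout; the Listener's exact attributes are omitted rather than guessed):

```xml
<!-- server.xml: ModClusterListener is configured directly via Listener
     attributes here, whereas ModClusterService and HAModClusterService
     pick up their configuration from mod-cluster-jboss-beans.xml -->
<Listener className="org.jboss.modcluster.ModClusterListener" />
```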
>>
>>     
>>>     * Start the demo with 80 clients
>>>     * Observe the same as above: node1 serves roughly 40 sessions and
>>>       so does node2
>>>     * Kill node2
>>>       
>> Gracefully shutdown? or killed?
>>     
>>>     * Now I have 80 failed clients and 0 active clients !
>>>       
>> :(
>> OK - that's weird.  Let me try to reproduce this on my end.
>>
>>     
>>>     * If I call Server.shutdown() on node2 (via JMX), then node1 serves
>>>       40 sessions. But why doesn't node1 pick up the other 40 sessions
>>>       from the departed node2 ? Is this the client simply terminating
>>>       threads that got a 500 ?
>>>       
>> Yes.
>>     
>>>     * I guess we don't send a DISABLE-NODE, STOP-SERVER or DISABLE-APP
>>>       to httpd when we kill a server ? Can we do this, e.g. by adding a
>>>       shutdown hook ? Or can I at least try this out through the command
>>>       line ? What's the recommended way of doing this currently (JMX
>>>       console) ?
>>>       
>> The [STOP-APP, REMOVE-APP, REMOVE-APP *] message sequence is only sent
>> during graceful shutdown of a server.  If the node is killed, the
>> mod_cluster httpd modules follow traditional failover logic (the same
>> as mod_jk/mod_proxy).
>> I have a tentative plan to enhance this in a later release: use the
>> JGroups view change to let the HA singleton master send these
>> messages on behalf of the failed server.  This sounds more useful than
>> it actually is, since under high volume httpd will likely have
>> already detected the failure before the master has a chance to send
>> the corresponding shutdown message sequence.
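The shutdown-hook idea floated above could be sketched roughly as follows. stopGracefully() here is a hypothetical stand-in for whatever call would trigger the STOP-APP/REMOVE-APP sequence; note that a JVM shutdown hook never fires on kill -9, which is consistent with the failover behavior described.

```java
public class GracefulStopHook {

    // Hypothetical stand-in for the call that would emit the
    // STOP-APP / REMOVE-APP message sequence toward httpd.
    static String stopGracefully() {
        String msg = "STOP-APP/REMOVE-APP sent to httpd";
        System.out.println(msg);
        return msg;
    }

    public static void main(String[] args) {
        // A shutdown hook runs on normal exit and on SIGTERM, but NOT on
        // kill -9, so httpd must still detect hard failures on its own.
        Runtime.getRuntime().addShutdownHook(
                new Thread(GracefulStopHook::stopGracefully));
        System.out.println("node running; exiting normally");
    }
}
```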
>>
>>
>> _______________________________________________
>> mod_cluster-dev mailing list
>> mod_cluster-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/mod_cluster-dev
>>     
>
>   

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat



