Re: [mod_cluster-dev] Handling crashed/hung AS nodes

Friday, 27 March 2009

...
>> 2. Can httpd detect hung nodes?  A hung node will not affect the
>> connected state of the AJP/HTTP/S connector - it could only detect this
>> by sending data to the connector and timing out on the response.
>
> The hung node will be detected and marked as broken but the 
> corresponding request(s) may be delayed or lost due to time-out.
>

 How long does this take, say in a typical case where the hung node was 
 up and running with a pool of AJP connections open? Is it the 10 secs, 
 the default value of the "ping" property listed at 
 https://www.jboss.org/mod_cluster/java/properties.html#proxy ? 
cping/cpong is done in the Connector; I was thinking of nodeTimeout.

...

 Also, if a request is being handled by a hung node and the 
 HAModClusterService tells httpd to stop that node, the request will 
 fail, yes? It shouldn't just fail over, as it may have already caused 
 the transfer of my $1,000,000 to my secret account at UBS. Failing over 
 would cause transfer of a second $1,000,000 and sadly I don't have that 
 much. 
maxAttempts = 0 controls that.

...

>>
>> And some questions for open discussion:
>> What does HAModClusterService really buy us over the normal
>> ModClusterService?  Do the benefits outweigh the complexity?
>>  * Maintains a uniform view of proxy status across each AS node
>>  * Can detect and send STOP-APP/REMOVE-APP messages on behalf of
>> hung/crashed nodes (if httpd cannot already do this) (not yet
>> implemented)
>>    + Requires special handling of network partitions
>>  * Potentially improve scalability by minimizing network traffic for
>> very large clusters.

 Assume a near-term goal is to run a 150 node cluster with say 10 httpd 
 servers. Assume the background thread runs every 10 seconds. That comes 
 to 150 connections per second across the cluster being opened/closed to 
 handle STATUS. Each httpd server handles 15 connections per second. 
That is very little :-)

...

 With HAModClusterService the way it is now, you get the same, because 
 besides STATUS each node also checks its ability to communicate w/ each 
 httpd in order to validate its ability to become master. But let's 
 assume we add some complexity to allow that health check to become much 
 more infrequent. So ignore those ping checks. So, w/ HAModClusterService 
 you get 1 connection/sec being opened/closed across the cluster for 
 status, 0.1 connection/sec per httpd.  But the STATUS request sent 
 across each connection has a much bigger payload.
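
The connection rates quoted in the two scenarios above can be checked with a quick sketch. It assumes each sender opens one connection to each httpd per background-thread period and ignores the master-eligibility ping checks; the function name and parameters are hypothetical, for illustration only.

```python
def status_connection_rates(senders, httpd_servers, period_secs):
    """Return (cluster-wide, per-httpd) STATUS connections per second.

    Assumes every sender opens one connection to every httpd server
    once per period (an assumption, not taken from the mod_cluster code).
    """
    total = senders * httpd_servers / period_secs
    return total, total / httpd_servers


# Plain ModClusterService: each of the 150 nodes sends STATUS to each httpd.
per_node = status_connection_rates(senders=150, httpd_servers=10, period_secs=10)

# HAModClusterService: only the master sends STATUS, on behalf of all nodes.
ha = status_connection_rates(senders=1, httpd_servers=10, period_secs=10)

print(per_node)  # (150.0, 15.0)
print(ha)        # (1.0, 0.1)
```

This counts connections only; it says nothing about the larger STATUS payload in the HA case.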

 How significant is the cost of opening/closing all those connections? 
They are keep-alive HTTP connections, so the cost is only a send/receive 
on the socket.

Cheers

Jean-Frederic
