[mod_cluster-dev] Handling crashed/hung AS nodes

Brian Stansberry brian.stansberry at redhat.com
Fri Mar 27 17:41:41 EDT 2009


jean-frederic clere wrote:
> 
>>>> 2. Can httpd detect hung nodes?  A hung node will not affect the
>>>> connected state of the AJP/HTTP/S connector - it could only detect this
>>>> by sending data to the connector and timing out on the response.
>>>
>>> The hung node will be detected and marked as broken but the 
>>> corresponding request(s) may be delayed or lost due to time-out.
>>>
>>
>> How long does this take, say in a typical case where the hung node was 
>> up and running with a pool of AJP connections open? Is it the 10 secs, 
>> the default value of the "ping" property listed at 
>> https://www.jboss.org/mod_cluster/java/properties.html#proxy ?
> 
> cping/cpong is done in the Connector; I was thinking of nodeTimeout.
> 
>>
>> Also, if a request is being handled by a hung node and the 
>> HAModClusterService tells httpd to stop that node, the request will 
>> fail, yes? It shouldn't just fail over, as it may have already caused 
>> the transfer of my $1,000,000 to my secret account at UBS. Failing 
>> over would cause transfer of a second $1,000,000 and sadly I don't 
>> have that much.
> 
> maxAttempts = 0 controls that.
> 
>>
>>>>
>>>> And some questions for open discussion:
>>>> What does HAModClusterService really buy us over the normal
>>>> ModClusterService?  Do the benefits outweigh the complexity?
>>>>  * Maintains a uniform view of proxy status across each AS node
>>>>  * Can detect and send STOP-APP/REMOVE-APP messages on behalf of
>>>> hung/crashed nodes (if httpd cannot already do this) (not yet
>>>> implemented)
>>>>    + Requires special handling of network partitions
>>>>  * Potentially improve scalability by minimizing network traffic for
>>>> very large clusters.
>>
>> Assume a near-term goal is to run a 150 node cluster with say 10 httpd 
>> servers. Assume the background thread runs every 10 seconds. That 
>> comes to 150 connections per second across the cluster being 
>> opened/closed to handle STATUS. Each httpd server handles 15 
>> connections per second.
> 
> That is very little :-)
> 
>>
>> With HAModClusterService the way it is now, you get the same, because 
>> besides STATUS each node also checks its ability to communicate w/ 
>> each httpd in order to validate its ability to become master. But 
>> let's assume we add some complexity to allow that health check to 
>> become much more infrequent. So ignore those ping checks. So, w/ 
>> HAModClusterService you get 1 connection/sec being opened/closed 
>> across the cluster for status, 0.1 connection/sec per httpd.  But the 
>> STATUS request sent across each connection has a much bigger payload.
>>
>> How significant is the cost of opening/closing all those connections?
> 
> They are keepalived http connections so it is only receive / send on the 
> socket.
> 

Ah, I'm blind; it only closes the connection if proxy.getState() != 
Proxy.State.OK. Great!
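
To spell out what I mean, here is a minimal sketch of that behaviour. Only 
proxy.getState() and Proxy.State.OK come from the code under discussion; 
the ERROR state, the connection flag and the afterStatus() method are 
hypothetical names made up for illustration, not the actual mod_cluster 
implementation.

```java
public final class KeepAliveSketch {

    /** Hypothetical stand-in for the proxy handle kept per httpd. */
    public static final class Proxy {
        /** OK is named in the thread; ERROR is an assumed second state. */
        public enum State { OK, ERROR }

        private State state = State.OK;
        private boolean connectionOpen = true;

        public State getState() { return state; }
        public void setState(State state) { this.state = state; }
        public boolean isConnectionOpen() { return connectionOpen; }
        public void closeConnection() { connectionOpen = false; }
    }

    /**
     * Called after a STATUS exchange: the kept-alive connection is reused
     * unless the proxy is in a non-OK state, in which case it is closed.
     */
    public static void afterStatus(Proxy proxy) {
        if (proxy.getState() != Proxy.State.OK) {
            proxy.closeConnection();
        }
    }

    public static void main(String[] args) {
        Proxy healthy = new Proxy();
        afterStatus(healthy);
        System.out.println("healthy proxy connection open: " + healthy.isConnectionOpen());

        Proxy broken = new Proxy();
        broken.setState(Proxy.State.ERROR);
        afterStatus(broken);
        System.out.println("broken proxy connection open: " + broken.isConnectionOpen());
    }
}
```

So in the healthy case each STATUS is only a send/receive on an 
already-open socket, as Jean-Frederic says.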

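For the record, the STATUS arithmetic quoted above works out as in the 
sketch below. It assumes every sender contacts every httpd once per 
background period and ignores the master-eligibility ping checks; the 
class and method names are invented for this example.

```java
public final class StatusRateSketch {

    /** STATUS exchanges per second across the whole cluster. */
    public static double perCluster(int senders, int httpds, double periodSeconds) {
        return senders * httpds / periodSeconds;
    }

    /** STATUS exchanges per second seen by a single httpd. */
    public static double perHttpd(int senders, double periodSeconds) {
        return senders / periodSeconds;
    }

    public static void main(String[] args) {
        int nodes = 150;        // AS nodes in the cluster
        int httpds = 10;        // httpd servers
        double period = 10.0;   // background thread period, in seconds

        // ModClusterService: every node sends STATUS to every httpd.
        System.out.println("per cluster: " + perCluster(nodes, httpds, period));
        System.out.println("per httpd:   " + perHttpd(nodes, period));

        // HAModClusterService: only the master sends STATUS.
        System.out.println("HA per cluster: " + perCluster(1, httpds, period));
        System.out.println("HA per httpd:   " + perHttpd(1, period));
    }
}
```

That gives 150/sec and 15/sec for the plain service versus 1/sec and 
0.1/sec with a single master, matching the numbers in the thread.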

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry at redhat.com


