[mod_cluster-dev] Handling crashed/hung AS nodes

jean-frederic clere jfclere at gmail.com
Fri Mar 27 17:45:29 EDT 2009


Brian Stansberry wrote:
> Paul Ferraro wrote:
>> On Fri, 2009-03-27 at 09:15 -0500, Brian Stansberry wrote:
>>> jean-frederic clere wrote:
>>>> Paul Ferraro wrote:
>>>>> Currently, the HAModClusterService (where httpd communication is
>>>>> coordinated by an HA singleton) does not react to crashed/hung 
>>>>> members.
>>>>> Specifically, when the HA singleton gets a callback that the group
>>>>> membership changes, it does not send any REMOVE-APP messages to 
>>>>> httpd on
>>>>> behalf of the member that just left.  Currently, httpd will detect the
>>>>> failure (via a disconnected socket) on its own and sets its internal
>>>>> state accordingly, e.g. a STATUS message will return NOTOK.
>>>>>
>>>>> The non-handling of dropped members is actually a good thing in the
>>>>> event of a network partition, where communication between nodes is 
>>>>> lost,
>>>>> but communication between httpd and the nodes is unaffected.  If we 
>>>>> were
>>>>> handling dropped members, we would have to handle the ugly scenario
>>>>> described here:
>>>>> https://jira.jboss.org/jira/browse/MODCLUSTER-66
>>>>>
>>>>> Jean-Frederic: a few questions...
>>>>> 1. Is it up to the AS to drive the recovery of a NOTOK node when it
>>>>> becomes functional again?
>>>> Yes.
>>>>
>>>>>  In the case of a crashed member, fresh
>>>>> CONFIG/ENABLE-APP messages will be sent upon node restart.  In the 
>>>>> case
>>>>> of a re-merged network partition, no additional messages are sent.  Is
>>>>> the subsequent STATUS message (with a non-zero lbfactor) enough to
>>>>> trigger the recovery of this node?
>>>> Yes.
>>
>> Good to know.
>>  
>>>>> 2. Can httpd detect hung nodes?  A hung node will not affect the
>>>>> connected state of the AJP/HTTP/S connector - it could only detect 
>>>>> this
>>>>> by sending data to the connector and timing out on the response.
>>>> The hung node will be detected and marked as broken but the 
>>>> corresponding request(s) may be delayed or lost due to time-out.
>>>>
>>> How long does this take, say in a typical case where the hung node 
>>> was up and running with a pool of AJP connections open? Is it the 10 
>>> secs, the default value of the "ping" property listed at 
>>> https://www.jboss.org/mod_cluster/java/properties.html#proxy ?
>>
>> I think he's talking about "nodeTimeout".
>>
> 
> There's also the mod_proxy ProxyTimeout directive which might

Not if you set nodeTimeout.

> apply 
> since mod_proxy is what is being used. From the mod_proxy docs:
> 
> "This directive allows a user to specify a timeout on proxy requests. 
> This is useful when you have a slow/buggy appserver which hangs, and you 
> would rather just return a timeout and fail gracefully instead of 
> waiting however long it takes the server to return."
> 
> Default there is 300 seconds!!
> 
> It would be good to beef up the docs of this quite a bit. From the 
> mod_jk docs at 
> http://tomcat.apache.org/connectors-doc/reference/workers.html I can get 
> a pretty good idea of all the details of how mod_jk works in this area. 
> Not so much from 
> https://www.jboss.org/mod_cluster/java/properties.html#proxy 
> particularly when I factor in that it's mod_proxy/mod_proxy_ajp that's 
> actually handling requests.

Yep, a few more lines could help.
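To illustrate how the two timeouts interact, something like the sketch below could go in the docs. The directive placement and the exact system-property name are assumptions on my part (I'm going from the properties page cited above); treat this as a sketch, not verified configuration:

```apache
# httpd.conf -- mod_proxy's global fallback timeout (default is 300 seconds,
# which is far too long for detecting a hung node):
ProxyTimeout 60

# On the AS side, setting nodeTimeout (assumed property name
# org.jboss.modcluster.nodeTimeout, in seconds) overrides ProxyTimeout
# for mod_cluster-managed nodes, so a hung node is marked broken sooner:
#   -Dorg.jboss.modcluster.nodeTimeout=10
```

With nodeTimeout set, ProxyTimeout should not come into play for requests proxied to cluster nodes, which matches what was said above.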

>>> Also, if a request is being handled by a hung node and the 
>>> HAModClusterService tells httpd to stop that node, the request will 
>>> fail, yes? It shouldn't just fail over, as it may have already caused 
>>> the transfer of my $1,000,000 to my secret account at UBS. Failing 
>>> over would cause transfer of a second $1,000,000 and sadly I don't 
>>> have that much.
>>
>> Not unlike those damn double-clickers...
>>
> 
> Ah, that's what happened to my second $1,000,000! Thanks!
> 
>>>>> And some questions for open discussion:
>>>>> What does HAModClusterService really buy us over the normal
>>>>> ModClusterService?  Do the benefits outweigh the complexity?
>>>>>  * Maintains a uniform view of proxy status across each AS node
>>>>>  * Can detect and send STOP-APP/REMOVE-APP messages on behalf of
>>>>> hung/crashed nodes (if httpd cannot already do this) (not yet
>>>>> implemented)
>>>>>    + Requires special handling of network partitions
>>>>>  * Potentially improve scalability by minimizing network traffic for
>>>>> very large clusters.
>>> Assume a near-term goal is to run a 150 node cluster with say 10 
>>> httpd servers. Assume the background thread runs every 10 seconds. 
>>> That comes to 150 connections per second across the cluster being 
>>> opened/closed to handle STATUS. Each httpd server handles 15 
>>> connections per second.
>>>
>>> With HAModClusterService the way it is now, you get the same, because 
>>> besides STATUS each node also checks its ability to communicate w/ 
>>> each httpd in order to validate its ability to become master. But 
>>> let's assume we add some complexity to allow that health check to 
>>> become much more infrequent. So ignore those ping checks. So, w/ 
>>> HAModClusterService you get 1 connection/sec being opened/closed 
>>> across the cluster for status, 0.1 connection/sec per httpd.  But the 
>>> STATUS request sent across each connection has a much bigger payload.
>>
>> True, the STATUS request body is larger than the INFO request (no body),
>> but the resulting STATUS-RSP is significantly smaller than the
>> corresponding INFO-RSP.
>>
> 
> Ah, we're still using INFO for the connectivity checks? That's not good; 
> for sure we'd want to make that happen less often.

I thought we agreed to add a new message/response for that in the next 
release.
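For reference, the connection-rate arithmetic from earlier in the thread can be checked with a quick sketch (the numbers are the ones assumed above: 150 nodes, 10 httpd servers, a 10-second background-thread interval, and the master-election ping checks ignored):

```python
# Connection math for the STATUS traffic discussed above.
nodes = 150        # AS nodes in the cluster
proxies = 10       # httpd servers
interval = 10.0    # background thread period, in seconds

# Without the HA singleton: every node opens a connection to every
# httpd once per interval.
per_cluster = nodes * proxies / interval   # conn/s across the cluster
per_httpd = per_cluster / proxies          # conn/s seen by each httpd

# With HAModClusterService: only the singleton master sends STATUS,
# one connection per httpd per interval.
ha_per_cluster = proxies / interval
ha_per_httpd = ha_per_cluster / proxies

print(per_cluster, per_httpd, ha_per_cluster, ha_per_httpd)
# -> 150.0 15.0 1.0 0.1
```

So the singleton cuts the connection churn by a factor of 150 (the node count), at the cost of a larger STATUS payload per connection.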

Cheers

Jean-Frederic
