[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-369) httpd should remove lost node/worker

Mon Nov 11 04:52:05 EST 2013

Stefano Nichele created MODCLUSTER-369:
------------------------------------------

             Summary: httpd should remove lost node/worker
                 Key: MODCLUSTER-369
                 URL: https://issues.jboss.org/browse/MODCLUSTER-369
             Project: mod_cluster
          Issue Type: Bug
    Affects Versions: 1.2.6.Final, 1.2.0.Final
         Environment: httpd 2.2.24+tomcat 7.0.37
            Reporter: Stefano Nichele
            Assignee: Jean-Frederic Clere
            Priority: Critical

Suppose this environment is running on Amazon:
- one virtual machine running HTTPD (let's call it FE)
- two virtual machines running tomcat (let's call them NODE01 and NODE02)

If the NODE02 virtual machine dies or crashes for any reason, HTTPD still reports it in the mod_cluster-manager view, sometimes with status OK and sometimes with status NOTOK.
Moreover, some requests are still sent to the dead node.

Please note that I reproduced this behavior just by blocking network traffic with iptables on NODE02:
iptables -A OUTPUT -p tcp -m state --state NEW,ESTABLISHED  -m tcp --dport 6666 -j DROP
iptables -A INPUT -m state --state NEW,ESTABLISHED -p tcp --dport 8009 -j DROP

This way the Tomcat instance is not able to send its status, and HTTPD is not able to send traffic on port 8009.
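
To confirm from FE that the rules behave like a vanished instance, a small probe can help. This is only a sketch, assuming Python 3 is available on FE; the function name probe_tcp and the 3-second timeout are illustrative choices. Because the rules use DROP, a connection attempt to port 8009 should hang until the timeout rather than being refused immediately, which is what a dead virtual machine looks like.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused', or 'timeout'."""
    try:
        # create_connection performs the TCP handshake with a deadline.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: the machine is up, the port is closed.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are being dropped or the host is gone.
        return "timeout"
    except OSError as exc:
        # Anything else (no route, unreachable network, ...).
        return f"error: {exc}"


# Example, run on FE against NODE02's AJP port:
#   probe_tcp("10.2.2.2", 8009)
```

With the DROP rules in place the probe is expected to report a timeout; with Tomcat stopped but the machine still up it would normally report a refusal instead.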

In my opinion there are two issues here:
1. If HTTPD doesn't receive any STATUS from a worker, the worker must at least be marked as NOTOK.
2. It seems that HTTPD considers the worker available during its retry policy, so some requests are still forwarded to that node, and the worker appears to be flapping.
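
The behavior asked for in the two points above can be sketched as a toy model. This is not mod_cluster code: the class WorkerTable, its method names, and the 10-second status_timeout are all hypothetical, chosen only to illustrate the idea. A worker that stays silent past the timeout becomes NOTOK, and a worker that failed a request stays NOTOK until a fresh STATUS arrives, instead of becoming eligible again on a retry timer.

```python
import time
from typing import Callable, Dict, List


class WorkerTable:
    """Toy model of a balancer's view of its workers (not mod_cluster code)."""

    def __init__(self, status_timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.status_timeout = status_timeout  # hypothetical value, in seconds
        self.clock = clock                    # injectable so it can be tested
        self.last_status: Dict[str, float] = {}
        self.ok: Dict[str, bool] = {}

    def on_status(self, name: str) -> None:
        """A STATUS message arrived: refresh the timestamp and mark OK."""
        self.last_status[name] = self.clock()
        self.ok[name] = True

    def on_request_failure(self, name: str) -> None:
        """A forwarded request failed: mark NOTOK until the next STATUS."""
        if name in self.ok:
            self.ok[name] = False

    def eligible(self) -> List[str]:
        """Return the workers that may receive requests right now."""
        now = self.clock()
        for name, seen in self.last_status.items():
            if now - seen > self.status_timeout:
                self.ok[name] = False  # point 1: a silent worker becomes NOTOK
        # Point 2: only a new STATUS (on_status) can make a worker OK again.
        return sorted(name for name, good in self.ok.items() if good)
```

In this model a node that disappears is dropped from eligible() once its last STATUS is older than the timeout, and it cannot alternate between OK and NOTOK unless it really starts sending STATUS again.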

Original thread: https://community.jboss.org/thread/234235

Marking this as critical since Amazon is one of the main players in virtualization, and if an instance disappears in a production environment (which can happen at any moment), the whole production environment is affected (since some requests are still sent to the dead node).

Attached is the httpd error log file produced using mod_cluster 1.2.6.

at 08:32:30 HTTPD has been restarted (with two workers OK)
at 08:33:10 instance NODE02 (IP: 10.2.2.2) disappears
at 08:33:25 NODE02 is marked as NOTOK
at 08:33:32 NODE02 is marked as OK
at 08:33:41 NODE02 is marked as NOTOK
...and so on...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira