[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-369) httpd should remove lost node/worker
Stefano Nichele (JIRA)
jira-events at lists.jboss.org
Mon Nov 11 04:52:05 EST 2013
Stefano Nichele created MODCLUSTER-369:
------------------------------------------
Summary: httpd should remove lost node/worker
Key: MODCLUSTER-369
URL: https://issues.jboss.org/browse/MODCLUSTER-369
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.6.Final, 1.2.0.Final
Environment: httpd 2.2.24+tomcat 7.0.37
Reporter: Stefano Nichele
Assignee: Jean-Frederic Clere
Priority: Critical
Suppose the following environment running on Amazon:
- one virtual machine running HTTPD (let's call it FE)
- two virtual machines running tomcat (let's call them NODE01 and NODE02)
If the NODE02 virtual machine dies or crashes for any reason, HTTPD still reports it in the mod_cluster-manager view, sometimes with status OK and sometimes with status NOTOK.
Moreover some requests are still sent to the dead node.
Please note that I reproduced this behavior simply by blocking network traffic with iptables on NODE02:
iptables -A OUTPUT -p tcp -m state --state NEW,ESTABLISHED -m tcp --dport 6666 -j DROP
iptables -A INPUT -m state --state NEW,ESTABLISHED -p tcp --dport 8009 -j DROP
This way the Tomcat instance cannot send its STATUS messages (port 6666) and HTTPD cannot send traffic to it on port 8009.
In my opinion there are two issues here:
1. If HTTPD doesn't receive any STATUS from a worker, the worker must at least be marked as NOTOK.
2. HTTPD seems to consider the worker available during its retry policy, so some requests are still forwarded to that node and the worker appears to flap.
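The two points above amount to a simple rule: a worker that stops sending STATUS messages should be taken out of rotation and stay out until it reports again, instead of being retried with live requests. A minimal sketch of that rule, assuming a single fixed staleness timeout; the names `STATUS_TIMEOUT`, `Worker` and `available_workers` are hypothetical and this is not mod_cluster's actual implementation:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical: seconds without a STATUS message before a worker is NOTOK.
STATUS_TIMEOUT = 10.0


@dataclass
class Worker:
    name: str
    last_status: float  # time (seconds) of the last STATUS message received
    ok: bool = True

    def on_status(self, now: float) -> None:
        """A STATUS message arrived: refresh the timestamp and mark the worker OK."""
        self.last_status = now
        self.ok = True

    def check(self, now: float) -> None:
        """Point 1: mark the worker NOTOK once its STATUS messages are stale.

        This only ever clears `ok`; a worker becomes OK again solely through
        on_status(), so it cannot flap back on its own (point 2).
        """
        if now - self.last_status > STATUS_TIMEOUT:
            self.ok = False


def available_workers(workers: List[Worker], now: float) -> List[str]:
    """Return the names of workers that may receive requests at time `now`."""
    for w in workers:
        w.check(now)
    return [w.name for w in workers if w.ok]
```

For example, if NODE01 and NODE02 both report at t=0 and only NODE01 keeps reporting, NODE02 drops out at t=15 and is not selected again until it sends a fresh STATUS. The design choice is that the only path back to OK is a STATUS message, never a retry timer expiring.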
Original thread: https://community.jboss.org/thread/234235
Marking this as critical since Amazon is one of the main players in virtualization: if an instance disappears in a production environment (which can happen at any moment), the whole production environment is affected, because some requests are still sent to the dead node.
Attached is the httpd error log file produced using mod_cluster 1.2.6:
at 08:32:30 HTTPD is restarted (with two workers OK)
at 08:33:10 instance NODE02 (IP: 10.2.2.2) disappears
at 08:33:25 NODE02 is marked as NOTOK
at 08:33:32 NODE02 is marked as OK
at 08:33:41 NODE02 is marked as NOTOK
...and so on...