mod_cluster-dev March 2009

mod_cluster-dev@lists.jboss.org

4 participants
3 discussions

by Paul Ferraro

Currently, the HAModClusterService (where httpd communication is coordinated by an HA singleton) does not react to crashed/hung members. Specifically, when the HA singleton gets a callback that the group membership has changed, it does not send any REMOVE-APP messages to httpd on behalf of the member that just left. Currently, httpd will detect the failure (via a disconnected socket) on its own and set its internal state accordingly, e.g. a STATUS message will return NOTOK.

The non-handling of dropped members is actually a good thing in the event of a network partition, where communication between nodes is lost, but communication between httpd and the nodes is unaffected. If we were handling dropped members, we would have to handle the ugly scenario described here:
https://jira.jboss.org/jira/browse/MODCLUSTER-66

Jean-Frederic: a few questions...

1. Is it up to the AS to drive the recovery of a NOTOK node when it becomes functional again? In the case of a crashed member, fresh CONFIG/ENABLE-APP messages will be sent upon node restart. In the case of a re-merged network partition, no additional messages are sent. Is the subsequent STATUS message (with a non-zero lbfactor) enough to trigger the recovery of this node?

2. Can httpd detect hung nodes? A hung node will not affect the connected state of the AJP/HTTP/S connector - httpd could only detect this by sending data to the connector and timing out on the response.

And some questions for open discussion:

What does HAModClusterService really buy us over the normal ModClusterService? Do the benefits outweigh the complexity?

* Maintains a uniform view of proxy status across each AS node
* Can detect and send STOP-APP/REMOVE-APP messages on behalf of hung/crashed nodes (if httpd cannot already do this) (not yet implemented)
  + Requires special handling of network partitions
* Potentially improves scalability by minimizing network traffic for very large clusters, e.g. non-masters ping httpd less often
* Anything else?

Paul
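To make the partition concern above concrete, here is a minimal Python sketch of one possible rule a master could apply on a membership change. It is not how HAModClusterService behaves; the function name, the node names, and the `proxy_status` mapping (node name to the last STATUS result httpd reported for it) are all hypothetical. The idea is only that a member which left the group view but which httpd still reports as OK may be partitioned rather than dead, so it is left alone.

```python
def members_to_remove(old_view, new_view, proxy_status):
    """Return departed members that the proxy also reports as NOTOK.

    old_view / new_view: iterables of node names (hypothetical).
    proxy_status: dict of node name -> "OK" or "NOTOK" (hypothetical),
    standing in for whatever httpd last answered to a STATUS request.
    """
    departed = set(old_view) - set(new_view)
    # A departed member with unknown or OK status is treated as possibly
    # partitioned, so nothing is sent on its behalf.
    return sorted(m for m in departed if proxy_status.get(m) == "NOTOK")


# node2 crashed (httpd sees NOTOK); node3 is only partitioned from the master.
candidates = members_to_remove(
    ["node1", "node2", "node3"],
    ["node1"],
    {"node2": "NOTOK", "node3": "OK"},
)
print(candidates)
```

Under this assumed rule only node2 would be a candidate for a REMOVE-APP sent on its behalf. Hung nodes are not covered, since they may still look connected to httpd.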

mod_cluster 1.0.0.CR1 released

by Paul Ferraro

Continuing the steady march towards a final release, the mod_cluster team is proud to announce its first candidate for release. mod_cluster is a new httpd-based load balancer for use with JBoss AS, JBoss Web, and Tomcat.

Get it here: http://www.jboss.org/mod_cluster/downloads/latest/
Change log: http://www.jboss.org/mod_cluster/changelog.html

Problems with Beta4

by Brian Stansberry

Following is a list of issues I saw when playing with Beta4 on Windows. Apologies if some of these are known issues / already fixed. I'll scan JIRA now and open issues for any I don't see.

1) When an app is undeployed or the server is shut down, clients with an existing session do not fail over. The following from access_log shows the issue. The last 404 occurs a couple of seconds after the REMOVE-APP, so it doesn't seem to be a race.

> 192.168.2.3 - - [16/Mar/2009:16:07:48 +0100] "STOP-APP / HTTP/1.0" 200 -
> 127.0.0.1 - - [16/Mar/2009:16:07:48 +0100] "GET /load-demo/record HTTP/1.1" 503 1086
> 127.0.0.1 - - [16/Mar/2009:16:07:48 +0100] "GET /load-demo/record HTTP/1.1" 503 1086
> 127.0.0.1 - - [16/Mar/2009:16:07:48 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:48 +0100] "GET /load-demo/record HTTP/1.1" 503 1086
> 127.0.0.1 - - [16/Mar/2009:16:07:48 +0100] "GET /load-demo/record HTTP/1.1" 503 1086
> 127.0.0.1 - - [16/Mar/2009:16:07:48 +0100] "GET /load-demo/record HTTP/1.1" 503 1086
> 192.168.2.3 - - [16/Mar/2009:16:07:48 +0100] "REMOVE-APP / HTTP/1.0" 200 -
> 192.168.2.3 - - [16/Mar/2009:16:07:48 +0100] "STATUS / HTTP/1.0" 200 59
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record?destroy=true HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 192.168.2.3 - - [16/Mar/2009:16:07:49 +0100] "STATUS / HTTP/1.0" 200 59
> 192.168.2.3 - - [16/Mar/2009:16:07:50 +0100] "STATUS / HTTP/1.0" 200 59
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record?destroy=true HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 200 21
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999
> 127.0.0.1 - - [16/Mar/2009:16:07:51 +0100] "GET /load-demo/record HTTP/1.1" 404 999

2) When you run with HAModClusterService, every 10 seconds there is logging about a DRM replicantsChanged event and a new HASingletonMaster election. (The election just picks the existing master.) That means the DRM is being updated even when nothing has changed, which shouldn't happen.

3) To get advertise to work, I had to add an

AdvertiseGroup 224.0.1.105:23364

directive to httpd.conf. The docs on jboss.org imply that shouldn't be necessary since the value is just the default.

4) The mod_cluster-manager status page reports

Transfered: 0, Connected: 0, Load: 0 Num sessions: 0

for all nodes, always; it doesn't ever report actual data. Also "Transfered" should be "Transferred".

5) The mod_cluster-manager status page "SessionIDs" section lists session ids, which is a security violation. Jean-Frederic, you mentioned you wanted to remove this. In case you haven't, I tried to disable it by setting

Maxsessionid 0

in httpd.conf, but that had no effect.

6) Playing with the demo's "Server Load Control" tab I tried to use the "Heap Memory Use" control. I couldn't get this to have any effect on load balancing.

a) The servlet isn't multiplying the duration value by 1000 to convert seconds to ms. I'll fix this in just a sec after I send this.

b) But even after adjusting for this I couldn't get any load balancing effect by using "Heap Memory Use". Looking at the process in Task Manager, it seemed the servlet was increasing heap usage. So I'm concerned there is an issue with the load metric.

7) Go into jmx-console, jboss.web:service=ModClusterService, invoke the "disable" operation. The node logs this in server.log:

2009-03-16 17:15:58,765 ERROR [org.jboss.modcluster.mcmp.impl.DefaultMCMPHandler] (http-192.168.2.2-8080-2) Error [null: null: {4}] sending command DUMP to proxy 192.168.2.3:6666, configuration will be reset
2009-03-16 17:16:55,250 ERROR [org.jboss.modcluster.mcmp.impl.DefaultMCMPHandler] (http-192.168.2.2-8080-2) Error [null: null: {4}] sending command DISABLE-APP to proxy 192.168.2.3:6666, configuration will be reset

--
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry(a)redhat.com
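For anyone trying to reproduce issue 1, a small Python sketch like the following could tally the response codes httpd returned for the demo URL after the first REMOVE-APP. It is only an illustration: the function name and regular expression are assumptions, and the sample lines are a handful copied from the access_log excerpt in issue 1.

```python
import re
from collections import Counter

# Picks the method, path and status out of a common-log-format request field.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z-]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def statuses_after_remove(lines, path="/load-demo/record"):
    """Count response codes for GETs of `path` seen after the first REMOVE-APP."""
    counts = Counter()
    removed = False
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a request line we recognise
        if m.group("method") == "REMOVE-APP":
            removed = True
        elif (removed and m.group("method") == "GET"
              and m.group("path").split("?")[0] == path):
            counts[m.group("status")] += 1
    return counts


sample = [
    '192.168.2.3 - - [16/Mar/2009:16:07:48 +0100] "STOP-APP / HTTP/1.0" 200 -',
    '127.0.0.1 - - [16/Mar/2009:16:07:48 +0100] "GET /load-demo/record HTTP/1.1" 503 1086',
    '192.168.2.3 - - [16/Mar/2009:16:07:48 +0100] "REMOVE-APP / HTTP/1.0" 200 -',
    '192.168.2.3 - - [16/Mar/2009:16:07:48 +0100] "STATUS / HTTP/1.0" 200 59',
    '127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 404 999',
    '127.0.0.1 - - [16/Mar/2009:16:07:49 +0100] "GET /load-demo/record HTTP/1.1" 200 21',
]
result = dict(statuses_after_remove(sample))
print(result)
```

A mix of 200 and 404 after REMOVE-APP, as in the excerpt, suggests that some requests are still being routed to the node where the app was removed instead of failing over.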
