[JBoss JIRA] (MODCLUSTER-288) SystemMemoryUsageLoadMetric is of no use to anyone
by Michal Babacek (JIRA)
Michal Babacek created MODCLUSTER-288:
-----------------------------------------
Summary: SystemMemoryUsageLoadMetric is of no use to anyone
Key: MODCLUSTER-288
URL: https://issues.jboss.org/browse/MODCLUSTER-288
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.0.Final, 1.1.1.Final, 1.1.0.Final, 1.0.10
Reporter: Michal Babacek
Assignee: Paul Ferraro
I am convinced that the [SystemMemoryUsageLoadMetric.java|https://github.com/modcluster/mod_cluste...] is of no use to anyone.
I have tested it by allocating some memory outside the JVM control with [NativeMemoryStress.c|https://github.com/Karm/clusterbench/blob/system-mem...].
According to mod_cluster, the system is almost fully loaded all the time, even though I am working on a box with 8 GB of RAM and almost nothing running on it.
This [picture|http://hudson.qa.jboss.com/hudson/view/mod_cluster-QE/view/mod_cl...] displays time on X axis and (100-(Load from Mod_cluster)) on Y axis. As you may observe, all 4 nodes appear to be under full load all the time and there is a spike on each of them. This spike represents a period of time when the aforementioned [NativeMemoryStress.c|https://github.com/Karm/clusterbench/blob/system-mem...] was called with the following setting:
{noformat}
megabytes 500;
milliseconds 60000;
{noformat}
What happened is that the kernel cleared some cache and garbage data from RAM in order to accommodate the new demand, so by the time I freed the memory, there was all of a sudden more free RAM than before.
The actual problem with the current [SystemMemoryUsageLoadMetric.java|https://github.com/modcluster/mod_cluste...] implementation using [OperatingSystemMXBean|http://docs.oracle.com/javase/6/docs/jre/api/manage...] is that this code:
{code:lang=java}
+++
public static final String FREE_MEMORY = "FreePhysicalMemorySize";
public static final String TOTAL_MEMORY = "TotalPhysicalMemorySize";
+++
double free = ((Number) this.server.getAttribute(name, FREE_MEMORY)).doubleValue();
double total = ((Number) this.server.getAttribute(name, TOTAL_MEMORY)).doubleValue();
return (total - free) / total;
+++
{code}
is somewhat similar to this (on an 8 GB RAM machine with almost nothing running at the moment):
{noformat}
[karm@killerbox karm]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7933       6755       1177          0         44        141
-/+ buffers/cache:       6569       1363
Swap:        16383         58      16325
{noformat}
Obviously, this does not mean that I could get only 1177 MB of RAM if I wanted to. The "free" actually means "wasted". The majority of RAM is being used as cache. If I ask for, e.g., 2 GB, I am going to get it, because the kernel will simply sacrifice some cache, clear some garbage from RAM, and free up space to accommodate my demands.
So, in my opinion, on a sane, well-behaved caching operating system, the [SystemMemoryUsageLoadMetric.java|https://github.com/modcluster/mod_cluste...] will always show ~98% RAM usage.
Because of this, the metric is unlikely to be of use to anyone. What do you think?
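To make the distinction concrete, here is a minimal sketch (not mod_cluster code) that computes both the naive ratio used above and a variant that counts buffers and page cache as available. It assumes a Linux host exposing /proc/meminfo with the fields MemTotal, MemFree, Buffers and Cached; the class and method names are hypothetical, and treating all of Buffers and Cached as reclaimable is a simplification.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only, not part of mod_cluster: memory load from /proc/meminfo-style lines. */
final class MemInfoLoad {

    /** Parses lines such as "MemTotal:  8123392 kB" into a name -> kB map. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> values = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Name: value" line
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                values.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException e) {
                // Skip lines whose value is not a plain number.
            }
        }
        return values;
    }

    /** Same arithmetic as the metric quoted above: (total - free) / total. */
    static double naiveLoad(Map<String, Long> m) {
        double total = m.get("MemTotal");
        double free = m.get("MemFree");
        return (total - free) / total;
    }

    /** Assumption: buffers and page cache can be reclaimed, so count them as available. */
    static double reclaimableAwareLoad(Map<String, Long> m) {
        double total = m.get("MemTotal");
        double available = m.get("MemFree")
                + m.getOrDefault("Buffers", 0L)
                + m.getOrDefault("Cached", 0L);
        return (total - available) / total;
    }

    public static void main(String[] args) throws IOException {
        // Linux only: /proc/meminfo does not exist on other platforms.
        Map<String, Long> m = parse(Files.readAllLines(Paths.get("/proc/meminfo")));
        System.out.printf("naive=%.3f reclaimable-aware=%.3f%n",
                naiveLoad(m), reclaimableAwareLoad(m));
    }
}
```

On a machine whose RAM is mostly page cache, the two numbers should diverge widely; how much they differ on any given box depends on what the kernel is caching at that moment.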
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months
[JBoss JIRA] Created: (MODCLUSTER-91) Connector bind address of 0.0.0.0 propagated to proxy
by Brian Stansberry (JIRA)
Connector bind address of 0.0.0.0 propagated to proxy
-----------------------------------------------------
Key: MODCLUSTER-91
URL: https://jira.jboss.org/jira/browse/MODCLUSTER-91
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.0.1.GA
Reporter: Brian Stansberry
Assignee: Jean-Frederic Clere
Marek Goldmann wrote:
> I'm encountered a strange error. When I bind JBoss instance to 0.0.0.0
> address instead of a fixed ethernet address, node gets registered in
> mod_cluster, shows in mod_cluster-manager, but every request to
> registered contexts throws 503 error.
>
> httpd error log:
>
> [Fri Aug 07 03:21:05 2009] [error] (111)Connection refused: proxy:
> ajp: attempt to connect to 0.0.0.0:8009 (0.0.0.0) failed
> [Fri Aug 07 03:21:05 2009] [error] ap_proxy_connect_backend disabling
> worker for (0.0.0.0)
> [Fri Aug 07 03:21:15 2009] [error] proxy: ajp: disabled connection for
> (0.0.0.0)
> [Fri Aug 07 03:21:25 2009] [error] proxy: ajp: disabled connection for
> (0.0.0.0)
>
> This looks like a bug for me, because many administrators are binding
> JBoss to 0.0.0.0.
The Java side needs to understand that 0.0.0.0 is useless as a client address and send something useful. The trick is deciding what's useful.
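Detecting the wildcard is the easy half. A minimal sketch, assuming the caller already has some fallback address to advertise (how to pick that fallback is exactly the open question, and is not answered here). The class and method names are hypothetical and are not mod_cluster API; InetAddress.isAnyLocalAddress() is the standard JDK check.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Sketch only, not mod_cluster code: never advertise the wildcard address to the proxy. */
final class AdvertisedAddress {

    /**
     * Returns the bind address unless it is the wildcard (0.0.0.0 or ::),
     * in which case the caller-supplied fallback is returned instead.
     */
    static InetAddress choose(InetAddress bind, InetAddress fallback) {
        return bind.isAnyLocalAddress() ? fallback : bind;
    }

    /** Convenience for literal IPs; literals are parsed without a DNS lookup. */
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ip, e);
        }
    }

    public static void main(String[] args) {
        // 192.0.2.10 is a documentation address used here as a hypothetical fallback.
        InetAddress fallback = literal("192.0.2.10");
        System.out.println(choose(literal("0.0.0.0"), fallback).getHostAddress());
    }
}
```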
12 years, 5 months
[JBoss JIRA] (MODCLUSTER-311) mod_manager doesn't handle multiple virtualhosts per node
by Simone Gotti (JIRA)
Simone Gotti created MODCLUSTER-311:
---------------------------------------
Summary: mod_manager doesn't handle multiple virtualhosts per node
Key: MODCLUSTER-311
URL: https://issues.jboss.org/browse/MODCLUSTER-311
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.1.Final
Environment: RedHat EL 6.2, httpd-2.2.15-15.el6
Reporter: Simone Gotti
Assignee: Jean-Frederic Clere
Hi,
I was experimenting with mod_cluster and JBoss AS 7.1 configured with multiple virtualhosts.
My simple test consisted of a single node (AS instance) with 2 virtualhosts (site01 and site02) and 2 applications, each deployed on one of the two vhosts.
I noticed that mod_manager was inserting the aliases of the 2 jboss vhosts in the same virtualhost (same vhost id):
{noformat}
balancer: [1] Name: balancer01 Sticky: 1 [JSESSIONID]/[jsessionid] remove: 0 force: 0 Timeout: 0 maxAttempts: 1
node: [1:1],Balancer: balancer01,JVMRoute: bf3c1d57-ed66-38b4-838d-0cba532b6737,LBGroup: [],Host: 192.168.122.21,Port: 8259,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 1,ttl: 60,timeout: 0
host: 1 [site01] vhost: 1 node: 1
host: 2 [site02] vhost: 1 node: 1
context: 1 [/context01] vhost: 1 node: 1 status: 1
context: 2 [/context02] vhost: 1 node: 1 status: 1
{noformat}
Looking at the mod_manager.c code, I noticed the following inside process_appl_cmd: if the first alias name (I assume the aliases always come in order and that the first one provided in the ENABLE-APP MCMP command is always the jboss vhost default-name) doesn't exist in the hoststatsmem table, a new host is created with a fixed vhost id of 1 (as the comment says):
host = read_host(hoststatsmem, &hostinfo);
if (host == NULL) {
int vid = 1; /* XXX: That is not really the right value, but that works most time */
I tried to fix this by calculating the first available vhost id (see the first part of the patch attached below).
From my tests this seems to work (I tried deploying and undeploying various apps on different hosts and contexts). This also means that the logic inside mod_proxy_cluster looks right and correctly chooses the right balancer (and sends the request to the backend only if the requested context inside the requested vhost is defined).
{noformat}
balancer: [1] Name: balancer01 Sticky: 1 [JSESSIONID]/[jsessionid] remove: 0 force: 0 Timeout: 0 maxAttempts: 1
node: [1:1],Balancer: balancer01,JVMRoute: bf3c1d57-ed66-38b4-838d-0cba532b6737,LBGroup: [],Host: 192.168.122.21,Port: 8259,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 1,ttl: 60,timeout: 0
host: 1 [site02] vhost: 1 node: 1
host: 2 [site01] vhost: 2 node: 1
context: 1 [/context01] vhost: 2 node: 1 status: 1
context: 2 [/context02] vhost: 1 node: 1 status: 1
{noformat}
Then I tried adding some aliases to the jboss virtualhosts. ENABLE worked. During REMOVE, however, only the vhost default-name (the first Alias in the MCMP command) was removed; the other aliases, and therefore the vhost, were kept. This caused problems on the next ENABLE, which created another virtualhost just for the first alias.
On ENABLE:
{noformat}
balancer: [1] Name: balancer01 Sticky: 1 [JSESSIONID]/[jsessionid] remove: 0 force: 0 Timeout: 0 maxAttempts: 1
node: [1:1],Balancer: balancer01,JVMRoute: bf3c1d57-ed66-38b4-838d-0cba532b6737,LBGroup: [],Host: 192.168.122.21,Port: 8259,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 1,ttl: 60,timeout: 0
host: 1 [site01] vhost: 1 node: 1
host: 2 [site01alias01] vhost: 1 node: 1
host: 3 [site02] vhost: 2 node: 1
context: 1 [/context01] vhost: 1 node: 1 status: 1
context: 2 [/context02] vhost: 2 node: 1 status: 1
{noformat}
On REMOVE:
{noformat}
balancer: [1] Name: balancer01 Sticky: 1 [JSESSIONID]/[jsessionid] remove: 0 force: 0 Timeout: 0 maxAttempts: 1
node: [1:1],Balancer: balancer01,JVMRoute: bf3c1d57-ed66-38b4-838d-0cba532b6737,LBGroup: [],Host: 192.168.122.21,Port: 8259,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 1,ttl: 60,timeout: 0
host: 2 [site01alias01] vhost: 1 node: 1
host: 3 [site02] vhost: 2 node: 1
context: 2 [/context02] vhost: 2 node: 1 status: 1
{noformat}
Looking again inside process_appl_cmd, I noticed that it was removing only the first host, so I modified it to remove all the hosts of that node with that vhost id.
This is the patch I made trying to fix this:
{noformat}
Index: mod_manager.c
===================================================================
--- mod_manager.c (revision 840)
+++ mod_manager.c (working copy)
@@ -1341,10 +1341,26 @@
hostinfo.id = 0;
host = read_host(hoststatsmem, &hostinfo);
if (host == NULL) {
- int vid = 1; /* XXX: That is not really the right value, but that works most time */
+
/* If REMOVE ignores it */
if (status == REMOVE)
return NULL;
+
+ /* Find the first available vhost id */
+ /* XXX: This can be racy if another request from the same node comes in the middle */
+ int vid = 1;
+ int size = loc_get_max_size_host();
+ int *id = apr_palloc(r->pool, sizeof(int) * size);
+ size = get_ids_used_host(hoststatsmem, id);
+ for (i=0; i<size; i++) {
+ hostinfo_t *ou;
+ if (get_host(hoststatsmem, &ou, id[i]) != APR_SUCCESS)
+ continue;
+
+ if(ou->vhost == vid && ou->node == node->mess.id)
+ vid++;
+ }
+
/* If the Host doesn't exist yet create it */
if (insert_update_hosts(hoststatsmem, vhost->host, node->mess.id, vid) != APR_SUCCESS) {
*errtype = TYPEMEM;
@@ -1384,7 +1400,18 @@
}
if (i==size) {
hostinfo.id = host->id;
- remove_host(hoststatsmem, &hostinfo);
+
+ int size = loc_get_max_size_host();
+ int *id = apr_palloc(r->pool, sizeof(int) * size);
+ size = get_ids_used_host(hoststatsmem, id);
+ for (i=0; i<size; i++) {
+ hostinfo_t *ou;
+
+ if (get_host(hoststatsmem, &ou, id[i]) != APR_SUCCESS)
+ continue;
+ if(ou->vhost == host->vhost && ou->node == node->mess.id)
+ remove_host(hoststatsmem, ou);
+ }
}
} else if (status == STOPPED) {
/* insert_update_contexts in fact makes that vhost->context corresponds only to the first context... */
{noformat}
As discussed on the forum, some concurrency problems may occur during ENABLE. This is probably a problem only if the same node issues multiple concurrent ENABLE-APP commands (I don't know whether this can happen on the AS side).
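One further observation on the first hunk of the patch: as written, the loop bumps vid only when it meets an entry whose vhost equals the current candidate, so the result depends on the order in which the host entries are visited. If an entry with vhost 2 is visited before one with vhost 1, the loop ends with vid = 2, which is already taken. The sketch below models an order-independent search. It is written in Java purely as a model of the logic and is not mod_manager.c code; HostEntry is a hypothetical stand-in for a row of the host table, and the race mentioned above is not addressed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: models the "first available vhost id" search, independent of entry order. */
final class VhostIds {

    /** Hypothetical stand-in for one row of the host table: (node id, vhost id). */
    static final class HostEntry {
        final int node;
        final int vhost;

        HostEntry(int node, int vhost) {
            this.node = node;
            this.vhost = vhost;
        }
    }

    /** Returns the smallest vhost id >= 1 not yet used by the given node. */
    static int firstFreeVhostId(List<HostEntry> hosts, int node) {
        // First pass: collect every vhost id the node already uses.
        Set<Integer> used = new HashSet<>();
        for (HostEntry h : hosts) {
            if (h.node == node) {
                used.add(h.vhost);
            }
        }
        // Second pass: walk upward from 1 until a gap is found.
        int vid = 1;
        while (used.contains(vid)) {
            vid++;
        }
        return vid;
    }

    public static void main(String[] args) {
        // Entries deliberately listed out of vhost order.
        List<HostEntry> hosts = Arrays.asList(new HostEntry(1, 2), new HostEntry(1, 1));
        System.out.println(firstFreeVhostId(hosts, 1));
    }
}
```

Collecting the used ids first costs one extra pass over the table but removes the dependence on iteration order.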
12 years, 6 months
[JBoss JIRA] (MODCLUSTER-314) mod_cluster: HTTP 404 on node shutdown with pure IPv6 setup
by Michal Babacek (JIRA)
Michal Babacek created MODCLUSTER-314:
-----------------------------------------
Summary: mod_cluster: HTTP 404 on node shutdown with pure IPv6 setup
Key: MODCLUSTER-314
URL: https://issues.jboss.org/browse/MODCLUSTER-314
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.1.Final
Environment: RHEL 6 x86_64, pure IPv6, *Apache/2.2.21* (Unix) *mod_cluster/1.2.1.Final*
Reporter: Michal Babacek
Assignee: Jean-Frederic Clere
Priority: Critical
As a follow up on
* [JBPAPP-9195] mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup
I have tried this mod_cluster + httpd bundle featuring *Apache/2.2.21* (Unix) *mod_cluster/1.2.1.Final* (unlike in [JBPAPP-9195] where we used Apache/2.2.17 (Unix) DAV/2 mod_cluster/1.2.1.Beta2)
* [mod_cluster-1.2.1.Final-linux2-x64.tar.gz|http://hudson.qa.jboss.com/huds...]
The result is surprising: very frequent HTTP 404 errors on node shutdown.
h3. Http client
I have a curl client issuing requests to [2620:52:0:105f::ffff:c]:8888/SessionTest/hostname periodically, with a delay of 1 s. Note that each request gets a new session (no JSESSIONID anywhere). There are two nodes that I switch off and on randomly, always allowing enough time for the starting node to come up safely.
{noformat}
Wed May 30 17:00:13 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:05:24 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
Wed May 30 17:05:25 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
+++ HTTP 404 errors keep showing up every second +++
Wed May 30 17:05:58 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
Wed May 30 17:05:59 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:06:03 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
Wed May 30 17:06:04 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
+++ HTTP 404 errors keep showing up every second +++
Wed May 30 17:06:08 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
Wed May 30 17:06:09 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:06:25 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
{noformat}
Please note the timestamps marking the HTTP 404 errors; we will match them against the attached debug logs.
h4. IO error
(i) *Note:* At *17:05:24* node vmg36 was switched off and vmg35 (up and running by that time) was supposed to take over. What actually happened on *vmg35* was the *IO error sending command CONFIG to proxy* exception listed below, at *17:05:29*, which is 5 seconds after vmg36's shutdown. Hmm... was httpd somehow too busy to accept the command?
h3. Worker nodes
The configuration is exactly the same as in [JBPAPP-9195], I just swapped the balancer. If you take a look at the attached
* node-vmg35-Ctrl+C-log.zip
* node-vmg36-Ctrl+C-log.zip
you may observe the shutdown time stamps ( *^C* ) as well as several exceptions:
*vmg35, IP:2620:52:0:105f:0:0:ffff:c, JvmRoute:f49689d6-cdbb-3015-a642-f8200ea456ff*
* 17:04:26,550 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] Problems unmarshalling remote command from byte buffer: java.lang.NullPointerException
* 17:05:29,133 INFO [org.jboss.modcluster.mcmp.impl.DefaultMCMPHandler] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) IO error sending command CONFIG to proxy
2620:52:0:105f:0:0:ffff:c/2620:52:0:105f:0:0:ffff:c:8888: java.net.SocketTimeoutException: Read timed out
*vmg36, IP:2620:52:0:105f::ffff:0, JvmRoute:dc7bd552-a020-3d08-acee-4ae3e0f178a8*
* 17:03:36,275 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] Problems unmarshalling remote command from byte buffer: java.lang.NullPointerException
h3. Httpd
There is the attached *error_log_report.zip* I am about to investigate. I have not managed to see what was wrong yet.
The interesting part probably lies between *17:05:24* and *17:05:29*, and on through to the glitch at *17:05:58* and *17:05:59*.
To be continued...
12 years, 6 months
[JBoss JIRA] (MODCLUSTER-316) CLONE - mod_cluster: HTTP 404 on node shutdown with pure IPv6 setup
by Michal Babacek (JIRA)
Michal Babacek created MODCLUSTER-316:
-----------------------------------------
Summary: CLONE - mod_cluster: HTTP 404 on node shutdown with pure IPv6 setup
Key: MODCLUSTER-316
URL: https://issues.jboss.org/browse/MODCLUSTER-316
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.1.Final
Environment: RHEL 6 x86_64, pure IPv6, *Apache/2.2.21* (Unix) *mod_cluster/1.2.1.Final*
Reporter: Michal Babacek
Assignee: Jean-Frederic Clere
Priority: Critical
Attachments: access_log_report.zip, error_log_report.zip, httpd.conf.zip, node-vmg35-Ctrl+C-log.zip, node-vmg36-Ctrl+C-log.zip
As a follow up on
* [JBPAPP-9195] mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup
I have tried this mod_cluster + httpd bundle featuring *Apache/2.2.21* (Unix) *mod_cluster/1.2.1.Final* (unlike in [JBPAPP-9195] where we used Apache/2.2.17 (Unix) DAV/2 mod_cluster/1.2.1.Beta2)
* [mod_cluster-1.2.1.Final-linux2-x64.tar.gz|http://hudson.qa.jboss.com/huds...]
The result is surprising: very frequent HTTP 404 errors on node shutdown.
h3. Http client
I have a curl client issuing requests to [2620:52:0:105f::ffff:c]:8888/SessionTest/hostname periodically, with a delay of 1 s. Note that each request gets a new session (no JSESSIONID anywhere). There are two nodes that I switch off and on randomly, always allowing enough time for the starting node to come up safely.
{noformat}
Wed May 30 17:00:13 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:05:24 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
Wed May 30 17:05:25 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
+++ HTTP 404 errors keep showing up every second +++
Wed May 30 17:05:58 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
Wed May 30 17:05:59 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:06:03 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
Wed May 30 17:06:04 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
+++ HTTP 404 errors keep showing up every second +++
Wed May 30 17:06:08 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
Wed May 30 17:06:09 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:06:25 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
{noformat}
Please note the timestamps marking the HTTP 404 errors; we will match them against the attached debug logs.
h4. IO error
(i) *Note:* At *17:05:24* node vmg36 was switched off and vmg35 (up and running by that time) was supposed to take over. What actually happened on *vmg35* was the *IO error sending command CONFIG to proxy* exception listed below, at *17:05:29*, which is 5 seconds after vmg36's shutdown. Hmm... was httpd somehow too busy to accept the command?
(i) *Note:* Does the fact that the nodes are talking via proxy-01.mw.lab.eng.bos.redhat.com (squid/3.1.10) have anything to do with the problem at hand?
h3. Worker nodes
The configuration is exactly the same as in [JBPAPP-9195], I just swapped the balancer. If you take a look at the attached
* node-vmg35-Ctrl+C-log.zip
* node-vmg36-Ctrl+C-log.zip
you may observe the shutdown time stamps ( *^C* ) as well as several exceptions:
*vmg35, IP:2620:52:0:105f:0:0:ffff:c, JvmRoute:f49689d6-cdbb-3015-a642-f8200ea456ff*
* 17:04:26,550 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] Problems unmarshalling remote command from byte buffer: java.lang.NullPointerException
* 17:05:29,133 INFO [org.jboss.modcluster.mcmp.impl.DefaultMCMPHandler] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) IO error sending command CONFIG to proxy
2620:52:0:105f:0:0:ffff:c/2620:52:0:105f:0:0:ffff:c:8888: java.net.SocketTimeoutException: Read timed out
*vmg36, IP:2620:52:0:105f::ffff:0, JvmRoute:dc7bd552-a020-3d08-acee-4ae3e0f178a8*
* 17:03:36,275 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] Problems unmarshalling remote command from byte buffer: java.lang.NullPointerException
h3. Httpd
There is the attached *error_log_report.zip* I am about to investigate. I have not managed to see what was wrong yet.
The interesting part probably lies between *17:05:24* and *17:05:29*, and on through to the glitch at *17:05:58* and *17:05:59*.
(i) *Note:* I have not yet carried out the IPv4/IPv6 comparison; that this issue is IPv6 / network related is just a suspicion.
To be continued...
12 years, 6 months