[JBoss JIRA] (MODCLUSTER-288) SystemMemoryUsageLoadMetric is of no use to anyone
by Michal Babacek (JIRA)
Michal Babacek created MODCLUSTER-288:
-----------------------------------------
Summary: SystemMemoryUsageLoadMetric is of no use to anyone
Key: MODCLUSTER-288
URL: https://issues.jboss.org/browse/MODCLUSTER-288
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.0.Final, 1.1.1.Final, 1.1.0.Final, 1.0.10
Reporter: Michal Babacek
Assignee: Paul Ferraro
I am convinced that the [SystemMemoryUsageLoadMetric.java|https://github.com/modcluster/mod_cluste...] is of no use to anyone.
I have tested it by allocating some memory outside the JVM control with [NativeMemoryStress.c|https://github.com/Karm/clusterbench/blob/system-mem...].
The load reported by mod_cluster suggests that the system is almost fully loaded all the time, even though I am working on a box with 8GB of RAM and almost nothing running on it.
This [picture|http://hudson.qa.jboss.com/hudson/view/mod_cluster-QE/view/mod_cl...] displays time on X axis and (100-(Load from Mod_cluster)) on Y axis. As you may observe, all 4 nodes appear to be under full load all the time and there is a spike on each of them. This spike represents a period of time when the aforementioned [NativeMemoryStress.c|https://github.com/Karm/clusterbench/blob/system-mem...] was called with the following setting:
{noformat}
megabytes 500;
milliseconds 60000;
{noformat}
What happened is that the kernel cleared some cache and garbage data from RAM in order to accommodate the new demand, so by the time I freed the memory, there was all of a sudden actually more free RAM than before.
The actual problem with the current [SystemMemoryUsageLoadMetric.java|https://github.com/modcluster/mod_cluste...] implementation using [OperatingSystemMXBean|http://docs.oracle.com/javase/6/docs/jre/api/manage...] is that this code:
{code:lang=java}
+++
public static final String FREE_MEMORY = "FreePhysicalMemorySize";
public static final String TOTAL_MEMORY = "TotalPhysicalMemorySize";
+++
double free = ((Number) this.server.getAttribute(name, FREE_MEMORY)).doubleValue();
double total = ((Number) this.server.getAttribute(name, TOTAL_MEMORY)).doubleValue();
return (total - free) / total;
+++
{code}
is roughly equivalent to reading the "free" column of the "Mem:" row here (on an 8GB RAM machine with almost nothing running on it at the moment):
{noformat}
[karm@killerbox karm]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7933       6755       1177          0         44        141
-/+ buffers/cache:       6569       1363
Swap:        16383         58      16325
{noformat}
Obviously, this does not mean that I could get only 1177MB of RAM if I wanted to. The "free" actually means "wasted". The majority of RAM is being used as cache. If I ask for e.g. 2GB, I am going to get it, because the kernel will simply sacrifice some cache, clear some garbage from RAM and free up space to accommodate my demand.
So, in my opinion, on a sane, well-behaved caching operating system, the [SystemMemoryUsageLoadMetric.java|https://github.com/modcluster/mod_cluste...] will always show ~98% RAM usage.
Because of this, the metric is unlikely to be of any use to anyone. What do you think?
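To make the distinction concrete, here is a minimal sketch (not the mod_cluster implementation) that contrasts the (total - free) / total calculation quoted above with a variant that also counts buffers and cache as available, which is what the "-/+ buffers/cache" row of free does. The class and method names are hypothetical, and the inputs are simply the megabyte figures from the free -m output above.

```java
// Hypothetical helper for illustration only; not part of mod_cluster.
final class MemoryLoadSketch {

    // Same arithmetic as the quoted metric: everything that is not "free" counts as load.
    static double naiveLoad(double totalMb, double freeMb) {
        return (totalMb - freeMb) / totalMb;
    }

    // Treats buffers and page cache as memory the kernel can hand back on demand,
    // mirroring the "-/+ buffers/cache" row printed by free.
    static double cacheAwareLoad(double totalMb, double freeMb, double buffersMb, double cachedMb) {
        double availableMb = freeMb + buffersMb + cachedMb;
        return (totalMb - availableMb) / totalMb;
    }

    public static void main(String[] args) {
        // Figures taken from the "Mem:" row of the free -m output above.
        double total = 7933, free = 1177, buffers = 44, cached = 141;
        System.out.printf("naive load:       %.4f%n", naiveLoad(total, free));
        System.out.printf("cache-aware load: %.4f%n", cacheAwareLoad(total, free, buffers, cached));
    }
}
```

The OperatingSystemMXBean attributes used by the metric expose only total and free physical memory, so a cache-aware variant would have to get the buffers and cache figures from somewhere else (on Linux, for example, from /proc/meminfo).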
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months
[JBoss JIRA] (MODCLUSTER-322) Using AverageSystemLoadMetric can improperly cause a Load Factor of 0
by Aaron Ogburn (JIRA)
Aaron Ogburn created MODCLUSTER-322:
---------------------------------------
Summary: Using AverageSystemLoadMetric can improperly cause a Load Factor of 0
Key: MODCLUSTER-322
URL: https://issues.jboss.org/browse/MODCLUSTER-322
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.1.Final, MOD_CLUSTER_1_0_10_GA_CP02
Environment: *JBoss Enterprise Application Platform 5
*Apache httpd
*mod_cluster 1.0.10.GA_CP02 or 1.2.1
Reporter: Aaron Ogburn
Assignee: Jean-Frederic Clere
Fix For: MOD_CLUSTER_1_0_10_GA_CP03
It looks like AverageSystemLoadMetric is not properly implemented. When using it, mod_cluster may always report a load factor of 0, thus making the JBoss node unreachable from Apache. We've tested with a simple web app that checks the underlying MXBean SystemLoadAverage:
Double.class.cast(server.getAttribute(ObjectName.getInstance(ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME), "SystemLoadAverage")).doubleValue()
This value is obtained pretty much the same way mod_cluster obtains it, and the call works just fine outside of mod_cluster; my test app logged the following value:
12:08:26,519 INFO [STDOUT] From MBeanServer: 1.81640625
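For anyone who wants to repeat the check outside a web app, the lookup can be sketched as a standalone class. The class and method names here are hypothetical; the attribute name and the MBean object name are the ones used in the expression above.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical standalone probe; reads the same attribute as the test app above.
final class LoadAverageProbe {

    static double readSystemLoadAverage() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = ObjectName.getInstance(ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME);
            // The attribute is a Double; it is negative when the platform cannot provide a load average.
            return ((Number) server.getAttribute(name, "SystemLoadAverage")).doubleValue();
        } catch (JMException e) {
            throw new IllegalStateException("Could not read SystemLoadAverage", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("From MBeanServer: " + readSystemLoadAverage());
    }
}
```

The returned value is the run-queue load average for the last minute and is not bounded by 1, so on a multi-core or busy machine values above 1 are normal.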
So the issue does not appear to be with the underlying JDK/MXBean call but with how mod_cluster handles the data it gets from it. The root cause appears to be that a value above 1 is being returned, while mod_cluster seems to expect every metric to return a value in the 0-1 range.
The way the load is determined allows the AverageSystemLoadMetric to improperly exceed its weight. For example, if it were weighted as 2 and another metric were weighted as 1 (say RequestCountLoadMetric with a capacity of 1000), then AverageSystemLoadMetric should only be able to account for 67% of the load. But here we can see that AverageSystemLoadMetric can outrun its weight and really account for 100% of the load. So let's say AverageSystemLoadMetric is the above seen 1.81640625 value with 100 requests/second, putting RequestCountLoadMetric at .1 load, so DynamicLoadBalanceFactorProvider would calculate the load factor like so:
int load = (int) Math.round(100 * totalWeightedLoad / totalWeight);
int load = (int) Math.round(100 * (1.81640625 * 2 + 0.1) / 3);
load = 124 (124.4 before rounding)
And that gets capped at 100, so AverageSystemLoadMetric ends up representing all of the load. But if a weight-2 metric is at its max (1.0) and a weight-1 metric is at .1, then their total load should be just 70%, since 100 * (1.0 * 2 + 0.1) / 3 = 70.
So is mod_cluster assuming that the SystemLoadAverage will always be between 0 and 1? Does it look like mod_cluster is not properly scaling this metric? Do we know a definite maximum to expect from OperatingSystemMXBean.getSystemLoadAverage() so that this metric can be scaled more in line with the other percentage-based ones? Or should a user-definable max capacity be implemented for this metric, as it is for others?
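The effect described above, and what a capacity would change, can be sketched as follows. This is an illustration only, not the DynamicLoadBalanceFactorProvider code: the class and method names are hypothetical, and dividing the raw load average by a capacity and clamping the result to 0-1 is just one possible scaling, assumed here for the sake of the example.

```java
// Hypothetical illustration of the weighted-load arithmetic; not mod_cluster code.
final class LoadFactorSketch {

    // Weighted average of per-metric loads, expressed as a rounded percentage.
    static int weightedLoadPercent(double[] loads, int[] weights) {
        double totalWeightedLoad = 0;
        int totalWeight = 0;
        for (int i = 0; i < loads.length; i++) {
            totalWeightedLoad += loads[i] * weights[i];
            totalWeight += weights[i];
        }
        return (int) Math.round(100 * totalWeightedLoad / totalWeight);
    }

    // Assumed scaling: divide the raw value by a capacity and clamp it to the 0-1 range.
    static double normalize(double rawLoad, double capacity) {
        return Math.max(0.0, Math.min(1.0, rawLoad / capacity));
    }

    public static void main(String[] args) {
        int[] weights = {2, 1};
        double systemLoad = 1.81640625; // value logged by the test app above
        double requestLoad = 0.1;       // 100 requests/second against a capacity of 1000

        // Unscaled, the load average pushes the result past 100 before any capping.
        System.out.println(weightedLoadPercent(new double[] {systemLoad, requestLoad}, weights));

        // Clamped to 1.0, the weight-2 metric can no longer exceed its share.
        System.out.println(weightedLoadPercent(
                new double[] {normalize(systemLoad, 1.0), requestLoad}, weights));
    }
}
```

Whether the capacity should be 1, the number of processors, or a user-supplied value is exactly the open question; the sketch only shows that any bounded scaling keeps the metric within its weight.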
12 years, 5 months