Michal Babacek edited comment on MODCLUSTER-372 at 11/18/13 4:57 AM:
---------------------------------------------------------------------
I've identified these bottlenecks:
The three most expensive functions and their callers are listed below. They keep this
order regardless of the event metric (Instruction Fetch / Data Read / Data Write).
# Function {{ap_slotmem_mem}} from mod_slotmem/sharedmem_util.c called by:
* {{get_context}} from mod_manager/context.c
* {{get_node}} from mod_manager/node.c
* {{loc_read_node}} from mod_manager/mod_manager.c
* {{find_node_context_host}} from mod_proxy_cluster/mod_proxy_cluster.c
* {{loc_read_context}} from mod_manager/context.c
* {{read_context_table}} from mod_proxy_cluster/mod_proxy_cluster.c
* {{manager_info}} from mod_manager/mod_manager.c
# Function {{ap_slotmem_do}} from mod_slotmem/sharedmem_util.c called mostly by:
* httpd_request and httpd_core functions from httpd sources.
# Function {{find_node_context_host}} from mod_proxy_cluster/mod_proxy_cluster.c called by:
* httpd sources, mostly on request_process and request_connection.
So, IMHO, {{find_node_context_host}} is indeed the culprit :)
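To illustrate the suspicion, here is a minimal sketch (all names and structures are hypothetical, not the actual mod_proxy_cluster code) of how a per-request linear walk over the shared-memory context table makes every request pay for every registered context, whether it is ever accessed or not:

```c
/* Hypothetical sketch -- not the real mod_proxy_cluster structures -- of the
 * linear lookup suspected in find_node_context_host: each request scans the
 * whole context table, so even contexts that are never accessed add
 * comparisons to every single lookup. */
#include <stdio.h>
#include <string.h>

#define MAX_CONTEXTS 512

struct context_rec {          /* simplified stand-in for a slotmem slot */
    int  node_id;
    char path[64];
};

static struct context_rec table[MAX_CONTEXTS];
static int table_len = 0;

/* Register per_node contexts for each of num_nodes worker nodes. */
static void populate(int num_nodes, int per_node)
{
    table_len = 0;
    for (int n = 0; n < num_nodes; n++)
        for (int c = 0; c < per_node; c++) {
            struct context_rec *r = &table[table_len++];
            r->node_id = n;
            snprintf(r->path, sizeof(r->path), "/app-%d-%d", n, c);
        }
}

/* Linear scan over every registered context; *comparisons counts the work
 * a single request pays, regardless of which context it asks for. */
static int find_context(const char *uri, long *comparisons)
{
    for (int i = 0; i < table_len; i++) {
        (*comparisons)++;
        if (strcmp(uri, table[i].path) == 0)
            return table[i].node_id;
    }
    return -1;
}
```

With the 4-node, 65-contexts-per-node setup used for profiling, a lookup of the last registered context touches all 260 entries, which would match the observed sensitivity to the sheer number of registered contexts.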
Attaching the profiling valgrind logs [^callgrind.zip], created as:
{noformat}
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes
--compress-strings=no --compress-pos=no --collect-systime=yes ./httpd -f
/tmp/hudson/httpd/conf/httpd.conf -E /tmp/hudson/httpd/logs/httpd.log
{noformat}
with this debug setting:
{code}
<IfModule worker.c>
ThreadLimit 50
StartServers 1
ServerLimit 1
MinSpareThreads 50
MaxSpareThreads 50
MaxClients 50
ThreadsPerChild 50
MaxRequestsPerChild 0
</IfModule>
{code}
and 4 worker nodes, 65 contexts each, and several dozen concurrent client sessions.
Number of registered contexts negatively affects mod_cluster performance
------------------------------------------------------------------------
Key: MODCLUSTER-372
URL: https://issues.jboss.org/browse/MODCLUSTER-372
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.4.Final, 1.2.6.Final
Environment: RHEL6, other platforms are to be confirmed.
Reporter: Michal Babacek
Assignee: Michal Babacek
Labels: mod_cluster, performance
Fix For: 1.2.7.Final, 1.3.0.Alpha2
Attachments: 4-workers-1-context-balancer-cpu.png,
4-workers-61-context-accessing-1-balancer-cpu.png, 9-workers-1-context-balancer-cpu.png,
9-workers-31-context-accessing-1-balancer-cpu.png,
9-workers-31-context-round-robin-balancer-cpu.png, callgrind.zip, httpd.conf
There is a performance concern regarding CPU usage on the Apache HTTP Server with
mod_cluster set up as a load balancer. Performance tests revealed that the major variable
affecting CPU usage is the overall number of registered contexts. Notably, these
contexts don't even need to be accessed; it's enough that they are present.
The first test [9 workers, 31 contexts, round
robin|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,31contexts,...]
depicts a behavior where all contexts are being accessed in a round robin fashion. Note
the CPU usage. If we compare it to the [9 workers, 1
context|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,1context] test, it becomes
apparent that there is something wrong with the CPU usage. As is clear from the [9
workers, 31 contexts, access
1|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,31contexts,acce...]
test, accessing only one of these contexts doesn't help much.
The last two tests, [4 workers, 61 contexts, accessing
1|https://issues.jboss.org/browse/MODCLUSTER-372#4workers,61contexts,acce...]
and [4 workers, 1
context|https://issues.jboss.org/browse/MODCLUSTER-372#4workers,1context] confirm the
results; the environment differs only in the number of nodes and contexts.
[^httpd.conf] attached, stay tuned for some profiler outputs...
h3. 9 workers, 31 contexts, round robin
!9-workers-31-context-round-robin-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|9|150|150|8843|147.4|15|60|0|8843|100%|
|9|250|250|14422|240.3|37|215|0|14422|100%|
|9|350|350|17551|292.5|196|443|0|17551|100%|
|9|450|450|21303|355.0|269|603|0|21303|100%|
|9|550|550|24818|413.6|325|686|0|24818|100%|
|9|650|650|26804|446.7|469|800|0|26804|100%|
|9|750|750|29930|498.8|507|985|0|29930|100%|
|9|850|850|30665|511.0|665|1185|0|30665|100%|
|9|950|950|34421|573.6|647|1316|0|34421|100%|
|9|1050|1050|35067|584.4|800|1487|0|35067|100%|
|9|1150|1150|36024|600.4|925|1612|0|36024|100%|
|9|1250|1250|36635|610.5|1030|1815|0|36635|100%|
|9|1350|1350|38395|639.9|1096|1942|0|38395|100%|
|9|1450|1450|39713|661.8|1170|2466|0|39713|100%|
|9|1550|1550|39455|657.5|1392|2340|0|39455|100%|
|9|1650|1650|39849|664.1|1465|4240|0|39849|100%|
|9|1750|1750|42435|707.2|1454|6511|0|42435|100%|
|9|1850|1850|44714|745.2|1498|4866|0|44714|100%|
|9|1950|1911|46072|767.8|1554|43016|0|46072|100%|
|9|2050|1911|44496|741.5|1617|4953|39|44457|99%|
|9|2150|1911|43715|728.5|1632|5348|125|43590|99%|
|9|2250|1911|41112|685.1|1764|9800|155|40957|99%|
h3. 9 workers, 31 contexts, accessing only one of them
!9-workers-31-context-accessing-1-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|9|150|150|8954|149.2|4|42|0|8954|100%|
|9|250|250|14897|248.3|7|33|0|14897|100%|
|9|350|350|20784|346.4|8|68|0|20784|100%|
|9|450|450|26748|445.8|8|69|0|26748|100%|
|9|550|550|32553|542.5|11|92|0|32553|100%|
|9|650|650|38558|642.6|10|60|0|38558|100%|
|9|750|750|43571|726.1|29|329|0|43571|100%|
|9|850|850|46133|768.8|99|494|0|46133|100%|
|9|950|950|50854|847.5|120|501|0|50854|100%|
|9|1050|1050|54451|907.4|154|584|0|54451|100%|
|9|1150|1150|59961|999.3|138|674|0|59961|100%|
|9|1250|1250|62567|1,042.6|198|675|0|62567|100%|
|9|1350|1350|61939|1,032.2|301|799|0|61939|100%|
|9|1450|1450|67920|1,131.9|276|844|0|67920|100%|
|9|1550|1550|73151|1,219.1|261|861|0|73151|100%|
|9|1650|1650|73937|1,232.2|332|955|0|73937|100%|
|9|1750|1750|73516|1,225.2|423|1046|0|73516|100%|
|9|1850|1850|72556|1,209.1|515|1264|0|72556|100%|
|9|1950|1911|78613|1,310.1|454|50273|0|78613|100%|
|9|2050|1911|80141|1,335.6|431|1225|39|80102|99%|
|9|2150|1911|76979|1,282.9|490|1338|127|76852|99%|
|9|2250|1911|78048|1,300.7|464|1305|136|77912|99%|
h3. 9 workers, 1 context
!9-workers-1-context-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|9|150|150|8965|149.4|2|12|0|8965|100%|
|9|250|250|14965|249.4|2|22|0|14965|100%|
|9|350|350|20950|349.1|2|23|0|20950|100%|
|9|450|450|26941|449.0|2|26|0|26941|100%|
|9|550|550|32937|548.9|1|26|0|32937|100%|
|9|650|650|38900|648.3|1|19|0|38900|100%|
|9|750|750|44918|748.6|1|11|0|44918|100%|
|9|850|850|50902|848.3|2|22|0|50902|100%|
|9|950|950|56878|947.9|1|14|0|56878|100%|
|9|1050|1050|62874|1,047.8|2|12|0|62874|100%|
|9|1150|1150|68845|1,147.3|2|99|0|68845|100%|
|9|1250|1250|74851|1,247.4|2|103|0|74851|100%|
|9|1350|1350|80826|1,347.0|2|100|0|80826|100%|
|9|1450|1450|86806|1,446.7|2|19|0|86806|100%|
|9|1550|1550|92817|1,546.8|2|52|0|92817|100%|
|9|1650|1650|98774|1,646.1|2|18|0|98774|100%|
|9|1750|1750|104755|1,745.8|2|18|0|104755|100%|
|9|1850|1850|110734|1,845.4|2|20|0|110734|100%|
|9|1950|1910|113419|1,890.2|9|41855|0|113419|100%|
|9|2050|1911|114437|1,907.1|2|77962|39|114397|99%|
|9|2150|1911|114481|1,907.9|2|15|128|114353|99%|
|9|2250|1911|114545|1,908.9|2|24|144|114401|99%|
h3. 4 workers, 61 contexts, accessing only one of them
!4-workers-61-context-accessing-1-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|4|500|500|29796|496.6|6|52|0|29796|100%|
|4|650|650|38706|645.0|7|149|0|38706|100%|
|4|800|800|47585|793.0|8|129|0|47585|100%|
|4|950|950|54467|907.7|43|377|0|54467|100%|
|4|1100|1100|62500|1,041.6|54|396|0|62500|100%|
|4|1250|1250|69446|1,157.3|81|512|0|69446|100%|
|4|1400|1400|76217|1,270.2|97|517|0|76217|100%|
|4|1550|1550|80216|1,336.8|152|810|0|80216|100%|
|4|1700|1700|80797|1,346.5|271|864|0|80797|100%|
|4|1850|1850|94172|1,569.3|182|822|0|94172|100%|
|4|2000|1916|91014|1,516.8|253|48650|0|91014|100%|
|4|2150|1916|95852|1,597.4|205|848|83|95769|99%|
h3. 4 workers, 1 context
!4-workers-1-context-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|4|500|500|29922|498.7|2|39|0|29922|100%|
|4|650|650|38923|648.7|1|39|0|38923|100%|
|4|800|800|47916|798.5|1|17|0|47916|100%|
|4|950|950|56896|948.2|1|17|0|56896|100%|
|4|1100|1100|65889|1,098.1|1|115|0|65889|100%|
|4|1250|1250|74874|1,247.8|1|101|0|74874|100%|
|4|1400|1400|83818|1,396.8|1|17|0|83818|100%|
|4|1550|1550|92830|1,547.0|1|17|0|92830|100%|
|4|1700|1700|101805|1,696.6|1|11|0|101805|100%|
|4|1850|1850|110785|1,846.3|1|11|0|110785|100%|
|4|2000|1916|113747|1,895.6|10|53108|0|113747|100%|
|4|2150|1916|114825|1,913.6|1|24|83|114742|99%|