Michal Babacek edited comment on MODCLUSTER-372 at 11/18/13 4:57 AM:
---------------------------------------------------------------------
I've identified these bottlenecks:
The three most expensive functions and their callers are listed below. They keep this
order regardless of the event metric (Instruction Fetch / Data Read / Data Write).
# Function {{ap_slotmem_mem}} from mod_slotmem/sharedmem_util.c called by:
* {{get_context}} from mod_manager/context.c
* {{get_node}} from mod_manager/node.c
* {{loc_read_node}} from mod_manager/mod_manager.c
* {{find_node_context_host}} from mod_proxy_cluster/mod_proxy_cluster.c
* {{loc_read_context}} from mod_manager/context.c
* {{read_context_table}} from mod_proxy_cluster/mod_proxy_cluster.c
* {{manager_info}} from mod_manager/mod_manager.c
# Function {{ap_slotmem_do}} from mod_slotmem/sharedmem_util.c called mostly by:
* httpd_request and httpd_core functions from httpd sources.
# Function {{find_node_context_host}} from mod_proxy_cluster/mod_proxy_cluster.c called by:
* httpd sources, mostly on request_process and request_connection.
So, IMHO, {{find_node_context_host}} is indeed the culprit :)
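To illustrate the suspicion, here is a minimal sketch (all names and structures are hypothetical, not the actual mod_proxy_cluster code) of how a per-request linear walk over the shared-memory context table makes every request pay for every registered context, whether it is ever accessed or not:

```c
/* Hypothetical sketch -- not the real mod_proxy_cluster structures -- of the
 * linear lookup suspected in find_node_context_host: each request scans the
 * whole context table, so even contexts that are never accessed add
 * comparisons to every single lookup. */
#include <stdio.h>
#include <string.h>

#define MAX_CONTEXTS 512

struct context_rec {          /* simplified stand-in for a slotmem slot */
    int  node_id;
    char path[64];
};

static struct context_rec table[MAX_CONTEXTS];
static int table_len = 0;

/* Register per_node contexts for each of num_nodes worker nodes. */
static void populate(int num_nodes, int per_node)
{
    table_len = 0;
    for (int n = 0; n < num_nodes; n++)
        for (int c = 0; c < per_node; c++) {
            struct context_rec *r = &table[table_len++];
            r->node_id = n;
            snprintf(r->path, sizeof(r->path), "/app-%d-%d", n, c);
        }
}

/* Linear scan over every registered context; *comparisons counts the work
 * a single request pays, regardless of which context it asks for. */
static int find_context(const char *uri, long *comparisons)
{
    for (int i = 0; i < table_len; i++) {
        (*comparisons)++;
        if (strcmp(uri, table[i].path) == 0)
            return table[i].node_id;
    }
    return -1;
}
```

With the 4-node, 65-contexts-per-node setup used for profiling, a lookup of the last registered context touches all 260 entries, which would match the observed sensitivity to the sheer number of registered contexts.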
Attaching the profiling valgrind logs [^callgrind.zip], created as:
{noformat}
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes
--compress-strings=no --compress-pos=no --collect-systime=yes ./httpd -f
/tmp/hudson/httpd/conf/httpd.conf -E /tmp/hudson/httpd/logs/httpd.log
{noformat}
with this debug setting:
{code}
<IfModule worker.c>
ThreadLimit 50
StartServers 1
ServerLimit 1
MinSpareThreads 50
MaxSpareThreads 50
MaxClients 50
ThreadsPerChild 50
MaxRequestsPerChild 0
</IfModule>
{code}
and 4 worker nodes, 65 contexts each, and several dozen concurrent client sessions.
Number of registered contexts negatively affects mod_cluster performance
------------------------------------------------------------------------
Key: MODCLUSTER-372
URL: https://issues.jboss.org/browse/MODCLUSTER-372
Project: mod_cluster
Issue Type: Bug
Affects Versions: 1.2.4.Final, 1.2.6.Final
Environment: RHEL6, other platforms are to be confirmed.
Reporter: Michal Babacek
Assignee: Michal Babacek
Labels: mod_cluster, performance
Fix For: 1.2.7.Final, 1.3.0.Alpha2
Attachments: 4-workers-1-context-balancer-cpu.png,
4-workers-61-context-accessing-1-balancer-cpu.png, 9-workers-1-context-balancer-cpu.png,
9-workers-31-context-accessing-1-balancer-cpu.png,
9-workers-31-context-round-robin-balancer-cpu.png, callgrind.zip, httpd.conf
There is a performance concern regarding CPU usage on the Apache HTTP Server with
mod_cluster set up as a load balancer. Performance tests revealed that the major variable
affecting CPU usage is the overall number of registered contexts. Notably, these
contexts don't even need to be accessed; it's enough that they are present.
The first test [9 workers, 31 contexts, round
robin|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,31contexts,...]
depicts a behavior where all contexts are being accessed in a round robin fashion. Note
the CPU usage. If we compare it to the [9 workers, 1
context|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,1context] test, it becomes
apparent that there is something wrong with the CPU usage. As is clear from the [9
workers, 31 contexts, access
1|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,31contexts,acce...]
test, accessing only one of these contexts doesn't help much.
The last two tests, [4 workers, 61 contexts, accessing
1|https://issues.jboss.org/browse/MODCLUSTER-372#4workers,61contexts,acce...]
and [4 workers, 1
context|https://issues.jboss.org/browse/MODCLUSTER-372#4workers,1context] confirm the
results; the environment differs only in the number of nodes and contexts.
[^httpd.conf] attached, stay tuned for some profiler outputs...
h3. 9 workers, 31 contexts, round robin
!9-workers-31-context-round-robin-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|9|150|150|8843|147.4|15|60|0|8843|100%|
|9|250|250|14422|240.3|37|215|0|14422|100%|
|9|350|350|17551|292.5|196|443|0|17551|100%|
|9|450|450|21303|355.0|269|603|0|21303|100%|
|9|550|550|24818|413.6|325|686|0|24818|100%|
|9|650|650|26804|446.7|469|800|0|26804|100%|
|9|750|750|29930|498.8|507|985|0|29930|100%|
|9|850|850|30665|511.0|665|1185|0|30665|100%|
|9|950|950|34421|573.6|647|1316|0|34421|100%|
|9|1050|1050|35067|584.4|800|1487|0|35067|100%|
|9|1150|1150|36024|600.4|925|1612|0|36024|100%|
|9|1250|1250|36635|610.5|1030|1815|0|36635|100%|
|9|1350|1350|38395|639.9|1096|1942|0|38395|100%|
|9|1450|1450|39713|661.8|1170|2466|0|39713|100%|
|9|1550|1550|39455|657.5|1392|2340|0|39455|100%|
|9|1650|1650|39849|664.1|1465|4240|0|39849|100%|
|9|1750|1750|42435|707.2|1454|6511|0|42435|100%|
|9|1850|1850|44714|745.2|1498|4866|0|44714|100%|
|9|1950|1911|46072|767.8|1554|43016|0|46072|100%|
|9|2050|1911|44496|741.5|1617|4953|39|44457|99%|
|9|2150|1911|43715|728.5|1632|5348|125|43590|99%|
|9|2250|1911|41112|685.1|1764|9800|155|40957|99%|
h3. 9 workers, 31 contexts, accessing only one of them
!9-workers-31-context-accessing-1-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|9|150|150|8954|149.2|4|42|0|8954|100%|
|9|250|250|14897|248.3|7|33|0|14897|100%|
|9|350|350|20784|346.4|8|68|0|20784|100%|
|9|450|450|26748|445.8|8|69|0|26748|100%|
|9|550|550|32553|542.5|11|92|0|32553|100%|
|9|650|650|38558|642.6|10|60|0|38558|100%|
|9|750|750|43571|726.1|29|329|0|43571|100%|
|9|850|850|46133|768.8|99|494|0|46133|100%|
|9|950|950|50854|847.5|120|501|0|50854|100%|
|9|1050|1050|54451|907.4|154|584|0|54451|100%|
|9|1150|1150|59961|999.3|138|674|0|59961|100%|
|9|1250|1250|62567|1,042.6|198|675|0|62567|100%|
|9|1350|1350|61939|1,032.2|301|799|0|61939|100%|
|9|1450|1450|67920|1,131.9|276|844|0|67920|100%|
|9|1550|1550|73151|1,219.1|261|861|0|73151|100%|
|9|1650|1650|73937|1,232.2|332|955|0|73937|100%|
|9|1750|1750|73516|1,225.2|423|1046|0|73516|100%|
|9|1850|1850|72556|1,209.1|515|1264|0|72556|100%|
|9|1950|1911|78613|1,310.1|454|50273|0|78613|100%|
|9|2050|1911|80141|1,335.6|431|1225|39|80102|99%|
|9|2150|1911|76979|1,282.9|490|1338|127|76852|99%|
|9|2250|1911|78048|1,300.7|464|1305|136|77912|99%|
h3. 9 workers, 1 context
!9-workers-1-context-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|9|150|150|8965|149.4|2|12|0|8965|100%|
|9|250|250|14965|249.4|2|22|0|14965|100%|
|9|350|350|20950|349.1|2|23|0|20950|100%|
|9|450|450|26941|449.0|2|26|0|26941|100%|
|9|550|550|32937|548.9|1|26|0|32937|100%|
|9|650|650|38900|648.3|1|19|0|38900|100%|
|9|750|750|44918|748.6|1|11|0|44918|100%|
|9|850|850|50902|848.3|2|22|0|50902|100%|
|9|950|950|56878|947.9|1|14|0|56878|100%|
|9|1050|1050|62874|1,047.8|2|12|0|62874|100%|
|9|1150|1150|68845|1,147.3|2|99|0|68845|100%|
|9|1250|1250|74851|1,247.4|2|103|0|74851|100%|
|9|1350|1350|80826|1,347.0|2|100|0|80826|100%|
|9|1450|1450|86806|1,446.7|2|19|0|86806|100%|
|9|1550|1550|92817|1,546.8|2|52|0|92817|100%|
|9|1650|1650|98774|1,646.1|2|18|0|98774|100%|
|9|1750|1750|104755|1,745.8|2|18|0|104755|100%|
|9|1850|1850|110734|1,845.4|2|20|0|110734|100%|
|9|1950|1910|113419|1,890.2|9|41855|0|113419|100%|
|9|2050|1911|114437|1,907.1|2|77962|39|114397|99%|
|9|2150|1911|114481|1,907.9|2|15|128|114353|99%|
|9|2250|1911|114545|1,908.9|2|24|144|114401|99%|
h3. 4 workers, 61 contexts, accessing only one of them
!4-workers-61-context-accessing-1-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|4|500|500|29796|496.6|6|52|0|29796|100%|
|4|650|650|38706|645.0|7|149|0|38706|100%|
|4|800|800|47585|793.0|8|129|0|47585|100%|
|4|950|950|54467|907.7|43|377|0|54467|100%|
|4|1100|1100|62500|1,041.6|54|396|0|62500|100%|
|4|1250|1250|69446|1,157.3|81|512|0|69446|100%|
|4|1400|1400|76217|1,270.2|97|517|0|76217|100%|
|4|1550|1550|80216|1,336.8|152|810|0|80216|100%|
|4|1700|1700|80797|1,346.5|271|864|0|80797|100%|
|4|1850|1850|94172|1,569.3|182|822|0|94172|100%|
|4|2000|1916|91014|1,516.8|253|48650|0|91014|100%|
|4|2150|1916|95852|1,597.4|205|848|83|95769|99%|
h3. 4 workers, 1 context
!4-workers-1-context-balancer-cpu.png|thumbnail!
||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn.
errors||valid samples||%||
|4|500|500|29922|498.7|2|39|0|29922|100%|
|4|650|650|38923|648.7|1|39|0|38923|100%|
|4|800|800|47916|798.5|1|17|0|47916|100%|
|4|950|950|56896|948.2|1|17|0|56896|100%|
|4|1100|1100|65889|1,098.1|1|115|0|65889|100%|
|4|1250|1250|74874|1,247.8|1|101|0|74874|100%|
|4|1400|1400|83818|1,396.8|1|17|0|83818|100%|
|4|1550|1550|92830|1,547.0|1|17|0|92830|100%|
|4|1700|1700|101805|1,696.6|1|11|0|101805|100%|
|4|1850|1850|110785|1,846.3|1|11|0|110785|100%|
|4|2000|1916|113747|1,895.6|10|53108|0|113747|100%|
|4|2150|1916|114825|1,913.6|1|24|83|114742|99%|