[JBoss JIRA] (MODCLUSTER-356) BalancerMember directives cause httpd crash when using mod_cluster 1.2.x
by Michal Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-356?page=com.atlassian.jira.pl... ]
Michal Babacek closed MODCLUSTER-356.
-------------------------------------
I haven't seen it with 1.2.8. Closing.
> BalancerMember directives cause httpd crash when using mod_cluster 1.2.x
> ------------------------------------------------------------------------
>
> Key: MODCLUSTER-356
> URL: https://issues.jboss.org/browse/MODCLUSTER-356
> Project: mod_cluster
> Issue Type: Bug
> Affects Versions: 1.2.4.Final
> Environment: SLES-11-SP1 x86_64 running httpd 2.2.24 or 2.4.6 with any mod_cluster-1.2.x version and JBoss-AS-7.2.0.Final
> Reporter: Marco Danti
> Assignee: Jean-Frederic Clere
> Labels: balancerMember, crash, httpd, mod_cluster
> Fix For: 1.2.8.Final
>
>
> httpd dumps core in mod_proxy_cluster.c if the httpd.conf file contains BalancerMember directives.
> This happens at least with the following two setups:
> 1) httpd-2.4.6 and mod_cluster-1.2.x
> 2) httpd-2.2.24 and mod_cluster-1.2.3.Final
[JBoss JIRA] (MODCLUSTER-380) Some serious log messages don't have id to identify them
by Michal Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-380?page=com.atlassian.jira.pl... ]
Michal Babacek closed MODCLUSTER-380.
-------------------------------------
> Some serious log messages don't have id to identify them
> --------------------------------------------------------
>
> Key: MODCLUSTER-380
> URL: https://issues.jboss.org/browse/MODCLUSTER-380
> Project: mod_cluster
> Issue Type: Feature Request
> Affects Versions: 1.2.6.Final
> Reporter: Jean-Frederic Clere
> Assignee: Jean-Frederic Clere
> Fix For: 1.3.0.Final, 1.2.8.Final
>
>
> As part of the logging unification tests for EAP, I found:
> modcluster/core/src/main/java/org/jboss/modcluster/load/metric/impl/AverageSystemLoadMetric.java
> - 52 logger.warnf("%s is not supported on this system and will be disabled.", this.getClass().getSimpleName());
> modcluster/core/src/main/java/org/jboss/modcluster/load/metric/impl/SystemMemoryUsageLoadMetric.java
> - 64 this.logger.warn(this.getClass().getSimpleName() + " requires com.sun.management.OperatingSystemMXBean.");
> These warnings should be moved to ModClusterLogger.java as well
> (See BZ: 1008901)
[JBoss JIRA] (MODCLUSTER-398) mod_cluster deadlock in a jboss/windows environment
by Michal Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-398?page=com.atlassian.jira.pl... ]
Michal Babacek updated MODCLUSTER-398:
--------------------------------------
Bugzilla Update: (was: Perform)
> mod_cluster deadlock in a jboss/windows environment
> ---------------------------------------------------
>
> Key: MODCLUSTER-398
> URL: https://issues.jboss.org/browse/MODCLUSTER-398
> Project: mod_cluster
> Issue Type: Bug
> Affects Versions: 1.2.6.Final
> Environment: Windows 2008, EAP6 and EWS2.0.1
> Reporter: Marc Maurer
> Assignee: Jean-Frederic Clere
>
> Under load, Apache stops serving pages, with all threads stuck in the "W : Sending reply" state. With the Windows Process Explorer we then obtained a stack trace from a hanging thread. We don't have debug symbols, but it's easy enough to see what's happening:
> ntoskrnl.exe!KeWaitForMultipleObjects+0xc0a
> ntoskrnl.exe!KeAcquireSpinLockAtDpcLevel+0x732
> ntoskrnl.exe!KeWaitForMutexObject+0x19f
> ntoskrnl.exe!NtDeleteFile+0x3c4
> ntoskrnl.exe!PsDereferenceKernelStack+0x35358
> ntoskrnl.exe!KeSynchronizeExecution+0x3a23
> ntdll.dll!ZwLockFile+0xa
> KERNELBASE.dll!LockFileEx+0xb2
> kernel32.dll!LockFileEx+0x1b
> libapr-1.dll!apr_file_lock+0x69 <-- here
> mod_slotmem.so+0x1318 <-- here
> mod_manager.so+0x2a11 <-- here
> mod_proxy_cluster.so+0x679e
> mod_proxy.so!proxy_run_post_request+0x4e
> mod_proxy.so!proxy_run_request_status+0x924
> libhttpd.dll!ap_run_handler+0x35
> libhttpd.dll!ap_invoke_handler+0x114
> libhttpd.dll!ap_die+0x2ea
> libhttpd.dll!ap_psignature+0x1ae8
> libhttpd.dll!ap_run_process_connection+0x35
> libhttpd.dll!ap_process_connection+0x3b
> libhttpd.dll!ap_regkey_value_remove+0x136e
> msvcrt.dll!srand+0x93
> msvcrt.dll!ftime64_s+0x1dd
> kernel32.dll!BaseThreadInitThunk+0xd
> ntdll.dll!RtlUserThreadStart+0x21
> So mod_manager is requesting a file lock on one of the lock files in the MemManagerFile path. In this case it was the "manager.sessionid.sessionid.lock" file. Removing the lock file fixed the problem.
> When bisecting the mod_cluster code, I think commit "74eeb9c026380deb8d833be53b09b3d808e02d10 - Lock in insert-update", introduced in version 1.2.2, is the culprit. This would also explain why mod_cluster 1.2.1 is the last known working version.
> What we don't know is which process is already holding the lock when all Apache threads start blocking on it. We are trying to figure that out. There are no obviously wrong lock/unlock slotmem call pairs in the mod_manager module, and no locks are requested within other locks as far as we can see. Therefore our best guess would be a deadlock on a thread already holding the globalmutex_lock in combination with the slotmem file locks, but that's just a guess without debugging it.
> More context can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1080047
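> As an illustration of the suspected lock-order inversion, here is a minimal sketch; the path_a/path_b split and their names are hypothetical, only the APR calls are real, and this is not mod_cluster's actual code:
> {code:c}
> /* Illustrative only: two code paths taking the same two locks in opposite order.
>  * Run concurrently, each side can hold one lock while waiting forever for the other. */
> #include <apr_general.h>
> #include <apr_pools.h>
> #include <apr_file_io.h>
> #include <apr_global_mutex.h>
>
> static apr_global_mutex_t *node_mutex;  /* stands in for the mod_manager global mutex */
> static apr_file_t *slotmem_lockfile;    /* stands in for manager.sessionid.sessionid.lock */
>
> /* Path A: global mutex first, then the slotmem file lock. */
> static void path_a(void)
> {
>     apr_global_mutex_lock(node_mutex);
>     apr_file_lock(slotmem_lockfile, APR_FLOCK_EXCLUSIVE);
>     /* ... insert/update a slotmem entry ... */
>     apr_file_unlock(slotmem_lockfile);
>     apr_global_mutex_unlock(node_mutex);
> }
>
> /* Path B: the same locks in the opposite order. */
> static void path_b(void)
> {
>     apr_file_lock(slotmem_lockfile, APR_FLOCK_EXCLUSIVE);
>     apr_global_mutex_lock(node_mutex);
>     /* ... read/update shared state ... */
>     apr_global_mutex_unlock(node_mutex);
>     apr_file_unlock(slotmem_lockfile);
> }
>
> int main(void)
> {
>     apr_pool_t *pool;
>     apr_initialize();
>     apr_pool_create(&pool, NULL);
>     apr_global_mutex_create(&node_mutex, "./node.mutex", APR_LOCK_DEFAULT, pool);
>     apr_file_open(&slotmem_lockfile, "./slotmem.lock",
>                   APR_FOPEN_CREATE | APR_FOPEN_WRITE, APR_FPROT_OS_DEFAULT, pool);
>     path_a();  /* harmless when run sequentially ... */
>     path_b();  /* ... the hang needs two concurrent callers taking opposite orders */
>     apr_terminate();
>     return 0;
> }
> {code}
> If something like this is what happens, enforcing a single lock order everywhere (or never taking the file lock while the global mutex is held) would remove the deadlock.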
[JBoss JIRA] (MODCLUSTER-372) Number of registered contexts negatively affects mod_cluster performance
by Michal Babacek (JIRA)
[ https://issues.jboss.org/browse/MODCLUSTER-372?page=com.atlassian.jira.pl... ]
Michal Babacek closed MODCLUSTER-372.
-------------------------------------
As per the performance report in this [comment|https://issues.jboss.org/browse/MODCLUSTER-372?focusedCommentId=1...], I'm closing this issue. While mod_cluster's performance remains an ongoing effort, this particular problem is fixed.
> Number of registered contexts negatively affects mod_cluster performance
> ------------------------------------------------------------------------
>
> Key: MODCLUSTER-372
> URL: https://issues.jboss.org/browse/MODCLUSTER-372
> Project: mod_cluster
> Issue Type: Bug
> Affects Versions: 1.2.4.Final, 1.2.6.Final
> Environment: RHEL6, other platforms are to be confirmed.
> Reporter: Michal Babacek
> Assignee: Jean-Frederic Clere
> Labels: mod_cluster, performace
> Fix For: 1.3.0.Final, 1.2.8.Final
>
> Attachments: 4-workers-1-context-balancer-cpu.png, 4-workers-61-context-accessing-1-balancer-cpu.png, 9-workers-1-context-balancer-cpu.png, 9-workers-31-context-accessing-1-balancer-cpu.png, 9-workers-31-context-round-robin-balancer-cpu.png, callgrind.zip, httpd.conf, performance-chart-1.2.8-flags.png, performance-chart-BZ1079156.png, performance-test-httpd.conf, performance-test-mod_cluster.conf
>
>
> There is a performance concern regarding CPU usage on the Apache HTTP Server with mod_cluster set up as a load balancer. Perf tests revealed that the major variable affecting CPU usage is the overall number of registered contexts. Notably, these contexts don't need to be accessed at all; it is enough that they are present.
> The first test, [9 workers, 31 contexts, round robin|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,31contexts,...], depicts the behavior when all contexts are accessed in a round-robin fashion. Note the CPU usage. Comparing it to the [9 workers, 1 context|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,1context] test makes it apparent that something is wrong with the CPU usage. As the [9 workers, 31 contexts, access 1|https://issues.jboss.org/browse/MODCLUSTER-372#9workers,31contexts,acce...] test shows, accessing only one of these contexts doesn't help much.
> The last two tests, [4 workers, 61 contexts, accessing 1|https://issues.jboss.org/browse/MODCLUSTER-372#4workers,61contexts,acce...] and [4 workers, 1 context|https://issues.jboss.org/browse/MODCLUSTER-372#4workers,1context], confirm the results; the environment differs only in the number of nodes and the number of contexts.
> [^httpd.conf] is attached; stay tuned for some profiler outputs...
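> To illustrate why the mere presence of the contexts matters, here is a simplified sketch of a per-request lookup that scans the whole context table; the names and data structures are made up for the illustration and are not mod_proxy_cluster's actual code, but with any such scan the per-request CPU cost grows with the number of registered contexts even when only one of them is ever requested:
> {code:c}
> /* Simplified illustration only -- hypothetical names and layout,
>  * not mod_proxy_cluster's real data structures. */
> #include <stdio.h>
> #include <string.h>
>
> typedef struct {
>     char path[64];   /* context root, e.g. "/app-17" */
>     int  node;       /* worker that registered the context */
>     int  enabled;
> } context_entry;
>
> /* Per-request lookup: every request inspects every registered context,
>  * so the CPU cost is O(number of contexts), whether they are accessed or not. */
> static int find_node_for_uri(const context_entry *table, int count, const char *uri)
> {
>     int best = -1;
>     size_t best_len = 0;
>     for (int i = 0; i < count; i++) {
>         size_t len = strlen(table[i].path);
>         if (table[i].enabled && strncmp(uri, table[i].path, len) == 0 && len > best_len) {
>             best = table[i].node;
>             best_len = len;
>         }
>     }
>     return best;
> }
>
> int main(void)
> {
>     /* 31 registered contexts, only one ever requested: the loop above
>      * still walks all 31 entries for every single request. */
>     context_entry table[31];
>     for (int i = 0; i < 31; i++) {
>         snprintf(table[i].path, sizeof(table[i].path), "/app-%d", i);
>         table[i].node = i % 9;
>         table[i].enabled = 1;
>     }
>     printf("request /app-7/x -> node %d\n", find_node_for_uri(table, 31, "/app-7/x"));
>     return 0;
> }
> {code}
> Compiled with any C99 compiler this prints the node chosen for /app-7/x after inspecting all 31 entries; the attached callgrind output should show whether the real per-request path behaves in a similar way.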
> h3. 9 workers, 31 contexts, round robin
> !9-workers-31-context-round-robin-balancer-cpu.png|thumbnail!
> ||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn. errors||valid samples||%||
> |9|150|150|8843|147.4|15|60|0|8843|100%|
> |9|250|250|14422|240.3|37|215|0|14422|100%|
> |9|350|350|17551|292.5|196|443|0|17551|100%|
> |9|450|450|21303|355.0|269|603|0|21303|100%|
> |9|550|550|24818|413.6|325|686|0|24818|100%|
> |9|650|650|26804|446.7|469|800|0|26804|100%|
> |9|750|750|29930|498.8|507|985|0|29930|100%|
> |9|850|850|30665|511.0|665|1185|0|30665|100%|
> |9|950|950|34421|573.6|647|1316|0|34421|100%|
> |9|1050|1050|35067|584.4|800|1487|0|35067|100%|
> |9|1150|1150|36024|600.4|925|1612|0|36024|100%|
> |9|1250|1250|36635|610.5|1030|1815|0|36635|100%|
> |9|1350|1350|38395|639.9|1096|1942|0|38395|100%|
> |9|1450|1450|39713|661.8|1170|2466|0|39713|100%|
> |9|1550|1550|39455|657.5|1392|2340|0|39455|100%|
> |9|1650|1650|39849|664.1|1465|4240|0|39849|100%|
> |9|1750|1750|42435|707.2|1454|6511|0|42435|100%|
> |9|1850|1850|44714|745.2|1498|4866|0|44714|100%|
> |9|1950|1911|46072|767.8|1554|43016|0|46072|100%|
> |9|2050|1911|44496|741.5|1617|4953|39|44457|99%|
> |9|2150|1911|43715|728.5|1632|5348|125|43590|99%|
> |9|2250|1911|41112|685.1|1764|9800|155|40957|99%|
> h3. 9 workers, 31 contexts, accessing only one of them
> !9-workers-31-context-accessing-1-balancer-cpu.png|thumbnail!
> ||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn. errors||valid samples||%||
> |9|150|150|8954|149.2|4|42|0|8954|100%|
> |9|250|250|14897|248.3|7|33|0|14897|100%|
> |9|350|350|20784|346.4|8|68|0|20784|100%|
> |9|450|450|26748|445.8|8|69|0|26748|100%|
> |9|550|550|32553|542.5|11|92|0|32553|100%|
> |9|650|650|38558|642.6|10|60|0|38558|100%|
> |9|750|750|43571|726.1|29|329|0|43571|100%|
> |9|850|850|46133|768.8|99|494|0|46133|100%|
> |9|950|950|50854|847.5|120|501|0|50854|100%|
> |9|1050|1050|54451|907.4|154|584|0|54451|100%|
> |9|1150|1150|59961|999.3|138|674|0|59961|100%|
> |9|1250|1250|62567|1,042.6|198|675|0|62567|100%|
> |9|1350|1350|61939|1,032.2|301|799|0|61939|100%|
> |9|1450|1450|67920|1,131.9|276|844|0|67920|100%|
> |9|1550|1550|73151|1,219.1|261|861|0|73151|100%|
> |9|1650|1650|73937|1,232.2|332|955|0|73937|100%|
> |9|1750|1750|73516|1,225.2|423|1046|0|73516|100%|
> |9|1850|1850|72556|1,209.1|515|1264|0|72556|100%|
> |9|1950|1911|78613|1,310.1|454|50273|0|78613|100%|
> |9|2050|1911|80141|1,335.6|431|1225|39|80102|99%|
> |9|2150|1911|76979|1,282.9|490|1338|127|76852|99%|
> |9|2250|1911|78048|1,300.7|464|1305|136|77912|99%|
> h3. 9 workers, 1 context
> !9-workers-1-context-balancer-cpu.png|thumbnail!
> ||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn. errors||valid samples||%||
> |9|150|150|8965|149.4|2|12|0|8965|100%|
> |9|250|250|14965|249.4|2|22|0|14965|100%|
> |9|350|350|20950|349.1|2|23|0|20950|100%|
> |9|450|450|26941|449.0|2|26|0|26941|100%|
> |9|550|550|32937|548.9|1|26|0|32937|100%|
> |9|650|650|38900|648.3|1|19|0|38900|100%|
> |9|750|750|44918|748.6|1|11|0|44918|100%|
> |9|850|850|50902|848.3|2|22|0|50902|100%|
> |9|950|950|56878|947.9|1|14|0|56878|100%|
> |9|1050|1050|62874|1,047.8|2|12|0|62874|100%|
> |9|1150|1150|68845|1,147.3|2|99|0|68845|100%|
> |9|1250|1250|74851|1,247.4|2|103|0|74851|100%|
> |9|1350|1350|80826|1,347.0|2|100|0|80826|100%|
> |9|1450|1450|86806|1,446.7|2|19|0|86806|100%|
> |9|1550|1550|92817|1,546.8|2|52|0|92817|100%|
> |9|1650|1650|98774|1,646.1|2|18|0|98774|100%|
> |9|1750|1750|104755|1,745.8|2|18|0|104755|100%|
> |9|1850|1850|110734|1,845.4|2|20|0|110734|100%|
> |9|1950|1910|113419|1,890.2|9|41855|0|113419|100%|
> |9|2050|1911|114437|1,907.1|2|77962|39|114397|99%|
> |9|2150|1911|114481|1,907.9|2|15|128|114353|99%|
> |9|2250|1911|114545|1,908.9|2|24|144|114401|99%|
> h3. 4 workers, 61 contexts, accessing only one of them
> !4-workers-61-context-accessing-1-balancer-cpu.png|thumbnail!
> ||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn. errors||valid samples||%||
> |4|500|500|29796|496.6|6|52|0|29796|100%|
> |4|650|650|38706|645.0|7|149|0|38706|100%|
> |4|800|800|47585|793.0|8|129|0|47585|100%|
> |4|950|950|54467|907.7|43|377|0|54467|100%|
> |4|1100|1100|62500|1,041.6|54|396|0|62500|100%|
> |4|1250|1250|69446|1,157.3|81|512|0|69446|100%|
> |4|1400|1400|76217|1,270.2|97|517|0|76217|100%|
> |4|1550|1550|80216|1,336.8|152|810|0|80216|100%|
> |4|1700|1700|80797|1,346.5|271|864|0|80797|100%|
> |4|1850|1850|94172|1,569.3|182|822|0|94172|100%|
> |4|2000|1916|91014|1,516.8|253|48650|0|91014|100%|
> |4|2150|1916|95852|1,597.4|205|848|83|95769|99%|
> h3. 4 workers, 1 context
> !4-workers-1-context-balancer-cpu.png|thumbnail!
> ||Nodes||Sessions||Ses.Active||Samples||samples/s||mean resp. ms||max resp. ms||conn. errors||valid samples||%||
> |4|500|500|29922|498.7|2|39|0|29922|100%|
> |4|650|650|38923|648.7|1|39|0|38923|100%|
> |4|800|800|47916|798.5|1|17|0|47916|100%|
> |4|950|950|56896|948.2|1|17|0|56896|100%|
> |4|1100|1100|65889|1,098.1|1|115|0|65889|100%|
> |4|1250|1250|74874|1,247.8|1|101|0|74874|100%|
> |4|1400|1400|83818|1,396.8|1|17|0|83818|100%|
> |4|1550|1550|92830|1,547.0|1|17|0|92830|100%|
> |4|1700|1700|101805|1,696.6|1|11|0|101805|100%|
> |4|1850|1850|110785|1,846.3|1|11|0|110785|100%|
> |4|2000|1916|113747|1,895.6|10|53108|0|113747|100%|
> |4|2150|1916|114825|1,913.6|1|24|83|114742|99%|