[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-543) BalancerMember directives don't work and cause SegFaults

Jean-Frederic Clere (JIRA) issues at jboss.org
Wed Oct 26 05:23:00 EDT 2016


     [ https://issues.jboss.org/browse/MODCLUSTER-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Frederic Clere updated MODCLUSTER-543:
-------------------------------------------
    Fix Version/s: 1.3.6.Final


> BalancerMember directives don't work and cause SegFaults 
> ---------------------------------------------------------
>
>                 Key: MODCLUSTER-543
>                 URL: https://issues.jboss.org/browse/MODCLUSTER-543
>             Project: mod_cluster
>          Issue Type: Bug
>          Components: Native (httpd modules)
>    Affects Versions: 1.3.3.Final, 1.2.13.Final
>         Environment: RHEL (others definitely too)
>            Reporter: Michal Karm Babacek
>            Assignee: Jean-Frederic Clere
>              Labels: balancerMember, proxy
>             Fix For: 1.3.6.Final
>
>         Attachments: clusterbench.war, mod_cluster.conf, proxy_test.conf, tses.war
>
>
> There has been an ongoing discussion about interoperability between BalancerMember and ProxyPass directives and mod_cluster. This is a follow up on MODCLUSTER-391 and especially MODCLUSTER-356.
> h3. TL;DR
> * BalancerMember directives don't work as expected (at all)
> * they can be used to cause a SegFault in httpd
> * If these directives are *supposed to work*, then either my configuration is wrong or this is a bug to be fixed
> * If they are *not supposed to work* in conjunction with mod_cluster, then I should stop testing them and remove all the ever-failing scenarios from the test suite
> h3. Configuration and goal
> * two web apps, [^clusterbench.war] and [^tses.war], both deployed on each of two Tomcats
> * one web app ([^tses.war]) is among the excluded contexts
> * the other one ([^clusterbench.war]) is registered with the mod_cluster balancer
> * main server: {{\*:2080}}
> * mod_cluster VirtualHost: {{\*:8747}}
> * ProxyPass/BalancerMember VirtualHost: {{\*:2081}}
> * I want to access [^clusterbench.war] via {{\*:8747}} and {{\*:2080}} (works (/)), and [^tses.war] via {{\*:2081}} (fails (x))
> * see [^proxy_test.conf] for the BalancerMember configuration (taken from the httpd 2.2.26 test run; you must edit the Location access)
> * see [^mod_cluster.conf] for the mod_cluster configuration (taken from the httpd 2.2.26 test run, as above); a rough sketch of how the pieces fit together follows this list
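> The sketch below is only an illustration of the layout described above, not a copy of the attached files; the backend addresses, ports, and route names are assumptions made for readability, so please refer to [^proxy_test.conf] and [^mod_cluster.conf] for the actual configuration under test.
> {noformat}
> # Assumed listeners matching the description above:
> # 2080 = main server, 8747 = mod_cluster VirtualHost, 2081 = ProxyPass/BalancerMember VirtualHost
> Listen 2080
> Listen 8747
> Listen 2081
> 
> # mod_cluster side (sketch; the real settings are in mod_cluster.conf)
> ManagerBalancerName qacluster
> <VirtualHost *:8747>
>     EnableMCPMReceive
>     <Location /mod_cluster-manager>
>         SetHandler mod_cluster-manager
>     </Location>
> </VirtualHost>
> 
> # plain mod_proxy side (sketch; the real settings are in proxy_test.conf)
> <VirtualHost *:2081>
>     <Proxy balancer://xqacluster>
>         # backend addresses/ports/routes are assumptions, not taken from the attachments
>         BalancerMember http://192.168.122.172:8080 route=worker1
>         BalancerMember http://192.168.122.172:8081 route=worker2
>     </Proxy>
>     ProxyPass /tses balancer://xqacluster/tses
>     ProxyPassReverse /tses balancer://xqacluster/tses
> </VirtualHost>
> {noformat}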
> h3. Test
> * (/) check that only [^clusterbench.war] is registered and everything is fine: [mod_cluster-manager console|https://gist.github.com/Karm/26015dabf446360b0e019da6c907bed5]
> * (/) [^clusterbench.war] on mod_cluster VirtualHost works: {{curl http://192.168.122.172:8747/clusterbench/requestinfo}}
> * (/) [^clusterbench.war] on main server also works: {{curl http://192.168.122.172:2080/clusterbench/requestinfo}} (it works due to MODCLUSTER-430)
> * httpd 2.2.26 / mod_cluster 1.2.13.Final:
> ** (x) [^tses.war] on BalancerMember ProxyPass VirtualHost fails: {{curl http://192.168.122.172:2081/tses}} with: {noformat}mod_proxy_cluster.c(2374): proxy: byrequests balancer FAILED
> proxy: CLUSTER: (balancer://xqacluster). All workers are in error state
> {noformat} and it does not matter whether I configure the same balancer (qacluster) for both mod_cluster and the additional BalancerMember directives, or whether I have two balancers (as in this case).
> ** (x) [^clusterbench.war] on BalancerMember ProxyPass VirtualHost sometimes works and sometimes causes a SegFault: {{curl http://192.168.122.172:2081/clusterbench/requestinfo}} (see below)
> * httpd 2.4.23 / mod_cluster 1.3.3.Final:
> ** (x) [^tses.war] on BalancerMember ProxyPass VirtualHost *always* fails with a SegFault: {{curl http://192.168.122.172:2081/tses}} (see below)
> ** (/) [^clusterbench.war] on BalancerMember ProxyPass VirtualHost works: {{curl http://192.168.122.172:2081/clusterbench/requestinfo}}
> h3. Intermittent and stable SegFaults
> h4. httpd 2.2.26 / mod_cluster 1.2.13.Final (EWS 2.1.1)
> With the aforementioned setup on httpd 2.2.26 / mod_cluster 1.2.13.Final, roughly 50% of requests to {{curl http://192.168.122.172:2081/clusterbench/requestinfo}} cause a SegFault; the rest pass fine and the web app is served.
> *Offending line:* [mod_proxy_cluster.c:3843|https://github.com/modcluster/mod_cluster/blob/1.2.13.Final/native/mod_proxy_cluster/mod_proxy_cluster.c#L3843]
> *Trace:*
> {noformat}
> #0  proxy_cluster_pre_request (worker=<optimized out>, balancer=<optimized out>, r=0x5555558be3e0, conf=0x5555558767d8, url=0x7fffffffdd40) at mod_proxy_cluster.c:3843
> #1  0x00007ffff0cfe3d6 in proxy_run_pre_request (worker=worker@entry=0x7fffffffdd38, balancer=balancer@entry=0x7fffffffdd30, r=r@entry=0x5555558be3e0, 
>     conf=conf@entry=0x5555558767d8, url=url@entry=0x7fffffffdd40) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/modules/proxy/mod_proxy.c:2428
> #2  0x00007ffff0d01ef2 in ap_proxy_pre_request (worker=worker@entry=0x7fffffffdd38, balancer=balancer@entry=0x7fffffffdd30, r=r@entry=0x5555558be3e0, 
>     conf=conf@entry=0x5555558767d8, url=url@entry=0x7fffffffdd40) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/modules/proxy/proxy_util.c:1512
> #3  0x00007ffff0cfeabb in proxy_handler (r=0x5555558be3e0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/modules/proxy/mod_proxy.c:952
> #4  0x00005555555805e0 in ap_run_handler (r=0x5555558be3e0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/config.c:157
> #5  0x00005555555809a9 in ap_invoke_handler (r=r@entry=0x5555558be3e0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/config.c:376
> #6  0x000055555558dc58 in ap_process_request (r=r@entry=0x5555558be3e0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/modules/http/http_request.c:282
> #7  0x000055555558aff8 in ap_process_http_connection (c=0x5555558ae2f0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/modules/http/http_core.c:190
> #8  0x0000555555587010 in ap_run_process_connection (c=0x5555558ae2f0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/connection.c:43
> #9  0x00005555555873b0 in ap_process_connection (c=c@entry=0x5555558ae2f0, csd=<optimized out>) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/connection.c:190
> #10 0x0000555555592b5b in child_main (child_num_arg=child_num_arg@entry=0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/mpm/prefork/prefork.c:667
> #11 0x0000555555592fae in make_child (s=0x5555557bf880, slot=0) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/mpm/prefork/prefork.c:712
> #12 0x0000555555593b6e in ap_mpm_run (_pconf=_pconf@entry=0x5555557ba158, plog=<optimized out>, s=s@entry=0x5555557bf880)
>     at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/mpm/prefork/prefork.c:988
> #13 0x000055555556b50e in main (argc=8, argv=0x7fffffffe268) at /builddir/build/BUILD/httpd-EWS_2.1.1.CR1/server/main.c:753
> {noformat}
> h4. httpd 2.4.23 / mod_cluster 1.3.3.Final (JBCS 2.4.23)
> With the aforementioned setup, it is *always* possible to SegFault httpd by accessing [^tses.war] on the BalancerMember ProxyPass VirtualHost: {{curl http://192.168.122.172:2081/tses}}.
> *Offending line:* [mod_proxy_cluster.c:2230|https://github.com/modcluster/mod_cluster/blob/1.3.3.Final/native/mod_proxy_cluster/mod_proxy_cluster.c#L2230]
> *Trace:*
> {noformat}
> #0  0x00007fffe61a598f in internal_find_best_byrequests (balancer=0x55555593ad38, conf=0x555555918dd8, r=0x5555559a6630, domain=0x0, failoverdomain=0, 
>     vhost_table=0x5555559a5c98, context_table=0x5555559a5e00, node_table=0x5555559a6088) at mod_proxy_cluster.c:2230
> #1  0x00007fffe61a90c8 in find_best_worker (balancer=0x55555593ad38, conf=0x555555918dd8, r=0x5555559a6630, domain=0x0, failoverdomain=0, vhost_table=0x5555559a5c98, 
>     context_table=0x5555559a5e00, node_table=0x5555559a6088, recurse=1) at mod_proxy_cluster.c:3457
> #2  0x00007fffe61a9f4d in proxy_cluster_pre_request (worker=0x7fffffffdb68, balancer=0x7fffffffdb60, r=0x5555559a6630, conf=0x555555918dd8, url=0x7fffffffdb70)
>     at mod_proxy_cluster.c:3825
> #3  0x00007fffec2fd9a6 in proxy_run_pre_request (worker=worker@entry=0x7fffffffdb68, balancer=balancer@entry=0x7fffffffdb60, r=r@entry=0x5555559a6630, 
>     conf=conf@entry=0x555555918dd8, url=url@entry=0x7fffffffdb70) at mod_proxy.c:2853
> #4  0x00007fffec302652 in ap_proxy_pre_request (worker=worker@entry=0x7fffffffdb68, balancer=balancer@entry=0x7fffffffdb60, r=r@entry=0x5555559a6630, 
>     conf=conf@entry=0x555555918dd8, url=url@entry=0x7fffffffdb70) at proxy_util.c:1956
> #5  0x00007fffec2fe1dc in proxy_handler (r=0x5555559a6630) at mod_proxy.c:1108
> #6  0x00005555555aeff0 in ap_run_handler (r=r@entry=0x5555559a6630) at config.c:170
> #7  0x00005555555af539 in ap_invoke_handler (r=r@entry=0x5555559a6630) at config.c:434
> #8  0x00005555555c5b2a in ap_process_async_request (r=0x5555559a6630) at http_request.c:410
> #9  0x00005555555c5e04 in ap_process_request (r=r@entry=0x5555559a6630) at http_request.c:445
> #10 0x00005555555c1ded in ap_process_http_sync_connection (c=0x555555950050) at http_core.c:210
> #11 ap_process_http_connection (c=0x555555950050) at http_core.c:251
> #12 0x00005555555b9470 in ap_run_process_connection (c=c@entry=0x555555950050) at connection.c:42
> #13 0x00005555555b99c8 in ap_process_connection (c=c@entry=0x555555950050, csd=<optimized out>) at connection.c:226
> #14 0x00007fffec513a30 in child_main (child_num_arg=child_num_arg@entry=0, child_bucket=child_bucket@entry=0) at prefork.c:723
> #15 0x00007fffec513c70 in make_child (s=0x55555582d400, slot=slot@entry=0, bucket=bucket@entry=0) at prefork.c:767
> #16 0x00007fffec51521d in prefork_run (_pconf=<optimized out>, plog=0x5555558313a8, s=0x55555582d400) at prefork.c:979
> #17 0x0000555555592aae in ap_run_mpm (pconf=pconf at entry=0x555555804188, plog=0x5555558313a8, s=0x55555582d400) at mpm_common.c:94
> #18 0x000055555558bb18 in main (argc=8, argv=0x7fffffffe1a8) at main.c:783
> {noformat}
> h3. About the test
> This test has always been failing in one way or another: not serving the URL (HTTP 404) or returning "All workers are in error state" (HTTP 503). The SegFault has been slipping under the radar for some time, because the test failed on an assert earlier in the scenario, on the first HTTP 503.
> We should clearly document which BalancerMember integration is supported and which is not. Furthermore, httpd must not SegFault even if the user tries to do something weird; it must log an error message instead.



--
This message was sent by Atlassian JIRA
(v7.2.2#72004)

