[JBoss JIRA] (MODCLUSTER-526) SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using LoadBalancingGroup

Wednesday, 9 January 2019



[ https://issues.jboss.org/browse/MODCLUSTER-526?page=com.atlassian.jira.pl... ]

Work on MODCLUSTER-526 stopped by Michal Karm Babacek.
------------------------------------------------------
...
 SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using LoadBalancingGroup
 -----------------------------------------------------------------------------------

                 Key: MODCLUSTER-526
                 URL: https://issues.jboss.org/browse/MODCLUSTER-526
             Project: mod_cluster
          Issue Type: Bug
          Components: Native (httpd modules)
    Affects Versions: 1.3.3.Final
         Environment: Fedora 20, x86_64, httpd 2.4.20 mpm_event
            Reporter: Michal Karm Babacek
            Assignee: Michal Karm Babacek
            Priority: Blocker

 h3. Setup
 * 3 tomcats
 * 2 load balancing groups
 * 1 request every 3 seconds (no load at all)
 * shutdown and kill of various nodes
 * SIGSEGV occurs no later than the third kill/start iteration
 h3. SIGSEGV
 {code}
     #if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
             /* Here that is tricky the worker needs shared memory but we don't and CONFIG will reset it */
             helper->index = 0; /* mark it removed */
             worker->s = helper->shared;
 crash--->   memcpy(worker->s, stat, sizeof(proxy_worker_shared));
     #else
             worker->id = 0; /* mark it removed */
     #endif
 {code}
 h3. Behavior
 {code}
  957 helper = (proxy_cluster_helper *) worker->context;
  961 if (helper) {
  962     i = helper->count_active;
  963 }
  968 if (i == 0) {
  971    proxy_worker_shared *stat = worker->s;
  972    proxy_cluster_helper *helper = (proxy_cluster_helper *) worker->context;
 {code}
 At this point, {{helper->shared}} points to a {{proxy_worker_shared}} structure that
appears to be properly filled.
 {code}
  999    if (worker->cp->pool) {
 1000        apr_pool_destroy(worker->cp->pool);
 1001        worker->cp->pool = NULL;
 1002    }
 {code}
 Regardless of whether the aforementioned block is there or not (or moved after line 1010),
 {{helper->shared}} suddenly points to {{NULL}}.
 {code}
 1008    helper->index = 0;
 1009    worker->s = helper->shared;
 {code}
 The assignment above makes {{worker->s}} point to {{NULL}}.
 {code}
 1010    memcpy(worker->s, stat, sizeof(proxy_worker_shared));
 {code}
 And here we go :(
 IMHO, _another thread_ has already cleared that memory and nulled the pointer, because
 the crash never happens if I run 1 process and 1 thread.
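
 The suspected interleaving can be replayed deterministically in a single thread. This is only a sketch of the hypothesis, not the real code: {{shared_stub}}, {{helper_stub}}, {{other_thread_cleanup}} and {{null_seen_at_copy}} are made-up names, and the step comments map loosely to lines 972 and 1009 above.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real httpd structs are much larger. */
typedef struct { int status; } shared_stub;            /* ~ proxy_worker_shared  */
typedef struct { shared_stub *shared; } helper_stub;   /* ~ proxy_cluster_helper */

/* What the *other* thread is assumed to do: drop the shared slot. */
static void other_thread_cleanup(helper_stub *h)
{
    h->shared = NULL;
}

/*
 * Replays the steps of the removing thread. When 'interleave' is non-zero,
 * the other thread's cleanup runs between the first look at helper->shared
 * and the later assignment to worker->s.
 * Returns 1 if the pointer was valid at first and NULL at copy time.
 */
static int null_seen_at_copy(helper_stub *h, int interleave)
{
    shared_stub *seen_early = h->shared;   /* ~ line 972: looks properly filled */

    if (interleave)
        other_thread_cleanup(h);           /* the assumed concurrent cleanup */

    shared_stub *seen_late = h->shared;    /* ~ line 1009: worker->s = helper->shared */

    return seen_early != NULL && seen_late == NULL;
}
```

 With {{interleave}} set to 0 the pointer stays valid, which matches the 1 process / 1 thread observation; with it set to 1 the {{memcpy}} at line 1010 would get a {{NULL}} destination.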
 The [workaround that prevents the core
dump|https://github.com/modcluster/mod_cluster/pull/207] looks like this:
 {code}
 if (helper->shared) {
     worker->s = helper->shared;
     memcpy(worker->s, stat, sizeof(proxy_worker_shared));
 }
 {code}
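
 A slightly stricter variant of the same guard, as a standalone sketch: it reads {{helper->shared}} once into a local and only uses that copy, so the check and the {{memcpy}} see the same value. The {{*_stub}} types and {{mark_removed}} are hypothetical stand-ins, not the mod_cluster or httpd definitions. This still does not close the race itself (the memory behind the pointer could go away after the check); it only narrows the window.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real httpd structs are much larger. */
typedef struct { int status; char name[32]; } shared_stub;             /* ~ proxy_worker_shared  */
typedef struct { int index; shared_stub *shared; } helper_stub;        /* ~ proxy_cluster_helper */
typedef struct { shared_stub *s; helper_stub *context; } worker_stub;  /* ~ proxy_worker         */

/* Returns 1 if the stats were copied, 0 if there was nothing safe to copy to. */
static int mark_removed(worker_stub *worker)
{
    shared_stub *stat = worker->s;
    helper_stub *helper = worker->context;
    shared_stub *dst;

    if (helper == NULL || stat == NULL)
        return 0;

    helper->index = 0;            /* mark it removed */

    dst = helper->shared;         /* read the pointer exactly once */
    if (dst == NULL)
        return 0;                 /* leave worker->s untouched */

    if (dst != stat)              /* memcpy must not be given overlapping buffers */
        memcpy(dst, stat, sizeof(*dst));
    worker->s = dst;
    return 1;
}
```

 Unlike the snippet above, {{worker->s}} is only reassigned after the destination is known to be non-NULL, so a failed check leaves the worker as it was.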
 h3. How do we fix it?
 Any ideas? [~jfclere] 


--
This message was sent by Atlassian Jira
(v7.12.1#712002)
