[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-526) SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using LoadBalancingGroup

Michal Karm Babacek (JIRA) issues at jboss.org
Wed Jul 20 17:12:00 EDT 2016


     [ https://issues.jboss.org/browse/MODCLUSTER-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Karm Babacek updated MODCLUSTER-526:
-------------------------------------------
    Description: 
h3. Setup
* 3 tomcats
* 2 load balancing groups (see the configuration sketch after this list)
* 1 request every 3 seconds (no load at all)
* shutdown and kill of various nodes
* no later than the third kill/start iteration, a SIGSEGV occurs
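A sketch of how the groups were presumably assigned (Tomcat-side only; the listener class and attribute names are the standard mod_cluster 1.3.x ones as I recall them, and the proxy address and group names are made up):
{code}
<!-- server.xml on each Tomcat (illustrative): the three nodes are split
     across two groups, e.g. two nodes in group-a and one in group-b. -->
<Listener className="org.jboss.modcluster.container.catalina.standalone.ModClusterListener"
          proxyList="127.0.0.1:6666"
          loadBalancingGroup="group-a"/>
{code}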

h3. SIGSEGV
{code}
    #if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
            /* Here that is tricky the worker needs shared memory but we don't and CONFIG will reset it */
            helper->index = 0; /* mark it removed */
            worker->s = helper->shared;
crash--->   memcpy(worker->s, stat, sizeof(proxy_worker_shared));
    #else
            worker->id = 0; /* mark it removed */
    #endif
{code}

h3. Behavior
{code}
 957 helper = (proxy_cluster_helper *) worker->context;
 961 if (helper) {
 962     i = helper->count_active;
 963 }

 968 if (i == 0) {
 971    proxy_worker_shared *stat = worker->s;
 972    proxy_cluster_helper *helper = (proxy_cluster_helper *) worker->context;
{code}
At this point, {{helper->shared}} points to a {{proxy_worker_shared}} structure that appears to be properly filled.
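For orientation, a sketch of the helper structure kept in {{worker->context}} (only the fields referenced in this report are shown; names match the excerpts above, everything else should be checked against {{mod_proxy_cluster.c}}):
{code}
/* Per-worker helper stored in worker->context (sketch, not verbatim). */
typedef struct proxy_cluster_helper {
    int count_active;             /* requests currently using the worker */
    proxy_worker_shared *shared;  /* saved pointer to the worker's shared area */
    int index;                    /* worker slot id; 0 marks it removed */
} proxy_cluster_helper;
{code}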
{code}
 999    if (worker->cp->pool) {
1000        apr_pool_destroy(worker->cp->pool);
1001        worker->cp->pool = NULL;
1002    }
{code}
Regardless of whether the aforementioned block is left in place or moved after line 1010,
{{helper->shared}} suddenly becomes {{NULL}}.
{code}
1008    helper->index = 0;
1009    worker->s = helper->shared;
{code}
The above assignment sets {{worker->s}} to {{NULL}}.
{code}
1010    memcpy(worker->s, stat, sizeof(proxy_worker_shared));
{code}
And here we go: the {{memcpy}} writes to a {{NULL}} destination and crashes :(

IMHO, _another thread_ has already cleared that memory and nulled the pointer, because the crash absolutely
does not happen when I run httpd with 1 process and 1 thread.
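A minimal sketch of the suspected interleaving (which code path plays "Thread B" is an assumption; it is whatever resets the shared-memory area):
{code}
/*
 * Thread A (remove_workers_node)              Thread B (hypothetical)
 * ------------------------------              -----------------------
 * stat = worker->s;                           .
 * .                                           helper->shared = NULL;
 * worker->s = helper->shared;                 .   <- worker->s is now NULL
 * memcpy(worker->s, stat,
 *        sizeof(proxy_worker_shared));        <- SIGSEGV: NULL destination
 */
{code}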

The [workaround that prevents the core dump|https://github.com/modcluster/mod_cluster/pull/207] looks like this:
{code}
if (helper->shared) {
    worker->s = helper->shared;
    memcpy(worker->s, stat, sizeof(proxy_worker_shared));
}
{code}
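Note the guard only narrows the window rather than closing it: {{helper->shared}} could still be nulled between the test and the {{memcpy}}. A slightly more defensive sketch (my variant, not part of the PR) reads the pointer once into a local, so the check and the use see the same value:
{code}
/* Sketch: snapshot the pointer so the NULL check and the memcpy agree.
 * The underlying race remains and needs proper synchronization. */
proxy_worker_shared *shared = helper->shared;
if (shared) {
    worker->s = shared;
    memcpy(worker->s, stat, sizeof(proxy_worker_shared));
}
{code}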


  was:
* 3 tomcats
* 2 load balancing groups
* 1 request every 3 seconds (no load at all)
* shutdown and kill of various nodes
* no later than third kill/start iteration causes SIGSEGV
{code}
    #if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
            /* Here that is tricky the worker needs shared memory but we don't and CONFIG will reset it */
            helper->index = 0; /* mark it removed */
            worker->s = helper->shared;
crash--->   memcpy(worker->s, stat, sizeof(proxy_worker_shared));
    #else
            worker->id = 0; /* mark it removed */
    #endif
{code}



> SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using LoadBalancingGroup
> -----------------------------------------------------------------------------------
>
>                 Key: MODCLUSTER-526
>                 URL: https://issues.jboss.org/browse/MODCLUSTER-526
>             Project: mod_cluster
>          Issue Type: Bug
>          Components: Native (httpd modules)
>    Affects Versions: 1.3.3.Final
>         Environment: Fedora 20, x86_64, httpd 2.4.20
>            Reporter: Michal Karm Babacek
>            Assignee: Michal Karm Babacek
>            Priority: Blocker
>             Fix For: 1.3.4.Final
>



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)

