[JBoss JIRA] (MODCLUSTER-526) SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using LoadBalancingGroup
by Michal Karm Babacek (Jira)
[ https://issues.jboss.org/browse/MODCLUSTER-526?page=com.atlassian.jira.pl... ]
Michal Karm Babacek updated MODCLUSTER-526:
-------------------------------------------
Fix Version/s: Awaiting Volunteers
> SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using LoadBalancingGroup
> -----------------------------------------------------------------------------------
>
> Key: MODCLUSTER-526
> URL: https://issues.jboss.org/browse/MODCLUSTER-526
> Project: mod_cluster
> Issue Type: Bug
> Components: Native (httpd modules)
> Affects Versions: 1.3.3.Final
> Environment: Fedora 20, x86_64, httpd 2.4.20 mpm_event
> Reporter: Michal Karm Babacek
> Assignee: Michal Karm Babacek
> Priority: Blocker
> Fix For: Awaiting Volunteers
>
>
> h3. Setup
> * 3 tomcats
> * 2 load balancing groups
> * 1 request every 3 seconds (no load at all)
> * shutdown and kill of various nodes
> * SIGSEGV occurs no later than the third kill/start iteration
> h3. SIGSEGV
> {code}
> #if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
> /* Here that is tricky the worker needs shared memory but we don't and CONFIG will reset it */
> helper->index = 0; /* mark it removed */
> worker->s = helper->shared;
> crash---> memcpy(worker->s, stat, sizeof(proxy_worker_shared));
> #else
> worker->id = 0; /* mark it removed */
> #endif
> {code}
> h3. Behavior
> {code}
> 957 helper = (proxy_cluster_helper *) worker->context;
> 961 if (helper) {
> 962 i = helper->count_active;
> 963 }
> 968 if (i == 0) {
> 971 proxy_worker_shared *stat = worker->s;
> 972 proxy_cluster_helper *helper = (proxy_cluster_helper *) worker->context;
> {code}
> At this point, {{helper->shared}} points to a {{proxy_worker_shared}} structure that appears to be properly filled.
> {code}
> 999 if (worker->cp->pool) {
> 1000 apr_pool_destroy(worker->cp->pool);
> 1001 worker->cp->pool = NULL;
> 1002 }
> {code}
> Regardless of whether the aforementioned block is there or not (or is moved after line 1010),
> {{helper->shared}} suddenly points to {{NULL}}.
> {code}
> 1008 helper->index = 0;
> 1009 worker->s = helper->shared;
> {code}
> The above assignment makes {{worker->s}} point to {{NULL}}.
> {code}
> 1010 memcpy(worker->s, stat, sizeof(proxy_worker_shared));
> {code}
> And here we go :(
> IMHO, _another thread_ has already cleared that memory and nulled the pointer, because the crash never happens if
> I run 1 process and 1 thread.
> The [workaround that prevents the core dump|https://github.com/modcluster/mod_cluster/pull/207] looks like this:
> {code}
> if (helper->shared) {
> worker->s = helper->shared;
> memcpy(worker->s, stat, sizeof(proxy_worker_shared));
> }
> {code}
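The guard above can be exercised in isolation. The following is a minimal sketch in plain C, not the mod_proxy_cluster code: `shared_t`, `helper_t`, `worker_t` and `mark_removed` are hypothetical stand-ins for `proxy_worker_shared`, `proxy_cluster_helper`, `proxy_worker` and the tail of `remove_workers_node`, and the struct fields are reduced to what the report mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the httpd/mod_cluster types named in the report. */
typedef struct { int status; char name[16]; } shared_t;      /* ~ proxy_worker_shared  */
typedef struct { int index; shared_t *shared; } helper_t;    /* ~ proxy_cluster_helper */
typedef struct { shared_t *s; helper_t *context; } worker_t; /* ~ proxy_worker         */

/* Marks the worker removed and, if the helper still has its private area,
 * repoints worker->s at it and copies the old contents there.
 * Returns 1 if the copy happened, 0 if it was skipped because shared is NULL.
 * Assumes worker->s and helper->shared do not alias (memcpy forbids overlap). */
static int mark_removed(worker_t *worker)
{
    assert(worker != NULL && worker->context != NULL);
    shared_t *stat = worker->s;
    helper_t *helper = worker->context;

    helper->index = 0; /* mark it removed */
    if (helper->shared) {
        worker->s = helper->shared;
        memcpy(worker->s, stat, sizeof(shared_t));
        return 1;
    }
    return 0; /* worker->s still points at the old structure */
}
```

With the guard, a `NULL` in `helper->shared` leaves `worker->s` untouched instead of crashing. It does not explain why the pointer became `NULL` in the first place, which is why the report calls it a workaround.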
> h3. How do we fix it?
> Any ideas? [~jfclere]
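One possible direction, if the other-thread hypothesis above holds, is to put the `NULL` check and the copy in the same critical section as whatever code clears `helper->shared`. The sketch below illustrates that idea with a plain pthread mutex. It is an assumption-laden illustration, not a proposed patch: all type and function names are hypothetical, the `lock` field does not exist in the real `proxy_cluster_helper`, and inside httpd one would presumably use APR's mutex types rather than raw pthreads. A per-process lock also only helps if the clearing happens in the same process, which the report does not establish.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the lock field is an addition for this sketch. */
typedef struct { int status; } shared_t;
typedef struct { int index; shared_t *shared; pthread_mutex_t lock; } helper_t;
typedef struct { shared_t *s; helper_t *context; } worker_t;

static void helper_init(helper_t *h, shared_t *priv)
{
    h->index = 1;
    h->shared = priv;
    pthread_mutex_init(&h->lock, NULL);
}

/* What the "other thread" is assumed to do: drop the private area. */
static void helper_clear_shared(helper_t *h)
{
    pthread_mutex_lock(&h->lock);
    h->shared = NULL;
    pthread_mutex_unlock(&h->lock);
}

/* Check and copy happen under the same lock as the clear, so the pointer
 * cannot become NULL between the test and the memcpy.
 * Returns 1 if the copy happened, 0 if it was skipped. */
static int mark_removed_locked(worker_t *worker)
{
    assert(worker != NULL && worker->context != NULL);
    helper_t *helper = worker->context;
    int copied = 0;

    pthread_mutex_lock(&helper->lock);
    helper->index = 0; /* mark it removed */
    if (helper->shared != NULL && helper->shared != worker->s) {
        shared_t *stat = worker->s;
        worker->s = helper->shared;
        memcpy(worker->s, stat, sizeof(shared_t));
        copied = 1;
    }
    pthread_mutex_unlock(&helper->lock);
    return copied;
}
```

The bare `if (helper->shared)` guard narrows the window for the crash but does not close it, because another thread could still null the pointer between the check and the `memcpy`. Holding one lock across both steps is what removes that window in this sketch.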