Work on MODCLUSTER-526 stopped by Michal Karm Babacek.
------------------------------------------------------
SIGSEGV in remove_workers_node (mod_proxy_cluster.so) when using
LoadBalancingGroup
-----------------------------------------------------------------------------------
Key: MODCLUSTER-526
URL: https://issues.jboss.org/browse/MODCLUSTER-526
Project: mod_cluster
Issue Type: Bug
Components: Native (httpd modules)
Affects Versions: 1.3.3.Final
Environment: Fedora 20, x86_64, httpd 2.4.20 mpm_event
Reporter: Michal Karm Babacek
Assignee: Michal Karm Babacek
Priority: Blocker
h3. Setup
* 3 tomcats
* 2 load balancing groups
* 1 request every 3 seconds (no load at all)
* shutdown and kill of various nodes
* SIGSEGV occurs no later than the third kill/start iteration
h3. SIGSEGV
{code}
#if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
    /* Here that is tricky the worker needs shared memory but we don't and
       CONFIG will reset it */
    helper->index = 0; /* mark it removed */
    worker->s = helper->shared;
crash---> memcpy(worker->s, stat, sizeof(proxy_worker_shared));
#else
    worker->id = 0; /* mark it removed */
#endif
{code}
h3. Behavior
{code}
957 helper = (proxy_cluster_helper *) worker->context;
961 if (helper) {
962 i = helper->count_active;
963 }
968 if (i == 0) {
971 proxy_worker_shared *stat = worker->s;
972 proxy_cluster_helper *helper = (proxy_cluster_helper *) worker->context;
{code}
At this point, {{helper->shared}} points to a {{proxy_worker_shared}} structure that
appears to be properly filled.
{code}
999 if (worker->cp->pool) {
1000 apr_pool_destroy(worker->cp->pool);
1001 worker->cp->pool = NULL;
1002 }
{code}
Regardless of whether the aforementioned block is there or not (I also tried moving it
after line 1010), {{helper->shared}} suddenly points to {{NULL}}.
{code}
1008 helper->index = 0;
1009 worker->s = helper->shared;
{code}
The above assignment makes {{worker->s}} point to {{NULL}}.
{code}
1010 memcpy(worker->s, stat, sizeof(proxy_worker_shared));
{code}
And here we go :(
IMHO, _another thread_ has already cleared that memory and nulled the pointer, because
the crash never happens if I run 1 process and 1 thread.
The [workaround that prevents the core dump|https://github.com/modcluster/mod_cluster/pull/207] looks like this:
{code}
if (helper->shared) {
    worker->s = helper->shared;
    memcpy(worker->s, stat, sizeof(proxy_worker_shared));
}
{code}
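Note that the workaround reads {{helper->shared}} twice (once in the check, once in the assignment), so if another thread really does null the pointer, it could still do so between the two reads. Below is a minimal sketch of reading the pointer only once into a local. The types {{shared_sketch}}, {{helper_sketch}}, {{worker_sketch}} and the function {{mark_removed_sketch}} are hypothetical stand-ins, not the real httpd or mod_cluster structures, and this only narrows the window; it is not a substitute for proper synchronization.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for proxy_worker_shared, proxy_cluster_helper and
 * proxy_worker; the real structures have many more fields. */
typedef struct { int status; char name[32]; } shared_sketch;
typedef struct { int index; shared_sketch *shared; } helper_sketch;
typedef struct { shared_sketch *s; helper_sketch *context; } worker_sketch;

/* Returns 1 if the stats were copied into the helper's slot,
 * 0 if the helper or its slot was already gone. */
static int mark_removed_sketch(worker_sketch *worker)
{
    shared_sketch *stat = worker->s;
    helper_sketch *helper = worker->context;
    shared_sketch *dst;

    if (helper == NULL) {
        return 0;
    }
    helper->index = 0;        /* mark it removed */
    dst = helper->shared;     /* read the pointer exactly once */
    if (dst == NULL || stat == NULL) {
        return 0;             /* slot already cleared: leave worker->s alone */
    }
    if (dst != stat) {        /* memcpy must not be given overlapping regions */
        memcpy(dst, stat, sizeof(*dst));
    }
    worker->s = dst;
    return 1;
}
```

Without a lock or atomic access around {{helper->shared}}, the compiler and the other thread are still free to interfere, so this is only an illustration of the check-then-use problem, not a proposed fix.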
h3. How do we fix it?
Any ideas? [~jfclere]