[JBoss JIRA] (MODCLUSTER-398) mod_cluster deadlock in a jboss/windows environment

Monday, 28 April 2014

    [ https://issues.jboss.org/browse/MODCLUSTER-398?page=com.atlassian.jira.pl... ]

Jean-Frederic Clere commented on MODCLUSTER-398:
------------------------------------------------

Actually the lock occurs in a piece of code that shouldn't be called in production (it
is just there for demos).
A fix that correctly activates the feature has been pushed
(https://github.com/modcluster/mod_cluster/commit/df8af31db7468aba37c33c6e...).

...
 mod_cluster deadlock in a jboss/windows environment
 ---------------------------------------------------

                 Key: MODCLUSTER-398
                 URL: https://issues.jboss.org/browse/MODCLUSTER-398
             Project: mod_cluster
          Issue Type: Bug
    Affects Versions: 1.2.6.Final
         Environment: Windows 2008, EAP6 and EWS2.0.1
            Reporter: Marc Maurer
            Assignee: Jean-Frederic Clere

  Under load Apache stops serving pages, with all threads stuck in the "W : Sending
reply" state. Using the Windows Process Explorer we then got a stack trace from a
hanging thread. We don't have debug symbols, but it's easy enough to see
what's happening:
 ntoskrnl.exe!KeWaitForMultipleObjects+0xc0a 
 ntoskrnl.exe!KeAcquireSpinLockAtDpcLevel+0x732 
 ntoskrnl.exe!KeWaitForMutexObject+0x19f 
 ntoskrnl.exe!NtDeleteFile+0x3c4 
 ntoskrnl.exe!PsDereferenceKernelStack+0x35358 
 ntoskrnl.exe!KeSynchronizeExecution+0x3a23 
 ntdll.dll!ZwLockFile+0xa 
 KERNELBASE.dll!LockFileEx+0xb2 
 kernel32.dll!LockFileEx+0x1b 
 libapr-1.dll!apr_file_lock+0x69  <-- here
 mod_slotmem.so+0x1318            <-- here
 mod_manager.so+0x2a11            <-- here
 mod_proxy_cluster.so+0x679e 
 mod_proxy.so!proxy_run_post_request+0x4e 
 mod_proxy.so!proxy_run_request_status+0x924 
 libhttpd.dll!ap_run_handler+0x35 
 libhttpd.dll!ap_invoke_handler+0x114 
 libhttpd.dll!ap_die+0x2ea 
 libhttpd.dll!ap_psignature+0x1ae8 
 libhttpd.dll!ap_run_process_connection+0x35 
 libhttpd.dll!ap_process_connection+0x3b 
 libhttpd.dll!ap_regkey_value_remove+0x136e 
 msvcrt.dll!srand+0x93 
 msvcrt.dll!ftime64_s+0x1dd 
 kernel32.dll!BaseThreadInitThunk+0xd 
 ntdll.dll!RtlUserThreadStart+0x21 
 So mod_manager is requesting a file lock on one of the lockfiles in the MemManagerFile
path. In this case it was the "manager.sessionid.sessionid.lock" file. Removing
the lockfile fixed the problem.
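
 To check whether something still holds a lock on such a file, a non-blocking
lock attempt can be used instead of a blocking one. The following is only a
minimal sketch and not mod_cluster code: it uses Python's POSIX-only
fcntl.flock rather than the Windows LockFileEx call seen in the trace, and
the helper name lock_is_held and the scratch file are made up for
illustration.

```python
import errno
import fcntl
import os
import tempfile


def lock_is_held(path):
    """Return True if another open file description holds an flock on path.

    Hypothetical helper: tries an exclusive, non-blocking lock and releases
    it again immediately if the attempt succeeds.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True  # somebody else holds the lock
            raise
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def demo():
    """Lock a scratch file, probe it, unlock it, probe again."""
    with tempfile.NamedTemporaryFile() as scratch:
        holder = os.open(scratch.name, os.O_RDWR)
        fcntl.flock(holder, fcntl.LOCK_EX)  # simulate the stuck lock holder
        while_held = lock_is_held(scratch.name)
        fcntl.flock(holder, fcntl.LOCK_UN)
        os.close(holder)
        after_release = lock_is_held(scratch.name)
    return while_held, after_release
```

 A probe like this only reports that the lock is taken, not which process
holds it, so it does not replace the debugging described in this report.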
 When bisecting the mod_cluster code, I think commit
"74eeb9c026380deb8d833be53b09b3d808e02d10 - Lock in insert-update" in version
1.2.2 is the culprit. This would also explain why mod_cluster 1.2.1 is the last known
working version.
 What we don't know is which process is already holding the lock when all Apache
threads start blocking on it. We are trying to figure that out. There are no obviously
wrong lock/unlock slotmem call pairs in the mod_manager module, and no locks are requested
within other locks as far as we can see. Therefore our best guess would be a deadlock on a
thread already holding the globalmutex_lock in combination with the slotmem file locks,
but that's just a guess without debugging it.
 More context can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1080047
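
 The kind of deadlock guessed at above, one lock requested while another is
held, can be sketched in general terms. This is an illustration of a
lock-order inversion under stated assumptions, not the mod_manager code: the
two threading.Lock objects merely stand in for the global mutex and a slotmem
file lock, all function names are hypothetical, and a timeout replaces the
indefinite wait so that the example terminates.

```python
import threading

global_mutex = threading.Lock()  # stands in for the global mutex
file_lock = threading.Lock()     # stands in for a slotmem file lock


def run_inverted(timeout=0.2):
    """Two threads take the same two locks in opposite order."""
    got_second = {}
    holding_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)

    def worker(name, first, second):
        with first:
            holding_first.wait()  # both threads now hold their first lock
            ok = second.acquire(timeout=timeout)
            got_second[name] = ok
            both_tried.wait()     # keep `first` until the peer has given up
            if ok:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("a", global_mutex, file_lock)),
        threading.Thread(target=worker, args=("b", file_lock, global_mutex)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


def run_consistent():
    """Both threads take global_mutex first, then file_lock."""
    finished = []

    def worker(name):
        with global_mutex:
            with file_lock:
                finished.append(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished)
```

 In run_inverted neither thread obtains its second lock; without the timeout
both would wait forever, which would look like the hang in the stack trace.
In run_consistent the same two locks are always taken in the same order and
both threads finish.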

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
