[mod_cluster-issues] [JBoss JIRA] (MODCLUSTER-398) mod_cluster deadlock in a jboss/windows environment

Wed Apr 30 16:55:56 EDT 2014

     [ https://issues.jboss.org/browse/MODCLUSTER-398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

RH Bugzilla Integration updated MODCLUSTER-398:
-----------------------------------------------

        Bugzilla Update: Perform
    Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1080047


> mod_cluster deadlock in a jboss/windows environment
> ---------------------------------------------------
>
>                 Key: MODCLUSTER-398
>                 URL: https://issues.jboss.org/browse/MODCLUSTER-398
>             Project: mod_cluster
>          Issue Type: Bug
>    Affects Versions: 1.2.6.Final
>         Environment: Windows 2008, EAP6 and EWS2.0.1
>            Reporter: Marc Maurer
>            Assignee: Jean-Frederic Clere
>
>  Under load, Apache stops serving pages, with all threads stuck in the "W : Sending reply" state. Using the Windows Process Explorer, we then got a stack trace from a hanging thread. We don't have debug symbols, but it's easy enough to see what's happening:
> ntoskrnl.exe!KeWaitForMultipleObjects+0xc0a 
> ntoskrnl.exe!KeAcquireSpinLockAtDpcLevel+0x732 
> ntoskrnl.exe!KeWaitForMutexObject+0x19f 
> ntoskrnl.exe!NtDeleteFile+0x3c4 
> ntoskrnl.exe!PsDereferenceKernelStack+0x35358 
> ntoskrnl.exe!KeSynchronizeExecution+0x3a23 
> ntdll.dll!ZwLockFile+0xa 
> KERNELBASE.dll!LockFileEx+0xb2 
> kernel32.dll!LockFileEx+0x1b 
> libapr-1.dll!apr_file_lock+0x69  <-- here
> mod_slotmem.so+0x1318            <-- here
> mod_manager.so+0x2a11            <-- here
> mod_proxy_cluster.so+0x679e 
> mod_proxy.so!proxy_run_post_request+0x4e 
> mod_proxy.so!proxy_run_request_status+0x924 
> libhttpd.dll!ap_run_handler+0x35 
> libhttpd.dll!ap_invoke_handler+0x114 
> libhttpd.dll!ap_die+0x2ea 
> libhttpd.dll!ap_psignature+0x1ae8 
> libhttpd.dll!ap_run_process_connection+0x35 
> libhttpd.dll!ap_process_connection+0x3b 
> libhttpd.dll!ap_regkey_value_remove+0x136e 
> msvcrt.dll!srand+0x93 
> msvcrt.dll!ftime64_s+0x1dd 
> kernel32.dll!BaseThreadInitThunk+0xd 
> ntdll.dll!RtlUserThreadStart+0x21 
> So mod_manager is requesting a file lock on one of the lock files in the MemManagerFile path. In this case it was the "manager.sessionid.sessionid.lock" file. Removing the lock file fixed the problem.
> After bisecting the mod_cluster code, I think commit "74eeb9c026380deb8d833be53b09b3d808e02d10 - Lock in insert-update" in version 1.2.2 is the culprit. This would also explain why mod_cluster 1.2.1 is the last known working version.
> What we don't know is which process is already holding the lock when all Apache threads start blocking on it. We are trying to figure that out. There are no obviously wrong lock/unlock slotmem call pairs in the mod_manager module, and no locks are requested within other locks as far as we can see. Therefore our best guess would be a deadlock on a thread already holding the globalmutex_lock in combination with the slotmem file locks, but that's just a guess without debugging it.
> More context can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1080047
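
The report's guess is a deadlock between a thread holding the global mutex and the slotmem file locks. As a rough illustration of that kind of lock-ordering problem (not the actual mod_manager or mod_slotmem code), the Python sketch below uses two in-process locks as hypothetical stand-ins. All names are invented for the example, and a bounded wait replaces the unbounded LockFileEx wait seen in the trace so that the script finishes instead of hanging.

```python
import threading

# Hypothetical stand-ins: plain in-process locks, NOT the real
# mod_manager global mutex or the slotmem lock file.
global_mutex = threading.Lock()
slotmem_file_lock = threading.Lock()


def worker(name, first, second, timed_out, barrier=None):
    """Take `first`, then try `second` with a bounded wait."""
    with first:
        if barrier is not None:
            barrier.wait()  # both threads now hold their first lock
        if second.acquire(timeout=0.5):
            second.release()
        else:
            timed_out.append(name)  # would block forever without a timeout
        if barrier is not None:
            barrier.wait()  # keep `first` held until both attempts are over


def run_opposite_order():
    """One thread locks mutex then file, the other file then mutex."""
    barrier = threading.Barrier(2)
    timed_out = []
    threads = [
        threading.Thread(target=worker, args=(
            "mutex-then-file", global_mutex, slotmem_file_lock,
            timed_out, barrier)),
        threading.Thread(target=worker, args=(
            "file-then-mutex", slotmem_file_lock, global_mutex,
            timed_out, barrier)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(timed_out)


def run_same_order():
    """Both threads lock mutex then file, so neither can starve the other."""
    timed_out = []
    threads = [
        threading.Thread(target=worker, args=(
            "worker-%d" % i, global_mutex, slotmem_file_lock, timed_out))
        for i in (1, 2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(timed_out)


if __name__ == "__main__":
    print(run_opposite_order())  # both workers time out
    print(run_same_order())      # no worker times out
```

With opposite acquisition orders, each thread holds the lock the other one needs, so both waits expire. With a single consistent order, the second thread simply waits its turn. Whether this is what happens in mod_cluster 1.2.2 and later is unconfirmed; the report itself calls it a guess.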


--
This message was sent by Atlassian JIRA
(v6.2.3#6260)