[jbossts-issues] [JBoss JIRA] (JBTM-2409) XARecoveryModuleHelpersUnitTest hang

Tom Jenkinson (JIRA) issues at jboss.org
Fri Oct 2 11:46:00 EDT 2015


    [ https://issues.jboss.org/browse/JBTM-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114710#comment-13114710 ] 

Tom Jenkinson commented on JBTM-2409:
-------------------------------------

I think what has happened here is that the notifyAll at:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/recovery/arjunacore/XARecoveryModule.java#L994
released the threads waiting for the lock at:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/recovery/arjunacore/XARecoveryModule.java#L976
but, before they could reacquire it, the code at:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/recovery/arjunacore/XARecoveryModule.java#L201 acquired the lock to set the scan state to second pass, which prevents these waiting threads from waking up.

I think it can do this because there is only a 100 millisecond wait, so it is possible that the main thread fires before the two waiting threads wake from the notify and reacquire the lock:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/tests/classes/com/hp/mwtests/ts/jta/recovery/XARecoveryModuleHelpersUnitTest.java#L128
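The interleaving described above can be reproduced in a small, self-contained Java sketch. This is a hypothetical model, not Narayana's XARecoveryModule code. The names `ScanState`, `awaitBetweenPasses` and `countStuckWaiters` are invented for illustration. The model also assumes that the remover threads wait for one exact, transient scan state. To make the losing interleaving deterministic, the scanner keeps the monitor across both state changes. In the real module the lock is released in between, so the hang only appears when the timing is unlucky.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the race in JBTM-2409; not Narayana's XARecoveryModule.
class MissedTransientStateDemo {
    enum ScanState { IDLE, FIRST_PASS, BETWEEN_PASSES, SECOND_PASS }

    private final Object lock = new Object();
    private ScanState state = ScanState.FIRST_PASS;

    // Fragile guard: a remover proceeds only if it sees the transient
    // BETWEEN_PASSES state at the moment it reacquires the monitor.
    void awaitBetweenPasses() throws InterruptedException {
        synchronized (lock) {
            while (state != ScanState.BETWEEN_PASSES) {
                lock.wait();
            }
        }
    }

    // Parks two remover threads, lets the scanner win the race,
    // and returns how many removers are still blocked afterwards.
    static int countStuckWaiters() throws InterruptedException {
        MissedTransientStateDemo module = new MissedTransientStateDemo();
        List<Thread> removers = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            Thread t = new Thread(() -> {
                try {
                    module.awaitBetweenPasses();
                } catch (InterruptedException e) {
                    // Interrupted during cleanup at the end of the demo.
                }
            });
            removers.add(t);
            t.start();
        }
        // Wait until both removers are parked inside lock.wait().
        for (Thread t : removers) {
            while (t.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
        }
        synchronized (module.lock) {
            module.state = ScanState.BETWEEN_PASSES; // first pass finished
            module.lock.notifyAll();                 // removers become runnable...
            module.state = ScanState.SECOND_PASS;    // ...but the scanner moves on before they reacquire the lock
        }
        synchronized (module.lock) {
            module.state = ScanState.IDLE;           // scan done: removers wake, see IDLE, and wait again
            module.lock.notifyAll();
        }
        int stuck = 0;
        for (Thread t : removers) {
            t.join(200);
            if (t.isAlive()) {
                stuck++;
            }
        }
        for (Thread t : removers) {                  // release the demo threads so the JVM can exit
            t.interrupt();
            t.join();
        }
        return stuck;
    }
}
```

Calling `MissedTransientStateDemo.countStuckWaiters()` should return 2. Both waiters were notified, but by the time they reacquired the monitor, the state they were waiting for had already been overwritten.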

Since the delay between scans is much longer under normal circumstances, and this test has only failed once, I will close this issue. If it recurs, we could increase the delay in the test.
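Increasing the delay narrows the window but does not strictly close it, because a sleep does not guarantee which thread reacquires the monitor first. One common alternative is to have waiters track a count of completed passes instead of the current state. With that approach, a pass that finished while they were still waking up is not missed. The sketch below is hypothetical and is not what the Narayana project changed. `EpochGuardDemo` and its members are invented names. The scanner keeps the monitor across the transition and immediately starts another first pass, which is the worst case for the waiters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical alternative guard; not the change made in Narayana.
class EpochGuardDemo {
    private final Object lock = new Object();
    private boolean firstPassInProgress = true;
    private long firstPassesCompleted = 0;

    // Proceed once the first pass that was running on arrival has ended,
    // even if the scanner has already started another pass.
    void awaitFirstPassEnd() throws InterruptedException {
        synchronized (lock) {
            long seen = firstPassesCompleted;
            while (firstPassInProgress && firstPassesCompleted == seen) {
                lock.wait();
            }
        }
    }

    // Parks two remover threads, lets the scanner win the race,
    // and returns how many removers are still blocked afterwards.
    static int countStuckWaiters() throws InterruptedException {
        EpochGuardDemo module = new EpochGuardDemo();
        List<Thread> removers = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            Thread t = new Thread(() -> {
                try {
                    module.awaitFirstPassEnd();
                } catch (InterruptedException e) {
                    // Only reached if cleanup has to interrupt a stuck remover.
                }
            });
            removers.add(t);
            t.start();
        }
        // Wait until both removers are parked inside lock.wait().
        for (Thread t : removers) {
            while (t.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
        }
        synchronized (module.lock) {
            module.firstPassInProgress = false;      // first pass finished
            module.firstPassesCompleted++;           // record that a pass completed
            module.lock.notifyAll();
            module.firstPassInProgress = true;       // scanner starts the next first pass before removers run
        }
        int stuck = 0;
        for (Thread t : removers) {
            t.join(1000);
            if (t.isAlive()) {
                stuck++;
            }
        }
        for (Thread t : removers) {                  // no-op for finished threads
            t.interrupt();
            t.join();
        }
        return stuck;
    }
}
```

Here `countStuckWaiters()` should return 0. Each waiter records how many first passes had completed when it arrived and proceeds once that number changes. It therefore no longer depends on observing a state that may already have been replaced.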

> XARecoveryModuleHelpersUnitTest hang
> ------------------------------------
>
>                 Key: JBTM-2409
>                 URL: https://issues.jboss.org/browse/JBTM-2409
>             Project: JBoss Transaction Manager
>          Issue Type: Bug
>          Components: Recovery
>    Affects Versions: 5.1.1
>            Reporter: Michael Musgrove
>            Assignee: Tom Jenkinson
>         Attachments: 32287.jstack
>
>
> The test checks that recovery helpers can be removed at the correct stages during recovery scans. The CI job http://albany.eng.hst.ams2.redhat.com/job/narayana-codeCoverage/196 shows the hang.
> The junit test thread:
> - starts 2 threads that will remove a recovery helper
> - triggers xaRecoveryModule.periodicWorkFirstPass()
> - triggers xaRecoveryModule.periodicWorkSecondPass()
> - joins with remover threads
> The jstack output shows:
> - the two threads are in the process of removing a recovery helper and are waiting for the first pass to finish;
> - the junit test thread is waiting to join with these two threads;
> Since both recovery passes must have completed, it looks like the remover threads weren't notified when the first pass completed.



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)

