[jbossts-issues] [JBoss JIRA] (JBTM-2409) XARecoveryModuleHelpersUnitTest hang
Tom Jenkinson (JIRA)
issues at jboss.org
Fri Oct 2 11:46:00 EDT 2015
[ https://issues.jboss.org/browse/JBTM-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114710#comment-13114710 ]
Tom Jenkinson commented on JBTM-2409:
-------------------------------------
I think what has happened here is that the notifyAll:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/recovery/arjunacore/XARecoveryModule.java#L994
woke the threads waiting at:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/recovery/arjunacore/XARecoveryModule.java#L976
but before they could reacquire the lock:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/recovery/arjunacore/XARecoveryModule.java#L201
acquired it and set the scan state to second pass, which prevents the waiting threads from proceeding.
I think it can do this as there is only a 100 millisecond wait, and it's possible that the main thread fires before the two waiting threads wake from the notify and reacquire the lock:
https://github.com/jbosstm/narayana/blob/5.2.5.Final/ArjunaJTA/jta/tests/classes/com/hp/mwtests/ts/jta/recovery/XARecoveryModuleHelpersUnitTest.java#L128
As the delay between scans is much longer under normal circumstances, and this test has only failed once, I will close this issue. If it recurs we could increase the delay in the test.
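
For illustration, here is a minimal sketch of the kind of missed window described above. It is not the XARecoveryModule code: the class ScanStateRace, its ScanState enum, and the methods awaitBetweenPasses and runDemo are all hypothetical names. To make the interleaving deterministic, the sketch sets the state to second pass while still holding the monitor after notifyAll, which is what the waiter would observe if the scanning thread won the race for the lock. The waiter also uses a timed wait so the demo terminates instead of hanging.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a recovery module; not Narayana code.
class ScanStateRace {
    enum ScanState { IDLE, FIRST_PASS, BETWEEN_PASSES, SECOND_PASS }

    private final Object lock = new Object();
    private ScanState state = ScanState.IDLE;

    // Waits until the state is BETWEEN_PASSES, or gives up after the timeout.
    // Returns true only if the waiter actually observed BETWEEN_PASSES.
    boolean awaitBetweenPasses(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (state != ScanState.BETWEEN_PASSES) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // the window was missed
                }
                lock.wait(remainingMillis);
            }
            return true;
        }
    }

    // Runs one remover thread against a scanner that moves on to the
    // second pass before the notified remover can reacquire the lock.
    static boolean runDemo() {
        ScanStateRace module = new ScanStateRace();
        synchronized (module.lock) {
            module.state = ScanState.FIRST_PASS;
        }
        AtomicBoolean sawWindow = new AtomicBoolean(false);
        Thread remover = new Thread(() -> {
            try {
                sawWindow.set(module.awaitBetweenPasses(500));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        remover.start();
        try {
            // Poll until the remover is parked inside lock.wait(...).
            while (remover.isAlive()
                    && remover.getState() != Thread.State.TIMED_WAITING) {
                Thread.sleep(1);
            }
            synchronized (module.lock) {
                module.state = ScanState.BETWEEN_PASSES;
                module.lock.notifyAll();
                // Simulates the scanner winning the lock race: the state
                // changes again before any notified waiter can run.
                module.state = ScanState.SECOND_PASS;
            }
            remover.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return sawWindow.get();
    }

    public static void main(String[] args) {
        System.out.println("remover saw BETWEEN_PASSES: " + runDemo());
    }
}
```

In this sketch runDemo() returns false: the remover is notified but only ever sees SECOND_PASS. With an untimed wait() it would stay blocked until the next notification, which is consistent with the join hang seen in the jstack output.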
> XARecoveryModuleHelpersUnitTest hang
> ------------------------------------
>
> Key: JBTM-2409
> URL: https://issues.jboss.org/browse/JBTM-2409
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Recovery
> Affects Versions: 5.1.1
> Reporter: Michael Musgrove
> Assignee: Tom Jenkinson
> Attachments: 32287.jstack
>
>
> The test checks that recovery helpers can be removed at the correct stages during recovery scans. The CI job http://albany.eng.hst.ams2.redhat.com/job/narayana-codeCoverage/196 shows the hang.
> The junit test thread:
> - starts 2 threads that will remove a recovery helper
> - triggers xaRecoveryModule.periodicWorkFirstPass()
> - triggers xaRecoveryModule.periodicWorkSecondPass()
> - joins with remover threads
> The jstack output shows:
> - the two threads are in the process of removing a recovery helper and are waiting for the first pass to finish;
> - the junit test thread is waiting to join with these two threads;
> Since both recovery passes must have completed, it looks like the remover threads weren't notified when the first pass completed.