[JBoss JIRA] (JBTM-2549) CrashRecovery05_2_Test002 failure

Friday, 6 November 2015

     [ https://issues.jboss.org/browse/JBTM-2549?page=com.atlassian.jira.plugin.... ]

Michael Musgrove updated JBTM-2549:
-----------------------------------
        Status: Resolved  (was: Pull Request Sent)
    Resolution: Done

...
 CrashRecovery05_2_Test002 failure
 ---------------------------------

                 Key: JBTM-2549
                 URL: https://issues.jboss.org/browse/JBTM-2549
             Project: JBoss Transaction Manager
          Issue Type: Bug
          Components: Testing
    Affects Versions: 5.2.7.Final
            Reporter: Michael Musgrove
            Assignee: Michael Musgrove
             Fix For: 5.next

 There are a number of failures in the CrashRecovery05_1 and CrashRecovery05_2 test groups
with a similar root cause:
 Tests that use AfterCrashServiceImpl01#check_oper and AfterCrashServiceImpl02#check_oper
with JdkORB fail because they make invalid assumptions about the return value of
RecoveryCoordinator#replay_completion. The OTS spec says "This (replay_completion)
non-blocking operation returns the current status of the transaction", but the
check_oper() test assumes that the return value represents the transaction status after
all resources have been replayed, and it therefore returns the wrong result to the
requesting client. The fix is to wait until the resource has been replayed following the
replay_completion attempt and then ask the resource for its status. I tested the
hypothesis by forcing a 200ms wait after issuing replay_completion on the
RecoveryCoordinator object (the actual fix will use a rendezvous instead).
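
The rendezvous idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual fix: TxStatus, RecordingResource, CheckOperSketch and replayCompletion are hypothetical stand-ins (the real test uses the OTS RecoveryCoordinator and Status types), and a CountDownLatch is just one possible way to build the rendezvous.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the transaction status values the test inspects.
enum TxStatus { PREPARED, COMMITTED, ROLLED_BACK }

// Hypothetical test resource: records its own outcome and signals when it has been replayed.
final class RecordingResource {
    private final CountDownLatch replayed = new CountDownLatch(1);
    private volatile TxStatus status = TxStatus.PREPARED;

    // Called by recovery when phase 2 is re-run on this resource.
    void commit()   { status = TxStatus.COMMITTED;   replayed.countDown(); }
    void rollback() { status = TxStatus.ROLLED_BACK; replayed.countDown(); }

    // Rendezvous: block until the resource has been replayed, then report its own status.
    TxStatus awaitStatus(long timeout, TimeUnit unit) {
        try {
            if (!replayed.await(timeout, unit)) {
                throw new IllegalStateException("resource was not replayed in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for replay", e);
        }
        return status;
    }
}

final class CheckOperSketch {
    // Assumed model of a non-blocking replay_completion: it returns the current
    // status immediately and lets recovery finish on another thread.
    static TxStatus replayCompletion(RecordingResource resource) {
        new Thread(resource::commit).start();
        return TxStatus.PREPARED;
    }

    // The return value of replayCompletion is deliberately ignored;
    // the resource is asked for its status once it has been replayed.
    static TxStatus checkOper(RecordingResource resource) {
        replayCompletion(resource);
        return resource.awaitStatus(5, TimeUnit.SECONDS);
    }
}
```

Compared with the fixed 200ms wait, a latch returns as soon as the resource has been replayed and fails loudly if it never is.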
 There is a second problem with the way these tests are coded: they ignore the fact
that replay_completion reruns phase 2 on all resources, whereas the test relies on it being
done on only the requested resource. A test sequence is as follows:
 # client starts a transaction
 # asks service1 to create a resource1
 # asks service2 to create a resource2
 # client commits the transaction (one of the resources crashes during 2PC)
 # restart the server (hosting the services)
 # client asks service1 for the status of the first resource
 # service1 invokes replay_completion on the RecoveryCoordinator for resource1
 # this causes a recovery attempt on both resources
 # client asks service2 for the status of the second resource
 # service2 invokes replay_completion on the RecoveryCoordinator for resource2
 # but this will fail because the transaction was completed during steps 7 and 8 so the
transaction log no longer exists and the recovery attempt calls rollback on resource2
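
The sequence above can be modelled with a small simulation. This is a sketch under assumptions rather than Narayana code: ReplaySequenceSketch, its single in-memory "log" flag and the Outcome enum are hypothetical, and the value returned by replayCompletion here is purely illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a coordinator that holds one transaction log entry
// covering both resource1 and resource2.
final class ReplaySequenceSketch {
    enum Outcome { COMMIT, ROLLBACK }

    private boolean logExists = true;
    // Last phase 2 call each resource received, keyed by resource name.
    private final Map<String, Outcome> lastCall = new LinkedHashMap<>();

    // Stand-in for replay_completion issued on behalf of one resource.
    Outcome replayCompletion(String requester) {
        if (logExists) {
            // Phase 2 is re-run on ALL resources, not only on the requester,
            // and the transaction log is removed once the transaction completes.
            lastCall.put("resource1", Outcome.COMMIT);
            lastCall.put("resource2", Outcome.COMMIT);
            logExists = false;
            return Outcome.COMMIT;
        }
        // The log no longer exists, so the recovery attempt calls rollback on the requester.
        lastCall.put(requester, Outcome.ROLLBACK);
        return Outcome.ROLLBACK;
    }

    Map<String, Outcome> lastCalls() {
        return lastCall;
    }
}
```

In this model the first request (for resource1) commits both resources, and the second request (for resource2) sees only a rollback, even though resource2 was already committed during the first request.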
 The fix is to store the transaction Status with the resources and have the service ask
the resources for their state (rather than relying on the return value of the
replay_completion request).

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
