jbossts-issues November 2015

jbossts-issues@lists.jboss.org

1 participants
244 discussions

[JBoss JIRA] (JBTM-2548) CrashRecovery05_2_Test076 (through CrashRecovery05_2_Test100) failures

by Michael Musgrove (JIRA)

[ https://issues.jboss.org/browse/JBTM-2548?page=com.atlassian.jira.plugin.... ]

Michael Musgrove commented on JBTM-2548:
----------------------------------------

These test failures also require the fix provided by JBTM-2547

> CrashRecovery05_2_Test076 (through CrashRecovery05_2_Test100) failures
> ----------------------------------------------------------------------
>
>                 Key: JBTM-2548
>                 URL: https://issues.jboss.org/browse/JBTM-2548
>             Project: JBoss Transaction Manager
>          Issue Type: Bug
>          Components: Testing
>    Affects Versions: 5.2.7.Final
>            Reporter: Michael Musgrove
>            Assignee: Michael Musgrove
>             Fix For: 5.next
>
>
> These tests typically start 2 servers and crash one of them in various ways.
> Then the two servers are restarted but the tests fail because the recovery port used by one of the two servers is stuck in the TIME-WAIT state so the server restart gets a "Bind: Address Already in Use" error.
> I propose to fix this by doing a retry and backoff approach.

--
This message was sent by Atlassian JIRA (v6.4.11#64026)

[JBoss JIRA] (JBTM-2549) CrashRecovery05_2_Test002 failure

by Michael Musgrove (JIRA)

Michael Musgrove created JBTM-2549:
--------------------------------------

             Summary: CrashRecovery05_2_Test002 failure
                 Key: JBTM-2549
                 URL: https://issues.jboss.org/browse/JBTM-2549
             Project: JBoss Transaction Manager
          Issue Type: Bug
          Components: Testing
    Affects Versions: 5.2.7.Final
            Reporter: Michael Musgrove
            Assignee: Michael Musgrove
             Fix For: 5.next

There are a number of failures in the CrashRecovery05_1 and CrashRecovery05_2 test groups with a similar root cause:

Tests that use AfterCrashServiceImpl01#check_oper and AfterCrashServiceImpl02#check_oper with JdkORB fail because they make invalid assumptions about the return value of RecoveryCoordinator#replay_completion. The OTS spec says "This (replay_completion) non-blocking operation returns the current status of the transaction", but the check_oper() test assumes that the return value represents the transaction status after all resources have been replayed and therefore returns the wrong result to the requesting client.

The fix is to ask the resources for their status after the replay_completion attempt (while also waiting for the resource to have been replayed). I tested the hypothesis by forcing a 200ms wait after issuing replay_completion on the RecoveryCoordinator object (but I will use a rendezvous for the actual fix).

There is a second problem with the way these tests are coded: they ignore the fact that replay_completion reruns phase 2 on all resources, whereas the test relies on it being done on only the requested resource.

A test sequence is as follows:

 1. client starts a transaction
 2. asks service1 to create a resource1
 3. asks service2 to create a resource2
 4. client commits the transaction (one of which crashes during 2PC)
 5. restart the server (hosting the services)
 6. client asks service1 for the status of the first resource
 7. service1 invokes replay_completion on the RecoveryCoordinator for resource1
 8. this causes a recovery attempt on both resources
 9. client asks service2 for the status of the second resource
10. service2 invokes replay_completion on the RecoveryCoordinator for resource2
11. but this will fail because the transaction was completed during steps 7 and 8, so the transaction log no longer exists and the recovery attempt calls rollback on resource2

The fix is to store the transaction Status with the resources and have the service ask the resources for the state (rather than using the return value of the replay_completion request).

--
This message was sent by Atlassian JIRA (v6.4.11#64026)
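The proposed fix (store the outcome with the resource and rendezvous on it instead of sleeping for 200ms) could be sketched roughly as follows. This is only an illustration and not the code from the Narayana test suite: the class name ReplayedResource, the Outcome enum and the awaitOutcome method are hypothetical, and a plain java.util.concurrent.CountDownLatch stands in for whatever rendezvous mechanism the real fix uses.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a test resource; none of these names come from the test suite.
public class ReplayedResource {
    public enum Outcome { UNKNOWN, COMMITTED, ROLLED_BACK }

    // Released once recovery has driven phase 2 on this resource.
    private final CountDownLatch replayed = new CountDownLatch(1);
    private volatile Outcome outcome = Outcome.UNKNOWN;

    // Called by the recovery machinery when phase 2 is replayed as a commit.
    public void commit() {
        outcome = Outcome.COMMITTED;
        replayed.countDown();
    }

    // Called by the recovery machinery when phase 2 is replayed as a rollback.
    public void rollback() {
        outcome = Outcome.ROLLED_BACK;
        replayed.countDown();
    }

    // Waits (bounded) for the replay to reach this resource, then reports what
    // the resource itself recorded. Returns UNKNOWN if the wait times out.
    public Outcome awaitOutcome(long timeout, TimeUnit unit) throws InterruptedException {
        replayed.await(timeout, unit);
        return outcome;
    }
}
```

A service in the position of check_oper would then issue replay_completion, ignore its return value, and answer the client with the result of awaitOutcome on the resource. The bounded wait replaces the fixed 200ms delay used to test the hypothesis.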

[JBoss JIRA] (JBTM-2548) CrashRecovery05_2_Test076 (through CrashRecovery05_2_Test100) failures

by Michael Musgrove (JIRA)

Michael Musgrove created JBTM-2548:
--------------------------------------

             Summary: CrashRecovery05_2_Test076 (through CrashRecovery05_2_Test100) failures
                 Key: JBTM-2548
                 URL: https://issues.jboss.org/browse/JBTM-2548
             Project: JBoss Transaction Manager
          Issue Type: Bug
          Components: Testing
    Affects Versions: 5.2.7.Final
            Reporter: Michael Musgrove
            Assignee: Michael Musgrove
             Fix For: 5.next

These tests typically start 2 servers and crash one of them in various ways.
Then the two servers are restarted but the tests fail because the recovery port used by one of the two servers is stuck in the TIME-WAIT state so the server restart gets a "Bind: Address Already in Use" error.
I propose to fix this by doing a retry and backoff approach.

--
This message was sent by Atlassian JIRA (v6.4.11#64026)
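For illustration only, a retry and backoff approach to the bind failure might look something like the sketch below. It is not the change that went into the test suite: the class BindRetry, the method bindWithBackoff and its parameters are hypothetical names, and doubling the delay between attempts is an assumed policy.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper; the names and the doubling policy are assumptions.
public class BindRetry {
    /**
     * Tries to bind a listening socket on the given port. If the bind fails
     * with BindException (for example "Address already in use"), sleeps and
     * retries, doubling the delay each time, up to maxAttempts attempts.
     */
    public static ServerSocket bindWithBackoff(int port, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            ServerSocket socket = new ServerSocket(); // created unbound
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (BindException e) {
                socket.close();
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // back off before the next attempt
                }
            } catch (IOException e) {
                socket.close(); // not a bind conflict: do not retry
                throw e;
            }
        }
        throw last;
    }
}
```

Only BindException is retried, so unrelated I/O errors still surface immediately. The attempt limit keeps a genuinely occupied port from stalling a test run indefinitely.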

[JBoss JIRA] (JBTM-2547) CrashRecovery02_2_Test01 failure

by Michael Musgrove (JIRA)

Michael Musgrove created JBTM-2547:
--------------------------------------

             Summary: CrashRecovery02_2_Test01 failure
                 Key: JBTM-2547
                 URL: https://issues.jboss.org/browse/JBTM-2547
             Project: JBoss Transaction Manager
          Issue Type: Bug
          Components: Testing
    Affects Versions: 5.2.7.Final
         Environment: Run the QA testsuite using the JdkORB
            Reporter: Michael Musgrove
            Assignee: Michael Musgrove
             Fix For: 5.next

The root cause of the failure is that StatusChecker#getStatus gets a different exception when trying to obtain the transaction status. With JacORB it gets OBJECT_NOT_EXIST but with JdkORB it gets CORBA.BAD_OPERATION. In the former case we assume that the original transaction is dead and get a fresh one to do recovery on. In the latter case we log the error and abort the replay attempt.

The recommended fix is to mark the original as dead when we get CORBA.BAD_OPERATION when doing the status lookup.

--
This message was sent by Atlassian JIRA (v6.4.11#64026)
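The recommended change amounts to classifying both exceptions the same way during the status lookup. The sketch below shows that idea in isolation and is not the StatusChecker code. Because the org.omg.CORBA classes are not bundled with current JDKs, the nested ObjectNotExist and BadOperation classes are hypothetical stand-ins for OBJECT_NOT_EXIST and BAD_OPERATION; StatusProbe, StatusCall and Result are likewise invented for the example.

```java
// Hypothetical illustration; not the Narayana StatusChecker.
public class StatusProbe {
    // Stand-ins for org.omg.CORBA.OBJECT_NOT_EXIST and org.omg.CORBA.BAD_OPERATION.
    public static class ObjectNotExist extends RuntimeException {}
    public static class BadOperation extends RuntimeException {}

    public enum Result { ALIVE, DEAD, ERROR }

    // Represents the remote status lookup on the original transaction.
    public interface StatusCall {
        void getStatus();
    }

    /**
     * Performs the lookup and classifies the outcome. Both stand-in
     * exceptions mean "the original transaction is dead", so the caller
     * can go on to recover with a fresh one; anything else is an error.
     */
    public static Result probe(StatusCall call) {
        try {
            call.getStatus();
            return Result.ALIVE;
        } catch (ObjectNotExist | BadOperation e) {
            return Result.DEAD;
        } catch (RuntimeException e) {
            return Result.ERROR;
        }
    }
}
```

With this classification the JacORB and JdkORB paths lead to the same decision, which is the behaviour the issue asks for.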
