[JBoss JIRA] (WFLY-88) Recovery not fully triggered when distributed transaction falls down at prepare phase of 2PC

Friday, 1 November 2013

    [ https://issues.jboss.org/browse/WFLY-88?page=com.atlassian.jira.plugin.sy... ]

RH Bugzilla Integration commented on WFLY-88:
---------------------------------------------

Ondrej Chaloupka <ochaloup(a)redhat.com> made a comment on [bug 952746|https://bugzilla.redhat.com/show_bug.cgi?id=952746]

Hi David,

I've checked the current state of the issue (it has been a while since I last looked at
it) and I can say that there is still a problem with re-establishing the EJB remote
connection when the remote server (the server that the client server calls via its
outbound connection) crashes and then comes back up. The client server (which started the
tx) does not learn that the remote server is up again and that recovery can be done.

This happens only for distributed JTA transactions. JTS transactions manage the
distributed communication between nodes themselves, so recovery starts without a problem.

The workaround for the recovery is to call a remote method from the client server to the
remote server after the remote server comes back to life. Then the crash recovery will
start.

The test scenario in which this problem occurs looks like this:
 - a transaction is started on the client server
 - the client server calls the remote server via the outbound connection (the tx context
is propagated to the remote server)
 - the remote server sends a message to a queue (simulating some work done during the
transaction)
 - the remote call and the bean method finish
 - the transaction starts 2PC. The prepare phase completes and the commit phase starts.
The remote server crashes on entry to the commit method
 - the client server is still alive
 - the remote server comes back to life
 - crash recovery should go ahead with the commit, as all the participants agreed on it
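
To make the scenario and the workaround above concrete, here is a minimal toy model in
plain Java. It is only a sketch under stated assumptions: RecoverySketch, ClientServer,
RemoteServer and all method names are made up for illustration and are not WildFly, EJB
client or Narayana APIs. It assumes that recovery on the client server only contacts
remote nodes it currently holds a connection to, and that only a business invocation
re-establishes a lost connection.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; none of these types exist in WildFly or Narayana.
class RecoverySketch {

    /** The remote server that takes part in the propagated transaction. */
    static class RemoteServer {
        boolean up = true;
        boolean prepared = false;
        boolean committed = false;
    }

    /** The client server that started the transaction and coordinates 2PC. */
    static class ClientServer {
        private final Set<RemoteServer> connected = new HashSet<>();
        private final List<RemoteServer> inDoubt = new ArrayList<>();

        /** A business call; (re)establishes the connection if the remote is up. */
        boolean invoke(RemoteServer remote) {
            if (!remote.up) {
                connected.remove(remote);
                return false;
            }
            connected.add(remote);
            return true;
        }

        /** Prepare succeeded on the remote, so a commit is still owed to it. */
        void prepare(RemoteServer remote) {
            remote.prepared = true;
            inDoubt.add(remote);
        }

        /** The remote crashed, so the connection is gone. */
        void connectionLost(RemoteServer remote) {
            connected.remove(remote);
        }

        /** One recovery pass: commits only on remotes with a live connection. */
        void recoveryPass() {
            inDoubt.removeIf(remote -> {
                if (!connected.contains(remote) || !remote.up) {
                    return false; // unknown or unreachable, stays in doubt
                }
                remote.committed = true;
                return true;
            });
        }
    }

    /** Returns { committed before reconnect, committed after reconnect }. */
    static boolean[] runScenario() {
        ClientServer client = new ClientServer();
        RemoteServer remote = new RemoteServer();

        client.invoke(remote);          // remote call inside the transaction
        client.prepare(remote);         // prepare phase done
        remote.up = false;              // crash on entry to commit
        client.connectionLost(remote);
        remote.up = true;               // remote server comes back to life

        client.recoveryPass();          // client does not know the remote is back
        boolean before = remote.committed;

        client.invoke(remote);          // workaround: any call reconnects
        client.recoveryPass();
        boolean after = remote.committed;

        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = runScenario();
        System.out.println("committed before reconnect: " + result[0]);
        System.out.println("committed after reconnect: " + result[1]);
    }
}
```

In this model the first recovery pass leaves the remote server in doubt because the client
server has no connection to it, and the commit only goes through once an invocation has
re-established that connection.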

I would put here the explanation from Jaikiran:
When a connection breaks down between the server and the client, specifically when the
client goes down and comes back up again, then the server and the client will not auto
communicate with each other. 
In other words, the server will have no knowledge (in EJB resource sense) that the client
has come back up again. That effectively means that the EJB tx recovery process will have
no clue of the EJB nodes to communicate with.
To deal with that, there should be some communication from the client (which is now up) to
the server to reestablish that connection. 
In a real application, it would be the first invocation from the client to the server. 

I've checked that a call from the client server to the remote one really does establish
the connection and that recovery then starts.
But the next call from the client to the server could take some time, and meanwhile the
transaction could be rolled back because of the timeout.

What do you think about this?
I think that the current behavior is not correct. Jaikiran and I agreed on that earlier as
well, but he hasn't had time to fix it
(https://bugzilla.redhat.com/show_bug.cgi?id=952746#c15).

Thanks
Ondra

...
 Recovery not fully triggered when distributed transaction falls down at prepare phase of 2PC

--------------------------------------------------------------------------------------------

                 Key: WFLY-88
                 URL: https://issues.jboss.org/browse/WFLY-88
             Project: WildFly
          Issue Type: Bug
      Security Level: Public(Everyone can see) 
          Components: EJB, Remoting
            Reporter: Ivo Studensky
            Assignee: jaikiran pai
             Fix For: 8.0.0.Alpha1

         Attachments: logs_prepareHaltClient.tgz

 It looks like the recovery process is not fully triggered on a distributed transaction
when the transaction falls down at the prepare phase of 2PC. In the new crash recovery
tests over propagated transactions, only one of the two servers recovers from the crash,
while the other keeps an unfinished tx in its tx log.
 This corresponds to the prepareHaltClient and prepareHaltServer test methods of
org.jboss.as.test.jbossts.crashrec.txpropagation.TxPropagationCrashRecoveryTestCase; see
JBQA-2604 for a general description of the new tests. The prepareHaltClient test crashes
the server which initiated the transaction, whereas the prepareHaltServer test crashes the
second server.
 The tests are written against the EAP6.x branch, so reproducing this requires a server
built from the 7.1 branch of AS7.
 Steps to reproduce:
 1. git clone -b as7 git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-transactions.git
 2. cd eap-tests-transactions
 3. git checkout tx_propag_crashrec_tests
 4a. mvn clean verify -Dtest=TxPropagationCrashRecoveryTestCase#prepareHaltClient -Djboss.dist=<path to jboss-as-7.1.3.Final-SNAPSHOT>
 or
 4b. mvn clean verify -Dtest=TxPropagationCrashRecoveryTestCase#prepareHaltServer -Djboss.dist=<path to jboss-as-7.1.3.Final-SNAPSHOT>
 The logs of the prepareHaltClient run are attached to this JIRA.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
