[jboss-jira] [Red Hat JIRA] (WFLY-14284) WildFly doesn't stop while waiting for PeriodicRecovery

Monday, 25 January 2021

    [ https://issues.redhat.com/browse/WFLY-14284?page=com.atlassian.jira.plugi... ]

Ondrej Chaloupka commented on WFLY-14284:
-----------------------------------------

hi [~adrianots],

thanks for your investigation. The changes you made are the ones I would recommend
examining.

For the first part of your question: I'm afraid there is no clear documentation on
remoting protocols and their differences in current WFLY. We have a documentation
issue for that, https://issues.redhat.com/browse/JBEAP-10343 (it was created for the JBoss
EAP product, but the same is missing in the WFLY docs as well).
I can only link you to source code like here:
https://github.com/wildfly/jboss-ejb-client/blob/4.0.37.Final/src/main/ja...

About your changes: what you observe is something I didn't realize, but it sounds
logical. EJB remoting with transaction propagation means that a record about the remote
connection is created when the transaction is persisted to the transaction object store
after the {{prepare}} phase finishes.
Currently it works in the following way. When a remote connection with transaction
propagation starts, the WildFly Transaction Client creates a record so that recovery knows
how to connect to the remote side. It should be under the {{data/ejb-txn-recovery}} directory
(https://github.com/wildfly/wildfly-transaction-client/blob/1.1.13.Final/s...).
Then, after prepare, the transaction manager records that the transaction was prepared
under {{data/tx-object-store}}.
Now, your transaction was somehow corrupted, as it was saved with a HEURISTIC outcome
(https://jbossts.blogspot.com/2019/09/heuristic-exceptions.html) to the object store. The
heuristic state means the recovery manager is not able to decide what to do with the
transaction and human intervention is needed. That means it won't be finished by
WFLY automatically.
Transaction recovery then keeps retrying the remote connection saved in
{{data/ejb-txn-recovery}} by the WFLY txn client. The client saved this connection with the
{{http-remoting}} protocol, and thus recovery retries with that protocol. I assume this from
the error you get, which still contains the same class handling the connection:
client.transaction@1.0.21.Final//org.wildfly.httpclient.transaction.HttpRemoteTransactionPeer.recover(HttpRemoteTransactionPeer.java:107)
When you remove the {{data}} directory, recovery no longer retries endlessly to connect to
the remote endpoint while waiting for somebody to resolve the txn in the heuristic
state. By deleting the data directory (or just the content of
{{data/ejb-txn-recovery}} and {{data/tx-object-store}}) you say that the transaction has
no further value for you, and you start with a clean object store. If that's OK from the
app state perspective, then it's fine.
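
If you would rather keep a copy than delete outright, the clean-up described above could be
sketched roughly like this in Python. This is only an illustration, not a WFLY tool: the
function names, the {{backup_dir}} argument and the idea of moving the files aside are my
assumptions; only the two directory names come from the explanation above. It should be
run only while the server is stopped, with a fresh backup directory.

```python
import shutil
from pathlib import Path

# Directory names taken from the comment above; everything else is an assumption.
RECOVERY_DIRS = ("ejb-txn-recovery", "tx-object-store")


def list_recovery_records(data_dir):
    """Return {directory name: sorted relative file paths} for both recovery directories."""
    data_dir = Path(data_dir)
    records = {}
    for name in RECOVERY_DIRS:
        root = data_dir / name
        if root.is_dir():
            records[name] = sorted(
                p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
            )
        else:
            records[name] = []
    return records


def archive_recovery_records(data_dir, backup_dir):
    """Move the contents of both recovery directories into backup_dir.

    The (now empty) directories themselves are left in place, so the server
    starts with a clean object store. Returns the moved top-level entries.
    """
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    moved = []
    for name in RECOVERY_DIRS:
        root = data_dir / name
        if not root.is_dir():
            continue  # nothing was ever recorded here
        target = backup_dir / name
        target.mkdir(parents=True, exist_ok=True)
        for entry in sorted(root.iterdir()):
            shutil.move(str(entry), str(target / entry.name))
            moved.append(f"{name}/{entry.name}")
    return moved
```

Calling {{list_recovery_records}} on the server's {{data}} directory first shows whether any
records are pending at all; {{archive_recovery_records}} then gives the same effect as
deleting the content, with the heuristic transaction still available for later inspection.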

Now, the only question is whether, when you get to the same state of having a HEURISTIC
transaction while your app server is configured with the {{remote+http}} protocol, the
result will be only a reported error and no issue for the WFLY shutdown process. From what
I can remember, using {{remote+http}} should fix that.

...
 WildFly doesn't stop while waiting for PeriodicRecovery
 -------------------------------------------------------

                 Key: WFLY-14284
                 URL: https://issues.redhat.com/browse/WFLY-14284
             Project: WildFly
          Issue Type: Bug
          Components: EJB, Transactions
    Affects Versions: 18.0.1.Final, 20.0.1.Final
            Reporter: Adriano Teixeira de Souza
            Assignee: Michael Musgrove
            Priority: Major
         Attachments: ejb-configs.sh, jboss-ejb-client.xml, server(transaction).log,
thread-dump-stop-1.txt

 I'm testing WildFly 20.0.1 (21.0.2 was tested too) to replace our old version,
WildFly 10.
 We frequently see that the stop function of the server does not work,
and we need to kill the process manually at the OS level.
 It looks like a deadlock.
 I attach the thread dump for this.

--
This message was sent by Atlassian Jira
(v8.13.1#813001)
