[
https://issues.redhat.com/browse/JBTM-3297?page=com.atlassian.jira.plugin...
]
Michael Musgrove edited comment on JBTM-3297 at 7/15/20 7:42 AM:
-----------------------------------------------------------------
Here is some more detail about what this task entails:
After a transaction has prepared we create a transaction record so that if a failure
occurs we can recover it. Processing failures is the responsibility of the recovery
system [1]. We have a number of recovery modules [2], each responsible for a different
transaction log type. The different log types are stored in specific locations in the
object store (for example, LRAs are stored at [7]). A recovery module manages records of a
particular type. The recovery manager periodically asks each recovery module to check for
any logs in need of recovery [3].
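A minimal sketch of the periodic pass described above, for illustration only. The
LogScanner interface and RecordingScanner class are hypothetical stand-ins and are not
the Narayana RecoveryModule API; the second log type is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryPassSketch {

    /** Hypothetical stand-in for a recovery module: one scanner per log type. */
    public interface LogScanner {
        String logType(); // the object store location this scanner owns
        void scan();      // look for logs of that type in need of recovery
    }

    /** Records which types were scanned so the effect of a pass is observable. */
    public static final class RecordingScanner implements LogScanner {
        private final String type;
        private final List<String> scanned;

        public RecordingScanner(String type, List<String> scanned) {
            this.type = type;
            this.scanned = scanned;
        }

        @Override
        public String logType() {
            return type;
        }

        @Override
        public void scan() {
            scanned.add(type);
        }
    }

    /** One recovery pass: every registered scanner is asked to check its own type. */
    public static void runPass(List<LogScanner> scanners) {
        for (LogScanner scanner : scanners) {
            scanner.scan();
        }
    }

    public static void main(String[] args) {
        List<String> scanned = new ArrayList<>();
        List<LogScanner> scanners = new ArrayList<>();
        scanners.add(new RecordingScanner(
                "/StateManager/BasicAction/TwoPhaseCoordinator/LRA", scanned));
        // Assumption: an invented second type, only to show one scanner per type.
        scanners.add(new RecordingScanner("/hypothetical/OtherLogType", scanned));
        runPass(scanners);
        System.out.println(scanned);
    }
}
```

In the real system the pass is repeated on a timer; the sketch shows a single pass only.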
The recovery module responsible for completing Long Running Actions (LRAs) is [4]. When it
runs it looks for records in the store of a specific type/location [5, 7]. But an LRA can
finish in a failure state, in which case it can never be recovered. However, logs in the
failed state will still be processed on every recovery pass (every few minutes) since they
are of the same type as LRA records that do need recovery. This is inefficient (since a
failed LRA will never be recovered) and will eventually, if there are many failures,
significantly degrade performance.
But we still need to keep hold of logs for failed LRAs for reporting purposes so that an
external management system can inspect them, for example via [6].
This task is to move these failed records to a different location [8] so that the LRA
recovery module will ignore them. The LRA recovery coordinator [6] will need to look in
this new location when reporting or deleting failed logs. An example of where we already
perform this kind of operation is ExpiredTransactionScanner#moveEntry [9].
[1] ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/recovery/RecoveryManager.java
[2] ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/recovery/RecoveryModule.java
[3] ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java
[4] rts/lra/lra-coordinator-jar/src/main/java/io/narayana/lra/coordinator/internal/LRARecoveryModule.java
[5] look for LRARecoveryModule#_transactionType
    (= io.narayana.lra.coordinator.domain.model.Transaction.getType())
[6] rts/lra/lra-coordinator-jar/src/main/java/io/narayana/lra/coordinator/api/RecoveryCoordinator.java
    (see method getFailedLRAs())
[7] rts/lra/lra-coordinator-jar/src/main/java/io/narayana/lra/coordinator/domain/model/Transaction.java
    (LRA_TYPE = "/StateManager/BasicAction/TwoPhaseCoordinator/LRA")
[8] something similar to LRA_FAILED_TYPE = LRA_TYPE + "/Failed"
    (i.e. "/StateManager/BasicAction/TwoPhaseCoordinator/LRA/Failed")
[9] ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/ExpiredTransactionScanner.java
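The move described in this comment could be sketched as follows. This is a rough
illustration under stated assumptions, not the implementation: the Store class is a
hypothetical in-memory stand-in for the object store, the record ids and states are
invented, and LRA_FAILED_TYPE uses the value suggested in [8], which is itself only
"something similar to" the final name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FailedLraMoveSketch {

    public static final String LRA_TYPE =
            "/StateManager/BasicAction/TwoPhaseCoordinator/LRA";
    // Assumption: the suggested name from [8]; the real constant may differ.
    public static final String LRA_FAILED_TYPE = LRA_TYPE + "/Failed";

    /** Hypothetical in-memory store: type -> (record id -> state). */
    public static final class Store {
        private final Map<String, Map<String, String>> byType = new HashMap<>();

        public void write(String type, String id, String state) {
            byType.computeIfAbsent(type, t -> new TreeMap<>()).put(id, state);
        }

        /** Ids stored under exactly this type, in sorted order. */
        public List<String> ids(String type) {
            Map<String, String> records = byType.get(type);
            return records == null ? new ArrayList<>() : new ArrayList<>(records.keySet());
        }

        /** Writes the record under the new type, then removes it from the old one. */
        public boolean move(String id, String fromType, String toType) {
            Map<String, String> from = byType.get(fromType);
            if (from == null || !from.containsKey(id)) {
                return false; // nothing to move
            }
            write(toType, id, from.get(id));
            from.remove(id);
            return true;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.write(LRA_TYPE, "lra-1", "Active");         // still needs recovery
        store.write(LRA_TYPE, "lra-2", "FailedToCancel"); // can never be recovered

        store.move("lra-2", LRA_TYPE, LRA_FAILED_TYPE);

        // A scan of LRA_TYPE no longer sees the failed record; a report of
        // failed LRAs would read LRA_FAILED_TYPE instead.
        System.out.println("recovery scan: " + store.ids(LRA_TYPE));
        System.out.println("failed report: " + store.ids(LRA_FAILED_TYPE));
    }
}
```

The sketch treats each type as a separate bucket, so a scan of LRA_TYPE does not see
records under LRA_FAILED_TYPE. Whether the real object store lists child locations when
scanning a parent type would need to be checked.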
was (Author: mmusgrov):
Here is some more detail about what this task entails:
After a transaction has prepared we create a transaction record so that if a failure
occurs we can recover it. The responsibility for processing failures is the recovery
system [1]. We have a number of recovery modules [2], each responsible for a different
transaction log type. The different log types are stored in specific locations in the
object store (for example, LRAs are stored at [7]). A recovery module manages records of a
particular type. The recovery manager periodically asks each recovery module to check for
any logs in need of recovery [3].
The recovery module responsible for completing Long Running Actions (LRAs) is [4]. When it
runs it looks for records in the store of a specific type/location [5, 7]. But an LRA can
finish in a failure state in which case it can never be recovered. However logs in the
failed state will be processed on every recovery pass (every few minutes) since they are
of the same type as LRA records that do need recovery. This is not efficient (since there
will never be a recovery attempt on a failed LRA) and will eventually, if there are many
failures, significantly degrade performance.
But we still need to keep hold of logs for failed LRAs for reporting purposes so that an
external management system can inspect them, for example [6] .
This task is to move these failed records to a different location [8] so that the LRA
recovery module will ignore them. The LRA recovery coordinator [6] will need to look in
this new location when reporting or deleting failed logs. An example of where we already
perform this kind of operation is in the ExpiredTransactionScanner#moveEntry [9]
[1] ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/recovery/RecoveryManager.java
[2] ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/recovery/RecoveryModule.java
[3]
ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/PeriodicRecovery.java
[4]
rts/lra/lra-coordinator-jar/src/main/java/io/narayana/lra/coordinator/internal/LRARecoveryModule.java
[5] look for LRARecoveryModule#_transactionType (=
io.narayana.lra.coordinator.domain.model.Transaction.getType() )
[6]
rts/lra/lra-coordinator-jar/src/main/java/io/narayana/lra/coordinator/api/RecoveryCoordinator.java
(see method getFailedLRAs())
[7]
rts/lra/lra-coordinator-jar/src/main/java/io/narayana/lra/coordinator/domain/model/Transaction.java
(LRA_TYPE = "/StateManager/BasicAction/TwoPhaseCoordinator/LRA")
[8] something similar to LRA_FAILED_TYPE = LRA_TYPE + "/Expired" (i.e.
"/StateManager/BasicAction/TwoPhaseCoordinator/LRA/Expired")
[9]
ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/recovery/ExpiredTransactionScanner.java
Move LRA failure records to another part of the store
-----------------------------------------------------
Key: JBTM-3297
URL:
https://issues.redhat.com/browse/JBTM-3297
Project: JBoss Transaction Manager
Issue Type: Enhancement
Components: LRA
Affects Versions: 5.10.4.Final
Reporter: Michael Musgrove
Assignee: Mayank Kunwar
Priority: Optional
Prior to JBTM-3247 we deleted LRA failure records (after reporting them). With the fix
for JBTM-3247 we now retain failure records, which can slow down processing of the
transaction logs. These records should be moved to another part of the store (note that
they can still be queried and deleted by the user).
I have marked the priority as optional even though it is highly desirable.