Michael Musgrove edited comment on JBTM-3331 at 6/30/20 11:32 AM:
------------------------------------------------------------------
I did not follow the issue description, so instead I will explain how the spec works and
how Narayana implements it:
# Once a participant has successfully joined the LRA (i.e. the coordinator responded
with a 200 OK status code to the join request), it is guaranteed to receive a notification
when the LRA ends. The end notification can be the result of the LRA timing out or of an
explicit close or cancel of the LRA.
# In the case of a timeout, the test must ensure that the LRA is not closed before the
coordinator has had a chance to process its internal timers. If the test environment makes
that difficult to guarantee, then the test needs to be modified to allow enough time for
any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the
implementation).
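One way to change the timing, sketched here under assumptions, is for the test to poll until the LRA is no longer active (or a generous deadline passes) before attempting to close it, rather than relying on a fixed sleep. The {{Await}} helper below is hypothetical and is not part of the TCK or Narayana; the LRA status check you would pass to it is also an assumption.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical test helper (not part of the TCK or Narayana): poll a condition
// until it holds or a deadline passes, instead of sleeping for a fixed time.
final class Await {
    private Await() {}

    /** Returns true as soon as {@code condition} holds, false if {@code timeout} elapses first. */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            // Wait one poll interval, but never beyond the deadline.
            long sleepNanos = Math.min(pollInterval.toNanos(), remaining);
            try {
                Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In a test this could be called as `Await.until(() -> !isActive(lraId), Duration.ofSeconds(30), Duration.ofMillis(200))`, where `isActive` is a hypothetical status lookup against the coordinator. Polling lets fast environments finish quickly while slow CI machines still get the full time budget.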
On the other hand, if your bug report is saying that the implementation is not spec
compliant, then please simplify the bug report and clearly state which part of the spec
we are violating.
LRA end should not return internal server error when precondition
fails as it's considered behaviour by spec
------------------------------------------------------------------------------------------------------------
Key: JBTM-3331
URL: https://issues.redhat.com/browse/JBTM-3331
Project: JBoss Transaction Manager
Issue Type: Bug
Components: LRA
Affects Versions: 5.10.5.Final
Reporter: Ondrej Chaloupka
Assignee: Ondrej Chaloupka
Priority: Critical
I've recently spent some time on https://issues.redhat.com/browse/JBTM-3318 and I have a
question about the behaviour of the LRA. JBTM-3318 is about a race condition between
starting the LRA, enlisting in it, and its timeout.
The same issue causes `TckTests#timeLimit` and
`TckRecoveryTests#testCancelWhenParticipantIsUnavailable` to fail on our slow AMS CI.
I don't want to discuss the TCK failure here, but rather the related behaviour of the
Narayana implementation.
The LRA participant defines the {{timeLimit}}
(https://github.com/eclipse/microprofile-lra/blob/1.0-M1/tck/src/main/java...).
What happens is that the client (the TCK test) calls the LRA method and the JAX-RS filter
starts an LRA on the coordinator. Meanwhile the time limit elapses, so when the JAX-RS
filter then tries to enlist the LRA participant in the started LRA, the enlistment fails
because the LRA has already been cancelled due to the timeout.
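The race can be reproduced deterministically with a toy coordinator that uses a logical clock instead of wall-clock time. This is a hypothetical model ({{ToyCoordinator}} and its methods are invented for illustration and are not Narayana's API): enlisting succeeds if it happens before the time limit elapses and fails once the coordinator's timer has cancelled the LRA.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the start/enlist/timeout race.
// Times are logical milliseconds, so the outcome is deterministic.
final class ToyCoordinator {
    enum Status { ACTIVE, CANCELLED }

    private final Map<Integer, Long> deadlines = new HashMap<>();
    private final Map<Integer, Status> statuses = new HashMap<>();
    private int nextId = 1;

    /** Starts an LRA at logical time {@code now} that times out after {@code timeLimit}. */
    int start(long now, long timeLimit) {
        int id = nextId++;
        deadlines.put(id, now + timeLimit);
        statuses.put(id, Status.ACTIVE);
        return id;
    }

    /** Runs the internal timers: cancels every active LRA whose deadline has passed. */
    void processTimers(long now) {
        for (Map.Entry<Integer, Long> e : deadlines.entrySet()) {
            if (now >= e.getValue() && statuses.get(e.getKey()) == Status.ACTIVE) {
                statuses.put(e.getKey(), Status.CANCELLED);
            }
        }
    }

    /** Enlists a participant at time {@code now}; succeeds only while the LRA is still active. */
    boolean join(int id, long now) {
        processTimers(now);
        return statuses.get(id) == Status.ACTIVE;
    }
}
```

On a slow machine the gap between {{start}} and {{join}} simply grows past the time limit, which is the situation the TCK tests hit on CI.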
The potentially non-deterministic Narayana behaviour is that, when enlisting the LRA
participant fails, the client may or may not get an internal server error.
This is because the client then tries to cancel the timed-out LRA (see
https://github.com/jbosstm/narayana/blob/5.10.5.Final/rts/lra/lra-client/...).
{{NarayanaLRAClient#endLRA}} tries to cancel the LRA, but since the coordinator has already
timed out the LRA, the result depends on whether recovery has already removed it. If the
LRA has not been removed yet, {{412, PRECONDITION FAILED}} is returned. If recovery has
already removed it, {{404, NOT FOUND}} is returned.
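The two possible replies can be modelled with a hypothetical in-memory store (this is not Narayana's coordinator code). The 412 and 404 codes follow the description above; returning 200 when cancelling a still-active LRA is an assumption made only to complete the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy coordinator store. The 412/404 codes follow the bug report;
// the API and the 200 for a successful cancel are assumptions for illustration.
final class ToyLraStore {
    enum State { ACTIVE, CANCELLED }

    private final Map<String, State> lras = new HashMap<>();

    void start(String id) { lras.put(id, State.ACTIVE); }

    /** The coordinator's timer fires and cancels the LRA. */
    void timeOut(String id) { lras.replace(id, State.CANCELLED); }

    /** A recovery pass forgets LRAs that have already finished. */
    void recover() { lras.values().removeIf(s -> s != State.ACTIVE); }

    /** Handles a cancel request from the client and returns an HTTP status code. */
    int cancel(String id) {
        State s = lras.get(id);
        if (s == null) {
            return 404; // unknown: recovery has already removed it
        }
        if (s != State.ACTIVE) {
            return 412; // still known, but already finished by the timeout
        }
        lras.put(id, State.CANCELLED);
        return 200; // assumption: cancelled by this request
    }
}
```

Which reply the client receives depends only on whether {{recover()}} ran before {{cancel()}}, which is the timing-dependent behaviour described above.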
{{endLRA}} does not treat {{404}} as a failure, so it is not turned into
{{500, INTERNAL SERVER ERROR}}, whereas {{412, PRECONDITION FAILED}} is reported as an
internal server error. That should not be the case, because the LRA spec treats {{412}} as
a "correct" (expected) error (see
https://github.com/eclipse/microprofile-lra/blob/1.0-M1/api/src/main/java...).
Such an anticipated response should not be reported to the client as
{{500, INTERNAL SERVER ERROR}}.
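The difference between the current handling and the proposed one can be sketched as two classification functions over the coordinator's status code. Both are hypothetical (the names are invented and this is not the actual {{NarayanaLRAClient#endLRA}} code), and any 2xx reply is assumed to mean the LRA was ended. Because the reply for a timed-out LRA may be either 412 or 404 depending on recovery timing, only the proposed rule gives the client the same outcome in both cases.

```java
// Hypothetical sketch: classifying the coordinator's reply to an end (close/cancel) request.
// Names are illustrative only; this is not the NarayanaLRAClient API.
final class EndLraReply {
    enum Outcome { ENDED, ALREADY_FINISHED, INTERNAL_ERROR }

    private EndLraReply() {}

    private static boolean isSuccess(int status) {
        // Assumption: any 2xx reply means the LRA was ended by this request.
        return status >= 200 && status < 300;
    }

    /** Behaviour described in the report: 404 is tolerated, 412 (and, here, any other code) becomes a 500. */
    static Outcome current(int status) {
        if (isSuccess(status)) return Outcome.ENDED;
        if (status == 404) return Outcome.ALREADY_FINISHED;
        return Outcome.INTERNAL_ERROR;
    }

    /** Proposed behaviour: 412 is an expected reply and is handled like 404. */
    static Outcome proposed(int status) {
        if (isSuccess(status)) return Outcome.ENDED;
        if (status == 404 || status == 412) return Outcome.ALREADY_FINISHED;
        return Outcome.INTERNAL_ERROR;
    }
}
```

With {{current}}, the same timed-out LRA yields either {{ALREADY_FINISHED}} or {{INTERNAL_ERROR}} depending on recovery; with {{proposed}}, both replies map to {{ALREADY_FINISHED}}, while genuinely unexpected codes are still reported as errors.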