[JBoss JIRA] (JBTM-3331) LRA end should not return internal server error when precondition fails as it's considered behaviour by spec
by Ondrej Chaloupka (Jira)
[ https://issues.redhat.com/browse/JBTM-3331?page=com.atlassian.jira.plugin... ]
Ondrej Chaloupka edited comment on JBTM-3331 at 7/1/20 7:00 AM:
----------------------------------------------------------------
This report is not about the TCK test failure, and it is not meant to fix the TCK instability by changing the implementation. The reference to the TCK tests only provides context for when the behaviour can be observed. As stated in the description, it is about _the related behaviour of the Narayana implementation_.
When the situation that caused the TCK failures occurs, Narayana does not behave in a deterministic and, I think, correct way. The scenario is:
* the client calls the {{@LRA}} method (the participant)
* the filter starts the LRA
* the LRA defines a short timeout and times out
* the filter tries to enlist the participant in the LRA. As the LRA was cancelled, the enlistment fails.
The follow-up call then goes to the method {{NarayanaLRAClient#endLRA}}. This is because the filter sees that there was a failure and thus tries to end (cancel) the LRA. The result of the {{endLRA}} method influences the return/error code that the client receives (i.e. the client called the {{@LRA}} method and is still waiting for the response to that call at this point).
If the coordinator has already lost track of the LRA instance, {{endLRA}} returns a different error code, so the filter processing returns either {{412}} or {{500}} to the client.
First, the behaviour is quite variable, while I think the client should receive only {{412}} in all cases. I'm not sure right now whether the {{500}} is strictly against the spec, but I think it is not correct for this scenario.
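To make the status handling concrete, here is a minimal sketch of the mapping between what the coordinator answers on the cancel attempt and what the client ends up seeing. It is not the Narayana code: the class and method names are hypothetical, and the {{404}} to {{412}} pairing in the "reported" method is only my reading of the observation that the client gets {{412}} or {{500}}.

```java
public class EndLraStatusSketch {
    static final int NOT_FOUND = 404;
    static final int PRECONDITION_FAILED = 412;
    static final int INTERNAL_SERVER_ERROR = 500;

    // Behaviour as described in this report (an assumption, not taken from the source code):
    // the status seen by the client depends on whether recovery already removed the LRA.
    public static int reportedClientStatus(int coordinatorStatus) {
        switch (coordinatorStatus) {
            case NOT_FOUND:
                // LRA already removed by recovery: not treated as an internal error
                return PRECONDITION_FAILED;
            case PRECONDITION_FAILED:
                // LRA still known to the coordinator but already timed out
                return INTERNAL_SERVER_ERROR;
            default:
                throw new IllegalArgumentException("status outside this scenario: " + coordinatorStatus);
        }
    }

    // Behaviour suggested in this report: both anticipated outcomes give the client 412.
    public static int proposedClientStatus(int coordinatorStatus) {
        switch (coordinatorStatus) {
            case NOT_FOUND:
            case PRECONDITION_FAILED:
                return PRECONDITION_FAILED;
            default:
                throw new IllegalArgumentException("status outside this scenario: " + coordinatorStatus);
        }
    }

    public static void main(String[] args) {
        for (int status : new int[] {NOT_FOUND, PRECONDITION_FAILED}) {
            System.out.println("coordinator " + status
                    + " -> reported " + reportedClientStatus(status)
                    + ", proposed " + proposedClientStatus(status));
        }
    }
}
```

The point of the second method is only that the client-visible status no longer depends on the timing of recovery.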
> LRA end should not return internal server error when precondition fails as it's considered behaviour by spec
> ------------------------------------------------------------------------------------------------------------
>
> Key: JBTM-3331
> URL: https://issues.redhat.com/browse/JBTM-3331
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: LRA
> Affects Versions: 5.10.5.Final
> Reporter: Ondrej Chaloupka
> Assignee: Ondrej Chaloupka
> Priority: Critical
>
> I've spent some time with https://issues.redhat.com/browse/JBTM-3318 recently and I have a doubt about the behaviour of the LRA. JBTM-3318 consists of a race condition on starting, enlisting and timing out the LRA.
> The same issue causes `TckTests#timeLimit` and `TckRecoveryTests#testCancelWhenParticipantIsUnavailable` to fail on our slow AMS CI.
> I don't want to discuss the TCK failure now, but rather the related behaviour of the Narayana implementation.
> The LRA participant defines the {{timeLimit}} (https://github.com/eclipse/microprofile-lra/blob/1.0-M1/tck/src/main/java...).
> What happens is that the client (the TCK test) calls the LRA method and the JAX-RS filter starts an LRA on the coordinator. Meanwhile the time limit elapses. The JAX-RS filter then tries to enlist the LRA participant in the started LRA, but this fails as the LRA was cancelled because of the timeout.
> The possibly non-deterministic Narayana behaviour is that, when the LRA participant enlistment fails, the client may or may not get an internal server error.
> This is because the client tries to cancel the LRA that has already timed out (see https://github.com/jbosstm/narayana/blob/5.10.5.Final/rts/lra/lra-client/...).
> {{NarayanaLRAClient#endLRA}} tries to cancel the LRA. But as the coordinator has already timed out the LRA, the outcome depends on whether recovery has removed the LRA or not. If the LRA has not been removed yet, {{412, PRECONDITION FAILED}} is returned. If recovery has already removed it, {{404, NOT FOUND}} is returned.
> Now {{endLRA}} does not treat the {{404}} as a failure to be reported as {{500, INTERNAL SERVER ERROR}}, while it does treat the {{412, PRECONDITION FAILED}} as an internal server error. It should not be that way, as the LRA spec considers {{412}} a "correct" error (see https://github.com/eclipse/microprofile-lra/blob/1.0-M1/api/src/main/java...).
> Such an anticipated return state should not be reported to the client as {{500}} ({{internal server error}}).