[JBoss JIRA] (JBTM-3331) LRA end should not return internal server error when precondition fails as it's considered behaviour by spec
by Michael Musgrove (Jira)
[ https://issues.redhat.com/browse/JBTM-3331?page=com.atlassian.jira.plugin... ]
Michael Musgrove edited comment on JBTM-3331 at 6/30/20 11:34 AM:
------------------------------------------------------------------
I did not follow the issue description, so instead I will explain how the spec works and how Narayana implements it:
# Once a participant has successfully joined the LRA (i.e. the coordinator responded to the join request with a 200 OK status code), it is guaranteed to receive a notification when the LRA ends. The end notification can result from a timeout of the LRA or from an explicit close or cancel of the LRA.
# In the case of a timeout, the test must ensure that the LRA is not closed before the coordinator has had a chance to process its internal timers (note that LRA does not provide hard real-time guarantees, so the test has to be coded with that in mind). If the test environment makes that difficult to guarantee, then the test needs to be modified to allow enough time for any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the implementation).
On the other hand, if your bug report is saying that the implementation is not spec compliant then please can you simplify your bug report and clearly state which part of the spec we are violating.
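A minimal sketch of the timing advice above, assuming the test can query the LRA status through some callback: rather than sleeping for a fixed period, the test polls until the coordinator reports the expected state or a generous deadline passes. The class name, the {{awaitStatus}} helper and the status strings are hypothetical and are not part of the TCK or of Narayana.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical test helper; not taken from the TCK or from Narayana.
class AwaitLraStatus {

    /**
     * Polls statusSupplier until it returns the expected status or the
     * timeout elapses. Returns true only if the status was observed in time.
     */
    static boolean awaitStatus(Supplier<String> statusSupplier, String expected,
                               Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (expected.equals(statusSupplier.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed without seeing the status
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
    }
}
```

A test could call {{awaitStatus}} with a timeout well above the participant's {{timeLimit}} and only attempt to close the LRA once the call returns, which leaves a slow CI machine time to process its timers.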
was (Author: mmusgrov):
I did not follow the issue description so instead I will explain how the spec works and how narayana implements it:
# Once a participant has successfully joined with the LRA (ie the coordinator responded with a 200 OK status code to the join request) it is guaranteed to receive a notification when the LRA is ended. The end notification can be the result of a timeout of the LRA or due to an explicit close or cancel of the LRA.
# In the case of a timeout the test must ensure that the LRA is not closed before the coordinator has had a chance to process its internal timers. If the test environment makes it difficult to guarantee that then the test needs to be modified so as to provide enough time for any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the implementation).
On the other hand, if your bug report is saying that the implementation is not spec compliant then please can you simplify your bug report and clearly state which part of the spec we are violating.
> LRA end should not return internal server error when precondition fails as it's considered behaviour by spec
> ------------------------------------------------------------------------------------------------------------
>
> Key: JBTM-3331
> URL: https://issues.redhat.com/browse/JBTM-3331
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: LRA
> Affects Versions: 5.10.5.Final
> Reporter: Ondrej Chaloupka
> Assignee: Ondrej Chaloupka
> Priority: Critical
>
> I've spent some time with https://issues.redhat.com/browse/JBTM-3318 recently and I have a doubt about the behaviour of LRA. JBTM-3318 is a race condition between starting, enlisting in, and timing out the LRA.
> The same issue causes `TckTests#timeLimit` and `TckRecoveryTests#testCancelWhenParticipantIsUnavailable` to fail on our slow AMS CI.
> I don't want to discuss the TCK failure here, but rather the related behaviour of the Narayana implementation.
> The LRA participant defines the {{timeLimit}} (https://github.com/eclipse/microprofile-lra/blob/1.0-M1/tck/src/main/java...).
> What happens is that the client (the TCK test) calls the LRA method and the JAX-RS filter starts an LRA on the coordinator. Meanwhile the time limit elapses, so when the JAX-RS filter tries to enlist the LRA participant in the started LRA, the enlistment fails because the LRA was cancelled by the timeout.
> The non-deterministic Narayana behaviour is that, when enlistment of the LRA participant fails, the client may or may not get an internal server error.
> This is because the client then tries to cancel the LRA that has already timed out (see https://github.com/jbosstm/narayana/blob/5.10.5.Final/rts/lra/lra-client/...).
> {{NarayanaLRAClient#endLRA}} tries to cancel the LRA. But as the coordinator has already timed out the LRA, the result depends on whether recovery has removed the LRA yet. If the LRA has not been removed, {{412, PRECONDITION FAILED}} is returned. If recovery has already removed it, {{404, NOT FOUND}} is returned.
> Now {{endLRA}} does not treat {{404}} as a failure, so it is not reported as {{500, INTERNAL SERVER ERROR}}, whereas {{412, PRECONDITION FAILED}} is reported as an internal server error. It should not be that way, because the LRA spec considers {{412}} a "correct" error (see https://github.com/eclipse/microprofile-lra/blob/1.0-M1/api/src/main/java...).
> Such an anticipated return state should not be reported to the client as {{500}} - {{internal server error}}.
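The behaviour requested in the description can be sketched as a small status classifier. This is an illustration under stated assumptions, not the code of {{NarayanaLRAClient#endLRA}}: the class, enum and method names are hypothetical, and only the status codes 200, 404, 412 and 500 come from the discussion above.

```java
// Hypothetical sketch; names are illustrative and not Narayana API.
class EndLraOutcome {

    enum Outcome { ENDED, NOT_FOUND, NOT_ACTIVE, UNEXPECTED }

    /** Maps the coordinator's HTTP status for an end request to an outcome. */
    static Outcome classify(int httpStatus) {
        switch (httpStatus) {
            case 200:
                return Outcome.ENDED;      // the LRA was ended by this request
            case 404:
                return Outcome.NOT_FOUND;  // recovery already removed the LRA
            case 412:
                return Outcome.NOT_ACTIVE; // the LRA timed out but is still known
            default:
                return Outcome.UNEXPECTED; // anything else is a genuine failure
        }
    }

    /** Only unanticipated statuses should surface to the caller as a 500. */
    static boolean shouldReportInternalServerError(int httpStatus) {
        return classify(httpStatus) == Outcome.UNEXPECTED;
    }
}
```

With this mapping, both outcomes of the race (404 and 412) are handled the same way, so the client no longer sees a 500 depending on whether recovery has run.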
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
4 years, 5 months
[JBoss JIRA] (JBTM-3331) LRA end should not return internal server error when precondition fails as it's considered behaviour by spec
by Michael Musgrove (Jira)
[ https://issues.redhat.com/browse/JBTM-3331?page=com.atlassian.jira.plugin... ]
Michael Musgrove edited comment on JBTM-3331 at 6/30/20 11:31 AM:
------------------------------------------------------------------
I did not follow the issue description, so instead I will explain how the spec works and how Narayana implements it:
# Once a participant has successfully joined the LRA (i.e. the coordinator responded to the join request with a 200 OK status code), it is guaranteed to receive a notification when the LRA ends. The end notification can result from a timeout of the LRA or from an explicit close of the LRA.
# In the case of a timeout, the test must ensure that the LRA is not cancelled before the coordinator has had a chance to process its internal timers. If the test environment makes that difficult to guarantee, then the test needs to be modified to allow enough time for any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the implementation).
On the other hand, if your bug report is saying that the implementation is not spec compliant then please can you simplify your bug report and clearly state which part of the spec we are violating.
was (Author: mmusgrov):
I did not follow the issue description so instead I will explain how the spec works and how narayana implements it:
# Once a participant has successfully joined with the LRA (ie the coordinator responded with a 200 OK status code to the join request) it is guaranteed to receive a notification when the LRA is ended. The end notification can be the result of a timeout of the LRA or due to an explicit cancellation of the LRA.
# In the case of a timeout the test must ensure that the LRA is not cancelled before the coordinator has had a chance to process its internal timers. If the test environment makes it difficult to guarantee that then the test needs to be modified so as to provide enough time for any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the implementation).
On the other hand, if your bug report is saying that the implementation is not spec compliant then please can you simplify your bug report and clearly state which part of the spec we are violating.
[JBoss JIRA] (JBTM-3331) LRA end should not return internal server error when precondition fails as it's considered behaviour by spec
by Michael Musgrove (Jira)
[ https://issues.redhat.com/browse/JBTM-3331?page=com.atlassian.jira.plugin... ]
Michael Musgrove edited comment on JBTM-3331 at 6/30/20 11:32 AM:
------------------------------------------------------------------
I did not follow the issue description, so instead I will explain how the spec works and how Narayana implements it:
# Once a participant has successfully joined the LRA (i.e. the coordinator responded to the join request with a 200 OK status code), it is guaranteed to receive a notification when the LRA ends. The end notification can result from a timeout of the LRA or from an explicit close or cancel of the LRA.
# In the case of a timeout, the test must ensure that the LRA is not closed before the coordinator has had a chance to process its internal timers. If the test environment makes that difficult to guarantee, then the test needs to be modified to allow enough time for any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the implementation).
On the other hand, if your bug report is saying that the implementation is not spec compliant then please can you simplify your bug report and clearly state which part of the spec we are violating.
was (Author: mmusgrov):
I did not follow the issue description so instead I will explain how the spec works and how narayana implements it:
# Once a participant has successfully joined with the LRA (ie the coordinator responded with a 200 OK status code to the join request) it is guaranteed to receive a notification when the LRA is ended. The end notification can be the result of a timeout of the LRA or due to an explicit close of the LRA.
# In the case of a timeout the test must ensure that the LRA is not cancelled before the coordinator has had a chance to process its internal timers. If the test environment makes it difficult to guarantee that then the test needs to be modified so as to provide enough time for any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the implementation).
On the other hand, if your bug report is saying that the implementation is not spec compliant then please can you simplify your bug report and clearly state which part of the spec we are violating.
[JBoss JIRA] (JBTM-3331) LRA end should not return internal server error when precondition fails as it's considered behaviour by spec
by Michael Musgrove (Jira)
[ https://issues.redhat.com/browse/JBTM-3331?page=com.atlassian.jira.plugin... ]
Michael Musgrove commented on JBTM-3331:
----------------------------------------
I did not follow the issue description, so instead I will explain how the spec works and how Narayana implements it:
# Once a participant has successfully joined the LRA (i.e. the coordinator responded to the join request with a 200 OK status code), it is guaranteed to receive a notification when the LRA ends. The end notification can result from a timeout of the LRA or from an explicit cancellation of the LRA.
# In the case of a timeout, the test must ensure that the LRA is not cancelled before the coordinator has had a chance to process its internal timers. If the test environment makes that difficult to guarantee, then the test needs to be modified to allow enough time for any reasonable implementation to cancel the LRA before the test attempts to close it.
My advice is to change the timing in the TCK test (rather than try to fix the implementation).
On the other hand, if your bug report is saying that the implementation is not spec compliant then please can you simplify your bug report and clearly state which part of the spec we are violating.
[JBoss JIRA] (JBTM-3337) Narayana startup fails when empty system property is provided
by Ondrej Chaloupka (Jira)
[ https://issues.redhat.com/browse/JBTM-3337?page=com.atlassian.jira.plugin... ]
Ondrej Chaloupka updated JBTM-3337:
-----------------------------------
Git Pull Request: https://github.com/jbosstm/narayana/pull/1639, https://github.com/jbosstm/narayana/pull/1642 (was: https://github.com/jbosstm/narayana/pull/1639)
> Narayana startup fails when empty system property is provided
> -------------------------------------------------------------
>
> Key: JBTM-3337
> URL: https://issues.redhat.com/browse/JBTM-3337
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Common
> Affects Versions: 5.10.5.Final
> Reporter: Ondrej Chaloupka
> Assignee: Ondrej Chaloupka
> Priority: Minor
> Fix For: 5.next
>
>
> When Narayana starts with an empty property, it fails during startup.
> This is documented in the JBEAP issue https://issues.redhat.com/browse/JBEAP-19690.
> As [~rchakrab] pointed out on that issue, the error comes from https://github.com/jbosstm/narayana/blob/5.10.5.Final/common/classes/com/...
> The observed error is:
> {code}
> testGetWithEmptyPropertyWithStax(com.arjuna.common.tests.propertyservice.EmptyPropertiesFactoryTest) Time elapsed: 0.023 s <<< ERROR!
> java.lang.RuntimeException: unable to load properties from /home/ochaloup/Transactions/narayana/common/target/test-classes/properties-factory-test.xml
> at com.arjuna.common.util.propertyservice.AbstractPropertiesFactory.getPropertiesFromFile(AbstractPropertiesFactory.java:107)
> at com.arjuna.common.util.propertyservice.AbstractPropertiesFactory.initDefaultProperties(AbstractPropertiesFactory.java:196)
> at com.arjuna.common.util.propertyservice.AbstractPropertiesFactory.getDefaultProperties(AbstractPropertiesFactory.java:62)
> at com.arjuna.common.tests.propertyservice.EmptyPropertiesFactoryTest.testGetWithEmptyPropertyWithStax(EmptyPropertiesFactoryTest.java:57)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> Caused by: java.lang.IllegalArgumentException: key can't be empty
> at java.lang.System.checkKey(System.java:843)
> at java.lang.System.getProperty(System.java:716)
> at com.arjuna.common.util.propertyservice.AbstractPropertiesFactory.applySystemProperties(AbstractPropertiesFactory.java:125)
> at com.arjuna.common.util.propertyservice.AbstractPropertiesFactory.getPropertiesFromFile(AbstractPropertiesFactory.java:104)
> ... 28 more
> {code}
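The root cause in the trace is that {{System.getProperty}} rejects an empty key with {{IllegalArgumentException}}. The following is a minimal sketch of one way to guard against that when overlaying system properties onto loaded properties. It is an assumption for illustration only and is not the change made in the linked pull requests; the class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch; not the actual AbstractPropertiesFactory code.
class EmptyKeySafeOverrides {

    /**
     * Returns a copy of input in which each value is replaced by the system
     * property of the same name, if one is set. Empty keys are copied as-is
     * because System.getProperty("") throws IllegalArgumentException.
     */
    static Properties applySystemProperties(Properties input) {
        Properties result = new Properties();
        for (String key : input.stringPropertyNames()) {
            String value = input.getProperty(key);
            if (!key.isEmpty()) {
                String override = System.getProperty(key);
                if (override != null) {
                    value = override; // the system property wins
                }
            }
            result.setProperty(key, value);
        }
        return result;
    }
}
```

Skipping the lookup for an empty key keeps startup from failing while leaving the handling of every other key unchanged.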
[JBoss JIRA] (JBTM-3339) ArjunaJTS/interop/glassfish hangs on CI
by Michael Musgrove (Jira)
[ https://issues.redhat.com/browse/JBTM-3339?page=com.atlassian.jira.plugin... ]
Michael Musgrove commented on JBTM-3339:
----------------------------------------
It is worth noting that the server never actually ran the ejb test:
> Failed to connect to the controller: Timeout waiting for the system to boot.
The log store was also uninitialised, and the transaction subsystem did not seem to be initialised yet, since the CLI showed "jts=false". This means the test did not actually start, since the steps are:
1. build WildFly and GlassFish from source
2. start GlassFish
3. start WildFly and run a CLI script to configure it into JTS mode.
... etc
When I looked, the WildFly server had come up but had not made progress, since the configuration script had not set JTS mode.
> ArjunaJTS/interop/glassfish hangs on CI
> ---------------------------------------
>
> Key: JBTM-3339
> URL: https://issues.redhat.com/browse/JBTM-3339
> Project: JBoss Transaction Manager
> Issue Type: Bug
> Components: Demonstrator
> Affects Versions: 5.10.4.Final
> Reporter: Michael Musgrove
> Priority: Minor
> Fix For: 6.later
>
> Attachments: stacktraces
>
>
> The [WildFly to GlassFish interop quickstart|https://github.com/jbosstm/quickstart/tree/master/ArjunaJTS/i...] hung on CI. I am attaching the stack dumps, but it does not look like WildFly started fully, so it is probably a transient error. Since the quickstart is a demonstrator I am marking the priority as low. If it happens again we might want to increase its priority.