[jboss-jira] [JBoss JIRA] (WFWIP-23) [Artemis 2.x upgrade] Stuck messages in artemis.internal.sf.my-cluster... queue after restarting nodes in cluster

Tue Jul 3 11:22:00 EDT 2018

    [ https://issues.jboss.org/browse/WFWIP-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599409#comment-13599409 ] 

Michal Toth edited comment on WFWIP-23 at 7/3/18 11:21 AM:
-----------------------------------------------------------

I re-checked this one using the EAP QE test suite, running the test 20 times.
My further testing revealed an error in our QE Core client, which needed to handle an exception during failover.
This has been resolved.
Used {{https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/528/artifact/jboss-eap.zip}} and our build with latest master {{https://mw-messaging-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/121/}}.
{noformat}
$  ./repeat_test_until_fail.sh ClusterTestCase#testStopStartCluster
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] jboss-hornetq-testsuite ............................ SUCCESS [ 12.403 s]
[INFO] tooling ............................................ SUCCESS [  8.128 s]
[INFO] apps-jms11-jdk6 .................................... SUCCESS [  1.568 s]
[INFO] tooling-eap7 ....................................... SUCCESS [  3.895 s]
[INFO] common-tests ....................................... SUCCESS [15:20 min]
[INFO] apps-jms2 .......................................... SUCCESS [  1.498 s]
[INFO] tests-eap7 ......................................... SUCCESS [  7.706 s]
[INFO] apps-jms11-jdk8 .................................... SUCCESS [  0.062 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 15:55 min
[INFO] Finished at: 2018-07-02T16:39:28+02:00
[INFO] Final Memory: 72M/1146M
[INFO] ------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 317.251 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 338.664 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 339.027 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 348.038 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 283.612 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 308.667 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 371.996 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 278.644 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 454.23 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 320.277 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 341.886 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 312.079 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 291.723 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 444.321 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,718.165 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,001.854 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 955.398 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 928.888 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 913.361 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 910.408 sec - in org.jboss.qa.hornetq.test.cluster.ClusterTestCase 
{noformat}
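
The contents of {{repeat_test_until_fail.sh}} are not shown in this comment. As a rough illustration only, a "repeat until fail" wrapper could look like the sketch below: it runs a command up to a fixed number of times and stops at the first non-zero exit code. The function name, the {{max_runs}} parameter, and the return convention are assumptions made for this example, not the QE script's actual behaviour.

```python
import subprocess
import sys
from typing import Sequence


def repeat_until_fail(cmd: Sequence[str], max_runs: int = 20) -> int:
    """Run cmd up to max_runs times and stop at the first failure.

    Returns the number of runs that passed before the first failure,
    or max_runs if every run passed. (Hypothetical helper, not the
    real repeat_test_until_fail.sh.)
    """
    for attempt in range(1, max_runs + 1):
        # A non-zero exit code is treated as a test failure.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"run {attempt} failed with exit code {result.returncode}")
            return attempt - 1
    return max_runs


if __name__ == "__main__":
    # Stand-in command that always succeeds; a real run would invoke the test suite.
    passed = repeat_until_fail([sys.executable, "-c", "pass"], max_runs=3)
    print(f"{passed} consecutive passing runs")
```

In a real setup the command would be whatever launches the single test, for example a Maven invocation selecting {{ClusterTestCase#testStopStartCluster}}; the exact command line used by the QE suite is not given here.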


> [Artemis 2.x upgrade] Stuck messages in artemis.internal.sf.my-cluster... queue after restarting nodes in cluster
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: WFWIP-23
>                 URL: https://issues.jboss.org/browse/WFWIP-23
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: Artemis
>            Reporter: Miroslav Novak
>            Assignee: Jiri Ondrusek
>            Priority: Blocker
>              Labels: activemq, feature-branch-blocker
>         Attachments: journal-node-2.txt
>
>
> Messages are lost in a scenario where the nodes in a cluster are cleanly stopped and started again. This issue was hit with Artemis 2.5.0.Final and Jeff's WF integration branch WFLY-9407_upgrade_artemis_2.4.0_with_prefix.
> Test Scenario:
> * start two servers in cluster (JGroups used for discovery)
> * send messages to testQueue0 on node-1 and node-2
> * wait until consumers on both nodes receive 300 messages
> * cleanly shut down 1st and then 2nd server
> * leave servers shut down for one minute
> * start both servers
> * wait until both consumers receive 500 messages
> * stop sending messages and receive all remaining messages
> Pass Criteria: All sent messages are received by the consumers.
> Actual Result: There are lost messages.
> Investigation:
> The lost messages were sent to the 2nd node. They got stuck in the queue {{.artemis.internal.sf.my-cluster.8a7e9e98-2c36-11e8-9737-fa163ea20b26}} during load balancing to the 1st server.
> I'm attaching trace logs from the client and servers, and the content of the journal from the 2nd server.
> This is a regression against Artemis 1.5.5, hence the blocker priority.
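
The pass criterion in the description (every sent message is received) can be expressed as a simple comparison of message IDs. The sketch below is only an illustration of that check: it assumes each message carries a unique string ID and that the test collects the IDs seen by producers and consumers. The function name and the ID format are hypothetical and are not taken from the QE test suite.

```python
from collections import Counter
from typing import Dict, Iterable, List


def compare_messages(sent: Iterable[str], received: Iterable[str]) -> Dict[str, List[str]]:
    """Compare sent and received message IDs.

    'lost' lists IDs that were sent but never received (in send order);
    'duplicated' lists IDs that were received more than once (sorted).
    """
    sent_ids = list(sent)
    received_counts = Counter(received)
    # Counter returns 0 for IDs that never arrived.
    lost = [m for m in sent_ids if received_counts[m] == 0]
    duplicated = sorted(m for m, n in received_counts.items() if n > 1)
    return {"lost": lost, "duplicated": duplicated}


if __name__ == "__main__":
    # Made-up IDs for illustration; "m2" plays the role of a stuck message.
    report = compare_messages(["m1", "m2", "m3"], ["m1", "m3"])
    print(report)
    # The scenario passes only if nothing was lost.
    print("PASS" if not report["lost"] else "FAIL")
```

A message stuck in the store-and-forward queue would show up in the "lost" list of such a check, since it was accepted by one node but never delivered to any consumer.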


--
This message was sent by Atlassian JIRA
(v7.5.0#75005)