[jboss-jira] [JBoss JIRA] (WFLY-10075) [Artemis 2.x upgrade] Stuck messages in artemis.internal.sf.my-cluster... queue after restarting nodes in cluster

Thu Mar 22 09:30:00 EDT 2018

     [ https://issues.jboss.org/browse/WFLY-10075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miroslav Novak updated WFLY-10075:
----------------------------------
    Steps to Reproduce: 
Steps to reproduce:
{code}
git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
cd eap-tests-hornetq/scripts/
git checkout f3de9baf1f8b39b810bf3d55d45f4341e3b2aa87

groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/258/artifact/jboss-eap.zip PrepareServers7.groovy
export WORKSPACE=$PWD
export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
cd ../jboss-hornetq-testsuite/
mvn clean test -Dtest=ClusterTestCase#testStopStartCluster -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1521544306-SNAPSHOT -DfailIfNoTests=false  | tee log
{code}


> [Artemis 2.x upgrade] Stuck messages in artemis.internal.sf.my-cluster... queue after restarting nodes in cluster
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: WFLY-10075
>                 URL: https://issues.jboss.org/browse/WFLY-10075
>             Project: WildFly
>          Issue Type: Bug
>          Components: JMS
>            Reporter: Miroslav Novak
>            Assignee: Jeff Mesnil
>            Priority: Blocker
>         Attachments: journal-node-2.txt
>
>
> There are lost messages in scenario where nodes in cluster are cleanly stopped and started again. This issue was hit with Artemis 2.5.0.Final and WF Jeff's integration branch WFLY-9407_upgrade_artemis_2.4.0_with_prefix.
> Test Scenario:
> * start two servers in cluster (JGroups used for discovery)
> * send messages to testQueue0 on node-1 and node-2
> * wait until consumers on both nodes receive 300 messages
> * cleanly shut down 1st and then 2nd server
> * leave servers shut down for one minute
> * start both servers
> * wait until both consumers receive 500 messages
> * stop sending messages and receive all remaining messages
> Pass Criteria: All send messages are received by consumer
> Actual Result: There are lost messages.
> Investigation:
> There are lost messages which were sent to 2nd node. However they got stuck in queue {{.artemis.internal.sf.my-cluster.8a7e9e98-2c36-11e8-9737-fa163ea20b26}} during load balancing to 1st server.
> I'm attaching trace logs from client and servers and content of journal from 2nd server.
> This is regression against Artemis 1.5.5 thus setting blocker priority.


--
This message was sent by Atlassian JIRA
(v7.5.0#75005)