[jboss-jira] [JBoss JIRA] (WFLY-11605) Slave Artemis in Wildfly 10.1.0.Final is starting without Master Shutdown - Split brain

Fri Jan 25 10:50:02 EST 2019

    [ https://issues.jboss.org/browse/WFLY-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687711#comment-13687711 ] 

ehsavoie Hugonnet commented on WFLY-11605:
------------------------------------------

The connection between your 2 servers does not look reliable, given the number of connection issues.
So this could indeed lead to a split brain.
PS: WildFly 10 is almost 3 years old and no longer maintained; you really should upgrade to an up-to-date version.

> Slave Artemis in Wildfly 10.1.0.Final is starting without Master Shutdown - Split brain
> ---------------------------------------------------------------------------------------
>
>                 Key: WFLY-11605
>                 URL: https://issues.jboss.org/browse/WFLY-11605
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering, JMS
>    Affects Versions: 10.1.0.Final
>            Reporter: Srinivas ev
>            Assignee: Jeff Mesnil
>            Priority: Blocker
>              Labels: https://developer.jboss.org/thread/279503
>         Attachments: Master and Slave logs.txt, Project component log.txt, Wildfly error abrupt failback.txt
>
>
> Scenario: Master and Slave running on 2 VMs.
> Master Wildfly - <replication-master cluster-name="my-cluster" group-name="srini-queues-cluster-group" check-for-live-server="true"/>
> Slave Wildfly -  <replication-slave cluster-name="my-cluster" group-name="srini-queues-cluster-group" allow-failback="false" max-saved-replicated-journal-size="1"/>
> Please find the 2 errors below.
> At this point the Slave starts deploying queues even though the Master WildFly has not stopped completely. Both management consoles can be reached.
> Ideally, the Master server (WildFly) should stop completely before the Slave starts deploying queues and starts the replication.
> 2019/01/16 19:56:21,118+05:30 ERROR [org.apache.activemq.artemis.core.server] (default I/O-2) AMQ224044: error acknowledging message: java.lang.NullPointerException
>         at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.removeReferenceByID(ServerConsumerImpl.java:811)
> 2019/01/16 19:56:21,121+05:30 ERROR [org.apache.activemq.artemis.core.server] (default I/O-2) AMQ224016: Caught exception: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=null]
>         at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.individualAcknowledge(ServerConsumerImpl.java:772)
>         at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.individualAcknowledge(ServerSessionImpl.java:739)
> Detailed logs are attached.
> At this stage, the 1st component in my project pipeline has identified the Slave as the Master and keeps sending messages to it, whereas the other components in the pipeline are still consuming from the Master (based on consumer count).
> This caused a break in the message flow.
