Jeff Mesnil resolved WFLY-5477.
-------------------------------
Fix Version/s: 10.0.0.CR5
Resolution: Done
The underlying Artemis issue has been fixed in 1.1.0.wildfly-008.
Failback fails with ActiveMQIllegalStateException during
synchronization with live server
-----------------------------------------------------------------------------------------
Key: WFLY-5477
URL: https://issues.jboss.org/browse/WFLY-5477
Project: WildFly
Issue Type: Bug
Components: JMS
Affects Versions: 10.0.0.CR2
Reporter: Miroslav Novak
Assignee: Clebert Suconic
Priority: Blocker
Fix For: 10.0.0.CR5
Attachments: backup-logs.zip, live-logs.zip, mdb-server-logs.zip,
standalone-full-ha-backup.xml, standalone-full-ha-live.xml, standalone-full-ha-mdb.xml
Synchronization between live and backup sometimes fails during failback with an
ActiveMQIllegalStateException. As a result, the live server does not activate and the
backup stops, so neither server is active to serve clients.
Test scenario:
1. Start 2 EAP 7.0.0.DR11 servers with Artemis configured in dedicated topology with
replicated journal
-- 1st EAP server has Artemis configured as live, 2nd EAP server has Artemis configured
as backup
-- queues InQueue and OutQueue are deployed
2. Send 2000 messages to InQueue to 1st server (live)
3. Start 3rd EAP 7.0.0.DR11 server with MDB consuming from remote InQueue and sending to
remote OutQueue in XA transaction
-- resource adapter is configured for failover
4. Kill the live server while the MDB is processing messages
5. Wait for backup to activate and failover to happen
6. Start live server again and wait for failback
In step 6, synchronization between live and backup sometimes fails during failback with
this exception:
{code}
10:05:13,493 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ119026: Backup Server was not yet in sync with live]
	at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:232) [artemis-server-1.1.0.jar:1.1.0]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
{code}
and the live server never activates. The backup server also stops with:
{code}
10:05:17,846 INFO [org.apache.activemq.artemis.core.server] (Thread-108) AMQ221002: Apache ActiveMQ Artemis Message Broker version 1.1.0 [706b0cb8-6b69-11e5-904d-fd646d33ece8] stopped
10:05:17,846 INFO [org.apache.activemq.artemis.core.server] (Thread-108) AMQ221039: Restarting as Replicating backup server after live restart
{code}
so the live/backup pair is dead and the server with the MDB loses its connection.
Logs and configurations from the servers are attached.
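
For anyone trying to spot this failure in the attached logs or in an automated run, the check could be sketched roughly as below. This is only an illustration, not part of the original test suite: the class and method names are hypothetical, and it assumes the first excerpt above comes from the restarted live server's log and the second from the backup's. The message IDs are copied from the excerpts. AMQ221002 may also appear on an ordinary shutdown, so the backup check requires AMQ221039 as well.

```java
import java.util.List;

// Hypothetical helper: scans server log lines for the failure signature
// described in this report. Uses only message IDs quoted in the report.
final class FailbackLogCheck {
    static final String NOT_IN_SYNC = "AMQ119026";       // "Backup Server was not yet in sync with live"
    static final String BROKER_STOPPED = "AMQ221002";    // "... Message Broker version ... stopped"
    static final String RESTART_AS_BACKUP = "AMQ221039"; // "Restarting as Replicating backup server ..."

    private FailbackLogCheck() {
    }

    // True if any line contains the given message ID.
    static boolean containsId(List<String> lines, String id) {
        for (String line : lines) {
            if (line.contains(id)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: the restarted live server logs AMQ119026 when synchronization fails.
    static boolean liveFailedToSync(List<String> liveLog) {
        return containsId(liveLog, NOT_IN_SYNC);
    }

    // Assumption: the backup logs both AMQ221002 and AMQ221039 when it stops to restart as backup.
    static boolean backupStopped(List<String> backupLog) {
        return containsId(backupLog, BROKER_STOPPED)
                && containsId(backupLog, RESTART_AS_BACKUP);
    }

    // The pair is considered dead when both conditions hold.
    static boolean pairIsDead(List<String> liveLog, List<String> backupLog) {
        return liveFailedToSync(liveLog) && backupStopped(backupLog);
    }
}
```

The log lines could be read with `java.nio.file.Files.readAllLines` from each server's log file and passed to `FailbackLogCheck.pairIsDead(liveLines, backupLines)`.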