[jboss-jira] [JBoss JIRA] (WFWIP-27) [Artemis 2.x Upgrade] Failover does not work with HTTP conncetors/acceptors

Brian Stansberry (JIRA) issues at jboss.org
Thu Jul 12 11:41:00 EDT 2018


    [ https://issues.jboss.org/browse/WFWIP-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604544#comment-13604544 ] 

Brian Stansberry commented on WFWIP-27:
---------------------------------------

FYI on some work I did and let Martyn know about last night:

I was able to do some analysis on this by adding some logging to the WF code that deals with activated/deactivated callbacks from the ActiveMQServer. Responding to those callbacks is what results in HTTPUpgradeService being started and stopped.

When it fails I'm seeing this in the logs, subsequent to activity where WF has gotten a deactivated callback and thus has stopped HTTPUpgradeService:

{code}
21:02:03,944 INFO  [org.apache.activemq.artemis.core.server] (Thread-151) AMQ221039: Restarting as Replicating backup server after live restart
{code}

Presence of that message tells me SharedNothingLiveActivation is being called, invoking this code:

{code}
                        //if we have to many backups kept or are not configured to restart just stop, otherwise restart as a backup
                        activeMQServer.fail(true);
                        ActiveMQServerLogger.LOGGER.restartingReplicatedBackupAfterFailback();
                        //                        activeMQServer.moveServerData(replicatedPolicy.getReplicaPolicy().getMaxSavedReplicatedJournalsSize());
                        activeMQServer.setHAPolicy(replicatedPolicy.getReplicaPolicy());
                        activeMQServer.start();
{code}

It was the activeMQServer.fail(true) call that triggered the preceding deactivation and stop of HTTPUpgradeService. The problem is that call also results in ActiveMQServerImpl.activateCallbacks being cleared. As a result the ActivateCallback impl that WF has registered is gone and will never be invoked again. So WF does not get notified when the ActiveMQServer activates again and does not start the HTTPUpgradeService.

This clearing of ActiveMQServerImpl.activateCallbacks is not done in Artemis 1.5.5 so we don't see the problem there.

I won't comment re: possible solutions as that is better left to Martyn/Jeff/other experts looking at this.

> [Artemis 2.x Upgrade] Failover does not work with HTTP conncetors/acceptors
> ---------------------------------------------------------------------------
>
>                 Key: WFWIP-27
>                 URL: https://issues.jboss.org/browse/WFWIP-27
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: Artemis
>            Reporter: Erich Duda
>            Assignee: Jeff Mesnil
>            Priority: Blocker
>              Labels: activemq, feature-branch-blocker
>         Attachments: clients.log, server1.log, server2.log, standalone-full-ha.xml, standalone-full-ha.xml, test.log
>
>
> Failover does not work with HTTP connectors/acceptors.
> *Scenario:*
> # There are two Wildfly servers configured as Live-Backup pair
> # There is one JMS producer and one JMS receiver which sends/receives messages
> # Live server is several times killed and restarted.
> *Expectation:* Always when the Live server is killed or restarted, clients do failover or failback.
> *Reality*: Sometimes happens that clients don't do failover.
> *Users impact:* One of basics feature of HA, failover, does not work properly.
> *Blocker* priority was set because this is regression against previous releases of Wildfly.
> *Detail description of issue*
> In the trace logs it can be seen that clients send HTTP handshake request to active backup, but the handshake fails. All checks (and logs) say that the backup is active in this time period. I tried to run the test with Netty connectors/acceptors and I didn't see this issue.
> {code:title=Backup log}
> 07:52:47,856 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6 at 1f01f347)) AMQ222029: Could not locate page transaction 3 995, ignoring message on position PagePositionImpl [pageNr=2, messageNr=55, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
> 07:52:47,856 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6 at 1f01f347)) AMQ222029: Could not locate page transaction 3 995, ignoring message on position PagePositionImpl [pageNr=2, messageNr=56, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
> 07:52:47,863 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485068, b=2147485067]
> 07:52:47,863 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485076, b=2147485075]
> 07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147484450, b=2147484449]
> 07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485072, b=2147485071]
> 07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147484454, b=2147484453]
> 07:52:47,866 WARN  [org.apache.activemq.artemis.core.client] (activemq-discovery-group-thread-dg-group1) AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=0dee81c9-2d9d-11e8-ba3f-cc3d825f79a4
> 07:52:47,867 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221007: Server is now live
> {code}
> {code:title=Clients log}
> 07:52:49,274 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote destination: localhost/127.0.0.1:9080
> 07:52:49,274 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector towards NettyConnector [host=localhost, port=8080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] failed
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1081] Trying backup config = TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=9080&host=localhost
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:447] Connector NettyConnector [host=localhost, port=9080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] using native epoll
> 07:52:49,275 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
> GET  HTTP/1.1
> host: localhost
> upgrade: activemq-remoting
> connection: upgrade
> activemqServerName: default
> httpUpgradeEndpoint: acceptor
> Sec-ActiveMQRemoting-Key: QWV3bCfgh75NjWH3pZV5Ew==
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:601] AMQ211002: Started EPOLL Netty Connector version 4.1.16.Final to localhost:9080
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote destination: localhost/127.0.0.1:9080
> 07:52:49,276 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
> GET  HTTP/1.1
> host: localhost
> upgrade: activemq-remoting
> connection: upgrade
> activemqServerName: default
> httpUpgradeEndpoint: acceptor
> Sec-ActiveMQRemoting-Key: P9xBwRk1eZP5QjDWjqYuIg==
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876] Received msg=DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
> HTTP/1.1 200 OK
> Connection: keep-alive
> Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
> Content-Length: 2426
> Content-Type: text/html
> Accept-Ranges: bytes
> Date: Thu, 22 Mar 2018 06:52:49 GMT
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903] AMQ214023: HTTP Handshake failed, received DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
> HTTP/1.1 200 OK
> Connection: keep-alive
> Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
> Content-Length: 2426
> Content-Type: text/html
> Accept-Ranges: bytes
> Date: Thu, 22 Mar 2018 06:52:49 GMT
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876] Received msg=DefaultHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 829, cap: 829/829, unwrapped: PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)), decoderResult: success)
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903] AMQ214023: HTTP Handshake failed, received DefaultHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 829, cap: 829/829, unwrapped: PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)), decoderResult: success)
> 07:52:49,277 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector towards NettyConnector [host=localhost, port=9080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] failed
> {code}



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)



More information about the jboss-jira mailing list