[jboss-jira] [JBoss JIRA] (WFWIP-27) [Artemis 2.x Upgrade] Failover does not work with HTTP conncetors/acceptors

Jeff Mesnil (JIRA) issues at jboss.org
Fri Jul 13 05:08:02 EDT 2018


    [ https://issues.jboss.org/browse/WFWIP-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604863#comment-13604863 ] 

Jeff Mesnil commented on WFWIP-27:
----------------------------------

Thanks Brian for the analysis.

It is expected that activateCallbacks remain valid for the whole lifetime of the Artemis server (including across multiple start/stop) as WildFly uses them to be notified when Artemis is activated/deactivated in order to update WildFly resources and services state accordingly.

This regression is caused by https://issues.apache.org/jira/browse/ARTEMIS-1704 which clears the activateCallbacks whenever Artemis server is stopped/failed.
I believe the intent of ARTEMIS-1704 was to clean up the callbacks resources when the server is *shutdown* (and not when it is stopped).

The fix should be as simple as wrapping the call to activateCallbacks.clear() at https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java#L1112 with a check against isShutdown:

{code}
// activateCallbacks remain registered across start/stop.
// they are only cleared when the server is effectively shutdown
if (isShutdown) {
  activateCallbacks.clear();
}
{code}

[~martyn-taylor] do you agree with that analysis?

> [Artemis 2.x Upgrade] Failover does not work with HTTP conncetors/acceptors
> ---------------------------------------------------------------------------
>
>                 Key: WFWIP-27
>                 URL: https://issues.jboss.org/browse/WFWIP-27
>             Project: WildFly WIP
>          Issue Type: Bug
>          Components: Artemis
>            Reporter: Erich Duda
>            Assignee: Jeff Mesnil
>            Priority: Blocker
>              Labels: activemq, feature-branch-blocker
>         Attachments: clients.log, server1.log, server2.log, standalone-full-ha.xml, standalone-full-ha.xml, test.log
>
>
> Failover does not work with HTTP connectors/acceptors.
> *Scenario:*
> # There are two Wildfly servers configured as Live-Backup pair
> # There is one JMS producer and one JMS receiver which sends/receives messages
> # Live server is several times killed and restarted.
> *Expectation:* Always when the Live server is killed or restarted, clients do failover or failback.
> *Reality*: Sometimes happens that clients don't do failover.
> *Users impact:* One of basics feature of HA, failover, does not work properly.
> *Blocker* priority was set because this is regression against previous releases of Wildfly.
> *Detail description of issue*
> In the trace logs it can be seen that clients send HTTP handshake request to active backup, but the handshake fails. All checks (and logs) say that the backup is active in this time period. I tried to run the test with Netty connectors/acceptors and I didn't see this issue.
> {code:title=Backup log}
> 07:52:47,856 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6 at 1f01f347)) AMQ222029: Could not locate page transaction 3 995, ignoring message on position PagePositionImpl [pageNr=2, messageNr=55, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
> 07:52:47,856 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6 at 1f01f347)) AMQ222029: Could not locate page transaction 3 995, ignoring message on position PagePositionImpl [pageNr=2, messageNr=56, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
> 07:52:47,863 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485068, b=2147485067]
> 07:52:47,863 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485076, b=2147485075]
> 07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147484450, b=2147484449]
> 07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485072, b=2147485071]
> 07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147484454, b=2147484453]
> 07:52:47,866 WARN  [org.apache.activemq.artemis.core.client] (activemq-discovery-group-thread-dg-group1) AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=0dee81c9-2d9d-11e8-ba3f-cc3d825f79a4
> 07:52:47,867 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221007: Server is now live
> {code}
> {code:title=Clients log}
> 07:52:49,274 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote destination: localhost/127.0.0.1:9080
> 07:52:49,274 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector towards NettyConnector [host=localhost, port=8080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] failed
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1081] Trying backup config = TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=9080&host=localhost
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:447] Connector NettyConnector [host=localhost, port=9080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] using native epoll
> 07:52:49,275 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
> GET  HTTP/1.1
> host: localhost
> upgrade: activemq-remoting
> connection: upgrade
> activemqServerName: default
> httpUpgradeEndpoint: acceptor
> Sec-ActiveMQRemoting-Key: QWV3bCfgh75NjWH3pZV5Ew==
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:601] AMQ211002: Started EPOLL Netty Connector version 4.1.16.Final to localhost:9080
> 07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote destination: localhost/127.0.0.1:9080
> 07:52:49,276 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
> GET  HTTP/1.1
> host: localhost
> upgrade: activemq-remoting
> connection: upgrade
> activemqServerName: default
> httpUpgradeEndpoint: acceptor
> Sec-ActiveMQRemoting-Key: P9xBwRk1eZP5QjDWjqYuIg==
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876] Received msg=DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
> HTTP/1.1 200 OK
> Connection: keep-alive
> Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
> Content-Length: 2426
> Content-Type: text/html
> Accept-Ranges: bytes
> Date: Thu, 22 Mar 2018 06:52:49 GMT
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903] AMQ214023: HTTP Handshake failed, received DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
> HTTP/1.1 200 OK
> Connection: keep-alive
> Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
> Content-Length: 2426
> Content-Type: text/html
> Accept-Ranges: bytes
> Date: Thu, 22 Mar 2018 06:52:49 GMT
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876] Received msg=DefaultHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 829, cap: 829/829, unwrapped: PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)), decoderResult: success)
> 07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903] AMQ214023: HTTP Handshake failed, received DefaultHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 829, cap: 829/829, unwrapped: PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)), decoderResult: success)
> 07:52:49,277 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector towards NettyConnector [host=localhost, port=9080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] failed
> {code}



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)



More information about the jboss-jira mailing list