]
Jeff Mesnil commented on WFLY-10067:
------------------------------------
[~eduda] I am not sure that the handshake failure are the cause of the issue.
The handshake fails because the server is responding with a "200 OK"
(corresponding to the home page) instead of "101 Switching protocols".
This might be normal for example when the live server is booting. However at some point,
the server should be ready and returns 101 instead of 200.
I am still processing the logs to understand the issue.
Note that the code related to the HTTP upgrade has not changed in the integration layer
for the integration of Artemis 2.x.
If there is a regression, it might be in the Artemis code or in Netty (that has been
upgraded from 4.1.9.Final to 4.1.22.Final)
[Artemis 2.x Upgrade] Failover does not work with HTTP
conncetors/acceptors
---------------------------------------------------------------------------
Key: WFLY-10067
URL:
https://issues.jboss.org/browse/WFLY-10067
Project: WildFly
Issue Type: Bug
Components: JMS
Affects Versions: 13.0.0.Beta1
Reporter: Erich Duda
Assignee: Jeff Mesnil
Priority: Blocker
Labels: feature-branch-blocker
Attachments: clients.log, server1.log, server2.log, test.log
Failover does not work with HTTP connectors/acceptors.
*Scenario:*
# There are two Wildfly servers configured as Live-Backup pair
# There is one JMS producer and one JMS receiver which sends/receives messages
# Live server is several times killed and restarted.
*Expectation:* Always when the Live server is killed or restarted, clients do failover or
failback.
*Reality*: Sometimes happens that clients don't do failover.
*Users impact:* One of basics feature of HA, failover, does not work properly.
*Blocker* priority was set because this is regression against previous releases of
Wildfly.
*Detail description of issue*
In the trace logs it can be seen that clients send HTTP handshake request to active
backup, but the handshake fails. All checks (and logs) say that the backup is active in
this time period. I tried to run the test with Netty connectors/acceptors and I didn't
see this issue.
{code:title=Backup log}
07:52:47,856 WARN [org.apache.activemq.artemis.core.server] (Thread-2
(ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f01f347))
AMQ222029: Could not locate page transaction 3 995, ignoring message on position
PagePositionImpl [pageNr=2, messageNr=55, recordID=-1] on address=jms.queue.testQueue
queue=jms.queue.testQueue
07:52:47,856 WARN [org.apache.activemq.artemis.core.server] (Thread-2
(ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f01f347))
AMQ222029: Could not locate page transaction 3 995, ignoring message on position
PagePositionImpl [pageNr=2, messageNr=56, recordID=-1] on address=jms.queue.testQueue
queue=jms.queue.testQueue
07:52:47,863 INFO [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for
server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as
it was not completed: Pair[a=2147485068, b=2147485067]
07:52:47,863 INFO [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for
server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as
it was not completed: Pair[a=2147485076, b=2147485075]
07:52:47,864 INFO [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for
server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as
it was not completed: Pair[a=2147484450, b=2147484449]
07:52:47,864 INFO [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for
server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as
it was not completed: Pair[a=2147485072, b=2147485071]
07:52:47,864 INFO [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for
server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as
it was not completed: Pair[a=2147484454, b=2147484453]
07:52:47,866 WARN [org.apache.activemq.artemis.core.client]
(activemq-discovery-group-thread-dg-group1) AMQ212034: There are more than one servers on
the network broadcasting the same node id. You will see this message exactly once (per
node) if a node is restarted, in which case it can be safely ignored. But if it is logged
continuously it means you really do have more than one node on the same network active
concurrently with the same node id. This could occur if you have a backup node active at
the same time as its live node. nodeID=0dee81c9-2d9d-11e8-ba3f-cc3d825f79a4
07:52:47,867 INFO [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for
server ActiveMQServerImpl::serverUUID=null) AMQ221007: Server is now live
{code}
{code:title=Clients log}
07:52:49,274 Thread-80 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote
destination: localhost/127.0.0.1:9080
07:52:49,274 Thread-79 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector
towards NettyConnector [host=localhost, port=8080, httpEnabled=false,
httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet,
sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor]
failed
07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1081] Trying backup
config = TransportConfiguration(name=connector,
factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)
?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=9080&host=localhost
07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:447] Connector
NettyConnector [host=localhost, port=9080, httpEnabled=false, httpUpgradeEnabled=true,
useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true,
activemqServerName=default, httpUpgradeEndpoint=acceptor] using native epoll
07:52:49,275 Thread-80 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP
request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET HTTP/1.1
host: localhost
upgrade: activemq-remoting
connection: upgrade
activemqServerName: default
httpUpgradeEndpoint: acceptor
Sec-ActiveMQRemoting-Key: QWV3bCfgh75NjWH3pZV5Ew==
07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:601] AMQ211002:
Started EPOLL Netty Connector version 4.1.16.Final to localhost:9080
07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote
destination: localhost/127.0.0.1:9080
07:52:49,276 Thread-79 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP
request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET HTTP/1.1
host: localhost
upgrade: activemq-remoting
connection: upgrade
activemqServerName: default
httpUpgradeEndpoint: acceptor
Sec-ActiveMQRemoting-Key: P9xBwRk1eZP5QjDWjqYuIg==
07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876]
Received msg=DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
Connection: keep-alive
Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
Content-Length: 2426
Content-Type: text/html
Accept-Ranges: bytes
Date: Thu, 22 Mar 2018 06:52:49 GMT
07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903]
AMQ214023: HTTP Handshake failed, received DefaultHttpResponse(decodeResult: success,
version: HTTP/1.1)
HTTP/1.1 200 OK
Connection: keep-alive
Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
Content-Length: 2426
Content-Type: text/html
Accept-Ranges: bytes
Date: Thu, 22 Mar 2018 06:52:49 GMT
07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876]
Received msg=DefaultHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 829, cap:
829/829, unwrapped: PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)),
decoderResult: success)
07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG
[org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903]
AMQ214023: HTTP Handshake failed, received DefaultHttpContent(data:
PooledSlicedByteBuf(ridx: 0, widx: 829, cap: 829/829, unwrapped:
PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)), decoderResult: success)
07:52:49,277 Thread-80 (ActiveMQ-client-global-threads) DEBUG
[org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector
towards NettyConnector [host=localhost, port=9080, httpEnabled=false,
httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet,
sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor]
failed
{code}