Sebastian Łaskawiec commented on JGRP-2300:
-------------------------------------------
[~ethompson] The discussion is being continued on JGRP-2316.
DNS_PING in AWS ECS cannot cluster with dynamic port mappings
-------------------------------------------------------------
Key: JGRP-2300
URL: https://issues.jboss.org/browse/JGRP-2300
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.16
Environment: AWS ECS Cluster with DNS based service discovery using
jboss/keycloak:latest containers
Reporter: Eric Thompson
Assignee: Sebastian Łaskawiec
Priority: Critical
Fix For: 4.0.16
When running an ECS cluster with jboss/keycloak:latest containers, dynamic port mapping of
all ports is required to allow more than one container to run per EC2 instance. SRV-based
service discovery records allow each node to find the rest of the nodes, but when a
discovery request is sent, the receiving node sees the sender as IP:7600 instead of the
dynamically mapped port. It then treats this address as a "new" node and tries to send
discovery requests to it. It is also somehow picking up node IDs and trying to send
requests to those!
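
To illustrate why SRV records matter here: each SRV record carries a port alongside the target host, which is how the dynamically mapped port can be discovered at all. The following is a minimal sketch, not JGroups code; the class name SrvRecordSketch, the Endpoint type, and the example host names are hypothetical, and it assumes the record data arrives as the usual "priority weight port target" string.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not part of JGroups. */
final class SrvRecordSketch {

    /** Host and port taken from one SRV record. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Parses SRV record data of the form "priority weight port target". */
    static Endpoint parse(String rdata) {
        String[] fields = rdata.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Not SRV record data: " + rdata);
        }
        int port = Integer.parseInt(fields[2]);
        String target = fields[3];
        // Targets are usually fully qualified and end with a dot; drop it.
        if (target.endsWith(".")) {
            target = target.substring(0, target.length() - 1);
        }
        return new Endpoint(target, port);
    }

    public static void main(String[] args) {
        // Hypothetical record data; real values come from the DNS service.
        List<Endpoint> endpoints = new ArrayList<>();
        endpoints.add(parse("1 1 32949 node-a.example.internal."));
        endpoints.add(parse("1 1 32951 node-b.example.internal."));
        System.out.println(endpoints);
    }
}
```

The port field is the host port chosen by the dynamic mapping, which is different from the 7600 that the container itself binds to.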
See the following log; there are only 4 actual nodes, and they each have a different
5-digit port number:
{code}
### Service discovery with dynamic port mapping
2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) Performing discovery of the following hosts
[10.42.3.44:7600, 10.42.3.56:32949, 10.42.3.56:32951, 10.42.3.44:32954, c5b479b7b6d5,
10.42.3.44:32952, 10.42.3.56:7600, 17081c624290, 63976b7fae70, 557cbd7891a2]
2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:7600
2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32949
2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32951
2018-10-10 20:17:44,180 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32954
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to c5b479b7b6d5
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32952
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 17081c624290
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-238,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 63976b7fae70
2018-10-10 20:17:44,183 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 557cbd7891a2
2018-10-10 20:17:44,187 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,ejb,17081c624290)
JGRP000032: 17081c624290: no physical address for c5b479b7b6d5, dropping message
{code}
This code seems to be part of the problem in this case:
https://github.com/belaban/JGroups/blob/87d15ec848aa3d482ae792ef152f7e36e...
That code takes the incoming address and adds it to the discovered_hosts, but those
addresses are ALWAYS inaccurate in this case.
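
The effect described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual DNS_PING implementation: the class DiscoveredHostsSketch and its methods are hypothetical, and the addresses are taken from the log above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative model of the reported behaviour: not JGroups code. */
final class DiscoveredHostsSketch {

    /** Returns the host list after the sender's self-reported address is added. */
    static Set<String> afterDiscoveryRequest(Set<String> fromDns, String reportedSender) {
        Set<String> hosts = new LinkedHashSet<>(fromDns);
        hosts.add(reportedSender);
        return hosts;
    }

    /** Returns the entries that did not come from the SRV lookup. */
    static Set<String> notFromDns(Set<String> hosts, Set<String> fromDns) {
        Set<String> extra = new LinkedHashSet<>(hosts);
        extra.removeAll(fromDns);
        return extra;
    }

    public static void main(String[] args) {
        // The four real nodes, as resolved through SRV records.
        Set<String> fromDns = new LinkedHashSet<>();
        fromDns.add("10.42.3.56:32949");
        fromDns.add("10.42.3.56:32951");
        fromDns.add("10.42.3.44:32954");
        fromDns.add("10.42.3.44:32952");

        // The sender reports the container-internal port, not the mapped one.
        Set<String> hosts = afterDiscoveryRequest(fromDns, "10.42.3.44:7600");
        System.out.println(notFromDns(hosts, fromDns));
    }
}
```

In this model the list grows by one entry per distinct IP, and each added entry points at port 7600, which is not one of the mapped host ports.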
This is what the recipient of the service discovery request sees (i.e., all the
ports are the default 7600):
{code}
2018-10-10 20:35:15,229 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,231 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,232 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,233 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,234 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,236 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,239 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,240 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,242 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,243 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,246 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,247 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,247 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,249 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,252 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,253 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,255 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,256 DEBUG [org.jgroups.protocols.dns.DNS_PING]
(thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
{code}
In this state the cluster never seems to work properly and the Keycloak interface breaks
in many frustrating ways.
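
For anyone trying to confirm the same symptom in their own logs, a small helper like the one below may be useful. It is a sketch only: the class DiscoveryLogSketch is hypothetical, and the pattern assumes the "Received discovery from: <node>, IP: <address>" wording shown in the DEBUG output above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative log helper: not part of JGroups. */
final class DiscoveryLogSketch {

    // Matches the sender ID and the address it was seen under.
    private static final Pattern RECEIVED =
            Pattern.compile("Received discovery from: (\\S+), IP: (\\S+)");

    /** Counts discovery requests per "sender @ address" pair. */
    static Map<String, Integer> countBySender(String log) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = RECEIVED.matcher(log);
        while (m.find()) {
            counts.merge(m.group(1) + " @ " + m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String log =
                "(thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600\n"
              + "(thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600\n";
        System.out.println(countBySender(log));
    }
}
```

If every counted address ends in :7600 while the SRV records list 5-digit ports, the log shows the same mismatch described in this issue.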