[jboss-jira] [JBoss JIRA] (JGRP-2300) DNS_PING in AWS ECS cannot cluster with dynamic port mappings
Eric Thompson (Jira)
issues at jboss.org
Fri Oct 12 16:51:00 EDT 2018
Eric Thompson created JGRP-2300:
-----------------------------------
Summary: DNS_PING in AWS ECS cannot cluster with dynamic port mappings
Key: JGRP-2300
URL: https://issues.jboss.org/browse/JGRP-2300
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.16
Environment: AWS ECS Cluster with DNS based service discovery using jboss/keycloak:latest containers
Reporter: Eric Thompson
Assignee: Bela Ban
When running an ECS cluster with jboss/keycloak:latest containers, dynamic port mapping of all ports is required to allow more than one container to run per EC2 instance. SRV-based service discovery records let each node find the rest of the nodes, but when a discovery request is sent, the receiving node sees the sender as IP:7600 instead of its dynamic port. It then treats this as a "new" node and tries to send discovery requests to it. Somehow it is also picking up node IDs (e.g. c5b479b7b6d5) and trying to send requests to those!
See the following log. There are only 4 actual nodes, and each has a different five-digit port number:
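To make the port mismatch concrete: an SRV record carries the host-side port that ECS assigned to the task, so each record resolves to a distinct host:port pair, while the container itself listens on 7600. The sketch below parses SRV data in the RFC 2782 presentation form `priority weight port target` into `target:port` strings. The class name `SrvTargets`, the target hostnames and the priority/weight values are hypothetical; this is not DNS_PING's own resolution code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns SRV record data into host:port discovery targets. */
class SrvTargets {

    /**
     * Parses one SRV record in presentation form "priority weight port target"
     * (RFC 2782) and returns "target:port", dropping the trailing root dot.
     */
    static String toHostPort(String srvData) {
        String[] parts = srvData.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an SRV record: " + srvData);
        }
        int port = Integer.parseInt(parts[2]);
        String target = parts[3];
        if (target.endsWith(".")) {
            target = target.substring(0, target.length() - 1);
        }
        return target + ":" + port;
    }

    /** Converts every record, preserving order. */
    static List<String> toHostPorts(List<String> records) {
        List<String> result = new ArrayList<>();
        for (String r : records) {
            result.add(toHostPort(r));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two tasks on the same EC2 host, each with its own ECS-assigned host port (hypothetical names).
        List<String> records = List.of(
                "1 1 32949 task-a.keycloak.local.",
                "1 1 32951 task-b.keycloak.local.");
        // The container-side port 7600 never appears in the SRV data.
        System.out.println(toHostPorts(records)); // [task-a.keycloak.local:32949, task-b.keycloak.local:32951]
    }
}
```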
{code}
### Service discovery with dynamic port mapping
2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) Performing discovery of the following hosts [10.42.3.44:7600, 10.42.3.56:32949, 10.42.3.56:32951, 10.42.3.44:32954, c5b479b7b6d5, 10.42.3.44:32952, 10.42.3.56:7600, 17081c624290, 63976b7fae70, 557cbd7891a2]
2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:7600
2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32949
2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32951
2018-10-10 20:17:44,180 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32954
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to c5b479b7b6d5
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32952
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 17081c624290
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-238,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 63976b7fae70
2018-10-10 20:17:44,183 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 557cbd7891a2
2018-10-10 20:17:44,187 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,ejb,17081c624290) JGRP000032: 17081c624290: no physical address for c5b479b7b6d5, dropping message
{code}
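The host list in the "Performing discovery" line above can be sorted mechanically to show what is described: entries with the dynamic five-digit ports (one per real node), entries carrying the container-side port 7600, and bare node IDs with no physical address at all. This sketch treats 7600 as the in-container port because that is what the log shows; the class name `DiscoveryTargets` is hypothetical, and the `lastIndexOf(':')` split assumes IPv4 addresses as in the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups the hosts printed by a "Performing discovery" log line. */
class DiscoveryTargets {

    /** Port seen in the log, assumed to be the in-container bind port. */
    static final int CONTAINER_PORT = 7600;

    static Map<String, List<String>> classify(List<String> hosts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("dynamic", new ArrayList<>());
        groups.put("container", new ArrayList<>());
        groups.put("logical", new ArrayList<>());
        for (String h : hosts) {
            int colon = h.lastIndexOf(':');
            if (colon < 0) {
                // No port at all: a node ID such as c5b479b7b6d5
                groups.get("logical").add(h);
            } else if (Integer.parseInt(h.substring(colon + 1)) == CONTAINER_PORT) {
                groups.get("container").add(h);
            } else {
                groups.get("dynamic").add(h);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Copied from the first log line of the report.
        List<String> hosts = List.of(
                "10.42.3.44:7600", "10.42.3.56:32949", "10.42.3.56:32951", "10.42.3.44:32954",
                "c5b479b7b6d5", "10.42.3.44:32952", "10.42.3.56:7600",
                "17081c624290", "63976b7fae70", "557cbd7891a2");
        // Prints dynamic: 4 entries, container: 2 entries, logical: 4 entries.
        classify(hosts).forEach((kind, list) ->
                System.out.println(kind + " " + list.size() + " " + list));
    }
}
```

Only the four "dynamic" entries match the four real nodes; the other six targets are the spurious ones the report complains about.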
This code seems to be part of the problem in this case: https://github.com/belaban/JGroups/blob/87d15ec848aa3d482ae792ef152f7e36e1ab625c/src/org/jgroups/protocols/dns/DNS_PING.java#L109
That code takes the address associated with the incoming discovery request and adds it to discovered_hosts, but in this case those addresses are ALWAYS inaccurate.
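A simplified model of the effect described above: if the receiver stores the address attached to each request, two tasks on the same EC2 host collapse onto a single IP:7600 entry, whereas their host-side port mappings would keep them apart. The `Request` type, its fields, the second sender `node-b` and both port mappings are hypothetical illustrations; they are not DNS_PING's actual classes, and this report does not establish which address the protocol could use instead.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model, not DNS_PING code: what a receiver ends up storing per request. */
class DiscoveredHostsModel {

    /** Illustrative view of one incoming discovery request. */
    static final class Request {
        final String sender;        // logical name of the sender
        final String addrOnRequest; // address the receiver associates with the request
        final String mappedAddr;    // host IP + ECS host-side port (hypothetical value)

        Request(String sender, String addrOnRequest, String mappedAddr) {
            this.sender = sender;
            this.addrOnRequest = addrOnRequest;
            this.mappedAddr = mappedAddr;
        }
    }

    /**
     * Collects the addresses a receiver would remember. With useMapped == false it keeps
     * the address attached to the request, which is the behavior the report describes.
     */
    static Set<String> collect(List<Request> requests, boolean useMapped) {
        Set<String> discoveredHosts = new LinkedHashSet<>();
        for (Request r : requests) {
            discoveredHosts.add(useMapped ? r.mappedAddr : r.addrOnRequest);
        }
        return discoveredHosts;
    }

    public static void main(String[] args) {
        // Two tasks on the same EC2 host; "node-b" and both port mappings are hypothetical.
        List<Request> requests = List.of(
                new Request("63976b7fae70", "10.42.3.44:7600", "10.42.3.44:32954"),
                new Request("node-b", "10.42.3.44:7600", "10.42.3.44:32952"));
        // Both senders collapse onto one entry carrying the container-side port.
        System.out.println(collect(requests, false)); // [10.42.3.44:7600]
        // The host-side mappings keep the two senders apart.
        System.out.println(collect(requests, true));  // [10.42.3.44:32954, 10.42.3.44:32952]
    }
}
```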
This is what the recipient of the service discovery request sees (i.e., every port is the default 7600):
{code}
2018-10-10 20:35:15,229 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,231 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,232 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,233 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,234 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,236 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,239 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,240 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,242 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,243 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,246 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,249 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,252 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,255 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,256 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
{code}
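The recipient-side log above can be summarized by extracting the sender and address from each "Received discovery from" line. Over the 24 lines shown, that yields a single pair, 63976b7fae70 at 10.42.3.44:7600, arriving in bursts of four roughly every two seconds (20:35:15 through 20:35:25). The class name `DiscoveryLogSummary` and the regular expression are assumptions about the log layout shown here, not part of JGroups.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log summarizer for DNS_PING "Received discovery from" lines. */
class DiscoveryLogSummary {

    // Matches e.g. "Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600"
    private static final Pattern RECEIVED =
            Pattern.compile("Received discovery from: (\\S+), IP: (\\S+)");

    /** Counts how often each "sender @ address" pair appears in the given log lines. */
    static Map<String, Integer> countSenders(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = RECEIVED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + " @ " + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two lines copied from the log above.
        List<String> lines = List.of(
                "2018-10-10 20:35:15,229 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600",
                "2018-10-10 20:35:17,234 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600");
        System.out.println(countSenders(lines)); // {63976b7fae70 @ 10.42.3.44:7600=2}
    }
}
```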
In this state the cluster never seems to work properly, and the Keycloak interface breaks in many frustrating ways.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)