[jboss-jira] [JBoss JIRA] (JGRP-2300) DNS_PING in AWS ECS cannot cluster with dynamic port mappings

Sebastian Łaskawiec (Jira) issues at jboss.org
Fri Nov 30 05:03:01 EST 2018


    [ https://issues.jboss.org/browse/JGRP-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668504#comment-13668504 ] 

Sebastian Łaskawiec commented on JGRP-2300:
-------------------------------------------

[~ethompson] The discussion is being continued on JGRP-2316.

> DNS_PING in AWS ECS cannot cluster with dynamic port mappings
> -------------------------------------------------------------
>
>                 Key: JGRP-2300
>                 URL: https://issues.jboss.org/browse/JGRP-2300
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.16
>         Environment: AWS ECS Cluster with DNS based service discovery using jboss/keycloak:latest containers
>            Reporter: Eric Thompson
>            Assignee: Sebastian Łaskawiec
>            Priority: Critical
>             Fix For: 4.0.16
>
>
> When running an ECS cluster with jboss/keycloak:latest containers, dynamic port mapping of all ports is required to allow more than one container to run per EC2 instance. SRV-based service discovery records allow each node to find the rest of the nodes, but when a discovery request is sent, the receiving node sees the sender as IP:7600 instead of the IP with its dynamic port. It then treats this as a "new" node and tries to send discovery requests to it. Somehow it is also picking up node IDs and trying to send requests to those!
> See the following log; there are only 4 actual nodes, and they each have a different 5-digit port number:
> {code}
> ### Service discovery with dynamic port mapping
> 2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) Performing discovery of the following hosts [10.42.3.44:7600, 10.42.3.56:32949, 10.42.3.56:32951, 10.42.3.44:32954, c5b479b7b6d5, 10.42.3.44:32952, 10.42.3.56:7600, 17081c624290, 63976b7fae70, 557cbd7891a2]
> 2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:7600
> 2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32949
> 2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32951
> 2018-10-10 20:17:44,180 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32954
> 2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to c5b479b7b6d5
> 2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32952
> 2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:7600
> 2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
> 2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 17081c624290
> 2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-238,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
> 2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 63976b7fae70
> 2018-10-10 20:17:44,183 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 557cbd7891a2
> 2018-10-10 20:17:44,187 WARN  [org.jgroups.protocols.TCP] (TQ-Bundler-7,ejb,17081c624290) JGRP000032: 17081c624290: no physical address for c5b479b7b6d5, dropping message
> {code}
> This code seems to be part of the problem in this case: https://github.com/belaban/JGroups/blob/87d15ec848aa3d482ae792ef152f7e36e1ab625c/src/org/jgroups/protocols/dns/DNS_PING.java#L109
> That code takes the incoming address and adds it to discovered_hosts, but those addresses are ALWAYS inaccurate in this case,
> because this is what the recipient of the service discovery request sees (i.e., all the ports are the default 7600):
> {code}
> 2018-10-10 20:35:15,229 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:15,231 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:15,232 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:15,233 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:17,234 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:17,236 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:19,239 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:19,240 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:19,242 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:19,243 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:21,246 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:21,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:23,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:23,249 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:25,252 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:25,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:25,255 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> 2018-10-10 20:35:25,256 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
> {code}
> In this state the cluster never seems to work properly and the Keycloak interface breaks in many frustrating ways.
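
To make the first log in the description easier to read, the host list it prints can be split into three groups: endpoints with a dynamic port, endpoints on the default port 7600, and bare node IDs with no address at all. The following is only a minimal Java sketch of that grouping, not JGroups code. The class, enum, and method names are hypothetical, and treating every non-7600 port as an SRV-published port is an assumption based on the reporter's statement that each real node has a different 5-digit port.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading the log; it does not use or mirror any JGroups API.
public class DiscoveryTargets {
    // Default port that the receiving node reports for every sender in the logs.
    static final int DEFAULT_PORT = 7600;

    // Host list copied verbatim from the "Performing discovery of the following hosts" log line.
    static final List<String> LOGGED_HOSTS = Arrays.asList(
            "10.42.3.44:7600", "10.42.3.56:32949", "10.42.3.56:32951", "10.42.3.44:32954",
            "c5b479b7b6d5", "10.42.3.44:32952", "10.42.3.56:7600", "17081c624290",
            "63976b7fae70", "557cbd7891a2");

    enum Kind {
        SRV_ENDPOINT,          // IP with a dynamic port (assumed to come from the SRV records)
        DEFAULT_PORT_ENDPOINT, // IP with port 7600, learned from an incoming discovery request
        LOGICAL_NAME           // bare node ID, no IP or port
    }

    static Kind classify(String entry) {
        String e = entry.trim();
        int colon = e.lastIndexOf(':');
        if (colon < 0) {
            return Kind.LOGICAL_NAME;
        }
        int port = Integer.parseInt(e.substring(colon + 1));
        return port == DEFAULT_PORT ? Kind.DEFAULT_PORT_ENDPOINT : Kind.SRV_ENDPOINT;
    }

    static Map<Kind, List<String>> group(List<String> entries) {
        Map<Kind, List<String>> groups = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) {
            groups.put(k, new ArrayList<>());
        }
        for (String entry : entries) {
            groups.get(classify(entry)).add(entry);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints one line per group: kind, number of entries, entries.
        group(LOGGED_HOSTS).forEach((kind, hosts) ->
                System.out.println(kind + " (" + hosts.size() + "): " + hosts));
    }
}
```

Run against the logged list, this yields 4 dynamic-port endpoints, 2 default-port endpoints, and 4 bare node IDs. Only the first group matches the 4 actual nodes; the other 6 entries are the extra targets the description complains about.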



--
This message was sent by Atlassian Jira
(v7.12.1#712002)