]
Bela Ban commented on JGRP-2296:
--------------------------------
[~ethompson] Pull from master
DNS_PING is dropping port values with SRV based service discovery
-----------------------------------------------------------------
Key: JGRP-2296
URL: https://issues.jboss.org/browse/JGRP-2296
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.11
Environment: JGroups version 4.0.11.Final
Used in Keycloak 4.4.0
Deployed as a JBoss-based Docker container from jboss/keycloak into AWS ECS
Reporter: Eric Thompson
Assignee: Bela Ban
Priority: Blocker
Fix For: 4.0.16
When using DNS_PING in JGroups 4.0.11 with SRV records, the port from the SRV record is
dropped (set to zero) and the default port (7600) is used instead.
I am using this JGroups config:
{code}
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp">
                <property name="external_addr">${env.EXTERNAL_ADDR}</property>
            </transport>
            <protocol type="dns.DNS_PING">
                <property name="dns_query">jgroups.${env.DNS_NAME}.svc.cluster.local</property>
                <property name="dns_record_type">SRV</property>
            </protocol>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
{code}
I have these service discovery DNS entries:
{code}
$ dig jgroups.dev.auth.example.com.svc.cluster.local SRV
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.58.amzn1 <<>> jgroups.dev.auth.example.com.svc.cluster.local SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16690
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;jgroups.dev.auth.example.com.svc.cluster.local. IN SRV
;; ANSWER SECTION:
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32921 9ec82e3f-3a0e-4e30-b785-17879c63cd7d.jgroups.dev.auth.example.com.svc.cluster.local.
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32923 60b5a820-9678-4bd2-84c6-00061a52bde0.jgroups.dev.auth.example.com.svc.cluster.local.
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32915 9d9d78d0-8919-4b91-9df8-2e4e65afedae.jgroups.dev.auth.example.com.svc.cluster.local.
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32917 161f3d66-f1e3-46f4-a44f-ebda925a25c6.jgroups.dev.auth.example.com.svc.cluster.local.
;; Query time: 2 msec
;; SERVER: 10.42.3.2#53(10.42.3.2)
;; WHEN: Fri Sep 21 01:45:44 2018
;; MSG SIZE rcvd: 481
{code}
But I get this in the logs when running Keycloak in a standalone cluster:
{code}
17:45:10,121 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Performing initial discovery
17:45:10,154 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Entries collected from DNS: [10.42.3.56:0, 10.42.3.56:0, 10.42.3.44:0, 10.42.3.44:0]
17:45:10,155 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.56:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.56:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.44:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.44:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Performing discovery of the following hosts [10.42.3.56:7600, 10.42.3.44:7600, e200a617bf7a]
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) e200a617bf7a: sending discovery request to 10.42.3.56:7600
17:45:10,160 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) e200a617bf7a: sending discovery request to 10.42.3.44:7600
17:45:10,160 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-10,ejb,e200a617bf7a) Received discovery from: e200a617bf7a, IP: 10.42.3.44:7600
17:45:10,161 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) e200a617bf7a: sending discovery request to e200a617bf7a
17:45:10,162 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-11,ejb,e200a617bf7a) Received discovery from: e200a617bf7a, IP: 10.42.3.44:7600
{code}
As you can see, it is resolving the DNS addresses but discarding the ports.
To be clear, in this example 32923 is the port, e.g.:
1 1 32923 60b5a820-9678-4bd2-84c6-00061a52bde0.jgroups.dev.auth.example.com.svc.cluster.local.
These are dynamic host ports mapped to container port 7600 so that more than one Keycloak
container can run on each instance.
{code}
$ docker ps
CONTAINER ID  IMAGE                                                             COMMAND                 CREATED      STATUS                PORTS                                             NAMES
f67e39f8f403  datadog/agent:latest-jmx                                          "/init"                 8 hours ago  Up 8 hours (healthy)  8125/udp, 8126/tcp                                ecs-auth-service-dev-26-datadog-agent-a2b7f783ddd0ba9cf601
bbb12f0c43a5  233747045000.dkr.ecr.us-east-2.amazonaws.com/ops/keycloak:latest  "/opt/jboss/tools/do…"  8 hours ago  Up 8 hours            0.0.0.0:32923->7600/tcp, 0.0.0.0:32922->8080/tcp  ecs-auth-service-dev-26-keycloak-f4bd8f8dca9fd4cd4f00
932cad7c4fb9  datadog/agent:latest-jmx                                          "/init"                 8 hours ago  Up 8 hours (healthy)  8125/udp, 8126/tcp                                ecs-auth-service-dev-26-datadog-agent-baa38a98ccaddea6f501
e200a617bf7a  233747045000.dkr.ecr.us-east-2.amazonaws.com/ops/keycloak:latest  "/opt/jboss/tools/do…"  8 hours ago  Up 8 hours            0.0.0.0:32921->7600/tcp, 0.0.0.0:32920->8080/tcp  ecs-auth-service-dev-26-keycloak-e6f398e6cc8db5b5f101
73bc0b863c73  amazon/amazon-ecs-agent:latest                                    "/agent"                2 days ago   Up 2 days                                                               ecs-agent
{code}
This seems like it might be where ports are getting lost:
https://github.com/belaban/JGroups/blob/07060c3ba6e52ad4aad3ac799c2bc95ff...
I don't see the port number being extracted from the SRV entry and appended to the IP
returned from resolveAEntries.
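To illustrate what I would expect, here is a minimal sketch of pulling the port and target out of SRV record data so the port can be paired with the address resolved for the target. This is not the JGroups code: the class `SrvRecordSketch`, the type `HostPort` and the method `parseSrv` are hypothetical names for illustration only, and it assumes the record data arrives as a "priority weight port target" string, as in the dig output above.

```java
public class SrvRecordSketch {

    /** Target host and port taken from a single SRV record (hypothetical helper type). */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Parses SRV record data of the form "priority weight port target". */
    static HostPort parseSrv(String rdata) {
        String[] fields = rdata.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Not SRV record data: " + rdata);
        }
        // Third field is the port; this is the value that currently ends up as 0.
        int port = Integer.parseInt(fields[2]);
        // Fourth field is the target; drop the trailing dot of the fully qualified name.
        String target = fields[3];
        if (target.endsWith(".")) {
            target = target.substring(0, target.length() - 1);
        }
        return new HostPort(target, port);
    }

    public static void main(String[] args) {
        HostPort hp = parseSrv(
            "1 1 32923 60b5a820-9678-4bd2-84c6-00061a52bde0.jgroups.dev.auth.example.com.svc.cluster.local.");
        // The target would then be resolved to an IP, and hp.port kept instead of 0.
        System.out.println(hp);
    }
}
```

With something like this, each address resolved for the target would carry the SRV port (e.g. 32923) rather than falling back to 7600.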
Let me know if I am missing any details. This is a major blocker for development.