[keycloak-user] Keycloak cluster communication not working properly

Jens Bissinger jens.bissinger at coliquio.de
Tue Mar 19 10:50:13 EDT 2019


Hey Vlasta,

thanks. I just created this ticket https://issues.jboss.org/browse/KEYCLOAK-9855

— Jens

On 14. Mar 2019, at 12:10, Vlasta Ramik <vramik at redhat.com> wrote:

Hey Jens,

would you mind creating a ticket [1] for the issue, please?

[1] https://issues.jboss.org/projects/KEYCLOAK

On 3/13/19 2:38 PM, Jens Bissinger wrote:
Hi,

we have a Keycloak instance running as a Docker container in our AWS ECS environment.

With a single instance this setup works great, but we have not managed to extend it with a second instance for HA.

Problem: As soon as we run more than one Keycloak instance behind the load balancer, we can no longer authenticate.

Cluster setup:

- Keycloak v5.0.0 (docker image quay.io/keycloak/keycloak:5.0.0)
- Containers are behind AWS ALB load balancers with round-robin but without sticky sessions (the latter is important for our setup)
- JGroups is configured with JDBC_PING, and instances properly add/remove themselves from the configured MySQL table
- Containers run on separate EC2 hosts; TCP communication between containers is possible (port 7600 is also exposed on the hosts)
- Cache owners for all distributed caches are set to 2 (we also tested with 1 but without any different results)
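
One way to double-check the TCP reachability mentioned in the list above is a small probe run from each host against the other. This is only a sketch: the helper name can_connect is hypothetical, and only port 7600 comes from the setup described here.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling can_connect(peer_host, 7600) from node 1 with the address of node 2 (and the other way round) should return True on both sides if the JGroups TCP port is really reachable. A successful connection only proves that the port is open, not that JGroups messages are actually exchanged.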

Startup logs from Infinispan look fine:

- On startup we see a log message showing that the cluster nodes discover each other:
  "ISPN000094: Received new cluster view for channel ejb: [ip-10-129-2-31.eu-central-1.compute.internal|1] (2) [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"
- After that, Infinispan rebalancing also happens:
  "[Context=offlineClientSessions] ISPN100010: Finished rebalance with members [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"

Analysis (so far):

- The problem evidently occurs because authentication starts on node 1. Due to round-robin, authentication then continues on node 2, which fails because node 2 does not know about the authentication session started on node 1.
- According to the documentation, node 2 should look up the started authentication session in the cluster. This does not seem to happen, and we cannot see any log output related to it.
- Regular sessions are not distributed in the cache either. We tested this by authenticating with only one node running, then spinning up a second node and failing over to it. Afterwards the regular session was gone (we were logged out).
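
The failure mode described in the first point can be illustrated with a toy model. It is not Keycloak or Infinispan code: the Node class and login_via_round_robin are hypothetical names, and a plain dict stands in for the authentication session cache. It only shows why round-robin without sticky sessions needs a cache that both nodes can see.

```python
from itertools import cycle


class Node:
    """Toy stand-in for a server node holding authentication sessions."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # Without a shared store, each node only sees its own sessions.
        self.auth_sessions = shared_store if shared_store is not None else {}

    def start_login(self, session_id):
        self.auth_sessions[session_id] = "started"

    def finish_login(self, session_id):
        # Succeeds only if this node can see the session that was started.
        return session_id in self.auth_sessions


def login_via_round_robin(nodes, session_id="abc"):
    """Send the first request to one node and the second to the next one."""
    balancer = cycle(nodes)
    next(balancer).start_login(session_id)
    return next(balancer).finish_login(session_id)


# Isolated caches: node 2 does not know the session started on node 1.
isolated = [Node("node1"), Node("node2")]
print(login_via_round_robin(isolated))   # False

# Shared cache: both nodes see the same sessions.
shared = {}
clustered = [Node("node1", shared), Node("node2", shared)]
print(login_via_round_robin(clustered))  # True
```

The behaviour we observe matches the first case, even though the startup logs suggest the nodes have formed a cluster.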

Thank you very much.

Regards
Jens Bissinger


_______________________________________________
keycloak-user mailing list
keycloak-user at lists.jboss.org
https://lists.jboss.org/mailman/listinfo/keycloak-user


