[keycloak-user] Keycloak cluster communication not working properly

Jens Bissinger jens.bissinger at coliquio.de
Wed Mar 13 09:38:14 EDT 2019


Hi,

we have a Keycloak instance running as a Docker container in our AWS ECS environment.

With a single instance this setup works great, but we failed to extend it with a second instance for HA.

Problem: As soon as more than one Keycloak instance is running behind the load balancer, authentication fails.

Cluster setup:

- Keycloak v5.0.0 (docker image quay.io/keycloak/keycloak:5.0.0)
- Containers are behind AWS ALB load balancers with round-robin but without sticky sessions (the latter is important for our setup)
- JGroups is configured with JDBC_PING, and instances properly add themselves to and remove themselves from the configured MySQL table
- Containers run on separate EC2 hosts, TCP communication between containers is possible (port 7600 is also exposed on the hosts)
- Cache owners for all distributed caches are set to 2 (we also tested with 1, with no difference in the results)
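
The TCP reachability mentioned above can be re-checked from inside each container with a few lines of Python. This is only a minimal sketch: the function name can_connect and the peer address in the usage note are hypothetical, and a successful connect only shows that the port is reachable, not that JGroups binds to or advertises the right address.

```python
import socket


def can_connect(host: str, port: int = 7600, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

Usage, run from node 1 against the address of node 2 (placeholder value): can_connect("10.0.0.2") should return True if port 7600 on the peer accepts connections.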

Startup logs from Infinispan look fine:

- On startup we see a log message showing that the cluster nodes discover each other
  "ISPN000094: Received new cluster view for channel ejb: [ip-10-129-2-31.eu-central-1.compute.internal|1] (2) [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"
- After that, Infinispan rebalancing also happens
  "[Context=offlineClientSessions] ISPN100010: Finished rebalance with members [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"

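To compare the cluster view across nodes without reading the logs by hand, the ISPN000094 line can be parsed. The following is a rough sketch that assumes the line layout shown above ("[coordinator|view id] (size) [members]"); the names VIEW_RE and parse_cluster_view are made up for this example.

```python
import re

# Pattern derived from the ISPN000094 log line quoted above (an assumption
# about the layout, not an official format definition).
VIEW_RE = re.compile(
    r"ISPN000094: Received new cluster view for channel (\S+): "
    r"\[([^|\]]+)\|(\d+)\] \((\d+)\) \[([^\]]*)\]"
)


def parse_cluster_view(line):
    """Return channel, coordinator, view id, size and members, or None."""
    match = VIEW_RE.search(line)
    if match is None:
        return None
    channel, coordinator, view_id, size, members = match.groups()
    return {
        "channel": channel,
        "coordinator": coordinator,
        "view_id": int(view_id),
        "size": int(size),
        "members": [m.strip() for m in members.split(",") if m.strip()],
    }
```

If both nodes report the same member list and the same size, discovery itself is probably working and the problem lies elsewhere.
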
Analysis (so far):

- The problem is apparently that authentication starts on node 1. Because of round-robin balancing, the next request of the login flow goes to node 2, and this fails because node 2 does not know about the authentication session started on node 1.
- According to the documentation, node 2 should look up the started authentication session in the cluster. This does not seem to happen, and we cannot find any log output related to it.
- Regular sessions are not distributed in the cache either. We tested this by running only one node for the authentication, then starting a second node and failing over to it. Afterwards the regular session was gone (we were logged out).
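
The failure mode described in the first point can be illustrated with a toy model. This is not Keycloak or Infinispan code: each node is a plain Python object, a shared dict stands in for a working distributed cache, and all names (Node, run_login, isolated_nodes, clustered_nodes) are hypothetical.

```python
from itertools import cycle


class Node:
    """A server node whose cache maps auth session ids to a state."""

    def __init__(self, name, cache):
        self.name = name
        self.cache = cache

    def start_login(self, session_id):
        self.cache[session_id] = "started"

    def continue_login(self, session_id):
        # Succeeds only if this node can see the started session.
        return session_id in self.cache


def run_login(nodes):
    """Send the two steps of one login to consecutive round-robin nodes."""
    balancer = cycle(nodes)  # round robin, no sticky sessions
    first, second = next(balancer), next(balancer)
    first.start_login("s1")
    return second.continue_login("s1")


def isolated_nodes():
    # Each node has its own cache: what we appear to observe.
    return [Node("node1", {}), Node("node2", {})]


def clustered_nodes():
    # Both nodes see the same entries: what we expect from the cluster.
    shared = {}
    return [Node("node1", shared), Node("node2", shared)]
```

With isolated_nodes() the second step fails (run_login returns False), while with clustered_nodes() it succeeds, which is the behaviour we expect from a correctly formed cluster.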

Thank you very much.

Regards
Jens Bissinger



