Hi everyone,
In our project we create a large number of clients in Keycloak. In our load tests with
~6000 clients we saw very slow response times.
For example, average response times during the load tests:
23.9 sec /admin/realms/{realm}/users/{id}/role-mappings (GET)
28.2 sec /admin/realms/{realm}/clients (POST)
20.2 sec /admin/realms/{realm}/clients/{id} (DELETE)
While debugging Keycloak we found that the server iterates over all clients in the realm.
We opened a ticket for this finding: https://issues.jboss.org/browse/KEYCLOAK-9553.
Right after Keycloak startup a request could take up to 5 minutes, while subsequent
requests were much faster, which we assume is due to local caches. The variance of the
response times is very high: they range from <1s to timeouts after 5 minutes.
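Because the spread is so wide, an average alone hides most of the picture. The following is a minimal sketch of how the raw samples could be summarized (mean, median, nearest-rank 95th percentile, maximum). The function name and the sample values are made up for illustration; they are not our measurements.

```python
import math
import statistics


def summarize_latencies(samples_sec):
    """Return mean, median, p95 and max for a list of response times in seconds."""
    if not samples_sec:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_sec)
    # Nearest-rank percentile: the smallest sample such that at least
    # 95% of all samples are less than or equal to it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical samples (seconds), chosen only to mimic a "<1s up to timeout" spread.
samples = [0.8, 0.9, 1.2, 25.0, 31.0, 300.0]
summary = summarize_latencies(samples)
print(summary)
```

With a distribution like this the mean is dominated by the few timeouts, so reporting the median and p95 next to it gives a clearer view.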
What we've tried so far:
First we scaled up the Keycloak instances because we thought it might be a load problem.
It turned out that no load is needed to reproduce the slow responses, just enough
clients.
Then we tried to warm up the caches by running the load tests for a longer time, but we
couldn't see any improvement.
We found that there are configuration options for the caches and tried to gain some
insight into the runtime behavior by enabling the cache statistics via the JBoss CLI:
/subsystem=infinispan/cache-container=keycloak/local-cache=realms:write-attribute(name=statistics-enabled,value=true)
/subsystem=infinispan/cache-container=keycloak:write-attribute(name=statistics-enabled,value=true)
:reload
Without success. The statistics keep showing only zeros, and statistics-enabled is still
reported as false after the reload:
[standalone@localhost:9990 /] ls subsystem=infinispan/cache-container=keycloak/local-cache=realms
component
memory
store
activations=0
average-read-time=0
average-write-time=0
batching=false
cache-status=RUNNING
elapsed-time=0
eviction={"EVICTION" => undefined}
expiration={"EXPIRATION" => undefined}
hit-ratio=0.0
hits=0
indexing=NONE
indexing-properties=undefined
invalidations=0
jndi-name=undefined
locking={"LOCKING" => undefined}
misses=0
module=undefined
number-of-entries=0
passivations=0
read-write-ratio=0.0
remove-hits=0
remove-misses=0
start=LAZY
statistics-enabled=false
stores=0
time-since-reset=0
transaction={"TRANSACTION" => undefined}
Finally we tried changing the cache settings for the realms cache:
# default 10000
/subsystem=infinispan/cache-container=keycloak/local-cache=realms/memory=object:write-attribute(name=max-entries,value=80000)
# default 10000
/subsystem=infinispan/cache-container=keycloak/local-cache=realms/memory=object:write-attribute(name=size,value=80000)
This also had no noticeable effect on the response times.
We also tried increasing the connection pool size for the database.
Setup
Keycloak is running in standalone HA mode with JGroups on Kubernetes (3 replicas). The
database is AWS RDS.
Next we want to test the scalability of Keycloak with respect to the number of clients.
Are we missing something in the cache configuration? Is the realm cache the correct one to
tune for the problematic endpoints? How can we get the cache statistics working?
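For the scalability test we have something like the following in mind: grow the realm step by step and time one request at each step, then look at how latency grows per additional client. This is only a rough sketch. The hooks add_clients and probe are hypothetical placeholders that would wrap the admin REST calls (for example POST /admin/realms/{realm}/clients and the role-mappings GET); here they are replaced by an in-memory stand-in so the sketch runs on its own.

```python
import time


def time_call(fn):
    """Run fn() once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure_scaling(add_clients, probe, client_counts):
    """Time one probe request at each client count.

    add_clients(n) must create n additional clients and probe() must issue
    the request under test. Both are caller-supplied, hypothetical hooks.
    client_counts must be in increasing order.
    """
    results = []
    current = 0
    for target in client_counts:
        add_clients(target - current)
        current = target
        results.append((target, time_call(probe)))
    return results


def slope(points):
    """Least-squares slope of latency over client count (seconds per client)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# In-memory stand-in for a realm: the probe walks over every "client",
# mimicking a server that iterates over all clients per request.
realm = []
points = measure_scaling(
    add_clients=lambda n: realm.extend(range(n)),
    probe=lambda: sum(1 for _ in realm),
    client_counts=[1000, 2000, 4000, 8000],
)
print(points)
print("seconds per additional client:", slope(points))
```

A clearly positive slope against a real server would support the assumption that the request cost grows with the number of clients, independent of load.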
resources:
  request:
    mem: 4Gi
    cpu: 1
  limit:
    mem: 6Gi
    cpu: 3
The Java memory parameters are set as follows:
/usr/lib/jvm/java/bin/java -D[Standalone] -server -Xms3276m -Xmx4914m
-javaagent:/opt/jboss/newrelic/newrelic.jar -Djboss.modules.system.pkgs
Best regards
Nils El-Himoud
INST-IOT/ESW-Imb