Hi Marek
I'm working with Fabricio on the federation performance issues with Keycloak.
In answer to your question, we are using the latest KC 1.9.7 version (we upgraded this week
from 1.9.2).
To give you some indication of the problem, we ran a Gatling direct access login test
(results below).
As you can see in (1) below, KC out of the box performs very well - we saw 110 tx per
sec on a 4-core system.
In scenario (2) using a stubbed federator (simply an echo plugin not connecting to any
back end services), performance is unacceptable.
1) Stub federator (disabled) - this run shows 29 tx per second, but we could easily get
to a stable 110 tx per second.
300 Users (hitting single server)
---- Global Information --------------------------------------------------------
> request count 9185 (OK=9185 KO=0 )
> min response time 18 (OK=18 KO=- )
> max response time 723 (OK=723 KO=- )
> mean response time 27 (OK=27 KO=- )
> std deviation 44 (OK=44 KO=- )
> response time 50th percentile 20 (OK=20 KO=- )
> response time 75th percentile 21 (OK=21 KO=- )
> mean requests/sec 29.626 (OK=29.626 KO=- )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms 9185 (100%)
> 800 ms < t < 1200 ms 0 ( 0%)
> t > 1200 ms 0 ( 0%)
> failed 0 ( 0%)
2) Stub federator (enabled) - if we brought the test down to 12 tx per second (about 90
users), the response times dropped to < 1200 ms; however, that is not even close to
meeting our acceptance criteria.
300 Users (hitting single server)
---- Global Information --------------------------------------------------------
> request count 8496 (OK=8496 KO=0 )
> min response time 511 (OK=511 KO=- )
> max response time 11191 (OK=11191 KO=- )
> mean response time 6832 (OK=6832 KO=- )
> std deviation 2329 (OK=2329 KO=- )
> response time 50th percentile 7194 (OK=7194 KO=- )
> response time 75th percentile 8690 (OK=8690 KO=- )
> mean requests/sec 27.404 (OK=27.404 KO=- )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms 154 ( 2%)
> 800 ms < t < 1200 ms 85 ( 1%)
> t > 1200 ms 8257 ( 97%)
> failed 0 ( 0%)
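As a rough sanity check on the two runs above, Little's law (requests in flight =
throughput x mean response time) can be applied to the Gatling figures. This is only a
back-of-the-envelope sketch; the class and method names are made up for illustration,
and the inputs are the mean requests/sec and mean response time reported above.

```java
class LittlesLaw {
    /** Average requests in flight = throughput (req/s) x mean response time (s). */
    static double inFlight(double requestsPerSecond, double meanResponseMillis) {
        return requestsPerSecond * (meanResponseMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Run 1: stub federator disabled (29.626 req/s, 27 ms mean)
        System.out.println("disabled: " + inFlight(29.626, 27));
        // Run 2: stub federator enabled (27.404 req/s, 6832 ms mean)
        System.out.println("enabled:  " + inFlight(27.404, 6832));
    }
}
```

By this estimate there is on average under one request in flight with the federator
disabled, against roughly 187 with it enabled, at almost the same throughput. That
would suggest requests are queuing somewhere rather than being processed in parallel.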
This is currently a show stopper for us and is blocking our path to production.
Do you run similar tests and how can we help you optimise the performance?
Regards
Tom.
Date: Wed, 8 Jun 2016 12:28:19 +0200
From: Marek Posolda <mposolda(a)redhat.com>
Subject: Re: [keycloak-user] Performance issues with Federation
provider enabled
To: Fabricio Milone <fabricio.milone(a)shinetech.com>, keycloak-user
<keycloak-user(a)lists.jboss.org>
Message-ID: <5757F343.1040803(a)redhat.com>
Content-Type: text/plain; charset="windows-1252"
Hi,
what's the keycloak version used? Could you try latest keycloak and
check if performance is still the issue?
Marek
On 08/06/16 01:30, Fabricio Milone wrote:
Hi all,
I sent this email yesterday with 5 or more attachments, so I think it
was blocked or something... here I go again :)
I've been running load tests on our application during the last few
weeks, and having some performance issues when my custom federator is
enabled.
The performance issue does not exist when the federator is disabled.
*Configuration*:
I have a cluster of 2 instances of Keycloak, with a standalone DB,
we've verified the DB isn't an issue when the federator is disabled.
Both instances have a quad core CPU and they are in the same network.
We've left the memory at 512MB. The test script, database and API that
connects to the federator are in separate machines.
*Federator*:
We have a simple custom federator that makes calls to a very
performant api, which has been tested and is ok. Additionally, we've
tested stubbing the API so the performance is not a problem there.
This federator is using a jaxb marshaller to create a request, again
tested in isolation and is performing well.
As the federator is doing a lot of calls to the API (3 per login
request), I've implemented a httpclient that uses a
PoolingHttpClientConnectionManager with 1000 connections available to
use, instead of using the standard apache httpclient from http
components. That hasn't improved the performance of the system at all.
*Tests*:
It is a gatling scala script that could generate around ~300 (or more)
requests/second to the direct grants login endpoint using random
usernames from a list (all of them already registered using KC). The
script is doing a round robin across both instances of Keycloak with
an even distribution to each KC instance.
The idea is to simulate a load of 300 to 1500 concurrent users trying to
log in to our systems.
*Problem*:
If I run the tests without using a federation I can see a very good
performance, but when I try to run the tests with the custom
federation code, the performance drops from ~150 requests/second to 22
req/sec using both instances.
Memory wise, it seems to be ok. I've never seen an error related to
memory with this configuration, also if you take a look at the
attached visualVM screenshot you'll see that memory is not a problem
or it seems not to be.
CPU utilisation is very low to my mind, I'd expect more than 80% of
usage or something like that.
There is a method that is leading the CPU samples on VisualVM called
Semaphore.tryAcquire(). Not quite sure what that's for, still
investigating.
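For context, a semaphore is commonly used to cap how many callers can use a
pooled resource at once, and threads waiting for a permit are parked rather
than using CPU. That pattern would be consistent with low CPU utilisation
combined with poor throughput. The sketch below is only an illustration
using the JDK; the class name, permit count, caller count and sleep time
are made up, and it is not taken from Keycloak or the federator code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolDemo {
    /** Outcome of one run: calls completed and the peak number running at once. */
    static final class Result {
        final int completed;
        final int peakConcurrent;

        Result(int completed, int peakConcurrent) {
            this.completed = completed;
            this.peakConcurrent = peakConcurrent;
        }
    }

    static Result run(int permits, int callers, long workMillis) {
        Semaphore pool = new Semaphore(permits);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(callers);

        for (int i = 0; i < callers; i++) {
            executor.submit(() -> {
                try {
                    // Callers wait here, mostly idle, until a permit is free.
                    if (!pool.tryAcquire(10, TimeUnit.SECONDS)) {
                        return; // timed out waiting for a permit
                    }
                    try {
                        int now = active.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(workMillis); // stand-in for a backend call
                        completed.incrementAndGet();
                    } finally {
                        active.decrementAndGet();
                        pool.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new Result(completed.get(), peak.get());
    }

    public static void main(String[] args) {
        Result r = run(2, 8, 20);
        System.out.println("completed=" + r.completed
                + " peakConcurrent=" + r.peakConcurrent);
    }
}
```

However many threads call in, no more than `permits` of them do work at the
same time; the rest simply wait, so adding load raises response times
instead of CPU usage.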
I can see that a lot of new threads are being created when the test
starts, as it creates around 60 requests/second to the direct grants
login call, but it seems to be a bottleneck at some point.
So I'm wondering if there is some configuration I'm missing on
Keycloak side that could be affecting the cluster performance when a
federator is enabled. Maybe something related to jpa connections,
infinispan configuration or even wildfly.
I'd really appreciate your help on this one as I'm out of ideas.
I've attached some screenshots of visualVM and tests results from my
last run today.
Sorry for the long email and please let me know if you need further
information.
Thank you in advance,
Regards,
Fab
--
*Fabricio Milone*
Developer