Re: [keycloak-user] Does Keycloak need sticky session at the load balancer?

Thursday, 30 August 2018

On Wed, Aug 29, 2018 at 3:27 PM Rafael Weingärtner <rafaelweingartner(a)gmail.com> wrote:

...
 I think I will need a little bit of your wisdom again.

 I can now see the cluster being formed between my Keycloak replicas:

> 13:03:03,800 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-2) ISPN000079: Channel ejb local address is keycloak01, physical
> addresses are [192.168.1.58:55200]
> 13:03:03,801 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-1) ISPN000094: Received new cluster view for channel ejb:
> [keycloak02|1] (2) [keycloak02, keycloak01]
>
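
 For anyone who wants to check the view from a script rather than by eye, here
 is a minimal sketch in Python. It assumes the ISPN000094 message has exactly
 the shape shown in the log above and that each log entry sits on one line in
 the real log file; the names VIEW_RE and parse_view are made up for this
 example and are not part of Infinispan or JGroups.

```python
import re

# Matches the ISPN000094 message as it appears in the log excerpt above:
#   ... cluster view for channel ejb: [keycloak02|1] (2) [keycloak02, keycloak01]
VIEW_RE = re.compile(
    r"ISPN000094: Received new cluster view for channel (?P<channel>[^:\s]+): "
    r"\[(?P<creator>[^|\]]+)\|(?P<view_id>\d+)\] "
    r"\((?P<size>\d+)\) "
    r"\[(?P<members>[^\]]*)\]"
)


def parse_view(line):
    """Return a dict describing the cluster view, or None for other lines."""
    match = VIEW_RE.search(line)
    if match is None:
        return None
    members = [m.strip() for m in match.group("members").split(",") if m.strip()]
    return {
        "channel": match.group("channel"),
        "creator": match.group("creator"),
        "view_id": int(match.group("view_id")),
        "size": int(match.group("size")),
        "members": members,
    }


if __name__ == "__main__":
    sample = (
        "13:03:03,801 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] "
        "(MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: "
        "[keycloak02|1] (2) [keycloak02, keycloak01]"
    )
    view = parse_view(sample)
    print(view["size"], view["members"])
```

 A view whose member list contains only the local node means that node has not
 joined the other replica.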

 The problem is that when I shut down one of them, a logged-in user will
 receive the following message:

> An internal server error has occurred
>

 Then, in the log files I see the following:

> 13:18:04,149 ERROR
> [org.infinispan.interceptors.InvocationContextInterceptor] (default
> task-24) ISPN000136: Error executing command GetKeyValueCommand, writing
> keys []: org.infinispan.util.concurrent.TimeoutException: Replication
> timeout
>         at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:639)
> 13:18:15,262 ERROR
> [org.infinispan.interceptors.InvocationContextInterceptor]
> (expiration-thread--p22-t1) ISPN000136: Error executing command
> RemoveExpiredCommand, writing keys [468d1940-7293-4824-9e86-4aece6cd6744]:
> org.infinispan.util.concurrent.TimeoutException: Replication timeout for
> keycloak02
>
 It looks like you just killed the node (e.g. kill -9 <pid>), so it exited
without saying "goodbye". In that case the JGroups FD_* protocols on the other
node need to do their work and detect the failure. Any commands that are in
flight at that moment might fail. I strongly encourage you to use a larger
cluster (with an odd number of nodes if possible). Having only two nodes can
be a bit dangerous. Imagine a network partition: after the split heals, which
node is right? Hard to tell...
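
To make the two-node concern concrete, here is a minimal sketch in plain
Python of a simple majority rule for deciding which side of a partition may
carry on. It is only an illustration of the arithmetic, not how JGroups or
Infinispan implement partition handling, and the function names are
hypothetical.

```python
def has_majority(partition_size, cluster_size):
    """True if a partition holds strictly more than half of the cluster."""
    return 2 * partition_size > cluster_size


def partitions_with_majority(split):
    """Given the partition sizes after a split, return those holding a majority."""
    total = sum(split)
    return [size for size in split if has_majority(size, total)]


if __name__ == "__main__":
    # 2 nodes split 1/1, 3 nodes split 2/1, 5 nodes split 3/2, 4 nodes split 2/2
    for split in ([1, 1], [2, 1], [3, 2], [2, 2]):
        print(split, "->", partitions_with_majority(split))
```

With two nodes split 1/1 (or four nodes split 2/2) no side has a majority, so
there is no obvious winner. With three or five nodes, any split into two
partitions leaves exactly one side with a majority.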

...

 I would say that this is expected as the node is down. However, it should
 not be a problem for the whole system.

 My replication settings are the following:

> <distributed-cache name="sessions" mode="SYNC" owners="2"/>
> <distributed-cache name="authenticationSessions" mode="SYNC" owners="2"/>
> <distributed-cache name="offlineSessions" mode="SYNC" owners="2"/>
> <distributed-cache name="clientSessions" mode="SYNC" owners="2"/>
> <distributed-cache name="offlineClientSessions" mode="SYNC" owners="2"/>
> <distributed-cache name="loginFailures" mode="SYNC" owners="2"/>
>

 Do I need to change something else?
 This is exactly the same problem. With owners=2 and 2 nodes,
this is essentially a replicated cache (despite some differences in logic).
I'd advise using at least 3 nodes (or even better, 5).
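
To put numbers on that, here is a small sketch in plain Python. It assumes
entries are spread evenly across the nodes, which is an idealisation, and the
function names are made up for this example rather than taken from Infinispan.

```python
from fractions import Fraction


def copies_per_entry(owners, nodes):
    """An entry cannot have more copies than there are nodes."""
    return min(owners, nodes)


def share_per_node(owners, nodes):
    """Fraction of all entries held by each node, assuming an even spread."""
    return Fraction(copies_per_entry(owners, nodes), nodes)


def node_losses_survivable(owners, nodes):
    """Node losses an entry can survive while one copy still remains."""
    return copies_per_entry(owners, nodes) - 1


if __name__ == "__main__":
    for nodes in (2, 3, 5):
        print(
            f"nodes={nodes} owners=2:",
            f"each node holds {share_per_node(2, nodes)} of the data,",
            f"survives {node_losses_survivable(2, nodes)} node loss",
        )
```

With 2 nodes and owners=2 every node holds all of the data, which is what a
replicated cache does. With 3 or 5 nodes each node holds only 2/3 or 2/5 of it.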

...

 On Wed, Aug 29, 2018 at 9:51 AM, Rafael Weingärtner <rafaelweingartner(a)gmail.com> wrote:

> Ah, no problem. It was my fault. I forgot to start debugging from the
> ground up (connectivity, firewalls, applications, and so on).
>
> On Wed, Aug 29, 2018 at 9:49 AM, Bela Ban <bban(a)redhat.com> wrote:
>
>> Excellent! Unfortunately, JGroups cannot detect this...
>>
>> On 29/08/18 14:42, Rafael Weingärtner wrote:
>>
>>> Thanks!
>>> The problem was caused by firewalld blocking Multicast traffic.
>>>
>>> On Fri, Aug 24, 2018 at 7:28 AM, Sebastian Laskawiec <slaskawi(a)redhat.com> wrote:
>>>
>>>     Great write-up! Bookmarked!
>>>
>>>     On Thu, Aug 23, 2018 at 4:36 PM Bela Ban <bban(a)redhat.com> wrote:
>>>
>>>         Have you checked
>>>         https://github.com/belaban/workshop/blob/master/slides/admin.adoc#problem... ?
>>>
>>>         On 23/08/18 13:53, Sebastian Laskawiec wrote:
>>>          > +Bela Ban <bban@redhat.com>
>>>          >
>>>          > As I expected, the cluster doesn't form.
>>>          >
>>>          > I'm not sure where and why those UDP discovery packets are
>>>          > rejected. I just stumbled upon this thread [1], which you may
>>>          > find useful. Maybe Bela will also have an idea what's going on
>>>          > there.
>>>          >
>>>          > If you can't get UDP working, you can always fall back to
>>>          > TCP (and MPING).
>>>          >
>>>          > [1] https://serverfault.com/questions/211482/tools-to-test-multicast-routing
>>>          >
>>>          > On Thu, Aug 23, 2018 at 1:26 PM Rafael Weingärtner
>>>          > <rafaelweingartner(a)gmail.com> wrote:
>>>          >
>>>          >     Thanks for the reply Sebastian!
>>>          >
>>>          >
>>>          >         Note that IP Multicasting is disabled in many data
>>>          >         centers (I have never found out why they do it, but
>>>          >         I've seen it many, many times). So make sure your
>>>          >         cluster forms correctly (just grep logs and look for
>>>          >         "view").
>>>          >
>>>          >
>>>          >     I thought about that. Then, I used tcpdump, and I can see
>>>          >     the multicast packets from both Keycloak replicas. However,
>>>          >     it seems that these packets are being ignored.
>>>          >
>>>          >         root@Keycloak01:/# tcpdump -i eth0 port 7600 or port 55200 or
>>>          >         port 45700 or port 45688 or port 23364 or port 4712 or port 4713
>>>          >         tcpdump: verbose output suppressed, use -v or -vv for full
>>>          >         protocol decode
>>>          >         listening on eth0, link-type EN10MB (Ethernet), capture size
>>>          >         262144 bytes
>>>          >         11:13:36.540080 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>          >         11:13:41.288449 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>          >         11:13:46.342606 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>          >
>>>          >
>>>          >         root@keycloak02:/# tcpdump -i eth0 port 7600 or port 55200 or
>>>          >         port 45700 or port 45688 or port 23364 or port 4712 or port 4713
>>>          >         tcpdump: verbose output suppressed, use -v or -vv for full
>>>          >         protocol decode
>>>          >         listening on eth0, link-type EN10MB (Ethernet), capture size
>>>          >         262144 bytes
>>>          >         11:12:14.218317 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>          >         11:12:23.146798 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>          >         11:12:27.201888 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>          >
>>>          >
>>>          >
>>>          >     Here are the log entries, filtered by “view”. These are
>>>          >     from Keycloak01.
>>>          >
>>>          >         11:16:57,896 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-4) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>          >         11:16:57,896 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-2) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>          >         11:16:57,897 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>          >         11:16:57,898 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-3) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>          >         11:16:57,962 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>          >
>>>          >
>>>          >     I expected there to be only one. I mean, I started
>>>          >     Keycloak01 first, and only then Keycloak02. Next, we have
>>>          >     the logs from Keycloak02.
>>>          >
>>>          >         11:17:34,950 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-3) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>          >         11:17:34,952 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-4) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>          >         11:17:34,957 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>          >         11:17:34,957 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-2) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>          >         11:17:35,052 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>          >         (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>          >         for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>          >
>>>          >
>>>          >     They are similar. It seems that the two applications are
>>>          >     not seeing each other. At first, I thought that the problem
>>>          >     was caused by the “owners=1” configuration (the lack of data
>>>          >     synchronization between replicas). I then changed it to
>>>          >     “owners=2”, but still, if I log in to Keycloak01 and then
>>>          >     force my request to go to Keycloak02, my session is not
>>>          >     there, and I am asked to log in again.
>>>          >
>>>          >     Do you need some other log entries or configuration files?
>>>          >
>>>          >     Again, thanks for your reply and help!
>>>          >
>>>          >     On Thu, Aug 23, 2018 at 5:24 AM, Sebastian Laskawiec
>>>          >     <slaskawi(a)redhat.com> wrote:
>>>          >
>>>          >
>>>          >
>>>          >         On Wed, Aug 22, 2018 at 10:24 PM Rafael Weingärtner
>>>          >         <rafaelweingartner(a)gmail.com> wrote:
>>>          >
>>>          >             Hello Keycloakers,
>>>          >
>>>          >             I have some doubts regarding Keycloak and load
>>>          >             balancers. I set up two Keycloak replicas to provide
>>>          >             HA. To start them I am using “./standalone.sh
>>>          >             --server-config=standalone-ha.xml”. I am assuming
>>>          >             that they will use multicast to replicate
>>>          >             information between nodes, right?
>>>          >
>>>          >
>>>          >         That is correct. It uses the PING protocol, which in
>>>          >         turn uses IP Multicasting for discovery.
>>>          >
>>>          >         Note that IP Multicasting is disabled in many data
>>>          >         centers (I have never found out why they do it, but
>>>          >         I've seen it many, many times). So make sure your
>>>          >         cluster forms correctly (just grep logs and look for
>>>          >         "view").
>>>          >
>>>          >             Then, I set up a load balancer layer using Apache
>>>          >             HTTPD and the AJP connector via port 8009. To make
>>>          >             everything work I needed to use sticky sessions;
>>>          >             otherwise, the login would never happen. I am fine
>>>          >             with sticky sessions. However, if I stop the replica
>>>          >             where the user is logged in, then when the user
>>>          >             accesses Keycloak again, he/she is asked to present
>>>          >             credentials as if he/she were not logged in on the
>>>          >             other Keycloak replica. Is that the expected
>>>          >             behavior?
>>>          >
>>>          >
>>>          >         My intuition tells me that your cluster didn't form
>>>          >         correctly (as I mentioned before, grep the logs and
>>>          >         look for the "view" generated by JGroups). Therefore, if
>>>          >         you enable sticky sessions, all your requests go to the
>>>          >         same Keycloak instance, which has everything in its
>>>          >         local cache. That's why it works fine.
>>>          >
>>>          >
>>>          >             Is there some troubleshooting or test that I can
>>>          >             perform to check if replication is being executed?
>>>          >
>>>          >
>>>          >         Let's start with investigating the logs. Later on we
>>>          >         can check JMX.
>>>          >
>>>          >
>>>          >             --
>>>          >             Rafael Weingärtner
>>>          >             _______________________________________________
>>>          >             keycloak-user mailing list
>>>          >             keycloak-user(a)lists.jboss.org
>>>          >             https://lists.jboss.org/mailman/listinfo/keycloak-user
>>>          >
>>>          >
>>>          >
>>>          >
>>>          >     --
>>>          >     Rafael Weingärtner
>>>          >
>>>
>>>         --
>>>         Bela Ban | http://www.jgroups.org
>>>
>>>
>>>
>>>
>>> --
>>> Rafael Weingärtner
>>>
>>
>> --
>> Bela Ban | http://www.jgroups.org
>>
>>
>
>
> --
> Rafael Weingärtner
>

 --
 Rafael Weingärtner
