[keycloak-user] Does Keycloak need sticky session at the load balancer?
Sebastian Laskawiec
slaskawi at redhat.com
Thu Aug 30 04:02:34 EDT 2018
On Wed, Aug 29, 2018 at 3:27 PM Rafael Weingärtner <rafaelweingartner at gmail.com> wrote:
> I think I will need a little bit of your wisdom again.
>
> I am now seeing the cluster between my Keycloak replicas to be created:
>
>> 13:03:03,800 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000079: Channel ejb local address is keycloak01, physical addresses are [192.168.1.58:55200]
>> 13:03:03,801 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [keycloak02|1] (2) [keycloak02, keycloak01]
>>
>
> The problem is that when I shut down one of them, a logged-in user
> receives the following message:
>
>> An internal server error has occurred
>>
>
>
> Then, in the log files I see the following:
>
>> 13:18:04,149 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-24) ISPN000136: Error executing command GetKeyValueCommand, writing keys []: org.infinispan.util.concurrent.TimeoutException: Replication timeout
>>     at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:639)
>> 13:18:15,262 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (expiration-thread--p22-t1) ISPN000136: Error executing command RemoveExpiredCommand, writing keys [468d1940-7293-4824-9e86-4aece6cd6744]: org.infinispan.util.concurrent.TimeoutException: Replication timeout for keycloak02
>>
>
I see you just killed the node (e.g. kill -9 <pid>), so it exited without
saying "goodbye". In that case the JGroups FD_* protocols on the other node
need to do their work and discover the failure, and any commands in flight
might fail. I highly encourage you to use a larger cluster (with an odd
number of nodes if possible). Having only two nodes can be a bit dangerous:
imagine a partition split; after the split heals, which node is right? Hard
to tell...
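
For reference, failure detection is configured in the jgroups subsystem of
standalone-ha.xml. A minimal sketch of tightening FD_ALL so a killed node
is suspected sooner (the property names are JGroups' FD_ALL settings; the
exact defaults and subsystem schema depend on your Keycloak release):

    <stack name="udp">
        <transport type="UDP" socket-binding="jgroups-udp"/>
        ...
        <!-- suspect a silent member after ~10s instead of the much
             larger default, so in-flight commands fail over faster -->
        <protocol type="FD_ALL">
            <property name="timeout">10000</property>
            <property name="interval">2000</property>
        </protocol>
        ...
    </stack>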
>
> I would say that this is expected as the node is down. However, it should
> not be a problem for the whole system.
>
> My replication settings are the following:
>
>> <distributed-cache name="sessions" mode="SYNC" owners="2"/>
>> <distributed-cache name="authenticationSessions" mode="SYNC" owners="2"/>
>> <distributed-cache name="offlineSessions" mode="SYNC" owners="2"/>
>> <distributed-cache name="clientSessions" mode="SYNC" owners="2"/>
>> <distributed-cache name="offlineClientSessions" mode="SYNC" owners="2"/>
>> <distributed-cache name="loginFailures" mode="SYNC" owners="2"/>
>>
>
> Do I need to change something else?
>
Here's exactly the same problem. With owners=2 and 2 nodes, this is
essentially a replicated cache (despite some differences in logic). I'd
advise using at least 3 nodes (or, even better, 5).
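
If you want to try a third replica on the same host, here is a quick sketch
using standard WildFly system properties (the node name and port offset
below are arbitrary examples):

    ./standalone.sh --server-config=standalone-ha.xml \
        -Djboss.node.name=keycloak03 \
        -Djboss.socket.binding.port-offset=100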
>
> On Wed, Aug 29, 2018 at 9:51 AM, Rafael Weingärtner <rafaelweingartner at gmail.com> wrote:
>
>> Ah, no problem. It was my fault. I forgot to start debugging from the
>> ground up (connectivity, firewalls, applications and so on).
>>
>> On Wed, Aug 29, 2018 at 9:49 AM, Bela Ban <bban at redhat.com> wrote:
>>
>>> Excellent! Unfortunately, JGroups cannot detect this...
>>>
>>> On 29/08/18 14:42, Rafael Weingärtner wrote:
>>>
>>>> Thanks!
>>>> The problem was caused by firewalld blocking Multicast traffic.
>>>>
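
For a firewalld problem like the one above, the fix usually amounts to
opening the JGroups ports between the Keycloak hosts. A hedged sketch, with
the port numbers taken from the tcpdump filter later in this thread and a
hypothetical "internal" zone:

    # allow JGroups traffic between the Keycloak hosts
    firewall-cmd --permanent --zone=internal --add-port=7600/tcp    # JGroups TCP / FD_SOCK
    firewall-cmd --permanent --zone=internal --add-port=55200/udp   # JGroups unicast UDP
    firewall-cmd --permanent --zone=internal --add-port=45688/udp   # multicast discovery (230.0.0.4:45688)
    firewall-cmd --permanent --zone=internal --add-protocol=igmp    # multicast group membership
    firewall-cmd --reload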
>>>> On Fri, Aug 24, 2018 at 7:28 AM, Sebastian Laskawiec <slaskawi at redhat.com> wrote:
>>>>
>>>> Great write-up! Bookmarked!
>>>>
>>>> On Thu, Aug 23, 2018 at 4:36 PM Bela Ban <bban at redhat.com> wrote:
>>>>
>>>> Have you checked
>>>> https://github.com/belaban/workshop/blob/master/slides/admin.adoc#problem-1-members-don-t-find-each-other ?
>>>>
>>>> On 23/08/18 13:53, Sebastian Laskawiec wrote:
>>>> > +Bela Ban <bban at redhat.com>
>>>> >
>>>> > As I expected, the cluster doesn't form.
>>>> >
>>>> > I'm not sure where and why those UDP discovery packets are rejected.
>>>> > I just stumbled upon this thread [1], which you may find useful.
>>>> > Maybe Bela will also have an idea what's going on there.
>>>> >
>>>> > If you don't manage to get UDP working, you can always fall back to
>>>> > TCP (and MPING).
>>>> >
>>>> > [1] https://serverfault.com/questions/211482/tools-to-test-multicast-routing
>>>> >
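
The TCP fallback mentioned above is typically done by pointing the default
JGroups channel at the shipped "tcp" stack in standalone-ha.xml. A sketch
against the jgroups subsystem (the schema version varies by release; the
shipped "tcp" stack still uses MPING, i.e. multicast, for discovery, so
swap in TCPPING with a static initial_hosts list if multicast is blocked
entirely):

    <subsystem xmlns="urn:jboss:domain:jgroups:6.0">
        <channels default="ee">
            <!-- default is stack="udp" -->
            <channel name="ee" stack="tcp" cluster="ejb"/>
        </channels>
        ...
    </subsystem>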
>>>> > On Thu, Aug 23, 2018 at 1:26 PM Rafael Weingärtner <rafaelweingartner at gmail.com> wrote:
>>>> >
>>>> > Thanks for the reply Sebastian!
>>>> >
>>>> >     Note that IP Multicasting is disabled in many data centers (I
>>>> >     have never found out why they do it, but I've seen it many,
>>>> >     many times). So make sure your cluster forms correctly (just
>>>> >     grep logs and look for "view").
>>>> >
>>>> > I thought about that. Then, I used tcpdump, and I can see the
>>>> > multicast packets from both Keycloak replicas. However, it seems
>>>> > that these packets are being ignored.
>>>> >
>>>> >     root at Keycloak01:/# tcpdump -i eth0 port 7600 or port 55200 or port 45700 or port 45688 or port 23364 or port 4712 or port 4713
>>>> >     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>>> >     listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
>>>> >     11:13:36.540080 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>> >     11:13:41.288449 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>> >     11:13:46.342606 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>> >
>>>> >     root at keycloak02:/# tcpdump -i eth0 port 7600 or port 55200 or port 45700 or port 45688 or port 23364 or port 4712 or port 4713
>>>> >     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>>> >     listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
>>>> >     11:12:14.218317 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>> >     11:12:23.146798 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>> >     11:12:27.201888 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>> >
>>>> >
>>>> >
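
A quick way to check multicast delivery end to end, independent of JGroups,
is iperf in UDP multicast mode (iperf2 syntax; the group address matches
the JGroups default seen in the dumps above):

    # on keycloak01: join the multicast group 230.0.0.4 and listen
    iperf -s -u -B 230.0.0.4 -i 1

    # on keycloak02: send UDP traffic to the group (raised TTL for routed hops)
    iperf -c 230.0.0.4 -u -T 32 -t 3 -i 1

If the listening side never reports traffic, the packets are dropped before
delivery, which is what the firewalld rule turned out to be doing here.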
>>>> > Here go the log entries. I filtered by “view”. This is from
>>>> > Keycloak01.
>>>> >
>>>> >     11:16:57,896 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>> >     11:16:57,896 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000094: Received new cluster view for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>> >     11:16:57,897 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>> >     11:16:57,898 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>> >     11:16:57,962 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>> >
>>>> > I expected it to be only one. I mean, I first started Keycloak01,
>>>> > and only then Keycloak02. Next, we have the logs from Keycloak02.
>>>> >
>>>> >     11:17:34,950 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>> >     11:17:34,952 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>> >     11:17:34,957 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>> >     11:17:34,957 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000094: Received new cluster view for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>> >     11:17:35,052 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>> >
>>>> >
>>>> > They are similar. It seems that the two applications do not see
>>>> > each other. At first, I thought that the problem was caused by the
>>>> > “owners=1” configuration (the lack of data synchronization between
>>>> > replicas). I then changed it to “owners=2”, but still, if I log in
>>>> > on Keycloak01 and then force my request to go to Keycloak02, my
>>>> > session is not there, and I am asked to log in again.
>>>> >
>>>> > Do you need some other log entries or configuration files?
>>>> >
>>>> > Again, thanks for your reply and help!
>>>> >
>>>> > On Thu, Aug 23, 2018 at 5:24 AM, Sebastian Laskawiec <slaskawi at redhat.com> wrote:
>>>> >
>>>> >
>>>> > On Wed, Aug 22, 2018 at 10:24 PM Rafael Weingärtner <rafaelweingartner at gmail.com> wrote:
>>>> >
>>>> > Hello Keycloakers,
>>>> >
>>>> > I have some doubts regarding Keycloak and load balancers. I set up
>>>> > two Keycloak replicas to provide HA. To start them I am using
>>>> > “./standalone.sh --server-config=standalone-ha.xml”. I am assuming
>>>> > that they will use multicast to replicate information between
>>>> > nodes, right?
>>>> >
>>>> >
>>>> > That is correct. It uses the PING protocol, which in turn uses IP
>>>> > Multicasting for discovery.
>>>> >
>>>> > Note that IP Multicasting is disabled in many data centers (I have
>>>> > never found out why they do it, but I've seen it many, many times).
>>>> > So make sure your cluster forms correctly (just grep the logs and
>>>> > look for "view").
>>>> >
>>>> > Then, I set up a load balancer layer using Apache HTTPD and the
>>>> > AJP connector via port 8009. To make everything work I needed to
>>>> > use sticky sessions; otherwise, the login would never happen. I am
>>>> > fine with sticky sessions; however, if I stop the replica where the
>>>> > user is logged in, then when the user accesses Keycloak again,
>>>> > he/she is asked to present credentials as if he/she had not been
>>>> > logged in on the other Keycloak replica. Is that the expected
>>>> > behavior?
>>>> >
>>>> >
>>>> > My intuition tells me that your cluster didn't form correctly (as
>>>> > I mentioned before, grep the logs and look for the "view" entries
>>>> > generated by JGroups). Therefore, when you enable sticky sessions,
>>>> > all your requests go to the same Keycloak instance, which has
>>>> > everything in its local cache. That's why it works fine.
>>>> >
>>>> >
>>>> > Is there some troubleshooting or test that I can perform to check
>>>> > whether replication is being executed?
>>>> >
>>>> >
>>>> > Let's start with investigating the logs. Later on we can check JMX.
>>>> >
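
For the log check suggested above, something along these lines is usually
enough (the path assumes the default standalone layout):

    # every node should eventually log a view listing *all* members, e.g.
    # [keycloak01|1] (2) [keycloak01, keycloak02]
    grep -i "view" standalone/log/server.log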
>>>> >
>>>> > --
>>>> > Rafael Weingärtner
>>>> > _______________________________________________
>>>> > keycloak-user mailing list
>>>> > keycloak-user at lists.jboss.org
>>>> > https://lists.jboss.org/mailman/listinfo/keycloak-user
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Rafael Weingärtner
>>>> >
>>>>
>>>> -- Bela Ban | http://www.jgroups.org
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Rafael Weingärtner
>>>>
>>>
>>> --
>>> Bela Ban | http://www.jgroups.org
>>>
>>>
>>
>>
>> --
>> Rafael Weingärtner
>>
>
>
>
> --
> Rafael Weingärtner
>