[keycloak-user] Does Keycloak need sticky session at the load balancer?
Rafael Weingärtner
rafaelweingartner at gmail.com
Thu Aug 30 07:02:45 EDT 2018
Awesome, thanks for the help, Sebastian. I have a question regarding these
"owners" numbers. What happens if I set this number to (let's say) 10 and I
only spin up 7 nodes? Is that a valid deployment? And will everything work
just fine, or would I start to get errors?
On Thu, Aug 30, 2018 at 5:02 AM, Sebastian Laskawiec <slaskawi at redhat.com>
wrote:
> On Wed, Aug 29, 2018 at 3:27 PM Rafael Weingärtner
> <rafaelweingartner at gmail.com> wrote:
>
>> I think I will need a little bit of your wisdom again.
>>
>> I am now seeing the cluster between my Keycloak replicas to be created:
>>
>>> 13:03:03,800 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>> (MSC service thread 1-2) ISPN000079: Channel ejb local address is
>>> keycloak01, physical addresses are [192.168.1.58:55200]
>>> 13:03:03,801 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>> (MSC service thread 1-1) ISPN000094: Received new cluster view for channel
>>> ejb: [keycloak02|1] (2) [keycloak02, keycloak01]
>>>
>>
>> The problem is that when I shutdown one of them, a logged user will
>> receive the following message:
>>
>>> An internal server error has occurred
>>>
>>
>>
>> Then, in the log files I see the following:
>>
>>> 13:18:04,149 ERROR [org.infinispan.interceptors.InvocationContextInterceptor]
>>> (default task-24) ISPN000136: Error executing command GetKeyValueCommand,
>>> writing keys []: org.infinispan.util.concurrent.TimeoutException:
>>> Replication timeout
>>> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:639)
>>> 13:18:15,262 ERROR [org.infinispan.interceptors.InvocationContextInterceptor]
>>> (expiration-thread--p22-t1) ISPN000136: Error executing command
>>> RemoveExpiredCommand, writing keys [468d1940-7293-4824-9e86-4aece6cd6744]:
>>> org.infinispan.util.concurrent.TimeoutException: Replication timeout
>>> for keycloak02
>>>
>>
> I see you just killed the node (e.g. kill -9 <pid>), so that it exited
> without saying "goodbye". In that case the JGroups FD_* protocols on the
> other node need to do their work and discover the failure. If you have any
> commands in flight, they might fail. I highly encourage you to use a larger
> cluster (with an odd number of nodes if possible). Having only two nodes
> can be a bit dangerous: imagine a partition split; after the split heals,
> which node is right? Hard to tell...
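>
> If you want the surviving node to notice a hard-killed peer faster, a
> minimal sketch (assuming the default udp stack in standalone-ha.xml;
> property names and defaults vary by JGroups version) is to tighten
> FD_ALL:
>
>     <protocol type="FD_ALL">
>         <!-- hypothetical values: suspect a silent member after 10s -->
>         <property name="timeout">10000</property>
>         <!-- send a heartbeat every 2s -->
>         <property name="interval">2000</property>
>     </protocol>
>
> Lower timeouts detect failures sooner, but they also increase the risk
> of false suspicions under heavy load or long GC pauses.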
>
>
>>
>> I would say that this is expected as the node is down. However, it should
>> not be a problem for the whole system.
>>
>> My replication settings are the following:
>>
>>> <distributed-cache name="sessions" mode="SYNC" owners="2"/>
>>> <distributed-cache name="authenticationSessions" mode="SYNC" owners="2"/>
>>> <distributed-cache name="offlineSessions" mode="SYNC" owners="2"/>
>>> <distributed-cache name="clientSessions" mode="SYNC" owners="2"/>
>>> <distributed-cache name="offlineClientSessions" mode="SYNC" owners="2"/>
>>> <distributed-cache name="loginFailures" mode="SYNC" owners="2"/>
>>>
>>
>> Do I need to change something else?
>>
> Here's exactly the same problem. With owners=2 and 2 nodes, this is
> essentially a replicated cache (despite some differences in logic).
> I'd advise using at least 3 nodes (or even better, 5).
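>
> As a sketch, with the same cache names you already have (assuming the
> stock standalone-ha.xml): as far as I know, Infinispan caps owners at the
> current cluster size, so a value larger than the node count is not an
> error — you simply get extra copies once more nodes join:
>
>     <!-- 2 copies per entry; with 3 nodes this tolerates one failure -->
>     <distributed-cache name="sessions" mode="SYNC" owners="2"/>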
>
>>
>> On Wed, Aug 29, 2018 at 9:51 AM, Rafael Weingärtner
>> <rafaelweingartner at gmail.com> wrote:
>>
>>> Ah, no problem. It was my fault. I forgot to start debugging from the
>>> ground up (connectivity, firewalls, applications, and so on).
>>>
>>> On Wed, Aug 29, 2018 at 9:49 AM, Bela Ban <bban at redhat.com> wrote:
>>>
>>>> Excellent! Unfortunately, JGroups cannot detect this...
>>>>
>>>> On 29/08/18 14:42, Rafael Weingärtner wrote:
>>>>
>>>>> Thanks!
>>>>> The problem was caused by firewalld blocking Multicast traffic.
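>>>>>
>>>>> For the archives, a sketch of the kind of firewalld rules that
>>>>> unblock it (assuming the default jgroups ports seen earlier in
>>>>> this thread, 55200/udp unicast and 45688/udp multicast; adjust to
>>>>> your own configuration):
>>>>>
>>>>>     firewall-cmd --permanent --add-port=55200/udp
>>>>>     firewall-cmd --permanent --add-port=45688/udp
>>>>>     # multicast group membership relies on IGMP
>>>>>     firewall-cmd --permanent --add-protocol=igmp
>>>>>     firewall-cmd --reload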
>>>>>
>>>>> On Fri, Aug 24, 2018 at 7:28 AM, Sebastian Laskawiec
>>>>> <slaskawi at redhat.com> wrote:
>>>>>
>>>>> Great write-up! Bookmarked!
>>>>>
>>>>> On Thu, Aug 23, 2018 at 4:36 PM Bela Ban <bban at redhat.com> wrote:
>>>>>
>>>>> Have you checked
>>>>> https://github.com/belaban/workshop/blob/master/slides/admin.adoc#problem-1-members-don-t-find-each-other ?
>>>>>
>>>>> On 23/08/18 13:53, Sebastian Laskawiec wrote:
>>>>> > +Bela Ban <bban at redhat.com>
>>>>> >
>>>>> > As I expected, the cluster doesn't form.
>>>>> >
>>>>> > I'm not sure where and why those UDP discovery packets are
>>>>> > rejected. I just stumbled upon this thread [1], which you may
>>>>> > find useful. Maybe Bela will also have an idea what's going on
>>>>> > there.
>>>>> >
>>>>> > If you can't get UDP working, you can always fall back to
>>>>> > TCP (and MPING).
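>>>>> >
>>>>> > A minimal sketch of that fallback (assuming a WildFly-style
>>>>> > jgroups subsystem; element names vary by version) is to point
>>>>> > the default channel at the tcp stack:
>>>>> >
>>>>> >     <channels default="ee">
>>>>> >         <!-- the tcp stack still uses MPING (multicast) for
>>>>> >              discovery only; if multicast is blocked entirely,
>>>>> >              TCPPING with a static initial_hosts list is the
>>>>> >              usual substitute -->
>>>>> >         <channel name="ee" stack="tcp"/>
>>>>> >     </channels>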
>>>>> >
>>>>> > [1] https://serverfault.com/questions/211482/tools-to-test-multicast-routing
>>>>> >
>>>>> > On Thu, Aug 23, 2018 at 1:26 PM Rafael Weingärtner
>>>>> > <rafaelweingartner at gmail.com> wrote:
>>>>> >
>>>>> > Thanks for the reply Sebastian!
>>>>> >
>>>>> >
>>>>> > Note that IP Multicasting is disabled in many data centers (I
>>>>> > have never found out why they do it, but I've seen it many, many
>>>>> > times). So make sure your cluster forms correctly (just grep the
>>>>> > logs and look for "view").
>>>>> >
>>>>> >
>>>>> > I thought about that. Then I used tcpdump, and I can see the
>>>>> > multicast packets from both Keycloak replicas. However, it seems
>>>>> > that these packets are being ignored.
>>>>> >
>>>>> > root at Keycloak01:/# tcpdump -i eth0 port 7600 or port 55200 or
>>>>> > port 45700 or port 45688 or port 23364 or port 4712 or port 4713
>>>>> > tcpdump: verbose output suppressed, use -v or -vv for full
>>>>> > protocol decode
>>>>> > listening on eth0, link-type EN10MB (Ethernet), capture size
>>>>> > 262144 bytes
>>>>> > 11:13:36.540080 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>>> > 11:13:41.288449 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>>> > 11:13:46.342606 IP keycloak02.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>>> >
>>>>> >
>>>>> > root at keycloak02:/# tcpdump -i eth0 port 7600 or port 55200 or
>>>>> > port 45700 or port 45688 or port 23364 or port 4712 or port 4713
>>>>> > tcpdump: verbose output suppressed, use -v or -vv for full
>>>>> > protocol decode
>>>>> > listening on eth0, link-type EN10MB (Ethernet), capture size
>>>>> > 262144 bytes
>>>>> > 11:12:14.218317 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>>> > 11:12:23.146798 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>>> > 11:12:27.201888 IP Keycloak01.local.55200 > 230.0.0.4.45688: UDP, length 83
>>>>> >
>>>>> >
>>>>> >
>>>>> > Here go the log entries, filtered by “view”. This is from
>>>>> > Keycloak01:
>>>>> >
>>>>> > 11:16:57,896 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-4) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>>> > 11:16:57,896 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-2) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>>> > 11:16:57,897 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>>> > 11:16:57,898 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-3) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>>> > 11:16:57,962 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak01|0] (1) [keycloak01]
>>>>> >
>>>>> >
>>>>> > I expected there to be only one view; I first started
>>>>> > Keycloak01, and only then Keycloak02. Next, we have the logs
>>>>> > from Keycloak02:
>>>>> >
>>>>> > 11:17:34,950 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-3) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>>> > 11:17:34,952 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-4) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>>> > 11:17:34,957 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>>> > 11:17:34,957 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-2) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>>> > 11:17:35,052 INFO
>>>>> > [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>>> > (MSC service thread 1-1) ISPN000094: Received new cluster view
>>>>> > for channel ejb: [keycloak02|0] (1) [keycloak02]
>>>>> >
>>>>> >
>>>>> > They are similar. It seems that the two applications are not
>>>>> > seeing each other. At first, I thought that the problem was
>>>>> > caused by the “owners=1” configuration (the lack of data
>>>>> > synchronization between replicas). I then changed it to
>>>>> > “owners=2”, but still, if I log in on Keycloak01 and then force
>>>>> > my request to go to Keycloak02, my session is not there, and I
>>>>> > am asked to log in again.
>>>>> >
>>>>> > Do you need some other log entries or configuration files?
>>>>> >
>>>>> > Again, thanks for your reply and help!
>>>>> >
>>>>> > On Thu, Aug 23, 2018 at 5:24 AM, Sebastian Laskawiec
>>>>> > <slaskawi at redhat.com> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Wed, Aug 22, 2018 at 10:24 PM Rafael Weingärtner
>>>>> > <rafaelweingartner at gmail.com> wrote:
>>>>> >
>>>>> > Hello Keycloakers,
>>>>> >
>>>>> > I have some doubts regarding Keycloak and load balancers. I set
>>>>> > up two Keycloak replicas to provide HA. To start them I am using
>>>>> > “./standalone.sh --server-config=standalone-ha.xml”. I am
>>>>> > assuming that they will use multicast to replicate information
>>>>> > between nodes, right?
>>>>> >
>>>>> >
>>>>> > That is correct. It uses the PING protocol, which in turn uses
>>>>> > IP Multicasting for discovery.
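>>>>> >
>>>>> > (Roughly, in the default standalone-ha.xml jgroups subsystem;
>>>>> > the exact layout varies by version:
>>>>> >
>>>>> >     <stack name="udp">
>>>>> >         <transport type="UDP" socket-binding="jgroups-udp"/>
>>>>> >         <protocol type="PING"/>
>>>>> >         <!-- failure detection, state transfer, etc. follow -->
>>>>> >     </stack>
>>>>> >
>>>>> > where the multicast address and port come from the jgroups-udp
>>>>> > socket binding.)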
>>>>> >
>>>>> > Note that IP Multicasting is disabled in many data centers (I
>>>>> > have never found out why they do it, but I've seen it many, many
>>>>> > times). So make sure your cluster forms correctly (just grep the
>>>>> > logs and look for "view").
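>>>>> >
>>>>> > (For example, something along these lines, assuming the default
>>>>> > log location:
>>>>> >
>>>>> >     grep ISPN000094 standalone/log/server.log
>>>>> >
>>>>> > On a healthy two-node cluster the latest view should list both
>>>>> > members, e.g. "[keycloak01|1] (2) [keycloak01, keycloak02]".)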
>>>>> >
>>>>> > Then, I set up a load balancer layer using Apache HTTPD and the
>>>>> > AJP connector via port 8009. To make everything work I needed to
>>>>> > use sticky sessions; otherwise, the login would never happen. I
>>>>> > am fine with sticky sessions; however, if I stop the replica
>>>>> > where the user is logged in, when the user accesses Keycloak
>>>>> > again, he/she is asked to present credentials as if he/she were
>>>>> > not logged in on the other Keycloak replica. Is that the
>>>>> > expected behavior?
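>>>>> >
>>>>> > (A sketch of what I mean, assuming mod_proxy_balancer over AJP;
>>>>> > the route names and the AUTH_SESSION_ID stickiness cookie are my
>>>>> > assumptions here, not tested configuration:
>>>>> >
>>>>> >     <Proxy balancer://keycloak>
>>>>> >         BalancerMember ajp://keycloak01:8009 route=keycloak01
>>>>> >         BalancerMember ajp://keycloak02:8009 route=keycloak02
>>>>> >         ProxySet stickysession=AUTH_SESSION_ID
>>>>> >     </Proxy>
>>>>> >     ProxyPass / balancer://keycloak/
>>>>> > )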
>>>>> >
>>>>> >
>>>>> > My intuition tells me that your cluster didn't form correctly
>>>>> > (as I mentioned before, grep the logs and look for the "view"
>>>>> > messages generated by JGroups). Therefore, if you enable sticky
>>>>> > sessions, all your requests go to the same Keycloak instance,
>>>>> > which has everything in its local cache. That's why it works
>>>>> > fine.
>>>>> >
>>>>> >
>>>>> > Is there some troubleshooting or test that I can perform to
>>>>> > check if replication is being executed?
>>>>> >
>>>>> >
>>>>> > Let's start with investigating the logs. Later on we can check
>>>>> > JMX.
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Rafael Weingärtner
>>>>> > _______________________________________________
>>>>> > keycloak-user mailing list
>>>>> > keycloak-user at lists.jboss.org
>>>>> > https://lists.jboss.org/mailman/listinfo/keycloak-user
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Rafael Weingärtner
>>>>> >
>>>>>
>>>>> --
>>>>> Bela Ban | http://www.jgroups.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Rafael Weingärtner
>>>>>
>>>>
>>>> --
>>>> Bela Ban | http://www.jgroups.org
>>>>
>>>>
>>>
>>>
>>> --
>>> Rafael Weingärtner
>>>
>>
>>
>>
>> --
>> Rafael Weingärtner
>>
>
--
Rafael Weingärtner