[keycloak-user] Cross-DC Replication not working for `sessions` cache

Sebastian Laskawiec slaskawi at redhat.com
Thu Aug 23 08:11:29 EDT 2018


On Tue, Aug 21, 2018 at 3:12 PM Hayden Fuss <hfuss at bandwidth.com> wrote:

> Hey guys,
>
> Thank you for the updates! We'll stick to Infinispan 8.2.8 so that there
> are no surprises.
>
> We upgraded JGroups 3 and added KUBE_PING to Infinispan 8.2.8, as well as
> to Keycloak, and with that we've gotten cross-DC working with two Keycloaks
> and two ISPNs in each DC.
>
> In our first round of HA testing, Keycloak's OIDC endpoints have been
> fairly resilient when unable to connect to LDAP, MariaDB, and the whole ISPN
> cluster (we just destroy the OpenShift Services and wait 5 minutes while
> testing the endpoints). However, we've noticed that if we delete an ISPN pod
> forcefully, we'll experience some timeouts with the
> /token?username&password grant as the *new* pod comes up.
>
>
If I remember correctly, during the summit we used synchronously replicated
caches, so each write needs to be acknowledged by all the nodes before
Infinispan's put method returns. If you're not doing a graceful leave (e.g.
killing pods with a very short termination time), a killed node won't be
able to say "goodbye" to the cluster, and the cluster will need to detect
that the node died. That's why you observe that kind of behavior.
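
One mitigation is to give the pods enough time for a clean JGroups leave. A
sketch (the DeploymentConfig name "infinispan-app" and the 60-second value
are assumptions, adjust them to your setup):

oc patch dc/infinispan-app --type=json -p \
  '[{"op":"add","path":"/spec/template/spec/terminationGracePeriodSeconds","value":60}]'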


> We believe it's due to our liveness/readiness probes being too optimistic,
> since ISPN 8.2.8 does not have a health check like ISPN 9.X. I've been
> unable to find a prescribed way of health checking ISPN 8.2.8.
>

It's tricky for 8.2.8. In general you will need to use an approach similar
to is_running.sh but query different fields. Maybe try something like this
(of course you may need to adjust the cache container and cache names):
/subsystem=infinispan/cache-container=keycloak/replicated-cache=work:read-attribute(name=cache-status)
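
For example, a readiness script could run that query through the server CLI
and check for a RUNNING status. A sketch (the CLI path below matches the
jboss/infinispan-server image layout, which is an assumption on my side):

# exits 0 only when the "work" cache reports RUNNING
/opt/jboss/infinispan-server/bin/ispn-cli.sh --connect \
  --command="/subsystem=infinispan/cache-container=keycloak/replicated-cache=work:read-attribute(name=cache-status)" \
  | grep -q '"RUNNING"'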


>
> For now I'm waiting for the 9990 socket to open as the liveness probe and
> reusing the is_running.sh from ISPN 9.X as the readiness probe (attached).
> With these, ISPN pods are considered "Ready" to receive traffic from the
> OpenShift Service much sooner than they were when we used the probes that
> came with ISPN 9.X. Aside from setting the delay on the probes to be
> longer, do either of you know a more accurate way to health check ISPN 8.2.8?
>

You may try to use the CLI and extract some values out of it. However, I
usually advise being conservative and giving Infinispan some time to become
ready.
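
For instance, something like this (a sketch; the DeploymentConfig name,
script path, and delay values are assumptions):

oc set probe dc/infinispan-app --liveness \
  --open-tcp=9990 --initial-delay-seconds=60
oc set probe dc/infinispan-app --readiness --initial-delay-seconds=120 \
  -- /opt/jboss/infinispan-server/bin/is_running.sh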


>
> Thanks again for the time and info. We greatly appreciate it as it's been
> very helpful!
>
> Best,
> Hayden
>
> On Tue, Aug 21, 2018 at 5:26 AM Marek Posolda <mposolda at redhat.com> wrote:
>
>> On 11/08/18 14:26, Sebastian Laskawiec wrote:
>>
>>
>>
>> On Fri, Aug 10, 2018 at 9:59 PM Hayden Fuss <hfuss at bandwidth.com>
>> wrote:
>>
>>> Hello Sebastian and Marek,
>>>
>>> Thank you very much for the suggestions. We had confirmed replication across
>>> the ISPN clusters was working with the CLI, so we tried attaching the
>>> remote debugger but didn't find anything useful to tell us why Keycloak
>>> couldn't remotely store the sessions in the ISPN cluster.
>>>
>>
>> Thanks for letting us know.
>>
>>
>>> Based on what Marek described, we decided to downgrade our ISPN cluster
>>> to 8.2.8 rather than use 9.3.1 and incorporate the demo code. It was our
>>> understanding that the demo code would provide an SPI that enabled the
>>> ISPN cluster for persistent user storage (but not realms, clients, or
>>> keys), which is not desirable for us as of now.
>>>
>>
>> Hmmm that's pretty interesting. For the Summit demo we used a fresh
>> master build. So ISPN 9.x should work without any problems. Perhaps Marek
>> can shed some light on this issue.
>>
>> The current Keycloak master supports cross-dc integration with infinispan
>> server 8.2.8.Final and JDG 7.1. That's what we are testing and what is
>> officially described as the recommended infinispan-server version in our
>> documentation:
>> https://www.keycloak.org/docs/latest/server_installation/index.html#crossdc-mode
>>
>> In the recent PR to upgrade Keycloak to Wildfly 13, there will also be an
>> upgrade to JDG 7.2 and infinispan server 9.2.4.Final (this is the same as
>> the infinispan version in Wildfly 13).
>>
>> The summit demo used the infinispan server 9.3 AFAIR, but this required
>> some updates in the Keycloak code, which were done by overriding the
>> default userSessions provider with the "updated-infinispan" provider. The
>> code of this updated-infinispan provider is in the rh-sso project sources:
>>
>> https://github.com/rhdemo/rh-sso/blob/master/standalone-openshift-cfg/configuration/standalone-openshift-jdg.xml#L676-L681
>>
>> Even with this overridden provider, I've tested just the Keycloak parts
>> that were needed for the demo itself. I did not try to run our cross-dc
>> automated tests, so there is no guarantee that everything works as expected.
>>
>> In other words, if you have a choice of infinispan-server version and you
>> don't need infinispan-server 9.X, it's recommended to stay with
>> infinispan-server 8.2.8.
>>
>> Marek
>>
>>
>> BTW, do you have a demo pushed into some repo, so that we could check it
>> out?
>>
>>
>>> Downgrading to 8.2.8 (we had to create our own image,
>>> https://github.com/brix4dayz/infinispan/tree/8.2.x) fixed our sessions
>>> replication issue. The only thing is that KUBE_PING/DNS_PING isn't
>>> available with the JGroups version that comes with 8.2.8. Based on what
>>> I'm seeing in this PR, https://github.com/jboss-dockerfiles/keycloak/pull/96/files,
>>> it's possible to add a newer version of JGroups to Keycloak, so I'll
>>> attempt to do the same for ISPN so we can have local clustering for ISPN
>>> and Keycloak in OpenShift.
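>>>
>>> The swap I have in mind looks roughly like this (a sketch; the module
>>> path and JGroups version are assumptions, check the PR above for the
>>> exact files):
>>>
>>> # replace the bundled JGroups jar inside the server's module directory
>>> JGROUPS_DIR=$ISPN_HOME/modules/system/layers/base/org/jgroups/main
>>> curl -L -o $JGROUPS_DIR/jgroups-3.6.16.Final.jar \
>>>   https://repo1.maven.org/maven2/org/jgroups/jgroups/3.6.16.Final/jgroups-3.6.16.Final.jar
>>> # ...and update module.xml to reference the new jar name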
>>>
>>
>> KUBE_PING has basically two versions: 1.x, which requires JGroups 4, and
>> 0.9.x, which works with JGroups 3 and 4. Let me know if you hit any
>> problems incorporating KUBE_PING into your project. I might be able to help
>> you.
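>>
>> One thing to remember for either version: KUBE_PING queries the Kubernetes
>> API to find peer pods, so the pods' service account needs permission to
>> list pods in the namespace. In OpenShift that is typically (the project
>> name "myproject" is an assumption):
>>
>> oc policy add-role-to-user view \
>>   system:serviceaccount:myproject:default -n myproject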
>>
>>
>>> If there's a better way to go about the JGroups version issue, let us
>>> know. Thanks again!
>>>
>>
>> TBH I'm really interested in why Keycloak doesn't store sessions in ISPN.
>> In my opinion, we should find out how to fix this problem and stay with
>> ISPN 9. I would recommend downgrading ISPN only as a last resort.
>>
>>
>>> Best,
>>> Hayden
>>>
>>> On Thu, Aug 9, 2018 at 3:27 AM Marek Posolda <mposolda at redhat.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I didn't check everything, but one thing I noticed is that in your
>>>> keycloak-standalone-ha.xml you don't have the "alternative" providers
>>>> configured.
>>>>
>>>> For Keycloak to work with the infinispan 9.2.X server or newer, the
>>>> providers need to be configured like this:
>>>>
>>>> https://github.com/rhdemo/rh-sso/blob/master/standalone-openshift-cfg/configuration/standalone-openshift-jdg.xml#L676-L681
>>>>
>>>> There is also a need to add the userStorage to your realm, which can be
>>>> done through the admin console or by importing the realm. See:
>>>> https://github.com/rhdemo/rh-sso/blob/master/realm-summit.json#L1051
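>>>>
>>>> If you go the import route, one option is Keycloak's standard migration
>>>> properties at startup (a sketch; the file path is an assumption):
>>>>
>>>> bin/standalone.sh \
>>>>   -Dkeycloak.migration.action=import \
>>>>   -Dkeycloak.migration.provider=singleFile \
>>>>   -Dkeycloak.migration.file=/opt/jboss/realm-summit.json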
>>>>
>>>> Marek
>>>>
>>>>
>>>> On 08/08/18 15:07, Sebastian Laskawiec wrote:
>>>> > On Tue, Aug 7, 2018 at 3:28 PM Hayden Fuss <hfuss at bandwidth.com>
>>>> > wrote:
>>>> >
>>>> >> Hello,
>>>> >>
>>>> >> We are attempting to run Keycloak on two OpenShift clusters using
>>>> >> remote ISPNs and a single MariaDB instance. We're hacking together the
>>>> >> Keycloak on OpenShift blog post, the JDG-as-a-service demo from Summit,
>>>> >> the RH SSO demo from Summit, and following the Keycloak/RH SSO basic
>>>> >> setup guide for Cross-DC replication. The hope is to do an initial
>>>> >> evaluation of Keycloak's availability.
>>>> >>
>>>> >> We were able to create a new user on master (site1), disable the user
>>>> >> on master2 (site2), and see the user was disabled on master. So ISPN
>>>> >> replication seems to be working, because the work cache was replicated
>>>> >> to invalidate the local caches. However, the sessions cache does not
>>>> >> seem to be replicated, because when logged in as the same user on the
>>>> >> two different Keycloaks (in Incognito mode) there is only one active
>>>> >> session shown on both UIs and the timestamp/IP/etc. is different for
>>>> >> the listed session.
>>>> >>
>>>> > So at this point the Infinispan cluster within a single DC works
>>>> > correctly [1] (the one that is formed by KUBE_PING). The Cross-DC
>>>> > cluster (also known as the Global Cluster) also works correctly [2].
>>>> > The users cache replicates fine but sessions don't.
>>>> >
>>>> > If I understood everything correctly, there might be two issues there.
>>>> >
>>>> > The first one is an Infinispan misconfiguration (I briefly looked
>>>> > through the configuration and cannot spot any mistake, but there might
>>>> > be some typo or something like that). That one is easy to verify: just
>>>> > put an entry on one node (e.g. using REST [3]) and see if it's
>>>> > available on the other one (again using REST, for example [4]).
>>>> >
>>>> > If this test works fine, you can check whether Keycloak forwards
>>>> > traffic to the Infinispan cluster. The easiest way is to set a
>>>> > breakpoint somewhere in
>>>> > org.keycloak.models.sessions.infinispan.changes.sessions.LastSessionRefreshChecker#shouldSaveClientSessionToRemoteCache
>>>> > and
>>>> > org.keycloak.models.sessions.infinispan.changes.sessions.LastSessionRefreshChecker#shouldSaveUserSessionToRemoteCache.
>>>> >
>>>> > [1] can be verified by calling `oc logs infinispan-app | grep view`
>>>> > [2] can be verified by calling `oc logs infinispan-app | grep "x-site"`
>>>> > [3] curl -d test ISPN_IP:8080/rest/sessions/test
>>>> > [4] curl ISPN_IP2:8080/rest/sessions/test
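>>>> >
>>>> > To attach a debugger to Keycloak running in a pod, one option is to
>>>> > start the JVM with the JPDA agent and forward the port (a sketch; it
>>>> > assumes your image's launch script honors JAVA_OPTS_APPEND, and the
>>>> > pod name is a placeholder):
>>>> >
>>>> > oc set env dc/keycloak \
>>>> >   JAVA_OPTS_APPEND="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8787"
>>>> > oc port-forward <keycloak-pod> 8787:8787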
>>>> >
>>>> >
>>>> >> We are using the latest stable Keycloak image, version 4.1.0.Final,
>>>> >> and the latest stable Infinispan image to act as our data grid,
>>>> >> version 9.3.1.Final, which we know differs from the 8.2.8 version
>>>> >> Keycloak uses for its local caches.
>>>> >>
>>>> >> We were trying one Keycloak node and two ISPN nodes in each cluster,
>>>> >> but for simplicity we've attached logs where we only ran one Keycloak
>>>> >> and one ISPN in each cluster.
>>>> >> We were connecting to the two different Keycloaks via two different
>>>> >> OpenShift Routes without a load balancer, to fake sticky sessions for
>>>> >> now. Keycloak connects to ISPN via a "HotRod" Service. ISPN connects
>>>> >> to other nodes within the same cluster via KUBE_PING, and discovers
>>>> >> the other cluster via TCPPING hitting a particular OpenShift app node
>>>> >> from that cluster that exposes the "discovery" Service with a
>>>> >> NodePort. The Keycloaks share the single MariaDB through a NodePort
>>>> >> Service in one of the clusters as well.
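>>>> >>
>>>> >> For reference, a discovery Service like that can be created roughly as
>>>> >> follows (the deployment name and the JGroups TCP port 7600 are
>>>> >> assumptions):
>>>> >>
>>>> >> oc expose dc/infinispan-app --name=discovery --port=7600 --target-port=7600
>>>> >> oc patch svc/discovery -p '{"spec":{"type":"NodePort"}}'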
>>>> >>
>>>> >> The logs didn't seem to contain any of the messages in the
>>>> >> troubleshooting guide. We had trouble using JMX to check the ISPNs
>>>> >> because they were running in containers, but we've been using the CLI
>>>> >> tool and the Infinispan management console to try to troubleshoot.
>>>> >> However, any key we pulled from the logs that we thought was a session
>>>> >> ID was not in the caches, and we could not find a way to simply list
>>>> >> all keys in the caches.
>>>> >>
>>>> >> Below is a viewable link to a zip containing logs from the scenario
>>>> >> described in the second paragraph, and our config files:
>>>> >>
>>>> >> https://drive.google.com/open?id=0B_OCdNCEtoCYOU12T3dEUFplS193VFNFbEFYclB4Tm5WR0o4
>>>> >>
>>>> >> Thanks for your time and help!
>>>> >>
>>>> >> Best,
>>>> >> Hayden
>>>>
>>>>
>>>>
>>


More information about the keycloak-user mailing list