[keycloak-user] Standalone HA tokens not immediately shared among nodes

D V dv at glyphy.com
Tue Sep 25 14:55:12 EDT 2018


Thanks for the responses, folks. The issue now isn't the inability to set
all caches to replicated. It's that a get-user-by-id lookup is performed
whenever a node has to process an authentication via a refresh token that
wasn't also issued by that same node. See the last paragraph of
http://lists.jboss.org/pipermail/keycloak-user/2018-September/015549.html .
The results are cached, but only on the original issuing node. I was
expecting the user-by-id information to be shared between Keycloak nodes
to avoid external service calls, but perhaps this is by design? If so,
could you explain why?
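
A rough sketch, in Java, of the lookup I mean (not Keycloak's actual code
path, just the shape of it): validating a refresh token forces the node to
resolve the user by id, and a miss in the node-local "users" cache falls
through to the user storage SPI, i.e. the external service.

    import org.keycloak.models.KeycloakSession;
    import org.keycloak.models.RealmModel;
    import org.keycloak.models.UserModel;

    class RefreshLookupSketch {
        static UserModel resolveUser(KeycloakSession session, RealmModel realm, String subject) {
            // On the issuing node this hits the local cache; on any other
            // node it misses and triggers the external user-storage call.
            return session.users().getUserById(subject, realm);
        }
    }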

On Tue, Sep 25, 2018 at 9:14 AM Marek Posolda <mposolda at redhat.com> wrote:

> Some more info about our caches:
> https://www.keycloak.org/docs/latest/server_installation/index.html#cache-configuration
>
> Not sure if this info should be updated and some more things should be
> clarified?
>
> Marek
>
> On 25/09/18 15:12, Marek Posolda wrote:
>
> Sorry, I did not read the whole thread.
>
> Just a quick note that the caches "realms", "users", "keys" and
> "authorization" are supposed to be local caches. The pattern we're using
> ATM is that every cluster node caches its data (realms, users, etc.)
> locally. When some object is updated (e.g. a realm or user), a separate
> cache, "work", makes sure to notify the other cluster nodes (or even
> nodes in all the other DCs), so all the nodes can invalidate the
> particular cached object from their caches.
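>
> A minimal sketch of this pattern in plain Java (the classes below are
> illustrative stand-ins, not Keycloak's actual implementation): reads go
> through a node-local cache, and updates broadcast an invalidation so
> peers drop their stale copy.
>
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>     import java.util.function.Consumer;
>     import java.util.function.Function;
>
>     class LocallyCachedStore<K, V> {
>         private final Map<K, V> localCache = new ConcurrentHashMap<>();
>         private final Function<K, V> loader;   // e.g. a database or remote-service lookup
>         private final WorkChannel workChannel; // hypothetical stand-in for the replicated "work" cache
>
>         LocallyCachedStore(Function<K, V> loader, WorkChannel workChannel) {
>             this.loader = loader;
>             this.workChannel = workChannel;
>             // Every node registers for invalidation events broadcast by its peers.
>             workChannel.onInvalidation(localCache::remove);
>         }
>
>         V get(K key) {
>             // A miss falls through to the loader; the result stays on this node only.
>             return localCache.computeIfAbsent(key, loader);
>         }
>
>         void update(K key, V value) {
>             localCache.put(key, value);
>             // Tell the other cluster nodes (and other DCs) to drop their stale copy.
>             workChannel.broadcastInvalidation(key);
>         }
>     }
>
>     interface WorkChannel {
>         void broadcastInvalidation(Object key);
>         void onInvalidation(Consumer<Object> listener);
>     }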
>
> Caches "realms", "users", "keys" and "authorization" are not meant to be
> replicated/distributed, but local. So this NotSerializableException doesn't
> look like a bug to me.
>
> Marek
>
> On 25/09/18 12:06, Sebastian Laskawiec wrote:
>
> Thanks a lot for checking this.
>
> This seems like a bug to me, so I filed
> https://issues.jboss.org/browse/KEYCLOAK-8415. Unfortunately, we are
> preparing for some urgent work on the product side and I can't promise
> when we will be able to look into this. I highly encourage you to
> contribute a fix if you are in a hurry, or just subscribe to the ticket
> and wait till we find a free slot to get it fixed.
>
> Thanks,
> Sebastian
>
> On Thu, Sep 20, 2018 at 4:27 PM D V <dv at glyphy.com> wrote:
>
>> OK. So, with all caches being replicated, there's an error on startup:
>>
>> 2018-09-20 14:03:38,307 ERROR [org.infinispan.remoting.rpc.RpcManagerImpl] (ServerService Thread Pool -- 62) ISPN000073: Unexpected error while replicating:
>> org.infinispan.commons.marshall.NotSerializableException: org.keycloak.models.PasswordPolicy$Builder
>> Caused by: an exception which occurred:
>>         in field org.keycloak.models.PasswordPolicy.builder
>>         in object org.keycloak.models.PasswordPolicy@6ab5350d
>>         in field org.keycloak.models.cache.infinispan.entities.CachedRealm.passwordPolicy
>>         in object org.keycloak.models.cache.infinispan.entities.CachedRealm@7864be21
>>         in object org.keycloak.models.cache.infinispan.entities.CachedRealm@7864be21
>>         in object org.infinispan.commands.write.PutKeyValueCommand@fec4dc5e
>>         in object org.infinispan.commands.remote.SingleRpcCommand@3f2e5d1a
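>>
>> As a minimal illustration of that failure mode (a self-contained sketch,
>> not Keycloak's classes): Java serialization walks the whole object graph,
>> so a single non-serializable field, like PasswordPolicy's builder,
>> poisons the containing cache entry.
>>
>>     import java.io.ByteArrayOutputStream;
>>     import java.io.IOException;
>>     import java.io.ObjectOutputStream;
>>     import java.io.Serializable;
>>
>>     class SerializationSketch {
>>         static class Builder {}              // not Serializable, like PasswordPolicy$Builder
>>         static class Policy implements Serializable {
>>             Builder builder = new Builder(); // one bad field poisons the graph
>>         }
>>
>>         public static void main(String[] args) throws IOException {
>>             // Throws java.io.NotSerializableException: SerializationSketch$Builder
>>             new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(new Policy());
>>         }
>>     }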
>>
>> If I make the realms cache local but leave the rest replicated, I observe
>> the same behaviour: the node that didn't issue the original set of
>> refresh/access tokens does a getUserById lookup, which in my case results
>> in a network call against a remote service.
>>
>> I noticed there are caches running that aren't mentioned in the config,
>> like userRevisions. These are local, and adding them to the config as
>> replicated doesn't actually make them so.
>>
>> On Thu, Sep 20, 2018 at 7:36 AM Sebastian Laskawiec <slaskawi at redhat.com>
>> wrote:
>>
>>> Could you please try to unify the caches? Please replace all local-cache
>>> and distributed-cache with replicated-cache.
>>>
>>> Even though using distributed caches instead of replicated ones
>>> shouldn't be the cause, I think those local caches might cause the
>>> behavior you're describing.
>>>
>>> On Wed, Sep 19, 2018 at 3:21 PM D V <dv at glyphy.com> wrote:
>>>
>>>> Makes sense re: replicated caches. Here's my infinispan subsystem
>>>> config right now:
>>>>
>>>> <subsystem xmlns="urn:jboss:domain:infinispan:4.0">
>>>>     <cache-container name="keycloak" jndi-name="infinispan/Keycloak" statistics-enabled="true">
>>>>         <transport lock-timeout="60000"/>
>>>>         <local-cache name="realms" statistics-enabled="true">
>>>>             <eviction max-entries="10000" strategy="LRU"/>
>>>>         </local-cache>
>>>>         <local-cache name="users" statistics-enabled="true">
>>>>             <eviction max-entries="10000" strategy="LRU"/>
>>>>         </local-cache>
>>>>
>>>>         <!--
>>>>         These two need to be replicated, or the node that didn't issue the initial
>>>>         refresh token will return "invalid_grant" errors when attempting to auth
>>>>         with that refresh token.
>>>>         -->
>>>>         <replicated-cache name="sessions" statistics-enabled="true"/>
>>>>         <replicated-cache name="clientSessions" statistics-enabled="true"/>
>>>>
>>>>         <distributed-cache name="authenticationSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>>         <distributed-cache name="offlineSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>>         <distributed-cache name="offlineClientSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>>         <distributed-cache name="loginFailures" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>>         <local-cache name="authorization" statistics-enabled="true">
>>>>             <eviction max-entries="10000" strategy="LRU"/>
>>>>         </local-cache>
>>>>         <replicated-cache name="work" mode="SYNC" statistics-enabled="true"/>
>>>>         <local-cache name="keys" statistics-enabled="true">
>>>>             <eviction max-entries="1000" strategy="LRU"/>
>>>>             <expiration max-idle="3600000"/>
>>>>         </local-cache>
>>>>         <distributed-cache name="actionTokens" mode="SYNC" owners="2" statistics-enabled="true">
>>>>             <eviction max-entries="-1" strategy="NONE"/>
>>>>             <expiration max-idle="-1" interval="300000"/>
>>>>         </distributed-cache>
>>>>     </cache-container>
>>>>     <cache-container name="server" aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server">
>>>>         <transport lock-timeout="60000"/>
>>>>         <replicated-cache name="default">
>>>>             <transaction mode="BATCH"/>
>>>>         </replicated-cache>
>>>>     </cache-container>
>>>>     <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
>>>>         <transport lock-timeout="60000"/>
>>>>         <distributed-cache name="dist">
>>>>             <locking isolation="REPEATABLE_READ"/>
>>>>             <transaction mode="BATCH"/>
>>>>             <file-store/>
>>>>         </distributed-cache>
>>>>     </cache-container>
>>>>     <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
>>>>         <transport lock-timeout="60000"/>
>>>>         <distributed-cache name="dist">
>>>>             <locking isolation="REPEATABLE_READ"/>
>>>>             <transaction mode="BATCH"/>
>>>>             <file-store/>
>>>>         </distributed-cache>
>>>>     </cache-container>
>>>>     <cache-container name="hibernate" default-cache="local-query" module="org.hibernate.infinispan">
>>>>         <transport lock-timeout="60000"/>
>>>>         <local-cache name="local-query">
>>>>             <eviction strategy="LRU" max-entries="10000"/>
>>>>             <expiration max-idle="100000"/>
>>>>         </local-cache>
>>>>         <invalidation-cache name="entity">
>>>>             <transaction mode="NON_XA"/>
>>>>             <eviction strategy="LRU" max-entries="10000"/>
>>>>             <expiration max-idle="100000"/>
>>>>         </invalidation-cache>
>>>>         <replicated-cache name="timestamps" mode="ASYNC"/>
>>>>     </cache-container>
>>>> </subsystem>
>>>>
>>>> The scenario I'm testing:
>>>> 1. Auth with grant_type=password on node1.
>>>> 2. Shut down node1.
>>>> 3. Auth with grant_type=refresh_token on node2.
>>>>
>>>> When clientSessions is not replicated (distributed with owners=1, as
>>>> in the distribution's standalone-ha.xml), I get this on node2:
>>>> {
>>>>     "error": "invalid_grant",
>>>>     "error_description": "Session doesn't have required client"
>>>> }
>>>>
>>>> When sessions is not replicated:
>>>> {
>>>>     "error": "invalid_grant",
>>>>     "error_description": "Session not active"
>>>> }
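>>>>
>>>> For reference, a sketch of this test in plain Java (the node addresses,
>>>> realm, client id, and credentials are hypothetical placeholders, the
>>>> token endpoint assumes a public client, and step 2 is still done by
>>>> hand):
>>>>
>>>>     import java.io.IOException;
>>>>     import java.io.OutputStream;
>>>>     import java.net.HttpURLConnection;
>>>>     import java.net.URL;
>>>>     import java.nio.charset.StandardCharsets;
>>>>     import java.util.Scanner;
>>>>
>>>>     public class RefreshAcrossNodes {
>>>>         static String post(String node, String form) throws IOException {
>>>>             URL url = new URL("http://" + node + "/auth/realms/demo/protocol/openid-connect/token");
>>>>             HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>>>>             conn.setRequestMethod("POST");
>>>>             conn.setDoOutput(true);
>>>>             conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
>>>>             try (OutputStream out = conn.getOutputStream()) {
>>>>                 out.write(form.getBytes(StandardCharsets.UTF_8));
>>>>             }
>>>>             // Read the error stream on 4xx so the invalid_grant body above is visible.
>>>>             try (Scanner s = new Scanner(
>>>>                     conn.getResponseCode() < 400 ? conn.getInputStream() : conn.getErrorStream(),
>>>>                     StandardCharsets.UTF_8.name())) {
>>>>                 return s.useDelimiter("\\A").next();
>>>>             }
>>>>         }
>>>>
>>>>         public static void main(String[] args) throws IOException {
>>>>             // 1. Password grant on node1 (then shut node1 down before the next call).
>>>>             String first = post("node1:8080",
>>>>                     "grant_type=password&client_id=test-client&username=alice&password=secret");
>>>>             String refreshToken = extract(first, "refresh_token");
>>>>             // 3. Refresh on node2; without replicated sessions/clientSessions this
>>>>             //    prints the invalid_grant errors quoted above.
>>>>             System.out.println(post("node2:8080",
>>>>                     "grant_type=refresh_token&client_id=test-client&refresh_token=" + refreshToken));
>>>>         }
>>>>
>>>>         // Crude JSON field extraction, just to keep the sketch dependency-free.
>>>>         static String extract(String json, String field) {
>>>>             int i = json.indexOf("\"" + field + "\":\"") + field.length() + 4;
>>>>             return json.substring(i, json.indexOf('"', i));
>>>>         }
>>>>     }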
>>>>
>>>> On Wed, Sep 19, 2018 at 6:56 AM Sebastian Laskawiec <
>>>> slaskawi at redhat.com> wrote:
>>>>
>>>>> Thanks for letting us know, DV!
>>>>>
>>>>> Setting the number of owners equal to the cluster size doesn't make
>>>>> any sense; you might as well use a replicated cache in that scenario
>>>>> (which works the same way apart from some Infinispan-internal
>>>>> behavior that can be ignored in your case). Could you please paste
>>>>> your Infinispan configuration? Maybe there's some hint there...
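>>>>>
>>>>> As a short illustration of that equivalence (a sketch using
>>>>> Infinispan's programmatic API, not the subsystem XML Keycloak
>>>>> actually reads):
>>>>>
>>>>>     import org.infinispan.configuration.cache.CacheMode;
>>>>>     import org.infinispan.configuration.cache.Configuration;
>>>>>     import org.infinispan.configuration.cache.ConfigurationBuilder;
>>>>>
>>>>>     public class CacheModes {
>>>>>         // Distributed cache on a 2-node cluster with owners=2:
>>>>>         // every node ends up holding every entry...
>>>>>         static Configuration distributedWithFullOwnership() {
>>>>>             return new ConfigurationBuilder()
>>>>>                     .clustering().cacheMode(CacheMode.DIST_SYNC)
>>>>>                     .hash().numOwners(2)
>>>>>                     .build();
>>>>>         }
>>>>>
>>>>>         // ...which is what a replicated cache expresses directly.
>>>>>         static Configuration replicated() {
>>>>>             return new ConfigurationBuilder()
>>>>>                     .clustering().cacheMode(CacheMode.REPL_SYNC)
>>>>>                     .build();
>>>>>         }
>>>>>     }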
>>>>>
>>>>> Thanks,
>>>>> Seb
>>>>>
>>>>> On Tue, Sep 18, 2018 at 11:02 PM D V <dv at glyphy.com> wrote:
>>>>>
>>>>>> The issue was resolved in a somewhat unexpected way. I had a custom
>>>>>> org.keycloak.storage.UserStorageProviderFactory SPI registered that
>>>>>> returned providers implementing
>>>>>> org.keycloak.storage.user.UserLookupProvider, but the
>>>>>> org.keycloak.storage.user.UserLookupProvider#getUserById method
>>>>>> wasn't fully implemented: I just had it return null. It wasn't
>>>>>> obvious to me that it was required (or under what circumstances).
>>>>>> Once I implemented it, the experiments in my original message
>>>>>> passed. I did have to set owners to 2 for the "sessions" and
>>>>>> "clientSessions" distributed-cache Infinispan configs.
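>>>>>>
>>>>>> For anyone hitting the same thing, a minimal sketch of the missing
>>>>>> piece (signatures follow the Keycloak 4.x SPI current as of this
>>>>>> thread; RemoteUser and RemoteUserService are hypothetical stand-ins
>>>>>> for the external service, not part of Keycloak):
>>>>>>
>>>>>>     import org.keycloak.component.ComponentModel;
>>>>>>     import org.keycloak.models.KeycloakSession;
>>>>>>     import org.keycloak.models.RealmModel;
>>>>>>     import org.keycloak.models.UserModel;
>>>>>>     import org.keycloak.storage.StorageId;
>>>>>>     import org.keycloak.storage.UserStorageProvider;
>>>>>>     import org.keycloak.storage.adapter.AbstractUserAdapter;
>>>>>>     import org.keycloak.storage.user.UserLookupProvider;
>>>>>>
>>>>>>     public class RemoteUserStorageProvider implements UserStorageProvider, UserLookupProvider {
>>>>>>
>>>>>>         private final KeycloakSession session;
>>>>>>         private final ComponentModel model;
>>>>>>         private final RemoteUserService remote; // hypothetical client for the external user service
>>>>>>
>>>>>>         public RemoteUserStorageProvider(KeycloakSession session, ComponentModel model,
>>>>>>                                          RemoteUserService remote) {
>>>>>>             this.session = session;
>>>>>>             this.model = model;
>>>>>>             this.remote = remote;
>>>>>>         }
>>>>>>
>>>>>>         @Override
>>>>>>         public UserModel getUserById(String id, RealmModel realm) {
>>>>>>             // Federated ids look like "f:<component-id>:<external-id>";
>>>>>>             // StorageId.externalId() extracts the external part.
>>>>>>             RemoteUser user = remote.findById(StorageId.externalId(id));
>>>>>>             // Returning null unconditionally here is what broke refresh-token
>>>>>>             // auth on the node that had no locally cached copy of the user.
>>>>>>             return user == null ? null : adapt(user, realm);
>>>>>>         }
>>>>>>
>>>>>>         @Override
>>>>>>         public UserModel getUserByUsername(String username, RealmModel realm) {
>>>>>>             RemoteUser user = remote.findByUsername(username);
>>>>>>             return user == null ? null : adapt(user, realm);
>>>>>>         }
>>>>>>
>>>>>>         @Override
>>>>>>         public UserModel getUserByEmail(String email, RealmModel realm) {
>>>>>>             RemoteUser user = remote.findByEmail(email);
>>>>>>             return user == null ? null : adapt(user, realm);
>>>>>>         }
>>>>>>
>>>>>>         private UserModel adapt(RemoteUser user, RealmModel realm) {
>>>>>>             // Minimal read-only adapter; a real provider would map more attributes.
>>>>>>             return new AbstractUserAdapter(session, realm, model) {
>>>>>>                 @Override
>>>>>>                 public String getUsername() {
>>>>>>                     return user.username;
>>>>>>                 }
>>>>>>             };
>>>>>>         }
>>>>>>
>>>>>>         @Override
>>>>>>         public void close() {
>>>>>>         }
>>>>>>
>>>>>>         // Hypothetical stand-ins for the external user service.
>>>>>>         public static class RemoteUser { public String username; }
>>>>>>         public interface RemoteUserService {
>>>>>>             RemoteUser findById(String externalId);
>>>>>>             RemoteUser findByUsername(String username);
>>>>>>             RemoteUser findByEmail(String email);
>>>>>>         }
>>>>>>     }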
>>>>>>
>>>>>> One thing I noticed is that node2 (the one that doesn't get hit on
>>>>>> the initial password auth) has to do a lookup via getUserById the first
>>>>>> time it handles a grant_type=refresh_token auth. Is the data it needs not
>>>>>> shared across the cluster? It seems to be cached only locally on the node.
>>>>>> Just as a test I tried to set all configured non-local caches to be
>>>>>> replicated and it didn't help. Any thoughts about this?
>>>>>>
>>>>>> Thanks,
>>>>>> DV
>>>>>>
>
>

