[keycloak-user] Standalone HA tokens not immediately shared among nodes

Sebastian Laskawiec slaskawi at redhat.com
Tue Sep 25 06:06:45 EDT 2018


Thanks a lot for checking this.

This seems like a bug to me, so I filed
https://issues.jboss.org/browse/KEYCLOAK-8415. Unfortunately, we are
preparing for some urgent work on the product side, so I can't promise
when we will be able to look into this. I highly encourage you to
contribute a fix if you are in a hurry, or to subscribe to the ticket and
wait until we find a free slot to get it fixed.

Thanks,
Sebastian

On Thu, Sep 20, 2018 at 4:27 PM D V <dv at glyphy.com> wrote:

> OK. So, with all caches being replicated, there's an error on startup:
>
> 2018-09-20 14:03:38,307 ERROR [org.infinispan.remoting.rpc.RpcManagerImpl]
> (ServerService Thread Pool -- 62) ISPN000073: Unexpected error while
> replicating: org.infinispan.commons.marshall.NotSerializableException:
> org.keycloak.models.PasswordPolicy$Builder
> Caused by: an exception which occurred:
> in field org.keycloak.models.PasswordPolicy.builder
> in object org.keycloak.models.PasswordPolicy at 6ab5350d
> in field
> org.keycloak.models.cache.infinispan.entities.CachedRealm.passwordPolicy
> in object
> org.keycloak.models.cache.infinispan.entities.CachedRealm at 7864be21
> in object
> org.keycloak.models.cache.infinispan.entities.CachedRealm at 7864be21
> in object org.infinispan.commands.write.PutKeyValueCommand at fec4dc5e
> in object org.infinispan.commands.remote.SingleRpcCommand at 3f2e5d1a
>
> If I make the realms cache local but leave the rest replicated, I observe
> the same behaviour: the node that didn't issue the original set of
> refresh/access tokens does a getUserById lookup, which in my case results
> in a network call against a remote service.
>
> I noticed there are caches running that aren't mentioned in the config,
> like userRevisions. These are local, and adding them to the config as
> replicated doesn't actually make them replicated.
>
> On Thu, Sep 20, 2018 at 7:36 AM Sebastian Laskawiec <slaskawi at redhat.com>
> wrote:
>
>> Could you please try to unify the caches? Please replace all local-cache
>> and distributed-cache with replicated-cache.
>>
>> Even though using distributed caches instead of replicated ones shouldn't
>> be the cause, I think those local caches might be causing the behavior
>> you're describing.
>>
>> On Wed, Sep 19, 2018 at 3:21 PM D V <dv at glyphy.com> wrote:
>>
>>> Makes sense re: replicated caches. Here's my infinispan subsystem config
>>> right now:
>>>
>>>         <subsystem xmlns="urn:jboss:domain:infinispan:4.0">
>>>             <cache-container name="keycloak" jndi-name="infinispan/Keycloak" statistics-enabled="true">
>>>                 <transport lock-timeout="60000"/>
>>>                 <local-cache name="realms" statistics-enabled="true">
>>>                     <eviction max-entries="10000" strategy="LRU"/>
>>>                 </local-cache>
>>>                 <local-cache name="users" statistics-enabled="true">
>>>                     <eviction max-entries="10000" strategy="LRU"/>
>>>                 </local-cache>
>>>
>>>                 <!--
>>>                 These two need to be replicated or the node that didn't issue the initial refresh token
>>>                 will return "invalid_grant" errors when attempting to auth with that refresh token.
>>>                 -->
>>>                 <replicated-cache name="sessions" statistics-enabled="true"/>
>>>                 <replicated-cache name="clientSessions" statistics-enabled="true"/>
>>>
>>>                 <distributed-cache name="authenticationSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>                 <distributed-cache name="offlineSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>                 <distributed-cache name="offlineClientSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>                 <distributed-cache name="loginFailures" mode="SYNC" owners="1" statistics-enabled="true"/>
>>>                 <local-cache name="authorization" statistics-enabled="true">
>>>                     <eviction max-entries="10000" strategy="LRU"/>
>>>                 </local-cache>
>>>                 <replicated-cache name="work" mode="SYNC" statistics-enabled="true"/>
>>>                 <local-cache name="keys" statistics-enabled="true">
>>>                     <eviction max-entries="1000" strategy="LRU"/>
>>>                     <expiration max-idle="3600000"/>
>>>                 </local-cache>
>>>                 <distributed-cache name="actionTokens" mode="SYNC" owners="2" statistics-enabled="true">
>>>                     <eviction max-entries="-1" strategy="NONE"/>
>>>                     <expiration max-idle="-1" interval="300000"/>
>>>                 </distributed-cache>
>>>             </cache-container>
>>>             <cache-container name="server" aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server">
>>>                 <transport lock-timeout="60000"/>
>>>                 <replicated-cache name="default">
>>>                     <transaction mode="BATCH"/>
>>>                 </replicated-cache>
>>>             </cache-container>
>>>             <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
>>>                 <transport lock-timeout="60000"/>
>>>                 <distributed-cache name="dist">
>>>                     <locking isolation="REPEATABLE_READ"/>
>>>                     <transaction mode="BATCH"/>
>>>                     <file-store/>
>>>                 </distributed-cache>
>>>             </cache-container>
>>>             <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
>>>                 <transport lock-timeout="60000"/>
>>>                 <distributed-cache name="dist">
>>>                     <locking isolation="REPEATABLE_READ"/>
>>>                     <transaction mode="BATCH"/>
>>>                     <file-store/>
>>>                 </distributed-cache>
>>>             </cache-container>
>>>             <cache-container name="hibernate" default-cache="local-query" module="org.hibernate.infinispan">
>>>                 <transport lock-timeout="60000"/>
>>>                 <local-cache name="local-query">
>>>                     <eviction strategy="LRU" max-entries="10000"/>
>>>                     <expiration max-idle="100000"/>
>>>                 </local-cache>
>>>                 <invalidation-cache name="entity">
>>>                     <transaction mode="NON_XA"/>
>>>                     <eviction strategy="LRU" max-entries="10000"/>
>>>                     <expiration max-idle="100000"/>
>>>                 </invalidation-cache>
>>>                 <replicated-cache name="timestamps" mode="ASYNC"/>
>>>             </cache-container>
>>>         </subsystem>
>>>
>>> The scenario I'm testing:
>>> 1. Auth with grant_type=password on node1.
>>> 2. Shut down node1.
>>> 3. Auth with grant_type=refresh_token on node2.
>>>
>>> When clientSessions is not replicated (distributed, with owners=1, as in
>>> the distribution's standalone-ha.xml), I get this on node2:
>>> {
>>>     "error": "invalid_grant",
>>>     "error_description": "Session doesn't have required client"
>>> }
>>>
>>> When sessions is not replicated:
>>> {
>>>     "error": "invalid_grant",
>>>     "error_description": "Session not active"
>>> }
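For reference, the failover test above could be scripted roughly as follows. This is only a sketch: the realm name (demo), client id (test-client), credentials, and node URLs are placeholders rather than values from this thread, and it assumes a public client with Direct Access Grants enabled.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Reproduces the three steps: password grant on node1, node1 shut down
// manually, then refresh_token grant on node2. Realm, client id,
// credentials, and URLs are placeholders.
public class RefreshTokenFailoverTest {

    static final String TOKEN_PATH = "/auth/realms/demo/protocol/openid-connect/token";

    public static void main(String[] args) throws Exception {
        // 1. Auth with grant_type=password on node1.
        String node1Response = token("http://node1:8080",
                "grant_type=password&client_id=test-client&username=alice&password=secret");
        String refreshToken = extract(node1Response, "refresh_token");

        // 2. Shut down node1 here.

        // 3. Auth with grant_type=refresh_token on node2. With non-replicated
        //    session caches this is where the "invalid_grant" responses quoted
        //    above show up.
        String node2Response = token("http://node2:8080",
                "grant_type=refresh_token&client_id=test-client&refresh_token=" + refreshToken);
        System.out.println(node2Response);
    }

    static String token(String baseUrl, String form) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + TOKEN_PATH))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    // Naive JSON field extraction to keep the sketch dependency-free.
    static String extract(String json, String field) {
        String marker = "\"" + field + "\":\"";
        int start = json.indexOf(marker) + marker.length();
        return json.substring(start, json.indexOf('"', start));
    }
}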
>>>
>>> On Wed, Sep 19, 2018 at 6:56 AM Sebastian Laskawiec <slaskawi at redhat.com>
>>> wrote:
>>>
>>>> Thanks for letting us know, DV!
>>>>
>>>> Setting the number of owners equal to the cluster size doesn't make any
>>>> sense. You might as well use a replicated cache in that scenario (which works
>>>> the same way apart from some Infinispan internal behavior that can be ignored
>>>> in your case). Could you please paste your Infinispan configuration? Maybe
>>>> there's some hint there...
>>>>
>>>> Thanks,
>>>> Seb
>>>>
>>>> On Tue, Sep 18, 2018 at 11:02 PM D V <dv at glyphy.com> wrote:
>>>>
>>>>> The issue was resolved in a somewhat unexpected way. I had a custom
>>>>> org.keycloak.storage.UserStorageProviderFactory SPI registered that
>>>>> returned providers implementing org.keycloak.storage.user.UserLookupProvider,
>>>>> but the org.keycloak.storage.user.UserLookupProvider#getUserById method
>>>>> wasn't actually implemented: I just had it return null. It wasn't obvious
>>>>> to me that it was required (or under what circumstances). Once I implemented
>>>>> it, the experiments in my original message passed. I did have to set owners
>>>>> to 2 for the "sessions" and "clientSessions" distributed cache Infinispan
>>>>> configs.
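For context, a minimal sketch of what such a provider looks like once getUserById is actually implemented, written against the legacy 4.x user storage SPI. RemoteUserClient and RemoteUser are hypothetical stand-ins for the remote service mentioned above, not types from this thread.

import org.keycloak.component.ComponentModel;
import org.keycloak.models.KeycloakSession;
import org.keycloak.models.RealmModel;
import org.keycloak.models.UserModel;
import org.keycloak.storage.StorageId;
import org.keycloak.storage.UserStorageProvider;
import org.keycloak.storage.adapter.AbstractUserAdapterFederatedStorage;
import org.keycloak.storage.user.UserLookupProvider;

public class RemoteUserStorageProvider implements UserStorageProvider, UserLookupProvider {

    // Hypothetical stand-ins for the remote user service backing the provider.
    interface RemoteUserClient {
        RemoteUser findById(String externalId);
        RemoteUser findByUsername(String username);
        RemoteUser findByEmail(String email);
    }

    interface RemoteUser {
        String getUsername();
    }

    private final KeycloakSession session;
    private final ComponentModel model;
    private final RemoteUserClient remoteUsers;

    public RemoteUserStorageProvider(KeycloakSession session, ComponentModel model,
                                     RemoteUserClient remoteUsers) {
        this.session = session;
        this.model = model;
        this.remoteUsers = remoteUsers;
    }

    @Override
    public UserModel getUserById(String id, RealmModel realm) {
        // Federated ids look like "f:<component id>:<external id>"; getUserById
        // receives the full Keycloak id, so strip it down to the external id
        // before calling the remote service. Returning null here is what made
        // the refresh_token auth fail on the node without a cached user.
        String externalId = new StorageId(id).getExternalId();
        RemoteUser user = remoteUsers.findById(externalId);
        return user == null ? null : adapt(realm, user);
    }

    @Override
    public UserModel getUserByUsername(String username, RealmModel realm) {
        RemoteUser user = remoteUsers.findByUsername(username);
        return user == null ? null : adapt(realm, user);
    }

    @Override
    public UserModel getUserByEmail(String email, RealmModel realm) {
        RemoteUser user = remoteUsers.findByEmail(email);
        return user == null ? null : adapt(realm, user);
    }

    private UserModel adapt(RealmModel realm, RemoteUser user) {
        // Minimal read-only adapter; a real provider would map more attributes.
        return new AbstractUserAdapterFederatedStorage(session, realm, model) {
            @Override
            public String getUsername() {
                return user.getUsername();
            }

            @Override
            public void setUsername(String username) {
                // read-only sketch
            }
        };
    }

    @Override
    public void close() {
    }
}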
>>>>>
>>>>> One thing I noticed is that node2 (the one that doesn't get hit on the
>>>>> initial password auth) has to do a lookup via getUserById the first time it
>>>>> handles a grant_type=refresh_token auth. Is the data it needs not shared
>>>>> across the cluster? It seems to be cached only locally on the node. Just as
>>>>> a test, I tried setting all configured non-local caches to be replicated, and
>>>>> it didn't help. Any thoughts about this?
>>>>>
>>>>> Thanks,
>>>>> DV
>>>>>

