Re: [keycloak-user] Standalone HA tokens not immediately shared among nodes

Tuesday, 25 September 2018

Sorry, I did not read the whole thread.

Just a quick note: the caches "realms", "users", "keys" and
"authorization" are supposed to be local caches. The pattern we're
using ATM is that every cluster node caches its data (realms, users,
etc.) locally. When some objects are updated (e.g. a realm or
users), a separate cache, "work", makes sure to notify the other
cluster nodes (or even nodes in all the other DCs), so that all nodes can
invalidate the particular cached object in their caches.

The caches "realms", "users", "keys" and "authorization" are not meant
to be replicated/distributed, but local. So this NotSerializableException
doesn't look like a bug to me.
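
As a rough illustration of that pattern, the relevant cache definitions could look like the fragment below. It is only a sketch assembled from the configuration quoted later in this thread, not an authoritative default; the eviction and expiration values are simply the ones DV posted.

```xml
<!-- Sketch only: cache names and values are copied from the config quoted below in this thread -->
<cache-container name="keycloak" jndi-name="infinispan/Keycloak">
    <transport lock-timeout="60000"/>
    <!-- Per-node caches: each node keeps its own entries, which are not replicated -->
    <local-cache name="realms">
        <eviction max-entries="10000" strategy="LRU"/>
    </local-cache>
    <local-cache name="users">
        <eviction max-entries="10000" strategy="LRU"/>
    </local-cache>
    <local-cache name="authorization">
        <eviction max-entries="10000" strategy="LRU"/>
    </local-cache>
    <local-cache name="keys">
        <eviction max-entries="1000" strategy="LRU"/>
        <expiration max-idle="3600000"/>
    </local-cache>
    <!-- Replicated cache used to notify the other nodes so they can invalidate their local entries -->
    <replicated-cache name="work" mode="SYNC"/>
</cache-container>
```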

Marek

On 25/09/18 12:06, Sebastian Laskawiec wrote:
...
 Thanks a lot for checking this.

 This seems like a bug to me, so I filed
 https://issues.jboss.org/browse/KEYCLOAK-8415. Unfortunately, we are
 preparing for some urgent work on the product side and I can't promise
 you when we will be able to look into this. I highly encourage you to
 contribute a fix if you are in a hurry, or just subscribe to the ticket
 and wait till we find a free slot to get it fixed.

 Thanks,
 Sebastian

 On Thu, Sep 20, 2018 at 4:27 PM D V <dv(a)glyphy.com> wrote:

     OK. So, with all caches being replicated, there's an error on
     startup:

     2018-09-20 14:03:38,307 ERROR
     [org.infinispan.remoting.rpc.RpcManagerImpl] (ServerService Thread
     Pool -- 62) ISPN000073: Unexpected error while replicating:
     org.infinispan.commons.marshall.NotSerializableException:
     org.keycloak.models.PasswordPolicy$Builder
     Caused by: an exception which occurred:
     in field org.keycloak.models.PasswordPolicy.builder
     in object org.keycloak.models.PasswordPolicy@6ab5350d
     in field
     org.keycloak.models.cache.infinispan.entities.CachedRealm.passwordPolicy
     in object
     org.keycloak.models.cache.infinispan.entities.CachedRealm@7864be21
     in object
     org.keycloak.models.cache.infinispan.entities.CachedRealm@7864be21
     in object org.infinispan.commands.write.PutKeyValueCommand@fec4dc5e
     in object org.infinispan.commands.remote.SingleRpcCommand@3f2e5d1a

     If I make the realms cache local but leave the rest replicated, I
     observe the same behaviour: the node that didn't issue the
     original set of refresh/access tokens does a getUserById lookup,
     which in my case results in a network call against a remote service.

     I noticed there are caches running that aren't mentioned in the
     config, like userRevisions. These are local, and adding them to the
     config as replicated doesn't actually make them replicated.

     On Thu, Sep 20, 2018 at 7:36 AM Sebastian Laskawiec
     <slaskawi(a)redhat.com> wrote:

         Could you please try to unify the caches? Please replace
         all local-cache and distributed-cache with replicated-cache.

         Even though using distributed caches over replicated ones
         shouldn't be the cause, I think those local caches might cause
         the behavior you're describing.

         On Wed, Sep 19, 2018 at 3:21 PM D V <dv(a)glyphy.com> wrote:

             Makes sense re: replicated caches. Here's my infinispan
             subsystem config right now:

             <subsystem xmlns="urn:jboss:domain:infinispan:4.0">
                 <cache-container name="keycloak" jndi-name="infinispan/Keycloak" statistics-enabled="true">
                     <transport lock-timeout="60000"/>
                     <local-cache name="realms" statistics-enabled="true">
                         <eviction max-entries="10000" strategy="LRU"/>
                     </local-cache>
                     <local-cache name="users" statistics-enabled="true">
                         <eviction max-entries="10000" strategy="LRU"/>
                     </local-cache>

                     <!--
                     These two need to be replicated or the node that didn't issue the initial refresh token
                     will return "invalid_grant" errors when attempting to auth with that refresh token.
                     -->
                     <replicated-cache name="sessions" statistics-enabled="true"/>
                     <replicated-cache name="clientSessions" statistics-enabled="true"/>

                     <distributed-cache name="authenticationSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
                     <distributed-cache name="offlineSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
                     <distributed-cache name="offlineClientSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
                     <distributed-cache name="loginFailures" mode="SYNC" owners="1" statistics-enabled="true"/>
                     <local-cache name="authorization" statistics-enabled="true">
                         <eviction max-entries="10000" strategy="LRU"/>
                     </local-cache>
                     <replicated-cache name="work" mode="SYNC" statistics-enabled="true"/>
                     <local-cache name="keys" statistics-enabled="true">
                         <eviction max-entries="1000" strategy="LRU"/>
                         <expiration max-idle="3600000"/>
                     </local-cache>
                     <distributed-cache name="actionTokens" mode="SYNC" owners="2" statistics-enabled="true">
                         <eviction max-entries="-1" strategy="NONE"/>
                         <expiration max-idle="-1" interval="300000"/>
                     </distributed-cache>
                 </cache-container>
                 <cache-container name="server" aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server">
                     <transport lock-timeout="60000"/>
                     <replicated-cache name="default">
                         <transaction mode="BATCH"/>
                     </replicated-cache>
                 </cache-container>
                 <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
                     <transport lock-timeout="60000"/>
                     <distributed-cache name="dist">
                         <locking isolation="REPEATABLE_READ"/>
                         <transaction mode="BATCH"/>
                         <file-store/>
                     </distributed-cache>
                 </cache-container>
                 <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
                     <transport lock-timeout="60000"/>
                     <distributed-cache name="dist">
                         <locking isolation="REPEATABLE_READ"/>
                         <transaction mode="BATCH"/>
                         <file-store/>
                     </distributed-cache>
                 </cache-container>
                 <cache-container name="hibernate" default-cache="local-query" module="org.hibernate.infinispan">
                     <transport lock-timeout="60000"/>
                     <local-cache name="local-query">
                         <eviction strategy="LRU" max-entries="10000"/>
                         <expiration max-idle="100000"/>
                     </local-cache>
                     <invalidation-cache name="entity">
                         <transaction mode="NON_XA"/>
                         <eviction strategy="LRU" max-entries="10000"/>
                         <expiration max-idle="100000"/>
                     </invalidation-cache>
                     <replicated-cache name="timestamps" mode="ASYNC"/>
                 </cache-container>
             </subsystem>

             The scenario I'm testing:
             1. Auth with grant_type=password on node1.
             2. Shut down node1.
             3. Auth with grant_type=refresh_token on node2.

             When clientSessions is not replicated (distributed, with
             owners=1, as in the distribution's standalone-ha.xml), I
             get this on node2:
             {
                 "error": "invalid_grant",
                 "error_description": "Session doesn't have required client"
             }

             When sessions is not replicated:
             {
                 "error": "invalid_grant",
                 "error_description": "Session not active"
             }

             On Wed, Sep 19, 2018 at 6:56 AM Sebastian Laskawiec
             <slaskawi(a)redhat.com> wrote:

                 Thanks for letting us know, DV!

                 Setting the number of owners equal to the cluster size
                 doesn't make any sense. You might as well use a replicated
                 cache in that scenario (which works the same way
                 apart from some Infinispan internal behavior, which
                 can be ignored in your case). Could you please paste
                 your Infinispan configuration? Maybe there's some hint
                 there...
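
To make the point about owners concrete: on a cluster of two nodes, the two definitions below both end up keeping a copy of every entry on each node. This is only a sketch. The two-node cluster size is an assumption based on the node1/node2 test described in this thread, and the cache name is taken from the thread as well.

```xml
<!-- Sketch only: assumes a two-node cluster; pick one of the two definitions, not both -->

<!-- Option A: distributed cache whose owner count equals the number of nodes -->
<distributed-cache name="sessions" mode="SYNC" owners="2"/>

<!-- Option B: replicated cache, where every node holds every entry regardless of cluster size -->
<replicated-cache name="sessions" mode="SYNC"/>
```

With option B the definition does not need to change if a third node joins, whereas option A would need its owners value raised to keep a copy on every node.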

                 Thanks,
                 Seb

                 On Tue, Sep 18, 2018 at 11:02 PM D V <dv(a)glyphy.com> wrote:

                     The issue was resolved in a somewhat unexpected
                     way. I had a custom
                     org.keycloak.storage.UserStorageProviderFactory
                     SPI registered that returned providers
                     implementing org.keycloak.storage.user.UserLookupProvider,
                     but org.keycloak.storage.user.UserLookupProvider#getUserById
                     method wasn't fully filled out. I just had it
                     return null. It wasn't obvious to me that it was
                     required (or under what circumstances). Once I
                     implemented it, the experiments in my original
                     message passed. I did have to set owners to 2 for
                     the "sessions" and "clientSessions" distributed
                     cache infinispan configs.

                     One thing I noticed is that node2 (the one that
                     doesn't get hit on the initial password auth) has
                     to do a lookup via getUserById the first time it
                     handles a grant_type=refresh_token auth. Is the
                     data it needs not shared across the cluster? It
                     seems to be cached only locally on the node. Just
                     as a test I tried to set all configured non-local
                     caches to be replicated and it didn't help. Any
                     thoughts about this?

                     Thanks,
                     DV
