[keycloak-user] Standalone HA tokens not immediately shared among nodes
Marek Posolda
mposolda at redhat.com
Tue Sep 25 09:14:24 EDT 2018
Some more info about our caches:
https://www.keycloak.org/docs/latest/server_installation/index.html#cache-configuration
Not sure if this info should be updated and some more things
clarified?
Marek
On 25/09/18 15:12, Marek Posolda wrote:
> Sorry, I did not read the whole thread.
>
> Just a quick note: the caches "realms", "users", "keys" and
> "authorization" are supposed to be local caches. The pattern we're
> using ATM is that every cluster node caches its data (realms, users,
> etc.) locally. In case some objects are updated (e.g. a realm or
> user), there is a separate cache, "work", which makes sure to notify
> the other cluster nodes (or even nodes on all the other DCs), so all
> the nodes can invalidate the particular cached object from their caches.
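>
> Just to illustrate the pattern in plain Infinispan terms, something
> like the sketch below (made-up class and cache names, not our actual
> implementation - we build cluster events on top of the "work" cache):
>
>     import org.infinispan.Cache;
>     import org.infinispan.notifications.Listener;
>     import org.infinispan.notifications.cachelistener.annotation.CacheEntryCreated;
>     import org.infinispan.notifications.cachelistener.event.CacheEntryCreatedEvent;
>
>     // The "work" cache is replicated, so a put() on one node is applied on
>     // every node. Each node registers a local listener on it and evicts the
>     // changed object from its own local "realms"/"users" cache.
>     @Listener
>     public class InvalidationListener {
>
>         private final Cache<String, Object> localCache;
>
>         public InvalidationListener(Cache<String, Object> localCache) {
>             this.localCache = localCache;
>         }
>
>         @CacheEntryCreated
>         public void onInvalidationEvent(CacheEntryCreatedEvent<String, String> event) {
>             if (event.isPre()) {
>                 return; // act only after the write is committed
>             }
>             // the event value carries the id of the updated object; drop it
>             // locally so the next read goes back to the database
>             localCache.remove(event.getValue());
>         }
>     }
>
>     // registered once per node, e.g.:
>     //     workCache.addListener(new InvalidationListener(realmsCache));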
>
> Caches "realms", "users", "keys" and "authorization" are not meant to
> be replicated/distributed, but local. So this NotSerializableException
> doesn't look like a bug to me.
>
> Marek
>
> On 25/09/18 12:06, Sebastian Laskawiec wrote:
>> Thanks a lot for checking this.
>>
>> This seems like a bug to me, so I filed
>> https://issues.jboss.org/browse/KEYCLOAK-8415. Unfortunately, we are
>> preparing for some urgent work on the product side and I can't
>> promise when we will be able to look into this. I highly encourage
>> you to contribute a fix if you are in a hurry, or just subscribe to
>> the ticket and wait till we find a free slot to get it fixed.
>>
>> Thanks,
>> Sebastian
>>
>> On Thu, Sep 20, 2018 at 4:27 PM D V <dv at glyphy.com> wrote:
>>
>> OK. So, with all caches being replicated, there's an error on
>> startup:
>>
>> 2018-09-20 14:03:38,307 ERROR [org.infinispan.remoting.rpc.RpcManagerImpl] (ServerService Thread Pool -- 62) ISPN000073: Unexpected error while replicating: org.infinispan.commons.marshall.NotSerializableException: org.keycloak.models.PasswordPolicy$Builder
>> Caused by: an exception which occurred:
>> in field org.keycloak.models.PasswordPolicy.builder
>> in object org.keycloak.models.PasswordPolicy@6ab5350d
>> in field org.keycloak.models.cache.infinispan.entities.CachedRealm.passwordPolicy
>> in object org.keycloak.models.cache.infinispan.entities.CachedRealm@7864be21
>> in object org.keycloak.models.cache.infinispan.entities.CachedRealm@7864be21
>> in object org.infinispan.commands.write.PutKeyValueCommand@fec4dc5e
>> in object org.infinispan.commands.remote.SingleRpcCommand@3f2e5d1a
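>>
>> In case it helps, the same failure can be reproduced with plain Java
>> serialization. A made-up minimal example (not the actual Keycloak
>> classes) fails the same way, because the cached object holds a field
>> whose type isn't Serializable:
>>
>>     import java.io.ByteArrayOutputStream;
>>     import java.io.IOException;
>>     import java.io.ObjectOutputStream;
>>     import java.io.Serializable;
>>
>>     public class NotSerializableDemo {
>>
>>         // stands in for PasswordPolicy: Serializable itself, but holding
>>         // a reference to a nested type that is not
>>         static class Policy implements Serializable {
>>             static class Builder { } // not Serializable
>>             private final Builder builder = new Builder();
>>         }
>>
>>         public static void main(String[] args) throws IOException {
>>             ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
>>             // throws java.io.NotSerializableException: NotSerializableDemo$Policy$Builder
>>             out.writeObject(new Policy());
>>         }
>>     }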
>>
>> If I make the realms cache local but leave the rest replicated, I
>> observe the same behaviour: the node that didn't issue the
>> original set of refresh/access tokens does a getUserById lookup,
>> which in my case results in a network call against a remote service.
>>
>> I noticed there are caches running that aren't mentioned in the
>> config, like userRevisions. These are local, and adding them to the
>> config as replicated doesn't actually make them replicated.
>>
>> On Thu, Sep 20, 2018 at 7:36 AM Sebastian Laskawiec <slaskawi at redhat.com> wrote:
>>
>> Could you please try to unify the caches? Please replace all
>> local-cache and distributed-cache entries with replicated-cache.
>>
>> Even though using distributed caches instead of replicated ones
>> should be the cause, I think those local caches might also be
>> contributing to the behavior you're describing.
>>
>> On Wed, Sep 19, 2018 at 3:21 PM D V <dv at glyphy.com> wrote:
>>
>> Makes sense re: replicated caches. Here's my infinispan
>> subsystem config right now:
>>
>> <subsystem xmlns="urn:jboss:domain:infinispan:4.0">
>>     <cache-container name="keycloak" jndi-name="infinispan/Keycloak" statistics-enabled="true">
>>         <transport lock-timeout="60000"/>
>>         <local-cache name="realms" statistics-enabled="true">
>>             <eviction max-entries="10000" strategy="LRU"/>
>>         </local-cache>
>>         <local-cache name="users" statistics-enabled="true">
>>             <eviction max-entries="10000" strategy="LRU"/>
>>         </local-cache>
>>
>>         <!--
>>             These two need to be replicated or the node that didn't
>>             issue the initial refresh token will return "invalid_grant"
>>             errors when attempting to auth with that refresh token.
>>         -->
>>         <replicated-cache name="sessions" statistics-enabled="true"/>
>>         <replicated-cache name="clientSessions" statistics-enabled="true"/>
>>
>>         <distributed-cache name="authenticationSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>         <distributed-cache name="offlineSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>         <distributed-cache name="offlineClientSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
>>         <distributed-cache name="loginFailures" mode="SYNC" owners="1" statistics-enabled="true"/>
>>         <local-cache name="authorization" statistics-enabled="true">
>>             <eviction max-entries="10000" strategy="LRU"/>
>>         </local-cache>
>>         <replicated-cache name="work" mode="SYNC" statistics-enabled="true"/>
>>         <local-cache name="keys" statistics-enabled="true">
>>             <eviction max-entries="1000" strategy="LRU"/>
>>             <expiration max-idle="3600000"/>
>>         </local-cache>
>>         <distributed-cache name="actionTokens" mode="SYNC" owners="2" statistics-enabled="true">
>>             <eviction max-entries="-1" strategy="NONE"/>
>>             <expiration max-idle="-1" interval="300000"/>
>>         </distributed-cache>
>>     </cache-container>
>>     <cache-container name="server" aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server">
>>         <transport lock-timeout="60000"/>
>>         <replicated-cache name="default">
>>             <transaction mode="BATCH"/>
>>         </replicated-cache>
>>     </cache-container>
>>     <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
>>         <transport lock-timeout="60000"/>
>>         <distributed-cache name="dist">
>>             <locking isolation="REPEATABLE_READ"/>
>>             <transaction mode="BATCH"/>
>>             <file-store/>
>>         </distributed-cache>
>>     </cache-container>
>>     <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
>>         <transport lock-timeout="60000"/>
>>         <distributed-cache name="dist">
>>             <locking isolation="REPEATABLE_READ"/>
>>             <transaction mode="BATCH"/>
>>             <file-store/>
>>         </distributed-cache>
>>     </cache-container>
>>     <cache-container name="hibernate" default-cache="local-query" module="org.hibernate.infinispan">
>>         <transport lock-timeout="60000"/>
>>         <local-cache name="local-query">
>>             <eviction strategy="LRU" max-entries="10000"/>
>>             <expiration max-idle="100000"/>
>>         </local-cache>
>>         <invalidation-cache name="entity">
>>             <transaction mode="NON_XA"/>
>>             <eviction strategy="LRU" max-entries="10000"/>
>>             <expiration max-idle="100000"/>
>>         </invalidation-cache>
>>         <replicated-cache name="timestamps" mode="ASYNC"/>
>>     </cache-container>
>> </subsystem>
>>
>> The scenario I'm testing (rough code sketch below):
>> 1. Auth with grant_type=password on node1.
>> 2. Shut down node1.
>> 3. Auth with grant_type=refresh_token on node2.
>>
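>> In case it's clearer, this is roughly what I run against the token
>> endpoint. Realm name, client and credentials are placeholders, and
>> the refresh-token extraction is deliberately crude:
>>
>>     import java.io.OutputStream;
>>     import java.net.HttpURLConnection;
>>     import java.net.URL;
>>     import java.nio.charset.StandardCharsets;
>>     import java.util.Scanner;
>>
>>     public class RefreshAcrossNodes {
>>
>>         // POST a form-encoded body to a node's OIDC token endpoint and
>>         // return the raw JSON response (success or error)
>>         static String token(String baseUrl, String form) throws Exception {
>>             URL url = new URL(baseUrl + "/auth/realms/demo/protocol/openid-connect/token");
>>             HttpURLConnection con = (HttpURLConnection) url.openConnection();
>>             con.setRequestMethod("POST");
>>             con.setDoOutput(true);
>>             con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
>>             try (OutputStream out = con.getOutputStream()) {
>>                 out.write(form.getBytes(StandardCharsets.UTF_8));
>>             }
>>             try (Scanner s = new Scanner(con.getResponseCode() < 400
>>                     ? con.getInputStream() : con.getErrorStream(), "UTF-8")) {
>>                 return s.useDelimiter("\\A").next();
>>             }
>>         }
>>
>>         public static void main(String[] args) throws Exception {
>>             // step 1: password grant against node1
>>             String json = token("http://node1:8080",
>>                     "grant_type=password&client_id=test-client&username=alice&password=secret");
>>             // crude extraction of the "refresh_token" field from the JSON
>>             String refreshToken = json.replaceAll(".*\"refresh_token\":\"([^\"]+)\".*", "$1");
>>
>>             // step 2: shut down node1, then
>>             // step 3: refresh against node2 -- this is where invalid_grant shows up
>>             System.out.println(token("http://node2:8080",
>>                     "grant_type=refresh_token&client_id=test-client&refresh_token=" + refreshToken));
>>         }
>>     }
>>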
>> When clientSessions is not replicated (distributed, with owners=1,
>> as in the distribution's standalone-ha.xml), I get this on node2:
>> {
>>     "error": "invalid_grant",
>>     "error_description": "Session doesn't have required client"
>> }
>>
>> When sessions is not replicated:
>> {
>>     "error": "invalid_grant",
>>     "error_description": "Session not active"
>> }
>>
>> On Wed, Sep 19, 2018 at 6:56 AM Sebastian Laskawiec <slaskawi at redhat.com> wrote:
>>
>> Thanks for letting us know, DV!
>>
>> Setting the number of owners equal to the cluster size doesn't make
>> any sense. You might as well use a replicated cache in that scenario
>> (which works the same way apart from some Infinispan-internal
>> behavior, which can be ignored in your case). Could you please paste
>> your Infinispan configuration? Maybe there's some hint there...
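>>
>> To illustrate with embedded-Infinispan configuration (a hypothetical
>> snippet, not the WildFly subsystem XML): a distributed cache with
>> owners equal to the cluster size keeps a copy of every entry on every
>> node, which is what a replicated cache does by design:
>>
>>     import org.infinispan.configuration.cache.CacheMode;
>>     import org.infinispan.configuration.cache.Configuration;
>>     import org.infinispan.configuration.cache.ConfigurationBuilder;
>>
>>     public class OwnersVsReplicated {
>>         public static void main(String[] args) {
>>             // distributed cache with owners == cluster size (say, 3 nodes):
>>             // every node ends up owning every entry anyway
>>             Configuration dist = new ConfigurationBuilder()
>>                     .clustering().cacheMode(CacheMode.DIST_SYNC)
>>                     .hash().numOwners(3)
>>                     .build();
>>
>>             // replicated cache: every node owns every entry by design
>>             Configuration repl = new ConfigurationBuilder()
>>                     .clustering().cacheMode(CacheMode.REPL_SYNC)
>>                     .build();
>>         }
>>     }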
>>
>> Thanks,
>> Seb
>>
>> On Tue, Sep 18, 2018 at 11:02 PM D V <dv at glyphy.com> wrote:
>>
>> The issue was resolved in a somewhat unexpected way. I had a custom
>> org.keycloak.storage.UserStorageProviderFactory SPI registered that
>> returned providers implementing
>> org.keycloak.storage.user.UserLookupProvider, but the
>> org.keycloak.storage.user.UserLookupProvider#getUserById method
>> wasn't fully implemented. I just had it return null. It wasn't
>> obvious to me that it was required (or under what circumstances).
>> Once I implemented it, the experiments in my original message
>> passed. I did have to set owners to 2 for the "sessions" and
>> "clientSessions" distributed-cache Infinispan configs.
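>>
>> For reference, a trimmed-down sketch of what I ended up with.
>> RemoteUserClient and RemoteUser are stand-ins for my actual
>> remote-service integration; only the Keycloak types are real:
>>
>>     import org.keycloak.component.ComponentModel;
>>     import org.keycloak.models.KeycloakSession;
>>     import org.keycloak.models.RealmModel;
>>     import org.keycloak.models.UserModel;
>>     import org.keycloak.storage.StorageId;
>>     import org.keycloak.storage.adapter.AbstractUserAdapter;
>>     import org.keycloak.storage.user.UserLookupProvider;
>>
>>     public class RemoteUserLookupProvider implements UserLookupProvider {
>>
>>         private final KeycloakSession session;
>>         private final ComponentModel model;
>>         private final RemoteUserClient client; // hypothetical remote-service client
>>
>>         public RemoteUserLookupProvider(KeycloakSession session, ComponentModel model,
>>                                         RemoteUserClient client) {
>>             this.session = session;
>>             this.model = model;
>>             this.client = client;
>>         }
>>
>>         @Override
>>         public UserModel getUserById(String id, RealmModel realm) {
>>             // Keycloak passes its storage id ("f:<component-id>:<external-id>");
>>             // strip the prefix before querying the external store. Returning
>>             // null here is what broke the refresh flow on the other node.
>>             String externalId = StorageId.externalId(id);
>>             RemoteUser user = client.findById(externalId);
>>             return user == null ? null : adapt(realm, user);
>>         }
>>
>>         @Override
>>         public UserModel getUserByUsername(String username, RealmModel realm) {
>>             RemoteUser user = client.findByUsername(username);
>>             return user == null ? null : adapt(realm, user);
>>         }
>>
>>         @Override
>>         public UserModel getUserByEmail(String email, RealmModel realm) {
>>             RemoteUser user = client.findByEmail(email);
>>             return user == null ? null : adapt(realm, user);
>>         }
>>
>>         // wrap the remote record in a read-only Keycloak user adapter
>>         private UserModel adapt(RealmModel realm, final RemoteUser user) {
>>             return new AbstractUserAdapter(session, realm, model) {
>>                 @Override
>>                 public String getUsername() {
>>                     return user.getUsername();
>>                 }
>>             };
>>         }
>>
>>         // stand-ins for the remote service integration
>>         public interface RemoteUserClient {
>>             RemoteUser findById(String externalId);
>>             RemoteUser findByUsername(String username);
>>             RemoteUser findByEmail(String email);
>>         }
>>
>>         public interface RemoteUser {
>>             String getUsername();
>>         }
>>     }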
>>
>> One thing I noticed is that node2 (the one that
>> doesn't get hit on the initial password auth) has
>> to do a lookup via getUserById the first time it
>> handles a grant_type=refresh_token auth. Is the
>> data it needs not shared across the cluster? It
>> seems to be cached only locally on the node. Just
>> as a test I tried to set all configured non-local
>> caches to be replicated and it didn't help. Any
>> thoughts about this?
>>
>> Thanks,
>> DV
>>
>