[keycloak-user] Standalone HA tokens not immediately shared among nodes

D V dv at glyphy.com
Wed Sep 19 09:21:21 EDT 2018


Makes sense re: replicated caches. Here's my Infinispan subsystem config
right now:
        <subsystem xmlns="urn:jboss:domain:infinispan:4.0">
            <cache-container name="keycloak" jndi-name="infinispan/Keycloak" statistics-enabled="true">
                <transport lock-timeout="60000"/>
                <local-cache name="realms" statistics-enabled="true">
                    <eviction max-entries="10000" strategy="LRU"/>
                </local-cache>
                <local-cache name="users" statistics-enabled="true">
                    <eviction max-entries="10000" strategy="LRU"/>
                </local-cache>

                <!--
                These two need to be replicated, or the node that didn't issue the
                initial refresh token will return "invalid_grant" errors when
                attempting to auth with that refresh token.
                -->
                <replicated-cache name="sessions" statistics-enabled="true"/>
                <replicated-cache name="clientSessions" statistics-enabled="true"/>

                <distributed-cache name="authenticationSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
                <distributed-cache name="offlineSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
                <distributed-cache name="offlineClientSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
                <distributed-cache name="loginFailures" mode="SYNC" owners="1" statistics-enabled="true"/>
                <local-cache name="authorization" statistics-enabled="true">
                    <eviction max-entries="10000" strategy="LRU"/>
                </local-cache>
                <replicated-cache name="work" mode="SYNC" statistics-enabled="true"/>
                <local-cache name="keys" statistics-enabled="true">
                    <eviction max-entries="1000" strategy="LRU"/>
                    <expiration max-idle="3600000"/>
                </local-cache>
                <distributed-cache name="actionTokens" mode="SYNC" owners="2" statistics-enabled="true">
                    <eviction max-entries="-1" strategy="NONE"/>
                    <expiration max-idle="-1" interval="300000"/>
                </distributed-cache>
            </cache-container>
            <cache-container name="server" aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server">
                <transport lock-timeout="60000"/>
                <replicated-cache name="default">
                    <transaction mode="BATCH"/>
                </replicated-cache>
            </cache-container>
            <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
                <transport lock-timeout="60000"/>
                <distributed-cache name="dist">
                    <locking isolation="REPEATABLE_READ"/>
                    <transaction mode="BATCH"/>
                    <file-store/>
                </distributed-cache>
            </cache-container>
            <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
                <transport lock-timeout="60000"/>
                <distributed-cache name="dist">
                    <locking isolation="REPEATABLE_READ"/>
                    <transaction mode="BATCH"/>
                    <file-store/>
                </distributed-cache>
            </cache-container>
            <cache-container name="hibernate" default-cache="local-query" module="org.hibernate.infinispan">
                <transport lock-timeout="60000"/>
                <local-cache name="local-query">
                    <eviction strategy="LRU" max-entries="10000"/>
                    <expiration max-idle="100000"/>
                </local-cache>
                <invalidation-cache name="entity">
                    <transaction mode="NON_XA"/>
                    <eviction strategy="LRU" max-entries="10000"/>
                    <expiration max-idle="100000"/>
                </invalidation-cache>
                <replicated-cache name="timestamps" mode="ASYNC"/>
            </cache-container>
        </subsystem>

The scenario I'm testing (sketched as code after the list):
1. Auth with grant_type=password on node1.
2. Shut down node1.
3. Auth with grant_type=refresh_token on node2.
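
Concretely, the test amounts to something like this. It's a minimal sketch
using Java's built-in HTTP client; the node URLs, realm, client, and
credentials are placeholders, and it assumes a public client with direct
access grants enabled:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RefreshAcrossNodes {
        // Placeholder endpoints -- substitute the real nodes/realm/client.
        static final String NODE1 =
                "http://node1:8080/auth/realms/demo/protocol/openid-connect/token";
        static final String NODE2 =
                "http://node2:8080/auth/realms/demo/protocol/openid-connect/token";

        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();

            // 1. Auth with grant_type=password on node1.
            String tokens = post(http, NODE1,
                    "grant_type=password&client_id=test-client"
                            + "&username=testuser&password=testpass");
            String refreshToken = extract(tokens, "refresh_token");

            // 2. Shut down node1 (done out of band).

            // 3. Auth with grant_type=refresh_token on node2, using the
            //    refresh token that node1 issued.
            System.out.println(post(http, NODE2,
                    "grant_type=refresh_token&client_id=test-client"
                            + "&refresh_token=" + refreshToken));
        }

        static String post(HttpClient http, String url, String form) throws Exception {
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create(url))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(form))
                    .build();
            return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }

        // Naive extraction of a string field, to keep the sketch dependency-free.
        static String extract(String json, String field) {
            int start = json.indexOf("\"" + field + "\":\"") + field.length() + 4;
            return json.substring(start, json.indexOf('"', start));
        }
    }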

When clientSessions is not replicated (i.e. distributed with owners=1, as in
the distribution's standalone-ha.xml), I get this on node2:
{
    "error": "invalid_grant",
    "error_description": "Session doesn't have required client"
}

When sessions is not replicated:
{
    "error": "invalid_grant",
    "error_description": "Session not active"
}
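
For reference, the getUserById fix mentioned in my earlier message (quoted
below) ended up looking roughly like this. It's a minimal sketch: the class
name and the external-store lookup are placeholders, not my actual provider.

    import org.keycloak.component.ComponentModel;
    import org.keycloak.models.KeycloakSession;
    import org.keycloak.models.RealmModel;
    import org.keycloak.models.UserModel;
    import org.keycloak.storage.StorageId;
    import org.keycloak.storage.UserStorageProvider;
    import org.keycloak.storage.user.UserLookupProvider;

    public class ExampleUserProvider implements UserStorageProvider, UserLookupProvider {

        private final KeycloakSession session;
        private final ComponentModel model;

        public ExampleUserProvider(KeycloakSession session, ComponentModel model) {
            this.session = session;
            this.model = model;
        }

        @Override
        public UserModel getUserById(String id, RealmModel realm) {
            // This used to just return null, which is what broke
            // grant_type=refresh_token on the node that hadn't seen the
            // user yet. Keycloak ids for federated users look like
            // "f:<component-id>:<external-id>"; strip that down and look
            // the user up in the external store.
            String externalId = new StorageId(id).getExternalId();
            return lookupUser(externalId, realm);
        }

        @Override
        public UserModel getUserByUsername(String username, RealmModel realm) {
            return lookupUser(username, realm);
        }

        @Override
        public UserModel getUserByEmail(String email, RealmModel realm) {
            return null; // not needed for this flow
        }

        @Override
        public void close() {
        }

        // Placeholder for the real external-store query that adapts the
        // result to a UserModel (e.g. via AbstractUserAdapter).
        private UserModel lookupUser(String key, RealmModel realm) {
            return null;
        }
    }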

On Wed, Sep 19, 2018 at 6:56 AM Sebastian Laskawiec <slaskawi at redhat.com>
wrote:

> Thanks for letting us know DV!
>
> Setting the number of owners equal to the cluster size doesn't make any
> sense. You might use a replicated cache in that scenario (which works the
> same way apart from some Infinispan-internal behavior, which can be ignored
> in your case). Could you please paste your Infinispan configuration? Maybe
> there's some hint there...
>
> Thanks,
> Seb
>
> On Tue, Sep 18, 2018 at 11:02 PM D V <dv at glyphy.com> wrote:
>
>> The issue was resolved in a somewhat unexpected way. I had a custom
>> org.keycloak.storage.UserStorageProviderFactory SPI registered that
>> returned providers implementing
>> org.keycloak.storage.user.UserLookupProvider, but the
>> UserLookupProvider#getUserById method wasn't actually implemented: I just
>> had it return null. It wasn't obvious to me that it was required (or under
>> what circumstances). Once I implemented it, the experiments in my original
>> message passed. I did have to set owners to 2 for the "sessions" and
>> "clientSessions" distributed-cache Infinispan configs.
>>
>> One thing I noticed is that node2 (the one that doesn't get hit on the
>> initial password auth) has to do a lookup via getUserById the first time it
>> handles a grant_type=refresh_token auth. Is the data it needs not shared
>> across the cluster? It seems to be cached only locally on the node. Just as
>> a test I tried to set all configured non-local caches to be replicated and
>> it didn't help. Any thoughts about this?
>>
>> Thanks,
>> DV
>>

