Could you please try to unify the caches? Please replace all local-cache
and distributed-cache definitions with replicated-cache.
Even though using distributed caches instead of replicated ones shouldn't
be the cause, I think those local caches might be causing the behavior
you're describing.
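For example, the users cache would then look along these lines (keeping
your current eviction settings and mirroring the mode="SYNC" you already
use on the work cache):

    <replicated-cache name="users" mode="SYNC" statistics-enabled="true">
        <eviction max-entries="10000" strategy="LRU"/>
    </replicated-cache>

and the distributed caches would simply drop the owners attribute, e.g.:

    <replicated-cache name="offlineSessions" mode="SYNC" statistics-enabled="true"/>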
On Wed, Sep 19, 2018 at 3:21 PM D V <dv@glyphy.com> wrote:
Makes sense re: replicated caches. Here's my infinispan subsystem config
right now:
<subsystem xmlns="urn:jboss:domain:infinispan:4.0">
    <cache-container name="keycloak" jndi-name="infinispan/Keycloak" statistics-enabled="true">
        <transport lock-timeout="60000"/>
        <local-cache name="realms" statistics-enabled="true">
            <eviction max-entries="10000" strategy="LRU"/>
        </local-cache>
        <local-cache name="users" statistics-enabled="true">
            <eviction max-entries="10000" strategy="LRU"/>
        </local-cache>
        <!--
            These two need to be replicated or the node that didn't issue
            the initial refresh token will return "invalid_grant" errors
            when attempting to auth with that refresh token.
        -->
        <replicated-cache name="sessions" statistics-enabled="true"/>
        <replicated-cache name="clientSessions" statistics-enabled="true"/>
        <distributed-cache name="authenticationSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
        <distributed-cache name="offlineSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
        <distributed-cache name="offlineClientSessions" mode="SYNC" owners="1" statistics-enabled="true"/>
        <distributed-cache name="loginFailures" mode="SYNC" owners="1" statistics-enabled="true"/>
        <local-cache name="authorization" statistics-enabled="true">
            <eviction max-entries="10000" strategy="LRU"/>
        </local-cache>
        <replicated-cache name="work" mode="SYNC" statistics-enabled="true"/>
        <local-cache name="keys" statistics-enabled="true">
            <eviction max-entries="1000" strategy="LRU"/>
            <expiration max-idle="3600000"/>
        </local-cache>
        <distributed-cache name="actionTokens" mode="SYNC" owners="2" statistics-enabled="true">
            <eviction max-entries="-1" strategy="NONE"/>
            <expiration max-idle="-1" interval="300000"/>
        </distributed-cache>
    </cache-container>
    <cache-container name="server" aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server">
        <transport lock-timeout="60000"/>
        <replicated-cache name="default">
            <transaction mode="BATCH"/>
        </replicated-cache>
    </cache-container>
    <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
        <transport lock-timeout="60000"/>
        <distributed-cache name="dist">
            <locking isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <file-store/>
        </distributed-cache>
    </cache-container>
    <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
        <transport lock-timeout="60000"/>
        <distributed-cache name="dist">
            <locking isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <file-store/>
        </distributed-cache>
    </cache-container>
    <cache-container name="hibernate" default-cache="local-query" module="org.hibernate.infinispan">
        <transport lock-timeout="60000"/>
        <local-cache name="local-query">
            <eviction strategy="LRU" max-entries="10000"/>
            <expiration max-idle="100000"/>
        </local-cache>
        <invalidation-cache name="entity">
            <transaction mode="NON_XA"/>
            <eviction strategy="LRU" max-entries="10000"/>
            <expiration max-idle="100000"/>
        </invalidation-cache>
        <replicated-cache name="timestamps" mode="ASYNC"/>
    </cache-container>
</subsystem>
The scenario I'm testing (example requests below):
1. Auth with grant_type=password on node1.
2. Shut down node1.
3. Auth with grant_type=refresh_token on node2.
When clientSessions is not replicated (distributed, with owners=1, as in
the distribution's standalone-ha.xml), I get this on node2:
{
"error": "invalid_grant",
"error_description": "Session doesn't have required client"
}
When sessions is not replicated:
{
"error": "invalid_grant",
"error_description": "Session not active"
}
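For reference, the requests in steps 1 and 3 look roughly like this
(realm, client, and credentials are placeholder values):

    curl -d 'grant_type=password' -d 'client_id=test-client' \
         -d 'username=test-user' -d 'password=test-password' \
         http://node1:8080/auth/realms/test-realm/protocol/openid-connect/token

    curl -d 'grant_type=refresh_token' -d 'client_id=test-client' \
         -d 'refresh_token=<token from the first response>' \
         http://node2:8080/auth/realms/test-realm/protocol/openid-connect/token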
On Wed, Sep 19, 2018 at 6:56 AM Sebastian Laskawiec <slaskawi@redhat.com> wrote:
> Thanks for letting us know DV!
>
> Setting the number of owners equal to the cluster size doesn't make any
> sense. You might use a replicated cache in that scenario (which works the
> same way apart from some Infinispan-internal behavior that can be ignored
> in your case). Could you please paste your Infinispan configuration? Maybe
> there's some hint in there...
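>
> To illustrate the owners point: in a two-node cluster, something like
>
>     <distributed-cache name="sessions" mode="SYNC" owners="2"/>
>
> already keeps a copy of every entry on both nodes, so it's effectively
> the same as
>
>     <replicated-cache name="sessions" mode="SYNC"/>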
>
> Thanks,
> Seb
>
> On Tue, Sep 18, 2018 at 11:02 PM D V <dv@glyphy.com> wrote:
>
>> The issue was resolved in a somewhat unexpected way. I had a custom
>> org.keycloak.storage.UserStorageProviderFactory SPI registered that
>> returned providers implementing org.keycloak.storage.user.UserLookupProvider,
>> but the getUserById method wasn't actually implemented: I just had it
>> return null. It wasn't obvious to me that it was required (or under what
>> circumstances). Once I implemented it, the experiments in my original
>> message passed. I did have to set owners to 2 for the "sessions" and
>> "clientSessions" distributed-cache Infinispan configs.
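>>
>> The missing piece boiled down to roughly this (simplified; externalStore,
>> MyExternalUser, and MyUserAdapter stand in for my actual storage classes):
>>
>>     @Override
>>     public UserModel getUserById(String id, RealmModel realm) {
>>         // Keycloak passes its storage id here ("f:<component-id>:<external-id>"),
>>         // so pull out the external id before querying the backing store
>>         String externalId = StorageId.externalId(id);
>>         MyExternalUser external = externalStore.findById(externalId);
>>         // this used to return null unconditionally, which is what broke
>>         // refresh_token auth on the node that didn't do the password auth
>>         return external == null ? null : new MyUserAdapter(session, realm, model, external);
>>     }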
>>
>> One thing I noticed is that node2 (the one that doesn't get hit on the
>> initial password auth) has to do a lookup via getUserById the first time it
>> handles a grant_type=refresh_token auth. Is the data it needs not shared
>> across the cluster? It seems to be cached only locally on the node. Just as
>> a test I tried to set all configured non-local caches to be replicated and
>> it didn't help. Any thoughts about this?
>>
>> Thanks,
>> DV
>>