[keycloak-dev] Issues with clustered invalidation caches

Bill Burke bburke at redhat.com
Sat Nov 5 10:11:05 EDT 2016


We don't currently evict the user cache if a realm has changed. This is 
an issue if somebody has modified a UserStorageProvider. Should we evict 
the user cache for any realm change?  Or should I modify the cached 
RealmAdapter and detect whether a UserStorageProvider (or any of its 
subcomponents, e.g. mappers) is being modified?
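
For the second option, the check could be something like this rough sketch 
(written against my memory of the component API; the helper and any eviction 
hook it would guard are hypothetical, not existing code):

    import org.keycloak.component.ComponentModel;
    import org.keycloak.models.RealmModel;
    import org.keycloak.storage.UserStorageProvider;

    class StorageComponentCheck {

        // True if the changed component is a UserStorageProvider, or a
        // sub-component (e.g. a mapper) whose parent is a UserStorageProvider.
        static boolean touchesUserStorage(RealmModel realm, ComponentModel component) {
            String storageType = UserStorageProvider.class.getName();
            if (storageType.equals(component.getProviderType())) return true;
            ComponentModel parent = component.getParentId() == null
                    ? null : realm.getComponent(component.getParentId());
            return parent != null && storageType.equals(parent.getProviderType());
        }
    }

The cached RealmAdapter could call something like this from its 
updateComponent()/removeComponent() paths and, when it returns true, evict 
just that realm's users instead of the whole user cache.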


On 11/3/16 1:10 PM, Bill Burke wrote:
> +1 to this.  We are already hacking Infinispan horribly and doing nasty
> workarounds to get the behavior we want.
>
> Some notes:
>
> * You have to be sure to invalidate indexes (email->user,
> username->user) correctly
>
> * We cache relationships in some scenarios, e.g. the list of clients in
> a realm.  Your proposal makes things so much easier for caching
> relationships, as we would know the difference between a realm delete, a
> realm update, and a client create, update, or removal.
>
> * The cache is currently not consulted at all for queries that return
> multiple users (getUsers(), searchForUser(), etc.).  We invalidate the
> realm on these queries as we don't know if the caller is going to
> perform any updates on those users.
>
> * Role and group removal is tricky, as role/group mappings are
> cached with the user.  You still have the iteration problem discussed
> earlier, and you have to decide which is more efficient: evicting all
> users in the realm, or iterating over all users that might have this
> role/group mapping.  There are also client scope mappings, client mappers,
> and broker mappers which may also reference the role/group.
>
> * If a user storage provider is removed, this is also a tricky scenario,
> and you may be better off evicting all users instead of iterating over
> all cached users to find the ones loaded by this provider (see the
> sketch after these notes).
>
> * Client removal is tricky because you also have to remove all cached
> user role mappings and scope mappings.
>
> There may be other cases I'm missing.
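>
> To make the storage-provider case above concrete, here's a rough sketch of
> the two eviction strategies (illustrative names, not real cache code; it
> assumes federated users are cached under IDs of the form
> "f:<providerId>:<externalId>"):
>
>     import java.util.List;
>     import java.util.Map;
>     import java.util.stream.Collectors;
>
>     class ProviderRemovalEviction {
>
>         // Option A: userCache.clear() - cheap to issue, but the whole cache goes cold.
>         // Option B: scan all cached users and evict only those loaded by the
>         // removed provider - keeps unrelated entries warm, but is O(cached users).
>         static void onProviderRemoved(Map<String, Object> userCache, String providerId) {
>             String prefix = "f:" + providerId + ":";
>             List<String> toEvict = userCache.keySet().stream()
>                     .filter(id -> id.startsWith(prefix))
>                     .collect(Collectors.toList());
>             toEvict.forEach(userCache::remove);
>             // The username->user and email->user index entries for these users
>             // have to be invalidated as well (first bullet above).
>         }
>     }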
>
>
> On 11/3/16 7:06 AM, Marek Posolda wrote:
>> I was looking at the cache issue reported by a customer. I found the cause
>> of it and a couple of other related issues:
>>
>> KEYCLOAK-3857 - Bad performance with clustered invalidation cache when
>> updating object
>> KEYCLOAK-3858 - Removing model object send lots of invalidation messages
>> across cluster
>> KEYCLOAK-3859 - Lots of userCache invalidation messages when
>> invalidating realm
>> KEYCLOAK-3860 - All realm users are invalidated from cache when removing
>> some realm object
>>
>>
>> In short, our cache works fine in local mode, but in a cluster there
>> are issues with the invalidation caches. We don't have issues with
>> stale entries, but that comes at the cost of various performance
>> issues like:
>>
>> - There are lots of invalidation messages sent across the cluster
>>
>> - Eviction on the node which received the invalidation event is also very
>> inefficient. For example, evicting a realm with 1000 roles needs to call
>> 1000 predicates, which iterate the cache 1000 times (see the sketch after
>> this list).
>>
>> - The invalidation cache doesn't let us distinguish the context in which
>> the object was invalidated. For example, when I update realm settings on
>> node1, I need to invalidate just the CachedRealm object, but not all the
>> other objects dependent on the realm. However, the invalidation event
>> received on node2 doesn't know whether I invalidated CachedRealm because
>> of a realm update or because of a realm removal. So to be safe, it assumes
>> removal, which evicts all realm objects! See
>> https://issues.jboss.org/browse/KEYCLOAK-3857 for details.
>>
>> - Finally, we have the workaround with the "invalidation.key" objects in
>> our invalidation caches. This is currently needed because when
>> invalidating an object on node1, the invalidation event is NOT received on
>> node2 unless the object is present there. Hence the workaround with the
>> "invalidation.key" records, just to avoid this limitation of the
>> invalidation cache.
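>>
>> To illustrate the eviction cost mentioned above, the receiving node ends up
>> doing roughly the following for every invalidated object (names made up, not
>> the real predicate classes), so removing a realm with 1000 roles scans the
>> local cache 1000 times:
>>
>>     import java.util.List;
>>     import java.util.Map;
>>
>>     class PerObjectEviction {
>>
>>         // One full scan of the local cache per invalidated role.
>>         static void onRealmRemoved(Map<String, Object> realmCache, List<String> roleIds) {
>>             for (String roleId : roleIds) {
>>                 realmCache.values().removeIf(cached -> referencesRole(cached, roleId));
>>             }
>>         }
>>
>>         // Hypothetical predicate: does this cached object depend on the role?
>>         static boolean referencesRole(Object cached, String roleId) {
>>             return false; // placeholder
>>         }
>>     }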
>>
>>
>> To solve all these issues, I propose:
>> - Instead of relying on invalidation caches, we will send a notification
>> across the cluster describing what happened (e.g. the message "realm XY
>> was updated"). All the nodes will receive this notification, evict their
>> locally cached objects accordingly, and bump their revisions locally.
>> This would be much more stable and performant, and will allow us to remove
>> some workarounds.
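>>
>> As a rough sketch, the notification could be a small serializable object
>> that carries the context, not just a key to invalidate (class and field
>> names are only illustrative):
>>
>>     import java.io.Serializable;
>>
>>     public class RealmCacheEvent implements Serializable {
>>
>>         public enum Type { REALM_UPDATED, REALM_REMOVED, CLIENT_ADDED, CLIENT_UPDATED, CLIENT_REMOVED }
>>
>>         private final Type type;      // the context, not just "invalidate realm XY"
>>         private final String realmId;
>>
>>         public RealmCacheEvent(Type type, String realmId) {
>>             this.type = type;
>>             this.realmId = realmId;
>>         }
>>
>>         public Type getType() { return type; }
>>         public String getRealmId() { return realmId; }
>>     }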
>>
>> Some details:
>>
>> - The caches "realms" and "users" won't be "invalidation" caches, but
>> they will be "local" caches.
>>
>> - When any object needs to be removed from the cache for some reason
>> (e.g. a realm update), a notification message will be sent from node1
>> to all other cluster nodes. We will use a replicated cache for that.
>> Node1 will send a notification message like "realm XY was updated" (see
>> the sketch after this list).
>>
>> - The other cluster nodes will receive this message and will locally
>> trigger evictions of all the objects dependent on the particular realm.
>> In case of a realm update, it's just the CachedRealm object itself; in
>> case of a realm removal, it is all the realm's objects, etc.
>>
>> - Note that the message will also carry the context, "realm XY was updated"
>> or "realm XY was removed", not just "invalidate realm XY". This allows
>> much more flexibility and in particular avoids issues like
>> https://issues.jboss.org/browse/KEYCLOAK-3857 .
>>
>> - We already have a replicated cache called "work", which we use to
>> notify other cluster nodes about various events, so we will just use
>> this one. There is no need to add another replicated cache; we will
>> probably just need to configure LRU eviction for the existing one.
>>
>> - Also note that messages will always be received, so we won't need the
>> workaround with "invalidation.key" objects anymore.
>>
>> - Also, we don't need the recursive evictions (which have very poor
>> performance; see https://issues.jboss.org/browse/KEYCLOAK-3857 ),
>> because the receiving node will know exactly what happened. It will remove
>> objects in just the same way as the "sender" node.
>>
>> - Finally, the amount of traffic sent across the cluster will be much lower.
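>>
>> Putting it together, the send/receive flow could look roughly like this
>> (plain Infinispan listener API; the local eviction methods stand for the
>> predicates we already have, their names here are just illustrative):
>>
>>     import java.util.UUID;
>>
>>     import org.infinispan.Cache;
>>     import org.infinispan.notifications.Listener;
>>     import org.infinispan.notifications.cachelistener.annotation.CacheEntryCreated;
>>     import org.infinispan.notifications.cachelistener.event.CacheEntryCreatedEvent;
>>
>>     @Listener
>>     public class RealmCacheEventListener {
>>
>>         // Sender side (node1): put the event into the replicated "work" cache,
>>         // so every node sees it even if it has nothing cached for this realm.
>>         public static void notifyRealmUpdated(Cache<String, Object> workCache, String realmId) {
>>             workCache.put(UUID.randomUUID().toString(),
>>                     new RealmCacheEvent(RealmCacheEvent.Type.REALM_UPDATED, realmId));
>>         }
>>
>>         // Receiver side (registered on the "work" cache of every node): evict
>>         // locally with the same granularity the sender used, bump revisions.
>>         @CacheEntryCreated
>>         public void onEvent(CacheEntryCreatedEvent<String, Object> event) {
>>             if (event.isPre() || !(event.getValue() instanceof RealmCacheEvent)) return;
>>             RealmCacheEvent e = (RealmCacheEvent) event.getValue();
>>             switch (e.getType()) {
>>                 case REALM_UPDATED:
>>                     evictCachedRealmOnly(e.getRealmId());   // just the CachedRealm object
>>                     break;
>>                 case REALM_REMOVED:
>>                     evictAllRealmObjects(e.getRealmId());   // all objects of the realm
>>                     break;
>>                 default:
>>                     break;
>>             }
>>         }
>>
>>         private void evictCachedRealmOnly(String realmId) { /* local eviction */ }
>>         private void evictAllRealmObjects(String realmId) { /* local eviction */ }
>>     }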
>>
>> This sounds like a big step, but IMO it's not that bad :) Note that we
>> already have all the predicates in place for individual objects. The
>> only change will be about sending/receiving notifications across the
>> cluster. I think I can prototype something by tomorrow to double-check
>> that this approach works, and then finish it sometime in the middle of
>> next week. WDYT?
>>
>> Marek
>>


