[keycloak-dev] Issues with clustered invalidation caches

Sun Nov 6 13:25:21 EST 2016

+1 to evict the user cache just for the UserStorageProvider updates (or 
it's subcomponents). Not evict for every realm update as that sounds 
like the unecesary performance penalty.

Marek


On 05/11/16 15:11, Bill Burke wrote:
> We don't currently evict the user cache if a realm has changed. This is
> an issue if somebody has modified a UserStorageProvider. Should we evict
> user cache for any realm changes?  Or should I modify the cache
> RealmAdapter and detect whether a UserStorageProvider is being modified
> (or any of its subcomponents(mappers)).?
>
>
> On 11/3/16 1:10 PM, Bill Burke wrote:
>> +1 to this.  We are already hacking Infinispan horribly and doing nasty
>> work arounds to get the behavior we want.
>>
>> Some notes:
>>
>> * You have to be sure to invalidate indexes (email->user,
>> username->user) correctly
>>
>> * We cache relationships in some scenarios, i.e. the list of clients in
>> a realm.  Your proposal makes things so much easier for caching
>> relationships as we know the difference between a realm delete and a
>> realm update and a client create, update, and removal
>>
>> * Cache is currently not interacted with at all for queries that return
>> multiple users (getUser(), searchForUser, etc.).  We invalidate the
>> realm on these queries as we don't know if the caller is going to
>> perform any updates on those users.
>>
>> * Role and group removal is a tricky thing as role/group mappings are
>> cached with the user.  You still have the iteration problem discusssed
>> earlier and you have to decide which is more efficient, evicting all
>> users in the realm, or iterating over all users that might have this
>> role/group mapping.  There's also client scope mapping, client mappers,
>> and broker mappers which may also reference the role/group.
>>
>> * If a user storage provider is removed, this is also a tricky scenario
>> and you may be better off evicting all users instead of iterating over
>> all cached users to find all that were loaded by this provider.
>>
>> * Client removal is tricky because you also have to remove all user role
>> mappings, scope mappings, that are cached.
>>
>> There may be other cases I'm missing.
>>
>>
>> On 11/3/16 7:06 AM, Marek Posolda wrote:
>>> I was looking at the cache issue reported by customer. I found the cause
>>> of it and couple of other related issues:
>>>
>>> KEYCLOAK-3857 - Bad performance with clustered invalidation cache when
>>> updating object
>>> KEYCLOAK-3858 - Removing model object send lots of invalidation messages
>>> across cluster
>>> KEYCLOAK-3859 - Lots of userCache invalidation messages when
>>> invalidating realm
>>> KEYCLOAK-3860 - All realm users are invalidated from cache when removing
>>> some realm object
>>>
>>>
>>> In shortcut, our cache works fine in local mode. But in cluster, there
>>> are issues with the invalidation caches . We don't have issues with
>>> stale entries, but this is purchased but lots of various performance
>>> issues like:
>>>
>>> - There are lots of invalidation messages sent across the cluster
>>>
>>> - Eviction on the node, which received invalidation event, is also very
>>> uneffective. For example evicting realm with 1000 roles needs to call
>>> 1000 predicates, which iterates the cache 1000 times.
>>>
>>> - Invalidation cache doesn't allow to differ between the context why the
>>> object was invalidated. For example when I update realm settings on
>>> node1, I need to invalidate just the CachedRealm object, but not all the
>>> other objects dependent on the realm. However the invalidation event
>>> received on node2 doesn't know, if I invalidated CachedRealm because of
>>> realm update or because of realm removal. So for more safety, it assumes
>>> removal, which evicts all realm objects! See
>>> https://issues.jboss.org/browse/KEYCLOAK-3857 for details.
>>>
>>> - Finally we have the workaround with the "invalidation.key" objects in
>>> our invalidation caches. This is currently needed because when
>>> invalidating object on node1, the invalidation event is NOT received on
>>> node2 unless the object is there. Hence the workaround with the
>>> "invalidation.key" records just to avoid this limitation of invalidation
>>> cache.
>>>
>>>
>>> For solve all these issues, I propose:
>>> - Instead of relying on invalidation caches, we will send notification
>>> across cluster what happened (eg. message "realm XY was updated"). All
>>> the nodes will receive this notification and will evict all their
>>> locally cached objects accordingly and bump their revisions locally.
>>> This would be much more stable, performant and will allow us to remove
>>> some workarounds.
>>>
>>> Some details:
>>>
>>> - The caches "realms" and "users" won't be "invalidation" caches, but
>>> they will be "local" caches.
>>>
>>> - When any object needs to be removed from cache because of some reason
>>> (eg. updating realm), the notification message will be sent from node1
>>> to all other cluster nodes. We will use the replicated cache for that.
>>> Node1 will send the notification message like "realm XY was updated" .
>>>
>>> - Other cluster nodes will receive this message and they will locally
>>> trigger evictions of all the objects dependent on particular realm. In
>>> case of realm update, it's just the CachedRealm object itself. In case
>>> of realm removal, it is all realm objects etc.
>>>
>>> - Note message will contain also context "realm XY was updated" or
>>> "realm XY was removed" . Not just "invalidate realm XY". This allows
>>> much more flexibility and in particular avoid the issues like
>>> https://issues.jboss.org/browse/KEYCLOAK-3857 .
>>>
>>> - We already have replicated cache called "work", which we are using to
>>> notify other cluster nodes about various events. So we will just use
>>> this one. No need to add another replicated cache, we will probably just
>>> need to configure LRU eviction for the existing one.
>>>
>>> - Also note that messages will be always received. We won't need
>>> workaround with "invalidation.key" objects anymore.
>>>
>>> - Also we don't need recursive evictions (which has very poor
>>> performance. See https://issues.jboss.org/browse/KEYCLOAK-3857 ),
>>> because receiving node will know exactly what happened. It will remove
>>> objects just the same way like the "sender" node.
>>>
>>> - Finally the amount of traffic sent across the cluster will be much lower.
>>>
>>> This sounds like the big step, but IMO it's not that bad :) Note that we
>>> already have all the predicates in place for individual objects. The
>>> only change will be about sending/receiving notifications across
>>> cluster. I think I am able to prototype something by tomorrow to
>>> doublecheck this approach working and then finish it somewhen middle
>>> next week. WDYT?
>>>
>>> Marek
>>>
>>> _______________________________________________
>>> keycloak-dev mailing list
>>> keycloak-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/keycloak-dev
>> _______________________________________________
>> keycloak-dev mailing list
>> keycloak-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/keycloak-dev
> _______________________________________________
> keycloak-dev mailing list
> keycloak-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/keycloak-dev