Re: [keycloak-dev] Issues with clustered invalidation caches

Thursday, 3 November 2016

+1 to this.  We are already hacking Infinispan horribly and doing nasty 
work arounds to get the behavior we want.

Some notes:

* You have to be sure to invalidate indexes (email->user, 
username->user) correctly

* We cache relationships in some scenarios, i.e. the list of clients in 
a realm.  Your proposal makes things so much easier for caching 
relationships as we know the difference between a realm delete and a 
realm update and a client create, update, and removal

* Cache is currently not interacted with at all for queries that return 
multiple users (getUser(), searchForUser, etc.).  We invalidate the 
realm on these queries as we don't know if the caller is going to 
perform any updates on those users.

* Role and group removal is a tricky thing as role/group mappings are 
cached with the user.  You still have the iteration problem discusssed 
earlier and you have to decide which is more efficient, evicting all 
users in the realm, or iterating over all users that might have this 
role/group mapping.  There's also client scope mapping, client mappers, 
and broker mappers which may also reference the role/group.

* If a user storage provider is removed, this is also a tricky scenario 
and you may be better off evicting all users instead of iterating over 
all cached users to find all that were loaded by this provider.

* Client removal is tricky because you also have to remove all user role 
mappings, scope mappings, that are cached.

There may be other cases I'm missing.

On 11/3/16 7:06 AM, Marek Posolda wrote:
...
 I was looking at the cache issue reported by customer. I found the
cause
 of it and couple of other related issues:

 KEYCLOAK-3857 - Bad performance with clustered invalidation cache when
 updating object
 KEYCLOAK-3858 - Removing model object send lots of invalidation messages
 across cluster
 KEYCLOAK-3859 - Lots of userCache invalidation messages when
 invalidating realm
 KEYCLOAK-3860 - All realm users are invalidated from cache when removing
 some realm object

 In shortcut, our cache works fine in local mode. But in cluster, there
 are issues with the invalidation caches . We don't have issues with
 stale entries, but this is purchased but lots of various performance
 issues like:

 - There are lots of invalidation messages sent across the cluster

 - Eviction on the node, which received invalidation event, is also very
 uneffective. For example evicting realm with 1000 roles needs to call
 1000 predicates, which iterates the cache 1000 times.

 - Invalidation cache doesn't allow to differ between the context why the
 object was invalidated. For example when I update realm settings on
 node1, I need to invalidate just the CachedRealm object, but not all the
 other objects dependent on the realm. However the invalidation event
 received on node2 doesn't know, if I invalidated CachedRealm because of
 realm update or because of realm removal. So for more safety, it assumes
 removal, which evicts all realm objects! See
 https://issues.jboss.org/browse/KEYCLOAK-3857 for details.

 - Finally we have the workaround with the "invalidation.key" objects in
 our invalidation caches. This is currently needed because when
 invalidating object on node1, the invalidation event is NOT received on
 node2 unless the object is there. Hence the workaround with the
 "invalidation.key" records just to avoid this limitation of invalidation
 cache.

 For solve all these issues, I propose:
 - Instead of relying on invalidation caches, we will send notification
 across cluster what happened (eg. message "realm XY was updated"). All
 the nodes will receive this notification and will evict all their
 locally cached objects accordingly and bump their revisions locally.
 This would be much more stable, performant and will allow us to remove
 some workarounds.

 Some details:

 - The caches "realms" and "users" won't be
"invalidation" caches, but
 they will be "local" caches.

 - When any object needs to be removed from cache because of some reason
 (eg. updating realm), the notification message will be sent from node1
 to all other cluster nodes. We will use the replicated cache for that.
 Node1 will send the notification message like "realm XY was updated" .

 - Other cluster nodes will receive this message and they will locally
 trigger evictions of all the objects dependent on particular realm. In
 case of realm update, it's just the CachedRealm object itself. In case
 of realm removal, it is all realm objects etc.

 - Note message will contain also context "realm XY was updated" or
 "realm XY was removed" . Not just "invalidate realm XY". This allows
 much more flexibility and in particular avoid the issues like
 https://issues.jboss.org/browse/KEYCLOAK-3857 .

 - We already have replicated cache called "work", which we are using to
 notify other cluster nodes about various events. So we will just use
 this one. No need to add another replicated cache, we will probably just
 need to configure LRU eviction for the existing one.

 - Also note that messages will be always received. We won't need
 workaround with "invalidation.key" objects anymore.

 - Also we don't need recursive evictions (which has very poor
 performance. See https://issues.jboss.org/browse/KEYCLOAK-3857 ),
 because receiving node will know exactly what happened. It will remove
 objects just the same way like the "sender" node.

 - Finally the amount of traffic sent across the cluster will be much lower.

 This sounds like the big step, but IMO it's not that bad :) Note that we
 already have all the predicates in place for individual objects. The
 only change will be about sending/receiving notifications across
 cluster. I think I am able to prototype something by tomorrow to
 doublecheck this approach working and then finish it somewhen middle
 next week. WDYT?

 Marek

 _______________________________________________
 keycloak-dev mailing list
 keycloak-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/keycloak-dev 

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

Re: [keycloak-dev] Issues with clustered invalidation caches