[keycloak-dev] Issues with clustered invalidation caches

Stian Thorgersen sthorger at redhat.com
Thu Nov 3 09:09:47 EDT 2016


Sounds like a plan to me and I can't see us fixing it in a more trivial
manner either.

This also aligns better with what we've recently discussed with regards
to cross DC support. Cross DC support would use JDG and have a
replicated cache between DCs to send invalidation messages, which is
exactly what you are proposing, so this should be a step towards that.
It may make sense to work that in straight away. Basically, we should
support propagating invalidation messages by using an external
Infinispan/JDG cluster as the store for the cache.
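
For illustration, the propagation could look roughly like this with the
Infinispan programmatic API (just a sketch, not a final design; the
host, port and cache name are placeholders):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.persistence.remote.configuration.RemoteStoreConfigurationBuilder;

    class CrossDcWorkCacheConfig {
        // Back the replicated "work" cache with a remote JDG cluster so
        // that invalidation messages written in one DC become visible
        // in the other.
        static Configuration workCacheConfig() {
            ConfigurationBuilder builder = new ConfigurationBuilder();
            builder.clustering().cacheMode(CacheMode.REPL_SYNC)
                   .persistence()
                       .addStore(RemoteStoreConfigurationBuilder.class)
                           .remoteCacheName("work")              // placeholder cache name
                           .addServer().host("jdg.example.com")  // placeholder host
                           .port(11222);                         // default Hot Rod port
            return builder.build();
        }
    }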

On 3 November 2016 at 12:06, Marek Posolda <mposolda at redhat.com> wrote:

> I was looking at the cache issue reported by a customer. I found the
> cause of it and a couple of other related issues:
>
> KEYCLOAK-3857 - Bad performance with clustered invalidation cache when
> updating object
> KEYCLOAK-3858 - Removing model object send lots of invalidation messages
> across cluster
> KEYCLOAK-3859 - Lots of userCache invalidation messages when
> invalidating realm
> KEYCLOAK-3860 - All realm users are invalidated from cache when removing
> some realm object
>
>
> In short, our cache works fine in local mode. But in cluster mode,
> there are issues with the invalidation caches. We don't have issues
> with stale entries, but this comes at the cost of lots of various
> performance issues like:
>
> - There are lots of invalidation messages sent across the cluster
>
> - Eviction on the node which received the invalidation event is also
> very ineffective. For example, evicting a realm with 1000 roles needs
> to call 1000 predicates, each of which iterates the whole cache, so
> the cache is iterated 1000 times (see the sketch after this list).
>
> - The invalidation cache doesn't allow us to distinguish the context in
> which the object was invalidated. For example, when I update realm
> settings on node1, I need to invalidate just the CachedRealm object,
> but not all the other objects dependent on the realm. However, the
> invalidation event received on node2 doesn't know if I invalidated
> CachedRealm because of a realm update or because of a realm removal. So
> to be safe, it assumes removal, which evicts all realm objects! See
> https://issues.jboss.org/browse/KEYCLOAK-3857 for details.
>
> - Finally, we have the workaround with the "invalidation.key" objects
> in our invalidation caches. This is currently needed because when
> invalidating an object on node1, the invalidation event is NOT received
> on node2 unless the object is present there. Hence the workaround with
> the "invalidation.key" records just to avoid this limitation of the
> invalidation cache.
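>
> To illustrate the eviction cost, roughly what happens today (a
> simplified sketch; dependsOn(...) stands in for one of our predicates
> and is a made-up helper name):
>
>     // One full pass over the whole cache per invalidated object, so
>     // evicting a realm with 1000 roles scans the cache 1000 times.
>     for (String roleId : invalidatedRoleIds) {  // 1000 iterations
>         // removeIf(...) iterates every cache entry on each call
>         cache.entrySet().removeIf(e -> dependsOn(e.getValue(), roleId));
>     }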
>
>
> To solve all these issues, I propose:
> - Instead of relying on invalidation caches, we will send a
> notification across the cluster describing what happened (eg. the
> message "realm XY was updated"). All the nodes will receive this
> notification, evict all their locally cached objects accordingly, and
> bump their revisions locally. This would be much more stable and
> performant, and will allow us to remove some workarounds.
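>
> A rough sketch of what such a notification could look like (just an
> illustration; the class and names are made up, not a final design):
>
>     import java.io.Serializable;
>
>     // Carries the context ("updated" vs. "removed"), not just the id
>     // of the invalidated object.
>     public class RealmNotification implements Serializable {
>
>         public enum Type { REALM_UPDATED, REALM_REMOVED }
>
>         private final Type type;
>         private final String realmId;
>
>         public RealmNotification(Type type, String realmId) {
>             this.type = type;
>             this.realmId = realmId;
>         }
>
>         public Type getType() { return type; }
>         public String getRealmId() { return realmId; }
>     }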
>
> Some details:
>
> - The caches "realms" and "users" won't be "invalidation" caches, but
> they will be "local" caches.
>
> - When any object needs to be removed from the cache for some reason
> (eg. a realm update), a notification message will be sent from node1
> to all other cluster nodes. We will use the replicated cache for that.
> Node1 will send a notification message like "realm XY was updated".
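>
> Roughly like this (a sketch, assuming the replicated cache is
> available as an org.infinispan.Cache from the embedded cache manager,
> and reusing the RealmNotification class sketched above; the key
> format is arbitrary):
>
>     // node1: a put into the replicated cache is propagated to all
>     // other cluster nodes
>     Cache<String, RealmNotification> work = cacheManager.getCache("work");
>     work.put("notification-" + UUID.randomUUID(),
>             new RealmNotification(RealmNotification.Type.REALM_UPDATED, realmId));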
>
> - Other cluster nodes will receive this message and will locally
> trigger evictions of all the objects dependent on the particular realm.
> In case of a realm update, it's just the CachedRealm object itself. In
> case of a realm removal, it is all realm objects, etc.
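>
> The receiving side could be an Infinispan cache listener on the
> replicated cache, something like this (sketch only; the evict*
> methods are placeholders for the local evictions we already have):
>
>     import org.infinispan.notifications.Listener;
>     import org.infinispan.notifications.cachelistener.annotation.CacheEntryCreated;
>     import org.infinispan.notifications.cachelistener.event.CacheEntryCreatedEvent;
>
>     @Listener
>     public class RealmNotificationListener {
>
>         @CacheEntryCreated
>         public void onNotification(CacheEntryCreatedEvent<String, RealmNotification> event) {
>             if (event.isPre()) return; // react only once the entry is committed
>             RealmNotification n = event.getValue();
>             switch (n.getType()) {
>                 case REALM_UPDATED: evictCachedRealm(n.getRealmId()); break;  // just CachedRealm
>                 case REALM_REMOVED: evictRealmObjects(n.getRealmId()); break; // all realm objects
>             }
>         }
>
>         private void evictCachedRealm(String realmId) { /* local eviction */ }
>         private void evictRealmObjects(String realmId) { /* local eviction */ }
>     }
>
> The listener would be registered on startup with something like
> workCache.addListener(new RealmNotificationListener()).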
>
> - Note that the message will also contain the context: "realm XY was
> updated" or "realm XY was removed", not just "invalidate realm XY".
> This allows much more flexibility and in particular avoids issues like
> https://issues.jboss.org/browse/KEYCLOAK-3857 .
>
> - We already have a replicated cache called "work", which we are using
> to notify other cluster nodes about various events. So we will just use
> this one. No need to add another replicated cache; we will probably
> just need to configure LRU eviction for the existing one.
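>
> For example, with the programmatic API it could look like this (a
> sketch against the Infinispan 8-era API; the max-entries value is
> arbitrary):
>
>     import org.infinispan.configuration.cache.ConfigurationBuilder;
>     import org.infinispan.eviction.EvictionStrategy;
>
>     // Bound the replicated "work" cache so old notifications are
>     // evicted instead of accumulating forever.
>     ConfigurationBuilder builder = new ConfigurationBuilder();
>     builder.eviction().strategy(EvictionStrategy.LRU).maxEntries(1000);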
>
> - Also note that messages will always be received. We won't need the
> workaround with "invalidation.key" objects anymore.
>
> - Also, we don't need recursive evictions (which have very poor
> performance; see https://issues.jboss.org/browse/KEYCLOAK-3857 ),
> because the receiving node will know exactly what happened. It will
> remove objects in just the same way as the "sender" node.
>
> - Finally the amount of traffic sent across the cluster will be much lower.
>
> This sounds like a big step, but IMO it's not that bad :) Note that we
> already have all the predicates in place for individual objects. The
> only change is about sending/receiving notifications across the
> cluster. I think I am able to prototype something by tomorrow to
> double-check that this approach works, and then finish it sometime in
> the middle of next week. WDYT?
>
> Marek
>
> _______________________________________________
> keycloak-dev mailing list
> keycloak-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/keycloak-dev
>

