[keycloak-dev] Issues with clustered invalidation caches

Marek Posolda mposolda at redhat.com
Wed Nov 16 17:32:48 EST 2016


So cache changes are in. Summary of changes:

- "realms" and "users" caches are now always local caches (even in cluster)

- The replicated cache is used to send messages between cluster nodes 
to notify them what should be invalidated. The records in the "realms" 
and "users" caches are then always invalidated locally and their 
revisions are increased

- I've reused the existing "work" replicated cache to send the messages 
between cluster nodes. This cache already existed, so there was no need 
to add another replicated cache.

- I've also tested the cross-DC setup (2 keycloak nodes not 
communicating directly with each other, but through an external JDG 
server). It's working fine and invalidation messages are correctly 
propagated as expected. I've added some temporary documentation here 
https://github.com/keycloak/keycloak/blob/master/misc/CrossDataCenter.md 
. However userSessions, offlineSessions and loginFailures don't yet 
work in the cross-DC setup. It seems this will need to wait for Keycloak 
3.X though...

Marek



On 04/11/16 09:02, Marek Posolda wrote:
> On 03/11/16 14:09, Stian Thorgersen wrote:
>> Sounds like a plan to me and I can't see us fixing it in a more
>> trivial manner either.
>>
>> This also aligns better with what we've recently discussed with
>> regards to cross DC support. Cross DC support would use JDG and have a
>> replicated cache between DCs to send invalidation messages, which is
>> exactly what you are proposing, so this should be a step towards that.
>> It may make sense that you work that in straight away. Basically we
>> should support propagating invalidation messages by using an external
>> Infinispan/JDG cluster as the store for the cache.
> So it seems we may need something like an InvalidationManager SPI, as 
> an abstraction just for the actual transport (sending/receiving 
> invalidation messages)?
>
> By default, it will just be a replicated infinispan cache, but it will 
> be possible to use something completely different for the actual 
> message transport (JMS messaging, external JDG for cross DC support 
> etc). WDYT?
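
A possible shape for that SPI, as a sketch only (the interface name and 
methods are my invention, not an existing Keycloak API). The transport 
behind it could be a replicated Infinispan cache, JMS, or an external 
JDG; a trivial in-JVM implementation stands in here:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical SPI: abstracts only the transport of invalidation messages.
interface InvalidationManager {
    void send(String invalidationEvent);          // notify other nodes
    void addListener(Consumer<String> listener);  // receive notifications
}

// Trivial in-JVM default, standing in for the replicated-cache transport.
class LocalInvalidationManager implements InvalidationManager {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    public void send(String event) {
        // A real implementation would put the event into the replicated
        // "work" cache (or a JMS topic); here we just call listeners directly.
        for (Consumer<String> l : listeners) {
            l.accept(event);
        }
    }

    public void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }
}
```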
>
> Marek
>> On 3 November 2016 at 12:06, Marek Posolda <mposolda at redhat.com
>> <mailto:mposolda at redhat.com>> wrote:
>>
>>      I was looking at the cache issue reported by a customer. I found 
>>      the cause of it and a couple of other related issues:
>>
>>      KEYCLOAK-3857 - Bad performance with clustered invalidation cache when
>>      updating object
>>      KEYCLOAK-3858 - Removing model object send lots of invalidation
>>      messages
>>      across cluster
>>      KEYCLOAK-3859 - Lots of userCache invalidation messages when
>>      invalidating realm
>>      KEYCLOAK-3860 - All realm users are invalidated from cache when
>>      removing
>>      some realm object
>>
>>
>>      In short, our cache works fine in local mode. But in cluster 
>>      mode, there are issues with the invalidation caches. We don't 
>>      have issues with stale entries, but this comes at the cost of 
>>      various performance issues like:
>>
>>      - There are lots of invalidation messages sent across the cluster
>>
>>      - Eviction on the node which received the invalidation event is 
>>      also very ineffective. For example evicting a realm with 1000 
>>      roles needs to call 1000 predicates, which iterate the cache 
>>      1000 times.
>>
>>      - The invalidation cache doesn't allow us to distinguish the 
>>      context in which the object was invalidated. For example when I 
>>      update realm settings on node1, I need to invalidate just the 
>>      CachedRealm object, but not all the other objects dependent on 
>>      the realm. However the invalidation event received on node2 
>>      doesn't know if I invalidated CachedRealm because of a realm 
>>      update or because of a realm removal. So to be safe, it assumes 
>>      removal, which evicts all realm objects! See
>>      https://issues.jboss.org/browse/KEYCLOAK-3857
>>      <https://issues.jboss.org/browse/KEYCLOAK-3857> for details.
>>
>>      - Finally we have the workaround with the "invalidation.key" 
>>      objects in our invalidation caches. This is currently needed 
>>      because when invalidating an object on node1, the invalidation 
>>      event is NOT received on node2 unless the object is there. Hence 
>>      the workaround with the "invalidation.key" records, just to 
>>      avoid this limitation of invalidation caches.
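
The last limitation can be shown with a toy model (plain maps, not real 
Infinispan code, and the key names are illustrative): an invalidation 
cache only delivers the invalidation event for a key to nodes that 
currently hold that key, so a permanently-present "invalidation.key" 
record is kept on every node purely so that the event always lands.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one node's invalidation cache and its delivery rule.
class InvalidationCacheNode {
    final Map<String, Object> entries = new HashMap<>();
    boolean eventReceived = false;

    void receiveInvalidation(String key) {
        // Delivery rule of an invalidation cache: nothing happens on
        // this node unless the key is locally present.
        if (entries.containsKey(key)) {
            entries.remove(key);
            eventReceived = true;
        }
    }
}
```

A node that never cached the object silently misses the event; with the 
well-known workaround record present, the event is always observed.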
>>
>>
>>      To solve all these issues, I propose:
>>      - Instead of relying on invalidation caches, we will send a 
>>      notification across the cluster about what happened (eg. the 
>>      message "realm XY was updated"). All the nodes will receive this 
>>      notification and will evict all their locally cached objects 
>>      accordingly and bump their revisions locally. This will be much 
>>      more stable and performant, and will allow us to remove some 
>>      workarounds.
>>
>>      Some details:
>>
>>      - The caches "realms" and "users" won't be "invalidation" caches, but
>>      they will be "local" caches.
>>
>>      - When any object needs to be removed from the cache for some 
>>      reason (eg. a realm update), a notification message will be sent 
>>      from node1 to all other cluster nodes. We will use the 
>>      replicated cache for that. Node1 will send a notification 
>>      message like "realm XY was updated".
>>
>>      - Other cluster nodes will receive this message and will locally 
>>      trigger evictions of all the objects dependent on the particular 
>>      realm. In case of a realm update, it's just the CachedRealm 
>>      object itself. In case of a realm removal, it is all realm 
>>      objects etc.
>>
>>      - Note that the message will also contain the context: "realm XY 
>>      was updated" or "realm XY was removed", not just "invalidate 
>>      realm XY". This allows much more flexibility and in particular 
>>      avoids issues like https://issues.jboss.org/browse/KEYCLOAK-3857
>>      <https://issues.jboss.org/browse/KEYCLOAK-3857> .
>>
>>      - We already have a replicated cache called "work", which we are 
>>      using to notify other cluster nodes about various events. So we 
>>      will just use this one. There is no need to add another 
>>      replicated cache; we will probably just need to configure LRU 
>>      eviction for the existing one.
>>
>>      - Also note that messages will always be received. We won't need 
>>      the workaround with "invalidation.key" objects anymore.
>>
>>      - Also we don't need recursive evictions (which have very poor 
>>      performance, see https://issues.jboss.org/browse/KEYCLOAK-3857
>>      <https://issues.jboss.org/browse/KEYCLOAK-3857>), because the 
>>      receiving node will know exactly what happened. It will remove 
>>      objects in just the same way as the "sender" node.
>>
>>      - Finally, the amount of traffic sent across the cluster will be 
>>      much lower.
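
The context-carrying messages in the proposal above can be sketched like 
this (class and key names are mine, not Keycloak's): because the event 
says *what* happened, the receiving node evicts exactly the same objects 
the sender did, in a single pass and without per-object predicates.

```java
import java.util.HashMap;
import java.util.Map;

// What happened, not just what to invalidate.
enum RealmEventType { REALM_UPDATED, REALM_REMOVED }

// One node's local cache plus its context-aware eviction logic.
class RealmEvictionHandler {
    // Illustrative keys like "realm/XY" or "realm/XY/role/admin".
    final Map<String, Object> cache = new HashMap<>();

    void onEvent(RealmEventType type, String realmId) {
        String base = "realm/" + realmId;
        switch (type) {
            case REALM_UPDATED:
                // Just the CachedRealm object; dependent objects survive.
                cache.remove(base);
                break;
            case REALM_REMOVED:
                // One pass over the keys removes everything realm-scoped.
                cache.keySet().removeIf(k -> k.equals(base) || k.startsWith(base + "/"));
                break;
        }
    }
}
```

An update thus no longer forces the worst-case "assume removal" eviction 
described in KEYCLOAK-3857.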
>>
>>      This sounds like a big step, but IMO it's not that bad :) Note 
>>      that we already have all the predicates in place for individual 
>>      objects. The only change will be about sending/receiving 
>>      notifications across the cluster. I think I'll be able to 
>>      prototype something by tomorrow to double-check that this 
>>      approach works, and then finish it sometime in the middle of 
>>      next week. WDYT?
>>
>>      Marek
>>
>>      _______________________________________________
>>      keycloak-dev mailing list
>>      keycloak-dev at lists.jboss.org <mailto:keycloak-dev at lists.jboss.org>
>>      https://lists.jboss.org/mailman/listinfo/keycloak-dev
>>      <https://lists.jboss.org/mailman/listinfo/keycloak-dev>
>>
>>
> _______________________________________________
> keycloak-dev mailing list
> keycloak-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/keycloak-dev



