[keycloak-dev] offlineSessions data in cache vs db

Wed Dec 13 08:29:57 EST 2017

On 27/11/17 19:53, Tonnis Wildeboer wrote:
> Hello Keycloak Devs,
>
> [I posted this to keycloak-user, but got no response.]
>
> Ultimately, what we want to do is migrate three nodes from one namespace
> to another within a Kubernetes cluster as follows:
> Start with three nodes in one Kubernetes namespace that define a
> cluster. Then add three more nodes to the cluster in a new namespace
> that shares the same subnet and database, then kill off the original
> three nodes, effectively migrating the cluster to the new namespace and
> we want to do all this without anyone being logged out. The namespace
> distinction is invisible to Keycloak, as far as I can tell.
>
> What we have tried:
> * Start with 3 standalone-ha mode instances clustered with
> JGroups/JDBC_PING.
> * Set the number of cache owners for sessions to 6.
> * Start the three new instances in the new Kubernetes namespace,
> configured exactly the same as the first three - that is, same db, same
> number of cache owners.
> * Kill -9 the original three (I know now that it should be a kill -3,
> but don't know if that matters in this case).
>
> But it seems this caused offlineSession tokens to be expired immediately.
>
> I found this in the online documentation
> (http://www.keycloak.org/docs/latest/server_installation/index.html#server-cache-configuration):
>
>
>   > The second type of cache handles managing user sessions, offline
> tokens, and keeping track of login failures... The data held in these
> caches is temporary, in memory only, but is possibly replicated across
> the cluster.
>
>   > The sessions, authenticationSessions, offlineSessions and
> loginFailures caches are the only caches that may perform replication.
> Entries are not replicated to every single node, but instead one or more
> nodes is chosen as an owner of that data. If a node is not the owner of
> a specific cache entry it queries the cluster to obtain it. What this
> means for failover is that if all the nodes that own a piece of data go
> down, that data is lost forever. By default, Keycloak only specifies one
> owner for data. So if that one node goes down that data is lost. This
> usually means that users will be logged out and will have to login again.
>
> It appears, based on these documentation comments and our experience,
> that the "source of truth" regarding offlineSessions is the data in the
> "owner" caches, NOT the database, as I would have expected. It also
> seems to be the case that if a node joins the cluster (as defined by
> JGroups/JDBC_PING), it will NOT be able to populate its offlineSessions
> cache from the database, but must rely on replication from one of the
> owner nodes.
>
> Questions:
> 1. Is the above understanding regarding the db vs cache correct?
> 2. If so, please explain the design/reasoning behind this behavior.
> Otherwise, please correct my understanding.
> 3. Is there a way to perform this simple migration without losing any
> sessions?
Hi,

Sorry for the late response.

- Offline sessions are stored in Infinispan. The DB is written only
during login and when an offline session is removed.

- We don't want to write to the DB, or even read from it, at every
offline token refresh. This is for performance reasons.

- The cache is populated from the DB only during server startup (in a
single-node, non-clustered environment) or during startup of the cluster
coordinator. If several cluster nodes are started concurrently, the
other nodes will "help" the coordinator node preload offline sessions
from the DB. We use the Infinispan DistributedExecutorService for this.

- Once sessions have been loaded from the DB into Infinispan after
startup, the DB is no longer used for reading sessions. Only Infinispan
is. The DB is there just to ensure that sessions are not lost after a
server restart (or after a restart of the whole cluster, in the
clustered case). So yes, the Infinispan cache is the "source of truth"
that Keycloak reads from during refreshToken requests from users.

- For your scenario, I think that if you kill the 3 nodes of the "old"
cluster and then start 3 nodes in the new cluster, the new cluster will
preload offlineSessions from the DB, so the offline sessions should be
available.

- On the other hand, if the "old" 3 nodes are still running and you
start the "new" 3 nodes so that they join the existing cluster, the
preloading from the DB won't happen. Things should still work, though,
as long as you really have 6 owners set on *every* cluster node. When a
new node joins the cluster, there is a phase called "rebalance" during
which the cache of the new node is populated from the other cluster
nodes (more details about this are in the Infinispan docs). But the
rebalance takes some time... It's also possible that you first need to
"read" the cache on the newly joined cluster nodes (e.g. send at least
one refreshToken request to each of them). In any case, you should be
able to monitor through JMX how many items are available in the cache
on every node. So you can assume that a node is initialized and the
"rebalance" is finished once its cache holds the correct number of
records. There are also messages in server.log when the rebalance starts
and when it finishes. So make sure not to kill the "old" servers before
the rebalance has really finished on the new servers.
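
The check described in the last point can be sketched roughly as
follows. This is only an illustration, not part of Keycloak: the
function names, node names and counts are hypothetical, and the
per-node entry counts are assumed to have been read from JMX
separately (the code does not talk to JMX). It also assumes that the
number of owners equals the number of cluster nodes, as in the 6-owner
setup above, so every node should eventually hold a copy of every
offline session.

```python
from typing import Dict, Iterable, List


def nodes_still_rebalancing(
    entry_counts: Dict[str, int], expected_entries: int
) -> List[str]:
    """Return the nodes whose cache holds fewer entries than expected.

    Assumption: owners == number of cluster nodes, so each node should
    end up with a copy of every entry in the offlineSessions cache.
    """
    return sorted(
        node
        for node, count in entry_counts.items()
        if count < expected_entries
    )


def safe_to_stop_old_nodes(
    entry_counts: Dict[str, int],
    new_nodes: Iterable[str],
    expected_entries: int,
) -> bool:
    """True only if every new node reports at least the expected count."""
    new_nodes = list(new_nodes)
    # A new node with no reported count is treated as not ready.
    if any(node not in entry_counts for node in new_nodes):
        return False
    new_counts = {node: entry_counts[node] for node in new_nodes}
    return not nodes_still_rebalancing(new_counts, expected_entries)


if __name__ == "__main__":
    # Hypothetical counts, as they might be read per node via JMX.
    counts = {
        "old-0": 120, "old-1": 120, "old-2": 120,
        "new-0": 120, "new-1": 87, "new-2": 120,
    }
    expected = max(counts[n] for n in ("old-0", "old-1", "old-2"))
    print(nodes_still_rebalancing(counts, expected))
    print(safe_to_stop_old_nodes(counts, ["new-0", "new-1", "new-2"], expected))
```

The comparison uses "fewer than expected" rather than strict equality
because sessions may be created or removed while the migration is in
progress, so the counts can drift slightly between reads. The
rebalance messages in server.log remain the more direct signal.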

Hope this helps,

Marek
>
> Thanks,
>
> --Tonnis