[keycloak-dev] offlineSessions data in cache vs db

Mon Nov 27 13:53:08 EST 2017

Hello Keycloak Devs,

[I posted this to keycloak-user, but got no response.]

Ultimately, what we want to do is migrate three nodes from one namespace
to another within a Kubernetes cluster, as follows. Start with three
nodes in one Kubernetes namespace that form a cluster. Add three more
nodes to the cluster in a new namespace that shares the same subnet and
database. Then kill off the original three nodes, effectively migrating
the cluster to the new namespace. We want to do all this without anyone
being logged out. The namespace distinction is invisible to Keycloak, as
far as I can tell.

What we have tried:
* Start with 3 standalone-ha mode instances clustered with 
JGroups/JDBC_PING.
* Set the number of cache owners for sessions to 6.
* Start the three new instances in the new Kubernetes namespace, 
configured exactly the same as the first three - that is, same db, same 
number of cache owners.
* Kill -9 the original three (I know now that it should be a kill -3, 
but don't know if that matters in this case).

But this seems to have caused the offlineSession tokens to expire
immediately.

I found this in the online documentation 
(http://www.keycloak.org/docs/latest/server_installation/index.html#server-cache-configuration): 

 > The second type of cache handles managing user sessions, offline
 > tokens, and keeping track of login failures... The data held in these
 > caches is temporary, in memory only, but is possibly replicated across
 > the cluster.

 > The sessions, authenticationSessions, offlineSessions and
 > loginFailures caches are the only caches that may perform replication.
 > Entries are not replicated to every single node, but instead one or more
 > nodes is chosen as an owner of that data. If a node is not the owner of
 > a specific cache entry it queries the cluster to obtain it. What this
 > means for failover is that if all the nodes that own a piece of data go
 > down, that data is lost forever. By default, Keycloak only specifies one
 > owner for data. So if that one node goes down that data is lost. This
 > usually means that users will be logged out and will have to login again.
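
The ownership rule quoted above can be sketched as a toy model. The
Python below is not Keycloak or Infinispan code: the function names, the
node names, and the hash-based owner selection are all assumptions made
for illustration, and it assumes that rebalancing onto the new nodes has
fully completed before the old ones are killed. Under those assumptions,
an entry survives only if at least one of its owners is still alive.

```python
import hashlib


def owners_for(key, nodes, num_owners):
    """Pick up to num_owners distinct nodes for a key.

    Toy rendezvous-style hashing; NOT Infinispan's real algorithm.
    """
    ranked = sorted(
        nodes,
        key=lambda node: hashlib.sha256(f"{key}:{node}".encode()).hexdigest(),
    )
    return set(ranked[:num_owners])


def surviving_keys(keys, nodes, num_owners, killed):
    """Return the keys that keep at least one live owner after the kill."""
    dead = set(killed)
    return {k for k in keys if owners_for(k, nodes, num_owners) - dead}


old_nodes = ["old-1", "old-2", "old-3"]  # hypothetical node names
new_nodes = ["new-1", "new-2", "new-3"]
sessions = [f"offline-session-{i}" for i in range(100)]

# owners=6 on a six-node cluster: every node owns every entry, so killing
# the three old nodes loses nothing (in this model).
kept_all = surviving_keys(sessions, old_nodes + new_nodes, 6, old_nodes)

# owners=1: each entry lives on a single node, so only the entries that
# happen to be owned by a new node survive.
kept_some = surviving_keys(sessions, old_nodes + new_nodes, 1, old_nodes)

print(len(kept_all), len(kept_some))
```

If the real cluster behaved like this model, 6 owners across 6 nodes
would have been enough to keep every entry. Our result suggests that
either the state transfer had not finished when the old nodes were
killed, or offline sessions are handled differently from what this
model assumes.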

It appears, based on these documentation comments and our experience,
that the "source of truth" regarding offlineSessions is the data in the
"owner" caches, NOT the database, as I would have expected. It also
seems to be the case that if a node joins the cluster (as defined by
JGroups/JDBC_PING), it will NOT be able to populate its offlineSessions
cache from the database, but must rely on replication from one of the
owner nodes.

Questions:
1. Is the above understanding regarding the db vs cache correct?
2. If so, please explain the design/reasoning behind this behavior. 
Otherwise, please correct my understanding.
3. Is there a way to perform this simple migration without losing any 
sessions?

Thanks,

--Tonnis