This appears to be an issue only when rolling out a version that has a
remote-store while the old version without a remote-store is running. That
is, if I completely stop the old version and deploy the new version with a
remote-store, it starts properly. If even a single instance of the old
version is still running, every instance of the new version gets stuck
during start-up looking for the coordinator.
Is this a known issue, and if so, is there a known workaround?
On Thu, Mar 21, 2019 at 11:33 AM D V <dv(a)glyphy.com> wrote:
Hi list,
I'm trying to run several instances of Keycloak using a standalone-ha
configuration within the same datacenter. At the same time I'd like to be
able to offload both the `sessions` and `clientSessions` caches to a remote
Infinispan cluster within the same datacenter in order to minimize user
logouts when Keycloak instances are restarted. Eventually, I plan to set up
a Cassandra store on the remote ISPN side to persist sessions. At the
moment, though, I can't even get Keycloak to start.
The configuration for the two caches in the keycloak config looks like
this:
    <replicated-cache name="sessions" statistics-enabled="true">
        <state-transfer timeout="600000" />
        <remote-store remote-servers="ispn-socket"
                      passivation="false"
                      cache="sessions" shared="true" purge="false"/>
    </replicated-cache>
    <replicated-cache name="clientSessions"
                      statistics-enabled="true">
        <state-transfer timeout="600000" />
        <remote-store remote-servers="ispn-socket"
                      cache="clientSessions"
                      passivation="false" shared="true" purge="false"/>
    </replicated-cache>
The remote cache container configuration:
    <remote-cache-container name="ispn-remote"
                            default-remote-cluster="ispn-cluster">
        <remote-clusters>
            <remote-cluster name="ispn-cluster"
                            socket-bindings="ispn-socket"
            />
        </remote-clusters>
    </remote-cache-container>
The socket binding is:
    <outbound-socket-binding name="ispn-socket">
        <remote-destination host="${env.ISPN_HOST:ispn}"
                            port="${env.ISPN_PORT:11222}" />
    </outbound-socket-binding>
$ISPN_HOST points to a load balancer that's proxying each ISPN node in a
round-robin fashion.
On the remote Infinispan side I'm using a slightly modified version of
their clustered.xml configuration and have set up the cache-container as
follows:
    <cache-container name="clustered" default-cache="default"
                     statistics="true">
        <transport lock-timeout="3600000"/>
        <distributed-cache name="default"/>
        <replicated-cache name="sessions" statistics="true">
            <state-transfer timeout="3600000"/>
        </replicated-cache>
        <replicated-cache name="clientSessions" statistics="true">
            <state-transfer timeout="3600000"/>
        </replicated-cache>
    </cache-container>
The ISPN nodes are clustered using a UDP-based JGroups stack. They form a
cluster successfully: I can add a cache entry manually with ispn-cli.sh on
one node and have it appear on another. Keycloak can connect to the remote
Infinispan cluster over Hot Rod. However, at start-up it seems to hang after
the following point in the logs:
...
ISPN004006: Server sent new topology view (id=9, age=0) containing 3 addresses: [10.39.32.74:11222, 10.39.32.73:11222, 10.39.32.72:11222]
WFLYCLINF0002: Started work cache from keycloak container
WFLYCLINF0002: Started sessions cache from keycloak container
WFLYCLINF0002: Started clientSessions cache from keycloak container
...
HHH000397: Using ASTQueryTranslatorFactory
Remote store configured for cache 'sessions'
Remote store configured for cache 'clientSessions'
There's a sleeping thread at this point:
"ServerService Thread Pool -- 59" #148 prio=5 os_prio=0
tid=0x00000000032e7800 nid=0xfc waiting on condition [0x00007f6d9928f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.keycloak.models.sessions.infinispan.initializer.CacheInitializer.loadSessions(CacheInitializer.java:36)
    at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$7.run(InfinispanUserSessionProviderFactory.java:317)
    at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:228)
    at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory.loadSessionsFromRemoteCache(InfinispanUserSessionProviderFactory.java:306)
    at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory.loadSessionsFromRemoteCaches(InfinispanUserSessionProviderFactory.java:298)
    at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory.access$500(InfinispanUserSessionProviderFactory.java:68)
    at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$1.lambda$onEvent$0(InfinispanUserSessionProviderFactory.java:127)
    at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$1$$Lambda$1162/1971420018.run(Unknown Source)
    at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:228)
    at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithTimeout(KeycloakModelUtils.java:268)
    at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$1.onEvent(InfinispanUserSessionProviderFactory.java:121)
    at org.keycloak.services.DefaultKeycloakSessionFactory.publish(DefaultKeycloakSessionFactory.java:69)
    at org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:174)
...
The code appears to be looking for a coordinator on the work cache, but
never finds one. Am I missing some configuration to achieve my goals, or is
this particular use case not supported?
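To make the symptom concrete, here is a minimal sketch of the kind of wait
loop the stack trace suggests: a non-coordinator node sleeps and re-checks a
shared "initialization finished" flag. This is not Keycloak's actual code.
The class name, the method name `awaitInitialized`, and the bounded attempt
count are all assumptions made for illustration. If the node acting as
coordinator never sets the flag, a loop like this without a bound would
sleep indefinitely, which would look like the thread dump above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not taken from the Keycloak source.
class CoordinatorWaitSketch {

    // Polls the supplied check until it reports true or the attempts run out.
    // Returns true if initialization was observed, false otherwise.
    static boolean awaitInitialized(BooleanSupplier initialized,
                                    int maxAttempts,
                                    long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (initialized.getAsBoolean()) {
                return true;
            }
            try {
                // Corresponds to the TIMED_WAITING (sleeping) state in the dump.
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulates a coordinator that never marks initialization as finished.
        boolean done = awaitInitialized(() -> false, 5, 100L);
        System.out.println(done ? "initialized" : "gave up waiting");
    }
}
```

The bound on attempts is only there so the sketch terminates. Whether the
real loop has a comparable timeout is something I have not verified.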
Thanks for any help!
D