[keycloak-user] Standalone HA tokens not immediately shared among nodes
D V
dv at glyphy.com
Tue Sep 11 18:25:32 EDT 2018
Hi list,
I'm trying to cluster several Keycloak 4.0.0 nodes running in Docker
containers. I'm seeing "interesting" (unexpected) behaviour when requesting
new OIDC tokens with "grant_type=password" (Direct Grant/ROPC) and then
attempting to get a new set with "grant_type=refresh_token".
After I start two nodes (containers), if I issue a "grant_type=password"
request to the node that started first, "grant_type=refresh_token" requests
on the node that started second fail. If I instead issue the
"grant_type=password" request to the node that started last,
"grant_type=refresh_token" requests succeed on any node AND all future
password/refresh_token requests work correctly no matter which node handles
the request.
So, let's say node1 starts first and node2 starts second:
1. Password auth on node1: OK
2. Refresh token auth on node2 with token from previous step: Error:
invalid_grant (Invalid refresh token)
3. Refresh token auth on node1 with token from step 1: OK (new set of
refresh+access tokens)
BUT!
4. Password auth on node2: OK
5. Refresh token auth on node1 with token from previous step: OK! (new set
of refresh+access tokens)
6. Refresh token auth on node2 with token from step 4: OK
7. Password auth sequence from steps 1-3: also OK!
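For concreteness, here's roughly what the requests in steps 1 and 2 look
like (realm name, client_id, credentials and hostnames below are
placeholders, not my actual values):

# Step 1: password grant (Direct Grant/ROPC) against node1 - succeeds
curl -s -X POST "http://node1:8080/auth/realms/myrealm/protocol/openid-connect/token" \
  -d "grant_type=password" \
  -d "client_id=my-client" \
  -d "username=alice" \
  -d "password=secret"
# Response contains an access_token and a refresh_token.

# Step 2: refresh grant against node2, using the refresh_token from step 1 - fails
curl -s -X POST "http://node2:8080/auth/realms/myrealm/protocol/openid-connect/token" \
  -d "grant_type=refresh_token" \
  -d "client_id=my-client" \
  -d "refresh_token=$REFRESH_TOKEN"
# Response: {"error":"invalid_grant","error_description":"Invalid refresh token"}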
It's as though the node that starts most recently needs a password auth
request to "wake up" and start communicating with the rest of the cluster.
Once it does, everything's in sync.
Some facts that are hopefully relevant:
* Based on the Keycloak 4.0.0 Docker image.
* standalone-ha.xml from the distribution, with changes to the JGroups
subsystem. I'm using JDBC_PING configured against the same DB as Keycloak
itself, which is MySQL 5.7. See the subsystem config and a discovery-table
check below.
* Custom org.keycloak.storage.UserStorageProviderFactory SPI, which creates
a provider that makes an HTTP call to an external authentication service to
validate username/password credentials.
* A couple of custom themes.
* One realm with a handful of clients, provisioned via a shell script that
just calls kcadm.sh and jboss-cli.sh.
* There's a simple LB in front of both instances.
JGroups subsystem config:
<subsystem xmlns="urn:jboss:domain:jgroups:5.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp">
                <property name="external_addr">${env.HOST}</property>
                <property name="external_port">${env.PORT_7600}</property>
            </transport>
            <protocol type="org.jgroups.protocols.JDBC_PING">
                <property name="datasource_jndi_name">java:jboss/datasources/KeycloakDS</property>
            </protocol>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG2"/>
        </stack>
    </stacks>
</subsystem>
$HOST and $PORT_7600 are set to the external host:port combination that
allows the two instances to reach each other.
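One sanity check I've been using for discovery is to query the JDBC_PING
table directly in MySQL. This is a sketch that assumes the default table
name and schema (JGROUPSPING; I don't override initialize_sql) and
placeholder DB credentials:

mysql -u keycloak -p keycloak -e "SELECT own_addr, cluster_name FROM JGROUPSPING;"
# Expect one row per node with cluster_name = 'ejb' once both have registered.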
There's also a socket-binding to a public interface:
<socket-binding name="jgroups-tcp" port="7600"/>
In the JGroups and Infinispan log entries I can see that the two nodes do
find each other and are able to communicate. I haven't been able to get
ispn-cli.sh to connect to the embedded Infinispan instances running in the
containers, so I can't confirm that they hold the same entries, but as
described in the flows above they do eventually work together.
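As an alternative to ispn-cli.sh, it should be possible to inspect the
embedded caches through jboss-cli.sh on each node. A sketch, assuming
statistics are enabled on the cache and using the "keycloak"
cache-container from the stock standalone-ha.xml:

jboss-cli.sh --connect --command="/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:read-attribute(name=number-of-entries)"
# If the count differs between nodes right after a login, the sessions
# cache isn't being shared.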
Is there a configuration change I'm missing somewhere to make a node that
newly joins the cluster immediately aware of the sessions held by the other
one?
Thanks for any help,
DV