[keycloak-user] Keycloak Domain TCP Clustering of Sessions in AWS not auto-failing over to node

Sebastian Laskawiec slaskawi at redhat.com
Wed Aug 28 09:49:18 EDT 2019


I would strongly advise getting clustering to work with the standalone
configuration first (./standalone.sh -c standalone-ha.xml). This way you
will get rid of all the noise from the domain configuration. Once that part
works fine, you can then move it over to the domain configuration.

I briefly looked at this guide:
http://www.mastertheboss.com/jboss-server/jboss-cluster/how-to-configure-jboss-eap-and-wildfly-to-use-tcpping
and it should be enough to give you some hints about the configuration.

More comments and hints inlined.

On Tue, Aug 27, 2019 at 11:58 PM JTK <jonesy at sydow.org> wrote:

> Thanks,
>
>  I've read over numerous articles, including the ones you listed, and I've
> still been unable to locate why they are not clustering.
>
> When I run this command I get the following output:
>
> /profile=full-ha/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcpping)
> {
>     "outcome" => "success",
>     "result" => undefined,
>     "server-groups" => {"auth-server-group" => {"host" => {
>         "dev-master" => {"dev-master" => {"response" => {"outcome" =>
> "success"}}},
>         "dev--slave1" => {"dev-slave1" => {"response" => {
>             "outcome" => "success",
>             "result" => undefined
>         }}}
>     }}}
> }
>
> Is that the normal output for the slave?
>
> Here is some other information from the logs:
> From the Master Node
> [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 50)
> dev-master: no members discovered after 3004 ms: creating cluster as first
> member
> This was with the Slave1 node up and running:
> [Host Controller] 21:35:53,349 INFO  [org.jboss.as.host.controller] (Host
> Controller Service Threads - 2) WFLYHC0148: Connected to master host
> controller at remote://10.10.10.77:9999
>
> On Master:
> [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread
> 1-2) ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2'
> 9.4.8.Final
> [Server:dev-master] 21:36:02,081 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-4) ISPN000078: Starting JGroups channel ee
> [Server:dev-master] 21:36:02,086 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-2) ISPN000078: Starting JGroups channel ee
> [Server:dev-master] 21:36:02,087 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-1) ISPN000078: Starting JGroups channel ee
> [Server:dev-master] 21:36:02,089 INFO  [org.infinispan.CLUSTER] (MSC
> service thread 1-4) ISPN000094: Received new cluster view for channel ee:
> [dev-master|0] (1) [dev-master]
> [Server:dev-master] 21:36:02,090 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-3) ISPN000078: Starting JGroups channel ee
> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
> service thread 1-1) ISPN000094: Received new cluster view for channel ee:
> [dev-master|0] (1) [dev-master]
> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
> [dev-master|0] (1) [dev-master]
> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
> service thread 1-3) ISPN000094: Received new cluster view for channel ee:
> [dev-master|0] (1) [dev-master]
> [Server:dev-master] 21:36:02,104 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
> addresses are [10.10.10.77:7600]
> [Server:dev-master] 21:36:02,129 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-1) ISPN000079: Channel ee local address is dev-master, physical
> addresses are [10.10.10.77:7600]
> [Server:dev-master] 21:36:02,149 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-4) ISPN000079: Channel ee local address is dev-master, physical
> addresses are [10.10.10.77:7600]
> [Server:dev-master] 21:36:02,151 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-3) ISPN000079: Channel ee local address is dev-master, physical
> addresses are [10.10.10.77:7600]
> [Server:dev-master] 21:36:02,296 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-2) ISPN000078: Starting JGroups channel ee
> [Server:dev-master] 21:36:02,297 INFO  [org.infinispan.CLUSTER] (MSC
> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
> [dev-master|0] (1) [dev-master]
> [Server:dev-master] 21:36:02,325 INFO
>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
> addresses are [10.10.10.77:7600]
>
> This is the configuration from the domain.xml on the master
>                 <channels default="ee">
>                     <channel name="ee" stack="tcp"/>
>

This doesn't look right: the "ee" channel is defined twice, which is
problematic. There should be only one channel, pointing at the stack you
actually want to use.
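
As a quick sketch (assuming the tcpping stack is the one you want to keep),
I would expect the channels element to look more like this:

                <channels default="ee">
                    <channel name="ee" stack="tcpping"/>
                </channels>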


>                     <channel name="ee" stack="tcpping"/>
>                 </channels>
>                 <stacks>
>                     <stack name="tcp">
>                         <transport type="TCP"
> socket-binding="jgroups-tcp"/>
>                         <protocol type="MERGE3"/>
>                         <protocol type="FD_SOCK"/>
>                         <protocol type="FD_ALL"/>
>                         <protocol type="VERIFY_SUSPECT"/>
>                         <protocol type="pbcast.NAKACK2"/>
>                         <protocol type="UNICAST3"/>
>                         <protocol type="pbcast.STABLE"/>
>                         <protocol type="pbcast.GMS"/>
>                         <protocol type="MFC"/>
>                         <protocol type="FRAG3"/>
>                     </stack>
>                     <stack name="tcpping">
>                         <transport type="TCP"
> socket-binding="jgroups-tcp"/>
>                         <protocol type="org.jgroups.protocols.TCPPING">
>

This can be simplified to <protocol type="TCPPING">
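
In other words, something along these lines should be enough (the
initial_hosts value is just the system property you already define in the
server group):

                        <protocol type="TCPPING">
                            <property name="initial_hosts">
                                ${jboss.cluster.tcp.initial_hosts}
                            </property>
                            <property name="port_range">
                                0
                            </property>
                        </protocol>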


>                             <property name="initial_hosts">
>                                 ${jboss.cluster.tcp.initial_hosts}
>

Once you try it out on the standalone configuration, you may set this
property by booting the cluster up with `./standalone.sh -c
standalone-ha.xml
-Djboss.cluster.tcp.initial_hosts="10.10.10.77[7600],10.10.11.27[7600]"`. I
think that will be much faster than manipulating the XML configuration.
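
If you prefer not to pass the -D flag every time, I believe you can also put
a default value directly in the property expression (standard
${property:default} syntax; the addresses below are just the ones from your
earlier mail):

                            <property name="initial_hosts">
                                ${jboss.cluster.tcp.initial_hosts:10.10.10.77[7600],10.10.11.27[7600]}
                            </property>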


>                             </property>
>                             <property name="port_range">
>                                 0
>                             </property>
>                         </protocol>
>                         <protocol type="MERGE3"/>
>                         <protocol type="FD_SOCK"/>
>                         <protocol type="FD_ALL"/>
>                         <protocol type="VERIFY_SUSPECT"/>
>                         <protocol type="pbcast.NAKACK2"/>
>                         <protocol type="UNICAST3"/>
>                         <protocol type="pbcast.STABLE"/>
>                         <protocol type="pbcast.GMS"/>
>                         <protocol type="MFC"/>
>                         <protocol type="FRAG3"/>
>                     </stack>
>
>     <server-groups>
>         <server-group name="auth-server-group" profile="full-ha">
>             <jvm name="default">
>                 <heap size="64m" max-size="512m"/>
>             </jvm>
>             <socket-binding-group ref="ha-sockets"/>
>             <system-properties>
>                 <property name="jboss.cluster.tcp.initial_hosts"
> value="10.10.10.77[7600],10.10.11.27[7600]"/>
>             </system-properties>
>         </server-group>
>     </server-groups>
>
> This is a question where I keep finding conflicting information in the
> Red Hat/JBoss/WildFly docs.
>  From the Master, this is the host.xml file.
>  I've had both the master and the slave listed here. Which do I need, or do
> I need both? The same goes for configuring the Slave's host.xml file.
>
>     <servers>
>         <server name="dev-slave1" group="auth-server-group"
> auto-start="true">
>             <socket-bindings port-offset="0"/>
>         </server>
>     </servers>
>
> The same question would apply to the host-master.xml on the Master and the
> host-slave.xml on the Slave in reference to:
> This is from the host-master.xml on the Master
>     <servers>
>         <server name="dev-sentinel-master" group="auth-server-group"
> auto-start="true">
>             <socket-bindings port-offset="0"/>
>         </server>
>     </servers>
>
> Included screenshot showing WildFly Management Console with both servers
> up and green in the auth-server-group of the full-ha profile
> https://i.imgur.com/8g124Ss.png
>
> I know I'm just missing something small, and I'm not getting any errors in
> the logs. Is there any way to get more TRACE or DEBUG output with regard to
> clustering?
>
> Thanks!
>
>
> On Tue, Aug 27, 2019 at 5:09 AM Sebastian Laskawiec <slaskawi at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Aug 26, 2019 at 5:32 PM JTK <jonesy at sydow.org> wrote:
>>
>>> I have two nodes setup in a cluster using TCP port 7600 and I see them
>>> join
>>> the cluster in the logs.
>>> On Master: [Host Controller] 15:07:18,293 INFO
>>>  [org.jboss.as.domain.controller] (Host Controller Service Threads - 7)
>>> WFLYHC0019: Registered remote slave host "dev-slave1", JBoss Keycloak
>>> 6.0.1
>>> (WildFly 8.0.0.Final)
>>> On Slave: [Host Controller] 15:03:12,603 INFO
>>>  [org.jboss.as.host.controller] (Host Controller Service Threads - 3)
>>> WFLYHC0148: Connected to master host controller at remote://
>>> 10.10.10.77:9999
>>>
>>> In the WildFly admin panel I see the server group: auth-server-group
>>> which
>>> is ha and then I see both servers in the group and they are both green.
>>>
>>> I've set the distributed-cache setup to 2 in domain.xml, so it should be
>>> sharing session information:
>>>                     <distributed-cache name="sessions" owners="2"/>
>>>                     <distributed-cache name="authenticationSessions"
>>> owners="2"/>
>>>                     <distributed-cache name="offlineSessions"
>>> owners="2"/>
>>>                     <distributed-cache name="clientSessions" owners="2"/>
>>>                     <distributed-cache name="offlineClientSessions"
>>> owners="2"/>
>>>                     <distributed-cache name="loginFailures" owners="2"/>
>>>                     <distributed-cache name="actionTokens" owners="2">
>>>
>>> Here are the logs on the master showing that a new cluster view has been
>>> received:
>>> 2019-08-26 15:03:19,776 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-master|0]
>>> (1) [dev-master]
>>> 2019-08-26 15:03:19,779 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-master|0]
>>> (1) [dev-master]
>>> 2019-08-26 15:03:19,780 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-master|0]
>>> (1) [dev-master]
>>> 2019-08-26 15:03:19,780 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-master|0]
>>> (1) [dev-master]
>>> 2019-08-26 15:03:19,875 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-master|0]
>>> (1) [dev-master]
>>>
>>> And on the slave:
>>> 2019-08-26 15:07:29,567 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-slave1|0]
>>> (1) [dev-slave1]
>>> 2019-08-26 15:07:29,572 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-slave1|0]
>>> (1) [dev-slave1]
>>> 2019-08-26 15:07:29,572 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-slave1|0]
>>> (1) [dev-slave1]
>>> 2019-08-26 15:07:29,574 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-slave1|0]
>>> (1) [dev-slave1]
>>> 2019-08-26 15:07:29,635 INFO  [org.infinispan.CLUSTER] (MSC service
>>> thread
>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>> [dev-slave1|0]
>>> (1) [dev-slave1]
>>>
>>
>> This definitely doesn't look right. The view id (which increases
>> monotonically) is 0, which means this is an initial view and no new
>> members have joined. Clearly, the discovery protocol is not configured
>> properly and both nodes are in separate (singleton) clusters.
>>
>>
>>> I believe I read somewhere that I was supposed to see the master and slave
>>> together in the logs and not just the master or the slave. Maybe this is my
>>> issue, but I don't know how to resolve it.
>>>
>>> I can't use multi-cast as it's disabled in AWS and almost all cloud
>>> providers.
>>>
>>
>> The easiest option is to use TCPPING. However, it requires you to put the
>> IPs of all nodes in its configuration [1]. There are other options as well,
>> e.g. S3 Ping [2] and its rewritten (and much better) version - Native S3
>> Ping [3].
>>
>> You may also be interested in using JDBC_PING. Please have a look at our
>> blog posts [4][5].
>>
>> [1] http://jgroups.org/manual4/index.html#TCPPING_Prot
>> [2] http://jgroups.org/manual4/index.html#_s3_ping
>> [3] https://github.com/jgroups-extras/native-s3-ping
>> [4] https://www.keycloak.org/2019/05/keycloak-cluster-setup.html
>> [5] https://www.keycloak.org/2019/08/keycloak-jdbc-ping.html
>>
>>
>>>
>>> When I launch the master and let it come up, then launch the slave, I can
>>> see all the session traffic on the master. As soon as I stop the master,
>>> the slave is looking for the master, but when I click on the website it
>>> just hangs waiting for a connection, eventually logs me out, and I end up
>>> logging back in, now on the slave node. The shared sessions are not
>>> happening. Is there something else I need to do or set?
>>>
>>
>> It looks like a consequence of the JGroups discovery issue. Please try to
>> fix the clustering problem and then see if this one appears again.
>>
>>
>>>
>>> I have this setup in my domain.xml configuration as well:
>>>         <server-group name="auth-server-group" profile="ha">
>>>             <jvm name="default">
>>>                 <heap size="64m" max-size="512m"/>
>>>             </jvm>
>>>             <socket-binding-group ref="ha-sockets"/>
>>>             <system-properties>
>>>                 <property name="jboss.cluster.tcp.initial_hosts"
>>> value="10.10.10.77[7600],10.10.10.27[7600]"/>
>>>             </system-properties>
>>>         </server-group>
>>>
>>> In my host.xml on the slave I have this setup to reach back to the master
>>> as the domain controller
>>>     <domain-controller>
>>>         <remote protocol="remote" host="${jboss.domain.master.address}"
>>> port="${jboss.domain.master.port:9999}"
>>> security-realm="ManagementRealm"/>
>>>    </domain-controller>
>>>
>>> Any help would be appreciated
>>> _______________________________________________
>>> keycloak-user mailing list
>>> keycloak-user at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/keycloak-user
>>>
>>

