[keycloak-user] Keycloak Domain TCP Clustering of Sessions in AWS not auto-failing over to node

Sebastian Laskawiec slaskawi at redhat.com
Thu Aug 29 07:26:04 EDT 2019


So what is the status at this point? Did you manage to get everything
running?

Thinking a bit more about this thread, I realized that I should probably
ask a few fundamental questions before we go further:
- What's the reasoning behind using Keycloak in domain mode? What exactly
are you trying to achieve?
- Standalone mode is slightly easier in terms of configuration. Maybe it
would be sufficient for your case?
- How about using our Docker image? This way, you could easily use
JDBC_PING (which of course you can use without our Container image, but
achieving this using our image is much easier [1])
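
To illustrate, here is a rough sketch of what a JDBC_PING stack could look
like in standalone-ha.xml (the datasource JNDI name below is the Keycloak
default and only an example; see [1] for the exact, tested steps):

    <stack name="jdbc-ping">
        <transport type="TCP" socket-binding="jgroups-tcp"/>
        <protocol type="JDBC_PING">
            <property name="datasource_jndi_name">java:jboss/datasources/KeycloakDS</property>
        </protocol>
        <protocol type="MERGE3"/>
        <protocol type="FD_SOCK"/>
        <protocol type="FD_ALL"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK2"/>
        <protocol type="UNICAST3"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
        <protocol type="MFC"/>
        <protocol type="FRAG3"/>
    </stack>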

More comments inlined.

[1] https://www.keycloak.org/2019/08/keycloak-jdbc-ping.html

On Wed, Aug 28, 2019 at 4:16 PM Sebastian Laskawiec <slaskawi at redhat.com>
wrote:

> Let me add @Radoslav Husar <rhusar at redhat.com> and @Paul Ferraro
> <pferraro at redhat.com> to the loop. Perhaps they might give you some more
> accurate hints than I did.
>
> On Wed, Aug 28, 2019 at 4:13 PM JTK <jonesy at sydow.org> wrote:
>
>> OK,
>>
>> I originally had just the one channel and got the same result. I added the
>> other channel after reading articles on JBoss through Red Hat:
>> Channel definition for two or more cache-container at JBoss EAP clustered
>> environment
>> https://access.redhat.com/solutions/3880301
>>
>> I've also used <protocol type="TCPPING"> with the same results
>>
>> I've looked at that URL previously,
>> http://www.mastertheboss.com/jboss-server/jboss-cluster/how-to-configure-jboss-eap-and-wildfly-to-use-tcpping
>> It references a different approach than most articles at the beginning
>> (<socket-discovery-protocol>) and it shows it using names and not IPs.
>> I'm not sure if that's the new way or whether you can also use IPs, and it's
>> using the standalone configuration, so there's no need to look externally for
>> other nodes on the network.
>>
>
Probably @Paul Ferraro <paul.ferraro at redhat.com> or @Radoslav Husar
<rhusar at redhat.com> could clarify that.


>> Meanwhile, most of the articles out there use the legacy method, which that
>> page addresses later on and which is what I'm using.
>> Even if I get it working as a stand-alone, it's not much help when there
>> are multiple .xml files which must be configured across the different
>> instances... i.e. domain.xml, host.xml, host-master.xml and host-slave.xml
>>
>
The standalone mode uses only the standalone.xml (or standalone-ha.xml)
configuration. You should probably pick standalone-ha.xml and
concentrate on modifying only this file. Note that for standalone mode,
you should run the server with the following command: `./standalone.sh -c
standalone-ha.xml`.
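
For example (node names, bind addresses and the TCPPING wiring are
illustrative; adjust them to your environment), each node could be started
along these lines, assuming standalone-ha.xml is switched to a TCPPING stack
that reads jboss.cluster.tcp.initial_hosts:

    # node 1 (10.10.10.77)
    ./standalone.sh -c standalone-ha.xml -b 10.10.10.77 \
        -Djboss.bind.address.private=10.10.10.77 -Djboss.node.name=node1 \
        -Djboss.cluster.tcp.initial_hosts="10.10.10.77[7600],10.10.11.27[7600]"

    # node 2 (10.10.11.27)
    ./standalone.sh -c standalone-ha.xml -b 10.10.11.27 \
        -Djboss.bind.address.private=10.10.11.27 -Djboss.node.name=node2 \
        -Djboss.cluster.tcp.initial_hosts="10.10.10.77[7600],10.10.11.27[7600]"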

I strongly suggest reading our documentation about running Keycloak in
different modes located here:
https://www.keycloak.org/docs/latest/server_installation/index.html#_operating-mode


>
>> I'm going to do more troubleshooting but with no errors in my logs
>> pertaining to the cluster outside of:
>> [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 50)
>> dev-master: no members discovered after 3004 ms: creating cluster as first
>> member
>> That doesn't give me a good place to troubleshoot.
>> Does anyone know if this is the correct response when running this
>> command?
>> /profile=full-ha/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcpping)
>> {
>>     "outcome" => "success",
>>     "result" => undefined,
>>     "server-groups" => {"auth-server-group" => {"host" => {
>>         "dev-master" => {"dev-master" => {"response" => {"outcome" =>
>> "success"}}},
>>         "dev-slave1" => {"dev-slave1" => {"response" => {
>>             "outcome" => "success",
>>             "result" => undefined
>>         }}}
>>     }}}
>> }
>>
>> No response from dev-slave1... is that normal?
>>
>> Thanks for your help, this has been frustrating. I think we are going to
>> buy JBoss support and move to Red Hat SSO, but for now it's a simple proof
>> of concept and I'm just missing the full clustering aspect of it.
>>
>> On Wed, Aug 28, 2019 at 8:50 AM Sebastian Laskawiec <slaskawi at redhat.com>
>> wrote:
>>
>>> I would strongly advise getting clustering to work on the standalone
>>> configuration (./standalone.sh -c standalone-ha.xml) first. This way you will
>>> get rid of all the noise from the domain configuration. Once that part works
>>> fine, then you may try to get it working with the domain configuration.
>>>
>>> I briefly looked at this guide:
>>> http://www.mastertheboss.com/jboss-server/jboss-cluster/how-to-configure-jboss-eap-and-wildfly-to-use-tcpping
>>> and it should be enough to give you some hints about the configuration.
>>>
>>> More comments and hints inlined.
>>>
>>> On Tue, Aug 27, 2019 at 11:58 PM JTK <jonesy at sydow.org> wrote:
>>>
>>>> Thanks,
>>>>
>>>>  I've read over numerous articles, including the ones you listed, and
>>>> I've still been unable to locate why they are not clustering.
>>>>
>>>> When I run this command I get the following output:
>>>>
>>>> /profile=full-ha/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcpping)
>>>> {
>>>>     "outcome" => "success",
>>>>     "result" => undefined,
>>>>     "server-groups" => {"auth-server-group" => {"host" => {
>>>>         "dev-master" => {"dev-master" => {"response" => {"outcome" =>
>>>> "success"}}},
>>>>         "dev--slave1" => {"dev-slave1" => {"response" => {
>>>>             "outcome" => "success",
>>>>             "result" => undefined
>>>>         }}}
>>>>     }}}
>>>> }
>>>>
>>>> Is that the normal output for the slave?
>>>>
>>>> Here is some other information from the logs:
>>>> From the Master Node
>>>> [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 50)
>>>> dev-master: no members discovered after 3004 ms: creating cluster as first
>>>> member
>>>> This was with the Slave1 node up and running:
>>>> [Host Controller] 21:35:53,349 INFO  [org.jboss.as.host.controller]
>>>> (Host Controller Service Threads - 2) WFLYHC0148: Connected to master host
>>>> controller at remote://10.10.10.77:9999
>>>>
>>>> On Master:
>>>> [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread
>>>> 1-2) ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2'
>>>> 9.4.8.Final
>>>> [Server:dev-master] 21:36:02,081 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-4) ISPN000078: Starting JGroups channel ee
>>>> [Server:dev-master] 21:36:02,086 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-2) ISPN000078: Starting JGroups channel ee
>>>> [Server:dev-master] 21:36:02,087 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-1) ISPN000078: Starting JGroups channel ee
>>>> [Server:dev-master] 21:36:02,089 INFO  [org.infinispan.CLUSTER] (MSC
>>>> service thread 1-4) ISPN000094: Received new cluster view for channel ee:
>>>> [dev-master|0] (1) [dev-master]
>>>> [Server:dev-master] 21:36:02,090 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-3) ISPN000078: Starting JGroups channel ee
>>>> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
>>>> service thread 1-1) ISPN000094: Received new cluster view for channel ee:
>>>> [dev-master|0] (1) [dev-master]
>>>> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
>>>> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
>>>> [dev-master|0] (1) [dev-master]
>>>> [Server:dev-master] 21:36:02,091 INFO  [org.infinispan.CLUSTER] (MSC
>>>> service thread 1-3) ISPN000094: Received new cluster view for channel ee:
>>>> [dev-master|0] (1) [dev-master]
>>>> [Server:dev-master] 21:36:02,104 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
>>>> addresses are [10.10.10.77:7600]
>>>> [Server:dev-master] 21:36:02,129 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-1) ISPN000079: Channel ee local address is dev-master, physical
>>>> addresses are [10.10.10.77:7600]
>>>> [Server:dev-master] 21:36:02,149 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-4) ISPN000079: Channel ee local address is dev-master, physical
>>>> addresses are [10.10.10.77:7600]
>>>> [Server:dev-master] 21:36:02,151 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-3) ISPN000079: Channel ee local address is dev-master, physical
>>>> addresses are [10.10.10.77:7600]
>>>> [Server:dev-master] 21:36:02,296 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-2) ISPN000078: Starting JGroups channel ee
>>>> [Server:dev-master] 21:36:02,297 INFO  [org.infinispan.CLUSTER] (MSC
>>>> service thread 1-2) ISPN000094: Received new cluster view for channel ee:
>>>> [dev-master|0] (1) [dev-master]
>>>> [Server:dev-master] 21:36:02,325 INFO
>>>>  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service
>>>> thread 1-2) ISPN000079: Channel ee local address is dev-master, physical
>>>> addresses are [10.10.10.77:7600]
>>>>
>>>> This is the configuration from the domain.xml on the master
>>>>                 <channels default="ee">
>>>>                     <channel name="ee" stack="tcp"/>
>>>>
>>>
>>> I believe this doesn't look right. You have two channels with the same name
>>> ("ee"), which seems problematic. There should be only one.
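>>>
>>> Something along these lines (just a sketch, keeping the channel pointed at
>>> the tcpping stack you define below):
>>>
>>>     <channels default="ee">
>>>         <channel name="ee" stack="tcpping"/>
>>>     </channels>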
>>>
>>>
>>>>                     <channel name="ee" stack="tcpping"/>
>>>>                 </channels>
>>>>                 <stacks>
>>>>                     <stack name="tcp">
>>>>                         <transport type="TCP"
>>>> socket-binding="jgroups-tcp"/>
>>>>                         <protocol type="MERGE3"/>
>>>>                         <protocol type="FD_SOCK"/>
>>>>                         <protocol type="FD_ALL"/>
>>>>                         <protocol type="VERIFY_SUSPECT"/>
>>>>                         <protocol type="pbcast.NAKACK2"/>
>>>>                         <protocol type="UNICAST3"/>
>>>>                         <protocol type="pbcast.STABLE"/>
>>>>                         <protocol type="pbcast.GMS"/>
>>>>                         <protocol type="MFC"/>
>>>>                         <protocol type="FRAG3"/>
>>>>                     </stack>
>>>>                     <stack name="tcpping">
>>>>                         <transport type="TCP"
>>>> socket-binding="jgroups-tcp"/>
>>>>                         <protocol type="org.jgroups.protocols.TCPPING">
>>>>
>>>
>>> This can be simplified to <protocol type="TCPPING">
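>>>
>>> Roughly, the whole element could then look like this (a sketch using the
>>> properties you already have):
>>>
>>>     <protocol type="TCPPING">
>>>         <property name="initial_hosts">${jboss.cluster.tcp.initial_hosts}</property>
>>>         <property name="port_range">0</property>
>>>     </protocol>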
>>>
>>>
>>>>                             <property name="initial_hosts">
>>>>                                 ${jboss.cluster.tcp.initial_hosts}
>>>>
>>>
>>> When trying it out on the standalone configuration, you may manipulate this
>>> property by booting the cluster up with `./standalone.sh -c
>>> standalone-ha.xml
>>> -Djboss.cluster.tcp.initial_hosts="10.10.10.77[7600],10.10.11.27[7600]"`. I
>>> think that will be much faster than manipulating the xml configuration.
>>>
>>>
>>>>                             </property>
>>>>                             <property name="port_range">
>>>>                                 0
>>>>                             </property>
>>>>                         </protocol>
>>>>                         <protocol type="MERGE3"/>
>>>>                         <protocol type="FD_SOCK"/>
>>>>                         <protocol type="FD_ALL"/>
>>>>                         <protocol type="VERIFY_SUSPECT"/>
>>>>                         <protocol type="pbcast.NAKACK2"/>
>>>>                         <protocol type="UNICAST3"/>
>>>>                         <protocol type="pbcast.STABLE"/>
>>>>                         <protocol type="pbcast.GMS"/>
>>>>                         <protocol type="MFC"/>
>>>>                         <protocol type="FRAG3"/>
>>>>                     </stack>
>>>>
>>>>     <server-groups>
>>>>         <server-group name="auth-server-group" profile="full-ha">
>>>>             <jvm name="default">
>>>>                 <heap size="64m" max-size="512m"/>
>>>>             </jvm>
>>>>             <socket-binding-group ref="ha-sockets"/>
>>>>             <system-properties>
>>>>                 <property name="jboss.cluster.tcp.initial_hosts"
>>>> value="10.10.10.77[7600],10.10.11.27[7600]"/>
>>>>             </system-properties>
>>>>         </server-group>
>>>>     </server-groups>
>>>>
>>>> This is a question on which I find conflicting information when reading
>>>> Red Hat/JBoss/WildFly documentation.
>>>> From the Master, this is the host.xml file.
>>>> I've had both the master and the slave listed here. Which one do I need, or
>>>> do I need both? The same question applies to configuring the Slave's
>>>> host.xml file.
>>>>
>>>>     <servers>
>>>>         <server name="dev-slave1" group="auth-server-group"
>>>> auto-start="true">
>>>>             <socket-bindings port-offset="0"/>
>>>>         </server>
>>>>     </servers>
>>>>
>>>> The same question would apply to the host-master.xml on the Master and
>>>> the host-slave.xml on the Slave in reference to:
>>>> This is from the host-master.xml on the Master
>>>>     <servers>
>>>>         <server name="dev-sentinel-master" group="auth-server-group"
>>>> auto-start="true">
>>>>             <socket-bindings port-offset="0"/>
>>>>         </server>
>>>>     </servers>
>>>>
>>>> Included screenshot showing WildFly Management Console with both
>>>> servers up and green in the auth-server-group of the full-ha profile
>>>> https://i.imgur.com/8g124Ss.png
>>>>
>>>> I know I'm just missing something small, and I'm not getting any errors
>>>> in the logs. Is there any way to get more TRACE or DEBUG output related to
>>>> clustering?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On Tue, Aug 27, 2019 at 5:09 AM Sebastian Laskawiec <
>>>> slaskawi at redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 26, 2019 at 5:32 PM JTK <jonesy at sydow.org> wrote:
>>>>>
>>>>>> I have two nodes setup in a cluster using TCP port 7600 and I see
>>>>>> them join
>>>>>> the cluster in the logs.
>>>>>> On Master: [Host Controller] 15:07:18,293 INFO
>>>>>>  [org.jboss.as.domain.controller] (Host Controller Service Threads -
>>>>>> 7)
>>>>>> WFLYHC0019: Registered remote slave host "dev-slave1", JBoss Keycloak
>>>>>> 6.0.1
>>>>>> (WildFly 8.0.0.Final)
>>>>>> On Slave: [Host Controller] 15:03:12,603 INFO
>>>>>>  [org.jboss.as.host.controller] (Host Controller Service Threads - 3)
>>>>>> WFLYHC0148: Connected to master host controller at remote://
>>>>>> 10.10.10.77:9999
>>>>>>
>>>>>> In the WildFly admin panel I see the server group: auth-server-group
>>>>>> which
>>>>>> is ha and then I see both servers in the group and they are both
>>>>>> green.
>>>>>>
>>>>>> I've set the distributed-cache setup to 2 in domain.xml, so it should
>>>>>> be
>>>>>> sharing session information:
>>>>>>                     <distributed-cache name="sessions" owners="2"/>
>>>>>>                     <distributed-cache name="authenticationSessions"
>>>>>> owners="2"/>
>>>>>>                     <distributed-cache name="offlineSessions"
>>>>>> owners="2"/>
>>>>>>                     <distributed-cache name="clientSessions"
>>>>>> owners="2"/>
>>>>>>                     <distributed-cache name="offlineClientSessions"
>>>>>> owners="2"/>
>>>>>>                     <distributed-cache name="loginFailures"
>>>>>> owners="2"/>
>>>>>>                     <distributed-cache name="actionTokens" owners="2">
>>>>>>
>>>>>> Here is the logs on the master showing there a new cluster has been
>>>>>> received:
>>>>>> 2019-08-26 15:03:19,776 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-master|0]
>>>>>> (1) [dev-master]
>>>>>> 2019-08-26 15:03:19,779 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-master|0]
>>>>>> (1) [dev-master]
>>>>>> 2019-08-26 15:03:19,780 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-master|0]
>>>>>> (1) [dev-master]
>>>>>> 2019-08-26 15:03:19,780 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-master|0]
>>>>>> (1) [dev-master]
>>>>>> 2019-08-26 15:03:19,875 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-master|0]
>>>>>> (1) [dev-master]
>>>>>>
>>>>>> And on the slave:
>>>>>> 2019-08-26 15:07:29,567 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-2) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-slave1|0]
>>>>>> (1) [dev-slave1]
>>>>>> 2019-08-26 15:07:29,572 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-slave1|0]
>>>>>> (1) [dev-slave1]
>>>>>> 2019-08-26 15:07:29,572 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-4) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-slave1|0]
>>>>>> (1) [dev-slave1]
>>>>>> 2019-08-26 15:07:29,574 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-1) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-slave1|0]
>>>>>> (1) [dev-slave1]
>>>>>> 2019-08-26 15:07:29,635 INFO  [org.infinispan.CLUSTER] (MSC service
>>>>>> thread
>>>>>> 1-3) ISPN000094: Received new cluster view for channel ejb:
>>>>>> [dev-slave1|0]
>>>>>> (1) [dev-slave1]
>>>>>>
>>>>>
>>>>> This definitely doesn't look right. The view id (which increases
>>>>> monotonically) is 0, which means this is an initial view and none of the
>>>>> new members have joined. Clearly, the discovery protocol is not configured
>>>>> properly and both nodes are in separate (singleton) clusters.
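>>>>>
>>>>> For comparison, once discovery works you should eventually see a view that
>>>>> lists both members, roughly along these lines (illustrative only, host
>>>>> names taken from your logs):
>>>>>
>>>>>     ISPN000094: Received new cluster view for channel ee: [dev-master|1] (2) [dev-master, dev-slave1]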
>>>>>
>>>>>
>>>>>> I believe I read somewhere that I was supposed to see the master and
>>>>>> slave together in the logs and not just the master or the slave. Maybe
>>>>>> this is my issue, but I don't know how to resolve it.
>>>>>>
>>>>>> I can't use multi-cast as it's disabled in AWS and almost all cloud
>>>>>> providers.
>>>>>>
>>>>>
>>>>> The easiest option is to use TCPPING. However, it requires you to put
>>>>> all the nodes' IPs in its configuration [1]. There are other options as
>>>>> well, e.g. S3 Ping [2] and its rewritten (and much better) version, Native
>>>>> S3 Ping [3] (a rough sketch follows after the links below).
>>>>>
>>>>> You may also be interested in using JDBC_PING. Please have a look at
>>>>> our blog posts [4][5].
>>>>>
>>>>> [1] http://jgroups.org/manual4/index.html#TCPPING_Prot
>>>>> [2] http://jgroups.org/manual4/index.html#_s3_ping
>>>>> [3] https://github.com/jgroups-extras/native-s3-ping
>>>>> [4] https://www.keycloak.org/2019/05/keycloak-cluster-setup.html
>>>>> [5] https://www.keycloak.org/2019/08/keycloak-jdbc-ping.html
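>>>>>
>>>>> If you go with Native S3 Ping, the discovery protocol entry would look
>>>>> roughly like this (the property names are from memory and the bucket name
>>>>> is just an example, so please double-check against the README in [3]):
>>>>>
>>>>>     <protocol type="org.jgroups.aws.s3.NATIVE_S3_PING">
>>>>>         <property name="region_name">us-east-1</property>
>>>>>         <property name="bucket_name">my-keycloak-jgroups-bucket</property>
>>>>>     </protocol>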
>>>>>
>>>>>
>>>>>>
>>>>>> When I launch the master and let it come up, then launch the slave I
>>>>>> can
>>>>>> see all the traffic for the session on the master. As soon as I stop
>>>>>> the
>>>>>> master, the slave is looking for the master, but when I click on the
>>>>>> website, it just hangs waiting for a connection and then eventually
>>>>>> logs me
>>>>>> out, and I end up logging back in, and now I'm on the slave node. The
>>>>>> shared sessions are not happening. Is there something else I need to
>>>>>> do or
>>>>>> set?
>>>>>>
>>>>>
>>>>> It looks like a consequence of the JGroups discovery issue. Please try
>>>>> to fix the clustering problem and then see if this one appears again.
>>>>>
>>>>>
>>>>>>
>>>>>> I have this setup in my domain.xml configuration as well:
>>>>>>         <server-group name="auth-server-group" profile="ha">
>>>>>>             <jvm name="default">
>>>>>>                 <heap size="64m" max-size="512m"/>
>>>>>>             </jvm>
>>>>>>             <socket-binding-group ref="ha-sockets"/>
>>>>>>             <system-properties>
>>>>>>                 <property name="jboss.cluster.tcp.initial_hosts"
>>>>>> value="10.10.10.77[7600],10.10.10.27[7600]"/>
>>>>>>             </system-properties>
>>>>>>         </server-group>
>>>>>>
>>>>>> In my host.xml on the slave I have this setup to reach back to the
>>>>>> master
>>>>>> as the domain controller
>>>>>>     <domain-controller>
>>>>>>         <remote protocol="remote"
>>>>>> host="${jboss.domain.master.address}"
>>>>>> port="${jboss.domain.master.port:9999}"
>>>>>> security-realm="ManagementRealm"/>
>>>>>>    </domain-controller>
>>>>>>
>>>>>> Any help would be appreciated
>>>>>> _______________________________________________
>>>>>> keycloak-user mailing list
>>>>>> keycloak-user at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/keycloak-user
>>>>>>
>>>>>

