[keycloak-user] Standalone HA tokens not immediately shared among nodes

D V dv at glyphy.com
Mon Sep 17 14:25:21 EDT 2018


I should also follow up and say that I've got the same setup running in a
remote Mesos+Marathon cluster. It's got a different lb setup, but is
exhibiting the same behaviour.
One other note: I've uploaded the load balancer logs in the same directory.
Not sure if they're of any use.

Cheers,
DV
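[Editor's note: the failing sequence discussed in the thread below can be reproduced with plain curl against the two nodes. This is a minimal sketch; the hostnames, realm name, client ID, and credentials are placeholders, not values taken from the thread:]

```shell
# Password grant against node1 (placeholder host/realm/client/credentials).
RESP=$(curl -s -X POST \
  -d "grant_type=password" \
  -d "client_id=test-client" \
  -d "username=testuser" \
  -d "password=testpass" \
  "http://keycloak-node1:8080/auth/realms/myrealm/protocol/openid-connect/token")

# Extract the refresh token from the JSON response.
REFRESH=$(echo "$RESP" | sed -n 's/.*"refresh_token":"\([^"]*\)".*/\1/p')

# Refresh grant against node2; in the setup described in this thread, this
# fails until a password grant has also been executed on node2.
curl -s -X POST \
  -d "grant_type=refresh_token" \
  -d "client_id=test-client" \
  -d "refresh_token=$REFRESH" \
  "http://keycloak-node2:8080/auth/realms/myrealm/protocol/openid-connect/token"
```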

On Mon, Sep 17, 2018 at 12:28 PM D V <dv at glyphy.com> wrote:

> Hmm ... maybe the lb is pinging the port? I'm running dockercloud/haproxy,
> which autodetects open ports. However, I'm excluding port 7600 so that it
> doesn't try to route application requests to JGroups ports. The
> SocketTimeoutException only happens once at startup, though. I don't see it
> later when I start running auth tests.
>
> Thanks for the pointer to Infinispan statistics query. I ran it for both
> nodes and saved the results in the "ispn-cluster-query" subdirectory in
> (previously shared)
> https://drive.google.com/drive/folders/1AiyLtTXu2AxEbVBdR-5kfJLxqoYladBn?usp=sharing
> :
> - "start" prefix is for the output of the query right after starting the
> nodes. Node1 starts first, then node2.
> - "first-auth" is the initial grant_type=password auth. In this set of
> tests it was done on node1.
> - "refresh-auth" is the subsequent failing grant_type=refresh_token. It's
> successful on node1 and failing on node2.
> - "post-node2-auth" is after grant_type=password auth is executed on node2
> (which brings the cluster in-sync).
>
> I couldn't spot any issues in the output with my untrained eye. I wonder,
> should the statistics be pulled from the sessions distributed cache as
> well? Is that the one that would be consulted during
> grant_type=refresh_token auth?
>
> Thanks,
> DV
>
> On Mon, Sep 17, 2018 at 4:27 AM Sebastian Laskawiec <slaskawi at redhat.com>
> wrote:
>
>> So the only thing that looks suspicious is this:
>> JGRP000006: failed accepting connection from peer:
>> java.net.SocketTimeoutException: Read timed out
>>
>> It might indicate that some other application tried to connect to
>> Keycloak on port 7600 and immediately disconnected. That leads to a
>> question about your environment: are you sure you are looking at the
>> proper application servers? Perhaps some other applications (Wildfly for
>> example, since Keycloak is built on Wildfly) are trying to join the
>> cluster.
>>
>> However, if the answer is yes, the next thing to check are Infinispan
>> statistics over JMX or JBoss CLI. Here's a sample query you may use:
>> /subsystem=infinispan/cache-container=keycloak/replicated-cache=*:query
>> And then have a look at the number of entries and the number of entries
>> in the cluster.
>>
>> @Marek Posolda <mposolda at redhat.com>, perhaps this rings a bell for you?
>> ISPN seems fine here (at least from the logs and symptoms DV is describing).
>>
>> On Thu, Sep 13, 2018 at 6:53 PM D V <dv at glyphy.com> wrote:
>>
>>> Weird indeed. Yes, the logs indicate two nodes. I've uploaded the full
>>> start-up logs here:
>>> https://drive.google.com/drive/folders/1AiyLtTXu2AxEbVBdR-5kfJLxqoYladBn?usp=sharing
>>> . I started node 1, let it settle, then started node 2. You can see that
>>> node1 starts with just itself, but later node2 joins the cluster and caches
>>> are rebalanced.
>>>
>>> As for the experiment, I tried waiting for a few minutes after both
>>> nodes started in case there's some synchronization delay somewhere, but it
>>> didn't change the outcome.
>>>
>>> Thanks,
>>> DV
>>>
>>> On Wed, Sep 12, 2018 at 3:22 AM Sebastian Laskawiec <slaskawi at redhat.com>
>>> wrote:
>>>
>>>> Hmmm this sounds a bit weird... like there was some delay in the
>>>> communication path.
>>>>
>>>> Could you please look through your logs for lines including the
>>>> "view" keyword? Are there two nodes, as expected? How do the timestamps
>>>> relate to your experiment?
>>>>
>>>

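[Editor's note: the Infinispan statistics query suggested in the thread can be issued through the JBoss CLI shipped with Keycloak. A sketch, assuming a default standalone-ha setup run from the Keycloak install directory; the cache-container name "keycloak" matches the thread, everything else is a default that may differ in your configuration:]

```shell
# Statistics for the replicated caches, as suggested in the thread:
bin/jboss-cli.sh --connect \
  --command="/subsystem=infinispan/cache-container=keycloak/replicated-cache=*:query"

# The sessions cache is distributed rather than replicated (the point DV
# raises above), so it can be queried separately:
bin/jboss-cli.sh --connect \
  --command="/subsystem=infinispan/cache-container=keycloak/distributed-cache=*:query"
```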

More information about the keycloak-user mailing list