I should also follow up and say that I've got the same setup running in a
remote Mesos+Marathon cluster. It's got a different lb setup, but is
exhibiting the same behaviour.
One other note: I've uploaded the load balancer logs in the same directory.
Not sure if they're of any use.
Cheers,
DV
On Mon, Sep 17, 2018 at 12:28 PM D V <dv(a)glyphy.com> wrote:
Hmm ... maybe the lb is pinging the port? I'm running
dockercloud/haproxy,
which autodetects open ports. However, I'm excluding port 7600 so that it
doesn't try to route application requests to JGroups ports. The
SocketTimeoutException only happens once at startup, though. I don't see it
later when I start running auth tests.
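For reference, here's roughly how I have the exclusion set up. This is a minimal sketch assuming dockercloud/haproxy's documented EXCLUDE_PORTS variable (read from linked backend containers); the container names and image tags are illustrative, not my exact setup:

```shell
# Keycloak backend: EXCLUDE_PORTS tells dockercloud/haproxy not to route
# to the JGroups port even though the container exposes it.
docker run -d --name keycloak \
  -e EXCLUDE_PORTS=7600 \
  jboss/keycloak

# dockercloud/haproxy autodetects the remaining exposed ports on linked
# containers and load-balances across them.
docker run -d --name lb \
  --link keycloak \
  -p 80:80 \
  dockercloud/haproxy
```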
Thanks for the pointer to the Infinispan statistics query. I ran it for both
nodes and saved the results in the "ispn-cluster-query" subdirectory in
(previously shared)
https://drive.google.com/drive/folders/1AiyLtTXu2AxEbVBdR-5kfJLxqoYladBn?...
:
- "start" prefix is for the output of the query right after starting the
nodes. Node1 starts first, then node2.
- "first-auth" is the initial grant_type=password auth. In this set of
tests it was done on node1.
- "refresh-auth" is the subsequent failing grant_type=refresh_token. It's
successful on node1 and failing on node2.
- "post-node2-auth" is after grant_type=password auth is executed on node2
(which brings the cluster in-sync).
I couldn't spot any issues in the output with my untrained eyes. I wonder,
should the statistics be pulled from the sessions distributed cache as
well? Is that the one that would be consulted during
grant_type=refresh_token auth?
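In case it helps anyone eyeball the saved outputs, this is the kind of filtering I've been doing to diff just the entry counts between the two nodes. It's only a sketch: the attribute name ("number-of-entries") and the "key" => value layout are assumptions about the CLI output format, and the sample file below is illustrative rather than my actual data:

```shell
# Illustrative stand-in for one of the saved query outputs in the
# "ispn-cluster-query" subdirectory (not real data).
cat > /tmp/sample-query-output.txt <<'EOF'
"number-of-entries" => 3,
"number-of-entries-in-memory" => 3,
EOF

# Pull out only the entry-count attributes so node1 and node2 files
# can be compared side by side.
grep -o '"number-of-entries[a-z-]*" => [0-9]*' /tmp/sample-query-output.txt
```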
Thanks,
DV
On Mon, Sep 17, 2018 at 4:27 AM Sebastian Laskawiec <slaskawi(a)redhat.com>
wrote:
> So the only thing that looks suspicious is this:
> JGRP000006: failed accepting connection from peer:
> java.net.SocketTimeoutException: Read timed out
>
> It might indicate that some other application tried to connect to
> Keycloak on port 7600 and immediately disconnected. That leads to a
> question about your environment: are you sure you are looking at the proper
> application servers? Perhaps some other applications (Wildfly, for example,
> since Keycloak is built on Wildfly) are trying to join the cluster.
>
> However, if the answer is yes, the next thing to check are Infinispan
> statistics over JMX or JBoss CLI. Here's a sample query you may use:
> /subsystem=infinispan/cache-container=keycloak/replicated-cache=*:query
> And then have a look at the number of entries and the number of entries in
> the cluster.
>
> @Marek Posolda <mposolda(a)redhat.com>, perhaps this rings a bell? ISPN
> seems fine here (at least from the logs and symptoms DV is describing).
>
> On Thu, Sep 13, 2018 at 6:53 PM D V <dv(a)glyphy.com> wrote:
>
>> Weird indeed. Yes, the logs indicate two nodes. I've uploaded the full
>> start-up logs here:
>>
https://drive.google.com/drive/folders/1AiyLtTXu2AxEbVBdR-5kfJLxqoYladBn?...
>> . I started node 1, let it settle, then started node 2. You can see that
>> node1 starts with just itself, but later node2 joins the cluster and caches
>> are rebalanced.
>>
>> As for the experiment, I tried waiting for a few minutes after both
>> nodes started in case there's some synchronization delay somewhere, but it
>> didn't change the outcome.
>>
>> Thanks,
>> DV
>>
>> On Wed, Sep 12, 2018 at 3:22 AM Sebastian Laskawiec <slaskawi(a)redhat.com>
>> wrote:
>>
>>> Hmmm this sounds a bit weird... like there was some delay in the
>>> communication path.
>>>
>>> Could you please look through your logs and look for lines including
>>> "view" keyword? Are there two nodes, as expected? How do the timestamps
>>> relate to your experiment?
>>>
>>