Hmm ... maybe the lb is pinging the port? I'm running dockercloud/haproxy,
which autodetects open ports. However, I'm excluding port 7600 so that it
doesn't try to route application requests to JGroups ports. The
SocketTimeoutException only happens once at startup, though. I don't see it
later when I start running auth tests.
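For reference, the exclusion is done through the image's environment settings. A minimal sketch of what my setup looks like (service names and compose layout here are placeholders; EXCLUDE_PORTS is the dockercloud/haproxy variable I'm relying on):

```yaml
# docker-compose sketch (hypothetical service names)
lb:
  image: dockercloud/haproxy
  environment:
    # Don't route application traffic to the JGroups port on the backends
    - EXCLUDE_PORTS=7600
  links:
    - keycloak
  ports:
    - "80:80"
```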
Thanks for the pointer to the Infinispan statistics query. I ran it for both
nodes and saved the results in the "ispn-cluster-query" subdirectory of the
(previously shared)
https://drive.google.com/drive/folders/1AiyLtTXu2AxEbVBdR-5kfJLxqoYladBn?... :
- "start" prefix is for the output of the query right after starting the
nodes. Node1 starts first, then node2.
- "first-auth" is the initial grant_type=password auth. In this set of
tests it was done on node1.
- "refresh-auth" is the subsequent failing grant_type=refresh_token. It's
successful on node1 and fails on node2.
- "post-node2-auth" is after grant_type=password auth is executed on node2
(which brings the cluster in-sync).
I couldn't spot any issues in the output with my untrained eyes. I wonder:
should the statistics be pulled from the sessions distributed cache as
well? Is that the cache that would be consulted during
grant_type=refresh_token auth?
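If it helps, this is the variant of the query I could run for the distributed cache (a sketch; I'm assuming the cache is named `sessions` under the `keycloak` container, as in a default standalone-ha config, and that statistics are enabled):

```shell
# Read runtime state of the sessions distributed cache (hypothetical cache name)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:read-resource(include-runtime=true)
```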
Thanks,
DV
On Mon, Sep 17, 2018 at 4:27 AM Sebastian Laskawiec <slaskawi(a)redhat.com>
wrote:
So the only thing that looks suspicious is this:
JGRP000006: failed accepting connection from peer:
java.net.SocketTimeoutException: Read timed out
It might indicate that some other application tried to connect to Keycloak
on port 7600 and immediately disconnected. That leads to a question about your
environment: are you sure you are looking at the proper application servers?
Perhaps some other applications (WildFly, for example, since Keycloak is
built on WildFly) are trying to join the cluster.
However, if the answer is yes, the next thing to check are Infinispan
statistics over JMX or JBoss CLI. Here's a sample query you may use:
/subsystem=infinispan/cache-container=keycloak/replicated-cache=*:query
And then have a look at the number of entries and the number of entries in the
cluster.
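Wrapped in the JBoss CLI, running it might look roughly like this (a sketch assuming a default local management interface; adjust the connection details for your environment):

```shell
# Dump runtime stats for all replicated caches in the keycloak container
$JBOSS_HOME/bin/jboss-cli.sh --connect \
  --command='/subsystem=infinispan/cache-container=keycloak/replicated-cache=*:read-resource(include-runtime=true)'
```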
@Marek Posolda <mposolda(a)redhat.com>, perhaps this rings any bells for you?
ISPN seems fine here (at least from the logs and symptoms DV is describing).
On Thu, Sep 13, 2018 at 6:53 PM D V <dv(a)glyphy.com> wrote:
> Weird indeed. Yes, the logs indicate two nodes. I've uploaded the full
> start-up logs here:
>
> https://drive.google.com/drive/folders/1AiyLtTXu2AxEbVBdR-5kfJLxqoYladBn?... .
> I started node1, let it settle, then started node2. You can see that
> node1 starts with just itself, but later node2 joins the cluster and caches
> are rebalanced.
>
> As for the experiment, I tried waiting for a few minutes after both nodes
> started in case there's some synchronization delay somewhere, but it didn't
> change the outcome.
>
> Thanks,
> DV
>
> On Wed, Sep 12, 2018 at 3:22 AM Sebastian Laskawiec <slaskawi(a)redhat.com>
> wrote:
>
>> Hmmm this sounds a bit weird... like there was some delay in the
>> communication path.
>>
>> Could you please look through your logs for lines including the
>> "view" keyword? Are there two nodes, as expected? How do the timestamps
>> relate to your experiment?
>>
>