]
Sebastian Łaskawiec commented on ISPN-7489:
-------------------------------------------
[~belaban] explained to me what's causing those problems:
{quote}
you're trying to connect to a server but are not able to do so within
TCP.sock_conn_timeout ms (default: 2000). The SocketTimeoutExceptions are all on
socket.connect().
{quote}
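To make the quoted explanation concrete, here is a minimal sketch in plain {{java.net}} (not JGroups code) of a connect attempt bounded by a timeout. The class name {{ConnectProbe}}, the {{probe}} helper and the default port 7600 are assumptions made for illustration; only the 2000 ms value comes from the quote above. A peer whose address no longer answers would typically show up as {{TIMED_OUT}}, which corresponds to the {{SocketTimeoutException: connect timed out}} seen in the logs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of JGroups or Infinispan.
class ConnectProbe {
    enum Result { CONNECTED, TIMED_OUT, FAILED }

    // Tries to open a TCP connection and gives up after timeoutMs milliseconds.
    static Result probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Socket.connect(SocketAddress, int) throws SocketTimeoutException
            // when the connection is not established within the timeout.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Result.CONNECTED;
        } catch (SocketTimeoutException e) {
            // Same exception type as reported in the JGRP000029 errors.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // For example "connection refused" when the host answers but nothing listens.
            return Result.FAILED;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions; pass the pod IP and the JGroups TCP port as arguments.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7600;
        // 2000 ms mirrors the default TCP.sock_conn_timeout mentioned in the quote.
        System.out.println(host + ":" + port + " -> " + probe(host, port, 2000));
    }
}
```

Running this against the address of a pod that is still listed in the cluster view but has already been terminated should help distinguish a timeout (no answer at all) from a refused connection.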
So it seems the old nodes (which are meant to be killed) are still in the cluster view.
Further investigation shows that something is wrong with the node shutdown
procedure:
{code}
[transactions-repository-1-5zjqt] *** JBossAS process (84) received TERM signal ***
[transactions-repository-1-40mc9] [GC (Allocation Failure) 619101K->324299K(1013632K),
0.0197678 secs]
[transactions-repository-1-40mc9] 12:58:16,981 ERROR [org.jgroups.protocols.TCP]
(jgroups-25,transactions-repository-1-40mc9) JGRP000029: transactions-repository-1-40mc9:
failed sending message to transactions-repository-1-5zjqt (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=3994, TP: [cluster_name=cluster]
{code}
Node {{transactions-repository-1-5zjqt}} receives the TERM signal (Kubernetes asks it to shut
down gracefully), and from then on all messages to that node fail.
org.jgroups.protocols.TCP emits errors when node leaves the cluster
-------------------------------------------------------------------
Key: ISPN-7489
URL:
https://issues.jboss.org/browse/ISPN-7489
Project: Infinispan
Issue Type: Bug
Components: Cloud Integrations, Core
Affects Versions: 9.0.0.CR1
Environment: * OpenShift {{v1.5.0-alpha.2+e4b43ee}}
* Custom Infinispan Server build (based on [these
instructions|https://github.com/slaskawi/infinispan-1/tree/custom_image]). SHA1
{{2b0731b21649a88a75ed71d21b9cc06ba365e947}}
Reporter: Sebastian Łaskawiec
While performing the [Spring Session and Kubernetes Rolling Update
demo|https://bluejeans.com/s/pYKUg/], I encountered a couple of problems.
One of them is this:
{noformat}
[transactions-repository-1-04x09] 18:09:12,193 ERROR [org.jgroups.protocols.TCP]
(jgroups-30,transactions-repository-1-04x09) JGRP000029: transactions-repository-1-04x09:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=5262, TP: [cluster_name=cluster]
[transactions-repository-1-1f8dx] 18:09:12,310 ERROR [org.jgroups.protocols.TCP]
(jgroups-16,transactions-repository-1-1f8dx) JGRP000029: transactions-repository-1-1f8dx:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=6259, TP: [cluster_name=cluster]
[transactions-repository-1-04x09] 18:09:12,997 ERROR [org.jgroups.protocols.TCP]
(jgroups-22,transactions-repository-1-04x09) JGRP000029: transactions-repository-1-04x09:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=5262, TP: [cluster_name=cluster]
[transactions-repository-1-1f8dx] 18:09:13,113 ERROR [org.jgroups.protocols.TCP]
(jgroups-16,transactions-repository-1-1f8dx) JGRP000029: transactions-repository-1-1f8dx:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=6259, TP: [cluster_name=cluster]
{noformat}
Full logs from the Rolling Update process can be found here:
https://gist.github.com/slaskawi/530241bb695f1f490bcb25eabaf9d676
Steps to reproduce:
* Start a local OpenShift cluster
* Invoke `./init_infrastructure.sh` from
https://github.com/slaskawi/presentations/tree/ISPN-7487-reproducer
* Invoke `cd transaction-creator && mvn fabric8:run`
* Do the rolling update: `oc deploy transactions-repository --latest -n myproject`
* Observe the logs: `kubetail -l environment=infrastructure`