[JBoss JIRA] (ISPN-7489) org.jgroups.protocols.TCP emits errors when node leaves the cluster

Wednesday, 22 February 2017

    [ https://issues.jboss.org/browse/ISPN-7489?page=com.atlassian.jira.plugin.... ]

Sebastian Łaskawiec commented on ISPN-7489:
-------------------------------------------

[~belaban] explained to me what's causing those problems:
{quote}
you're trying to connect to a server but are not able to do so within
TCP.sock_conn_timeout ms (default: 2000). The SocketTimeoutExceptions are all on
socket.connect().
{quote}

So it seems the old nodes (which are meant to be killed) are still in the cluster view.
From further investigation I can see that something is wrong with the node shutdown
procedure:
{code}
[transactions-repository-1-5zjqt] *** JBossAS process (84) received TERM signal ***
[transactions-repository-1-40mc9] [GC (Allocation Failure)  619101K->324299K(1013632K),
0.0197678 secs]
[transactions-repository-1-40mc9] 12:58:16,981 ERROR [org.jgroups.protocols.TCP]
(jgroups-25,transactions-repository-1-40mc9) JGRP000029: transactions-repository-1-40mc9:
failed sending message to transactions-repository-1-5zjqt (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=3994, TP: [cluster_name=cluster]
{code}

Node {{transactions-repository-1-5zjqt}} receives the TERM signal (Kubernetes asks it to
shut down gracefully), and from then on all messages sent to that node fail.
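To make the quoted explanation concrete, here is a minimal sketch of the kind of call that times out: {{java.net.Socket.connect(SocketAddress, int)}} with a connect timeout. The class name {{ConnectProbe}}, the {{Result}} enum and the fallback port 7800 are illustrative assumptions, not JGroups code; the 2000 ms value mirrors the TCP.sock_conn_timeout default mentioned in the quote.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of JGroups or Infinispan.
class ConnectProbe {
    /** Outcome of a single connect attempt. */
    enum Result { CONNECTED, TIMED_OUT, FAILED }

    /** Tries to open a TCP connection and gives up after timeoutMs milliseconds. */
    static Result probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Result.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No answer within the timeout: this is the "connect timed out" case from the log.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // Any other failure, for example "connection refused".
            return Result.FAILED;
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholders; pass the address of the node under test.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7800;
        System.out.println(probe(host, port, 2000));
    }
}
```

A timeout (rather than an immediate refusal) usually means nothing answered the connection attempt at all, which would be consistent with the target address no longer being reachable after the TERM signal. That reading is an assumption and not something the logs above prove.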

...
 org.jgroups.protocols.TCP emits errors when node leaves the cluster
 -------------------------------------------------------------------

                 Key: ISPN-7489
                 URL: https://issues.jboss.org/browse/ISPN-7489
             Project: Infinispan
          Issue Type: Bug
          Components: Cloud Integrations, Core
    Affects Versions: 9.0.0.CR1
         Environment: * OpenShift {{v1.5.0-alpha.2+e4b43ee}}
 * Custom Infinispan Server build (based on [these
instructions|https://github.com/slaskawi/infinispan-1/tree/custom_image]). SHA1
{{2b0731b21649a88a75ed71d21b9cc06ba365e947}}
            Reporter: Sebastian Łaskawiec

 When I was performing the [Spring Session and Kubernetes Rolling Update
demo|https://bluejeans.com/s/pYKUg/] I encountered a couple of problems.
 One of them is this:
 {noformat}
 [transactions-repository-1-04x09] 18:09:12,193 ERROR [org.jgroups.protocols.TCP]
(jgroups-30,transactions-repository-1-04x09) JGRP000029: transactions-repository-1-04x09:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=5262, TP: [cluster_name=cluster]
 [transactions-repository-1-1f8dx] 18:09:12,310 ERROR [org.jgroups.protocols.TCP]
(jgroups-16,transactions-repository-1-1f8dx) JGRP000029: transactions-repository-1-1f8dx:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=6259, TP: [cluster_name=cluster]
 [transactions-repository-1-04x09] 18:09:12,997 ERROR [org.jgroups.protocols.TCP]
(jgroups-22,transactions-repository-1-04x09) JGRP000029: transactions-repository-1-04x09:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=5262, TP: [cluster_name=cluster]
 [transactions-repository-1-1f8dx] 18:09:13,113 ERROR [org.jgroups.protocols.TCP]
(jgroups-16,transactions-repository-1-1f8dx) JGRP000029: transactions-repository-1-1f8dx:
failed sending message to transactions-repository-1-4z05w (71 bytes):
java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK],
UNICAST3: DATA, seqno=6259, TP: [cluster_name=cluster]
 {noformat}
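 When reading longer logs it can help to count the JGRP000029 failures per target node, to confirm that they all point at the node being shut down. The following is a rough sketch only: the class name {{Jgrp29Tally}} and the regular expression are assumptions derived from the log lines above, not a log format documented by JGroups.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the logs; not part of JGroups or Infinispan.
class Jgrp29Tally {
    // Matches "JGRP000029: <sender>: failed sending message to <target> (<n> bytes)".
    // \s+ between tokens tolerates lines that were wrapped by the mail client.
    private static final Pattern FAILED_SEND = Pattern.compile(
        "JGRP000029:\\s+(\\S+):\\s+failed\\s+sending\\s+message\\s+to\\s+(\\S+)\\s+\\((\\d+)\\s+bytes\\)");

    /** Returns the number of failed sends per target node, sorted by node name. */
    static Map<String, Integer> failuresByTarget(String log) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = FAILED_SEND.matcher(log);
        while (m.find()) {
            counts.merge(m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two entries shortened from the excerpt above.
        String log = String.join("\n",
            "ERROR [org.jgroups.protocols.TCP] JGRP000029: transactions-repository-1-04x09:",
            "failed sending message to transactions-repository-1-4z05w (71 bytes):",
            "ERROR [org.jgroups.protocols.TCP] JGRP000029: transactions-repository-1-1f8dx:",
            "failed sending message to transactions-repository-1-4z05w (71 bytes):");
        System.out.println(failuresByTarget(log));
    }
}
```

 In the excerpt above every failure targets {{transactions-repository-1-4z05w}}, so the tally would contain a single key.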
 The full logs from the Rolling Update process can be found here:
https://gist.github.com/slaskawi/530241bb695f1f490bcb25eabaf9d676
 Steps to reproduce:
 * Start a local OpenShift cluster
 * Invoke `./init_infrastructure.sh` from
https://github.com/slaskawi/presentations/tree/ISPN-7487-reproducer
 * Invoke `cd transaction-creator && mvn fabric8:run`
 * Do the rolling update: `oc deploy transactions-repository --latest -n myproject`
 * Observe the logs: `kubetail -l environment=infrastructure`

--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
