[JBoss JIRA] (WFLY-12214) JGRP000029: failed sending message: java.net.ConnectException: Connection refused

Thursday, 20 June 2019

     [ https://issues.jboss.org/browse/WFLY-12214?page=com.atlassian.jira.plugin... ]

Tommasso Borgato updated WFLY-12214:
------------------------------------
    Description: 
The error is observed in fail-over clustering tests where the fail-over type is
"shutdown" and the jgroups subsystem is configured as follows:

{noformat}
        <subsystem xmlns="urn:jboss:domain:jgroups:7.0">
            <channels default="ee">
                <channel name="ee" stack="tcp" cluster="ejb"/>
            </channels>
            <stacks default="tcp">
                <stack name="udp">
                    <transport type="UDP" socket-binding="jgroups-udp">
                        <property name="ip_ttl">
                            32
                        </property>
                    </transport>
                    <protocol type="PING"/>
                    <protocol type="MERGE3"/>
                    <socket-protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
                    <protocol type="FD_ALL"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="pbcast.NAKACK2"/>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="UFC"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG3"/>
                </stack>
                <stack name="tcp">
                    <transport type="TCP" socket-binding="jgroups-tcp"/>
                    <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
                    <protocol type="MERGE3"/>
                    <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
                    <protocol type="FD_ALL"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="pbcast.NAKACK2"/>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG3"/>
                </stack>
            </stacks>
        </subsystem>
{noformat}

The error was not observed with the previous version of the subsystem:

{noformat}
        <subsystem xmlns="urn:jboss:domain:jgroups:6.0">
            <channels default="ee">
                <channel name="ee" stack="tcp" cluster="ejb"/>
            </channels>
            <stacks default="tcp">
                <stack name="udp">
                    <transport type="UDP" socket-binding="jgroups-udp">
                        <property name="ip_ttl">
                            32
                        </property>
                    </transport>
                    <protocol type="PING"/>
                    <protocol type="MERGE3"/>
                    <protocol type="FD_SOCK"/>
                    <protocol type="FD_ALL"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="pbcast.NAKACK2"/>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="UFC"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG3"/>
                </stack>
                <stack name="tcp">
                    <transport type="TCP" socket-binding="jgroups-tcp"/>
                    <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
                    <protocol type="MERGE3"/>
                    <protocol type="FD_SOCK"/>
                    <protocol type="FD_ALL"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="pbcast.NAKACK2"/>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG3"/>
                </stack>
            </stacks>
        </subsystem>
{noformat}
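
To make the difference between the two configurations easier to see, here is a minimal Python sketch that compares the "tcp" stacks. It is only an illustration: the two XML strings are abbreviated copies of the stacks quoted above (first five entries), and the helper name stack_entries is made up for this example. Under those assumptions, the only entry that differs, apart from the namespace version, is FD_SOCK, which the 7.0 configuration declares as a socket-protocol with its own socket-binding.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the "tcp" stacks quoted above (first five entries only).
NEW_7_0 = """
<subsystem xmlns="urn:jboss:domain:jgroups:7.0">
  <stacks default="tcp">
    <stack name="tcp">
      <transport type="TCP" socket-binding="jgroups-tcp"/>
      <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
      <protocol type="MERGE3"/>
      <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
      <protocol type="FD_ALL"/>
    </stack>
  </stacks>
</subsystem>
"""

OLD_6_0 = """
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
  <stacks default="tcp">
    <stack name="tcp">
      <transport type="TCP" socket-binding="jgroups-tcp"/>
      <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
      <protocol type="MERGE3"/>
      <protocol type="FD_SOCK"/>
      <protocol type="FD_ALL"/>
    </stack>
  </stacks>
</subsystem>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}")[-1]


def stack_entries(xml_text, stack_name="tcp"):
    """Return (element, type, socket-binding) for each entry of the named stack."""
    root = ET.fromstring(xml_text.strip())
    for elem in root.iter():
        if local_name(elem.tag) == "stack" and elem.get("name") == stack_name:
            return [
                (local_name(child.tag), child.get("type"), child.get("socket-binding"))
                for child in elem
            ]
    return []


old_entries = stack_entries(OLD_6_0)
new_entries = stack_entries(NEW_7_0)

# Pair entries by position and keep only the ones that changed.
changed = [(old, new) for old, new in zip(old_entries, new_entries) if old != new]
for old, new in changed:
    print("6.0:", old, "-> 7.0:", new)
```

The comparison is positional, which is sufficient here because both stacks list the same protocols in the same order.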

Right after one node (wildfly1) is shut down and restarted, and the next node (wildfly2) is
shut down, we see:

{noformat}
2019-06-20 08:00:46,880 INFO  [org.wildfly.extension.undertow] (ServerService Thread Pool -- 82) WFLYUT0021: Registered web context: '/clusterbench-granular' for server 'default-server'
2019-06-20 08:00:46,939 INFO  [org.wildfly.extension.undertow] (ServerService Thread Pool -- 88) WFLYUT0021: Registered web context: '/clusterbench' for server 'default-server'
2019-06-20 08:00:47,024 INFO  [org.wildfly.extension.undertow] (ServerService Thread Pool -- 85) WFLYUT0021: Registered web context: '/clusterbench-passivating' for server 'default-server'
2019-06-20 08:00:47,331 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0010: Deployed "clusterbench-ee8.ear" (runtime-name : "clusterbench-ee8.ear")
2019-06-20 08:00:47,335 INFO  [org.jboss.as.server] (ServerService Thread Pool -- 47) WFLYSRV0010: Deployed "postgresql-connector.jar" (runtime-name : "postgresql-connector.jar")
2019-06-20 08:00:47,560 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
2019-06-20 08:00:47,562 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.0.146.117:9990/management
2019-06-20 08:00:47,563 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.0.146.117:9990
2019-06-20 08:00:47,563 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 18.0.0.Beta1-SNAPSHOT (WildFly Core 9.0.1.Final) started in 26227ms - Started 1065 of 1293 services (538 services are lazy, passive or on-demand)
2019-06-20 08:02:19,985 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) JGRP000029: wildfly1: failed sending message to wildfly2 (134 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: FORK: ejb:ejb, UNICAST3: DATA, seqno=12148, TP: [cluster=ejb]
2019-06-20 08:02:20,002 INFO  [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, wildfly4, wildfly1]
2019-06-20 08:02:20,006 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) JGRP000029: wildfly1: failed sending message to wildfly2 (60 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: UNICAST3: ACK, seqno=284, conn_id=4, ts=109, TP: [cluster=ejb]
2019-06-20 08:02:20,027 INFO  [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN100001: Node wildfly2 left the cluster
2019-06-20 08:02:20,031 INFO  [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, wildfly4, wildfly1]
{noformat}

The complete logs are
[here|https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/EAP7/vie...].

The number of errors is about 3000 per node.

The overall fail rate is still low, about 0.55%, but it has increased noticeably compared to
the previous version, where it was about 0.01%.
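
For anyone who wants to reproduce the per-node error count from a server log, a minimal Python sketch follows. It is an illustration only, not the tooling used for the tests. It assumes each log record sits on a single line (unlike the wrapped excerpt in this mail), and the names SEND_FAILURE and count_send_failures are hypothetical. The sample lines are taken from the excerpt above.

```python
import re
from collections import Counter

# Matches JGRP000029 send failures in the format shown in the log excerpt:
# "JGRP000029: <sender>: failed sending message to <destination> (<n> bytes): ..."
SEND_FAILURE = re.compile(r"JGRP000029: (\S+): failed sending message to (\S+) \(")


def count_send_failures(lines):
    """Count JGRP000029 errors per (sender, destination) pair."""
    counts = Counter()
    for line in lines:
        match = SEND_FAILURE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Three records from the excerpt, each joined onto one line.
sample = [
    "2019-06-20 08:02:19,985 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) "
    "JGRP000029: wildfly1: failed sending message to wildfly2 (134 bytes): "
    "java.net.ConnectException: Connection refused (Connection refused), headers: FORK: "
    "ejb:ejb, UNICAST3: DATA, seqno=12148, TP: [cluster=ejb]",
    "2019-06-20 08:02:20,002 INFO  [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) "
    "ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, "
    "wildfly4, wildfly1]",
    "2019-06-20 08:02:20,006 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) "
    "JGRP000029: wildfly1: failed sending message to wildfly2 (60 bytes): "
    "java.net.ConnectException: Connection refused (Connection refused), headers: UNICAST3: "
    "ACK, seqno=284, conn_id=4, ts=109, TP: [cluster=ejb]",
]

counts = count_send_failures(sample)
for (sender, destination), total in counts.items():
    print(f"{sender} -> {destination}: {total}")
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.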

...
 JGRP000029: failed sending message: java.net.ConnectException: Connection refused
 ---------------------------------------------------------------------------------

                 Key: WFLY-12214
                 URL: https://issues.jboss.org/browse/WFLY-12214
             Project: WildFly
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 18.0.0.Beta1
            Reporter: Tommasso Borgato
            Assignee: Paul Ferraro
            Priority: Major


--
This message was sent by Atlassian Jira
(v7.12.1#712002)
