[jboss-jira] [JBoss JIRA] (WFLY-12214) JGRP000029: failed sending message: java.net.ConnectException: Connection refused
Tommasso Borgato (Jira)
issues at jboss.org
Thu Jun 20 05:06:01 EDT 2019
[ https://issues.jboss.org/browse/WFLY-12214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tommasso Borgato updated WFLY-12214:
------------------------------------
Description:
The error is observed in fail-over clustering tests where the fail-over scenario is "shutdown" and the jgroups subsystem is configured as follows:
{noformat}
<subsystem xmlns="urn:jboss:domain:jgroups:7.0">
<channels default="ee">
<channel name="ee" stack="tcp" cluster="ejb"/>
</channels>
<stacks default="tcp">
<stack name="udp">
<transport type="UDP" socket-binding="jgroups-udp">
<property name="ip_ttl">
32
</property>
</transport>
<protocol type="PING"/>
<protocol type="MERGE3"/>
<socket-protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
<protocol type="pbcast.NAKACK2"/>
<protocol type="UNICAST3"/>
<protocol type="pbcast.STABLE"/>
<protocol type="pbcast.GMS"/>
<protocol type="UFC"/>
<protocol type="MFC"/>
<protocol type="FRAG3"/>
</stack>
<stack name="tcp">
<transport type="TCP" socket-binding="jgroups-tcp"/>
<socket-protocol type="MPING" socket-binding="jgroups-mping"/>
<protocol type="MERGE3"/>
<socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
<protocol type="pbcast.NAKACK2"/>
<protocol type="UNICAST3"/>
<protocol type="pbcast.STABLE"/>
<protocol type="pbcast.GMS"/>
<protocol type="MFC"/>
<protocol type="FRAG3"/>
</stack>
</stacks>
</subsystem>
{noformat}
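For reference, the socket bindings referenced by these stacks (jgroups-tcp, jgroups-tcp-fd, jgroups-mping, jgroups-udp, jgroups-udp-fd) are declared in the socket-binding-group of the ha/full-ha profiles. A minimal sketch, assuming the stock WildFly defaults (the ports and interface used in the actual test environment may differ):
{noformat}
<socket-binding name="jgroups-mping" interface="private" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45700"/>
<socket-binding name="jgroups-tcp" interface="private" port="7600"/>
<socket-binding name="jgroups-tcp-fd" interface="private" port="57600"/>
<socket-binding name="jgroups-udp" interface="private" port="55200" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45688"/>
<socket-binding name="jgroups-udp-fd" interface="private" port="54200"/>
{noformat}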
The error was not observed with the previous version of the subsystem:
{noformat}
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
<channels default="ee">
<channel name="ee" stack="tcp" cluster="ejb"/>
</channels>
<stacks default="tcp">
<stack name="udp">
<transport type="UDP" socket-binding="jgroups-udp">
<property name="ip_ttl">
32
</property>
</transport>
<protocol type="PING"/>
<protocol type="MERGE3"/>
<protocol type="FD_SOCK"/>
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
<protocol type="pbcast.NAKACK2"/>
<protocol type="UNICAST3"/>
<protocol type="pbcast.STABLE"/>
<protocol type="pbcast.GMS"/>
<protocol type="UFC"/>
<protocol type="MFC"/>
<protocol type="FRAG3"/>
</stack>
<stack name="tcp">
<transport type="TCP" socket-binding="jgroups-tcp"/>
<socket-protocol type="MPING" socket-binding="jgroups-mping"/>
<protocol type="MERGE3"/>
<protocol type="FD_SOCK"/>
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
<protocol type="pbcast.NAKACK2"/>
<protocol type="UNICAST3"/>
<protocol type="pbcast.STABLE"/>
<protocol type="pbcast.GMS"/>
<protocol type="MFC"/>
<protocol type="FRAG3"/>
</stack>
</stacks>
</subsystem>
{noformat}
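Apart from the schema version, the only structural difference between the two configurations is how FD_SOCK is declared: the 7.0 schema binds it to an explicit socket binding via a <socket-protocol> element, whereas the 6.0 schema used a plain <protocol> element with no binding. Side by side (tcp stack shown; the udp stack uses jgroups-udp-fd):
{noformat}
<!-- jgroups 6.0 schema -->
<protocol type="FD_SOCK"/>

<!-- jgroups 7.0 schema -->
<socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
{noformat}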
Right after one node (wildfly1) is shut down and restarted, and the next node (wildfly2) is shut down, we see:
{noformat}
2019-06-20 08:00:46,880 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 82) WFLYUT0021: Registered web context: '/clusterbench-granular' for server 'default-server'
2019-06-20 08:00:46,939 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 88) WFLYUT0021: Registered web context: '/clusterbench' for server 'default-server'
2019-06-20 08:00:47,024 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 85) WFLYUT0021: Registered web context: '/clusterbench-passivating' for server 'default-server'
2019-06-20 08:00:47,331 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0010: Deployed "clusterbench-ee8.ear" (runtime-name : "clusterbench-ee8.ear")
2019-06-20 08:00:47,335 INFO [org.jboss.as.server] (ServerService Thread Pool -- 47) WFLYSRV0010: Deployed "postgresql-connector.jar" (runtime-name : "postgresql-connector.jar")
2019-06-20 08:00:47,560 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
2019-06-20 08:00:47,562 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.0.146.117:9990/management
2019-06-20 08:00:47,563 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.0.146.117:9990
2019-06-20 08:00:47,563 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 18.0.0.Beta1-SNAPSHOT (WildFly Core 9.0.1.Final) started in 26227ms - Started 1065 of 1293 services (538 services are lazy, passive or on-demand)
2019-06-20 08:02:19,985 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) JGRP000029: wildfly1: failed sending message to wildfly2 (134 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: FORK: ejb:ejb, UNICAST3: DATA, seqno=12148, TP: [cluster=ejb]
2019-06-20 08:02:20,002 INFO [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, wildfly4, wildfly1]
2019-06-20 08:02:20,006 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) JGRP000029: wildfly1: failed sending message to wildfly2 (60 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: UNICAST3: ACK, seqno=284, conn_id=4, ts=109, TP: [cluster=ejb]
2019-06-20 08:02:20,027 INFO [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN100001: Node wildfly2 left the cluster
2019-06-20 08:02:20,031 INFO [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, wildfly4, wildfly1]
{noformat}
Full logs are available [here|https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/EAP7/view/EAP7-Clustering/view/EAP7-Clustering-Database/job/eap-7.x-clustering-db-session-shutdown-repl-postgresql-10.1-offload-profile/11/].
The number of errors is about 3000 per node. The overall failure rate is still low, about 0.55%, but this is a noticeable increase over the previous version, where it was about 0.01% (roughly a fifty-fold increase).
> JGRP000029: failed sending message: java.net.ConnectException: Connection refused
> ---------------------------------------------------------------------------------
>
> Key: WFLY-12214
> URL: https://issues.jboss.org/browse/WFLY-12214
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 18.0.0.Beta1
> Reporter: Tommasso Borgato
> Assignee: Paul Ferraro
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v7.12.1#712002)