[ https://issues.jboss.org/browse/WFLY-12214?page=com.atlassian.jira.plugin... ]
Tommasso Borgato updated WFLY-12214:
------------------------------------
Description:
The error is observed in fail-over clustering tests where the fail-over type is
"shutdown" and the jgroups subsystem is configured as follows:
{noformat}
<subsystem xmlns="urn:jboss:domain:jgroups:7.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks default="tcp">
        <stack name="udp">
            <transport type="UDP" socket-binding="jgroups-udp">
                <property name="ip_ttl">32</property>
            </transport>
            <protocol type="PING"/>
            <protocol type="MERGE3"/>
            <socket-protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="UFC"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
            <protocol type="MERGE3"/>
            <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
{noformat}
It wasn't observed with the previous version:
{noformat}
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks default="tcp">
        <stack name="udp">
            <transport type="UDP" socket-binding="jgroups-udp">
                <property name="ip_ttl">32</property>
            </transport>
            <protocol type="PING"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="UFC"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
{noformat}
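To make the difference between the two configurations explicit, here is a minimal Python sketch that lists the protocols whose element name or socket binding changed between the 6.0 and 7.0 "tcp" stacks. The embedded XML is an abbreviated copy of the stacks above, and the helper names (`local`, `stack_protocols`, `changed_protocols`) are made up for this illustration; this is not part of the test suite.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the "tcp" stack from the two configurations above.
OLD_SUBSYSTEM = """
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
  <stacks default="tcp">
    <stack name="tcp">
      <transport type="TCP" socket-binding="jgroups-tcp"/>
      <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
      <protocol type="MERGE3"/>
      <protocol type="FD_SOCK"/>
      <protocol type="FD_ALL"/>
    </stack>
  </stacks>
</subsystem>
"""

NEW_SUBSYSTEM = """
<subsystem xmlns="urn:jboss:domain:jgroups:7.0">
  <stacks default="tcp">
    <stack name="tcp">
      <transport type="TCP" socket-binding="jgroups-tcp"/>
      <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
      <protocol type="MERGE3"/>
      <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
      <protocol type="FD_ALL"/>
    </stack>
  </stacks>
</subsystem>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def stack_protocols(xml_text, stack_name):
    """Return (element, type, socket-binding) for each child of the named stack."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for element in root.iter():
        if local(element.tag) == "stack" and element.get("name") == stack_name:
            for child in element:
                result.append(
                    (local(child.tag), child.get("type"), child.get("socket-binding"))
                )
    return result


def changed_protocols(old, new):
    """Return (type, old definition, new definition) for each changed protocol."""
    old_by_type = {ptype: (elem, binding) for elem, ptype, binding in old}
    return [
        (ptype, old_by_type.get(ptype), (elem, binding))
        for elem, ptype, binding in new
        if old_by_type.get(ptype) != (elem, binding)
    ]


changes = changed_protocols(
    stack_protocols(OLD_SUBSYSTEM, "tcp"),
    stack_protocols(NEW_SUBSYSTEM, "tcp"),
)
for ptype, before, after in changes:
    print(f"{ptype}: {before} -> {after}")
```

On these abbreviated stacks the only reported change is FD_SOCK, which moves from a plain `protocol` element to a `socket-protocol` element bound to `jgroups-tcp-fd`. Whether that change is related to the errors below is not established here.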
Right after one node (wildfly1) is shut down and restarted, and the next node (wildfly2) is
shut down, we see:
{noformat}
2019-06-20 08:00:46,880 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 82) WFLYUT0021: Registered web context: '/clusterbench-granular' for server 'default-server'
2019-06-20 08:00:46,939 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 88) WFLYUT0021: Registered web context: '/clusterbench' for server 'default-server'
2019-06-20 08:00:47,024 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 85) WFLYUT0021: Registered web context: '/clusterbench-passivating' for server 'default-server'
2019-06-20 08:00:47,331 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0010: Deployed "clusterbench-ee8.ear" (runtime-name : "clusterbench-ee8.ear")
2019-06-20 08:00:47,335 INFO [org.jboss.as.server] (ServerService Thread Pool -- 47) WFLYSRV0010: Deployed "postgresql-connector.jar" (runtime-name : "postgresql-connector.jar")
2019-06-20 08:00:47,560 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
2019-06-20 08:00:47,562 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.0.146.117:9990/management
2019-06-20 08:00:47,563 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.0.146.117:9990
2019-06-20 08:00:47,563 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 18.0.0.Beta1-SNAPSHOT (WildFly Core 9.0.1.Final) started in 26227ms - Started 1065 of 1293 services (538 services are lazy, passive or on-demand)
2019-06-20 08:02:19,985 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) JGRP000029: wildfly1: failed sending message to wildfly2 (134 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: FORK: ejb:ejb, UNICAST3: DATA, seqno=12148, TP: [cluster=ejb]
2019-06-20 08:02:20,002 INFO [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, wildfly4, wildfly1]
2019-06-20 08:02:20,006 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) JGRP000029: wildfly1: failed sending message to wildfly2 (60 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: UNICAST3: ACK, seqno=284, conn_id=4, ts=109, TP: [cluster=ejb]
2019-06-20 08:02:20,027 INFO [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN100001: Node wildfly2 left the cluster
2019-06-20 08:02:20,031 INFO [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, wildfly4, wildfly1]
{noformat}
Full logs are [here|https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/EAP7/vie...].
The number of errors is about 3000 per node.
The overall fail rate is still low, about 0.55%, but it has increased noticeably compared to
the previous version, where it was about 0.01%.
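For anyone who wants to reproduce the per-node error count from a server log, a minimal Python sketch follows. It assumes each log entry sits on a single line, as in an unwrapped server.log; the function name `count_send_failures` and the sample list are illustrative only, with the sample lines taken from the excerpt above.

```python
import re
from collections import Counter

# Matches e.g. "JGRP000029: wildfly1: failed sending message to wildfly2 (134 bytes)"
SEND_FAILURE = re.compile(
    r"JGRP000029: (\S+): failed sending message to (\S+) \((\d+) bytes\)"
)


def count_send_failures(lines):
    """Count JGRP000029 send failures per (sender, destination) pair."""
    counts = Counter()
    for line in lines:
        match = SEND_FAILURE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample entries copied from the log excerpt in this description.
sample = [
    "2019-06-20 08:02:19,985 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) "
    "JGRP000029: wildfly1: failed sending message to wildfly2 (134 bytes): "
    "java.net.ConnectException: Connection refused (Connection refused), headers: FORK: "
    "ejb:ejb, UNICAST3: DATA, seqno=12148, TP: [cluster=ejb]",
    "2019-06-20 08:02:20,002 INFO [org.infinispan.CLUSTER] (thread-216,ejb,wildfly1) "
    "ISPN000094: Received new cluster view for channel ejb: [wildfly3|6] (3) [wildfly3, "
    "wildfly4, wildfly1]",
    "2019-06-20 08:02:20,006 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-8,ejb,wildfly1) "
    "JGRP000029: wildfly1: failed sending message to wildfly2 (60 bytes): "
    "java.net.ConnectException: Connection refused (Connection refused), headers: UNICAST3: "
    "ACK, seqno=284, conn_id=4, ts=109, TP: [cluster=ejb]",
]

counts = count_send_failures(sample)
for (sender, destination), total in counts.items():
    print(f"{sender} -> {destination}: {total} failed sends")
```

To run it against a real log, pass an open file object instead of `sample`, for example `count_send_failures(open("server.log"))`, where the path is whatever the node under test writes to.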
JGRP000029: failed sending message: java.net.ConnectException: Connection refused
---------------------------------------------------------------------------------
Key: WFLY-12214
URL: https://issues.jboss.org/browse/WFLY-12214
Project: WildFly
Issue Type: Bug
Components: Clustering
Affects Versions: 18.0.0.Beta1
Reporter: Tommasso Borgato
Assignee: Paul Ferraro
Priority: Major
--
This message was sent by Atlassian Jira (v7.12.1#712002)