[JBoss JIRA] (WFLY-10773) JGRP000029: failed sending message: java.io.IOException: Socket Closed

Tuesday, 31 July 2018

[https://issues.jboss.org/browse/WFLY-10773?page=com.atlassian.jira.plugin...]

tommaso borgato updated WFLY-10773:
-----------------------------------
    Description: 
The error was observed in scenario
{{*[eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7_JJB/22/]*}}:
a 4-node cluster behind a mod_jk load balancer, where failover is triggered by server shutdown and restart.

The cluster nodes were configured to use the TCP stack for communication:

{code:xml}
<subsystem xmlns="urn:jboss:domain:jgroups:6.0" default-stack="tcp">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="udp">
            <transport type="UDP" socket-binding="jgroups-udp"/>
            <protocol type="PING"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="UFC"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
{code}
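For reference, the {{cluster_name=ejb}} reported in the error below belongs to the default channel {{ee}}, which runs over the {{tcp}} stack (TCP transport, MPING discovery and, unlike the {{udp}} stack, no UFC). The Python sketch below resolves that mapping from the configuration above using only the standard library. The fallback defaults noted in its comments (missing {{stack}} means {{default-stack}}, missing {{cluster}} means the channel name) are assumptions about WildFly behaviour and are not exercised by this particular config.

{code:python}
import xml.etree.ElementTree as ET

# The JGroups subsystem from this report; only the "tcp" stack is kept,
# because no channel references the "udp" stack.
SUBSYSTEM_XML = """
<subsystem xmlns="urn:jboss:domain:jgroups:6.0" default-stack="tcp">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
"""

# The subsystem declares a default namespace, so element lookups need it.
NS = {"jg": "urn:jboss:domain:jgroups:6.0"}


def resolve_default_channel(xml_text):
    """Return (channel, stack, cluster name, protocol types) for the default channel."""
    root = ET.fromstring(xml_text.strip())
    channels = root.find("jg:channels", NS)
    channel_name = channels.get("default")
    channel = channels.find(f"jg:channel[@name='{channel_name}']", NS)
    # Assumed WildFly defaults: a channel without "stack" uses the subsystem's
    # default-stack, and a channel without "cluster" uses its own name.
    stack_name = channel.get("stack", root.get("default-stack"))
    cluster_name = channel.get("cluster", channel_name)
    stack = root.find(f"jg:stacks/jg:stack[@name='{stack_name}']", NS)
    # Transport, discovery and protocols, in the order they appear in the stack.
    protocols = [element.get("type") for element in stack]
    return channel_name, stack_name, cluster_name, protocols


if __name__ == "__main__":
    channel, stack, cluster, protocols = resolve_default_channel(SUBSYSTEM_XML)
    print(f"channel={channel} stack={stack} cluster={cluster}")
    print(" -> ".join(protocols))
{code}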

The 4 cluster nodes store session data in an invalidation cache backed by a MySQL database:

{code:xml}
<invalidation-cache name="offload">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="MYSQL">
        <table prefix="s">
            <id-column name="id" type="VARCHAR(255)"/>
            <data-column name="datum" type="VARBINARY(10000)"/>
            <timestamp-column name="version" type="BIGINT"/>
        </table>
    </jdbc-store>
</invalidation-cache>
{code}

The error was observed on node
{{*[dev214|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7_JJB/22/console-dev214/]*}};
here is an attempt to isolate the events that may be relevant:

* node dev213 was shut down and restarted but had not yet rejoined the cluster:
{noformat}
[JBossINF] 02:19:07,082 INFO  [org.infinispan.CLUSTER] (thread-21,ejb,dev214) ISPN100001: Node dev213 left the cluster
{noformat}

* the current node, dev214, initiates shutdown:
{noformat}
2018/07/31 02:21:43:593 EDT [INFO ][Thread-88] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - [SHUTDOWN] JBossShutdown server host: dev214:9990
{noformat}

* then we observe the error:
{noformat}
[JBossINF] 02:21:44,588 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-30,ejb,dev214) JGRP000029: dev214: failed sending message to dev215 (59 bytes): java.io.IOException: Socket Closed, headers: UNICAST3: ACK, seqno=137, conn_id=1, ts=131, TP: [cluster_name=ejb]
{noformat}

* dev214 then completes shutdown:
{noformat}
2018/07/31 02:21:45:459 EDT [DEBUG][RMI TCP Connection(27)-10.16.91.122] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - Server is down.
{noformat}
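To make the ordering of these events explicit, the Python sketch below computes the gaps between consecutive timestamps in the excerpts above. It assumes that the server log (no time zone shown) and the test harness log (EDT) share the same clock and that all four events fall on 2018/07/31; the report does not confirm clock synchronization between the two sources.

{code:python}
from datetime import datetime, timedelta

# Timestamps copied from the log excerpts in this report. The server log uses
# "HH:MM:SS,mmm" and the test harness log uses "HH:MM:SS:mmm".
# Assumption: both logs share one clock and time zone, and every event falls
# on the same day, so only the time of day is compared.
EVENTS = [
    ("ISPN100001: dev213 left the cluster", "02:19:07,082"),
    ("harness starts shutdown of dev214", "02:21:43:593"),
    ("JGRP000029 logged on dev214", "02:21:44,588"),
    ("harness reports dev214 down", "02:21:45:459"),
]


def parse_time(stamp):
    """Parse a time of day with milliseconds, accepting ',' or ':' before the millis."""
    clock, millis = stamp[:8], stamp[9:]
    return datetime.strptime(f"{clock}.{millis}", "%H:%M:%S.%f")


def gaps_ms(events):
    """Return (earlier, later, gap in whole milliseconds) for consecutive events."""
    parsed = [(label, parse_time(stamp)) for label, stamp in events]
    return [
        (earlier, later, (t_later - t_earlier) // timedelta(milliseconds=1))
        for (earlier, t_earlier), (later, t_later) in zip(parsed, parsed[1:])
    ]


if __name__ == "__main__":
    for earlier, later, gap in gaps_ms(EVENTS):
        print(f"{gap:>7} ms  {earlier} -> {later}")
{code}

With these inputs the gaps are 156511 ms, 995 ms and 871 ms, which places the JGRP000029 error inside the roughly 1.9-second window between the start of dev214's shutdown and the harness reporting the server down.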


JGRP000029: failed sending message: java.io.IOException: Socket Closed
----------------------------------------------------------------------

                 Key: WFLY-10773
                 URL: https://issues.jboss.org/browse/WFLY-10773
             Project: WildFly
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 14.0.0.CR1
            Reporter: tommaso borgato
            Assignee: Paul Ferraro


--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
