[
https://issues.jboss.org/browse/WFLY-10773?page=com.atlassian.jira.plugin...
]
tommaso borgato updated WFLY-10773:
-----------------------------------
Description:
The error was observed in scenario
{{*[eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7_JJB/22/]*}}:
a 4-node cluster with a mod_jk load balancer where fail-over is introduced by server
shut-down and re-start.
The cluster nodes were configured to use the TCP stack for communication:
{code:xml}
<subsystem xmlns="urn:jboss:domain:jgroups:6.0" default-stack="tcp">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="udp">
            <transport type="UDP" socket-binding="jgroups-udp"/>
            <protocol type="PING"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="UFC"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
{code}
The 4 cluster nodes store session data in an invalidation cache backed by a MySQL
database:
{code:xml}
<invalidation-cache name="offload">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <jdbc-store data-source="testDS" fetch-state="false"
                passivation="false" purge="false" shared="true"
                dialect="MYSQL">
        <table prefix="s">
            <id-column name="id" type="VARCHAR(255)"/>
            <data-column name="datum" type="VARBINARY(10000)"/>
            <timestamp-column name="version" type="BIGINT"/>
        </table>
    </jdbc-store>
</invalidation-cache>
{code}
The error was observed on node
{{*[dev214|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7_JJB/22/console-dev214/]*}};
here is an attempt to isolate the events that may be relevant:
* node dev213 was shut down and re-started but had not yet re-joined the cluster:
{noformat}
[JBossINF] 02:19:07,082 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev214)
ISPN100001: Node dev213 left the cluster
{noformat}
* current node dev214 is initiating shut-down:
{noformat}
2018/07/31 02:21:43:593 EDT [INFO ][Thread-88] HOST
dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - [SHUTDOWN] JBossShutdown server host:
dev214:9990
{noformat}
* then we observe the error:
{noformat}
[JBossINF] 02:21:44,588 ERROR [org.jgroups.protocols.TCP]
(TQ-Bundler-30,ejb,dev214) JGRP000029: dev214: failed sending message to dev215 (59
bytes): java.io.IOException: Socket Closed, headers: UNICAST3: ACK, seqno=137, conn_id=1,
ts=131, TP: [cluster_name=ejb]
{noformat}
* current node dev214 completes shut-down:
{noformat}
2018/07/31 02:21:45:459 EDT [DEBUG][RMI TCP Connection(27)-10.16.91.122] HOST
dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - Server is down.
{noformat}
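As a rough cross-check of the timeline above, the following sketch (Python, not part of the test suite) compares the three timestamps quoted in the excerpts. It assumes that the server log and the test harness log share the same date and clock, which the logs themselves do not confirm; the function and constant names are illustrative only.

```python
from datetime import datetime, timedelta

# Timestamps copied from the log excerpts above. The date comes from the
# harness log lines and is ASSUMED to apply to the server log line as well.
SHUTDOWN_START = datetime(2018, 7, 31, 2, 21, 43, 593000)  # [SHUTDOWN] request
SEND_ERROR = datetime(2018, 7, 31, 2, 21, 44, 588000)      # JGRP000029 on dev214
SHUTDOWN_DONE = datetime(2018, 7, 31, 2, 21, 45, 459000)   # "Server is down."


def within_shutdown_window(start, event, end):
    """Return True if `event` falls between `start` and `end` (inclusive)."""
    return start <= event <= end


def offset_ms(start, event):
    """Whole milliseconds elapsed from `start` to `event`."""
    return (event - start) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(within_shutdown_window(SHUTDOWN_START, SEND_ERROR, SHUTDOWN_DONE))
    print(offset_ms(SHUTDOWN_START, SEND_ERROR))
    print(offset_ms(SEND_ERROR, SHUTDOWN_DONE))
```

Under those assumptions the send failure falls about one second after the shutdown request and before the harness reports the server as down, which is consistent with the error being logged while dev214 is shutting down.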
was:
The error was observed in scenario
{{*[eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7_JJB/22/]*}}:
a 4-node cluster with a mod_jk load balancer where fail-over is introduced by server
shut-down and re-start.
The cluster nodes were configured to use the TCP stack for communication:
{code:xml}
<subsystem xmlns="urn:jboss:domain:jgroups:6.0" default-stack="tcp">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="udp">
            <transport type="UDP" socket-binding="jgroups-udp"/>
            <protocol type="PING"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="UFC"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
{code}
The 4 cluster nodes store session data in an invalidation cache backed by a MySQL
database:
{code:xml}
<invalidation-cache name="offload">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <jdbc-store data-source="testDS" fetch-state="false"
                passivation="false" purge="false" shared="true"
                dialect="MYSQL">
        <table prefix="s">
            <id-column name="id" type="VARCHAR(255)"/>
            <data-column name="datum" type="VARBINARY(10000)"/>
            <timestamp-column name="version" type="BIGINT"/>
        </table>
    </jdbc-store>
</invalidation-cache>
{code}
The error was observed on node
{{*[dev214|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-mysql-5-7_JJB/22/console-dev214/]*}};
here is an attempt to isolate the events that may be relevant:
* node dev213 was shut down and re-started but had not yet re-joined the cluster:
{noformat}
[JBossINF] 02:19:07,082 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev214)
ISPN100001: Node dev213 left the cluster
{noformat}
* current node dev214 is initiating shut-down:
{noformat}
2018/07/31 02:21:43:593 EDT [INFO ][Thread-88] HOST
dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - [SHUTDOWN] JBossShutdown server host:
dev214:9990
{noformat}
* then we observe the error:
{noformat}
[JBossINF] 02:21:44,588 ERROR [org.jgroups.protocols.TCP]
(TQ-Bundler-30,ejb,dev214) JGRP000029: dev214: failed sending message to dev215 (59
bytes): java.io.IOException: Socket Closed, headers: UNICAST3: ACK, seqno=137, conn_id=1,
ts=131, TP: [cluster_name=ejb]
{noformat}
* current node dev214 completes shut-down:
{noformat}
2018/07/31 02:21:45:459 EDT [DEBUG][RMI TCP Connection(27)-10.16.91.122] HOST
dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - Server is down.
{noformat}
JGRP000029: failed sending message: java.io.IOException: Socket Closed
----------------------------------------------------------------------
Key: WFLY-10773
URL: https://issues.jboss.org/browse/WFLY-10773
Project: WildFly
Issue Type: Bug
Components: Clustering
Affects Versions: 14.0.0.CR1
Reporter: tommaso borgato
Assignee: Paul Ferraro