[infinispan-issues] [JBoss JIRA] (ISPN-7489) org.jgroups.protocols.TCP emits errors when node leaves the cluster
Sebastian Łaskawiec (JIRA)
issues at jboss.org
Mon Feb 27 06:57:01 EST 2017
[ https://issues.jboss.org/browse/ISPN-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13369556#comment-13369556 ]
Sebastian Łaskawiec commented on ISPN-7489:
-------------------------------------------
After another round of discussion with [~belaban] and [~dan.berindei] we figured out what happened:
{quote}
so here’s what happens: we have A|5=A,B,C and A leaves. A then installs view B|6=BC, and B and C send the VIEW-ACK to A
if A leaves before getting all VIEW-ACKs, you’ll see that error
the time A waits for all acks is defined by GMS.view_ack_collection_timeout
UNICAST3.conn_close_timeout should be lowered, the default in 4.final is 240s == 4 minutes
that 4 minutes plus the preceding max_retransmit_time of 1 min -> 5 minutes
{quote}
For the sake of the demo I've been using:
{code}
<protocol type="UNICAST3">
    <property name="conn_close_timeout">5000</property>
</protocol>
{code}
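The quote also names GMS.view_ack_collection_timeout as the time A waits for the VIEW-ACKs before leaving. A sketch of how it might be raised in the same place, assuming the same server subsystem syntax as the UNICAST3 snippet; the 5000 ms value is an illustrative assumption and was not tested as part of this issue:
{code}
<protocol type="pbcast.GMS">
    <!-- assumption: illustrative value in milliseconds, not a tested recommendation -->
    <property name="view_ack_collection_timeout">5000</property>
</protocol>
{code}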
> org.jgroups.protocols.TCP emits errors when node leaves the cluster
> -------------------------------------------------------------------
>
> Key: ISPN-7489
> URL: https://issues.jboss.org/browse/ISPN-7489
> Project: Infinispan
> Issue Type: Bug
> Components: Cloud Integrations, Core
> Affects Versions: 9.0.0.CR1
> Environment: * OpenShift {{v1.5.0-alpha.2+e4b43ee}}
> * Custom Infinispan Server build (based on [these instructions|https://github.com/slaskawi/infinispan-1/tree/custom_image]). SHA1 {{2b0731b21649a88a75ed71d21b9cc06ba365e947}}
> Reporter: Sebastian Łaskawiec
>
> When I was performing the [Spring Session and Kubernetes Rolling Update demo|https://bluejeans.com/s/pYKUg/], I encountered a couple of problems.
> One of them is this:
> {noformat}
> [transactions-repository-1-04x09] 18:09:12,193 ERROR [org.jgroups.protocols.TCP] (jgroups-30,transactions-repository-1-04x09) JGRP000029: transactions-repository-1-04x09: failed sending message to transactions-repository-1-4z05w (71 bytes): java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=5262, TP: [cluster_name=cluster]
> [transactions-repository-1-1f8dx] 18:09:12,310 ERROR [org.jgroups.protocols.TCP] (jgroups-16,transactions-repository-1-1f8dx) JGRP000029: transactions-repository-1-1f8dx: failed sending message to transactions-repository-1-4z05w (71 bytes): java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=6259, TP: [cluster_name=cluster]
> [transactions-repository-1-04x09] 18:09:12,997 ERROR [org.jgroups.protocols.TCP] (jgroups-22,transactions-repository-1-04x09) JGRP000029: transactions-repository-1-04x09: failed sending message to transactions-repository-1-4z05w (71 bytes): java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=5262, TP: [cluster_name=cluster]
> [transactions-repository-1-1f8dx] 18:09:13,113 ERROR [org.jgroups.protocols.TCP] (jgroups-16,transactions-repository-1-1f8dx) JGRP000029: transactions-repository-1-1f8dx: failed sending message to transactions-repository-1-4z05w (71 bytes): java.net.SocketTimeoutException: connect timed out, headers: GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=6259, TP: [cluster_name=cluster]
> {noformat}
> Full logs from the Rolling Update process can be found here: https://gist.github.com/slaskawi/530241bb695f1f490bcb25eabaf9d676
> Steps to reproduce:
> * Start a local OpenShift cluster
> * Invoke `./init_infrastructure.sh` from https://github.com/slaskawi/presentations/tree/ISPN-7487-reproducer
> * Invoke `cd transaction-creator && mvn fabric8:run`
> * Do the rolling update: `oc deploy transactions-repository --latest -n myproject`
> * Observe the logs: `kubetail -l environment=infrastructure`
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)