[Red Hat JIRA] (ISPN-12597) Disable JGroups error JGRP000006 stacktrace when configuring xsite on OCP and AWS LB
by Dmitry Volodin (Jira)
[ https://issues.redhat.com/browse/ISPN-12597?page=com.atlassian.jira.plugi... ]
Dmitry Volodin updated ISPN-12597:
----------------------------------
Summary: Disable JGroups error JGRP000006 stacktrace when configuring xsite on OCP and AWS LB (was: Disable JGroups error JGRP000006 stacktarce when configuring xsite on OCP and AWS LB)
> Disable JGroups error JGRP000006 stacktrace when configuring xsite on OCP and AWS LB
> ------------------------------------------------------------------------------------
>
> Key: ISPN-12597
> URL: https://issues.redhat.com/browse/ISPN-12597
> Project: Infinispan
> Issue Type: Enhancement
> Components: Cross-Site Replication, OpenShift, Operator
> Affects Versions: 11.0.7.Final
> Reporter: Dmitry Volodin
> Assignee: Pedro Ruivo
> Priority: Minor
>
> When configuring Cross-Site on OCP with the operator and a LoadBalancer on AWS (this probably also occurs on Azure and other cloud platforms), a large number of stack traces appear in the Pod logs:
>
> {code:java}
> 07:51:13,524 WARN (TcpServer.Acceptor[7900]-2,relay-global,_example-infinispan-1-11053:site1) [org.jgroups.protocols.TCP] JGRP000006: failed accepting connection from peer Socket[addr=/10.131.0.1,port=11781,localport=7900]: java.io.EOFException
> java.io.EOFException
> 	at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
> 	at org.jgroups.blocks.cs.TcpConnection.readPeerAddress(TcpConnection.java:247)
> 	at org.jgroups.blocks.cs.TcpConnection.<init>(TcpConnection.java:53)
> 	at org.jgroups.blocks.cs.TcpServer$Acceptor.handleAccept(TcpServer.java:126)
> 	at org.jgroups.blocks.cs.TcpServer$Acceptor.run(TcpServer.java:111)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> The reason for these logs is that AWS/OCP periodically pings the LoadBalancer with a health-check probe. The same errors do not occur when an internal (Service) cross-site configuration is used.
> It would be nice to have an option to disable these traces, probably for the LB configuration only, as customers mistake them for a problem.
>
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[Red Hat JIRA] (ISPN-12598) Hot Rod java client retries too many times
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-12598?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-12598:
--------------------------------
Description:
The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many times to retry an operation after a failure (default: 10).
When the number of retries is exceeded, the client does not fail immediately: instead, it tries to switch to another site, and tries {{maxRetries}} times on the new site as well. The client doesn't keep track of the clusters it switched off of, so it seems possible to go in an infinite loop, switching from one site to the next.
If the client cannot switch to another site (e.g. because it was configured with a single site), it logs a debug message (`Cluster might have completely shut down, try resetting transport layer and topology id`) and tries the current site again for {{maxRetries}} times. So the actual number of retries with a single site is {{2 * maxRetries}} (or 1, if {{maxRetries == 0}}).
Maybe automatic site switching is a good idea in some cases, but I'm not convinced it should be the default behaviour. At the very least, site switching should be decided at the remote cache manager level, when the client fails to open a new connection to any server in the current site, and not based on the number of retries done for any particular operation.
was:
The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many times to retry an operation after a failure (default: 10).
When the number of retries is exceeded, the client does not fail immediately: instead, it tries to switch to another site, and tries {{maxRetries}} times on the new site as well. The client doesn't keep track of the clusters it switched off of, so it seems possible to go in an infinite loop, switching from one site to the next.
If the client cannot switch to another site (e.g. because it was configured with a single site), it logs a debug message (`Cluster might have completely shut down, try resetting transport layer and topology id`) and tries the current site again for {{maxRetries}} times. So the actual number of retries with a single site is {{2 * maxRetries}}.
Maybe automatic site switching is a good idea in some cases, but I'm not convinced it should be the default behaviour. At the very least, site switching should be decided at the remote cache manager level, when the client fails to open a new connection to any server in the current site, and not based on the number of retries done for any particular operation.
> Hot Rod java client retries too many times
> ------------------------------------------
>
> Key: ISPN-12598
> URL: https://issues.redhat.com/browse/ISPN-12598
> Project: Infinispan
> Issue Type: Bug
> Components: Hot Rod
> Affects Versions: 12.0.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many times to retry an operation after a failure (default: 10).
> When the number of retries is exceeded, the client does not fail immediately: instead, it tries to switch to another site, and tries {{maxRetries}} times on the new site as well. The client doesn't keep track of the clusters it switched off of, so it seems possible to go in an infinite loop, switching from one site to the next.
> If the client cannot switch to another site (e.g. because it was configured with a single site), it logs a debug message (`Cluster might have completely shut down, try resetting transport layer and topology id`) and tries the current site again for {{maxRetries}} times. So the actual number of retries with a single site is {{2 * maxRetries}} (or 1, if {{maxRetries == 0}}).
> Maybe automatic site switching is a good idea in some cases, but I'm not convinced it should be the default behaviour. At the very least, site switching should be decided at the remote cache manager level, when the client fails to open a new connection to any server in the current site, and not based on the number of retries done for any particular operation.
[Red Hat JIRA] (ISPN-12598) Hot Rod java client retries too many times
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-12598?page=com.atlassian.jira.plugi... ]
Dan Berindei commented on ISPN-12598:
-------------------------------------
Even with {{maxRetries = 0}}, the client retries operations at least once, and the extra retries hide errors in the handling of server responses like in ISPN-12596 and ISPN-11630.
> Hot Rod java client retries too many times
> ------------------------------------------
>
> Key: ISPN-12598
> URL: https://issues.redhat.com/browse/ISPN-12598
> Project: Infinispan
> Issue Type: Bug
> Components: Hot Rod
> Affects Versions: 12.0.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many times to retry an operation after a failure (default: 10).
> When the number of retries is exceeded, the client does not fail immediately: instead, it tries to switch to another site, and tries {{maxRetries}} times on the new site as well. The client doesn't keep track of the clusters it switched off of, so it seems possible to go in an infinite loop, switching from one site to the next.
> If the client cannot switch to another site (e.g. because it was configured with a single site), it logs a debug message (`Cluster might have completely shut down, try resetting transport layer and topology id`) and tries the current site again for {{maxRetries}} times. So the actual number of retries with a single site is {{2 * maxRetries}}.
> Maybe automatic site switching is a good idea in some cases, but I'm not convinced it should be the default behaviour. At the very least, site switching should be decided at the remote cache manager level, when the client fails to open a new connection to any server in the current site, and not based on the number of retries done for any particular operation.
[Red Hat JIRA] (ISPN-12598) Hot Rod java client retries too many times
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-12598?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-12598:
--------------------------------
Status: Open (was: New)
> Hot Rod java client retries too many times
> ------------------------------------------
>
> Key: ISPN-12598
> URL: https://issues.redhat.com/browse/ISPN-12598
> Project: Infinispan
> Issue Type: Bug
> Components: Hot Rod
> Affects Versions: 12.0.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many times to retry an operation after a failure (default: 10).
> When the number of retries is exceeded, the client does not fail immediately: instead, it tries to switch to another site, and tries {{maxRetries}} times on the new site as well. The client doesn't keep track of the clusters it switched off of, so it seems possible to go in an infinite loop, switching from one site to the next.
> If the client cannot switch to another site (e.g. because it was configured with a single site), it logs a debug message (`Cluster might have completely shut down, try resetting transport layer and topology id`) and tries the current site again for {{maxRetries}} times. So the actual number of retries with a single site is {{2 * maxRetries}}.
> Maybe automatic site switching is a good idea in some cases, but I'm not convinced it should be the default behaviour. At the very least, site switching should be decided at the remote cache manager level, when the client fails to open a new connection to any server in the current site, and not based on the number of retries done for any particular operation.
[Red Hat JIRA] (ISPN-12598) Hot Rod java client retries too many times
by Dan Berindei (Jira)
Dan Berindei created ISPN-12598:
-----------------------------------
Summary: Hot Rod java client retries too many times
Key: ISPN-12598
URL: https://issues.redhat.com/browse/ISPN-12598
Project: Infinispan
Issue Type: Bug
Components: Hot Rod
Affects Versions: 12.0.0.CR1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 12.0.0.Final
The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many times to retry an operation after a failure (default: 10).
When the number of retries is exceeded, the client does not fail immediately: instead, it tries to switch to another site, and tries {{maxRetries}} times on the new site as well. The client doesn't keep track of the clusters it switched off of, so it seems possible to go in an infinite loop, switching from one site to the next.
If the client cannot switch to another site (e.g. because it was configured with a single site), it logs a debug message (`Cluster might have completely shut down, try resetting transport layer and topology id`) and tries the current site again for {{maxRetries}} times. So the actual number of retries with a single site is {{2 * maxRetries}}.
Maybe automatic site switching is a good idea in some cases, but I'm not convinced it should be the default behaviour. At the very least, site switching should be decided at the remote cache manager level, when the client fails to open a new connection to any server in the current site, and not based on the number of retries done for any particular operation.
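The retry accounting described above can be sketched as a small stand-alone simulation. This is an illustrative model of the behaviour described in this report, not the actual client code; the class and method names are made up:

```java
// Illustrative model of the retry accounting described above: with a single
// site, the client retries maxRetries times, resets the transport layer, and
// retries maxRetries more times, for 2 * maxRetries retries in total
// (or 1 when maxRetries == 0, since the client still retries at least once).
public class RetrySimulation {

    // Counts how many retries a single-site client would perform for one
    // operation that keeps failing, per the behaviour described in the report.
    static int retriesForSingleSite(int maxRetries) {
        if (maxRetries == 0) {
            return 1; // the client still retries at least once
        }
        int retries = maxRetries; // first round of retries on the current site
        // No other site to switch to: reset the transport and topology id,
        // then try the same site again for another round.
        retries += maxRetries;
        return retries;
    }

    public static void main(String[] args) {
        System.out.println(retriesForSingleSite(10)); // 20
        System.out.println(retriesForSingleSite(0));  // 1
    }
}
```

With the default {{maxRetries}} of 10 this gives 20 retries of a single failing operation before the client gives up.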
[Red Hat JIRA] (ISPN-12597) Disable JGroups error JGRP000006 stacktarce when configuring xsite on OCP and AWS LB
by Dmitry Volodin (Jira)
Dmitry Volodin created ISPN-12597:
-------------------------------------
Summary: Disable JGroups error JGRP000006 stacktarce when configuring xsite on OCP and AWS LB
Key: ISPN-12597
URL: https://issues.redhat.com/browse/ISPN-12597
Project: Infinispan
Issue Type: Enhancement
Components: Cross-Site Replication, OpenShift, Operator
Affects Versions: 11.0.7.Final
Reporter: Dmitry Volodin
Assignee: Pedro Ruivo
When configuring Cross-Site on OCP with the operator and a LoadBalancer on AWS (this probably also occurs on Azure and other cloud platforms), a large number of stack traces appear in the Pod logs:
{code:java}
07:51:13,524 WARN (TcpServer.Acceptor[7900]-2,relay-global,_example-infinispan-1-11053:site1) [org.jgroups.protocols.TCP] JGRP000006: failed accepting connection from peer Socket[addr=/10.131.0.1,port=11781,localport=7900]: java.io.EOFException
java.io.EOFException
	at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
	at org.jgroups.blocks.cs.TcpConnection.readPeerAddress(TcpConnection.java:247)
	at org.jgroups.blocks.cs.TcpConnection.<init>(TcpConnection.java:53)
	at org.jgroups.blocks.cs.TcpServer$Acceptor.handleAccept(TcpServer.java:126)
	at org.jgroups.blocks.cs.TcpServer$Acceptor.run(TcpServer.java:111)
	at java.base/java.lang.Thread.run(Thread.java:834)
{code}
The reason for these logs is that AWS/OCP periodically pings the LoadBalancer with a health-check probe. The same errors do not occur when an internal (Service) cross-site configuration is used.
It would be nice to have an option to disable these traces, probably for the LB configuration only, as customers mistake them for a problem.
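Until such an option exists, one possible workaround is to raise the log level for the JGroups TCP protocol so that JGRP000006 warnings are not printed. This is a hedged sketch assuming the server's log4j2 XML configuration; the logger category is taken from the trace above, and the exact configuration file location depends on the deployment:

{code:xml}
<!-- Sketch: suppress JGRP000006 WARN traces from the JGroups TCP acceptor.
     Note this also hides any other warnings logged by org.jgroups.protocols.TCP. -->
<Logger name="org.jgroups.protocols.TCP" level="ERROR"/>
{code}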