[Red Hat JIRA] (ISPN-12598) Hot Rod java client retries too many times

Wednesday, 23 December 2020

     [ https://issues.redhat.com/browse/ISPN-12598?page=com.atlassian.jira.plugi... ]

Dan Berindei updated ISPN-12598:
--------------------------------
    Description: 
The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many
times to retry an operation after a failure (default: 10).

When the number of retries is exceeded, the client does not fail immediately: instead, it
tries to switch to another site, and tries {{maxRetries}} times on the new site as well.
The client doesn't keep track of the clusters it has already switched away from, so it
seems possible for it to enter an infinite loop, switching from one site to the next.
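
The loop described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Infinispan code: the class {{SiteSwitchSketch}}, its methods, and the site names are hypothetical, every attempt on every site is assumed to fail, and the site list is assumed to be non-empty with distinct names. It contrasts a client that remembers the sites it has left with one that does not.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Infinispan code: every attempt on every site fails.
final class SiteSwitchSketch {

    // Retries performed when the client remembers the sites it has left.
    static int retriesWithTracking(List<String> sites, int maxRetries) {
        Set<String> visited = new HashSet<>();
        int retries = 0;
        int current = 0;
        // Set.add returns false once we come back to a site already tried.
        while (visited.add(sites.get(current))) {
            retries += maxRetries;                   // all retries on this site fail
            current = (current + 1) % sites.size();  // switch to the next site
        }
        return retries;
    }

    // Without tracking, only an artificial cap on switches ends the loop.
    static int retriesWithoutTracking(List<String> sites, int maxRetries, int switchCap) {
        int retries = 0;
        int current = 0;
        for (int switches = 0; switches < switchCap; switches++) {
            retries += maxRetries;                   // all retries on this site fail
            current = (current + 1) % sites.size();  // switch again, with no memory
        }
        return retries;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("site-a", "site-b");
        System.out.println(retriesWithTracking(sites, 10));
        System.out.println(retriesWithoutTracking(sites, 10, 1000));
    }
}
```

In this model the tracked variant stops after one pass over the sites, while the untracked variant stops only because of the {{switchCap}} guard that the sketch adds for the sake of termination.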

If the client cannot switch to another site (e.g. because it was configured with a single
site), it logs a debug message (`Cluster might have completely shut down, try resetting
transport layer and topology id`) and retries on the current site another {{maxRetries}}
times. So the actual number of retries with a single site is {{2 * maxRetries}} (or 1, if
{{maxRetries == 0}}).
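
As a worked example of the arithmetic above, the following sketch encodes only the counts reported in this description. It does not reproduce the client's internal logic, and the class and method names are hypothetical.

```java
// Hypothetical helper that restates the retry counts reported in this issue
// for a client configured with a single site. It is not Infinispan code.
final class RetryCountSketch {

    // Reported behaviour: 2 * maxRetries retries, or 1 when maxRetries is 0.
    static int reportedRetriesSingleSite(int maxRetries) {
        return maxRetries == 0 ? 1 : 2 * maxRetries;
    }

    public static void main(String[] args) {
        for (int maxRetries : new int[] {0, 1, 10}) {
            System.out.println("maxRetries=" + maxRetries
                    + " -> retries=" + reportedRetriesSingleSite(maxRetries));
        }
    }
}
```

With the default of 10, this gives 20 retries on a single site instead of the configured 10.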

Maybe automatic site switching is a good idea in some cases, but I'm not convinced it
should be the default behaviour. At the very least, site switching should be decided at
the remote cache manager level, when the client fails to open a new connection to any
server in the current site, and not based on the number of retries done for any particular
operation.

  was:
The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many
times to retry an operation after a failure (default: 10).

When the number of retries is exceeded, the client does not fail immediately: instead, it
tries to switch to another site, and tries {{maxRetries}} times on the new site as well.
The client doesn't keep track of the clusters it has already switched away from, so it
seems possible for it to enter an infinite loop, switching from one site to the next.

If the client cannot switch to another site (e.g. because it was configured with a single
site), it logs a debug message (`Cluster might have completely shut down, try resetting
transport layer and topology id`) and retries on the current site another {{maxRetries}}
times. So the actual number of retries with a single site is {{2 * maxRetries}}.

Maybe automatic site switching is a good idea in some cases, but I'm not convinced it
should be the default behaviour. At the very least, site switching should be decided at
the remote cache manager level, when the client fails to open a new connection to any
server in the current site, and not based on the number of retries done for any particular
operation.

...
 Hot Rod java client retries too many times
 ------------------------------------------

                 Key: ISPN-12598
                 URL: https://issues.redhat.com/browse/ISPN-12598
             Project: Infinispan
          Issue Type: Bug
          Components: Hot Rod
    Affects Versions: 12.0.0.CR1
            Reporter: Dan Berindei
            Assignee: Dan Berindei
            Priority: Major
             Fix For: 12.0.0.Final

The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many
times to retry an operation after a failure (default: 10).

When the number of retries is exceeded, the client does not fail immediately: instead, it
tries to switch to another site, and tries {{maxRetries}} times on the new site as well.
The client doesn't keep track of the clusters it has already switched away from, so it
seems possible for it to enter an infinite loop, switching from one site to the next.

If the client cannot switch to another site (e.g. because it was configured with a single
site), it logs a debug message (`Cluster might have completely shut down, try resetting
transport layer and topology id`) and retries on the current site another {{maxRetries}}
times. So the actual number of retries with a single site is {{2 * maxRetries}} (or 1, if
{{maxRetries == 0}}).

Maybe automatic site switching is a good idea in some cases, but I'm not convinced it
should be the default behaviour. At the very least, site switching should be decided at
the remote cache manager level, when the client fails to open a new connection to any
server in the current site, and not based on the number of retries done for any particular
operation.

--
This message was sent by Atlassian Jira
(v7.13.8#713008)
