[
https://issues.redhat.com/browse/ISPN-12598?page=com.atlassian.jira.plugi...
]
Dan Berindei updated ISPN-12598:
--------------------------------
Description:
The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many
times to retry an operation after a failure (default: 10).
When the number of retries is exceeded, the client does not fail immediately: instead, it
tries to switch to another site, and tries {{maxRetries}} times on the new site as well.
The client doesn't keep track of the clusters it has already switched away from, so it
seems possible to enter an infinite loop, switching from one site to the next.
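To illustrate how remembering the sites already tried would bound the loop, here is a minimal sketch. It is a hypothetical model, not the Hot Rod client's code: the class {{BoundedFailover}}, the round-robin site order, and the assumption that each site gets one initial attempt plus {{maxRetries}} retries are all mine.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model of site failover; none of these names exist in the Hot Rod client.
final class BoundedFailover {

    /** Outcome of one operation: the site that served it (null if none) and the attempts used. */
    static final class Result {
        final String site;
        final int attempts;

        Result(String site, int attempts) {
            this.site = site;
            this.attempts = attempts;
        }
    }

    /**
     * Tries the sites in round-robin order, but never revisits a site it has
     * already switched away from. Assumes one initial attempt plus maxRetries
     * retries per site.
     */
    static Result execute(List<String> sites, int maxRetries, Predicate<String> attemptSucceeds) {
        Set<String> triedSites = new HashSet<>();
        int attempts = 0;
        int index = 0;
        // Set.add returns false once the round-robin wraps around to a site already tried.
        while (!sites.isEmpty() && triedSites.add(sites.get(index))) {
            String site = sites.get(index);
            for (int i = 0; i <= maxRetries; i++) {
                attempts++;
                if (attemptSucceeds.test(site)) {
                    return new Result(site, attempts);
                }
            }
            // Switch to the next site; without triedSites this would cycle forever.
            index = (index + 1) % sites.size();
        }
        // Every site was tried once, so the caller can fail instead of looping.
        return new Result(null, attempts);
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("site-a", "site-b");
        Result allDown = execute(sites, 2, site -> false);
        System.out.println(allDown.site + " after " + allDown.attempts + " attempts");
    }
}
```

In this model the total number of attempts is bounded by the number of sites times {{maxRetries + 1}}, whatever order the sites are visited in.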
If the client cannot switch to another site (e.g. because it was configured with a single
site), it logs a debug message (`Cluster might have completely shut down, try resetting
transport layer and topology id`) and retries on the current site another {{maxRetries}}
times. So the actual number of retries with a single site is {{2 * maxRetries}} (or 1, if
{{maxRetries == 0}}).
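The single-site arithmetic above can be written as a small function. This only restates the counts given in this description; the class and method names are hypothetical, and rejecting negative values is my own assumption rather than documented client behaviour.

```java
// Hypothetical helper restating the counts in the description; not part of the client.
final class SingleSiteRetryCount {

    /** Retries actually performed against a single site for a configured maxRetries. */
    static int effectiveRetries(int maxRetries) {
        if (maxRetries < 0) {
            // Assumption: negative values are outside the behaviour described here.
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        // After maxRetries retries, the failed site switch triggers another round of retries.
        return maxRetries == 0 ? 1 : 2 * maxRetries;
    }

    public static void main(String[] args) {
        System.out.println(effectiveRetries(10)); // prints 20 (the default configuration)
        System.out.println(effectiveRetries(0));  // prints 1
    }
}
```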
Maybe automatic site switching is a good idea in some cases, but I'm not convinced it
should be the default behaviour. At the very least, site switching should be decided at
the remote cache manager level, when the client fails to open a new connection to any
server in the current site, and not based on the number of retries done for any particular
operation.
was:
The Java Hot Rod client has a {{maxRetries}} configuration option which tells it how many
times to retry an operation after a failure (default: 10).
When the number of retries is exceeded, the client does not fail immediately: instead, it
tries to switch to another site, and tries {{maxRetries}} times on the new site as well.
The client doesn't keep track of the clusters it switched off of, so it seems possible
to go in an infinite loop, switching from one site to the next.
If the client cannot switch to another site (e.g. because it was configured with a single
site), it logs a debug message (`Cluster might have completely shut down, try resetting
transport layer and topology id`) and tries the current site again for {{maxRetries}}
times. So the actual number of retries with a single site is {{2 * maxRetries}}.
Maybe automatic site switching is a good idea in some cases, but I'm not convinced it
should be the default behaviour. At the very least, site switching should be decided at
the remote cache manager level, when the client fails to open a new connection to any
server in the current site, and not based on the number of retries done for any particular
operation.
Hot Rod java client retries too many times
------------------------------------------
Key: ISPN-12598
URL: https://issues.redhat.com/browse/ISPN-12598
Project: Infinispan
Issue Type: Bug
Components: Hot Rod
Affects Versions: 12.0.0.CR1
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Major
Fix For: 12.0.0.Final