[ https://issues.jboss.org/browse/ISPN-3947?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-3947:
------------------------------------
I don't think configuring a number of retries per server is a good idea: it would mean
that the actual timeout increases linearly with the number of servers, just like it does now.
I think a total number of retries and/or a total timeout would be much better.
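A minimal sketch of how a total retry limit combined with a total timeout could work, in Java since the HotRod client is a Java library. `BoundedRetry`, its constructor parameters and the injectable clock are hypothetical names chosen for illustration; they are not settings or classes of the HotRod client. The point is that neither limit depends on the number of servers.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not part of the HotRod client): retries an operation until it
 * succeeds, the total retry limit is used up, or the total time budget has elapsed,
 * whichever comes first. Neither limit depends on the number of servers.
 */
final class BoundedRetry {
    private final int maxTotalRetries;   // hypothetical setting: retries across all servers
    private final long totalTimeout;     // hypothetical setting: overall budget, in clock units
    private final LongSupplier clock;    // e.g. System::nanoTime; injectable for testing

    BoundedRetry(int maxTotalRetries, long totalTimeout, LongSupplier clock) {
        this.maxTotalRetries = maxTotalRetries;
        this.totalTimeout = totalTimeout;
        this.clock = clock;
    }

    <T> T execute(Callable<T> operation) throws Exception {
        long deadline = clock.getAsLong() + totalTimeout;
        Exception lastFailure = null;
        // One initial attempt plus at most maxTotalRetries retries.
        for (int attempt = 0; attempt <= maxTotalRetries; attempt++) {
            // Always make the first attempt; before each retry, stop once the budget is spent.
            // Comparing the difference (not the raw values) stays correct if nanoTime wraps.
            if (attempt > 0 && clock.getAsLong() - deadline >= 0) {
                break;
            }
            try {
                return operation.call();
            } catch (Exception e) {
                lastFailure = e; // a real client would pick the next server here
            }
        }
        throw new Exception("Operation failed within the retry/time budget", lastFailure);
    }
}
```

With `System::nanoTime` as the clock, the budget would be given in nanoseconds. Since the deadline is only checked between attempts, a single attempt can still overrun it by up to one read or connect timeout.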

Note that with the default configuration (i.e. `testOnBorrow == false`), only the first
attempt is made on the primary owner of the key. All the other attempts use a random
server. (Actually it's round-robin, but the state is shared, so for an individual
thread it would look random.) A setting named `retriesPerServer` would make me think
that it's the number of retries on one server before trying another.
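To make the selection order above concrete, here is a small Java sketch under the assumptions stated in the comment: attempt 0 goes to the key's primary owner, and every retry takes the next server from a round-robin counter shared by all threads. `SharedRoundRobin` and its methods are hypothetical names, not the client's real balancer classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the selection order described above; not the client's real classes.
 */
final class SharedRoundRobin {
    private final List<String> servers;
    // One counter for the whole client, advanced by every thread that retries.
    private final AtomicInteger counter = new AtomicInteger();

    SharedRoundRobin(List<String> servers) {
        this.servers = Collections.unmodifiableList(new ArrayList<>(servers));
    }

    /** Next server in round-robin order; floorMod keeps the index valid after int overflow. */
    String nextServer() {
        return servers.get(Math.floorMod(counter.getAndIncrement(), servers.size()));
    }

    /** Attempt 0 goes to the key's primary owner; every retry uses the shared round-robin. */
    String serverFor(int attempt, String primaryOwner) {
        return attempt == 0 ? primaryOwner : nextServer();
    }
}
```

Because other threads advance the same counter between one thread's retries, that thread might get `a` and then `c`, skipping `b`, which is why the sequence looks random from its point of view.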

Also, I haven't tested this, but with `testOnBorrow == true` I think the pool will
catch the timeout in the ping operation and retry `maxActive` times internally before the
HotRod client does its own retrying in `RetryOnFailureOperation`. We can probably ignore
this, since enabling `testOnBorrow` would be bad for performance anyway, but we should
probably document it.
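The nesting described above would multiply the two retry counts. This Java sketch estimates the worst case, taking the comment's untested assumption at face value: with `testOnBorrow == true` each client attempt can cost up to `maxActive` ping timeouts inside the pool, and without it roughly one timeout. `NestedRetryEstimate` and its parameters are hypothetical.

```java
/**
 * Hypothetical worst-case estimate for the nested retries described above. It takes the
 * untested assumption literally: with testOnBorrow each client attempt may hit up to
 * maxActive ping timeouts inside the pool; without it, about one timeout per attempt.
 */
final class NestedRetryEstimate {
    static long worstCaseTimeouts(int clientAttempts, int maxActive, boolean testOnBorrow) {
        long perAttempt = testOnBorrow ? Math.max(1, maxActive) : 1;
        return (long) clientAttempts * perAttempt;
    }

    static long worstCaseMillis(int clientAttempts, int maxActive, boolean testOnBorrow,
                                long timeoutMillis) {
        return worstCaseTimeouts(clientAttempts, maxActive, testOnBorrow) * timeoutMillis;
    }
}
```

For instance, 30 client attempts with `maxActive = 8` would mean up to 240 timeouts instead of 30 (illustrative numbers only).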

HotRod client keeps trying to recover connections to a failed cluster
---------------------------------------------------------------------
Key: ISPN-3947
URL: https://issues.jboss.org/browse/ISPN-3947
Project: Infinispan
Issue Type: Feature Request
Components: Remote Protocols
Affects Versions: 6.0.1.Final, 7.0.0.Alpha1
Reporter: Wolf-Dieter Fink
Assignee: Galder Zamarreño
Labels: hotrod, hotrod-java-client
If an infinispan-server cluster is no longer reachable for some reason, e.g. a network
disconnect, the HotRod client tries to re-establish the lost connections.

The client library retries a fixed number of times, calculated as the maximum number of
connections in the pool (or 10) multiplied by the number of available servers.
This can mean a very long wait before the application can continue and react, as the
client waits for the read or connect timeout on every attempt.
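A rough Java sketch of the retry budget described above and of the resulting worst-case blocking time. It assumes `maxActive` is used when positive and 10 otherwise; the description does not say exactly when the constant 10 applies, so treat this as an approximation of the behaviour described, not the client's actual code. `CurrentRetryBudget` and its methods are hypothetical names.

```java
/**
 * Hypothetical approximation of the retry budget described in ISPN-3947.
 * Assumption: maxActive is used when positive, and 10 otherwise.
 */
final class CurrentRetryBudget {
    static int maxRetries(int maxActive, int servers) {
        int perServer = maxActive > 0 ? maxActive : 10;
        return perServer * servers; // grows linearly with the number of servers
    }

    /** Upper bound on how long a caller can block if every attempt runs into the timeout. */
    static long worstCaseWaitMillis(int maxActive, int servers, long timeoutMillis) {
        return (long) maxRetries(maxActive, servers) * timeoutMillis;
    }
}
```

For example, with `maxActive` unset (so 10 per server), 3 servers and a 60-second timeout (illustrative values, not the client defaults), `worstCaseWaitMillis(-1, 3, 60_000)` gives 1,800,000 ms, i.e. 30 minutes; doubling the number of servers doubles it.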

To improve this behaviour, there should be a configurable limit on retries per server
and/or a total timeout.
This would give the application the chance to handle a remote-cache failure and reply to
the user instead of hanging for minutes (with the default settings).
--
This message is automatically generated by JIRA.