[JBoss JIRA] (ISPN-3947) HotRod client keeps trying to recover connections to a failed cluster
by Wolf-Dieter Fink (JIRA)
Wolf-Dieter Fink created ISPN-3947:
--------------------------------------
Summary: HotRod client keeps trying to recover connections to a failed cluster
Key: ISPN-3947
URL: https://issues.jboss.org/browse/ISPN-3947
Project: Infinispan
Issue Type: Feature Request
Components: Remote Protocols
Affects Versions: 6.0.1.Final, 7.0.0.Alpha1
Reporter: Wolf-Dieter Fink
Assignee: Galder Zamarreño
If an Infinispan server cluster is no longer reachable for some reason, e.g. a network disconnect, the HotRod client tries to re-establish the lost connections.
The client library retries using a fixed calculation: the maximum number of connections in the pool (or 10) multiplied by the number of available servers.
This can lead to a very long delay before the application can continue and react, because each attempt waits for the read or connect timeout.
To improve this behaviour, there should be a configurable limit on retries per server and/or an overall timeout.
This would give the application the chance to handle a remote cache failure and reply to the user, instead of hanging for minutes with the default settings.
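Until such a configurable limit exists, one possible application-side workaround is to bound the total time spent waiting, for example by wrapping remote-cache calls in a Future with a deadline. The sketch below uses only java.util.concurrent; the cache handle and the 5-second budget are illustrative assumptions, not Infinispan API.
{noformat}
// Hedged workaround sketch: bound the *total* time an application waits on a
// HotRod operation, independent of how many reconnect attempts the client
// library makes internally. Uses only java.util.concurrent; the remoteCache
// handle and the 5-second budget are illustrative assumptions.
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCacheAccess {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final Map<String, String> remoteCache; // e.g. a RemoteCache<String, String>

    public BoundedCacheAccess(Map<String, String> remoteCache) {
        this.remoteCache = remoteCache;
    }

    /** Returns the value, or null if the cluster did not answer within 5 seconds. */
    public String getWithDeadline(String key) throws Exception {
        Future<String> result = executor.submit(() -> remoteCache.get(key));
        try {
            return result.get(5, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // give up instead of hanging for minutes
            return null;         // the caller can now report the failure to the user
        }
    }
}
{noformat}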
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-3760) DistSyncL1FuncTest.testEntryInL1ReplaceWithConcurrentPut random failures
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-3760?page=com.atlassian.jira.plugin.... ]
Work on ISPN-3760 started by William Burns.
> DistSyncL1FuncTest.testEntryInL1ReplaceWithConcurrentPut random failures
> ------------------------------------------------------------------------
>
> Key: ISPN-3760
> URL: https://issues.jboss.org/browse/ISPN-3760
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 6.0.0.Final
> Reporter: Dan Berindei
> Assignee: William Burns
> Labels: testsuite_stability
> Fix For: 7.0.0.Final
>
>
> The test does a {{put(k, v)}} from an owner, then a {{get(k)}} from a non-owner, and assumes that {{k}} is now in the non-owner's L1 cache.
> But a previous test, {{testEntryInL1ReplaceWithConcurrentInvalidation}}, added the non-owner as an L1 requestor on the backup owner, which will try to invalidate the key on the non-owner. Because the L1 invalidation is asynchronous, it can reach the non-owner after the get was issued and invalidate the L1 entry:
> {noformat}
> 12:23:57,332 TRACE (testng-DistSyncL1FuncTest:dist) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=first-put, flags=null, putIfAbsent=false, valueMatchingPolicy=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@5caff826]
> 12:23:57,332 TRACE (testng-DistSyncL1FuncTest:dist) [JGroupsTransport] dests=[NodeD-17616, NodeC-12131], command=SingleRpcCommand{cacheName='dist', command=PutKeyValueCommand{key=key-to-the-cache, value=first-put, flags=null, putIfAbsent=false, valueMatchingPolicy=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}, mode=SYNCHRONOUS, timeout=60000
> 12:23:57,334 TRACE (asyncTransportThread-0,NodeC:) [JGroupsTransport] dests=[NodeA-23135], command=SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, forRehash=false, origin=NodeD-17616}}, mode=SYNCHRONOUS, timeout=60000
> 12:23:57,339 TRACE (remote-thread-0,NodeC:dist) [DefaultDataContainer] Store ImmortalCacheEntry{key=key-to-the-cache, value=first-put} in container
> 12:23:57,339 TRACE (testng-DistSyncL1FuncTest:dist) [DefaultDataContainer] Store ImmortalCacheEntry{key=key-to-the-cache, value=first-put} in container
> 12:23:57,339 TRACE (testng-DistSyncL1FuncTest:dist) [L1ManagerImpl] Invalidating keys [key-to-the-cache] on nodes [NodeA-23135]. Use multicast? false
> 12:23:57,340 TRACE (testng-DistSyncL1FuncTest:dist) [InvocationContextInterceptor] Invoked with command GetKeyValueCommand {key=key-to-the-cache, flags=null} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@359e1aed]
> 12:23:57,340 TRACE (asyncTransportThread-4,NodeD:) [JGroupsTransport] dests=[NodeA-23135], command=SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, forRehash=false, origin=null}}, mode=SYNCHRONOUS, timeout=60000
> 12:23:57,340 TRACE (testng-DistSyncL1FuncTest:dist) [JGroupsTransport] dests=[NodeD-17616, NodeC-12131], command=ClusteredGetCommand{key=key-to-the-cache, flags=null}, mode=WAIT_FOR_VALID_RESPONSE, timeout=60000
> 12:23:57,340 TRACE (remote-thread-1,NodeA:dist) [L1NonTxInterceptor] Aborted possible L1 update due to concurrent invalidation for key key-to-the-cache
> 12:23:57,341 ERROR (testng-DistSyncL1FuncTest:) [UnitTestTestNGListener] Test testEntryInL1ReplaceWithConcurrentPut(org.infinispan.distribution.DistSyncL1FuncTest) failed.
> java.lang.AssertionError: Entry for key [key-to-the-cache] should be in L1 on cache at [NodeA-23135]!
> at org.infinispan.distribution.DistributionTestHelper.assertIsInL1(DistributionTestHelper.java:31)
> at org.infinispan.distribution.BaseDistFunctionalTest.assertIsInL1(BaseDistFunctionalTest.java:183)
> at org.infinispan.distribution.DistSyncL1FuncTest.testEntryInL1ReplaceWithConcurrentPut(DistSyncL1FuncTest.java:180)
> {noformat}
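One general way to make such a test robust against a stray asynchronous invalidation is to retry the read-and-assert step for a bounded time instead of asserting once. The sketch below is illustrative only, not the fix applied for this issue; isInL1 stands in for the existing DistributionTestHelper check.
{noformat}
// Hedged sketch: instead of asserting L1 presence immediately after the get,
// retry for a bounded time so a late asynchronous InvalidateL1Command from the
// previous test cannot race the assertion. Plain Java; the isInL1(...) check
// referenced in the usage comment is an assumption here.
import java.util.function.BooleanSupplier;

public final class EventuallyAssert {

    /** Polls the condition every 50 ms for up to timeoutMillis before failing. */
    public static void assertEventually(BooleanSupplier condition, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return;
            }
            Thread.sleep(50);
        }
        throw new AssertionError("Condition not satisfied within " + timeoutMillis + " ms");
    }
}

// Usage in a test (illustrative): re-issue the read and re-check L1 so a stray
// asynchronous invalidation from an earlier test is eventually absorbed.
//   assertEventually(() -> {
//       nonOwnerCache.get(key);
//       return isInL1(nonOwnerCache, key);
//   }, 10_000);
{noformat}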
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-3760) DistSyncL1FuncTest.testEntryInL1ReplaceWithConcurrentPut random failures
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-3760?page=com.atlassian.jira.plugin.... ]
William Burns reassigned ISPN-3760:
-----------------------------------
Assignee: William Burns (was: Mircea Markus)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-3876) TcpTransportFactory stores failed SocketAddress in RequestBalancingStrategy
by Patrick Seeber (JIRA)
[ https://issues.jboss.org/browse/ISPN-3876?page=com.atlassian.jira.plugin.... ]
Patrick Seeber edited comment on ISPN-3876 at 1/28/14 9:15 AM:
---------------------------------------------------------------
Thank you for your answer!
Indeed it is not self-healing, and we believe we have found the problem.
The scenario is:
1. Start 2 servers in replicated mode
2. Start a client which connects to both servers correctly
3. Shut down Server 1 during maintenance
4. Perform a client operation; the new topology is now committed to the client, and the balancer knows only Server 2
5. Bring up Server 1 again and shut down Server 2 during maintenance
6. Bring up Server 2 again
Now the client is broken, since the last server it knows about (Server 2) won't send any topology changes to the client when the client performs operations. There is no way for the client balancer to learn about both servers again, which is problematic if communication with the only known server fails.
To fix this, we would be forced to perform a client operation between steps 5 and 6, but we have 10 clients and we do not want to trigger a getCache 10 times every time we shut down a server.
Is there any way to ping the servers from the client at a specific interval, or to directly inform the client when the topology changes?
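As a possible workaround for the interval question above, each client could schedule a cheap periodic read so that topology updates piggybacked on HotRod responses are picked up automatically. This is only a sketch using java.util.concurrent; the cache handle, key, and 30-second period are illustrative assumptions, not a recommendation from the Infinispan team.
{noformat}
// Hedged workaround sketch: periodically issue a cheap operation from each
// client so that topology changes piggybacked on HotRod responses are picked
// up without manual intervention. java.util.concurrent only; the cache handle
// and the 30-second period are illustrative assumptions.
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TopologyKeepAlive {

    public static ScheduledExecutorService start(Map<String, String> remoteCache) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                remoteCache.get("__topology_keepalive__"); // harmless miss; refreshes topology
            } catch (RuntimeException e) {
                // Server unreachable right now; the next tick will try again.
            }
        }, 30, 30, TimeUnit.SECONDS);
        return scheduler;
    }
}
{noformat}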
> TcpTransportFactory stores failed SocketAddress in RequestBalancingStrategy
> ---------------------------------------------------------------------------
>
> Key: ISPN-3876
> URL: https://issues.jboss.org/browse/ISPN-3876
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols
> Affects Versions: 5.2.1.Final, 5.3.0.Final, 6.0.0.Final
> Environment: Hotrod Client, Java
> Reporter: Patrick Seeber
> Assignee: Mircea Markus
>
> The "updateServers" Method in the TcpTransportFactory class iterates over all addedServers and adds them to the connection pool if no exceptions are thrown. Howerver, if an exception is thrown, the SocketAddress may not have been added to the conection pool but is added to the balancer afterwards. Therefore, the balancer may contain an invalid SocketAddress which is not contained in the connection pool.
> In our application with few distributed caches, we encounter situations where all servers (SocketAddresses) are corrupt and the application fails to load or store entries in/from the cache.
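A sketch of the bookkeeping the report describes: only hand an address to the balancer after it has been successfully registered in the connection pool, so the two structures cannot diverge. The ConnectionPool and Balancer interfaces and the preparePool method below are illustrative stand-ins, not the actual TcpTransportFactory internals.
{noformat}
// Hedged sketch of the invariant described above: the balancer should only
// ever see addresses that were successfully registered in the connection
// pool. The ConnectionPool/Balancer interfaces and preparePool(...) are
// illustrative stand-ins, not the real TcpTransportFactory code.
import java.net.SocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SafeServerUpdate {

    interface ConnectionPool { void preparePool(SocketAddress server) throws Exception; }
    interface Balancer { void setServers(Collection<SocketAddress> servers); }

    /** Adds each new server to the pool first; only successes reach the balancer. */
    public static void updateServers(Collection<SocketAddress> addedServers,
                                     Collection<SocketAddress> currentServers,
                                     ConnectionPool pool, Balancer balancer) {
        List<SocketAddress> healthy = new ArrayList<>(currentServers);
        for (SocketAddress server : addedServers) {
            try {
                pool.preparePool(server);   // may throw if the server is unreachable
                healthy.add(server);        // only reached on success
            } catch (Exception e) {
                // Skip the failed address so the balancer never returns a server
                // that has no backing connections in the pool.
            }
        }
        balancer.setServers(healthy);
    }
}
{noformat}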
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira