[JBoss JIRA] (ISPN-3876) TcpTransportFactory stores failed SocketAddress in RequestBalancingStrategy
by Patrick Seeber (JIRA)
[ https://issues.jboss.org/browse/ISPN-3876?page=com.atlassian.jira.plugin.... ]
Patrick Seeber edited comment on ISPN-3876 at 1/28/14 9:15 AM:
---------------------------------------------------------------
Thank you for your answer!
Indeed, it is not self-healing, and we believe we have found the problem.
The scenario is:
1. Start 2 servers in replicated mode
2. Start a client, which connects to both servers correctly
3. Shut down Server 1 for maintenance
4. Perform a client operation; the new topology is now committed to the client, and the balancer now knows only Server 2
5. Bring up Server 1 again and shut down Server 2 for maintenance
6. Bring up Server 2 again
Now the client is broken, since the last server it knows (Server 2) won't send any topology changes to the client when the client performs operations. There is no chance for the client balancer to learn about both servers again, which is problematic if communication with the only known server fails.
To fix this, we would be forced to perform a client operation between steps 5 and 6, but we have 10 clients and we do not want to trigger a getCache 10 times each time we shut down a server.
Is there any way to ping the servers from the client at a specific interval, or to directly inform the client when the topology changes?
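One client-side workaround for the question above could look like the following minimal sketch. It assumes, as described in step 4, that any client operation lets the server deliver the current topology. PeriodicTopologyRefresher and the lightweightOperation parameter are hypothetical names for illustration and are not part of the Hot Rod client API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the Hot Rod client API.
final class PeriodicTopologyRefresher implements AutoCloseable {
    // Single daemon thread so the refresher never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "topology-refresher");
                t.setDaemon(true);
                return t;
            });

    // lightweightOperation stands for any cheap client call (assumption);
    // it is run immediately and then again after each period.
    ScheduledFuture<?> start(Runnable lightweightOperation, long period, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(() -> {
            try {
                lightweightOperation.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all further runs,
                // so a failed refresh is only logged.
                System.err.println("Topology refresh failed: " + e);
            }
        }, 0, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Each of the 10 clients would start one such refresher with its own cheap operation, so that no manual getCache is needed between maintenance steps. Whether this is acceptable depends on the load the extra operations put on the servers.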
> TcpTransportFactory stores failed SocketAddress in RequestBalancingStrategy
> ---------------------------------------------------------------------------
>
> Key: ISPN-3876
> URL: https://issues.jboss.org/browse/ISPN-3876
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols
> Affects Versions: 5.2.1.Final, 5.3.0.Final, 6.0.0.Final
> Environment: Hotrod Client, Java
> Reporter: Patrick Seeber
> Assignee: Mircea Markus
>
> The "updateServers" method in the TcpTransportFactory class iterates over all addedServers and adds them to the connection pool if no exceptions are thrown. However, if an exception is thrown, the SocketAddress may not have been added to the connection pool but is still added to the balancer afterwards. Therefore, the balancer may contain an invalid SocketAddress which is not contained in the connection pool.
> In our application with a few distributed caches, we encounter situations where all servers (SocketAddresses) are corrupt and the application fails to load or store entries in/from the cache.
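The mismatch described in the report could be avoided by passing the balancer only those addresses that the pool accepted. The following is a minimal sketch of that idea, not the actual TcpTransportFactory code. ConnectionPool, Balancer, and the updateServers signature here are hypothetical stand-ins.

```java
import java.net.SocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UpdateServersSketch {
    // Hypothetical stand-ins for the connection pool and the balancer.
    interface ConnectionPool { void addServer(SocketAddress server) throws Exception; }
    interface Balancer { void setServers(Collection<SocketAddress> servers); }

    /**
     * newServers is the full server list from the topology update;
     * addedServers is the subset not yet in the pool.
     * Returns the servers that were actually handed to the balancer.
     */
    static List<SocketAddress> updateServers(Collection<SocketAddress> newServers,
                                             Collection<SocketAddress> addedServers,
                                             ConnectionPool pool,
                                             Balancer balancer) {
        Set<SocketAddress> failed = new HashSet<>();
        for (SocketAddress server : addedServers) {
            try {
                pool.addServer(server);
            } catch (Exception e) {
                // Remember the failure instead of silently continuing.
                failed.add(server);
                System.err.println("Could not add " + server + " to the pool: " + e);
            }
        }
        // The balancer only sees addresses that the pool also knows.
        List<SocketAddress> usable = new ArrayList<>(newServers);
        usable.removeAll(failed);
        balancer.setServers(usable);
        return usable;
    }
}
```

With this shape the balancer and the pool cannot disagree about a newly added server. A failed server is simply left out until a later topology update adds it again.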
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 10 months
[JBoss JIRA] (ISPN-3838) L1 entry added by ST when already invalidated
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-3838?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-3838:
-------------------------------------
The original issue could be fixed by repurposing updatedKeys to be non-null after the new CH is installed, in order to detect writes that occur before the requestors are updated during the L1OnRehash operation.
This, however, still leaves a gap: a concurrent write may have passed the L1Interceptor (thus missing the requestors check) but not yet been committed (and so not yet been added to updatedKeys). In this case the owner would have registered a requestor that now points to the old value, instead of invalidating it.
> L1 entry added by ST when already invalidated
> ---------------------------------------------
>
> Key: ISPN-3838
> URL: https://issues.jboss.org/browse/ISPN-3838
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 6.0.0.Final
> Reporter: Radim Vansa
> Assignee: William Burns
> Priority: Critical
> Labels: 620
>
> Non-transactional cache with L1 enabled. Node A is losing ownership of an entry, the entry is not removed during ST but is going to L1.
> 1. ST builds the invalidation command, EntryWrapping interceptor starts committing all the entries
> 2. Write on primary owner (B) occurs
> 3. A gets the InvalidateL1Command, removes the ImmortalCacheEntry from data container (as it does not own the entry anymore)
> 4. The ST invalidation command commits the MortalCacheEntry with old value, storing it into the data container.
> Result: Outdated value is in L1 cache.
> As the entry is not locked during the ST, it can be committed as MortalCacheEntry only if it was not changed (removed and possibly then cached again with different value).
> (I understand that this wouldn't be easy to implement as the check is not to be executed in perform, but during the actual commit - and atomically in the container.)
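The "commit only if unchanged, atomically in the container" idea from the description can be illustrated with a plain ConcurrentMap. This is a minimal sketch under simplifying assumptions, not Infinispan's data container. The Entry class and all method names are hypothetical, and entries are compared by identity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConditionalL1CommitSketch {
    // Hypothetical entry type; equals() is identity, so any rewrite of the
    // key produces an entry that no longer matches an earlier snapshot.
    static final class Entry {
        final String value;
        final boolean mortal;
        Entry(String value, boolean mortal) {
            this.value = value;
            this.mortal = mortal;
        }
    }

    private final ConcurrentMap<String, Entry> container = new ConcurrentHashMap<>();

    void put(String key, String value) { container.put(key, new Entry(value, false)); }
    void invalidate(String key) { container.remove(key); }
    Entry get(String key) { return container.get(key); }

    /**
     * Converts the entry to a mortal (L1) entry only if the container still
     * holds the snapshot taken when the invalidation command was built.
     * The check and the write happen in one atomic replace.
     */
    boolean commitAsL1(String key, Entry snapshot) {
        if (snapshot == null) {
            return false;
        }
        return container.replace(key, snapshot, new Entry(snapshot.value, true));
    }
}
```

In the sequence from the report, the snapshot is taken in step 1 and the entry is removed in step 3, so the conditional replace in step 4 fails and the outdated value is not stored.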
11 years, 10 months
[JBoss JIRA] (ISPN-1523) Remote nodes send duplicate invalidation messages
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-1523?page=com.atlassian.jira.plugin.... ]
William Burns resolved ISPN-1523.
---------------------------------
Resolution: Rejected
L1 currently needs invalidations from owners to guarantee consistency or else L1 could possibly get an outdated value if updates are interleaved between owners.
> Remote nodes send duplicate invalidation messages
> -------------------------------------------------
>
> Key: ISPN-1523
> URL: https://issues.jboss.org/browse/ISPN-1523
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 5.1.0.BETA4
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Minor
>
> I thought only the originator should send invalidation messages, but I'm seeing these messages in the log:
> {noformat}
> 2011-11-11 11:10:27,608 TRACE (OOB-2,Infinispan-Cluster,NodeD-8993) [org.infinispan.interceptors.DistributionInterceptor] Put occuring on node, requesting cache invalidation for keys [k1]. Origin of command is remote
> 2011-11-11 11:10:27,608 TRACE (OOB-3,Infinispan-Cluster,NodeA-31187) [org.infinispan.interceptors.DistributionInterceptor] Put occuring on node, requesting cache invalidation for keys [k1]. Origin of command is remote
> 2011-11-11 11:10:27,608 TRACE (OOB-2,Infinispan-Cluster,NodeD-8993) [org.infinispan.distribution.L1ManagerImpl] Invalidating L1 caches for keys [k1]
> 2011-11-11 11:10:27,608 TRACE (OOB-3,Infinispan-Cluster,NodeA-31187) [org.infinispan.distribution.L1ManagerImpl] Invalidating L1 caches for keys [k1]
> {noformat}
11 years, 10 months
[JBoss JIRA] (ISPN-3944) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1GetWithConcurrentReplace random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3944?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-3944:
-------------------------------
Attachment: dsl1rrft.log.gz
> DistSyncL1RepeatableReadFuncTest.testNoEntryInL1GetWithConcurrentReplace random failures
> ----------------------------------------------------------------------------------------
>
> Key: ISPN-3944
> URL: https://issues.jboss.org/browse/ISPN-3944
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 6.0.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.0.0.Final
>
> Attachments: dsl1rrft.log.gz
>
>
> Random failure in DistSyncL1RepeatableReadFuncTest>DistSyncL1FuncTest.testNoEntryInL1GetWithConcurrentReplace:
> {noformat}
> 00:23:34,658 ERROR (testng-DistSyncL1RepeatableReadFuncTest:) [UnitTestTestNGListener] Test testNoEntryInL1GetWithConcurrentReplace(org.infinispan.distribution.DistSyncL1RepeatableReadFuncTest) failed.
> java.lang.AssertionError: Entry for key [key-to-the-cache] should be in L1 on cache at [NodeA-57647]!
> at org.infinispan.distribution.DistributionTestHelper.assertIsInL1(DistributionTestHelper.java:31)
> at org.infinispan.distribution.BaseDistFunctionalTest.assertIsInL1(BaseDistFunctionalTest.java:183)
> at org.infinispan.distribution.DistSyncL1FuncTest.testNoEntryInL1GetWithConcurrentReplace(DistSyncL1FuncTest.java:193)
> {noformat}
11 years, 11 months
[JBoss JIRA] (ISPN-3944) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1GetWithConcurrentReplace random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3944?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-3944:
-------------------------------
Labels: testsuite_stability (was: )
11 years, 11 months
[JBoss JIRA] (ISPN-3944) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1GetWithConcurrentReplace random failures
by Dan Berindei (JIRA)
Dan Berindei created ISPN-3944:
----------------------------------
Summary: DistSyncL1RepeatableReadFuncTest.testNoEntryInL1GetWithConcurrentReplace random failures
Key: ISPN-3944
URL: https://issues.jboss.org/browse/ISPN-3944
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Core
Affects Versions: 6.0.1.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 7.0.0.Final
Attachments: dsl1rrft.log.gz
Random failure in DistSyncL1RepeatableReadFuncTest>DistSyncL1FuncTest.testNoEntryInL1GetWithConcurrentReplace:
{noformat}
00:23:34,658 ERROR (testng-DistSyncL1RepeatableReadFuncTest:) [UnitTestTestNGListener] Test testNoEntryInL1GetWithConcurrentReplace(org.infinispan.distribution.DistSyncL1RepeatableReadFuncTest) failed.
java.lang.AssertionError: Entry for key [key-to-the-cache] should be in L1 on cache at [NodeA-57647]!
at org.infinispan.distribution.DistributionTestHelper.assertIsInL1(DistributionTestHelper.java:31)
at org.infinispan.distribution.BaseDistFunctionalTest.assertIsInL1(BaseDistFunctionalTest.java:183)
at org.infinispan.distribution.DistSyncL1FuncTest.testNoEntryInL1GetWithConcurrentReplace(DistSyncL1FuncTest.java:193)
{noformat}
11 years, 11 months