[infinispan-issues] [JBoss JIRA] (ISPN-7801) RehashWithL1Test.testPutWithRehashAndCacheClear random failures
Dan Berindei (JIRA)
issues at jboss.org
Mon May 8 01:53:00 EDT 2017
[ https://issues.jboss.org/browse/ISPN-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402881#comment-13402881 ]
Dan Berindei commented on ISPN-7801:
------------------------------------
I think the test is correct after all: the L1 entries on the old nodes should *not* be visible after the joiner became the only owner of all the keys. Writes on the joiner will not send any L1 invalidations, because its {{L1ManagerImpl}} doesn't have any requestors, so the other nodes could see stale values.
So we need to add another requirement for the rebalance/state transfer process: L1 entries should not be visible after the node they were requested from is no longer an owner in the write CH.
[~rvansa] I think the simplest way to do this would be to invalidate L1 entries at the beginning of the {{READ_NEW_WRITE_ALL}} phase. We could either split {{StateConsumerImpl.removeStaleData()}} into two parts, so that invalidation of regular entries still happens after the end of rebalance, or we could invalidate everything during {{READ_NEW_WRITE_ALL}}, but that would require additional logic to skip writing new values to the data container/stores.
> RehashWithL1Test.testPutWithRehashAndCacheClear random failures
> ---------------------------------------------------------------
>
> Key: ISPN-7801
> URL: https://issues.jboss.org/browse/ISPN-7801
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.0.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.1.0.Final
>
>
> The test kills the only owner of a key and checks that when a node starts owning an L1 entry, it doesn't send it to other nodes during state transfer. Then it adds a new node (owning the key) and checks that the key isn't transferred to the new node, and it's deleted from L1 on the old nodes. The problem is that it doesn't wait, it assumes all the nodes have already removed it by the time {{getCache()}} returns on the joiner.
> {noformat}
> 03:24:27,606 TRACE (jgroups-5,Test-NodeB-54331:[]) [L1WriteSynchronizer] Caching remotely retrieved entry for key k0 in L1
> 03:24:27,607 TRACE (jgroups-5,Test-NodeB-54331:[]) [DefaultDataContainer] Store MortalCacheEntry{key=k0, value=some data} in container
> 03:24:26,754 DEBUG (testng-Test:[]) [Test] Populating L1 on Test-NodeA-2588
> 03:24:27,514 DEBUG (testng-Test:[]) [Test] Populating L1 on Test-NodeB-54331
> 03:24:27,777 DEBUG (testng-Test:[]) [Test] Populating L1 on Test-NodeC-65326
> 03:24:27,777 DEBUG (testng-Test:[]) [Test] Killing node Test-NodeC-65326
> 03:24:27,781 TRACE (transport-thread-Test-NodeA-p51-t2:[Topology-___defaultcache]) [DefaultDataContainer] Removed MortalCacheEntry{key=k0, value=some data} from container
> *** The entry is not removed from NodeB at this point
> 03:24:27,936 DEBUG (testng-Test:[]) [Test] Checking values on Test-NodeA-2588
> 03:24:27,998 TRACE (jgroups-5,Test-NodeB-54331:[]) [CommandAwareRpcDispatcher] About to send back response SuccessfulResponse{responseValue=MortalCacheValue{value=some data, lifespan=600000, created=1493943867607}} for command ClusteredGetCommand{key=k0, flags=[]}
> 03:24:28,034 TRACE (jgroups-7,Test-NodeA-2588:[]) [L1WriteSynchronizer] Caching remotely retrieved entry for key k0 in L1
> 03:24:28,044 TRACE (jgroups-7,Test-NodeA-2588:[]) [DefaultDataContainer] Store MortalCacheEntry{key=k0, value=some data} in container
> 03:24:28,519 DEBUG (testng-Test:[]) [Test] Checking values on Test-NodeB-54331
> 03:24:28,595 DEBUG (testng-Test:[]) [Test] Starting a new joiner
> 03:24:30,261 TRACE (transport-thread-Test-NodeA-p51-t6:[Topology-___defaultcache]) [InvocationContextInterceptor] Invoked with command InvalidateCommand{keys=[k0, k1, k2, k3, k4, k5, k6, k7, k8, k9]} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext at 54c5cc1d]
> 03:24:30,292 DEBUG (testng-Test:[]) [Test] Checking values on Test-NodeA-2588
> 03:24:30,355 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.RehashWithL1Test.testPutWithRehashAndCacheClear
> java.lang.AssertionError: wrong value for k0
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.8.8.jar:?]
> at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.8.8.jar:?]
> at org.testng.AssertJUnit.assertNull(AssertJUnit.java:282) ~[testng-6.8.8.jar:?]
> at org.infinispan.distribution.rehash.RehashWithL1Test.testPutWithRehashAndCacheClear(RehashWithL1Test.java:78) ~[test-classes/:?]
> *** Too late
> 03:24:30,360 TRACE (transport-thread-Test-NodeA-p51-t6:[Topology-___defaultcache]) [DefaultDataContainer] Removed MortalCacheEntry{key=k0, value=some data} from container
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
More information about the infinispan-issues
mailing list