[JBoss JIRA] (ISPN-2537) LockCleanupStateTransferTest fails randomly
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2537?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2537:
--------------------------------
Fix Version/s: 5.3.0.Final
(was: 5.2.0.Final)
> LockCleanupStateTransferTest fails randomly
> -------------------------------------------
>
> Key: ISPN-2537
> URL: https://issues.jboss.org/browse/ISPN-2537
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 5.2.0.Beta4
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 5.3.0.Final
>
>
> LockCleanupStateTransferTest assumes that once the tx is done on one remote node it's done on all of them (and on the originator as well).
> It sometimes fails with this stack trace:
> {noformat}
> java.lang.AssertionError: For cache 0 expected:<0> but was:<1>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.infinispan.tx.LockCleanupStateTransferTest.testLockReleasedCorrectly(LockCleanupStateTransferTest.java:164)
> at org.infinispan.tx.LockCleanupStateTransferTest.testBelatedCommit(LockCleanupStateTransferTest.java:76)
> {noformat}
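The assertion in the trace compares a per-cache count against 0, and the description suggests the check can run before every node has finished the transaction. One way a test can tolerate that is to poll until the value settles or a timeout expires. The following is a minimal Java sketch of that idea only; `EventuallyAssert`, `eventuallyEquals`, and the `pending` counter are hypothetical names for illustration, not part of LockCleanupStateTransferTest or the Infinispan test API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class EventuallyAssert {

    /**
     * Polls {@code actual} until it returns {@code expected} or the timeout
     * elapses, then fails with a JUnit-style message if it never matched.
     */
    public static void eventuallyEquals(String message, int expected, IntSupplier actual,
                                        long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        int last = actual.getAsInt();
        while (last != expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(10); // back off briefly between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError(message + " interrupted while waiting");
            }
            last = actual.getAsInt();
        }
        if (last != expected) {
            throw new AssertionError(message + " expected:<" + expected + "> but was:<" + last + ">");
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for whatever per-cache count the test checks.
        AtomicInteger pending = new AtomicInteger(1);
        Thread finisher = new Thread(() -> {
            try {
                Thread.sleep(50); // the "other node" finishes a little later
            } catch (InterruptedException ignored) {
                // nothing to clean up in this demo
            }
            pending.set(0);
        });
        finisher.start();
        eventuallyEquals("For cache 0", 0, pending::get, 5, TimeUnit.SECONDS);
        System.out.println("count settled at " + pending.get());
    }
}
```

The timeout bounds how long the test waits, so a real leak still fails with the same kind of message as an immediate assertion.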
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 10 months
[JBoss JIRA] (ISPN-2503) Re-enable FD_SOCK in the test suite
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2503?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2503:
--------------------------------
Fix Version/s: 5.3.0.Final
(was: 5.2.0.Final)
> Re-enable FD_SOCK in the test suite
> -----------------------------------
>
> Key: ISPN-2503
> URL: https://issues.jboss.org/browse/ISPN-2503
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 5.2.0.Beta3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 5.3.0.Final
>
>
> Some tests fail randomly with a timeout waiting for a new view after stopping the coordinator:
> {noformat}
> 01:08:11,695 ERROR (testng-CacheClusterJoinTest:) [UnitTestTestNGListener] Test testIsCoordinator(org.infinispan.api.CacheClusterJoinTest) failed.
> java.lang.RuntimeException: Timed out before caches had complete views. Expected 1 members in each view. Views are as follows: [[NodeC-27739, NodeD-5092]]
> at org.infinispan.test.TestingUtil.viewsTimedOut(TestingUtil.java:249)
> at org.infinispan.test.TestingUtil.blockUntilViewsReceived(TestingUtil.java:311)
> at org.infinispan.api.CacheClusterJoinTest.testIsCoordinator(CacheClusterJoinTest.java:87)
> {noformat}
> This happens because the old coordinator tries to install a new view that excludes itself before stopping, but fails:
> {noformat}
> 01:07:21,616 WARN (ViewHandler,ISPN,NodeC-27739:) [GMS] NodeC-27739: failed to collect all ACKs (expected=1) for view [NodeD-5092|2] after 2000ms, missing ACKs from [NodeD-5092]
> {noformat}
> The survivor never received the view installation message, so it didn't install view {{[NodeD-5092|2]}}. Because it had no failure detection, it could not tell that the current coordinator was dead, so it never installed a new view on its own.
> It's not clear why the survivor didn't receive the view message at all in the test suite, but since this can clearly happen, we should re-enable FD_SOCK in the test suite.
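To illustrate why socket-based failure detection helps here: a member that keeps a TCP connection open to a peer learns that the peer is gone as soon as that connection closes, even if it never receives a view message. The sketch below shows only that general mechanism using plain `java.net` sockets on the loopback interface. It is not JGroups code and does not reproduce FD_SOCK's actual protocol; `SocketFailureDetectionSketch`, `peerClosed`, and `demo` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketFailureDetectionSketch {

    /**
     * Returns true if the monitored peer closed (or reset) the connection,
     * false if nothing happened before the socket's read timeout.
     */
    public static boolean peerClosed(Socket toPeer) {
        try {
            return toPeer.getInputStream().read() == -1; // -1 signals end of stream
        } catch (SocketTimeoutException e) {
            return false; // peer still connected, just silent
        } catch (IOException e) {
            return true; // a connection reset also means the peer is gone
        }
    }

    /** Simulates a coordinator that stops while a survivor is monitoring it. */
    public static boolean demo() {
        try (ServerSocket coordinator = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket survivor = new Socket(coordinator.getInetAddress(), coordinator.getLocalPort())) {
            survivor.setSoTimeout(2000); // do not block forever in the demo
            Socket accepted = coordinator.accept();
            accepted.close(); // the coordinator goes away without announcing a new view
            return peerClosed(survivor);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("coordinator suspected: " + demo());
    }
}
```

In the scenario from the log, a survivor with this kind of detection would have a reason to suspect the old coordinator even though the view installation message never arrived.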
11 years, 10 months
[JBoss JIRA] (ISPN-2688) BaseDistributionInterceptor and TxDistributionInterceptor do not detect properly if the key needs to be fetched remotely
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2688?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2688:
--------------------------------
Fix Version/s: 5.3.0.Final
(was: 5.2.0.Final)
> BaseDistributionInterceptor and TxDistributionInterceptor do not detect properly if the key needs to be fetched remotely
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2688
> URL: https://issues.jboss.org/browse/ISPN-2688
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta6
> Reporter: Dan Berindei
> Assignee: Mircea Markus
> Priority: Critical
> Fix For: 5.3.0.Final
>
> Attachments: stlot.log.gz
>
>
> BaseDistributionInterceptor.shouldFetchFromRemote() and TxDistributionInterceptor.remoteGetAndStoreInL1() can mistakenly decide not to fetch remotely because they check whether the key is present in the data container. The key may be there _now_, but if state transfer was in progress for this key, it may not have been there when the command executed locally. In that case the key should be re-fetched instead of returning the null result.
> This makes StateTransferLargeObjectTest.testForFailure fail randomly.
> The failure appears because state transfer has not finished, yet the distribution interceptor doesn't go remote for the key:
> {noformat}
> 14:06:45,872 TRACE (asyncTransportThread-1,NodeA:___defaultcache) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=7, currentCH=DefaultConsistentHash{numSegments=60, numOwners=3, members=[NodeA-52814, NodeB-62397, NodeC-63995], owners={0: 0 1 2, 1: 0 1 2, 2: 0 1 2, 3: 0 1 2, 4: 0 1 2, 5: 0 1 2, 6: 0 1 2, 7: 0 1 2, 8: 0 1 2, 9: 0 1 2, 10: 0 1 2, 11: 0 1 2, 12: 0 2, 13: 0 2, 14: 0 2, 15: 0 1, 16: 0 1, 17: 0 1, 18: 0 1, 19: 0 1, 20: 2 0, 21: 2 0, 22: 2 0, 23: 2 0, 24: 2 0, 25: 2 0, 26: 2 0, 27: 2 0, 28: 2 0, 29: 2 0, 30: 1 0 2, 31: 1 0 2, 32: 1 0 2, 33: 1 2, 34: 1 0, 35: 1 2, 36: 1 0, 37: 1 2, 38: 1 0, 39: 1 2, 40: 1 0, 41: 1 2, 42: 1 0, 43: 1 2, 44: 1 0, 45: 1 0, 46: 1 0, 47: 1 0, 48: 1 0, 49: 1 2, 50: 2 1, 51: 2 1, 52: 2 1, 53: 2 1, 54: 2 1, 55: 2 1, 56: 2 1, 57: 2 1, 58: 2 0, 59: 2 0}, pendingCH=DefaultConsistentHash{numSegments=60, numOwners=3, members=[NodeA-52814, NodeB-62397, NodeC-63995], owners={0: 0 1 2, 1: 0 1 2, 2: 0 1 2, 3: 0 1 2, 4: 0 1 2, 5: 0 1 2, 6: 0 1 2, 7: 0 1 2, 8: 0 1 2, 9: 0 1 2, 10: 0 1 2, 11: 0 1 2, 12: 0 2 1, 13: 0 2 1, 14: 0 2 1, 15: 0 1 2, 16: 0 1 2, 17: 0 1 2, 18: 0 1 2, 19: 0 1 2, 20: 2 0 1, 21: 2 0 1, 22: 2 0 1, 23: 2 0 1, 24: 2 0 1, 25: 2 0 1, 26: 2 0 1, 27: 2 0 1, 28: 2 0 1, 29: 2 0 1, 30: 1 0 2, 31: 1 0 2, 32: 1 0 2, 33: 1 2 0, 34: 1 0 2, 35: 1 2 0, 36: 1 0 2, 37: 1 2 0, 38: 1 0 2, 39: 1 2 0, 40: 1 0 2, 41: 1 2 0, 42: 1 0 2, 43: 1 2 0, 44: 1 0 2, 45: 1 0 2, 46: 1 0 2, 47: 1 0 2, 48: 1 0 2, 49: 1 2 0, 50: 2 1 0, 51: 2 1 0, 52: 2 1 0, 53: 2 1 0, 54: 2 1 0, 55: 2 1 0, 56: 2 1 0, 57: 2 1 0, 58: 2 0 1, 59: 2 0 1}} on cache ___defaultcache
> 14:06:46,287 TRACE (OOB-9,ISPN,NodeA-52814:___defaultcache) [StateConsumerImpl] Received keys [0, 2, 10, 94, 103, 117, 187, 189, 288, 305, 307, 376, 481, 487, 502, 729, 771, 994] for segment 50 of cache ___defaultcache from node NodeB-62397
> 14:06:46,338 INFO (testng-StateTransferLargeObjectTest:) [StateTransferLargeObjectTest] ----Running a get on 10
> 14:06:46,351 TRACE (OOB-9,ISPN,NodeA-52814:___defaultcache ___defaultcache) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=10, value=org.infinispan.statetransfer.BigObject@6ed3126d, flags=[CACHE_MODE_LOCAL, SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, SKIP_SHARED_CACHE_STORE, SKIP_OWNERSHIP_CHECK, IGNORE_RETURN_VALUES, SKIP_XSITE_BACKUP], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true} and InvocationContext [org.infinispan.context.impl.LocalTxInvocationContext@2c6dd013]
> 14:06:46,358 TRACE (testng-StateTransferLargeObjectTest:___defaultcache) [GetKeyValueCommand] Entry not found
> 14:06:46,358 TRACE (testng-StateTransferLargeObjectTest:___defaultcache) [BaseDistributionInterceptor] Not doing a remote get for key 10 since entry is mapped to current node (NodeA-52814), or is in L1. Owners are [NodeC-63995, NodeB-62397, NodeA-52814]
> 14:06:46,359 ERROR (testng-StateTransferLargeObjectTest:) [UnitTestTestNGListener] Test testForFailure(org.infinispan.statetransfer.StateTransferLargeObjectTest) failed.
> java.lang.AssertionError: expected object to not be null
> at org.testng.Assert.fail(Assert.java:89)
> at org.testng.Assert.assertNotNull(Assert.java:399)
> at org.testng.Assert.assertNotNull(Assert.java:384)
> at org.infinispan.statetransfer.StateTransferLargeObjectTest.assertValue(StateTransferLargeObjectTest.java:145)
> at org.infinispan.statetransfer.StateTransferLargeObjectTest.testForFailure(StateTransferLargeObjectTest.java:115)
> {noformat}
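The race in the log can be modelled in a few lines. Key 10 belongs to segment 50, whose owners are NodeC and NodeB in currentCH and NodeC, NodeB, NodeA in pendingCH, so NodeA is still receiving that segment. The sketch below contrasts the check described in the report with one possible alternative that decides based on ownership in the current consistent hash. This alternative is an assumption made for illustration, not necessarily the fix adopted in Infinispan, and `RemoteFetchSketch` and its methods are hypothetical names rather than Infinispan APIs.

```java
import java.util.Arrays;
import java.util.List;

public class RemoteFetchSketch {

    /**
     * Check as described in the report: skip the remote get when the key is
     * in the data container at the time of the check, even though the local
     * read already returned null.
     */
    public static boolean naiveShouldFetch(boolean localResultIsNull, boolean keyInContainerNow) {
        return localResultIsNull && !keyInContainerNow;
    }

    /**
     * Hypothetical alternative: go remote whenever the local read returned
     * null and this node is not yet an owner in the current consistent hash.
     */
    public static boolean ownershipBasedShouldFetch(boolean localResultIsNull,
                                                    List<String> currentOwners, String self) {
        return localResultIsNull && !currentOwners.contains(self);
    }

    public static void main(String[] args) {
        // Owners of segment 50 in currentCH, taken from the topology in the log.
        List<String> currentOwners = Arrays.asList("NodeC-63995", "NodeB-62397");
        String self = "NodeA-52814";

        // The get found nothing, then state transfer stored key 10 before the check ran.
        boolean localResultIsNull = true;
        boolean keyInContainerNow = true;

        System.out.println("naive rule fetches remotely: "
                + naiveShouldFetch(localResultIsNull, keyInContainerNow));
        System.out.println("ownership rule fetches remotely: "
                + ownershipBasedShouldFetch(localResultIsNull, currentOwners, self));
    }
}
```

The ownership-based rule does not depend on when the state transfer put happens to land, which is what makes the data-container check racy.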
11 years, 10 months