[JBoss JIRA] (ISPN-8175) LocalCacheStateTransferTest random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-8175?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-8175:
------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/5363
> LocalCacheStateTransferTest random failures
> -------------------------------------------
>
> Key: ISPN-8175
> URL: https://issues.jboss.org/browse/ISPN-8175
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Pedro Ruivo
> Labels: testsuite_stability
>
> {noformat}
> [ERROR] testStateTransferWithClusterIdle(org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest) Time elapsed: 0.698 s <<< FAILURE!
> java.lang.AssertionError:
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24)
> at org.testng.AssertJUnit.assertFalse(AssertJUnit.java:41)
> at org.testng.AssertJUnit.assertFalse(AssertJUnit.java:49)
> at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.assertNoStateTransferInReceivingSite(LocalCacheStateTransferTest.java:147)
> at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle(LocalCacheStateTransferTest.java:96)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8175) LocalCacheStateTransferTest random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-8175?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo reassigned ISPN-8175:
---------------------------------
Assignee: Pedro Ruivo
> LocalCacheStateTransferTest random failures
> -------------------------------------------
>
> Key: ISPN-8175
> URL: https://issues.jboss.org/browse/ISPN-8175
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Pedro Ruivo
> Labels: testsuite_stability
>
> {noformat}
> [ERROR] testStateTransferWithClusterIdle(org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest) Time elapsed: 0.698 s <<< FAILURE!
> java.lang.AssertionError:
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24)
> at org.testng.AssertJUnit.assertFalse(AssertJUnit.java:41)
> at org.testng.AssertJUnit.assertFalse(AssertJUnit.java:49)
> at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.assertNoStateTransferInReceivingSite(LocalCacheStateTransferTest.java:147)
> at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle(LocalCacheStateTransferTest.java:96)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8176) RemoteCacheStoreIT.testReadOnly random failures
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-8176?page=com.atlassian.jira.plugin.... ]
Ryan Emerson updated ISPN-8176:
-------------------------------
Status: Open (was: New)
> RemoteCacheStoreIT.testReadOnly random failures
> -----------------------------------------------
>
> Key: ISPN-8176
> URL: https://issues.jboss.org/browse/ISPN-8176
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Ryan Emerson
> Labels: testsuite_stability
>
> java.lang.AssertionError: expected null, but was:<v1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotNull(Assert.java:664)
> at org.junit.Assert.assertNull(Assert.java:646)
> at org.junit.Assert.assertNull(Assert.java:656)
> at org.infinispan.server.test.cs.remote.RemoteCacheStoreIT.testReadOnly(RemoteCacheStoreIT.java:85)
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8075) LiveRunningTest fails randomly with CorruptIndexException
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-8075?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes commented on ISPN-8075:
-----------------------------------------
I was only able to reproduce this by running the test in a loop inside a VirtualBox VM on a dual-core laptop.
> LiveRunningTest fails randomly with CorruptIndexException
> ---------------------------------------------------------
>
> Key: ISPN-8075
> URL: https://issues.jboss.org/browse/ISPN-8075
> Project: Infinispan
> Issue Type: Bug
> Components: Embedded Querying
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
>
> {noformat}
> org.hibernate.search.exception.SearchException: Unable to reopen IndexReader
> at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:242)
> at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader(SharingBufferReaderProvider.java:73)
> at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader(SharingBufferReaderProvider.java:35)
> at org.hibernate.search.reader.impl.ManagedMultiReader.createInstance(ManagedMultiReader.java:69)
> at org.hibernate.search.reader.impl.MultiReaderFactory.openReader(MultiReaderFactory.java:48)
> at org.hibernate.search.query.engine.impl.LuceneHSQuery.buildSearcher(LuceneHSQuery.java:475)
> at org.hibernate.search.query.engine.impl.LuceneHSQuery.queryResultSize(LuceneHSQuery.java:218)
> at org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.doGetResultSize(FullTextQueryImpl.java:269)
> at org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.getResultSize(FullTextQueryImpl.java:260)
> at org.infinispan.hibernate.search.LiveRunningTest.assertView(LiveRunningTest.java:79)
> at org.infinispan.hibernate.search.LiveRunningTest.assertViews(LiveRunningTest.java:70)
> at org.infinispan.hibernate.search.LiveRunningTest.liveRun(LiveRunningTest.java:57)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Caused by: java.io.IOException: Read past EOF
> at org.infinispan.lucene.impl.SingleChunkIndexInput.readByte(SingleChunkIndexInput.java:54)
> at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
> at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
> at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
> at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
> at org.apache.lucene.codecs.lucene50.Lucene50SegmentInfoFormat.read(Lucene50SegmentInfoFormat.java:86)
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:362)
> at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:493)
> at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:490)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
> at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:490)
> at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:344)
> at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:300)
> at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:263)
> at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:251)
> at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:137)
> at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:239)
> ... 37 more
> Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(_b.si))
> at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:370)
> at org.apache.lucene.codecs.lucene50.Lucene50SegmentInfoFormat.read(Lucene50SegmentInfoFormat.java:117)
> ... 49 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8176) RemoteCacheStoreIT.testReadOnly random failures
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-8176?page=com.atlassian.jira.plugin.... ]
Ryan Emerson reassigned ISPN-8176:
----------------------------------
Assignee: Ryan Emerson
> RemoteCacheStoreIT.testReadOnly random failures
> -----------------------------------------------
>
> Key: ISPN-8176
> URL: https://issues.jboss.org/browse/ISPN-8176
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Ryan Emerson
> Labels: testsuite_stability
>
> java.lang.AssertionError: expected null, but was:<v1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotNull(Assert.java:664)
> at org.junit.Assert.assertNull(Assert.java:646)
> at org.junit.Assert.assertNull(Assert.java:656)
> at org.infinispan.server.test.cs.remote.RemoteCacheStoreIT.testReadOnly(RemoteCacheStoreIT.java:85)
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8182) Asynchronous commands should be retried if topology is outdated
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-8182?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño commented on ISPN-8182:
----------------------------------------
A couple of IRC discussions we've had so far:
{code}
[15:52:15] > pruivo: dberindei: it'd be interesting to hear your thoughts about ISPN-8182
[15:53:47] <dberindei> galderz: I don't think it's feasible, the originator forgets the command immediately after sending it to the owners
[15:54:29] <dberindei> galderz: OTOH I don't think the remote nodes should throw an OutdatedTopologyException if the command is async
[15:54:50] <pruivo> dberindei, galderz well, IMO I don't think we should do it. If you want to apply and update async, use the putAsync()
[15:55:29] > rvansa: FYI ^
[15:55:55] <dberindei> pruivo: but you agree that in DIST_ASYNC, remote nodes throwing OTE is a bug, right?
[15:56:05] > dberindei: we already supply custom interceptors for HB 2L, so we could try to do that: not bothering about outdated topologies
[15:57:14] > pruivo: i guess you mean that putAsync() would retry in case of outdated topology?
[15:57:16] <pruivo> dberindei, more or less. I think it should check if it is an owner or not. throwing the exception definitely isn't needed
[15:57:42] <rvansa> dberindei: I think that the topology check in DI does not care if the cache is async
[15:57:42] <pruivo> galderz, yes, if I'm not mistaken, it is a sync put in a separate threads (with the benefits of sync mode)
[15:57:51] <rvansa> dberindei: if it does not match, it simply throws
[15:58:00] <dberindei> rvansa: ok, that's a bug
[15:58:39] <rvansa> dberindei: it shouldn't ignore it either, IMO... everything below STI should be executed in the same topology, IMO
[15:59:37] > dberindei: what's the bug?
[16:00:32] <rvansa> dberindei: I think that the DI should rather send the command to proper primary owner in the recent topology
[16:00:55] <rvansa> dberindei: if this node is meant as primary
{code}
And:
{code}
<rvansa> galderz: I think that you shouldn't need to catch - the OTE
does not have to be thrown at all
> rvansa: right, assuming we provide our own interceptor for timestamp
and query, we can just simply ignore any topology checks...
<rvansa> galderz: not only for our own interceptor, it's a general
infinispan issue
> rvansa: both for REPL and DIST?
<rvansa> galderz: yesterday before the meeting I've suggested that
async commands should just execute if these are still on the owner
<rvansa> galderz: yes
<rvansa> galderz: in async mode, after you call cache.put(k, v2), all
owners should eventually contain v2
<rvansa> galderz: unless 'error' happens
<rvansa> galderz: topology change (node joining) is not an error
> rvansa: makes sense
<rvansa> galderz: actually, throwing and retrying locally might be
needed - I would prefer to fix topology for a given command below
STI and retry if it changes in any place we need to consider it
<rvansa> galderz: that's a cozy invariant; not sure if it's really
needed here
<rvansa> galderz: anyway, regrettably we don't have any plan so far
how to make the 'eventually' happen
> rvansa: but we need something better than what we have now...
<rvansa> galderz: because if a new owner pops up and fetches data from
node that did not get the update yet, its version would be stale
> dberindei: pruivo: we were interrupted yday discussing ISPN-8182
<jbossbot> jira [ISPN-8182] Asynchronous commands should be retried if
topology is outdated [New (Unresolved) Enhancement, Major, Core,
Unassigned] https://issues.jboss.org/browse/ISPN-8182
<rvansa> galderz: quick fix would be just not throwing
<rvansa> galderz: + a set of stress tests that will try out this with
all combos of primary/backup/non-owner transitions to see if
anything goes wrong
<rvansa> 'wrong' meaning NPEs and such, stale data should be expected
in thos
<rvansa> those
*** First activity: dberindei joined 33 minutes 16 seconds ago.
<dberindei> galderz rvansa: indeed, if a new node joins OR a node
leaves, some keys will have new owners, and those owners may or
may not receive the updated value
> rvansa: if stale data is expected, then we're in the same scenario
as now really
<dberindei> galderz: the fact that we currently check the topology id
and throw an exception means the update will can be missed on
owners that aren't new
> what do you mean by "aren't new"?
<dberindei> galderz: say in topology 1 k is owned by AB, and in
topology 2 it's owned by CB
<dberindei> galderz: C would be a new owner, B would be "non-new" :)
> dberindei: got it
> dberindei: so, should remote nodes not throw that exception for
async puts? or should still be thrown and then retried?
> dberindei: we're assuming we'd change core for this
<dberindei> galderz: throwing and catching would be nice because the
only change would be in StateTransferInterceptor (I think)
> dberindei: ok
> dberindei: do we have any stress tests where we could add tests for
seeing that it all works fine for repl async puts?
{code}
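For readers following the discussion above, here is a minimal, self-contained sketch of the "throw locally and retry" approach dberindei suggests (keeping the change local to the node that detects the stale topology). All type and method names below (TopologyProvider, WriteCommand, applyInTopology, etc.) are hypothetical stand-ins, not the actual Infinispan interceptor API; the sketch only illustrates retrying an async write in the newest topology instead of propagating OutdatedTopologyException back to an originator that has already forgotten the command.
{code}
// Hypothetical sketch only: the types below are illustrative, not Infinispan's real API.
public final class AsyncRetryingInterceptor {

   private final TopologyProvider topologyProvider; // supplies this node's current topology id

   public AsyncRetryingInterceptor(TopologyProvider topologyProvider) {
      this.topologyProvider = topologyProvider;
   }

   /** Applies an async write, retrying locally if the topology moved underneath it. */
   public void applyAsyncWrite(WriteCommand command) {
      while (true) {
         int current = topologyProvider.currentTopologyId();
         try {
            command.applyInTopology(current);
            return;                       // applied in a consistent topology
         } catch (OutdatedTopologyException e) {
            // Instead of reporting the failure to the originator (who has already
            // forgotten the async command), wait for the newer topology and retry.
            topologyProvider.awaitTopologyNewerThan(current);
         }
      }
   }

   // --- illustrative collaborator types ---
   interface TopologyProvider {
      int currentTopologyId();
      void awaitTopologyNewerThan(int topologyId);
   }

   interface WriteCommand {
      void applyInTopology(int topologyId) throws OutdatedTopologyException;
   }

   static final class OutdatedTopologyException extends RuntimeException { }
}
{code}
Whether the retry belongs on the remote node (as sketched here) or on the originator (as in the NACK idea in the issue description below) is exactly the open question in the discussion.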
> Asynchronous commands should be retried if topology is outdated
> ---------------------------------------------------------------
>
> Key: ISPN-8182
> URL: https://issues.jboss.org/browse/ISPN-8182
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.1.0.Final
> Reporter: Galder Zamarreño
>
> If an asynchronous command fails at a remote node, it should be retried.
> I'm not sure how feasible this really is. One possible solution could be a NACK-style implementation where, by default, the originator assumes an asynchronous command has been executed, but if the receiver reports that the topology is outdated, the originator retries.
> This is related to ISPN-8027 where we've discovered that some updates are not applied when asynchronous commands to update the Hibernate 2L timestamp cache fail as a result of an outdated topology.
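To make the NACK-style idea above concrete, here is a rough, purely illustrative sketch of an originator-side variant: the originator keeps each async command in a small pending buffer, assumes success by default, and only resends if a receiver reports an outdated topology. Transport, AsyncCommand and the "topology outdated" NACK callback are made-up names for illustration, not Infinispan classes.
{code}
// Purely illustrative sketch of the NACK idea; these are not Infinispan classes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public final class NackRetryingOriginator {

   private final Transport transport;
   private final AtomicLong sequence = new AtomicLong();
   // Commands sent but not yet known to be safe; dropped once no NACK can arrive.
   private final Map<Long, AsyncCommand> pending = new ConcurrentHashMap<>();

   public NackRetryingOriginator(Transport transport) {
      this.transport = transport;
   }

   /** Fire-and-forget send: assume success unless a receiver NACKs it. */
   public void sendAsync(AsyncCommand command) {
      long id = sequence.incrementAndGet();
      pending.put(id, command);
      transport.send(id, command, transport.currentTopologyId());
   }

   /** Called when a receiver reports that the command was sent in an outdated topology. */
   public void onTopologyOutdatedNack(long id) {
      AsyncCommand command = pending.get(id);
      if (command != null) {
         // Re-route to the owners in the newest topology and keep it pending.
         transport.send(id, command, transport.currentTopologyId());
      }
   }

   /** Called once the command is known to be stable in the current topology; forget it. */
   public void onAcknowledgedOrStable(long id) {
      pending.remove(id);
   }

   // --- illustrative collaborator types ---
   interface Transport {
      int currentTopologyId();
      void send(long id, AsyncCommand command, int topologyId);
   }

   interface AsyncCommand { }
}
{code}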
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8182) Asynchronous commands should be retried if topology is outdated
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-8182?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño edited comment on ISPN-8182 at 8/8/17 5:05 AM:
----------------------------------------------------------------
A couple of IRC discussions we've had so far:
{code}
[15:52:15] > pruivo: dberindei: it'd be interesting to hear your thoughts about ISPN-8182
[15:53:47] <dberindei> galderz: I don't think it's feasible, the originator forgets the command
immediately after sending it to the owners
[15:54:29] <dberindei> galderz: OTOH I don't think the remote nodes should throw an
OutdatedTopologyException if the command is async
[15:54:50] <pruivo> dberindei, galderz well, IMO I don't think we should do it. If you want to
apply and update async, use the putAsync()
[15:55:29] > rvansa: FYI ^
[15:55:55] <dberindei> pruivo: but you agree that in DIST_ASYNC, remote nodes throwing OTE
is a bug, right?
[15:56:05] > dberindei: we already supply custom interceptors for HB 2L, so we could try to do
that: not bothering about outdated topologies
[15:57:14] > pruivo: i guess you mean that putAsync() would retry in case of outdated topology?
[15:57:16] <pruivo> dberindei, more or less. I think it should check if it is an owner or not.
throwing the exception definitely isn't needed
[15:57:42] <rvansa> dberindei: I think that the topology check in DI does not care if the cache is
async
[15:57:42] <pruivo> galderz, yes, if I'm not mistaken, it is a sync put in a separate threads (with
the benefits of sync mode)
[15:57:51] <rvansa> dberindei: if it does not match, it simply throws
[15:58:00] <dberindei> rvansa: ok, that's a bug
[15:58:39] <rvansa> dberindei: it shouldn't ignore it either, IMO... everything below STI should be
executed in the same topology, IMO
[15:59:37] > dberindei: what's the bug?
[16:00:32] <rvansa> dberindei: I think that the DI should rather send the command to proper
primary owner in the recent topology
[16:00:55] <rvansa> dberindei: if this node is meant as primary
{code}
And:
{code}
<rvansa> galderz: I think that you shouldn't need to catch - the OTE
does not have to be thrown at all
> rvansa: right, assuming we provide our own interceptor for timestamp
and query, we can just simply ignore any topology checks...
<rvansa> galderz: not only for our own interceptor, it's a general
infinispan issue
> rvansa: both for REPL and DIST?
<rvansa> galderz: yesterday before the meeting I've suggested that
async commands should just execute if these are still on the owner
<rvansa> galderz: yes
<rvansa> galderz: in async mode, after you call cache.put(k, v2), all
owners should eventually contain v2
<rvansa> galderz: unless 'error' happens
<rvansa> galderz: topology change (node joining) is not an error
> rvansa: makes sense
<rvansa> galderz: actually, throwing and retrying locally might be
needed - I would prefer to fix topology for a given command below
STI and retry if it changes in any place we need to consider it
<rvansa> galderz: that's a cozy invariant; not sure if it's really
needed here
<rvansa> galderz: anyway, regrettably we don't have any plan so far
how to make the 'eventually' happen
> rvansa: but we need something better than what we have now...
<rvansa> galderz: because if a new owner pops up and fetches data from
node that did not get the update yet, its version would be stale
> dberindei: pruivo: we were interrupted yday discussing ISPN-8182
<jbossbot> jira [ISPN-8182] Asynchronous commands should be retried if
topology is outdated [New (Unresolved) Enhancement, Major, Core,
Unassigned] https://issues.jboss.org/browse/ISPN-8182
<rvansa> galderz: quick fix would be just not throwing
<rvansa> galderz: + a set of stress tests that will try out this with
all combos of primary/backup/non-owner transitions to see if
anything goes wrong
<rvansa> 'wrong' meaning NPEs and such, stale data should be expected
in thos
<rvansa> those
*** First activity: dberindei joined 33 minutes 16 seconds ago.
<dberindei> galderz rvansa: indeed, if a new node joins OR a node
leaves, some keys will have new owners, and those owners may or
may not receive the updated value
> rvansa: if stale data is expected, then we're in the same scenario
as now really
<dberindei> galderz: the fact that we currently check the topology id
and throw an exception means the update will can be missed on
owners that aren't new
> what do you mean by "aren't new"?
<dberindei> galderz: say in topology 1 k is owned by AB, and in
topology 2 it's owned by CB
<dberindei> galderz: C would be a new owner, B would be "non-new" :)
> dberindei: got it
> dberindei: so, should remote nodes not throw that exception for
async puts? or should still be thrown and then retried?
> dberindei: we're assuming we'd change core for this
<dberindei> galderz: throwing and catching would be nice because the
only change would be in StateTransferInterceptor (I think)
> dberindei: ok
> dberindei: do we have any stress tests where we could add tests for
seeing that it all works fine for repl async puts?
{code}
was (Author: galder.zamarreno):
A couple of IRC discussions we've had so far:
{code}
[15:52:15] > pruivo: dberindei: it'd be interesting to hear your thoughts about ISPN-8182
[15:53:47] <dberindei> galderz: I don't think it's feasible, the originator forgets the command immediately after sending it to the owners
[15:54:29] <dberindei> galderz: OTOH I don't think the remote nodes should throw an OutdatedTopologyException if the command is async
[15:54:50] <pruivo> dberindei, galderz well, IMO I don't think we should do it. If you want to apply and update async, use the putAsync()
[15:55:29] > rvansa: FYI ^
[15:55:55] <dberindei> pruivo: but you agree that in DIST_ASYNC, remote nodes throwing OTE is a bug, right?
[15:56:05] > dberindei: we already supply custom interceptors for HB 2L, so we could try to do that: not bothering about outdated topologies
[15:57:14] > pruivo: i guess you mean that putAsync() would retry in case of outdated topology?
[15:57:16] <pruivo> dberindei, more or less. I think it should check if it is an owner or not. throwing the exception definitely isn't needed
[15:57:42] <rvansa> dberindei: I think that the topology check in DI does not care if the cache is async
[15:57:42] <pruivo> galderz, yes, if I'm not mistaken, it is a sync put in a separate threads (with the benefits of sync mode)
[15:57:51] <rvansa> dberindei: if it does not match, it simply throws
[15:58:00] <dberindei> rvansa: ok, that's a bug
[15:58:39] <rvansa> dberindei: it shouldn't ignore it either, IMO... everything below STI should be executed in the same topology, IMO
[15:59:37] > dberindei: what's the bug?
[16:00:32] <rvansa> dberindei: I think that the DI should rather send the command to proper primary owner in the recent topology
[16:00:55] <rvansa> dberindei: if this node is meant as primary
{code}
And:
{code}
<rvansa> galderz: I think that you shouldn't need to catch - the OTE
does not have to be thrown at all
> rvansa: right, assuming we provide our own interceptor for timestamp
and query, we can just simply ignore any topology checks...
<rvansa> galderz: not only for our own interceptor, it's a general
infinispan issue
> rvansa: both for REPL and DIST?
<rvansa> galderz: yesterday before the meeting I've suggested that
async commands should just execute if these are still on the owner
<rvansa> galderz: yes
<rvansa> galderz: in async mode, after you call cache.put(k, v2), all
owners should eventually contain v2
<rvansa> galderz: unless 'error' happens
<rvansa> galderz: topology change (node joining) is not an error
> rvansa: makes sense
<rvansa> galderz: actually, throwing and retrying locally might be
needed - I would prefer to fix topology for a given command below
STI and retry if it changes in any place we need to consider it
<rvansa> galderz: that's a cozy invariant; not sure if it's really
needed here
<rvansa> galderz: anyway, regrettably we don't have any plan so far
how to make the 'eventually' happen
> rvansa: but we need something better than what we have now...
<rvansa> galderz: because if a new owner pops up and fetches data from
node that did not get the update yet, its version would be stale
> dberindei: pruivo: we were interrupted yday discussing ISPN-8182
<jbossbot> jira [ISPN-8182] Asynchronous commands should be retried if
topology is outdated [New (Unresolved) Enhancement, Major, Core,
Unassigned] https://issues.jboss.org/browse/ISPN-8182
<rvansa> galderz: quick fix would be just not throwing
<rvansa> galderz: + a set of stress tests that will try out this with
all combos of primary/backup/non-owner transitions to see if
anything goes wrong
<rvansa> 'wrong' meaning NPEs and such, stale data should be expected
in thos
<rvansa> those
*** First activity: dberindei joined 33 minutes 16 seconds ago.
<dberindei> galderz rvansa: indeed, if a new node joins OR a node
leaves, some keys will have new owners, and those owners may or
may not receive the updated value
> rvansa: if stale data is expected, then we're in the same scenario
as now really
<dberindei> galderz: the fact that we currently check the topology id
and throw an exception means the update will can be missed on
owners that aren't new
> what do you mean by "aren't new"?
<dberindei> galderz: say in topology 1 k is owned by AB, and in
topology 2 it's owned by CB
<dberindei> galderz: C would be a new owner, B would be "non-new" :)
> dberindei: got it
> dberindei: so, should remote nodes not throw that exception for
async puts? or should still be thrown and then retried?
> dberindei: we're assuming we'd change core for this
<dberindei> galderz: throwing and catching would be nice because the
only change would be in StateTransferInterceptor (I think)
> dberindei: ok
> dberindei: do we have any stress tests where we could add tests for
seeing that it all works fine for repl async puts?
{code}
> Asynchronous commands should be retried if topology is outdated
> ---------------------------------------------------------------
>
> Key: ISPN-8182
> URL: https://issues.jboss.org/browse/ISPN-8182
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.1.0.Final
> Reporter: Galder Zamarreño
>
> If an asynchronous command fails at a remote node, it should be retried.
> I'm not sure how feasible this really is. One possible solution could be a NACK-style implementation where, by default, the originator assumes an asynchronous command has been executed, but if the receiver reports that the topology is outdated, the originator retries.
> This is related to ISPN-8027 where we've discovered that some updates are not applied when asynchronous commands to update the Hibernate 2L timestamp cache fail as a result of an outdated topology.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)