[JBoss JIRA] (ISPN-8182) Asynchronous commands should be retried if topology is outdated
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-8182?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño commented on ISPN-8182:
----------------------------------------
A couple of IRC discussions we've had so far:
{code}
[15:52:15] > pruivo: dberindei: it'd be interesting to hear your thoughts about ISPN-8182
[15:53:47] <dberindei> galderz: I don't think it's feasible, the originator forgets the command immediately after sending it to the owners
[15:54:29] <dberindei> galderz: OTOH I don't think the remote nodes should throw an OutdatedTopologyException if the command is async
[15:54:50] <pruivo> dberindei, galderz well, IMO I don't think we should do it. If you want to apply and update async, use the putAsync()
[15:55:29] > rvansa: FYI ^
[15:55:55] <dberindei> pruivo: but you agree that in DIST_ASYNC, remote nodes throwing OTE is a bug, right?
[15:56:05] > dberindei: we already supply custom interceptors for HB 2L, so we could try to do that: not bothering about outdated topologies
[15:57:14] > pruivo: i guess you mean that putAsync() would retry in case of outdated topology?
[15:57:16] <pruivo> dberindei, more or less. I think it should check if it is an owner or not. throwing the exception definitely isn't needed
[15:57:42] <rvansa> dberindei: I think that the topology check in DI does not care if the cache is async
[15:57:42] <pruivo> galderz, yes, if I'm not mistaken, it is a sync put in a separate thread (with the benefits of sync mode)
[15:57:51] <rvansa> dberindei: if it does not match, it simply throws
[15:58:00] <dberindei> rvansa: ok, that's a bug
[15:58:39] <rvansa> dberindei: it shouldn't ignore it either, IMO... everything below STI should be executed in the same topology, IMO
[15:59:37] > dberindei: what's the bug?
[16:00:32] <rvansa> dberindei: I think that the DI should rather send the command to proper primary owner in the recent topology
[16:00:55] <rvansa> dberindei: if this node is meant as primary
{code}
And:
{code}
<rvansa> galderz: I think that you shouldn't need to catch - the OTE
does not have to be thrown at all
> rvansa: right, assuming we provide our own interceptor for timestamp
and query, we can just simply ignore any topology checks...
<rvansa> galderz: not only for our own interceptor, it's a general
infinispan issue
> rvansa: both for REPL and DIST?
<rvansa> galderz: yesterday before the meeting I've suggested that
async commands should just execute if these are still on the owner
<rvansa> galderz: yes
<rvansa> galderz: in async mode, after you call cache.put(k, v2), all
owners should eventually contain v2
<rvansa> galderz: unless 'error' happens
<rvansa> galderz: topology change (node joining) is not an error
> rvansa: makes sense
<rvansa> galderz: actually, throwing and retrying locally might be
needed - I would prefer to fix topology for a given command below
STI and retry if it changes in any place we need to consider it
<rvansa> galderz: that's a cozy invariant; not sure if it's really
needed here
<rvansa> galderz: anyway, regrettably we don't have any plan so far
how to make the 'eventually' happen
> rvansa: but we need something better than what we have now...
<rvansa> galderz: because if a new owner pops up and fetches data from
node that did not get the update yet, its version would be stale
> dberindei: pruivo: we were interrupted yday discussing ISPN-8182
<jbossbot> jira [ISPN-8182] Asynchronous commands should be retried if
topology is outdated [New (Unresolved) Enhancement, Major, Core,
Unassigned] https://issues.jboss.org/browse/ISPN-8182
<rvansa> galderz: quick fix would be just not throwing
<rvansa> galderz: + a set of stress tests that will try out this with
all combos of primary/backup/non-owner transitions to see if
anything goes wrong
<rvansa> 'wrong' meaning NPEs and such, stale data should be expected
in those
*** First activity: dberindei joined 33 minutes 16 seconds ago.
<dberindei> galderz rvansa: indeed, if a new node joins OR a node
leaves, some keys will have new owners, and those owners may or
may not receive the updated value
> rvansa: if stale data is expected, then we're in the same scenario
as now really
<dberindei> galderz: the fact that we currently check the topology id
and throw an exception means the update can be missed on
owners that aren't new
> what do you mean by "aren't new"?
<dberindei> galderz: say in topology 1 k is owned by AB, and in
topology 2 it's owned by CB
<dberindei> galderz: C would be a new owner, B would be "non-new" :)
> dberindei: got it
> dberindei: so, should remote nodes not throw that exception for
async puts? or should still be thrown and then retried?
> dberindei: we're assuming we'd change core for this
<dberindei> galderz: throwing and catching would be nice because the
only change would be in StateTransferInterceptor (I think)
> dberindei: ok
> dberindei: do we have any stress tests where we could add tests for
seeing that it all works fine for repl async puts?
{code}
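To make the AB/CB example from the second discussion concrete, here is a minimal sketch contrasting the two receiver-side behaviours that were discussed: rejecting on any topology id mismatch versus applying the write if the receiving node still owns the key. This is not Infinispan code. The class OwnerCheckSketch, the methods strictCheck and ownerCheck, and the hard-coded owner lists are all hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class OwnerCheckSketch {
    enum Outcome { APPLY, REJECT }

    // Hypothetical owner lists for key k, taken from the example in the chat:
    // topology 1 -> A and B, topology 2 -> C and B.
    static List<String> ownersOfK(int topologyId) {
        switch (topologyId) {
            case 1: return Arrays.asList("A", "B");
            case 2: return Arrays.asList("C", "B");
            default: return Collections.emptyList();
        }
    }

    // Behaviour described in the chat as the current one: any mismatch between the
    // command's topology id and the receiver's topology id rejects the command.
    static Outcome strictCheck(int commandTopologyId, int localTopologyId) {
        return commandTopologyId == localTopologyId ? Outcome.APPLY : Outcome.REJECT;
    }

    // Alternative raised in the chat: ignore the command's topology id and apply the
    // write if the receiving node owns the key in its own (local) topology.
    static Outcome ownerCheck(String node, int localTopologyId) {
        return ownersOfK(localTopologyId).contains(node) ? Outcome.APPLY : Outcome.REJECT;
    }

    public static void main(String[] args) {
        int commandTopologyId = 1; // async put sent to A and B in topology 1
        int localTopologyId = 2;   // both receivers have already installed topology 2
        for (String node : ownersOfK(commandTopologyId)) {
            System.out.println(node + ": strict=" + strictCheck(commandTopologyId, localTopologyId)
                    + ", ownerCheck=" + ownerCheck(node, localTopologyId));
        }
    }
}
```

With the strict check, both A and B reject the write, so B misses the update even though it owns k in both topologies. With the owner check, B applies it and A does not. C, the new owner, is never a target of the original command in either case, which matches the remark that new owners "may or may not receive the updated value".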
> Asynchronous commands should be retried if topology is outdated
> ---------------------------------------------------------------
>
> Key: ISPN-8182
> URL: https://issues.jboss.org/browse/ISPN-8182
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.1.0.Final
> Reporter: Galder Zamarreño
>
> If an asynchronous command fails at a remote node, it should be retried.
> I'm not sure how feasible this really is. One possible solution would be a NACK-style implementation: by default the originator assumes an asynchronous command has been executed, but if the receiver reports that the topology is outdated, the originator retries.
> This is related to ISPN-8027 where we've discovered that some updates are not applied when asynchronous commands to update the Hibernate 2L timestamp cache fail as a result of an outdated topology.
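The NACK idea in the description above could look roughly like the following single-process sketch. It is only an illustration under stated assumptions: NackRetrySketch, Receiver, Originator, NO_NACK and MAX_RETRIES are hypothetical names, none of them Infinispan APIs, and the "network" is a plain method call. In a real cluster the NACK would arrive asynchronously, and the originator would have to keep the command around until then, which is the feasibility concern raised in the IRC discussion.

```java
final class NackRetrySketch {
    static final int NO_NACK = -1;     // hypothetical marker: command applied, no reply needed
    static final int MAX_RETRIES = 3;  // hypothetical bound so a retry loop cannot spin forever

    // Hypothetical receiver that only applies writes sent in its own topology.
    static final class Receiver {
        private final int topologyId;
        private String value;

        Receiver(int topologyId) { this.topologyId = topologyId; }

        // Returns NO_NACK if the write was applied, otherwise the receiver's
        // topology id, which plays the role of the NACK.
        int handle(int senderTopologyId, String newValue) {
            if (senderTopologyId != topologyId) {
                return topologyId;
            }
            value = newValue;
            return NO_NACK;
        }

        String value() { return value; }
    }

    // Hypothetical originator that assumes success unless it is told otherwise.
    static final class Originator {
        private int topologyId;
        private int sends;

        Originator(int topologyId) { this.topologyId = topologyId; }

        // Sends the write and resends it with the topology id reported in a NACK.
        boolean write(Receiver target, String newValue) {
            for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
                sends++;
                int reply = target.handle(topologyId, newValue);
                if (reply == NO_NACK) {
                    return true;
                }
                topologyId = reply; // adopt the topology the receiver reported
            }
            return false;
        }

        int sends() { return sends; }
    }

    public static void main(String[] args) {
        Receiver receiver = new Receiver(2);       // receiver already in topology 2
        Originator originator = new Originator(1); // originator still in topology 1
        boolean applied = originator.write(receiver, "v2");
        System.out.println("applied=" + applied + ", sends=" + originator.sends()
                + ", value=" + receiver.value());
    }
}
```

Here the first send is NACKed because the originator is still in topology 1, and the second send succeeds. The sketch deliberately leaves out the hard parts: where the originator keeps pending commands, and how long it waits before assuming no NACK is coming.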
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8182) Asynchronous commands should be retried if topology is outdated
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-8182?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño edited comment on ISPN-8182 at 8/8/17 5:05 AM:
----------------------------------------------------------------
A couple of IRC discussions we've had so far:
{code}
[15:52:15] > pruivo: dberindei: it'd be interesting to hear your thoughts about ISPN-8182
[15:53:47] <dberindei> galderz: I don't think it's feasible, the originator forgets the command
immediately after sending it to the owners
[15:54:29] <dberindei> galderz: OTOH I don't think the remote nodes should throw an
OutdatedTopologyException if the command is async
[15:54:50] <pruivo> dberindei, galderz well, IMO I don't think we should do it. If you want to
apply and update async, use the putAsync()
[15:55:29] > rvansa: FYI ^
[15:55:55] <dberindei> pruivo: but you agree that in DIST_ASYNC, remote nodes throwing OTE
is a bug, right?
[15:56:05] > dberindei: we already supply custom interceptors for HB 2L, so we could try to do
that: not bothering about outdated topologies
[15:57:14] > pruivo: i guess you mean that putAsync() would retry in case of outdated topology?
[15:57:16] <pruivo> dberindei, more or less. I think it should check if it is an owner or not.
throwing the exception definitely isn't needed
[15:57:42] <rvansa> dberindei: I think that the topology check in DI does not care if the cache is
async
[15:57:42] <pruivo> galderz, yes, if I'm not mistaken, it is a sync put in a separate thread (with
the benefits of sync mode)
[15:57:51] <rvansa> dberindei: if it does not match, it simply throws
[15:58:00] <dberindei> rvansa: ok, that's a bug
[15:58:39] <rvansa> dberindei: it shouldn't ignore it either, IMO... everything below STI should be
executed in the same topology, IMO
[15:59:37] > dberindei: what's the bug?
[16:00:32] <rvansa> dberindei: I think that the DI should rather send the command to proper
primary owner in the recent topology
[16:00:55] <rvansa> dberindei: if this node is meant as primary
{code}
And:
{code}
<rvansa> galderz: I think that you shouldn't need to catch - the OTE
does not have to be thrown at all
> rvansa: right, assuming we provide our own interceptor for timestamp
and query, we can just simply ignore any topology checks...
<rvansa> galderz: not only for our own interceptor, it's a general
infinispan issue
> rvansa: both for REPL and DIST?
<rvansa> galderz: yesterday before the meeting I've suggested that
async commands should just execute if these are still on the owner
<rvansa> galderz: yes
<rvansa> galderz: in async mode, after you call cache.put(k, v2), all
owners should eventually contain v2
<rvansa> galderz: unless 'error' happens
<rvansa> galderz: topology change (node joining) is not an error
> rvansa: makes sense
<rvansa> galderz: actually, throwing and retrying locally might be
needed - I would prefer to fix topology for a given command below
STI and retry if it changes in any place we need to consider it
<rvansa> galderz: that's a cozy invariant; not sure if it's really
needed here
<rvansa> galderz: anyway, regrettably we don't have any plan so far
how to make the 'eventually' happen
> rvansa: but we need something better than what we have now...
<rvansa> galderz: because if a new owner pops up and fetches data from
node that did not get the update yet, its version would be stale
> dberindei: pruivo: we were interrupted yday discussing ISPN-8182
<jbossbot> jira [ISPN-8182] Asynchronous commands should be retried if
topology is outdated [New (Unresolved) Enhancement, Major, Core,
Unassigned] https://issues.jboss.org/browse/ISPN-8182
<rvansa> galderz: quick fix would be just not throwing
<rvansa> galderz: + a set of stress tests that will try out this with
all combos of primary/backup/non-owner transitions to see if
anything goes wrong
<rvansa> 'wrong' meaning NPEs and such, stale data should be expected
in those
*** First activity: dberindei joined 33 minutes 16 seconds ago.
<dberindei> galderz rvansa: indeed, if a new node joins OR a node
leaves, some keys will have new owners, and those owners may or
may not receive the updated value
> rvansa: if stale data is expected, then we're in the same scenario
as now really
<dberindei> galderz: the fact that we currently check the topology id
and throw an exception means the update can be missed on
owners that aren't new
> what do you mean by "aren't new"?
<dberindei> galderz: say in topology 1 k is owned by AB, and in
topology 2 it's owned by CB
<dberindei> galderz: C would be a new owner, B would be "non-new" :)
> dberindei: got it
> dberindei: so, should remote nodes not throw that exception for
async puts? or should still be thrown and then retried?
> dberindei: we're assuming we'd change core for this
<dberindei> galderz: throwing and catching would be nice because the
only change would be in StateTransferInterceptor (I think)
> dberindei: ok
> dberindei: do we have any stress tests where we could add tests for
seeing that it all works fine for repl async puts?
{code}
[JBoss JIRA] (ISPN-8178) SingleNodeJdbcStoreIT.testForcedShutdown failure
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-8178?page=com.atlassian.jira.plugin.... ]
Ryan Emerson resolved ISPN-8178.
--------------------------------
Fix Version/s: 9.1.1.Final
Resolution: Done
> SingleNodeJdbcStoreIT.testForcedShutdown failure
> ------------------------------------------------
>
> Key: ISPN-8178
> URL: https://issues.jboss.org/browse/ISPN-8178
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Server
> Affects Versions: 9.1.0.Final
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Labels: testsuite_stability
> Fix For: 9.1.1.Final
>
>
> Occasionally the test fails with:
> java.lang.AssertionError: null
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertNotNull(Assert.java:621)
> at org.junit.Assert.assertNotNull(Assert.java:631)
> at org.infinispan.server.test.cs.jdbc.SingleNodeJdbcStoreIT.testRestartStringStoreAfter(SingleNodeJdbcStoreIT.java:195)
> at org.infinispan.server.test.cs.jdbc.SingleNodeJdbcStoreIT.testForcedShutdown(SingleNodeJdbcStoreIT.java:151)
[JBoss JIRA] (ISPN-8168) IndexNotFoundException with topology changes
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-8168?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-8168:
------------------------------------
Summary: IndexNotFoundException with topology changes (was: Index corruption with topology changes)
> IndexNotFoundException with topology changes
> --------------------------------------------
>
> Key: ISPN-8168
> URL: https://issues.jboss.org/browse/ISPN-8168
> Project: Infinispan
> Issue Type: Bug
> Components: Lucene Directory
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Labels: query, testsuite_stability
> Fix For: 9.1.1.Final
>
> Attachments: trace.zip
>
>
> This can be observed in the LiveRunningTest, which fails very often with:
> {noformat}
> Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in InfinispanDirectory{indexName='emails'}: files: []
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:726)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
> {noformat}
> The cache entry that contains the list of files in the Lucene directory (FileListCacheValue) is empty for some reason, although the index is not. The missing value for FileListCacheValue causes the index reader to think the index is empty, hence the error.
[JBoss JIRA] (ISPN-8181) XSite tests fail randomly with java.net.BindException
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-8181?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-8181:
----------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 9.1.1.Final
Resolution: Done
> XSite tests fail randomly with java.net.BindException
> -----------------------------------------------------
>
> Key: ISPN-8181
> URL: https://issues.jboss.org/browse/ISPN-8181
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Labels: testsuite_stability
> Fix For: 9.1.1.Final
>
>
> {noformat}
> [ERROR] createBeforeClass(org.infinispan.xsite.BackupWithSecurityTest) Time elapsed: 0.047 s <<< FAILURE!
> org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.remoting.transport.jgroups.JGroupsTransport.start() on object of type JGroupsTransport
> at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:252)
> at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:686)
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:261)
> at org.infinispan.test.fwk.TestCacheManagerFactory.newDefaultCacheManager(TestCacheManagerFactory.java:394)
> at org.infinispan.test.fwk.TestCacheManagerFactory.newDefaultCacheManager(TestCacheManagerFactory.java:70)
> at org.infinispan.test.fwk.TestCacheManagerFactory.createClusteredCacheManager(TestCacheManagerFactory.java:198)
> at org.infinispan.test.fwk.TestCacheManagerFactory.createClusteredCacheManager(TestCacheManagerFactory.java:189)
> at org.infinispan.xsite.AbstractXSiteTest$TestSite.addClusterEnabledCacheManager(AbstractXSiteTest.java:259)
> at org.infinispan.xsite.AbstractXSiteTest$TestSite.createClusteredCaches(AbstractXSiteTest.java:233)
> at org.infinispan.xsite.AbstractXSiteTest.createSite(AbstractXSiteTest.java:96)
> at org.infinispan.xsite.BackupWithSecurityTest.access$201(BackupWithSecurityTest.java:23)
> at org.infinispan.xsite.BackupWithSecurityTest.lambda$createSite$0(BackupWithSecurityTest.java:60)
> at org.infinispan.security.Security.doAs(Security.java:118)
> at org.infinispan.xsite.BackupWithSecurityTest.createSite(BackupWithSecurityTest.java:60)
> at org.infinispan.xsite.AbstractMultipleSitesTest.createSites(AbstractMultipleSitesTest.java:87)
> at org.infinispan.xsite.AbstractXSiteTest.createBeforeClass(AbstractXSiteTest.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
> at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:564)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:213)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:138)
> at org.testng.internal.TestMethodWorker.invokeBeforeClassMethods(TestMethodWorker.java:175)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:107)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.remoting.transport.jgroups.JGroupsTransport.start() on object of type JGroupsTransport
> at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:95)
> at org.infinispan.commons.util.SecurityActions.doPrivileged(SecurityActions.java:83)
> at org.infinispan.commons.util.SecurityActions.invokeAccessibly(SecurityActions.java:88)
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:165)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:635)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:624)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
> at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:239)
> ... 35 more
> Caused by: org.infinispan.commons.CacheException: Unable to start JGroups Channel
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:507)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:437)
> at sun.reflect.GeneratedMethodAccessor163.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:91)
> ... 43 more
> Caused by: java.net.BindException: No available port to bind to in range [8400 .. 8409]
> at org.jgroups.util.Util.createServerSocketChannel(Util.java:3539)
> at org.jgroups.blocks.cs.NioServer.<init>(NioServer.java:71)
> at org.jgroups.protocols.TCP_NIO2.start(TCP_NIO2.java:97)
> at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:861)
> at org.jgroups.JChannel.startStack(JChannel.java:1017)
> at org.jgroups.JChannel._preConnect(JChannel.java:886)
> at org.jgroups.JChannel.connect(JChannel.java:390)
> at org.jgroups.JChannel.connect(JChannel.java:384)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:505)
> ... 48 more
> {noformat}
[JBoss JIRA] (ISPN-8168) Index corruption with topology changes
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-8168?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-8168:
------------------------------------
Fix Version/s: 9.1.1.Final
> Index corruption with topology changes
> --------------------------------------
>
> Key: ISPN-8168
> URL: https://issues.jboss.org/browse/ISPN-8168
> Project: Infinispan
> Issue Type: Bug
> Components: Lucene Directory
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Labels: query, testsuite_stability
> Fix For: 9.1.1.Final
>
> Attachments: trace.zip
>
>
> This can be observed in the LiveRunningTest, which fails very often with:
> {noformat}
> Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in InfinispanDirectory{indexName='emails'}: files: []
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:726)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
> {noformat}
> The cache entry that contains the list of files in the Lucene directory (FileListCacheValue) is empty for some reason, although the index is not. The missing value for FileListCacheValue causes the index reader to think the index is empty, hence the error.
[JBoss JIRA] (ISPN-8181) XSite tests fail randomly with java.net.BindException
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-8181?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-8181:
------------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/5361
> XSite tests fail randomly with java.net.BindException
> -----------------------------------------------------
>
> Key: ISPN-8181
> URL: https://issues.jboss.org/browse/ISPN-8181
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.1.0.Final
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Labels: testsuite_stability
>
> {noformat}
> [ERROR] createBeforeClass(org.infinispan.xsite.BackupWithSecurityTest) Time elapsed: 0.047 s <<< FAILURE!
> org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.remoting.transport.jgroups.JGroupsTransport.start() on object of type JGroupsTransport
> at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:252)
> at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:686)
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:261)
> at org.infinispan.test.fwk.TestCacheManagerFactory.newDefaultCacheManager(TestCacheManagerFactory.java:394)
> at org.infinispan.test.fwk.TestCacheManagerFactory.newDefaultCacheManager(TestCacheManagerFactory.java:70)
> at org.infinispan.test.fwk.TestCacheManagerFactory.createClusteredCacheManager(TestCacheManagerFactory.java:198)
> at org.infinispan.test.fwk.TestCacheManagerFactory.createClusteredCacheManager(TestCacheManagerFactory.java:189)
> at org.infinispan.xsite.AbstractXSiteTest$TestSite.addClusterEnabledCacheManager(AbstractXSiteTest.java:259)
> at org.infinispan.xsite.AbstractXSiteTest$TestSite.createClusteredCaches(AbstractXSiteTest.java:233)
> at org.infinispan.xsite.AbstractXSiteTest.createSite(AbstractXSiteTest.java:96)
> at org.infinispan.xsite.BackupWithSecurityTest.access$201(BackupWithSecurityTest.java:23)
> at org.infinispan.xsite.BackupWithSecurityTest.lambda$createSite$0(BackupWithSecurityTest.java:60)
> at org.infinispan.security.Security.doAs(Security.java:118)
> at org.infinispan.xsite.BackupWithSecurityTest.createSite(BackupWithSecurityTest.java:60)
> at org.infinispan.xsite.AbstractMultipleSitesTest.createSites(AbstractMultipleSitesTest.java:87)
> at org.infinispan.xsite.AbstractXSiteTest.createBeforeClass(AbstractXSiteTest.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
> at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:564)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:213)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:138)
> at org.testng.internal.TestMethodWorker.invokeBeforeClassMethods(TestMethodWorker.java:175)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:107)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.remoting.transport.jgroups.JGroupsTransport.start() on object of type JGroupsTransport
> at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:95)
> at org.infinispan.commons.util.SecurityActions.doPrivileged(SecurityActions.java:83)
> at org.infinispan.commons.util.SecurityActions.invokeAccessibly(SecurityActions.java:88)
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:165)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:635)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:624)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
> at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:239)
> ... 35 more
> Caused by: org.infinispan.commons.CacheException: Unable to start JGroups Channel
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:507)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:437)
> at sun.reflect.GeneratedMethodAccessor163.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:91)
> ... 43 more
> Caused by: java.net.BindException: No available port to bind to in range [8400 .. 8409]
> at org.jgroups.util.Util.createServerSocketChannel(Util.java:3539)
> at org.jgroups.blocks.cs.NioServer.<init>(NioServer.java:71)
> at org.jgroups.protocols.TCP_NIO2.start(TCP_NIO2.java:97)
> at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:861)
> at org.jgroups.JChannel.startStack(JChannel.java:1017)
> at org.jgroups.JChannel._preConnect(JChannel.java:886)
> at org.jgroups.JChannel.connect(JChannel.java:390)
> at org.jgroups.JChannel.connect(JChannel.java:384)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:505)
> ... 48 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (ISPN-8182) Asynchronous commands should be retried if topology is outdated
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8182?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-8182:
------------------------------------
-1 to retrying from the originator, because the whole point of asynchronous replication is to not keep track of commands after they have been sent to the owners. In other words, the fact that updates can be lost is the main reason why {{cache.put(k, v)}}/*DIST_ASYNC* is faster than {{cache.putAsync(k, v)}}/*DIST_SYNC*.
OTOH, it's not OK that the remote node throws an {{OutdatedTopologyException}} and then pretends (at least in the log) that it can send the exception back to the originator and that the originator could retry the command. The remote node should either not throw the {{OutdatedTopologyException}} at all, or it should catch it and retry locally.
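The "catch it and retry locally" option can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: {{RemoteNode}}, {{handleAsync}} and the nested {{OutdatedTopologyException}} are hypothetical stand-ins, not Infinispan's real classes or interceptor chain, and the topology check is reduced to a plain equality test. A real fix would also have to decide whether the node is still an owner in the newer topology.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalRetrySketch {

    // Hypothetical stand-in for Infinispan's OutdatedTopologyException.
    static class OutdatedTopologyException extends RuntimeException {
        OutdatedTopologyException(String message) {
            super(message);
        }
    }

    // Hypothetical, heavily simplified model of a remote owner node.
    static class RemoteNode {
        private final int localTopologyId;
        final List<String> applied = new ArrayList<>();

        RemoteNode(int localTopologyId) {
            this.localTopologyId = localTopologyId;
        }

        // Simplified topology check: reject commands tagged with another topology.
        private void apply(String key, int commandTopologyId) {
            if (commandTopologyId != localTopologyId) {
                throw new OutdatedTopologyException(
                    "command topology " + commandTopologyId + ", local topology " + localTopologyId);
            }
            applied.add(key);
        }

        // An async command has no caller waiting for a reply, so the exception
        // is caught here and the command is retried once with the local topology.
        // Returns the number of attempts that were needed.
        int handleAsync(String key, int commandTopologyId) {
            try {
                apply(key, commandTopologyId);
                return 1;
            } catch (OutdatedTopologyException e) {
                apply(key, localTopologyId);
                return 2;
            }
        }
    }

    public static void main(String[] args) {
        RemoteNode node = new RemoteNode(5);
        System.out.println(node.handleAsync("k1", 5)); // topology matches
        System.out.println(node.handleAsync("k2", 4)); // outdated, retried locally
        System.out.println(node.applied);
    }
}
```

In this sketch the originator never hears about the mismatch, which keeps the fire-and-forget property of asynchronous replication while avoiding the lost update on the remote node.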
> Asynchronous commands should be retried if topology is outdated
> ---------------------------------------------------------------
>
> Key: ISPN-8182
> URL: https://issues.jboss.org/browse/ISPN-8182
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.1.0.Final
> Reporter: Galder Zamarreño
>
> If an asynchronous command fails at a remote node, it should be retried.
> I'm not sure how feasible this really is. One possible solution would be a NACK-style implementation: by default the originator assumes an asynchronous command has been executed, but if the receiver tells it that the topology is outdated, the originator retries.
> This is related to ISPN-8027 where we've discovered that some updates are not applied when asynchronous commands to update the Hibernate 2L timestamp cache fail as a result of an outdated topology.
[JBoss JIRA] (ISPN-8114) Random failures in loading from Hibernate Cache
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-8114?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-8114:
-----------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Random failures in loading from Hibernate Cache
> -----------------------------------------------
>
> Key: ISPN-8114
> URL: https://issues.jboss.org/browse/ISPN-8114
> Project: Infinispan
> Issue Type: Bug
> Components: Hibernate Cache
> Affects Versions: 9.1.0.Final
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Labels: testsuite_stability
> Fix For: 9.1.1.Final
>
>
> {{org.infinispan.test.hibernate.cache.functional.cluster.NaturalIdInvalidationTest.testAll[read-only, INVALIDATION_SYNC]}}
> {{org.infinispan.test.hibernate.cache.functional.cluster.NaturalIdInvalidationTest.testAll[transactional, INVALIDATION_SYNC]}}
> {code}
> java.lang.AssertionError: Citizen (1234) should have present in the cache
> at org.junit.Assert.fail(Assert.java:88)
> at org.infinispan.test.hibernate.cache.functional.cluster.NaturalIdInvalidationTest.assertLoadedFromCache(NaturalIdInvalidationTest.java:144)
> at org.infinispan.test.hibernate.cache.functional.cluster.NaturalIdInvalidationTest.testAll(NaturalIdInvalidationTest.java:114)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.hibernate.testing.junit4.ExtendedFrameworkMethod.invokeExplosively(ExtendedFrameworkMethod.java:45)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}